[Dev Catch Up #24] - A new GPT-4o level Function Calling model, Optimizing AI inference, NumPy 2.0, and more.
Bringing devs up to speed on the latest dev news and trends, including a bunch of exciting developments and articles.
Welcome to the 24th edition of DevShorts, Dev Catch Up!
I write about developer stories and open source, partly from my work and experience interacting with people all over the globe.
Some recent issues from Dev Catch Up:
Join 1000+ developers to hear stories from open source and technology.
Must Read
The race to build the most powerful LLMs shows no sign of slowing down, and smaller organizations are not just keeping pace with the tech giants but sometimes beating them at this show of technological superiority. OpenAI recently unveiled GPT-4o, which stunned the tech world with its features, task performance, and benchmark results. Now Anthropic has come out with its newest model, Claude 3.5 Sonnet, which is reportedly equal to or better than the likes of OpenAI’s GPT-4o and Google’s Gemini across a wide variety of tasks. Its strong benchmark performance shows that the company has built a legitimate competitor in the space. The model is available to web and iOS users and will be available to developers as well. With the new model, Anthropic is also introducing a feature named Artifacts, which lets users see, interact with, and edit the results of their Claude requests. This article from The Verge explains the new model in detail and sheds light on the new feature as well.
AI and ML were introduced to the tech world to make complex everyday operations easy and doable. LLMs will play a major role in that effort, as AI researchers work toward a future where LLMs enhance daily life, boost business productivity and entertainment, and help people with everything. All of this requires highly efficient inference, and Character.AI may have cracked it. The company has designed its model architecture, inference stack, and product around unique opportunities to optimize inference so that it becomes cost-effective, scalable, and efficient. Character.AI serves more than 20,000 inference queries per second, and it can sustainably serve models at this scale thanks to a number of key innovations across its serving stack. This article from the Character.AI research team covers efficient inference in detail and explains the techniques and optimizations they have developed over the years.
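One standard building block behind efficient LLM serving is KV caching; Character.AI's post discusses far more advanced variants, but the basic idea can be sketched in a few lines. Without a cache, generating each new token recomputes attention keys and values for the whole prefix; with a cache, only the newest token's K/V are computed and appended. The weights and dimensions below are arbitrary toy values, not anything from their stack:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (toy value)

def kv_for(token_vec, Wk, Wv):
    # Project one token's embedding into its key and value vectors.
    return token_vec @ Wk, token_vec @ Wv

Wk, Wv = rng.normal(size=(d, d)), rng.normal(size=(d, d))
cache_k, cache_v = [], []

for _ in range(4):                         # "generate" 4 tokens
    x = rng.normal(size=d)                 # embedding of the new token
    k, v = kv_for(x, Wk, Wv)               # compute K/V for this token only
    cache_k.append(k); cache_v.append(v)   # append to the running cache

K = np.stack(cache_k)                      # (seq_len, d) — reused every step
print(K.shape)  # (4, 8)
```

The point of the cache is the loop body: per step, the cost is one token's projections rather than the whole prefix's.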
AI is growing rapidly, and the trend runs deep across the tech world. Developers are excited to experiment with different models and even build their own with readily available technologies. According to analysts, AI in the tech industry is here to stay and will keep growing explosively, so now is a good time for developers to build with Large Language Models, or LLMs. Provider APIs make LLMs accessible not only to ML scientists and engineers but to everyone who wants to build intelligence into their products. While the barrier to entry for AI is lower, creating effective products and systems remains deceptively difficult; many engineers struggle to find a starting point or make the same mistakes multiple times. To avoid those mistakes and iterate faster, check out this article written by the developers of AppliedLLMs, where they share knowledge and tips that serve as a practical guide for developers, whether starting out or experienced, building successful products with LLMs.
LLMs are on the rise, and certain of their abilities are improving quickly; better model reasoning is proving particularly valuable in the area of function calling. Function calling is a model’s ability to interact with external APIs or functions to perform specific tasks, extending its capabilities and helping it produce more accurate and actionable outputs. Building on improved function calling, Fireworks AI recently released a new model named Firefunction-v2. It handles calls across multiple functions better than comparable models on the same benchmarks, and it can efficiently handle more complex functions. Based on the Llama 3 70B model, it is faster and less expensive than the latest release of OpenAI’s GPT-4o. It is designed for real-world agentic use cases like chat assistants, which involve extensive function calling alongside general chat and instruction following. Learn more from this article published by the Fireworks AI engineering team, where they discuss the model and its capabilities in detail and compare its benchmark performance with its peers.
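The general function-calling pattern can be sketched independently of any one provider: the application gives the model function schemas, the model replies with a function name plus JSON arguments, and the application executes the matching function. The schema shape and the `model_reply` below are illustrative stand-ins, not Firefunction-v2's actual API:

```python
import json

# Function schema the model is told about (illustrative shape).
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]},
}]

def get_weather(city):
    return {"city": city, "temp_c": 21}    # stub for a real API call

# Pretend the model responded with a tool call; in practice this comes
# back from the inference endpoint as part of its reply.
model_reply = {"name": "get_weather", "arguments": '{"city": "Berlin"}'}

# Dispatch: look up the named function and call it with parsed arguments.
dispatch = {"get_weather": get_weather}
fn = dispatch[model_reply["name"]]
result = fn(**json.loads(model_reply["arguments"]))
print(result)  # {'city': 'Berlin', 'temp_c': 21}
```

In a real agent loop, `result` would be fed back to the model so it can compose a final answer for the user.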
Now, we will head over to some news and articles that will be of interest to developers and the wider tech community.
Good to know
If you are a Python developer, you have likely used NumPy a lot in your everyday code. It is one of the fundamental packages for scientific computing in Python, and a new version, NumPy 2.0, was recently released with a bunch of upgrades and major changes. With version 2.0, the Python API has gone through a thorough cleanup that makes NumPy easier to learn and use. The new version brings improved scalar promotion rules, a new and powerful DType API, and a new string dtype. It also ships Windows compatibility enhancements and support for the Python array API standard. Learn more about all the improvements and upgrades from this article published by the NumPy organization, where the updates in the new version are explained in detail.
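The improved scalar promotion rules (NEP 50) are the change most likely to touch existing code: Python scalars are now "weak", so a result's dtype follows the array rather than the scalar's value. A small sketch, assuming NumPy is installed (the behavior noted in the last comment is specific to 2.0):

```python
import numpy as np

a = np.array([200], dtype=np.uint8)

# Adding a small Python int keeps the array's dtype under both the old
# and the new promotion rules.
r = a + 50
print(r.dtype)  # uint8

# The behavior change: under NumPy 1.x, `a + 1000` was silently upcast
# to int16 because the value 1000 doesn't fit in uint8; under 2.0 the
# result stays uint8 and overflows (with a warning) — the classic
# NEP 50 example from the migration guide.
```

If your code relied on value-based upcasting, 2.0 is where it breaks, which is why the release notes call this out as a major change.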
Effective use of observability helps us understand our applications and systems in a much better light, helping us maintain their health, which results in better performance and availability. Anomaly detection is one of the crucial aspects of observability in this process. Combined with observability, anomaly detection proactively surfaces issues within a system so they can be resolved before they affect users or impact operations, which also helps rein in your overall cloud costs. Here is a well-written article published by Karl Stoney, where he shares the importance of anomaly detection and shows how to use Prometheus and Istio, two of the best observability tools, together to detect anomalies in the response time of your operational services.
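The article does this with PromQL over Istio metrics, but the underlying idea is a z-score test: flag response times that deviate too far from recent behavior. A toy version over a plain list of latency samples (the samples and threshold are made up for illustration):

```python
import statistics

def zscore_anomalies(samples, threshold=3.0):
    """Return the samples lying more than `threshold` standard
    deviations from the mean of the window."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    if stdev == 0:
        return []  # perfectly flat window: nothing to flag
    return [x for x in samples if abs(x - mean) / stdev > threshold]

latencies_ms = [120, 118, 125, 119, 122, 121, 620]  # one obvious spike
print(zscore_anomalies(latencies_ms, threshold=2.0))  # [620]
```

In Prometheus the same computation runs continuously over a sliding window of the response-time metric, alerting when the current value crosses the band.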
OpenTelemetry has changed the world of observability for the better, and it has become the main topic in any discussion of observability in recent times. It is used for a whole range of observability operations, from tracing to monitoring, and it can also trace operations in gRPC streams. gRPC is an open-source remote procedure call framework developed by Google and adopted by many organizations; it enables communication via streaming. Tracing operations in gRPC streams is tricky and complex, but you can do it with the help of OpenTelemetry. This article from Tracetest gives you a comprehensive guide in which a system written in Go uses gRPC streams to send data to consumers, and shows how to instrument it with traces using OpenTelemetry.
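What makes streams tricky is that one long-lived RPC carries many messages, so trace context must travel per message rather than per call. The article does this properly in Go with the OpenTelemetry SDK; the core idea can be shown with a self-contained toy (the span dicts below are a stand-in for real OTel spans, not the library's API):

```python
import uuid

def new_span(name, trace_id=None, parent_id=None):
    # Toy span record: real OTel spans carry much more, but the linkage
    # fields (trace_id, parent_id) are what propagation is about.
    return {"name": name, "span_id": uuid.uuid4().hex[:8],
            "trace_id": trace_id or uuid.uuid4().hex[:16],
            "parent_id": parent_id}

# Producer side: attach trace context to each streamed message's metadata.
producer_span = new_span("stream.send")
messages = [{"payload": i,
             "metadata": {"trace_id": producer_span["trace_id"],
                          "parent_id": producer_span["span_id"]}}
            for i in range(3)]

# Consumer side: each message yields a child span in the same trace, so
# per-message work shows up under one end-to-end trace.
consumer_spans = [new_span("stream.recv",
                           trace_id=m["metadata"]["trace_id"],
                           parent_id=m["metadata"]["parent_id"])
                  for m in messages]
print({s["trace_id"] for s in consumer_spans})  # one shared trace id
```

In the real Go setup, the metadata dict corresponds to injecting the context into each message (or the stream's headers) and extracting it on the consumer.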
With all the hype around AI and the technological advancement happening in that space, it is high time we focus on the AI work being done in the open-source arena, and I bet there’s a lot going on there without much light shed on it. In this issue, we are highlighting a project that packs a nanoGPT pipeline into a spreadsheet. It was created to help people understand how GPT works: the mechanisms, calculations, and matrices in the spreadsheet are fully interactive and configurable, helping readers visualize the entire architecture and dataflow. The repository also contains resources for further learning. Check out this small but interesting project from the GitHub repository here and leave a star to show some support to the developer.
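The matrices the spreadsheet walks through center on one computation: scaled dot-product self-attention with a causal mask. The same core step fits in a few lines of NumPy with toy sizes and random weights (this is the generic transformer attention formula, not the project's exact cell layout):

```python
import numpy as np

rng = np.random.default_rng(42)
seq_len, d = 4, 8
x = rng.normal(size=(seq_len, d))            # token embeddings

# Learned projections (random here) map embeddings to queries/keys/values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)                # (seq_len, seq_len) similarities
mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
scores[mask] = -np.inf                       # causal mask: no peeking ahead

# Row-wise softmax (shifted by the row max for numerical stability).
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ V                            # (seq_len, d) attended output
print(out.shape)  # (4, 8)
```

Each row of `weights` sums to 1 and only looks at positions up to its own, which is exactly the triangular structure you can see laid out in the spreadsheet.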
Lastly, we will take a look at some of the trending scoops that deserve a special mention for the community.
Notable FYIs
Interested in attending a conference full of Kubernetes know-how and a plethora of technical expertise from the cloud-native world? Then KCD Munich is the conference for you to attend this summer. One of the best KCDs hosted over the years, this year’s edition is again filled with a number of amazing speakers. Book your tickets to experience the celebration of technology from the official CNCF site here.
AI, ML, and LLMs are among the top technology trends these days, and developers and tech enthusiasts delight in the new insights that arrive every week from this sector of the market. Interested folks might have come across the concept of context windows in LLMs: the range or size of text, i.e. tokens, that a model can consider at once while generating responses or making predictions. Improvements to the context window can significantly improve a model’s reasoning capabilities, with enhanced comprehension and better information retrieval. This podcast from Latent Space talks about why the community is losing faith in context windows for improved reasoning and how AI can be used as a partner in thought.
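Concretely, a context window is just a hard cap on how many tokens the model can attend to, so anything older falls out of view. A toy sketch (real tokenizers split text into subwords; whole words stand in for tokens here):

```python
def fit_to_window(tokens, window_size):
    """Keep only the most recent `window_size` tokens — everything
    earlier is invisible to the model."""
    return tokens[-window_size:]

history = "the quick brown fox jumps over the lazy dog".split()
visible = fit_to_window(history, window_size=4)
print(visible)  # ['over', 'the', 'lazy', 'dog']
```

This is why a bigger window alone doesn't guarantee better reasoning: the model still has to actually use what's in view, which is the podcast's point of skepticism.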
If you are a developer, you might have thought about building web applications for a production environment. To build those, a fair bit of knowledge of the essentials is a must. ByteByteGo published a short article with an interactive diagram that discusses 10 essential components of a production-ready web application.
JavaScript has a ton of libraries, but picking the right one for the right project is a daunting task. Choosing libraries by judging their performance and use cases brings the best out of your projects in terms of performance and efficiency. Here is an article published by The New Stack that lists 10 JavaScript libraries to use in 2024 along with their features and use cases.
When building websites, developers always aim to make them super fast to provide a great user experience. If you are a developer who is building or wants to build a website, here is a short article from ByteByteGo with tips on loading a website at lightning speed through boosted frontend performance.
That’s it from us for this edition. We hope you are going away with a ton of new information. Lastly, share this newsletter with your colleagues and pals if you find it valuable, and if you are reading for the first time, a subscription would be awesome.