[Dev Catch Up #26] - LLM training at Meta, iOS 18 beta release, AI security with PCC, and more.
Bringing devs up to speed on the latest dev news and trends, including a bunch of exciting developments and articles.
Welcome to the 26th edition of DevShorts, Dev Catch Up!
I write about developer stories and open source, partly from my work and experience interacting with people all over the globe.
Some recent issues from Dev Catch up:
Join 1,000+ developers to hear stories from open source and technology.
Must Read
The rise of GenAI and AI as a whole has organizations spending enormous amounts of time on AI research and model training. Training large language models is not easy, as it requires computation at an extensive scale. Meta is at the forefront of AI research and development aimed at solving increasingly complex problems, and despite everything it has learned, it ran into this computation challenge too. Traditionally, AI model training at Meta meant a large number of models, each requiring a comparatively small number of GPUs, ingesting vast amounts of data to power the accurate recommendations behind Meta products. With the introduction of GenAI, the shift is toward fewer but incredibly large jobs, and supporting GenAI at that scale has required notable improvements across the hardware, software, and network infrastructure. Learn more about how Meta trains large language models at scale from this article published by the Meta engineering team, which discusses the shortcomings of large-scale model training alongside the improvements to the infrastructure stack in detail.
iOS 18 is finally available in public preview, and with this update Apple unveiled Apple Intelligence for public use. Although late to the race, Apple is trying to close the gap with the other tech giants in AI and ML. In this update, the Photos app gets a UI redesign and uses AI to organize photos into trips and albums. Apple Intelligence also brings an updated, better-trained Siri. Apple’s own browser, Safari, gains AI-powered summaries of news and other web articles, including generated tables of contents. The update also ships with ChatGPT integrated into iOS, following the deal between Apple and OpenAI on the advancement of AI. Get a closer, in-depth look at the update from this article published by CNBC, detailing the different features and the UI and UX improvements customers will see once they install it.
AI security is one of the topmost subjects of interest given the rapid advancement and adoption of AI across different sectors of the industry. Private AI processing in the cloud poses formidable challenges: for advanced features that reason over complex data with large foundation ML models, powerful AI hardware in the cloud can fulfill a user’s request, but it needs unencrypted access to that request and the accompanying personal data. Apple recently launched Apple Intelligence, the personal intelligence system that brings powerful generative models to compact Apple devices, and it faced the same problem. To tackle it, Apple introduced Private Cloud Compute, or PCC, a groundbreaking cloud intelligence system designed specifically for private AI processing. It ensures that user data sent to PCC is not accessible to anyone other than the user. With custom Apple silicon and a privacy-hardened operating system, Apple describes PCC as the most advanced security architecture ever deployed for cloud AI compute at scale. Get to know more about PCC from this article published by the Apple Security team, where they discuss the various pieces of the cloud intelligence system.
Now, let’s head over to some news and articles that will be of interest to developers and the wider tech community.
Good to know
OpenTelemetry Community Day recently concluded, and as usual the conference covered a lot of ground through small, intimate, and interactive sessions. The event kicked off with opening remarks highlighting the community’s accomplishments to date and the goals for the rest of the year. There was a deep discussion on using native OpenTelemetry instrumentation to build better client libraries, backed by a number of case studies showing how the approach plays out in different scenarios and what the outcomes were. Another great talk covered tuning OTel Collector performance through profiling, and the session on Python exemplars in OTel shed light on using exemplars on metrics as a bridge into tracing and a gentler onboarding path for devs who are unfamiliar with, or skeptical of, traces. Lastly, a session on how OTel can help when a GraphQL query goes wrong vividly demonstrated the power of tracing. Catch up on all of these sessions from this article published by Paigerduty, where the slides for each session are linked and explained in detail.
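To make the exemplars idea concrete, here is a minimal sketch of how a metric recorded inside an active span can carry an exemplar pointing back to that trace. It assumes an OpenTelemetry Python SDK version with exemplar support enabled; the meter, tracer, and operation names are illustrative, not from the talk.

```python
# Minimal sketch: record a histogram while a span is active so the SDK can
# attach an exemplar (trace/span IDs) to the metric data point, linking a
# slow bucket directly to a trace. Assumes an OTel Python SDK version with
# exemplar support; names and the measured operation are made up.
import time

from opentelemetry import metrics, trace
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.trace import TracerProvider

# Basic SDK wiring; the console exporter keeps the example self-contained.
trace.set_tracer_provider(TracerProvider())
provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(ConsoleMetricExporter())]
)
metrics.set_meter_provider(provider)

tracer = trace.get_tracer("checkout")
meter = metrics.get_meter("checkout")
latency_ms = meter.create_histogram("checkout.latency", unit="ms")

def handle_request():
    # Because the histogram is recorded while this span is active, the SDK
    # can attach an exemplar that bridges the metric to the trace.
    with tracer.start_as_current_span("handle_request"):
        start = time.monotonic()
        time.sleep(0.05)  # stand-in for real work
        latency_ms.record((time.monotonic() - start) * 1000, {"route": "/checkout"})

handle_request()
provider.force_flush()  # emit the metric (and its exemplar) before exit
```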
The use of LLMs is growing rapidly, and to sustain that growth the supporting technological infrastructure keeps improving. CUDA kernels are one such piece: programmed functions that play a critical role in the performance and efficiency of large language models by letting work execute in parallel across thousands of GPU cores, yielding significant increases in computational speed. FireAttention is a custom CUDA kernel optimized for multi-query attention models, with specific optimizations for FP16 and FP8 support on new hardware that let it run close to the hardware memory bandwidth limit during generation across various batch sizes and sequence lengths. The latest addition to this family is FireAttention V2, which focuses on making long-context LLM use cases practical, pushing the boundary of both quality and inference speed. Learn more about FireAttention V2 from this article published by the Fireworks AI team, where they explain the CUDA kernel in detail with different model use cases and interactive graphs showing the improvements in throughput and latency.
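As a rough illustration of why multi-query attention is attractive for kernels like this, here is a plain-PyTorch sketch of MQA, where all query heads share a single key/value head and the KV cache (and the memory traffic it generates) shrinks accordingly. This is only a sketch of the attention variant, not Fireworks’ CUDA implementation; the shapes are arbitrary.

```python
# Multi-query attention (MQA) in plain PyTorch: one shared K/V head serves
# every query head, so keys and values are stored and read once instead of
# once per head. Illustrative only -- not FireAttention itself.
import torch

def multi_query_attention(q, k, v):
    """q: (batch, heads, seq, dim); k, v: (batch, 1, seq, dim) -- one shared KV head."""
    scale = q.shape[-1] ** -0.5
    # Broadcasting over the head dimension reuses the single K/V head for all
    # query heads, which is where the memory-bandwidth savings come from.
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # (batch, heads, seq, seq)
    weights = torch.softmax(scores, dim=-1)
    return torch.matmul(weights, v)                          # (batch, heads, seq, dim)

batch, heads, seq, dim = 2, 8, 128, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, 1, seq, dim)
v = torch.randn(batch, 1, seq, dim)

out = multi_query_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```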
In site reliability engineering, load shedding refers to protecting services from being overwhelmed by excessive load, ensuring the stability and reliability of the system under high-traffic conditions. The technique involves intentionally dropping or deferring less critical requests when the system is under heavy load, which preserves the performance and availability of critical services. Netflix has long used prioritized load shedding to increase the reliability of the system as a whole; now it is bringing the strategy down to the individual service level, with a focus on the video streaming control plane and data plane, further improving user experience and system resilience. Learn more about how Netflix applies prioritized load shedding at the individual service level in this article posted by the Netflix engineering team, where they discuss the technique in detail with real-world examples.
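To get a feel for the general idea, here is a toy Python sketch of priority-aware shedding: once in-flight work crosses a threshold, lower-priority requests are rejected first so critical ones keep their headroom. The priority tiers, thresholds, and names are invented for illustration and are not Netflix’s implementation.

```python
# Toy prioritized load shedder: reject low-priority work first as utilization
# rises. Tiers and thresholds below are made-up illustrative values.
import threading

PRIORITY = {"critical": 0, "degraded": 1, "best_effort": 2}  # lower = more important

class LoadShedder:
    def __init__(self, max_in_flight=100):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_admit(self, priority: str) -> bool:
        with self.lock:
            utilization = self.in_flight / self.max_in_flight
            # The busier we are, the stricter we get: above 80% only
            # "degraded" and "critical" survive; above 95% only "critical".
            if utilization >= 0.95 and PRIORITY[priority] > PRIORITY["critical"]:
                return False
            if utilization >= 0.80 and PRIORITY[priority] > PRIORITY["degraded"]:
                return False
            self.in_flight += 1
            return True

    def release(self):
        with self.lock:
            self.in_flight -= 1

shedder = LoadShedder(max_in_flight=10)

def handle(priority: str) -> str:
    if not shedder.try_admit(priority):
        return "503 shed"   # fail fast instead of queueing behind critical work
    try:
        return "200 ok"     # stand-in for the real handler
    finally:
        shedder.release()

print(handle("best_effort"), handle("critical"))
```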
Celebrating new open-source tools and projects is a tradition in every issue, and this one is no exception. This time we are focusing on SmoothMQ, a drop-in replacement for SQS (Simple Queue Service) that has racked up a large number of stars and is shining bright on the trending page. It aims for a smooth developer experience: a functional UI with observability, tracing, message scheduling, and rate limiting, plus the ability to run a private SQS-compatible instance on any cloud provider. Have a look at the project, created by Jay Goel, in the GitHub repository here, and leave a star to support it.
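Since SmoothMQ speaks the SQS API, the "drop-in" part should amount to pointing an existing SQS client at a different endpoint. Here is a small boto3 sketch of that idea; the endpoint URL, port, credentials, and queue name are placeholders for a locally running instance, not values taken from the SmoothMQ docs.

```python
# Sketch: reuse a standard boto3 SQS client against a local SmoothMQ instance
# by overriding endpoint_url. Endpoint, credentials, and queue name are
# placeholders -- check the SmoothMQ README for the actual values.
import boto3

sqs = boto3.client(
    "sqs",
    endpoint_url="http://localhost:3001",  # wherever your SmoothMQ instance listens
    region_name="us-east-1",               # required by boto3, ignored locally
    aws_access_key_id="local",
    aws_secret_access_key="local",
)

queue_url = sqs.create_queue(QueueName="jobs")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody='{"task": "resize-image"}')

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    print(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```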
Lastly, we will take a look at some trending scoops that deserve a special mention for the community.
Notable FYIs
Developing GenAI applications is becoming the new norm with the extensive rise of artificial intelligence, and model selection is one of the most important parts of that process. It is a challenging task, considering there are over 90,000 text-generation models on the market. Selection should be driven by the outcomes, constraints, and success measures for the model, while also weighing architectural choices, cost, flexibility, privacy, and scalability. This YouTube video from OctoAI gives you a detailed understanding of how to select the right LLM for your GenAI application, with real-life examples showing how fine-tuning models can lower costs and improve performance.
Recognizing good LLM models comes from a rigorous evaluation process: competing against LLMs of the same class, topping the arena, or topping the leaderboard. All of these ultimately boil down to whether the model beats the benchmark. While topping leaderboards and arenas is a reasonable way to measure capability, competing against a handful of hyped models in the same category may not be. Clémentine Fourrier, the lead maintainer of Hugging Face’s Open LLM Leaderboard, talks with Latent Space in their latest podcast about why using LLMs as judges to measure model capability is not a good idea.
AI engineering is developing through rapid iteration, and developers are trying to fit in and adapt by following some of the market’s emerging patterns. Here is a detailed article from The New Stack that covers three trends for AI engineering in the cloud that developers have adopted over the first half of 2024.
Git is the default version control system that every developer uses on a daily basis, and knowing its most useful commands helps a developer immensely. Here is an explanatory article from the education team at GitHub explaining 12 commonly used Git commands and how to use them.
The announcement of KubeCon + CloudNativeCon India late last year was a historic moment for the Indian cloud-native community, and this year it takes place on 11 to 12 December. If you want to deliver a session sharing your take on Kubernetes and cloud-native technology, you can do so by filling out the Call for Proposals on the official CNCF KubeCon India website here.
That’s it from us for this edition. We hope you’re coming away with a ton of new information. Lastly, share this newsletter with your colleagues and pals if you find it valuable, and if you’re reading for the first time, a subscription would be awesome.