[Dev Catch Up # 83] - OpenAI's Apps SDK and AgentKit, Gemini 2.5 Computer Use, Samsung's Tiny Recursive Model, Debug like a pro, odiff, Tokenex, Jules API, Customize Claude Code with Plugins & more!
Bringing devs up to speed on the latest dev news and trends, including a bunch of exciting developments and articles
Welcome to the 83rd edition of DevShorts, Dev Catch Up!
For those who joined recently or are reading Dev Catch Up for the first time, I write about developer stories and open source, partly based on my work and experience interacting with people all over the globe.
Join 8300+ developers to hear stories from open source and technology.
Must Read
OpenAI announced several new tools at DevDay 2025. These include the Apps SDK for building and testing apps that run inside ChatGPT, API access to Sora 2, and AgentKit, a set of tools for building agents that includes Agent Builder, a visual canvas for designing multi-agent workflows. There were other updates too. Check OpenAI’s DevDay post on X for full details.
Google DeepMind introduced the Gemini 2.5 Computer Use model, which can interact with user interfaces directly. It performs UI actions such as clicking and typing through the computer_use tool. It is optimized primarily for web browsers and also shows promise on mobile UI tasks. Check Google’s announcement of the Gemini 2.5 Computer Use model for more details.
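Computer-use tools of this kind typically run as a loop on the client side: capture a screenshot, ask the model for the next UI action, execute it, and repeat until the model reports it is done. The Python sketch below shows only that loop shape. `propose`, `screenshot`, `execute`, and the `Action` type are hypothetical stand-ins, not the Gemini API or the real `computer_use` tool schema; a real integration would call the model through Google's SDK and drive an actual browser.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """A UI action proposed by the model (hypothetical shape)."""
    name: str  # e.g. "click" or "type"; names are illustrative only
    args: dict = field(default_factory=dict)


def run_agent_loop(
    propose: Callable[[str, bytes], Optional[Action]],  # hypothetical model call
    screenshot: Callable[[], bytes],  # captures the current screen as image bytes
    execute: Callable[[Action], None],  # performs the action in the UI
    goal: str,
    max_steps: int = 10,
) -> list[Action]:
    """Observe, ask the model for an action, act; stop when the model returns None."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = propose(goal, screenshot())
        if action is None:  # the model signals that the task is complete
            break
        execute(action)
        history.append(action)
    return history


if __name__ == "__main__":
    # Scripted stand-in for the model: click a search box, type a query, then stop.
    script = [Action("click", {"x": 120, "y": 40}), Action("type", {"text": "weather"}), None]
    done = run_agent_loop(
        propose=lambda goal, shot: script.pop(0),
        screenshot=lambda: b"fake-png-bytes",
        execute=lambda a: print("executing", a.name, a.args),
        goal="search for the weather",
    )
    print(len(done), "actions executed")
```

The `max_steps` cap is a simple safety valve so that a confused model cannot keep the loop running forever.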
Samsung AI has released the Tiny Recursive Model (TRM), which shows that small models can handle hard reasoning tasks. With only 7M parameters, TRM beats much larger models such as DeepSeek R1, o3-mini, and Gemini 2.5 Pro on the ARC-AGI reasoning benchmarks. Check the Tiny Recursive Model’s GitHub page for full details.
Every developer debugs code, but not everyone does it well. Good debugging starts with simple habits like writing clear bug reports, using small failing tests, and improving how you trace issues. Check the “Debug Like a Pro” Substack post to learn more.
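To make the “small failing test” habit concrete, here is a minimal Python sketch built around a hypothetical bug report: `average([])` crashes with `ZeroDivisionError`. The first step is the smallest test that reproduces the report (it fails before the fix); after the fix, the same test stays in the suite as a regression guard. The `average` function and its empty-list behavior are illustrative assumptions, not taken from the post.

```python
def average(values: list[float]) -> float:
    """Return the arithmetic mean, or 0.0 for an empty list.

    Hypothetical fix: the earlier version divided by len(values) unconditionally,
    so an empty list raised ZeroDivisionError.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_average_of_empty_list_does_not_crash() -> None:
    # Smallest input that reproduced the bug report; it failed before the fix.
    assert average([]) == 0.0


def test_average_of_typical_input() -> None:
    # Guards the normal path so the fix does not break existing behavior.
    assert average([1.0, 2.0, 3.0]) == 2.0


if __name__ == "__main__":
    test_average_of_empty_list_does_not_crash()
    test_average_of_typical_input()
    print("all tests passed")
```

Saved as a `test_*.py` file, both `test_` functions would also be collected automatically by pytest.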
OSS Highlight of the Week
This week we are featuring TermEverything. It is a Linux CLI program that lets you run any GUI app inside your terminal, even over SSH. You can open browsers, games, or full desktops right from the command line. Check the TermEverything GitHub repo to learn more about how it works.
Good to know
If you want to learn system design, this GitHub repo is a great place to start. It has a curated list of system design courses and books. You’ll also find case studies on real-world system architectures. Check the Best System Design Resources repo to explore more.
If you often compare screenshots or generated images, odiff can save you time. It is a fast image comparison tool that finds pixel-by-pixel differences in milliseconds with high accuracy. Check the odiff GitHub repo to learn how it works.
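The core idea behind pixel-level image comparison can be sketched in a few lines of Python: walk both images in lockstep and record every coordinate where the colors differ by more than a threshold. This is a naive illustration that assumes the images are already decoded into rows of RGBA tuples; it is not odiff's algorithm, which is much faster and uses a more sophisticated color-difference measure. `diff_pixels` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence

# An RGBA pixel with 0-255 per channel; an image is a sequence of rows of pixels.
Pixel = tuple[int, int, int, int]
Image = Sequence[Sequence[Pixel]]


def diff_pixels(a: Image, b: Image, threshold: int = 0) -> list[tuple[int, int]]:
    """Return (x, y) coordinates where any channel differs by more than threshold."""
    if len(a) != len(b) or any(len(row_a) != len(row_b) for row_a, row_b in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    changed: list[tuple[int, int]] = []
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (px_a, px_b) in enumerate(zip(row_a, row_b)):
            # The largest per-channel difference decides whether the pixel changed.
            if max(abs(ca - cb) for ca, cb in zip(px_a, px_b)) > threshold:
                changed.append((x, y))
    return changed


if __name__ == "__main__":
    white = (255, 255, 255, 255)
    red = (255, 0, 0, 255)
    before = [[white, white], [white, white]]
    after = [[white, red], [white, white]]
    print(diff_pixels(before, after))  # [(1, 0)]
```

Raising `threshold` tolerates small color shifts, which helps when anti-aliasing or compression adds noise between otherwise identical screenshots.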
I explore useful tools and frameworks in Go every week. Tokenex is an open-source Go library that helps fetch and refresh cloud credentials and tokens. It is used for multi-cloud credential management, token exchange, and secure service authentication. Check the Tokenex blog post to learn more.
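The “fetch and refresh” pattern that credential libraries like Tokenex handle can be sketched as a small cache that reuses a token until shortly before it expires. This is a Python illustration of the general pattern, not Tokenex's API (Tokenex itself is a Go library); `Token`, `CachedTokenSource`, the `fetch` callback, and the 60-second refresh margin are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # Unix timestamp, in seconds


class CachedTokenSource:
    """Hypothetical helper: reuse a token, refreshing it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Token],  # e.g. a call to a cloud STS or OAuth endpoint
        refresh_margin: float = 60.0,  # refresh this many seconds before expiry
        clock: Callable[[], float] = time.time,  # injectable for testing
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[Token] = None

    def get(self) -> str:
        """Return a valid token value, fetching a new token only when needed."""
        now = self._clock()
        if self._token is None or now >= self._token.expires_at - self._margin:
            self._token = self._fetch()
        return self._token.value


if __name__ == "__main__":
    fake_now = [1000.0]
    issued = []

    def fetch() -> Token:
        # Stand-in for a real credential exchange: each token lives 300 seconds.
        issued.append(fake_now[0])
        return Token(f"token-{len(issued)}", fake_now[0] + 300)

    source = CachedTokenSource(fetch, clock=lambda: fake_now[0])
    print(source.get())  # token-1 (fetched)
    fake_now[0] = 1200.0
    print(source.get())  # token-1 (still valid)
    fake_now[0] = 1250.0
    print(source.get())  # token-2 (within 60s of expiry, refreshed)
```

Refreshing before expiry, rather than after a request fails, avoids sending a token that expires mid-request. A production version would also add a lock so concurrent callers do not trigger duplicate refreshes.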
If you work with high-frequency or real-time analytics, Arc might catch your eye. It is a time-series data warehouse built for speed that handles around 1.9 million records per second, according to its own benchmarks. Check the Arc GitHub repo for benchmarks and setup details.
Auditing AI models by hand is slow, and Petri aims to make it faster. Petri is an open-source auditing tool from Anthropic that uses an AI agent to create its own test environments and run multi-turn audits with human-like prompts and tools, surfacing risky behavior in minutes. Check the Petri GitHub repo to learn more.
Notable FYIs
If you use Claude Code, this update is worth knowing about: Claude Code now supports plugins, which let you bundle slash commands, hooks, and MCP servers for reuse and sharing. Check Anthropic’s “Customize Claude Code with Plugins” post for more details.
OpenAI released Guardrails Python, a library that enforces rules on LLM responses so you don’t get unsafe or off-topic output. You define a schema, and it validates or corrects responses automatically. Check the OpenAI Guardrails Python GitHub repo to explore how it works.
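To show what “enforcing rules on LLM responses” can look like in principle, here is a plain-Python sketch that checks a model's JSON output against a tiny schema and a blocklist. It does not use or mirror the Guardrails Python API; `SCHEMA`, `BLOCKED_TERMS`, and `check_response` are hypothetical, and the real library's configuration is described in its repo.

```python
import json
from typing import Any

# Hypothetical schema: required field names mapped to expected Python types.
SCHEMA: dict[str, type] = {"answer": str, "confidence": float}
# Hypothetical off-limits terms for the "answer" field.
BLOCKED_TERMS = ("password", "credit card")


def check_response(raw: str) -> tuple[bool, list[str]]:
    """Validate a raw model response; return (ok, list of rule violations)."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be an object"]

    errors: list[str] = []
    # Structural rules: every schema field must be present with the right type.
    for name, expected in SCHEMA.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(f"{name} should be {expected.__name__}")

    # Content rule: reject answers that mention blocked terms.
    answer = str(data.get("answer", "")).lower()
    for term in BLOCKED_TERMS:
        if term in answer:
            errors.append(f"blocked term in answer: {term}")
    return not errors, errors


if __name__ == "__main__":
    print(check_response('{"answer": "Paris", "confidence": 0.9}'))
    print(check_response('{"answer": "Paris"}'))
```

When a check fails, an application might re-prompt the model with the error list or fall back to a safe default; which strategy fits depends on the product.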
High availability is a must for production applications. Strategies like load balancing, failover, and auto-scaling help keep systems online and reliable. If you want to learn these techniques, check this Substack post on Top Strategies for High Availability.
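Load balancing and failover can be combined in a few lines: rotate requests across backends, and skip any backend that fails a health check. The Python sketch below assumes a hypothetical `is_healthy` callback (in practice, a periodic health probe) and hypothetical backend names; it covers only these two strategies, not auto-scaling.

```python
from typing import Callable, Sequence


class RoundRobinBalancer:
    """Round-robin load balancing with failover to the next healthy backend."""

    def __init__(self, backends: Sequence[str], is_healthy: Callable[[str], bool]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._is_healthy = is_healthy  # hypothetical health-check callback
        self._next = 0  # index of the backend to try first on the next pick

    def pick(self) -> str:
        """Return the next healthy backend, trying each backend at most once."""
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if self._is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    down = {"app-2"}  # pretend this instance failed its health check
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"], lambda b: b not in down)
    print([balancer.pick() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Traffic that would have gone to `app-2` fails over to the next healthy instance, and `app-2` rejoins the rotation as soon as its health check passes again.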
Last week, we covered Jules Tools, the command-line interface for the Jules coding agent. This week brings another update: Jules is now available as an API, giving programmatic access to Jules’s capabilities, from writing code to reviewing it. Check Google’s post on the Jules API to learn how to get started.
That’s it from us for this edition. We hope you are going away with a ton of new information. If you find this newsletter valuable, share it with your colleagues and friends, and if you are reading it for the first time, consider subscribing.