[Dev Catch Up # 93] - GPT-5.2-Codex, ChatGPT Images, Gemini 3 Flash, Grok Voice Agent API, Google's A2UI, Claude-mem - Memory for Claude Code, Mistral OCR 3, Runme - Runnable Markdown and much more!
Bringing devs up to speed on the latest dev news and trends, including a bunch of exciting developments and articles
Welcome to the 93rd edition of DevShorts, Dev Catch Up!
For those who joined recently or are reading Dev Catch Up for the first time, I write about developer stories and open source, partly based on my work and experience interacting with people all over the globe.
Join 8500+ developers to hear stories from Open source and technology.
Must Read
OpenAI has launched ChatGPT Images. It generates images faster and performs precise edits. It is now available via API as GPT Image 1.5. Alongside this, OpenAI also released GPT-5.2-Codex, an agentic coding model built for long-running tasks. Check both announcements for more details.
Google has released Gemini 3 Flash. It is the latest model in the Gemini series, built for speed at low cost. It is mainly aimed at coding, agents, and everyday tasks. Gemini 3 Flash is now available across the Gemini app, API, and Vertex AI. Check Google’s Gemini 3 Flash announcement for more details.
xAI has launched the Grok Voice Agent API. It helps developers build voice agents with multilingual support, real-time search, and tool calling. It uses the same tech that powers voices in Tesla vehicles. It costs $0.05 per minute. Worth checking out if you are building voice apps.
Mistral has released Mistral OCR 3 for document parsing. It works better on forms, tables, handwriting, and low-quality scans. It outputs clean text or structured data. It costs about $1 per thousand pages with batch usage. Check Mistral OCR 3 for more details.
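To get a feel for the two prices quoted above, here is a quick back-of-the-envelope sketch. It assumes flat pricing at exactly $0.05 per voice minute and $1 per thousand pages in batch mode, with no minimums, tiers, or extra fees; the function names are made up for illustration, so check the official pricing pages before budgeting.

```python
from decimal import Decimal

# Prices as quoted in this issue; treated as flat rates (an assumption).
VOICE_PRICE_PER_MINUTE = Decimal("0.05")   # Grok Voice Agent API
OCR_PRICE_PER_1000_PAGES = Decimal("1")    # Mistral OCR 3, batch usage (approximate)


def voice_cost(minutes: int) -> Decimal:
    """Estimated cost in USD for a number of voice-agent minutes."""
    return VOICE_PRICE_PER_MINUTE * minutes


def ocr_cost(pages: int) -> Decimal:
    """Estimated cost in USD for a number of pages processed in batch."""
    return OCR_PRICE_PER_1000_PAGES * pages / 1000


if __name__ == "__main__":
    # 10 hours of calls and a 25,000-page archive.
    print("Voice:", voice_cost(600), "USD")
    print("OCR:", ocr_cost(25_000), "USD")
```

Decimal is used instead of floats so the dollar amounts come out exact.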
OSS Highlight of the Week
This week we are featuring Claude Mem. It gives Claude Code long-term memory across sessions and restores useful context automatically. You can search past memory from Claude Desktop. It also includes privacy controls to exclude sensitive data and lets you control what context gets injected. If you are using Claude Code, check the Claude Mem GitHub Repo.
Good to know
I came across Runme DevOps Notebook. It lets you run shell commands and code blocks directly from Markdown files like READMEs. It supports many runtimes including Shell, Python, Ruby, JavaScript, TypeScript, PHP, and more. Check the Runme GitHub Repo for more details.
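If the idea of runnable Markdown is new to you, the sketch below shows the general concept in a few lines of Python. It is not how Runme is implemented and does not use Runme's CLI or API; the helper names `extract_shell_blocks` and `run_block` are hypothetical, and it only handles simple shell-tagged fenced blocks.

```python
import re
import subprocess

# Built programmatically so this example can itself sit inside a code fence.
FENCE = "`" * 3

# Matches a fenced block whose info string starts with sh, bash, or shell.
BLOCK_RE = re.compile(
    rf"^{FENCE}(?:sh|bash|shell)\b[^\n]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_shell_blocks(markdown: str) -> list[str]:
    """Return the body of every shell-tagged fenced block, in document order."""
    return [match.group(1).strip() for match in BLOCK_RE.finditer(markdown)]


def run_block(script: str) -> str:
    """Run one block with sh and return its stdout (raises if the command fails)."""
    result = subprocess.run(
        ["sh", "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    readme = "\n".join(
        ["# Demo", FENCE + "sh", "echo hello from the README", FENCE]
    )
    for block in extract_shell_blocks(readme):
        print(run_block(block), end="")
```

A real tool also has to deal with working directories, environment variables, and interactive commands, which this sketch ignores.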
This ByteByteGo visual shows different database types for different needs. It includes vector search, graph databases, time series, in-memory, and blob storage. It helps you understand which database fits which use case. Check the page to see the databases you should know by 2025.
I came across an article that explains agentic workflow patterns. It shows why prompts alone are not enough. It also explains how structure, iteration, and coordination help agents work more reliably. Check the post on Top AI Agentic Workflow Patterns to know more.
All AI tools work based on context. Context engineering has become important as agents handle longer tasks. This Substack post explains the anatomy of context, context retrieval strategies, and ways to manage long-running workflows. Read Context Engineering 101 to learn more.
Notable FYIs
Black Forest Labs has released FLUX.2 max. It is their latest model for image generation. It focuses on consistent characters, precise edits, and clean visuals. Useful if you need reliable image outputs for production.
Google Labs has launched CC Agent. It is an AI agent that connects Gmail, Calendar, and Drive to give you a daily briefing. You can also email CC to get help during the day. Join the waitlist if you want a simple AI assistant for daily planning.
Alibaba has released Wan 2.6. It is a video generation model. It supports text-to-video, role consistency, audio-video sync, and 1080p output. It can generate videos up to 15 seconds long. Worth checking if you are exploring video generation models.
Google has open sourced A2UI. It’s a protocol that lets AI agents send interactive UI elements like forms and cards. Instead of sending HTML or code, agents send simple JSON descriptions that apps render using their own native components. Check Google’s A2UI protocol for more details.
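To make the "JSON description in, native components out" idea concrete, here is a minimal sketch. The message shape, the component type names, and the renderer functions are all invented for illustration and are not the actual A2UI schema; strings stand in for real native widgets.

```python
import json

# Hypothetical agent message. This is NOT the real A2UI format.
AGENT_MESSAGE = """
{
  "components": [
    {"type": "card", "title": "Book a table"},
    {"type": "text_field", "label": "Party size"},
    {"type": "button", "label": "Confirm"}
  ]
}
"""


def render_card(spec: dict) -> str:
    return f"[Card] {spec['title']}"


def render_text_field(spec: dict) -> str:
    return f"[TextField] {spec['label']}"


def render_button(spec: dict) -> str:
    return f"[Button] {spec['label']}"


# The app decides which component types it is willing to render.
RENDERERS = {
    "card": render_card,
    "text_field": render_text_field,
    "button": render_button,
}


def render(message: str) -> list[str]:
    """Turn a JSON UI description into the app's own components."""
    payload = json.loads(message)
    rendered = []
    for spec in payload["components"]:
        renderer = RENDERERS.get(spec["type"])
        if renderer is None:
            raise ValueError(f"unsupported component type: {spec['type']}")
        rendered.append(renderer(spec))
    return rendered


if __name__ == "__main__":
    for widget in render(AGENT_MESSAGE):
        print(widget)
```

Because the agent only sends data and the app owns the renderer table, the app stays in control of what actually appears on screen.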
Meta has released SAM Audio. It is an AI model that can isolate sounds from complex audio using text, visual, or time-based prompts. You can extract things like vocals, instruments, or background noise with simple inputs. Useful for audio editing and podcast-related workflows.
A few months back, OpenAI released the Apps SDK to build apps for ChatGPT. Now developers can submit ChatGPT apps for review. Apps that pass review will be published in the ChatGPT app directory. If you want your app to reach ChatGPT users, you can submit it now.
That’s it from us with this edition. We hope you are going away with a ton of new information. Lastly, share this newsletter with your colleagues and pals if you find it valuable. A subscription to the newsletter will be awesome if you are reading it for the first time.



Nice roundup. The Claude Mem project caught my attention because context loss between coding sessions has been a real pain point for me lately. When switching between multiple projects, having to manually re-explain the architecture and conventions every time adds up. The privacy controls seem smartly designed too, especially for excluding sensitive API keys or client data from persistent memory. Curious how it handles context prioritization when you hit token limits though. Does it use recency or some kind of relevance scoring?