Essential GitHub Repositories for AI Engineers
A collection of GitHub repos for AI engineers
GitHub has become the go-to place for learning and building with AI. Developers open source their work, share frameworks, and publish research code that others can use right away.
In this blog, I’ve collected must-know GitHub repos grouped by category. These include everything from LLM fundamentals to RAG, MCP, Agents, Agentic frameworks, and coding agents that can sharpen your AI development journey. These repos are building blocks for any AI engineer looking to learn and build.
1. LLM Repos
LLMs-from-scratch: This repo shows how to build and train GPT-style models. It explains the process step by step with clear code.
Hands-On-Large-Language-Models: This repo has code for practical LLM tasks. It covers text classification, search, clustering, embeddings, and fine-tuning.
llm-course: This repo has an LLM course. It includes roadmaps and Colab notebooks covering fundamentals, fine-tuning, quantization, and deployment.
awesome-generative-ai-guide: This repo serves as a GenAI hub. It provides updates on research, interview prep, course material, and app-building guides.
Awesome-LLM: This repo collects papers and resources about LLMs. It also includes frameworks, tools, and benchmarks.
nanoGPT: This repo is by Andrej Karpathy. It gives a simple way to train and fine-tune medium-sized GPTs with clean code.
LLM-engineer-handbook: This repo has resources for AI engineers. It covers training, serving, fine-tuning, and running LLMs in production.
learn-ai-engineering: This repo is a resource for beginners. It teaches AI, LLMs, agents, prompts, and fine-tuning from scratch.
2. MCP Repos
Model Context Protocol: This repo has the base MCP implementation from Anthropic. It is the open standard to connect AI applications with external tools and data.
Awesome MCP Servers: This repo has a curated list of MCP servers. These servers help you connect MCP clients with external tools.
Awesome MCP Clients: This repo lists different MCP clients. You can explore them and see how they work with servers.
mcp-use: This repo provides a Python library called mcp-use. It connects any LLM with any MCP server.
MCP Containers: This repo gives you containerized MCP servers. They are simple to run and deploy.
mcp-ui: This repo has UI components for MCP. Servers can serve them, and clients can render them for interactive use.
MCPHost: This repo provides a CLI tool called MCPHost. It lets you run MCP servers and connect models like Claude, OpenAI, Gemini, and Ollama with external tools.
3. Agents Repos
AI Agents for Beginners: This repo has lessons with code examples. It helps you get started with building AI agents.
GenAI Agents: This repo provides implementations of AI agents. It covers basics, LangGraph workflows, multi-agent systems, and advanced applications.
Awesome AI Agents: This repo offers a curated list of AI agents. It includes projects across different categories and industries.
Prompt Engineering Guide: This repo has guides, papers, and notebooks. It focuses on prompt engineering techniques with large language models.
System Prompts and Models of AI Tools: This repo exposes system prompts and tools used by popular platforms. It covers Cursor, Claude Code, Lovable, and others.
500 AI Agents Projects: This repo contains over 500 AI agent projects. It has examples from healthcare, finance, education, retail, and more.
Agents Towards Production: This repo has step-by-step tutorials. It shows how to build GenAI agents that are ready for production.
Awesome AI Apps: This repo includes tutorials and examples. It shows how to build LLM-powered apps from chatbots to advanced agents.
AI Engineer Toolkit: This repo gives you projects and resources. It helps build production-grade AI apps with popular frameworks and tools.
4. Coding Agents Repos
Claude Code: This is Anthropic’s official repo. It is a popular coding agent for terminal-based AI assistance.
OpenAI Codex: This repo is OpenAI’s coding agent. It runs locally in your terminal and works as an alternative to cloud-based coding assistants.
Gemini CLI: This repo is Google’s command-line tool. It brings Gemini AI into your terminal for code analysis and automation.
OpenManus: This is an open-source coding agent. Developers use it when they want full control without vendor lock-in.
Goose: This repo is Block’s coding agent. It is capable of automating complex development tasks from start to finish.
opencode: An AI coding agent built for the terminal. It is open source and works with any provider or model via a simple client interface.
Crush: This repo is Charm’s coding agent. Developers like it for its clean interface and smooth integration with local tools.
Cline: This is a VS Code extension. It works as an autonomous coding agent that creates files, runs commands, and handles complex workflows.
Forge: This repo is an AI pair programming tool. It supports more than 300 models and allows multi-provider flexibility.
Open Deep Research: This repo is from LangChain. It is an open-source research agent. It works across many model providers, search tools, and MCP servers.
Void: It is an open-source coding editor. It is a free alternative to Cursor with checkpoint visualization.
Awesome Cursor Rules: This repo is a collection of config files. Cursor users rely on them to customize their editor experience.
Awesome Claude Code: It is community curated. It has Claude Code workflows and commands that improve productivity.
5. RAG Repos
RAG Techniques: It covers different advanced methods that boost retrieval and generation.
RAG From Scratch: This repo shows how to build RAG step by step with clear notebooks.
RAG Anything: It is an All-in-One RAG Framework. You get a flexible framework packed with tools.
RAG Time: Microsoft designed this as a 5-week Learning Journey to Mastering RAG.
FlashRAG: The toolkit includes datasets, algorithms, and a GUI for fast RAG research.
6. LLM Framework Repos
LangChain: A popular framework repo for building with LLMs. It supports agents, RAG, apps, memory, and many integrations.
LlamaIndex: Another popular framework repo for working with LLMs. It helps build agents, RAG systems, and LLM applications.
Haystack: It is an AI orchestration framework to build customizable, production-ready LLM applications.
Ollama: This is a framework to run open-source LLMs locally on your machine.
llama.cpp: This repo enables running LLMs locally using C and C++. It makes inference faster and lighter, even on modest hardware.
Unsloth: An open-source framework for LLM fine-tuning and reinforcement learning. It makes training faster and more efficient.
Guidance: A guidance language for controlling large language models.
DSPy: An open-source Python framework. It helps optimize prompts and modules automatically.
Transformers: It is the core library for pretrained models with APIs for training, fine tuning, and deployment.
7. Agentic Framework Repos
LangGraph: An orchestration framework for building, managing, and deploying long-running, stateful agents.
OpenAI Agent SDK: A framework from OpenAI. It supports building multi-agent workflows.
AutoGen: This is Microsoft’s agent framework. It helps create multi-agent AI applications.
SmolAgents: A lightweight framework from Hugging Face. It lets you build and run agents with minimal setup.
CrewAI: A framework for orchestrating AI agents. It is fast and flexible for building multi-agent workflows.
Conclusion
This blog highlights key GitHub repos every AI engineer should know. The list includes LLMs, RAG, MCP, agents, frameworks, and coding agents.
Each repo offers a chance to learn or build something useful. Some help you start small, while others let you scale into real apps.
Bookmark this list and refer back to it as you progress in your AI journey.
Happy Learning!