Unpacking LangChain: All You Need to Know to Get Started
Understanding LangChain basics from the point of view of a developer.
Large Language Models (LLMs) emerged with the evolution of AI and are currently the hottest trend in the technology market. LLMs are the most useful part of an AI application, and to streamline development we tend to rely on frameworks that help in the process. In this article, we will focus on LangChain, one such framework for AI application development.
What is LangChain?
LangChain is a framework for developing applications that are powered by Large Language Models (LLMs). The framework consists of a number of packages.
To understand LangChain, you need to understand the following items:
LangChain architecture
LangChain components
LangChain working principle
LangChain Architecture
The LangChain architecture consists of a number of packages.
langchain-core: This package contains the base abstractions of the different components and the ways to compose them together. The interfaces for core components are defined in this package, with no third-party integrations included.
langchain: This is the main package and contains the chains, agents, and retrieval strategies that make up an application's cognitive architecture. With no third-party integrations present, all chains, agents, and strategies are generic across integrations.
langchain-community: This package contains the third-party integrations for the various LangChain components and is maintained by the community. Dependencies in this package are made optional to keep it as lightweight as possible.
LangGraph: An extension of langchain for building robust, stateful multi-actor applications with LLMs by modelling steps as edges and nodes in a graph. It integrates seamlessly with langchain but can also be used on its own.
LangGraph-Cloud: This package is responsible for turning LangGraph applications into production-ready APIs and assistants.
LangServe: This package deploys LangChain chains as REST APIs.
LangSmith: A developer platform that helps in debugging, testing, evaluating, and monitoring LLM applications.
LangChain Components
The langchain package consists of a number of components. They are listed below:
Chat Models: Language models that take a sequence of messages as input and return chat messages as output. They support assigning distinct roles to conversation messages, distinguishing messages from the AI, messages from users, and instructions such as system messages.
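The role-tagged message shape a chat model consumes can be sketched in plain Python. The dictionary layout and the mock model below are illustrative stand-ins, not LangChain's actual message classes.

```python
# Illustrative sketch of the role-tagged messages a chat model consumes.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is LangChain?"},
]

def mock_chat_model(msgs):
    """Stand-in for a chat model: replies to the last user message."""
    last_user = [m for m in msgs if m["role"] == "user"][-1]
    return {"role": "assistant", "content": f"You asked: {last_user['content']}"}

reply = mock_chat_model(messages)
```

A real chat model would send this list to a provider's API; the key idea is that each message carries a role alongside its content.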
LLMs: Language models that take a string as input and return a string as output. LangChain does not host any LLMs itself.
Messages: Some language models take a list of messages as input and return a message as output. Messages have a role, a content, and a response_metadata property. The role describes who is saying the message, the content holds the message content (either a string or a list of dictionaries), and response_metadata contains additional metadata about the response.
Prompt Templates: These translate user input and parameters into instructions for a language model, helping the model understand the context of the input and generate relevant language-based output. They take a dictionary as input, where each key represents a variable in the prompt template to fill in. The two types of prompt templates are StringPromptTemplates and ChatPromptTemplates.
Example Selectors: Including examples as part of the prompt is a common technique for getting better performance from a model. Example Selectors are classes responsible for dynamically selecting examples and formatting them into prompts.
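The prompt-template idea, where each dictionary key fills a placeholder in the template, can be sketched with plain str.format rather than LangChain's template classes; the template text here is invented for illustration.

```python
# Minimal sketch of prompt-template filling: each dict key maps to a
# {placeholder} in the template string.
template = (
    "Answer the question based on the context.\n"
    "Context: {context}\n"
    "Question: {question}"
)

def format_prompt(tmpl: str, variables: dict) -> str:
    return tmpl.format(**variables)

prompt = format_prompt(template, {
    "context": "LangChain is a framework for LLM applications.",
    "question": "What is LangChain?",
})
```

LangChain's real templates add validation and message-aware formatting on top of this basic substitution.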
Output Parsers: These take the output of a model and transform it into a format better suited to downstream tasks. They are particularly useful for generating structured data from an LLM or for normalizing output from chat models and LLMs.
Chat History: ChatHistory is a class in LangChain responsible for wrapping an arbitrary chain. It keeps track of the inputs and outputs of the underlying chain and appends them as messages to a message database.
Documents: A Document is an object that contains information about some data. It has two attributes: page_content: str, the content of the document as a string, and metadata: dict, arbitrary metadata associated with the document that can track the document id, file name, and so on.
Document Loaders: Classes that load Document objects. Each DocumentLoader has its own specific parameters and can be invoked with the .load method.
Text Splitters: These split long documents into smaller chunks that fit into the model's context window. LangChain has built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.
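The chunking idea behind text splitters can be sketched as a naive character-based splitter with overlap. Real splitters are smarter (they prefer to break on separators like paragraphs and sentences), so this is a simplified illustration, not LangChain's implementation.

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Naive splitter: fixed-size character chunks, each overlapping
    the previous one so context is not lost at chunk boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# 250 characters -> chunks starting at offsets 0, 80, 160, 240
chunks = split_text("a" * 250, chunk_size=100, overlap=20)
```

The overlap is the key design choice: it trades a little redundancy for continuity between neighboring chunks.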
Embedding Models: These create a vector representation of a piece of text. Representing text this way makes it possible to perform mathematical operations that find other pieces of text similar in meaning.
Vector Stores: A vector store takes care of storing embedded data and performing vector search for the user. Vector stores can also hold metadata about the embedded vectors and support filtering on that metadata, giving more control over the returned documents.
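A toy in-memory vector store makes the embedding-plus-search idea concrete. The class below is an illustrative sketch: it ranks stored entries by cosine similarity, whereas production vector stores use approximate-nearest-neighbor indexes.

```python
import math

class ToyVectorStore:
    """Illustrative in-memory vector store: stores (vector, text, metadata)
    triples and returns the closest texts by cosine similarity."""

    def __init__(self):
        self._entries = []

    def add(self, vector, text, metadata=None):
        self._entries.append((vector, text, metadata or {}))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, k=1):
        ranked = sorted(self._entries,
                        key=lambda e: self._cosine(query_vector, e[0]),
                        reverse=True)
        return [(text, meta) for _, text, meta in ranked[:k]]

store = ToyVectorStore()
store.add([1.0, 0.0], "doc about cats", {"id": 1})
store.add([0.0, 1.0], "doc about dogs", {"id": 2})
results = store.search([0.9, 0.1], k=1)  # nearest to the "cats" vector
```

Metadata travels with each vector, which is what lets real stores filter results before or after the similarity search.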
Retrievers: An interface that returns documents given an unstructured query. It is more general than a vector store: a retriever accepts a string query as input and returns a list of Documents as output.
Key-Value Stores: These store data in a key-value format. This form of storage is extremely helpful for techniques such as indexing and retrieval with multiple vectors per document, or caching embeddings.
Tools: Utilities designed to be called by a model: their inputs are designed to be generated by a model, and their outputs are designed to be passed back to models. With tools, models can control parts of the code or call out to external APIs. A tool consists of the name of the tool, a description that defines what the tool does, a JSON schema that defines the inputs to the tool, and a function. When a tool is bound to a model, the name, description, and JSON schema are provided as context to the model.
Toolkits: Collections of tools designed to be used together for specific tasks, with convenient loading methods. All toolkits expose a get_tools method that returns a list of tools.
Agents: Systems that use an LLM as a reasoning engine to determine which actions to take and what the inputs to those actions should be. The results of the actions are fed back to the agent so it can determine whether to conclude or perform more actions.
Callbacks: A system provided by LangChain that allows a developer to hook into the various stages of an LLM application. It is useful for logging, monitoring, streaming, and other tasks. Use the callbacks argument, available throughout the API, to subscribe to these events.
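The hook-into-each-stage pattern can be sketched as a handler whose methods fire at the start and end of a step. The method names and runner below are illustrative, not LangChain's BaseCallbackHandler API.

```python
# Sketch of the callback pattern: handlers subscribe to a step's
# lifecycle events and are invoked at each stage.
class LoggingHandler:
    def __init__(self):
        self.events = []

    def on_start(self, name):
        self.events.append(f"start:{name}")

    def on_end(self, name, output):
        self.events.append(f"end:{name}")

def run_step(name, fn, inputs, callbacks=()):
    for cb in callbacks:
        cb.on_start(name)
    output = fn(inputs)
    for cb in callbacks:
        cb.on_end(name, output)
    return output

handler = LoggingHandler()
run_step("llm_call", lambda x: x.upper(), "hello", callbacks=[handler])
```

Because the handler is passed in as an argument, logging or monitoring can be added without touching the step's own logic.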
LangChain Working Principle
The working principle of LangChain from a user's perspective is as follows:
In LangChain, the "chain" is responsible for creating a processing pipeline by putting AI actions together in order. Each action, or link in the chain, is a necessary step toward completing the set goal. To see this, consider the pattern of a Retrieval-Augmented Generation (RAG) application. The pattern starts with the user submitting a question. An embedding is created from that text, followed by a search of the vector database to gather more context on the question. Next, a prompt is built from the original question and the context obtained from retrieval. Finally, the prompt is submitted to the LLM, which returns an intelligent completion of the prompt as the response. All of these steps must happen in succession for the goal to be met, and if any step fails, the entire pipeline stops.
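The RAG steps above can be sketched as functions composed in order. The embed, retrieve, and llm functions are stubs standing in for real components; their bodies are invented for illustration.

```python
def embed(question: str) -> list[float]:
    # Stub: real code would call an embedding model.
    return [float(len(question))]

def retrieve(vector: list[float]) -> str:
    # Stub: real code would search a vector database.
    return "LangChain composes LLM steps into pipelines."

def build_prompt(question: str, context: str) -> str:
    return f"Context: {context}\nQuestion: {question}"

def llm(prompt: str) -> str:
    # Stub: real code would call an LLM.
    return "Answer based on: " + prompt

def rag_chain(question: str) -> str:
    # Each step runs in succession; an error in any step halts the chain.
    vector = embed(question)
    context = retrieve(vector)
    prompt = build_prompt(question, context)
    return llm(prompt)

answer = rag_chain("What is LangChain?")
```

The chain is just this sequential composition; LangChain's value is in providing a uniform construct for wiring such steps together.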
The "chain" construct in LangChain attaches steps in a specific way with a specific configuration. All of its libraries follow the same construct, which makes it easy to move steps around and build powerful pipelines.
In Conclusion
LangChain is an open-source framework that helps in the development of AI applications. With a simplified, streamlined development process and robust customization of modules and agents, it gives developers an efficient and productive environment and serves a versatile sector of the market.
Follow this detailed tutorial to get started with LangChain and explore the possibilities of development with this multipurpose framework. Also, subscribe to DevShorts to keep yourself up to date with useful articles like this and our weekly tech roundups!