# Python + LangChain Agent Rules
## Project Context
You are building LLM-powered features with LangChain — chains, RAG pipelines, tool-calling agents, and conversational workflows. Use LangChain Expression Language (LCEL) for all chain composition. Keep prompts versioned, LLM calls observable, and agent execution bounded.
## Code Style & Structure
- Use Python 3.12+ type hints. Use Pydantic v2 models for all input/output schemas.
- Follow PEP 8. Format with `ruff format`. Lint with `ruff check`.
- Prefer `async def` for I/O-bound LLM and retrieval calls. Use `asyncio.gather` for parallel retrieval.
- Keep chain definitions (`src/chains/`) separate from API handlers, CLI scripts, and orchestration logic.
- Store prompt templates in `src/prompts/` as dedicated modules. Never define prompts inline in route files.
- Keep one tool per file in `src/tools/`. Tool docstrings must be precise — the LLM reads them to decide when to call the tool.
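The async guideline above can be sketched as follows — the retrieval coroutines are hypothetical stand-ins; real code would await `retriever.ainvoke(query)` on LangChain retriever objects the same way:

```python
import asyncio

# Hypothetical retrieval coroutines; each stands in for an I/O-bound
# vector-store or API call that benefits from running concurrently.
async def search_docs(query: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for real network I/O
    return [f"doc for {query}"]

async def search_faq(query: str) -> list[str]:
    await asyncio.sleep(0)
    return [f"faq for {query}"]

async def retrieve_all(query: str) -> list[str]:
    # Run both retrieval calls concurrently instead of sequentially.
    docs, faqs = await asyncio.gather(search_docs(query), search_faq(query))
    return docs + faqs
```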
## Project Structure
```
src/
  chains/         # LCEL chain definitions per use case
  prompts/        # ChatPromptTemplate definitions + few-shot examples
  tools/          # @tool-decorated functions, one per file
  retrieval/
    indexing.py   # Document loading, chunking, embedding, upsert
    retrieval.py  # Retriever construction, hybrid search, reranking
  schemas/        # Pydantic output models for structured LLM responses
  config.py       # Settings(BaseSettings) for model names, temperatures
  callbacks/      # Custom callback handlers for observability
```
## Chain Composition (LCEL)
- Build all chains with the pipe operator: `chain = prompt | model | output_parser`.
- Use `RunnableParallel` for concurrent branches: `RunnableParallel(context=retriever, question=RunnablePassthrough())`.
- Use `RunnablePassthrough.assign(key=fn)` to add computed keys to the chain's running dict.
- Add `.with_fallbacks([backup_chain])` on production chains. LLM API errors must not surface as 500s.
- Add `.with_retry(stop_after_attempt=3, wait_exponential_jitter=True)` on model invocations for transient errors.
- Use `.with_structured_output(PydanticModel)` to enforce typed LLM responses. Always validate with Pydantic.
- Never use legacy `LLMChain`, `ConversationChain`, or `SequentialChain` — they are superseded by LCEL.
## Prompt Engineering
- Define all prompts with `ChatPromptTemplate.from_messages([(role, template), ...])`.
- Use `MessagesPlaceholder('history')` for conversation history injection.
- Never concatenate user input directly into prompt strings — always use template variables `{variable}`.
- Version prompt templates in code. Include a comment header with: purpose, expected inputs, output format.
- Define few-shot examples in `FewShotChatMessagePromptTemplate` with a `SemanticSimilarityExampleSelector`.
- Keep system prompts focused: define the AI's role, constraints, and exact output format.
## RAG & Retrieval
- Use `RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)` as the default. Tune per document type.
- Embed and store in a persistent vector store (pgvector, Pinecone, Chroma) — never in-memory for production.
- Build retrievers with `.as_retriever(search_type='mmr', search_kwargs={'k': 5, 'fetch_k': 20})` for diversity.
- Implement hybrid search with `EnsembleRetriever(retrievers=[bm25_retriever, vector_retriever], weights=[0.4, 0.6])` — note `EnsembleRetriever` takes keyword arguments.
- Always add a reranking step: use a cross-encoder or `CohereRerank` before passing docs to the LLM.
- Add metadata to documents at indexing time (source, date, section). Use it for retrieval filtering.
- Use `create_retrieval_chain(retriever, question_answer_chain)` for standard RAG. Return source documents.
## Memory & Conversation History
- Use `RunnableWithMessageHistory` for conversational chains. Store history in Redis or PostgreSQL — never in memory.
- Use `trim_messages(messages, max_tokens=4000, token_counter=model)` to fit history within context window.
- Pass explicit `session_id` to support multiple concurrent conversations per user.
- For long-running sessions, periodically fold older messages into a running summary with a dedicated summarization chain (the legacy `ConversationSummaryBufferMemory` is deprecated).
## Agents & Tools
- Use `create_react_agent` or `create_tool_calling_agent` (for models with native tool/function calling). Avoid the legacy `initialize_agent`.
- Define tools with `@tool` decorator. The docstring is the tool description the model receives — be precise.
- Use Pydantic models as `args_schema` on tools for structured, validated tool inputs.
- Limit the agent's toolset to 5–10 focused tools. More tools degrade selection accuracy.
- Set `max_iterations=10` and `handle_parsing_errors=True` on `AgentExecutor`. Unbounded agents are a production risk.
- Validate tool inputs before execution. Sanitize tool outputs before passing them back to the agent.
## Error Handling
- Catch `OutputParserException` and retry with a repair prompt: add the failed output and a correction instruction.
- Handle `langchain_core.exceptions.LangChainException` subclasses, and catch provider-specific exceptions (rate limits, API errors, context length exceeded) at the model-call boundary.
- Log full invocation traces with LangSmith or a custom `BaseCallbackHandler` that records input, output, and latency.
- Validate structured outputs against the expected Pydantic schema before returning them downstream.
## Cost & Observability
- Log `token_usage` from every LLM response in a callback handler. Aggregate cost per pipeline, user, and day.
- Use small/fast models (GPT-4o-mini, Claude Haiku) for classification, routing, and structured extraction.
- Reserve large models (GPT-4o, Claude Opus) for complex reasoning tasks. Default to the smaller model first.
- Cache deterministic LLM calls with `set_llm_cache(SQLiteCache('.cache.db'))` in development.
- Set `max_tokens` on every model call to prevent unexpectedly long, costly completions.
## Testing
- Use `FakeListChatModel` or `FakeListLLM` for unit tests. Test chain logic, not LLM behavior.
- Test retrieval quality: measure recall@5 on a golden document-question dataset.
- Use `pytest-asyncio` for async chain tests. Test with `await chain.ainvoke(inputs)`.
- Mock vector store calls in retrieval unit tests. Test the full RAG pipeline integration-style against a small corpus.