© 2026 Aurora Algorithm Inc.

Python + LangChain

Rules for building LLM-powered applications with LangChain. Covers chains, agents, retrieval-augmented generation, vector stores, prompt engineering, and production deployment.


Details

Language: Python
Framework: LangChain

Rules Content

AGENTS.md

Python + LangChain Agent Rules

Project Context

You are building LLM-powered features with LangChain — chains, RAG pipelines, tool-calling agents, and conversational workflows. Use LangChain Expression Language (LCEL) for all chain composition. Keep prompts versioned, LLM calls observable, and agent execution bounded.

Code Style & Structure

- Use Python 3.12+ type hints. Use Pydantic v2 models for all input/output schemas.
- Follow PEP 8. Format with `ruff format`. Lint with `ruff check`.
- Prefer `async def` for I/O-bound LLM and retrieval calls. Use `asyncio.gather` for parallel retrieval.
- Keep chain definitions (`src/chains/`) separate from API handlers, CLI scripts, and orchestration logic.
- Store prompt templates in `src/prompts/` as dedicated modules. Never define prompts inline in route files.
- Keep one tool per file in `src/tools/`. Tool docstrings must be precise — the LLM reads them to decide when to call the tool.

Project Structure

```
src/
  chains/        # LCEL chain definitions per use case
  prompts/       # ChatPromptTemplate definitions + few-shot examples
  tools/         # @tool decorated functions, one per file
  retrieval/
    indexing.py  # Document loading, chunking, embedding, upsert
    retrieval.py # Retriever construction, hybrid search, reranking
  schemas/       # Pydantic output models for structured LLM responses
  config.py      # Settings(BaseSettings) for model names, temperatures
  callbacks/     # Custom callback handlers for observability
```
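
The `config.py` entry above can be sketched with `pydantic-settings`; the field names and defaults here are illustrative, not prescribed by the layout:

```python
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    """Central model configuration, overridable via environment variables.
    Field names and defaults are illustrative."""
    chat_model: str = "gpt-4o-mini"
    embedding_model: str = "text-embedding-3-small"
    temperature: float = 0.0
    max_tokens: int = 1024

settings = Settings()
```

Reading model names from one `Settings` object keeps chains, tools, and callbacks from hardcoding provider details.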

Chain Composition (LCEL)

- Build all chains with the pipe operator: `chain = prompt | model | output_parser`.
- Use `RunnableParallel` for concurrent branches: `RunnableParallel(context=retriever, question=RunnablePassthrough())`.
- Use `RunnablePassthrough.assign(key=fn)` to add computed keys to the chain's running dict.
- Add `.with_fallbacks([backup_chain])` on production chains. LLM API errors must not surface as 500s.
- Add `.with_retry(stop_after_attempt=3, wait_exponential_jitter=True)` on model invocations for transient errors.
- Use `.with_structured_output(PydanticModel)` to enforce typed LLM responses. Always validate with Pydantic.
- Never use legacy `LLMChain`, `ConversationChain`, or `SequentialChain` — they are superseded by LCEL.
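
The rules above combine into a chain like the following sketch. It assumes `langchain-openai` is installed; the model names and the `Answer` schema are illustrative:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class Answer(BaseModel):
    answer: str = Field(description="Concise answer to the question")
    confidence: float = Field(ge=0.0, le=1.0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's question concisely."),
    ("human", "{question}"),
])

primary = ChatOpenAI(model="gpt-4o-mini", max_tokens=512)
backup = ChatOpenAI(model="gpt-4o", max_tokens=512)

# Structured output first, then retry for transient errors, then a fallback
# chain so provider outages never surface as unhandled 500s.
chain = (
    (prompt | primary.with_structured_output(Answer))
    .with_retry(stop_after_attempt=3, wait_exponential_jitter=True)
    .with_fallbacks([prompt | backup.with_structured_output(Answer)])
)
# chain.invoke({"question": "..."}) returns an Answer instance
```

Note that `.with_structured_output()` is a chat-model method, so it is applied before the retry and fallback wrappers, which operate on the composed runnable.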

Prompt Engineering

- Define all prompts with `ChatPromptTemplate.from_messages([(role, template), ...])`.
- Use `MessagesPlaceholder('history')` for conversation history injection.
- Never concatenate user input directly into prompt strings — always use template variables `{variable}`.
- Version prompt templates in code. Include a comment header with: purpose, expected inputs, output format.
- Define few-shot examples in `FewShotChatMessagePromptTemplate` with a `SemanticSimilarityExampleSelector`.
- Keep system prompts focused: define the AI's role, constraints, and exact output format.
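
A prompt module following these rules might look like this sketch (the purpose and variable names are illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Purpose: answer support questions strictly from retrieved context.
# Inputs:  history (message list), context (str), question (str).
# Output:  plain-text answer, at most ~3 sentences.
SUPPORT_PROMPT = ChatPromptTemplate.from_messages([
    ("system",
     "You are a support assistant. Answer only from the provided context. "
     "If the context is insufficient, say so.\n\nContext:\n{context}"),
    MessagesPlaceholder("history"),
    ("human", "{question}"),
])
```

User input reaches the model only through `{context}`, `{question}`, and the history placeholder, never through string concatenation.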

RAG & Retrieval

- Use `RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)` as the default. Tune per document type.
- Embed and store in a persistent vector store (pgvector, Pinecone, Chroma) — never in-memory for production.
- Build retrievers with `.as_retriever(search_type='mmr', search_kwargs={'k': 5, 'fetch_k': 20})` for diversity.
- Implement hybrid search with `EnsembleRetriever([bm25_retriever, vector_retriever], weights=[0.4, 0.6])`.
- Always add a reranking step: use a cross-encoder or `CohereRerank` before passing docs to the LLM.
- Add metadata to documents at indexing time (source, date, section). Use it for retrieval filtering.
- Use `create_retrieval_chain(retriever, question_answer_chain)` for standard RAG. Return source documents.

Memory & Conversation History

- Use `RunnableWithMessageHistory` for conversational chains. Store history in Redis or PostgreSQL — never in memory.
- Use `trim_messages(messages, max_tokens=4000, token_counter=model)` to fit history within context window.
- Pass explicit `session_id` to support multiple concurrent conversations per user.
- For long-running sessions, summarize old messages with `ConversationSummaryBufferMemory` periodically.
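
A Redis-backed history wrapper per the rules above might be sketched like this; `chain` stands in for any LCEL chain whose prompt takes `{question}` and a `MessagesPlaceholder("history")`, and the Redis URL is illustrative:

```python
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

def get_history(session_id: str) -> RedisChatMessageHistory:
    # Persistent per-session history; survives process restarts.
    return RedisChatMessageHistory(session_id, url="redis://localhost:6379/0")

conversational = RunnableWithMessageHistory(
    chain,  # assumed: an LCEL chain from the sections above
    get_history,
    input_messages_key="question",
    history_messages_key="history",
)
# conversational.invoke(
#     {"question": "..."},
#     config={"configurable": {"session_id": "user-123"}},
# )
```

The explicit `session_id` in the invoke config is what keeps concurrent conversations per user isolated.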

Agents & Tools

- Use `create_react_agent` or `create_tool_calling_agent` (OpenAI function calling). Avoid legacy `initialize_agent`.
- Define tools with `@tool` decorator. The docstring is the tool description the model receives — be precise.
- Use Pydantic models as `args_schema` on tools for structured, validated tool inputs.
- Limit the agent's toolset to 5–10 focused tools. More tools degrade selection accuracy.
- Set `max_iterations=10` and `handle_parsing_errors=True` on `AgentExecutor`. Unbounded agents are a production risk.
- Validate tool inputs before execution. Sanitize tool outputs before passing them back to the agent.
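
A tool defined under these rules might look like the following sketch; the weather tool and its backend are hypothetical:

```python
from langchain_core.tools import tool
from pydantic import BaseModel, Field

class CityWeatherInput(BaseModel):
    city: str = Field(description="City name, e.g. 'Berlin'")

@tool(args_schema=CityWeatherInput)
def get_weather(city: str) -> str:
    """Look up current weather for a city. Use only when the user explicitly
    asks about weather conditions; do not guess from context."""
    # Hypothetical backend call: validate the input and sanitize the
    # response before returning it to the agent.
    ...
```

The docstring doubles as the tool description the model sees, so it states both what the tool does and when to call it.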

Error Handling

- Catch `OutputParserException` and retry with a repair prompt: add the failed output and a correction instruction.
- Handle provider-SDK errors (rate limits, API failures, context length exceeded) alongside `langchain_core.exceptions.LangChainException` subclasses raised by LangChain itself.
- Log full invocation traces with LangSmith or a custom `BaseCallbackHandler` that records input, output, and latency.
- Validate structured outputs against the expected Pydantic schema before returning them downstream.
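
The repair-prompt rule can be sketched as a single retry wrapper; `repair_chain` is assumed to be a chain whose prompt includes a `{bad_output}` variable plus a correction instruction alongside the original inputs:

```python
from langchain_core.exceptions import OutputParserException

async def invoke_with_repair(chain, repair_chain, inputs: dict):
    """One repair attempt after a parse failure (a sketch, not a policy)."""
    try:
        return await chain.ainvoke(inputs)
    except OutputParserException as exc:
        # Feed the failed output back so the model can correct it.
        bad = exc.llm_output or str(exc)
        return await repair_chain.ainvoke({**inputs, "bad_output": bad})
```

If the repair attempt also fails, let the exception propagate to the fallback chain rather than looping.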

Cost & Observability

- Log `token_usage` from every LLM response in a callback handler. Aggregate cost per pipeline, user, and day.
- Use small/fast models (GPT-4o-mini, Claude Haiku) for classification, routing, and structured extraction.
- Reserve large models (GPT-4o, Claude Opus) for complex reasoning tasks. Default to the smaller model first.
- Cache deterministic LLM calls with `set_llm_cache(SQLiteCache('.cache.db'))` in development.
- Set `max_tokens` on every model call to prevent unexpectedly long, costly completions.
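
A minimal usage-logging callback per the first rule above might look like this; the `token_usage` field names follow the OpenAI-style response payload and can differ by provider and LangChain version:

```python
from langchain_core.callbacks import BaseCallbackHandler

class TokenUsageLogger(BaseCallbackHandler):
    """Accumulate token counts across LLM calls for cost aggregation."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def on_llm_end(self, response, **kwargs) -> None:
        # `llm_output` is provider-specific; OpenAI puts usage here.
        usage = (response.llm_output or {}).get("token_usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
```

Pass an instance via `config={"callbacks": [logger]}` on invoke, then aggregate the counts per pipeline, user, and day.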

Testing

- Use `FakeListChatModel` or `FakeListLLM` for unit tests. Test chain logic, not LLM behavior.
- Test retrieval quality: measure recall@5 on a golden document-question dataset.
- Use `pytest-asyncio` for async chain tests. Test with `await chain.ainvoke(inputs)`.
- Mock vector store calls in retrieval unit tests. Test the full RAG pipeline integration-style against a small corpus.
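
A unit test in this style might be sketched as follows; it exercises chain wiring with canned responses and never touches an API (the import path for `FakeListChatModel` can vary across langchain-core versions):

```python
import pytest
from langchain_core.language_models.fake_chat_models import FakeListChatModel
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

@pytest.mark.asyncio
async def test_chain_returns_parsed_model_output():
    # FakeListChatModel yields its canned responses in order.
    model = FakeListChatModel(responses=["Paris"])
    prompt = ChatPromptTemplate.from_messages([("human", "{question}")])
    chain = prompt | model | StrOutputParser()
    assert await chain.ainvoke({"question": "Capital of France?"}) == "Paris"
```

The assertion checks the chain's plumbing (prompt, model, parser), not the quality of any real model's answer.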

Related Templates

- Python + FastAPI: High-performance Python API development with FastAPI, Pydantic, and async patterns.
- Python + Django: Django web development with class-based views, ORM best practices, and DRF.
- Python + Flask: Lightweight Python web development with Flask and extensions.