Beyond RAG: Andrej Karpathy’s Blueprint for LLM Knowledge Bases
Andrej Karpathy, the co-founder of OpenAI and former Director of AI at Tesla, has introduced a new architectural approach to managing AI research and project memory called LLM Knowledge Bases. On April 3, 2026, Karpathy described a system that uses an evolving, AI-maintained Markdown library, designed to bypass the limitations of traditional Retrieval-Augmented Generation (RAG).
The system addresses a primary frustration in AI development: the context-limit reset. For developers using high-level LLM orchestration, hitting a token limit or ending a session can feel like a "lobotomy" for a project, as the user must spend time and tokens re-explaining architectural nuances to a model that has lost its session memory.
The Engineering Shift from RAG to Markdown
For several years, RAG has been the standard for providing LLMs with proprietary data. This process involves chopping documents into chunks, converting them into mathematical vectors called embeddings, and storing them in databases such as Pinecone or Milvus. The system then uses cosine similarity search to retrieve relevant snippets.
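The retrieval step described above can be sketched end to end. In the toy below, a bag-of-words count vector stands in for a real embedding model, and the chunk texts are invented for illustration; production stacks store dense float vectors in a database and compare them the same way:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a learned embedding model: a bag-of-words count
    # vector. Real RAG stacks call an embedding model and store dense
    # float vectors in a database such as Pinecone or Milvus.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and return the top k:
    # the step a vector database performs at scale.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The scheduler retries failed jobs three times.",
    "Billing invoices are generated nightly.",
    "Retries use exponential backoff in the scheduler.",
]
print(retrieve("how does the scheduler retry jobs", chunks))
```

Note that "retry" never matches "retries" in this lexical toy; dense embeddings close that gap, while the structural relationships Karpathy points to are what they still miss.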

Karpathy argues that vector search is a "blunt instrument" that often misses precise structural relationships. In a large codebase, for example, a RAG system might retrieve snippets that sound similar but miss the specific line of logic governing the behavior in question. Karpathy's alternative replaces opaque vector databases with human-readable Markdown (.md) files and explicit interlinking.
In this model, the LLM acts not as a search engine but as a "research librarian" that actively authors and maintains a persistent record. This approach prioritizes structured knowledge over semantic similarity, making the resulting knowledge base auditable and transparent.
Three-Stage Knowledge Architecture
The architecture functions through three distinct operational stages:
- Data Ingest: Raw materials, including research papers, GitHub repositories, datasets, and web articles, are placed in a raw/ directory. Karpathy uses the Obsidian Web Clipper to convert web content into Markdown files, storing images locally so the LLM can reference them via vision capabilities.
- The Compilation Step: The LLM reads the raw data and compiles it into a structured wiki. This includes writing summaries, identifying key concepts, authoring encyclopedia-style articles, and creating backlinks between related ideas.
- Active Maintenance: The system employs "linting" passes in which the LLM scans the wiki for inconsistencies, missing data, or new connections, allowing the knowledge base to effectively heal itself.
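The maintenance stage can be made concrete with a small lint pass. This sketch assumes a flat directory of .md notes using Obsidian-style [[wikilinks]]; the two checks it runs, dangling links and missing backlinks, are illustrative choices, not Karpathy's actual script:

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # Obsidian-style [[Note]] links

def lint_wiki(wiki_dir: Path) -> list[str]:
    """One 'linting' pass: report links that point at missing notes and
    note pairs that link only one way (missing backlinks)."""
    notes = {p.stem: p.read_text(encoding="utf-8") for p in wiki_dir.glob("*.md")}
    links = {name: set(WIKILINK.findall(text)) for name, text in notes.items()}
    issues = []
    for name, targets in links.items():
        for target in targets:
            if target not in notes:
                issues.append(f"{name}: link to missing note [[{target}]]")
            elif name not in links[target]:
                issues.append(f"{target}: no backlink to [[{name}]]")
    return sorted(issues)
```

Run periodically, the report becomes a to-do list the LLM can work through, which is what lets the wiki "heal itself."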
By using Markdown as the source of truth, the system avoids the "black box" problem of vector embeddings. Every claim the AI makes can be traced back to a specific file that a human can read, edit, or delete.
Enterprise Implications and Scaling
While Karpathy describes his current setup as a "hacky collection of scripts," the methodology suggests a new product category for the enterprise. Vamshi Reddy noted on April 3, 2026, that while every business has a raw/ directory of unstructured data, few have compiled it into a usable asset.
A corporate version of this system would move beyond searching Slack logs or PDF reports to actively authoring a "Company Bible" that updates in real time. However, Eugen Alpeza, CEO of Edra, noted that scaling this from personal research to enterprise operations involves significant challenges, including millions of records and contradictory tribal knowledge across teams.
To address scaling and integrity, some developers are moving toward multi-agent orchestration. The founder of Secondmate, @jumperz, has illustrated a "Swarm Knowledge Base" using a 10-agent system. This setup includes a "Quality Gate" that uses the Hermes model to score and validate draft articles before they are promoted to the live wiki, preventing hallucinations from infecting the collective memory.
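The exact pipeline behind the Swarm Knowledge Base is not public; the sketch below shows only the quality-gate pattern, with a rule-based stand-in where the real system reportedly calls the Hermes model, and a promotion threshold chosen arbitrarily for illustration:

```python
from dataclasses import dataclass, field

PROMOTE_THRESHOLD = 0.8  # assumed cutoff; the real system's value is not public

@dataclass
class Draft:
    title: str
    body: str
    citations: list[str] = field(default_factory=list)  # raw/ files backing the claims

def score_draft(draft: Draft) -> float:
    # Stand-in for the validator model: reward drafts whose claims cite
    # source files and reject empty bodies outright. A real gate would
    # make an LLM call that returns a 0..1 quality score.
    if not draft.body.strip():
        return 0.0
    cited = min(len(draft.citations), 5) / 5
    return 0.5 + 0.5 * cited

def quality_gate(drafts: list[Draft]) -> tuple[list[Draft], list[Draft]]:
    # Promote validated drafts to the live wiki; bounce the rest back to
    # the authoring agents so hallucinations never reach shared memory.
    promoted = [d for d in drafts if score_draft(d) >= PROMOTE_THRESHOLD]
    rejected = [d for d in drafts if score_draft(d) < PROMOTE_THRESHOLD]
    return promoted, rejected
```

The gate's key design property is that rejection is cheap and reversible, while a promoted hallucination contaminates every agent that later reads the wiki.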
Tooling and Philosophy
Karpathy uses Obsidian as the frontend for this system. The choice of Markdown ensures the data is not locked into a specific vendor, adhering to a "file-over-app" philosophy. In this model, the user owns the data, and the AI serves as a sophisticated editor.
Lex Fridman has confirmed using a similar setup, extending the utility by having the system generate dynamic HTML and JavaScript for interactive data visualization. Fridman also uses the system to create temporary, focused mini-knowledge bases for voice-mode interaction during long runs.
Regarding performance, Karpathy indicates that for datasets of approximately 100 articles and 400,000 words, the LLM's ability to navigate via summaries and index files is more efficient than the latency and "retrieval noise" often introduced by complex RAG infrastructure.
Future Integration and Fine-Tuning
The long-term goal of the LLM Knowledge Base is to create a purified dataset for synthetic data generation and fine-tuning. As the wiki is continuously linted and refined, it becomes a high-quality training set.
Eventually, a user could fine-tune a smaller, more efficient model on their own wiki. This would allow the LLM to integrate the personal knowledge base directly into its weights, transforming a research project into a private, custom intelligence that does not rely solely on a context window.
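One way to picture that last step is flattening the linted wiki into an instruction-tuning dataset. The prompt template and JSONL field names below are assumptions for illustration, not a format Karpathy has specified; adapt them to whatever fine-tuning framework you use:

```python
import json
from pathlib import Path

def wiki_to_jsonl(wiki_dir: Path, out_path: Path) -> int:
    # Turn each linted wiki article into one instruction/response pair.
    # The prompt wording and field names are hypothetical; most
    # fine-tuning stacks accept JSONL in some prompt/completion shape.
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for note in sorted(wiki_dir.glob("*.md")):
            body = note.read_text(encoding="utf-8").strip()
            if not body:
                continue  # skip empty stubs the linter has not filled yet
            record = {
                "prompt": f"Explain the note '{note.stem}' from my knowledge base.",
                "completion": body,
            }
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Because the wiki has already been through linting passes, the resulting file inherits that curation, which is the "purified dataset" property the section describes.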
