Deep Codebase Indexer

I wanted to build my own advanced multi-agent coding system, and this project is the first step in that process.
For an LLM to answer questions about a codebase, it needs the right context, and simple semantic RAG on its own doesn't work that well.
Here are the features I implemented in this project:
- Incremental Indexing: We don't want to re-index the whole codebase every time something in it changes. For this, I implemented an incremental indexing strategy that detects which files have changed (by comparing SHA-256 hashes of their contents) and re-indexes only those files.
- Lexical Indexing: This is keyword-based search over the codebase. It addresses the need for exact-word matches.
- Semantic Indexing: This is meaning-based search over the codebase. It is done using an embedding model, with the resulting vectors stored in a vector database.
- Structural Indexing: The first two indexes treat the codebase as tokens or concepts. But since code has syntax and structure, we can take advantage of that too. We use ASTs (abstract syntax trees) for this.
- Graph-Based Indexing: (Still in progress) This index captures the dynamic part of the codebase: the relationships and data flows that connect disparate parts into a functioning whole. I am trying to use CPGs (code property graphs) for this.
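
As a rough illustration of the incremental indexing idea above, here is a minimal sketch that compares SHA-256 digests against a stored manifest to decide which files need re-indexing. The names `file_sha256` and `find_changes`, the manifest layout (relative path to hex digest), and the `*.py` pattern are assumptions made for this example, not necessarily how the project does it.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_changes(root: Path, manifest: dict[str, str], pattern: str = "*.py"):
    """Compare files under `root` with a previously saved manifest.

    `manifest` maps relative paths to the digest seen at the last index run
    (hypothetical layout). Returns (changed, removed, current), where
    `changed` holds new or modified files, `removed` holds files that no
    longer exist, and `current` is the manifest to save for the next run.
    """
    current = {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }
    changed = [path for path, digest in current.items() if manifest.get(path) != digest]
    removed = [path for path in manifest if path not in current]
    return changed, removed, current
```

Only the paths in `changed` would be passed back to the lexical, semantic, and structural indexers, and the entries in `removed` would be deleted from each index.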
Based on the user query, we can give different weights to the different indexes to build a good context for the LLM to answer the query and dig further into it.
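
One possible way to combine per-index results with query-dependent weights is sketched below. This is only an assumption about how the weighting could work: the function `fuse_scores`, the index names, the chunk IDs, and the max-normalization step are all hypothetical, and the sketch assumes every index returns non-negative scores.

```python
def fuse_scores(
    results: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Merge per-index scores into one ranking using a weighted sum.

    `results` maps an index name to {chunk_id: raw_score}; `weights` maps an
    index name to its weight for the current query. Scores are divided by
    the best score within each index so that indexes with different score
    scales stay comparable (assumes non-negative scores).
    """
    combined: dict[str, float] = {}
    for index_name, scores in results.items():
        weight = weights.get(index_name, 0.0)
        if not scores or weight == 0.0:
            continue
        top = max(scores.values())
        for chunk_id, score in scores.items():
            normalized = score / top if top > 0 else 0.0
            combined[chunk_id] = combined.get(chunk_id, 0.0) + weight * normalized
    # Highest combined score first; ties broken by chunk ID for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical query that looks like an exact identifier lookup,
# so the lexical index gets more weight than the semantic one.
results = {
    "lexical": {"auth.py::login": 4.0, "db.py::connect": 2.0},
    "semantic": {"auth.py::login": 0.5, "session.py::refresh": 1.0},
}
weights = {"lexical": 0.75, "semantic": 0.25}
ranked = fuse_scores(results, weights)
```

A more conceptual query ("how does session handling work?") would flip the weights toward the semantic index, while the same fusion step stays unchanged.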