The Limitation of Vector Databases
Standard RAG pipelines are static:
- User asks a question.
- System retrieves top-K chunks from Vector DB.
- LLM answers.
This breaks down in conversational workflows. If a user says "Change that to 50%," a static pipeline has no idea what "that" refers to: it embeds the literal query, searches the Vector DB for "50%," and returns garbage.
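To make that concrete, here is the static loop in sketch form; `embed`, `vector_db`, and `llm` are hypothetical stand-ins passed in as parameters, not real APIs:

# A static RAG turn: every query is handled in isolation.
def answer(query, embed, vector_db, llm, k=5):
    query_vector = embed(query)                   # "Change that to 50%" -> vector of the literal words
    chunks = vector_db.search(query_vector, k=k)  # nearest neighbors of "50%", not of the prior turn
    prompt = "\n".join(chunks) + f"\n\nQuestion: {query}"
    return llm.complete(prompt)                   # the model never sees what "that" referred to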
The Solution: Dynamic Context Management
dyncontext is a middleware layer that sits between your Vector DB and your LLM. It treats context as a living asset, prioritizing information based on recency, relevance, and semantic weight.
Hybrid Retrieval Strategy
We don't rely on embeddings alone. Instead, we blend four signals in a weighted scoring system, which is crucial for "Sovereign AI" deployments where accuracy is non-negotiable:
from dyncontext import ContextManager

# Configure a hybrid retrieval strategy
cm = ContextManager(
    semantic_weight=0.4,  # Vector Similarity (The "Vibe")
    keyword_weight=0.2,   # BM25 (Exact Keyword Match)
    recency_weight=0.2,   # Time Decay (Newer is better)
    tag_weight=0.2,       # Metadata (Department/Project scope)
)

# Retrieve context that actually fits the conversation
context = await cm.get_context(
    query="What's the liability cap?",
    session_id="legal-case-884",
)
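The four weights above feed a straightforward weighted sum. dyncontext's internals aren't reproduced here; the sub-score names and the decay formula in this sketch are illustrative:

def hybrid_score(semantic, bm25, age_hours, tag_match,
                 weights=(0.4, 0.2, 0.2, 0.2), half_life_hours=72):
    """Blend four normalized signals (each in [0, 1]) into one ranking score."""
    recency = 0.5 ** (age_hours / half_life_hours)  # exponential time decay
    tag = 1.0 if tag_match else 0.0                 # metadata scope match
    signals = (semantic, bm25, recency, tag)
    return sum(w * s for w, s in zip(weights, signals))

# A fresh, on-topic, in-scope chunk outranks a stale exact-keyword hit:
hybrid_score(semantic=0.8, bm25=0.1, age_hours=2, tag_match=True)     # ~0.74
hybrid_score(semantic=0.3, bm25=0.9, age_hours=500, tag_match=False)  # ~0.30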
Intelligent Reranking & Caching
Retrieval is just step one. We implement a Cross-Encoder Reranker to reorder and filter the results (see the sketch after these steps):
- Step 1: Retrieve 50 chunks (High Recall).
- Step 2: Rerank using a high-precision model.
- Step 3: Pass only the top 5 to the LLM (High Precision).
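dyncontext's reranker internals aren't shown here, so the sketch below uses sentence-transformers' CrossEncoder as a stand-in for the high-precision model (an illustrative choice, not necessarily what dyncontext ships):

from sentence_transformers import CrossEncoder

def rerank(query, chunks, top_k=5):
    """Score (query, chunk) pairs with a cross-encoder and keep the best few."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

# 50 candidates in (high recall), 5 survivors out (high precision):
# best = rerank("What's the liability cap?", candidate_chunks)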
We also built a 3-Layer Cache (Embedding, Query, Session) that reduces latency by 70%: if a user asks the same question again, we never hit the LLM API.
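The layer names come from the feature itself; everything else in this sketch (dict-backed stores, the hashing key scheme) is an assumption about how such a cache could be layered:

import hashlib

class ThreeLayerCache:
    """Layered lookups: the cheapest hit wins, misses fall through to the next layer."""
    def __init__(self):
        self.embedding_cache = {}  # text hash -> embedding vector
        self.query_cache = {}      # normalized query hash -> retrieved context
        self.session_cache = {}    # (session_id, query hash) -> final LLM answer

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.strip().lower().encode()).hexdigest()

    def cached_answer(self, session_id, query):
        # Session layer: an exact repeat in this session skips the LLM entirely.
        return self.session_cache.get((session_id, self._key(query)))

    def store_answer(self, session_id, query, answer):
        self.session_cache[(session_id, self._key(query))] = answer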
Telemetry & Observability
You cannot optimize what you cannot measure. dyncontext provides deep telemetry for every interaction, compatible with OpenTelemetry:
{
  "retrieval_ms": 45,
  "cache_hit": true,
  "relevance_score": 0.92,
  "tokens_saved": 1400
}
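Since the payload is flat key/value data, it maps directly onto span attributes. Here is a minimal sketch using the standard opentelemetry-api package; the span name and wiring are an assumption, not dyncontext's built-in exporter:

from opentelemetry import trace

tracer = trace.get_tracer("dyncontext")

def record_retrieval(telemetry: dict):
    """Attach dyncontext's telemetry payload to an OpenTelemetry span."""
    with tracer.start_as_current_span("dyncontext.retrieval") as span:
        for key, value in telemetry.items():
            span.set_attribute(key, value)

record_retrieval({"retrieval_ms": 45, "cache_hit": True,
                  "relevance_score": 0.92, "tokens_saved": 1400})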
Why Use This Over LangChain?
LangChain is great for prototyping. dyncontext is built for production, optimizing for latency and cost. It is provider-agnostic and works seamlessly with OpenAI, Anthropic, or local Llama 3 deployments.
Integration
pip install dyncontext
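A minimal end-to-end sketch, assuming the ContextManager API shown earlier (with get_context returning plain text) and the official openai package:

import asyncio
from dyncontext import ContextManager
from openai import OpenAI

async def main():
    cm = ContextManager(semantic_weight=0.4, keyword_weight=0.2,
                        recency_weight=0.2, tag_weight=0.2)
    context = await cm.get_context(query="What's the liability cap?",
                                   session_id="legal-case-884")

    client = OpenAI()  # swap in any provider; the context is plain text
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": "What's the liability cap?"},
        ],
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Because the retrieved context is plain text, switching to Anthropic or a local Llama 3 endpoint only changes the last few lines.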
View the Source: GitHub Repository