feat: Enhance SQL optimization tools with internal knowledge base and observability features

- Updated README.md to include new setup instructions for RAG and observability.
- Added internal knowledge base (KB) setup for SQL optimization team, supporting various document types.
- Implemented token usage logging in LLM tools to track costs and usage.
- Refactored SQL analysis and optimization prompts for clarity and consistency.
- Introduced filtering of external tools based on environment configuration.
- Enhanced conservative analysis agent with structured prompt for performance suggestions.
- Updated requirements.txt to include new dependencies for RAG functionality.
- Added internal KB helpers for building and attaching knowledge to agents.
william.dias 2026-01-23 13:02:17 -03:00
parent c6dd91810b
commit 80d1f9d26a
14 changed files with 502 additions and 355 deletions


@ -25,15 +25,31 @@ src/
1) Create the environment and install the dependencies: 1) Create the environment and install the dependencies:
- `pip install -r requirements.txt` - `pip install -r requirements.txt`
2) Configure the environment variables (example in `sample.env`). 2) Configure the environment variables (example in `sample.env` or `.env`).
3) Run the server: 3) Run the server:
- `PYTHONPATH=src python -m main` - `./scripts/start.sh`
Access: Access:
- `http://localhost:8204/docs` (Swagger UI) - `http://localhost:8204/docs` (Swagger UI)
- `http://localhost:8204` (basic API information) - `http://localhost:8204` (basic API information)
## Local UI (Agent UI)
Use **Agent UI** (agno-agi/agent-ui) as the local front end:
1) Install it with the official script:
- `npx create-agent-ui@latest`
2) Start the UI:
- `pnpm dev`
3) Open `http://localhost:3000` and set the endpoint to `http://localhost:8204`.
Optional: if AgentOS uses authentication, configure `OS_SECURITY_KEY` as described in the Agent UI README.
## Team workflow ## Team workflow
1) **Manager** receives the request and validates the context (database + SQL). 1) **Manager** receives the request and validates the context (database + SQL).
@ -43,7 +59,23 @@ Acesse:
5) **Conservative Analyst** (if requested) produces an analysis without rewriting the query. 5) **Conservative Analyst** (if requested) produces an analysis without rewriting the query.
6) **Manager** consolidates and delivers the result. 6) **Manager** consolidates and delivers the result.
## RAG (internal KB)
- Place documents in `kb/` (md/txt/sql/pdf).
- The local RAG uses Chroma + SentenceTransformers.
- Main variables:
- `SQL_OPT_KB_PATH`, `SQL_OPT_KB_CHROMA_PATH`, `SQL_OPT_KB_DB_FILE`
- `SQL_OPT_KB_EMBEDDER_ID`
- `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true` blocks external tools.
## Token/cost observability
- Enable it with `LLM_LOG_USAGE=true`.
- Set prices (USD per 1K tokens) with:
- `LLM_COST_INPUT_PER_1K`
- `LLM_COST_OUTPUT_PER_1K`
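A minimal sketch of how the two price variables could turn token counts into an estimated cost. Only the variable names `LLM_COST_INPUT_PER_1K` and `LLM_COST_OUTPUT_PER_1K` come from this README; the helper `estimate_cost_usd` and the sample prices are assumptions made for illustration, and the project's own logging code may compute this differently.

```python
import os


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost from per-1K-token prices in the environment.

    Hypothetical helper for illustration only.
    """
    input_price = float(os.getenv("LLM_COST_INPUT_PER_1K", "0"))
    output_price = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0"))
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price


if __name__ == "__main__":
    # Made-up prices, used only to exercise the function.
    os.environ["LLM_COST_INPUT_PER_1K"] = "0.5"
    os.environ["LLM_COST_OUTPUT_PER_1K"] = "1.5"
    print(estimate_cost_usd(2000, 1000))
```

With both prices left at `0`, as in `sample.env`, the estimate is always zero, so token counts are logged without a cost figure.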
## Notes ## Notes
- Use the model configured in environment variables (e.g., OpenAI, Gemini, Groq, etc.). - Use the provider configured in `.env` (e.g., local Ollama, OpenAI, Gemini, Groq, etc.).
- The team is collaborative and keeps its history in SQLite (configurable via env). - The team is collaborative and keeps its history in SQLite (configurable via env).


@ -124,6 +124,11 @@ Recomendação de ferramentas de mercado:
- **Langfuse** or **Phoenix** for tracing prompts, costs, and latency. - **Langfuse** or **Phoenix** for tracing prompts, costs, and latency.
- **Grafana/Prometheus** for executive dashboards. - **Grafana/Prometheus** for executive dashboards.
Status in the POC:
- **Token/cost logging** is already implemented via `LLM_LOG_USAGE` and per-1K-token prices.
- Persistent metrics and dashboards (Grafana/Prometheus) remain future work.
Minimum metrics: Minimum metrics:
- Tokens per request and per area. - Tokens per request and per area.
@ -150,6 +155,11 @@ Métricas mínimas:
- Continuous curation with feedback from the teams to improve relevance. - Continuous curation with feedback from the teams to improve relevance.
- **Higher precision**: responses consistent with internal policies and technical standards. - **Higher precision**: responses consistent with internal policies and technical standards.
Status in the POC:
- **Local RAG** with an internal base in `kb/` using Chroma + SentenceTransformers.
- **External tools blocked** by default via `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true`.
## 10) Definitive stack (100% Agno) ## 10) Definitive stack (100% Agno)
- **Agno** as the single framework for orchestration, memory, and tools. - **Agno** as the single framework for orchestration, memory, and tools.

kb/README.md Normal file

@ -0,0 +1,18 @@
# Internal Knowledge Base (KB)
Place internal documents here that should be used by RAG.
Supported (by default):
- Markdown (.md)
- Text (.txt)
- SQL (.sql)
- PDF (.pdf)
Configuration via environment variables:
- SQL_OPT_KB_PATH (default: kb)
- SQL_OPT_KB_CHROMA_PATH (default: tmp/kb_chroma)
- SQL_OPT_KB_EMBEDDER_ID (default: sentence-transformers/all-MiniLM-L6-v2)
- SQL_OPT_KB_DB_FILE (default: tmp/sql_optimizer_kb.db)
- SQL_OPT_BLOCK_EXTERNAL_TOOLS (default: true)
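
To preview which files under the KB directory have one of the supported extensions, something like the following could be used. This is a sketch, not the project's ingest code: `list_kb_documents` and `SUPPORTED_SUFFIXES` are names invented here, and the case-insensitive suffix match is an assumption that may not reflect how the real ingest treats upper-case extensions.

```python
from __future__ import annotations

from pathlib import Path

# Extensions this README lists as supported by default.
SUPPORTED_SUFFIXES = {".md", ".txt", ".sql", ".pdf"}


def list_kb_documents(kb_path: str = "kb") -> list[Path]:
    """Return supported documents under kb_path, sorted for stable output.

    Hypothetical helper; it only mirrors the documented extensions.
    """
    root = Path(kb_path)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for doc in list_kb_documents():
        print(doc)
```

Note that a `README.md` placed inside the KB directory has a supported extension too, so it would show up in this listing.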


@ -30,3 +30,7 @@ oracledb==3.4.1
pymssql==2.3.11 pymssql==2.3.11
sqlparse==0.5.5 sqlparse==0.5.5
sqlglot==28.6.0 sqlglot==28.6.0
# RAG (local KB)
chromadb==0.6.3
sentence-transformers==3.4.1


@ -15,3 +15,15 @@
# SQL Optimizer Team # SQL Optimizer Team
SQL_OPT_TEAM_DB_FILE=tmp/sql_optimizer_team.db SQL_OPT_TEAM_DB_FILE=tmp/sql_optimizer_team.db
SQL_OPT_TEAM_DEBUG_MODE=false SQL_OPT_TEAM_DEBUG_MODE=false
# Token/cost observability
LLM_LOG_USAGE=true
LLM_COST_INPUT_PER_1K=0
LLM_COST_OUTPUT_PER_1K=0
# RAG / internal KB
SQL_OPT_KB_PATH=kb
SQL_OPT_KB_CHROMA_PATH=tmp/kb_chroma
SQL_OPT_KB_DB_FILE=tmp/sql_optimizer_kb.db
SQL_OPT_KB_EMBEDDER_ID=sentence-transformers/all-MiniLM-L6-v2
SQL_OPT_BLOCK_EXTERNAL_TOOLS=true


@ -7,6 +7,53 @@ import os
base_model = get_model() base_model = get_model()
CONSERVATIVE_ANALYSIS_PROMPT = """
You are an expert $database_name database analyst and performance specialist.
Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
$database_name SQL Query:
```sql
$query
```
Query Complexity Information:
- Columns: $column_count
- Tables: $table_count
- Subqueries: $subquery_count
- CASE statements: $case_count
- JOINs: $join_count
- Complexity Level: $complexity_level
Provide your analysis in the following structured format:
## PERFORMANCE ISSUES
List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
- [SEVERITY] Issue description
- [SEVERITY] Issue description
## SUGGESTED INDEXES
List indexes that could improve this query:
- CREATE INDEX idx_name ON table(columns) -- Reason
## OPTIMIZATION SUGGESTIONS
List specific suggestions WITHOUT rewriting the query:
- Suggestion 1: Description of what could be improved and why
- Suggestion 2: Description of what could be improved and why
## RISK ASSESSMENT
- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
- Missing WHERE clause: [Yes/No] - If yes, explain the impact
- Implicit conversions: [Yes/No] - If yes, list them
## SUMMARY
Brief summary of the most important findings and priority order for addressing them.
Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
""".strip()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db") _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"} _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
@ -30,52 +77,7 @@ conservative_analysis_agent = Agent(
"- Solicite banco e SQL se não estiverem presentes.", "- Solicite banco e SQL se não estiverem presentes.",
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().", "- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
"- Use a template oficial abaixo para a análise conservadora (sem reescrever a SQL).", "- Use a template oficial abaixo para a análise conservadora (sem reescrever a SQL).",
""" CONSERVATIVE_ANALYSIS_PROMPT,
You are an expert $database_name database analyst and performance specialist.
Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
$database_name SQL Query:
```sql
$query
```
Query Complexity Information:
- Columns: $column_count
- Tables: $table_count
- Subqueries: $subquery_count
- CASE statements: $case_count
- JOINs: $join_count
- Complexity Level: $complexity_level
Provide your analysis in the following structured format:
## PERFORMANCE ISSUES
List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
- [SEVERITY] Issue description
- [SEVERITY] Issue description
## SUGGESTED INDEXES
List indexes that could improve this query:
- CREATE INDEX idx_name ON table(columns) -- Reason
## OPTIMIZATION SUGGESTIONS
List specific suggestions WITHOUT rewriting the query:
- Suggestion 1: Description of what could be improved and why
- Suggestion 2: Description of what could be improved and why
## RISK ASSESSMENT
- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
- Missing WHERE clause: [Yes/No] - If yes, explain the impact
- Implicit conversions: [Yes/No] - If yes, list them
## SUMMARY
Brief summary of the most important findings and priority order for addressing them.
Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
""".strip(),
"- NÃO reescreva a SQL em hipótese alguma.", "- NÃO reescreva a SQL em hipótese alguma.",
], ],
) )
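
The `$database_name`, `$query`, and `$join_count` style placeholders in `CONSERVATIVE_ANALYSIS_PROMPT` match the syntax of Python's `string.Template`. The diff does not show how the project fills them in, so the following is only a sketch of one possible approach, using a shortened stand-in template and made-up values.

```python
from string import Template

# Shortened stand-in for the real prompt; placeholder names match the diff.
PROMPT = Template(
    "You are an expert $database_name database analyst.\n"
    "Query:\n$query\n"
    "- JOINs: $join_count\n"
    "- Complexity Level: $complexity_level"
)


def render_prompt(**values: object) -> str:
    """Fill the placeholders; safe_substitute leaves missing ones untouched."""
    return PROMPT.safe_substitute(**values)


rendered = render_prompt(
    database_name="SQL Server",
    query="SELECT id FROM orders WHERE status = 'open'",
    join_count=0,
)
print(rendered)
```

`safe_substitute` is used here so that a missing value (such as `complexity_level` above) stays visible as a literal placeholder instead of raising `KeyError`; `substitute` would be the stricter choice.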


@ -1,18 +1,101 @@
from agno.agent import Agent from agno.agent import Agent
from agno.db.sqlite import SqliteDb from agno.db.sqlite import SqliteDb
from sql_optimizer_team.tools.engine.model_selector import get_model from sql_optimizer_team.tools.engine.model_selector import get_model
from sql_optimizer_team.tools.core_tools import explain_query_core
from sql_optimizer_team.tools.prompt_tools import supported_databases from sql_optimizer_team.tools.prompt_tools import supported_databases
from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
import os import os
base_model = get_model() base_model = get_model()
SQL_TO_NATURAL_PROMPT = """
You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
$database_name SQL Query:
```sql
$query
```
Your explanation must follow these requirements:
1. **Describe the overall purpose**
- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
2. **List ALL involved database objects**
Explicitly list every:
- Table
- View
- CTE (Common Table Expression)
- Subquery or derived table
- Function
- Stored procedure, if referenced
- Temporary table
- Schema-qualified object
Use the exact names as they appear in the query.
3. **Describe all essential operations**
Explicitly state, using exact column names:
- Columns retrieved or modified
- Join types, join conditions, and which objects participate
- Filters and conditions (WHERE, boolean logic, comparisons)
- Aggregations (SUM, COUNT, etc.)
- Grouping and HAVING clauses
- Sorting (ORDER BY)
- Window functions
- DISTINCT, TOP, LIMIT, OFFSET, pagination
- Any $database_name-specific features used$specific_features
4. **Maintain strict factual accuracy**
- Do NOT infer business meaning unless directly implied.
- Do NOT rename or paraphrase column names; repeat them exactly.
5. **Use clear, structured natural language**
- Provide a step-by-step explanation that makes every operation and purpose explicit.
- The output must be complete enough that the query can be reconstructed.
6. ** CRITICAL: Identify Performance Issues**
Flag any of these CRITICAL performance problems found in the query:
- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
* CROSS APPLY/LATERAL with internal WHERE counts as filtered
* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
- **Implicit conversions**: Type mismatches in comparisons
- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
$analysis_requirements
Explanation:
""".strip()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db") _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"} _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path) db = SqliteDb(db_file=_db_path)
async def explain_query_tool(
    database_type: str,
    sql: str,
    provider: str | None = None,
    model: str | None = None,
    temperature: float | None = None,
    max_tokens: int | None = None,
    api_key: str | None = None,
) -> dict[str, str]:
    from sql_optimizer_team.tools.core_tools import explain_query_core
    return await explain_query_core(
        database_type=database_type,
        sql=sql,
        provider=provider,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
    )
sql_analyst_agent = Agent( sql_analyst_agent = Agent(
name="SQL Analyst", name="SQL Analyst",
role=( role=(
@ -20,7 +103,7 @@ sql_analyst_agent = Agent(
"A saída deve seguir exatamente a prompt original (SQL → natural) do projeto oracle-sql-query-optimizer." "A saída deve seguir exatamente a prompt original (SQL → natural) do projeto oracle-sql-query-optimizer."
), ),
model=base_model, model=base_model,
tools=[explain_query_core, load_sql_from_file, ensure_non_empty, supported_databases], tools=[explain_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True, markdown=True,
add_history_to_context=True, add_history_to_context=True,
db=db, db=db,
@ -32,67 +115,7 @@ sql_analyst_agent = Agent(
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().", "- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
"- Preferência: use explain_query_core(database_type, sql) para gerar a explicação via core de negócio.", "- Preferência: use explain_query_core(database_type, sql) para gerar a explicação via core de negócio.",
"- Use a template oficial abaixo para estruturar a explicação (SQL → natural).", "- Use a template oficial abaixo para estruturar a explicação (SQL → natural).",
""" SQL_TO_NATURAL_PROMPT,
You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
$database_name SQL Query:
```sql
$query
```
Your explanation must follow these requirements:
1. **Describe the overall purpose**
- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
2. **List ALL involved database objects**
Explicitly list every:
- Table
- View
- CTE (Common Table Expression)
- Subquery or derived table
- Function
- Stored procedure, if referenced
- Temporary table
- Schema-qualified object
Use the exact names as they appear in the query.
3. **Describe all essential operations**
Explicitly state, using exact column names:
- Columns retrieved or modified
- Join types, join conditions, and which objects participate
- Filters and conditions (WHERE, boolean logic, comparisons)
- Aggregations (SUM, COUNT, etc.)
- Grouping and HAVING clauses
- Sorting (ORDER BY)
- Window functions
- DISTINCT, TOP, LIMIT, OFFSET, pagination
- Any $database_name-specific features used$specific_features
4. **Maintain strict factual accuracy**
- Do NOT infer business meaning unless directly implied.
- Do NOT rename or paraphrase column names; repeat them exactly.
5. **Use clear, structured natural language**
- Provide a step-by-step explanation that makes every operation and purpose explicit.
- The output must be complete enough that the query can be reconstructed.
6. ** CRITICAL: Identify Performance Issues**
Flag any of these CRITICAL performance problems found in the query:
- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
* CROSS APPLY/LATERAL with internal WHERE counts as filtered
* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
- **Implicit conversions**: Type mismatches in comparisons
- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
$analysis_requirements
Explanation:
""".strip(),
"- Entregue apenas a explicação natural estruturada conforme a prompt; não reescreva a SQL.", "- Entregue apenas a explicação natural estruturada conforme a prompt; não reescreva a SQL.",
"- Identifique problemas críticos de performance conforme a prompt.", "- Identifique problemas críticos de performance conforme a prompt.",
], ],
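
The `explain_query_tool` wrapper added in this file forwards its arguments by keyword to the core coroutine and imports that coroutine inside the function body. A sketch of the same pattern with a stand-in core follows; `_fake_core`, `explain_tool`, and the returned values are invented for illustration and do not reflect what `explain_query_core` actually returns.

```python
from __future__ import annotations

import asyncio


async def _fake_core(database_type: str, sql: str, model: str | None = None) -> dict[str, str]:
    """Stand-in for the project's core coroutine (invented for this sketch)."""
    return {
        "database_type": database_type,
        "explanation": f"{len(sql.split())} words",
        "model": model or "default",
    }


async def explain_tool(database_type: str, sql: str, model: str | None = None) -> dict[str, str]:
    """Thin wrapper: same parameters as the core, forwarded by keyword."""
    # In the diff, the core function is imported at this point, inside the
    # wrapper body, so the import only runs when the tool is called.
    return await _fake_core(database_type=database_type, sql=sql, model=model)


result = asyncio.run(explain_tool("postgresql", "SELECT 1"))
print(result["explanation"])
```

One practical effect of the wrapper is that the tool's `__name__` becomes `explain_query_tool`, which is the name a name-based allowlist would see.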


@ -1,18 +1,104 @@
from agno.agent import Agent from agno.agent import Agent
from agno.db.sqlite import SqliteDb from agno.db.sqlite import SqliteDb
from sql_optimizer_team.tools.engine.model_selector import get_model from sql_optimizer_team.tools.engine.model_selector import get_model
from sql_optimizer_team.tools.core_tools import optimize_query_core
from sql_optimizer_team.tools.prompt_tools import supported_databases from sql_optimizer_team.tools.prompt_tools import supported_databases
from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
import os import os
base_model = get_model() base_model = get_model()
NATURAL_TO_SQL_PROMPT = """
You are an expert $database_name SQL developer and query performance specialist.
Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
Description:
$explanation
CRITICAL RULES - READ BEFORE GENERATING SQL:
1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
- Every CASE WHEN statement must have IDENTICAL conditions and results
- Every calculated column must use IDENTICAL formulas
- Every subquery must query the SAME tables with SAME filters
- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
2. **PRESERVE ALL TABLES AND COLUMNS**
- Include EVERY table mentioned in the description
- Include EVERY column mentioned in the description
- Use EXACT column names as described (no renaming)
- Use EXACT table aliases as described
3. **Translate the full described logic into SQL**
- Implement all actions, operations, filters, joins, and conditions exactly as stated.
- Use every object and column referenced in the description, using their exact names.
- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
4. **Write optimized SQL while preserving semantics**
- Apply $database_name best practices for performance.
- Use indexing-aware filtering, efficient join strategies, and clear expressions.
- Implement aggregations, groupings, window functions, or pagination when described.
- Prefer performant constructs commonly recommended for $database_name workloads.
- OPTIMIZATION means structure/hints/indexes - NOT changing logic
5. **Use $database_name-specific syntax and features**
- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
- Incorporate $specific_requirements if provided.
6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
- The SQL must reflect PRECISELY the behavior described
- Do NOT add logic not explicitly stated
- Do NOT omit any step described
- Do NOT infer or assume details beyond what is explicitly stated
- Do NOT "simplify" complex CASE statements
- Do NOT merge or combine separate calculated columns
7. **Self-Verification Checklist** (perform before outputting):
- [ ] All tables from description are present in query
- [ ] All columns from description are present in SELECT
- [ ] All CASE conditions match description exactly
- [ ] All subquery filters match description exactly
- [ ] All JOIN conditions match description exactly
- [ ] No business logic was simplified or changed
8. **Output format**
- Provide ONLY the final, optimized SQL query.
- Do NOT include explanations, comments, or extra text.
Optimized SQL Query:
""".strip()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db") _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"} _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path) db = SqliteDb(db_file=_db_path)
async def optimize_query_tool(
    database_type: str,
    sql: str,
    provider: str | None = None,
    model: str | None = None,
    temperature: float | None = None,
    max_tokens: int | None = None,
    api_key: str | None = None,
    output_dir: str | None = None,
    no_review: bool = False,
) -> dict[str, str | dict[str, str]]:
    from sql_optimizer_team.tools.core_tools import optimize_query_core
    return await optimize_query_core(
        database_type=database_type,
        sql=sql,
        provider=provider,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
        output_dir=output_dir,
        no_review=no_review,
    )
sql_optimizer_agent = Agent( sql_optimizer_agent = Agent(
name="SQL Optimizer", name="SQL Optimizer",
role=( role=(
@ -20,7 +106,7 @@ sql_optimizer_agent = Agent(
"mantendo 100% da lógica e entregando apenas a SQL otimizada." "mantendo 100% da lógica e entregando apenas a SQL otimizada."
), ),
model=base_model, model=base_model,
tools=[optimize_query_core, load_sql_from_file, ensure_non_empty, supported_databases], tools=[optimize_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True, markdown=True,
add_history_to_context=True, add_history_to_context=True,
db=db, db=db,
@ -31,66 +117,7 @@ sql_optimizer_agent = Agent(
"- Exija banco alvo e SQL antes de otimizar.", "- Exija banco alvo e SQL antes de otimizar.",
"- Use optimize_query_core(database_type, sql) para executar o core de negócio.", "- Use optimize_query_core(database_type, sql) para executar o core de negócio.",
"- Use a template oficial abaixo para reescrever (natural → SQL) mantendo 100% da lógica.", "- Use a template oficial abaixo para reescrever (natural → SQL) mantendo 100% da lógica.",
""" NATURAL_TO_SQL_PROMPT,
You are an expert $database_name SQL developer and query performance specialist.
Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
Description:
$explanation
CRITICAL RULES - READ BEFORE GENERATING SQL:
1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
- Every CASE WHEN statement must have IDENTICAL conditions and results
- Every calculated column must use IDENTICAL formulas
- Every subquery must query the SAME tables with SAME filters
- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
2. **PRESERVE ALL TABLES AND COLUMNS**
- Include EVERY table mentioned in the description
- Include EVERY column mentioned in the description
- Use EXACT column names as described (no renaming)
- Use EXACT table aliases as described
3. **Translate the full described logic into SQL**
- Implement all actions, operations, filters, joins, and conditions exactly as stated.
- Use every object and column referenced in the description, using their exact names.
- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
4. **Write optimized SQL while preserving semantics**
- Apply $database_name best practices for performance.
- Use indexing-aware filtering, efficient join strategies, and clear expressions.
- Implement aggregations, groupings, window functions, or pagination when described.
- Prefer performant constructs commonly recommended for $database_name workloads.
- OPTIMIZATION means structure/hints/indexes - NOT changing logic
5. **Use $database_name-specific syntax and features**
- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
- Incorporate $specific_requirements if provided.
6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
- The SQL must reflect PRECISELY the behavior described
- Do NOT add logic not explicitly stated
- Do NOT omit any step described
- Do NOT infer or assume details beyond what is explicitly stated
- Do NOT "simplify" complex CASE statements
- Do NOT merge or combine separate calculated columns
7. **Self-Verification Checklist** (perform before outputting):
- [ ] All tables from description are present in query
- [ ] All columns from description are present in SELECT
- [ ] All CASE conditions match description exactly
- [ ] All subquery filters match description exactly
- [ ] All JOIN conditions match description exactly
- [ ] No business logic was simplified or changed
8. **Output format**
- Provide ONLY the final, optimized SQL query.
- Do NOT include explanations, comments, or extra text.
Optimized SQL Query:
""".strip(),
"- Extraia e devolva SOMENTE optimized_query (sem explicações, sem markdown).", "- Extraia e devolva SOMENTE optimized_query (sem explicações, sem markdown).",
"- Preserve 100% da lógica, colunas, aliases, filtros, joins e subqueries.", "- Preserve 100% da lógica, colunas, aliases, filtros, joins e subqueries.",
], ],


@ -0,0 +1,5 @@
"""Internal knowledge base helpers."""
from sql_optimizer_team.knowledge.internal_kb import build_internal_knowledge, attach_internal_knowledge
__all__ = ["build_internal_knowledge", "attach_internal_knowledge"]


@ -0,0 +1,100 @@
"""Internal KB (RAG) setup for the SQL optimizer team."""
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
import os

from agno.db.sqlite import SqliteDb
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.embedder.sentence_transformer import SentenceTransformerEmbedder
from agno.vectordb.chroma import ChromaDb

from sql_optimizer_team.tools.engine.config.logger import get_logger

logger = get_logger(__name__)


@dataclass(frozen=True)
class InternalKBConfig:
    kb_path: Path
    chroma_path: Path
    embedder_id: str
    contents_db_file: Path
    block_external: bool


def _load_config() -> InternalKBConfig:
    kb_path = Path(os.getenv("SQL_OPT_KB_PATH", "kb")).resolve()
    chroma_path = Path(os.getenv("SQL_OPT_KB_CHROMA_PATH", "tmp/kb_chroma")).resolve()
    embedder_id = os.getenv(
        "SQL_OPT_KB_EMBEDDER_ID",
        "sentence-transformers/all-MiniLM-L6-v2",
    ).strip()
    contents_db_file = Path(os.getenv("SQL_OPT_KB_DB_FILE", "tmp/sql_optimizer_kb.db")).resolve()
    block_external = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
    return InternalKBConfig(
        kb_path=kb_path,
        chroma_path=chroma_path,
        embedder_id=embedder_id,
        contents_db_file=contents_db_file,
        block_external=block_external,
    )


def build_internal_knowledge() -> Knowledge:
    config = _load_config()
    if config.block_external:
        logger.info("External tools blocked for KB", kb_path=str(config.kb_path))
    embedder = SentenceTransformerEmbedder(id=config.embedder_id)
    vector_db = ChromaDb(
        name="sql-optimizer-kb",
        path=str(config.chroma_path),
        persistent_client=True,
        embedder=embedder,
    )
    contents_db = SqliteDb(db_file=str(config.contents_db_file))
    knowledge = Knowledge(
        name="internal-sql-kb",
        description="Base de conhecimento interna para otimização de SQL",
        vector_db=vector_db,
        contents_db=contents_db,
        max_results=6,
    )
    if not config.kb_path.exists():
        logger.warning("KB path not found; skipping ingest", kb_path=str(config.kb_path))
        return knowledge
    if config.block_external and not config.kb_path.is_dir():
        logger.warning("KB path is not a directory; skipping ingest", kb_path=str(config.kb_path))
        return knowledge
    try:
        knowledge.insert(
            path=str(config.kb_path),
            include=["**/*.md", "**/*.txt", "**/*.sql", "**/*.pdf"],
            exclude=["**/.git/**", "**/.venv/**", "**/__pycache__/**"],
            upsert=True,
            skip_if_exists=True,
        )
        logger.info("KB ingest complete", kb_path=str(config.kb_path))
    except Exception as exc:
        logger.error("KB ingest failed", error=str(exc))
    return knowledge


def attach_internal_knowledge(knowledge: Knowledge, *agents: object) -> None:
    for agent in agents:
        try:
            setattr(agent, "knowledge", knowledge)
            setattr(agent, "add_knowledge_to_context", True)
            setattr(agent, "search_knowledge", True)
            setattr(agent, "update_knowledge", False)
        except Exception as exc:
            logger.warning("Failed to attach knowledge", agent=str(agent), error=str(exc))
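
The expression `.strip().lower() in {"1", "true", "yes", "on"}` appears in several files of this commit to parse boolean environment variables. A small helper could centralize it; the sketch below is a suggestion only, and `env_flag` is a hypothetical name that does not exist in the project.

```python
import os

# Same truthy set the diff uses for its boolean environment variables.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment.

    Returns `default` when the variable is unset; otherwise the value is
    truthy only if, trimmed and lower-cased, it is in _TRUTHY.
    """
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


if __name__ == "__main__":
    print(env_flag("SQL_OPT_BLOCK_EXTERNAL_TOOLS", default=True))
```

Under this sketch, `env_flag("SQL_OPT_BLOCK_EXTERNAL_TOOLS", default=True)` behaves like the inline expression with its `"true"` default: unset means blocked, and any value outside the truthy set means not blocked.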


@ -2,13 +2,12 @@ from agno.team.team import Team
from agno.os.app import AgentOS from agno.os.app import AgentOS
from agno.db.sqlite import SqliteDb from agno.db.sqlite import SqliteDb
from dotenv import load_dotenv from dotenv import load_dotenv
from sql_optimizer_team.knowledge import build_internal_knowledge, attach_internal_knowledge
from sql_optimizer_team.tools.engine.model_selector import get_model from sql_optimizer_team.tools.engine.model_selector import get_model
from sql_optimizer_team.agents import ( from sql_optimizer_team.agents.sql_analyst_agent import sql_analyst_agent
sql_analyst_agent, from sql_optimizer_team.agents.sql_optimizer_agent import sql_optimizer_agent
sql_optimizer_agent, from sql_optimizer_team.agents.sql_quality_agent import sql_quality_agent
sql_quality_agent, from sql_optimizer_team.agents.conservative_analysis_agent import conservative_analysis_agent
conservative_analysis_agent,
)
import os import os
load_dotenv() load_dotenv()
@ -20,6 +19,39 @@ _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {
db = SqliteDb(db_file=_db_path) db = SqliteDb(db_file=_db_path)
_kb = build_internal_knowledge()
attach_internal_knowledge(
    _kb,
    sql_analyst_agent,
    sql_optimizer_agent,
    sql_quality_agent,
    conservative_analysis_agent,
)

_block_external_tools = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
if _block_external_tools:
    _allowed_tool_names = {
        "explain_query_tool",
        "optimize_query_tool",
        "load_sql_from_file",
        "ensure_non_empty",
        "supported_databases",
        "diff_sql",
    }

    def _filter_tools(agent) -> None:
        if not getattr(agent, "tools", None):
            return
        filtered = []
        for tool in agent.tools:
            name = getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)
            if name in _allowed_tool_names:
                filtered.append(tool)
        agent.tools = filtered

    for _agent in [sql_analyst_agent, sql_optimizer_agent, sql_quality_agent, conservative_analysis_agent]:
        _filter_tools(_agent)
sql_optimizer_team = Team( sql_optimizer_team = Team(
name="SQL Optimization Team", name="SQL Optimization Team",
model=base_model, model=base_model,
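
Reviewer note: the allowlist filter in this hunk resolves each tool's name from `__name__`, then `name`, then `str(tool)`. The sketch below reproduces that lookup in isolation. The tools, the fake agent, and the helper names (`tool_name`, `filter_tools`, `web_search`) are hypothetical and only serve to show which entries survive.

```python
from types import SimpleNamespace

ALLOWED = {"explain_query_tool", "diff_sql"}


def tool_name(tool) -> str:
    """Resolve a tool's name: function __name__, then a .name attribute, then str()."""
    return getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)


def filter_tools(agent, allowed) -> None:
    """Keep only the tools whose resolved name is in the allowlist."""
    if not getattr(agent, "tools", None):
        return  # nothing to filter (attribute missing, None, or empty)
    agent.tools = [tool for tool in agent.tools if tool_name(tool) in allowed]


def explain_query_tool(sql: str) -> str:  # allowed, matched through __name__
    return f"EXPLAIN {sql}"


def web_search(query: str) -> str:  # not in the allowlist, so it is dropped
    return query


named_tool = SimpleNamespace(name="diff_sql")  # allowed, matched through .name
agent = SimpleNamespace(tools=[explain_query_tool, web_search, named_tool])
filter_tools(agent, ALLOWED)
kept = [tool_name(tool) for tool in agent.tools]
```

One caveat worth checking against the real agents: an object that bundles several functions under a single name is matched by that one name, so it is kept or dropped as a whole.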

@@ -41,6 +41,7 @@ class AgnoLLMTool(BaseLLMTool):
             result_text = self._extract_text(response)
             validated = self._validate_response(result_text)
             self._log_response(validated)
+            self._log_usage_from_response(response, prompt, validated)
             return validated
         except Exception as e:
             self._log_error(e)

@@ -5,6 +5,8 @@ This module provides a base class with common functionality for all LLM tools.
 from abc import ABC
 from typing import Any
+import math
+import os
 from sql_optimizer_team.tools.engine.tools_api.llm_tool import LLMTool
 from sql_optimizer_team.tools.engine.types.tool_exceptions import LLMProviderError
@@ -91,6 +93,49 @@ class BaseLLMTool(LLMTool, ABC):
             **kwargs,
         )
+    def _estimate_tokens(self, text: str) -> int:
+        """Best-effort token estimate when provider usage is unavailable."""
+        if not text:
+            return 0
+        return max(1, math.ceil(len(text) / 4))
+
+    def _log_usage_from_response(self, response_obj: Any, prompt: str, response_text: str) -> None:
+        """Log token usage and cost if enabled.
+        Reads usage from ModelResponse when available, otherwise uses a rough estimate.
+        Cost is computed using env vars LLM_COST_INPUT_PER_1K and LLM_COST_OUTPUT_PER_1K.
+        """
+        enabled = os.getenv("LLM_LOG_USAGE", "true").strip().lower() in {"1", "true", "yes", "on"}
+        if not enabled:
+            return
+        input_tokens = getattr(response_obj, "input_tokens", None)
+        output_tokens = getattr(response_obj, "output_tokens", None)
+        total_tokens = getattr(response_obj, "total_tokens", None)
+        if input_tokens is None:
+            input_tokens = self._estimate_tokens(prompt)
+        if output_tokens is None:
+            output_tokens = self._estimate_tokens(response_text)
+        if total_tokens is None and input_tokens is not None and output_tokens is not None:
+            total_tokens = input_tokens + output_tokens
+        cost_in = float(os.getenv("LLM_COST_INPUT_PER_1K", "0") or 0)
+        cost_out = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0") or 0)
+        cost_usd = None
+        if input_tokens is not None or output_tokens is not None:
+            cost_usd = (input_tokens or 0) * cost_in / 1000 + (output_tokens or 0) * cost_out / 1000
+        logger.info(
+            "LLM usage",
+            provider=self.provider_name,
+            model=self._model_name,
+            input_tokens=input_tokens,
+            output_tokens=output_tokens,
+            total_tokens=total_tokens,
+            cost_usd=cost_usd,
+        )
+
     def _log_error(self, error: Exception, **kwargs: Any) -> None:
         """Log LLM error.
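
Reviewer note: a worked example of the fallback estimate and the cost formula in this hunk. It uses the same arithmetic (roughly four characters per token, prices per 1,000 tokens). The `read_rate` helper and the plain `env` dictionary are assumptions of this sketch and are not part of the commit.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token, at least 1 for non-empty text."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / 4))


def read_rate(env: dict, name: str) -> float:
    """Hypothetical helper: parse a per-1K-token price; missing or malformed values count as 0."""
    try:
        return float(env.get(name) or 0)
    except ValueError:
        return 0.0


def usage_cost(input_tokens: int, output_tokens: int, cost_in: float, cost_out: float) -> float:
    """Cost given per-1K-token prices, using the same formula as the diff."""
    return input_tokens * cost_in / 1000 + output_tokens * cost_out / 1000


# Stand-in for os.environ; the second value is deliberately malformed.
env = {"LLM_COST_INPUT_PER_1K": "0.5", "LLM_COST_OUTPUT_PER_1K": "not-a-number"}

prompt_tokens = estimate_tokens("x" * 400)  # 400 characters
reply_tokens = estimate_tokens("y" * 10)    # 10 characters, rounded up
cost = usage_cost(
    prompt_tokens,
    reply_tokens,
    read_rate(env, "LLM_COST_INPUT_PER_1K"),
    read_rate(env, "LLM_COST_OUTPUT_PER_1K"),
)
```

In the diff, `float(os.getenv(...))` raises `ValueError` on a malformed price. Because `_log_usage_from_response` appears to be called inside the `try` in `AgnoLLMTool`, such an error would presumably be reported as an LLM failure after a successful call. Tolerant parsing as in `read_rate` is one way to keep logging from affecting the result.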

@@ -7,185 +7,17 @@ reducing code duplication and ensuring consistency.
 from abc import ABC, abstractmethod
 from string import Template
+import importlib
 from sql_optimizer_team.tools.engine.tools_api.prompt_tool import PromptGeneratorTool
-SQL_TO_NATURAL_TEMPLATE = Template("""
-You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
-$database_name SQL Query:
-```sql
-$query
-```
-Your explanation must follow these requirements:
-1. **Describe the overall purpose**
-   - Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
-2. **List ALL involved database objects**
-   Explicitly list every:
-   - Table
-   - View
-   - CTE (Common Table Expression)
-   - Subquery or derived table
-   - Function
-   - Stored procedure, if referenced
-   - Temporary table
-   - Schema-qualified object
-   Use the exact names as they appear in the query.
-3. **Describe all essential operations**
-   Explicitly state, using exact column names:
-   - Columns retrieved or modified
-   - Join types, join conditions, and which objects participate
-   - Filters and conditions (WHERE, boolean logic, comparisons)
-   - Aggregations (SUM, COUNT, etc.)
-   - Grouping and HAVING clauses
-   - Sorting (ORDER BY)
-   - Window functions
-   - DISTINCT, TOP, LIMIT, OFFSET, pagination
-   - Any $database_name-specific features used$specific_features
-4. **Maintain strict factual accuracy**
-   - Do NOT infer business meaning unless directly implied.
-   - Do NOT rename or paraphrase column names; repeat them exactly.
-5. **Use clear, structured natural language**
-   - Provide a step-by-step explanation that makes every operation and purpose explicit.
-   - The output must be complete enough that the query can be reconstructed.
-6. ** CRITICAL: Identify Performance Issues**
-   Flag any of these CRITICAL performance problems found in the query:
-   - **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
-     * ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
-     * If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
-     * Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
-     * CROSS APPLY/LATERAL with internal WHERE counts as filtered
-     * If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
-   - **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
-   - **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
-   - **Implicit conversions**: Type mismatches in comparisons
-   - **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
-$analysis_requirements
-Explanation:
-""")
-NATURAL_TO_SQL_TEMPLATE = Template("""
-You are an expert $database_name SQL developer and query performance specialist.
-Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
-Description:
-$explanation
-CRITICAL RULES - READ BEFORE GENERATING SQL:
-1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
-   - Every CASE WHEN statement must have IDENTICAL conditions and results
-   - Every calculated column must use IDENTICAL formulas
-   - Every subquery must query the SAME tables with SAME filters
-   - Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
-   - If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
-2. **PRESERVE ALL TABLES AND COLUMNS**
-   - Include EVERY table mentioned in the description
-   - Include EVERY column mentioned in the description
-   - Use EXACT column names as described (no renaming)
-   - Use EXACT table aliases as described
-3. **Translate the full described logic into SQL**
-   - Implement all actions, operations, filters, joins, and conditions exactly as stated.
-   - Use every object and column referenced in the description, using their exact names.
-   - If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
-4. **Write optimized SQL while preserving semantics**
-   - Apply $database_name best practices for performance.
-   - Use indexing-aware filtering, efficient join strategies, and clear expressions.
-   - Implement aggregations, groupings, window functions, or pagination when described.
-   - Prefer performant constructs commonly recommended for $database_name workloads.
-   - OPTIMIZATION means structure/hints/indexes - NOT changing logic
-5. **Use $database_name-specific syntax and features**
-   - Apply native functions, operators, optimizer behaviors, or hints when appropriate.
-   - Incorporate $specific_requirements if provided.
-6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
-   - The SQL must reflect PRECISELY the behavior described
-   - Do NOT add logic not explicitly stated
-   - Do NOT omit any step described
-   - Do NOT infer or assume details beyond what is explicitly stated
-   - Do NOT "simplify" complex CASE statements
-   - Do NOT merge or combine separate calculated columns
-7. **Self-Verification Checklist** (perform before outputting):
-   - [ ] All tables from description are present in query
-   - [ ] All columns from description are present in SELECT
-   - [ ] All CASE conditions match description exactly
-   - [ ] All subquery filters match description exactly
-   - [ ] All JOIN conditions match description exactly
-   - [ ] No business logic was simplified or changed
-8. **Output format**
-   - Provide ONLY the final, optimized SQL query.
-   - Do NOT include explanations, comments, or extra text.
-Optimized SQL Query:
-""")
-CONSERVATIVE_ANALYSIS_TEMPLATE = Template("""
-You are an expert $database_name database analyst and performance specialist.
-Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
-CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
-$database_name SQL Query:
-```sql
-$query
-```
-Query Complexity Information:
-- Columns: $column_count
-- Tables: $table_count
-- Subqueries: $subquery_count
-- CASE statements: $case_count
-- JOINs: $join_count
-- Complexity Level: $complexity_level
-Provide your analysis in the following structured format:
-## PERFORMANCE ISSUES
-List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
-- [SEVERITY] Issue description
-- [SEVERITY] Issue description
-## SUGGESTED INDEXES
-List indexes that could improve this query:
-- CREATE INDEX idx_name ON table(columns) -- Reason
-## OPTIMIZATION SUGGESTIONS
-List specific suggestions WITHOUT rewriting the query:
-- Suggestion 1: Description of what could be improved and why
-- Suggestion 2: Description of what could be improved and why
-## RISK ASSESSMENT
-- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
-- Missing WHERE clause: [Yes/No] - If yes, explain the impact
-- Implicit conversions: [Yes/No] - If yes, list them
-## SUMMARY
-Brief summary of the most important findings and priority order for addressing them.
-Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
-""")
 def _render_sql_to_natural(
     database_name: str, query: str, specific_features: str = "", analysis_requirements: str = ""
 ) -> str:
-    return SQL_TO_NATURAL_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.sql_analyst_agent")
+    template_text = getattr(module, "SQL_TO_NATURAL_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         query=query,
         specific_features=f"\n{specific_features}" if specific_features else "",
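
Reviewer note: the render functions now look up the prompt text on the agent module at call time and wrap it in `string.Template`. The sketch below shows that pattern on its own. The module name `demo_agent_module`, the attribute `DEMO_PROMPT`, and the helper `render_prompt` are hypothetical, and the module is registered in `sys.modules` only so the example runs without the real package.

```python
import importlib
import sys
import types
from string import Template

# Hypothetical stand-in for an agent module that owns its prompt text.
_demo = types.ModuleType("demo_agent_module")
_demo.DEMO_PROMPT = "You are an expert $database_name analyst.\nQuery:\n$query$extra"
sys.modules["demo_agent_module"] = _demo


def render_prompt(module_name: str, attr: str, **values: str) -> str:
    """Import the module when called, read the prompt attribute, and fill its placeholders."""
    module = importlib.import_module(module_name)
    return Template(getattr(module, attr)).substitute(**values)


rendered = render_prompt(
    "demo_agent_module",
    "DEMO_PROMPT",
    database_name="PostgreSQL",
    query="SELECT 1",
    extra="",
)

# substitute() is strict: a placeholder without a value raises KeyError.
try:
    render_prompt("demo_agent_module", "DEMO_PROMPT", database_name="PostgreSQL")
    missing_key_raises = False
except KeyError:
    missing_key_raises = True
```

Importing inside the function defers the lookup until a prompt is rendered, which may be what avoids an import cycle between the tools and agents packages, although the commit does not state the reason. Because `substitute` is strict, any literal `$` in a prompt string has to be written as `$$`.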
@@ -196,7 +28,9 @@ def _render_sql_to_natural(
 def _render_natural_to_sql(
     database_name: str, explanation: str, specific_requirements: str
 ) -> str:
-    return NATURAL_TO_SQL_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.sql_optimizer_agent")
+    template_text = getattr(module, "NATURAL_TO_SQL_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         explanation=explanation,
         specific_requirements="\n".join(
@@ -215,7 +49,9 @@ def _render_conservative_analysis(
     join_count: int = 0,
     complexity_level: str = "unknown",
 ) -> str:
-    return CONSERVATIVE_ANALYSIS_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.conservative_analysis_agent")
+    template_text = getattr(module, "CONSERVATIVE_ANALYSIS_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         query=query,
         column_count=column_count,