feat: Enhance SQL optimization tools with internal knowledge base and observability features
- Updated README.md to include new setup instructions for RAG and observability.
- Added internal knowledge base (KB) setup for the SQL optimization team, supporting various document types.
- Implemented token usage logging in LLM tools to track costs and usage.
- Refactored SQL analysis and optimization prompts for clarity and consistency.
- Introduced filtering of external tools based on environment configuration.
- Enhanced the conservative analysis agent with a structured prompt for performance suggestions.
- Updated requirements.txt to include new dependencies for RAG functionality.
- Added internal KB helpers for building and attaching knowledge to agents.
parent c6dd91810b
commit 80d1f9d26a
14 changed files with 502 additions and 355 deletions
38
README.md
@@ -25,15 +25,31 @@ src/

 1) Create the environment and install dependencies:
    - `pip install -r requirements.txt`
-2) Configure environment variables (example in `sample.env`).
+2) Configure environment variables (example in `sample.env` or `.env`).
 3) Run the server:
-   - `PYTHONPATH=src python -m main`
+   - `./scripts/start.sh`

 Access:

 - `http://localhost:8204/docs` (Swagger UI)
 - `http://localhost:8204` (basic API information)

+## Local UI (Agent UI)
+
+Use **Agent UI** (agno-agi/agent-ui) as the local front end:
+
+1) Install it with the official script:
+
+   - `npx create-agent-ui@latest`
+
+2) Start the UI:
+
+   - `pnpm dev`
+
+3) Open `http://localhost:3000` and set the endpoint to `http://localhost:8204`.
+
+Optional: if AgentOS uses authentication, configure `OS_SECURITY_KEY` as described in the Agent UI README.
+
 ## Team workflow

 1) **Manager** receives the request and validates the context (database + SQL).
@@ -43,7 +59,23 @@ Access:
 5) **Conservative Analyst** (if requested) produces an analysis without rewriting the query.
 6) **Manager** consolidates and delivers the result.

+## RAG (internal KB)
+
+- Put documents in `kb/` (md/txt/sql/pdf).
+- The local RAG uses Chroma + SentenceTransformers.
+- Main variables:
+  - `SQL_OPT_KB_PATH`, `SQL_OPT_KB_CHROMA_PATH`, `SQL_OPT_KB_DB_FILE`
+  - `SQL_OPT_KB_EMBEDDER_ID`
+- `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true` blocks external tools.
+
+## Token/cost observability
+
+- Enable with `LLM_LOG_USAGE=true`.
+- Set prices (USD per 1K tokens) with:
+  - `LLM_COST_INPUT_PER_1K`
+  - `LLM_COST_OUTPUT_PER_1K`
+
 ## Notes

-- Use the model configured in environment variables (e.g., OpenAI, Gemini, Groq).
+- Use the provider configured in `.env` (e.g., local Ollama, OpenAI, Gemini, Groq).
 - The team is collaborative and keeps its history in SQLite (configurable via env).
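The token/cost variables documented in the README hunk above could be combined roughly as follows. This is a minimal sketch, not the commit's implementation: the helper names `_env_float` and `estimate_cost_usd` are hypothetical, and only the variable names `LLM_COST_INPUT_PER_1K` and `LLM_COST_OUTPUT_PER_1K` come from the diff.

```python
import os


def _env_float(name: str, default: float = 0.0) -> float:
    """Read a float from the environment; fall back to the default if unset or invalid."""
    raw = os.getenv(name, "").strip()
    try:
        return float(raw) if raw else default
    except ValueError:
        return default


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from per-1K-token prices (hypothetical helper)."""
    price_in = _env_float("LLM_COST_INPUT_PER_1K")
    price_out = _env_float("LLM_COST_OUTPUT_PER_1K")
    return (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out
```

With the `sample.env` defaults (both prices set to `0`) the estimate is `0.0`, so token counts can be logged before real prices are configured.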
@@ -124,6 +124,11 @@ Recommended off-the-shelf tools:
 - **Langfuse** or **Phoenix** for tracing prompts, costs, and latency.
 - **Grafana/Prometheus** for executive dashboards.

+Status in the POC:
+
+- **Token/cost logging** is already implemented via `LLM_LOG_USAGE` and per-1K-token costs.
+- Persistent metrics and dashboards (Grafana/Prometheus) remain future work.
+
 Minimum metrics:

 - Tokens per request and per area.
@@ -150,6 +155,11 @@ Minimum metrics:
 - Continuous curation with feedback from the teams to improve relevance.
 - **Higher precision**: answers consistent with internal policies and technical standards.

+Status in the POC:
+
+- **Local RAG** over the internal base in `kb/` using Chroma + SentenceTransformers.
+- **External tools blocked** by default via `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true`.
+
 ## 10) Definitive stack (100% Agno)

 - **Agno** as the single framework for orchestration, memory, and tools.
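The commit message mentions filtering external tools based on environment configuration, but that code is not part of the visible hunks. One possible shape is sketched below. `filter_tools`, `external_names`, and the idea of identifying tools by function name are assumptions for illustration; the truthy set mirrors the `_debug_mode` parsing shown elsewhere in this diff.

```python
import os
from typing import Callable, Iterable

TRUTHY = {"1", "true", "yes", "on"}


def external_tools_blocked() -> bool:
    """True when SQL_OPT_BLOCK_EXTERNAL_TOOLS is truthy (documented default: true)."""
    return os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in TRUTHY


def filter_tools(tools: Iterable[Callable], external_names: set[str]) -> list[Callable]:
    """Drop tools whose function name is listed as external while blocking is on."""
    if not external_tools_blocked():
        return list(tools)
    return [t for t in tools if getattr(t, "__name__", "") not in external_names]
```

Filtering by name keeps the sketch independent of any framework type; a real implementation might tag tools explicitly instead.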
18
kb/README.md
Normal file
@@ -0,0 +1,18 @@
+# Internal Knowledge Base (KB)
+
+Place here the internal documents that should be used by the RAG.
+
+Supported (by default):
+
+- Markdown (.md)
+- Text (.txt)
+- SQL (.sql)
+- PDF (.pdf)
+
+Configuration via environment variables:
+
+- SQL_OPT_KB_PATH (default: kb)
+- SQL_OPT_KB_CHROMA_PATH (default: tmp/kb_chroma)
+- SQL_OPT_KB_EMBEDDER_ID (default: sentence-transformers/all-MiniLM-L6-v2)
+- SQL_OPT_KB_DB_FILE (default: tmp/sql_optimizer_kb.db)
+- SQL_OPT_BLOCK_EXTERNAL_TOOLS (default: true)
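The supported document types listed in `kb/README.md` suggest a discovery step along these lines. This is only a sketch: `list_kb_documents` is a hypothetical helper, and whether the project's loader walks subdirectories is an assumption.

```python
from pathlib import Path

# Extensions taken from the "Supported (by default)" list in kb/README.md.
SUPPORTED_SUFFIXES = {".md", ".txt", ".sql", ".pdf"}


def list_kb_documents(kb_path: str = "kb") -> list[Path]:
    """Return supported files under kb_path (recursively), sorted for stable ordering."""
    root = Path(kb_path)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Note that a scan like this would also pick up `kb/README.md` itself, so a real loader may want to exclude it.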
@@ -30,3 +30,7 @@ oracledb==3.4.1
 pymssql==2.3.11
 sqlparse==0.5.5
 sqlglot==28.6.0
+
+# RAG (local KB)
+chromadb==0.6.3
+sentence-transformers==3.4.1
12
sample.env
@@ -15,3 +15,15 @@
 # SQL Optimizer Team
 SQL_OPT_TEAM_DB_FILE=tmp/sql_optimizer_team.db
 SQL_OPT_TEAM_DEBUG_MODE=false
+
+# Token/cost observability
+LLM_LOG_USAGE=true
+LLM_COST_INPUT_PER_1K=0
+LLM_COST_OUTPUT_PER_1K=0
+
+# RAG / internal KB
+SQL_OPT_KB_PATH=kb
+SQL_OPT_KB_CHROMA_PATH=tmp/kb_chroma
+SQL_OPT_KB_DB_FILE=tmp/sql_optimizer_kb.db
+SQL_OPT_KB_EMBEDDER_ID=sentence-transformers/all-MiniLM-L6-v2
+SQL_OPT_BLOCK_EXTERNAL_TOOLS=true
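The KB variables added to `sample.env` could be read into a single settings object, as in the sketch below. `KbSettings` and `load_kb_settings` are hypothetical names; the defaults are the values shown in `sample.env` and `kb/README.md`, and the boolean parsing follows the `_debug_mode` pattern used in the agent modules.

```python
import os
from dataclasses import dataclass

TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class KbSettings:
    """Hypothetical container for the KB-related environment variables."""
    kb_path: str
    chroma_path: str
    db_file: str
    embedder_id: str
    block_external_tools: bool


def load_kb_settings() -> KbSettings:
    """Read KB settings from the environment, using the sample.env values as defaults."""
    block = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower()
    return KbSettings(
        kb_path=os.getenv("SQL_OPT_KB_PATH", "kb"),
        chroma_path=os.getenv("SQL_OPT_KB_CHROMA_PATH", "tmp/kb_chroma"),
        db_file=os.getenv("SQL_OPT_KB_DB_FILE", "tmp/sql_optimizer_kb.db"),
        embedder_id=os.getenv(
            "SQL_OPT_KB_EMBEDDER_ID", "sentence-transformers/all-MiniLM-L6-v2"
        ),
        block_external_tools=block in TRUTHY,
    )
```

Reading everything in one place makes the defaults visible and keeps the rest of the code free of scattered `os.getenv` calls.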
@@ -7,6 +7,53 @@ import os

 base_model = get_model()

+CONSERVATIVE_ANALYSIS_PROMPT = """
+You are an expert $database_name database analyst and performance specialist.
+
+Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
+
+⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
+
+$database_name SQL Query:
+```sql
+$query
+```
+
+Query Complexity Information:
+- Columns: $column_count
+- Tables: $table_count
+- Subqueries: $subquery_count
+- CASE statements: $case_count
+- JOINs: $join_count
+- Complexity Level: $complexity_level
+
+Provide your analysis in the following structured format:
+
+## PERFORMANCE ISSUES
+List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
+- [SEVERITY] Issue description
+- [SEVERITY] Issue description
+
+## SUGGESTED INDEXES
+List indexes that could improve this query:
+- CREATE INDEX idx_name ON table(columns) -- Reason
+
+## OPTIMIZATION SUGGESTIONS
+List specific suggestions WITHOUT rewriting the query:
+- Suggestion 1: Description of what could be improved and why
+- Suggestion 2: Description of what could be improved and why
+
+## RISK ASSESSMENT
+- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
+- Missing WHERE clause: [Yes/No] - If yes, explain the impact
+- Implicit conversions: [Yes/No] - If yes, list them
+
+## SUMMARY
+Brief summary of the most important findings and priority order for addressing them.
+
+Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
+""".strip()
+
 _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
 _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}

@@ -30,52 +77,7 @@ conservative_analysis_agent = Agent(
         "- Solicite banco e SQL se não estiverem presentes.",
         "- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
         "- Use a template oficial abaixo para a análise conservadora (sem reescrever a SQL).",
-        """
+        CONSERVATIVE_ANALYSIS_PROMPT,
-You are an expert $database_name database analyst and performance specialist.
-
-Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
-
-⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
-
-$database_name SQL Query:
-```sql
-$query
-```
-
-Query Complexity Information:
-- Columns: $column_count
-- Tables: $table_count
-- Subqueries: $subquery_count
-- CASE statements: $case_count
-- JOINs: $join_count
-- Complexity Level: $complexity_level
-
-Provide your analysis in the following structured format:
-
-## PERFORMANCE ISSUES
-List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
-- [SEVERITY] Issue description
-- [SEVERITY] Issue description
-
-## SUGGESTED INDEXES
-List indexes that could improve this query:
-- CREATE INDEX idx_name ON table(columns) -- Reason
-
-## OPTIMIZATION SUGGESTIONS
-List specific suggestions WITHOUT rewriting the query:
-- Suggestion 1: Description of what could be improved and why
-- Suggestion 2: Description of what could be improved and why
-
-## RISK ASSESSMENT
-- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
-- Missing WHERE clause: [Yes/No] - If yes, explain the impact
-- Implicit conversions: [Yes/No] - If yes, list them
-
-## SUMMARY
-Brief summary of the most important findings and priority order for addressing them.
-
-Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
-""".strip(),
         "- NÃO reescreva a SQL em hipótese alguma.",
     ],
 )
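The `$database_name`, `$query`, and `$..._count` placeholders in `CONSERVATIVE_ANALYSIS_PROMPT` use the syntax of Python's `string.Template`. The rendering code is not visible in this part of the diff, so the sketch below only assumes that mechanism; `render_prompt` and the shortened `PROMPT` are illustrative, not the project's code.

```python
from string import Template

# Shortened stand-in for a prompt constant that uses $placeholders.
PROMPT = Template(
    "You are an expert $database_name database analyst.\n"
    "Tables: $table_count, JOINs: $join_count\n"
    "Query:\n$query"
)


def render_prompt(template: Template, **values: object) -> str:
    """Fill every $placeholder; raises KeyError if a value is missing."""
    return template.substitute(**values)


rendered = render_prompt(
    PROMPT,
    database_name="Oracle",
    table_count=2,
    join_count=1,
    query="SELECT 1 FROM dual",
)
```

`substitute` fails loudly on a missing value, which suits prompts where a silent gap would mislead the model; `safe_substitute` would instead leave unknown placeholders in place.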
@@ -1,18 +1,101 @@
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from sql_optimizer_team.tools.engine.model_selector import get_model
-from sql_optimizer_team.tools.core_tools import explain_query_core
 from sql_optimizer_team.tools.prompt_tools import supported_databases
 from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
 import os

 base_model = get_model()

+SQL_TO_NATURAL_PROMPT = """
+You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
+
+$database_name SQL Query:
+```sql
+$query
+```
+
+Your explanation must follow these requirements:
+
+1. **Describe the overall purpose**
+- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
+
+2. **List ALL involved database objects**
+Explicitly list every:
+- Table
+- View
+- CTE (Common Table Expression)
+- Subquery or derived table
+- Function
+- Stored procedure, if referenced
+- Temporary table
+- Schema-qualified object
+Use the exact names as they appear in the query.
+
+3. **Describe all essential operations**
+Explicitly state, using exact column names:
+- Columns retrieved or modified
+- Join types, join conditions, and which objects participate
+- Filters and conditions (WHERE, boolean logic, comparisons)
+- Aggregations (SUM, COUNT, etc.)
+- Grouping and HAVING clauses
+- Sorting (ORDER BY)
+- Window functions
+- DISTINCT, TOP, LIMIT, OFFSET, pagination
+- Any $database_name-specific features used$specific_features
+
+4. **Maintain strict factual accuracy**
+- Do NOT infer business meaning unless directly implied.
+- Do NOT rename or paraphrase column names; repeat them exactly.
+
+5. **Use clear, structured natural language**
+- Provide a step-by-step explanation that makes every operation and purpose explicit.
+- The output must be complete enough that the query can be reconstructed.
+
+6. **⚠️ CRITICAL: Identify Performance Issues**
+Flag any of these CRITICAL performance problems found in the query:
+- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
+* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
+* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
+* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
+* CROSS APPLY/LATERAL with internal WHERE counts as filtered
+* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
+- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
+- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
+- **Implicit conversions**: Type mismatches in comparisons
+- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
+$analysis_requirements
+
+Explanation:
+""".strip()
+
 _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
 _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}

 db = SqliteDb(db_file=_db_path)

+
+async def explain_query_tool(
+    database_type: str,
+    sql: str,
+    provider: str | None = None,
+    model: str | None = None,
+    temperature: float | None = None,
+    max_tokens: int | None = None,
+    api_key: str | None = None,
+) -> dict[str, str]:
+    from sql_optimizer_team.tools.core_tools import explain_query_core
+
+    return await explain_query_core(
+        database_type=database_type,
+        sql=sql,
+        provider=provider,
+        model=model,
+        temperature=temperature,
+        max_tokens=max_tokens,
+        api_key=api_key,
+    )
+
 sql_analyst_agent = Agent(
     name="SQL Analyst",
     role=(
@@ -20,7 +103,7 @@ sql_analyst_agent = Agent(
         "A saída deve seguir exatamente a prompt original (SQL → natural) do projeto oracle-sql-query-optimizer."
     ),
     model=base_model,
-    tools=[explain_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
+    tools=[explain_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
     markdown=True,
     add_history_to_context=True,
     db=db,
@@ -32,67 +115,7 @@ sql_analyst_agent = Agent(
         "- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
         "- Preferência: use explain_query_core(database_type, sql) para gerar a explicação via core de negócio.",
         "- Use a template oficial abaixo para estruturar a explicação (SQL → natural).",
-        """
+        SQL_TO_NATURAL_PROMPT,
-You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
-
-$database_name SQL Query:
-```sql
-$query
-```
-
-Your explanation must follow these requirements:
-
-1. **Describe the overall purpose**
-- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
-
-2. **List ALL involved database objects**
-Explicitly list every:
-- Table
-- View
-- CTE (Common Table Expression)
-- Subquery or derived table
-- Function
-- Stored procedure, if referenced
-- Temporary table
-- Schema-qualified object
-Use the exact names as they appear in the query.
-
-3. **Describe all essential operations**
-Explicitly state, using exact column names:
-- Columns retrieved or modified
-- Join types, join conditions, and which objects participate
-- Filters and conditions (WHERE, boolean logic, comparisons)
-- Aggregations (SUM, COUNT, etc.)
-- Grouping and HAVING clauses
-- Sorting (ORDER BY)
-- Window functions
-- DISTINCT, TOP, LIMIT, OFFSET, pagination
-- Any $database_name-specific features used$specific_features
-
-4. **Maintain strict factual accuracy**
-- Do NOT infer business meaning unless directly implied.
-- Do NOT rename or paraphrase column names; repeat them exactly.
-
-5. **Use clear, structured natural language**
-- Provide a step-by-step explanation that makes every operation and purpose explicit.
-- The output must be complete enough that the query can be reconstructed.
-
-6. **⚠️ CRITICAL: Identify Performance Issues**
-Flag any of these CRITICAL performance problems found in the query:
-- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
-* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
-* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
-* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
-* CROSS APPLY/LATERAL with internal WHERE counts as filtered
-* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
-- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
-- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
-- **Implicit conversions**: Type mismatches in comparisons
-- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
-$analysis_requirements
-
-Explanation:
-""".strip(),
         "- Entregue apenas a explicação natural estruturada conforme a prompt; não reescreva a SQL.",
         "- Identifique problemas críticos de performance conforme a prompt.",
     ],
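The prompt above asks the model to flag every spelling of the NOLOCK hint. A deterministic pre-check could complement that instruction, as sketched here. `uses_nolock` is a hypothetical helper and not part of the commit; it is deliberately naive and will also match the word inside SQL comments or string literals.

```python
import re

# NOLOCK as a whole word, in any letter case; covers WITH (NOLOCK),
# WITH(NOLOCK), (nolock), and a bare NOLOCK hint.
_NOLOCK_RE = re.compile(r"\bNOLOCK\b", re.IGNORECASE)


def uses_nolock(sql: str) -> bool:
    """Return True if the SQL text contains a NOLOCK hint (naive text match)."""
    return _NOLOCK_RE.search(sql) is not None
```

Because `\b` requires a word boundary, identifiers such as `nolock_flag` do not trigger a match.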
@@ -1,18 +1,104 @@
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from sql_optimizer_team.tools.engine.model_selector import get_model
-from sql_optimizer_team.tools.core_tools import optimize_query_core
 from sql_optimizer_team.tools.prompt_tools import supported_databases
 from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
 import os

 base_model = get_model()

+NATURAL_TO_SQL_PROMPT = """
+You are an expert $database_name SQL developer and query performance specialist.
+Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
+
+Description:
+$explanation
+
+⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
+
+1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
+- Every CASE WHEN statement must have IDENTICAL conditions and results
+- Every calculated column must use IDENTICAL formulas
+- Every subquery must query the SAME tables with SAME filters
+- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
+- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
+
+2. **PRESERVE ALL TABLES AND COLUMNS**
+- Include EVERY table mentioned in the description
+- Include EVERY column mentioned in the description
+- Use EXACT column names as described (no renaming)
+- Use EXACT table aliases as described
+
+3. **Translate the full described logic into SQL**
+- Implement all actions, operations, filters, joins, and conditions exactly as stated.
+- Use every object and column referenced in the description, using their exact names.
+- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
+
+4. **Write optimized SQL while preserving semantics**
+- Apply $database_name best practices for performance.
+- Use indexing-aware filtering, efficient join strategies, and clear expressions.
+- Implement aggregations, groupings, window functions, or pagination when described.
+- Prefer performant constructs commonly recommended for $database_name workloads.
+- OPTIMIZATION means structure/hints/indexes - NOT changing logic
+
+5. **Use $database_name-specific syntax and features**
+- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
+- Incorporate $specific_requirements if provided.
+
+6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
+- The SQL must reflect PRECISELY the behavior described
+- Do NOT add logic not explicitly stated
+- Do NOT omit any step described
+- Do NOT infer or assume details beyond what is explicitly stated
+- Do NOT "simplify" complex CASE statements
+- Do NOT merge or combine separate calculated columns
+
+7. **Self-Verification Checklist** (perform before outputting):
+- [ ] All tables from description are present in query
+- [ ] All columns from description are present in SELECT
+- [ ] All CASE conditions match description exactly
+- [ ] All subquery filters match description exactly
+- [ ] All JOIN conditions match description exactly
+- [ ] No business logic was simplified or changed
+
+8. **Output format**
+- Provide ONLY the final, optimized SQL query.
+- Do NOT include explanations, comments, or extra text.
+
+Optimized SQL Query:
+""".strip()
+
 _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
 _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}

 db = SqliteDb(db_file=_db_path)

+
+async def optimize_query_tool(
+    database_type: str,
+    sql: str,
+    provider: str | None = None,
+    model: str | None = None,
+    temperature: float | None = None,
+    max_tokens: int | None = None,
+    api_key: str | None = None,
+    output_dir: str | None = None,
+    no_review: bool = False,
+) -> dict[str, str | dict[str, str]]:
+    from sql_optimizer_team.tools.core_tools import optimize_query_core
+
+    return await optimize_query_core(
+        database_type=database_type,
+        sql=sql,
+        provider=provider,
+        model=model,
+        temperature=temperature,
+        max_tokens=max_tokens,
+        api_key=api_key,
+        output_dir=output_dir,
+        no_review=no_review,
+    )
+
 sql_optimizer_agent = Agent(
     name="SQL Optimizer",
     role=(
@@ -20,7 +106,7 @@ sql_optimizer_agent = Agent(
         "mantendo 100% da lógica e entregando apenas a SQL otimizada."
     ),
     model=base_model,
-    tools=[optimize_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
+    tools=[optimize_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
     markdown=True,
     add_history_to_context=True,
     db=db,
@@ -31,66 +117,7 @@ sql_optimizer_agent = Agent(
         "- Exija banco alvo e SQL antes de otimizar.",
         "- Use optimize_query_core(database_type, sql) para executar o core de negócio.",
         "- Use a template oficial abaixo para reescrever (natural → SQL) mantendo 100% da lógica.",
-        """
+        NATURAL_TO_SQL_PROMPT,
-You are an expert $database_name SQL developer and query performance specialist.
-Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
-
-Description:
-$explanation
-
-⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
-
-1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
-- Every CASE WHEN statement must have IDENTICAL conditions and results
-- Every calculated column must use IDENTICAL formulas
-- Every subquery must query the SAME tables with SAME filters
-- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
-- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
-
-2. **PRESERVE ALL TABLES AND COLUMNS**
-- Include EVERY table mentioned in the description
-- Include EVERY column mentioned in the description
-- Use EXACT column names as described (no renaming)
-- Use EXACT table aliases as described
-
-3. **Translate the full described logic into SQL**
-- Implement all actions, operations, filters, joins, and conditions exactly as stated.
-- Use every object and column referenced in the description, using their exact names.
-- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
-
-4. **Write optimized SQL while preserving semantics**
-- Apply $database_name best practices for performance.
-- Use indexing-aware filtering, efficient join strategies, and clear expressions.
-- Implement aggregations, groupings, window functions, or pagination when described.
-- Prefer performant constructs commonly recommended for $database_name workloads.
-- OPTIMIZATION means structure/hints/indexes - NOT changing logic
-
-5. **Use $database_name-specific syntax and features**
-- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
-- Incorporate $specific_requirements if provided.
-
-6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
-- The SQL must reflect PRECISELY the behavior described
-- Do NOT add logic not explicitly stated
-- Do NOT omit any step described
-- Do NOT infer or assume details beyond what is explicitly stated
-- Do NOT "simplify" complex CASE statements
-- Do NOT merge or combine separate calculated columns
-
-7. **Self-Verification Checklist** (perform before outputting):
-- [ ] All tables from description are present in query
-- [ ] All columns from description are present in SELECT
-- [ ] All CASE conditions match description exactly
-- [ ] All subquery filters match description exactly
-- [ ] All JOIN conditions match description exactly
-- [ ] No business logic was simplified or changed
-
-8. **Output format**
-- Provide ONLY the final, optimized SQL query.
-- Do NOT include explanations, comments, or extra text.
-
-Optimized SQL Query:
-""".strip(),
         "- Extraia e devolva SOMENTE optimized_query (sem explicações, sem markdown).",
         "- Preserve 100% da lógica, colunas, aliases, filtros, joins e subqueries.",
     ],
5
src/sql_optimizer_team/knowledge/__init__.py
Normal file
@@ -0,0 +1,5 @@
+"""Internal knowledge base helpers."""
+
+from sql_optimizer_team.knowledge.internal_kb import build_internal_knowledge, attach_internal_knowledge
+
+__all__ = ["build_internal_knowledge", "attach_internal_knowledge"]
100
src/sql_optimizer_team/knowledge/internal_kb.py
Normal file
@@ -0,0 +1,100 @@
+"""Internal KB (RAG) setup for the SQL optimizer team."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from pathlib import Path
+import os
+
+from agno.db.sqlite import SqliteDb
+from agno.knowledge.knowledge import Knowledge
+from agno.knowledge.embedder.sentence_transformer import SentenceTransformerEmbedder
+from agno.vectordb.chroma import ChromaDb
+
+from sql_optimizer_team.tools.engine.config.logger import get_logger
+
+logger = get_logger(__name__)
+
+
+@dataclass(frozen=True)
+class InternalKBConfig:
+    kb_path: Path
+    chroma_path: Path
+    embedder_id: str
+    contents_db_file: Path
+    block_external: bool
+
+
+def _load_config() -> InternalKBConfig:
+    kb_path = Path(os.getenv("SQL_OPT_KB_PATH", "kb")).resolve()
+    chroma_path = Path(os.getenv("SQL_OPT_KB_CHROMA_PATH", "tmp/kb_chroma")).resolve()
+    embedder_id = os.getenv(
+        "SQL_OPT_KB_EMBEDDER_ID",
+        "sentence-transformers/all-MiniLM-L6-v2",
+    ).strip()
+    contents_db_file = Path(os.getenv("SQL_OPT_KB_DB_FILE", "tmp/sql_optimizer_kb.db")).resolve()
+    block_external = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
+    return InternalKBConfig(
+        kb_path=kb_path,
+        chroma_path=chroma_path,
+        embedder_id=embedder_id,
+        contents_db_file=contents_db_file,
+        block_external=block_external,
+    )
+
+
+def build_internal_knowledge() -> Knowledge:
+    config = _load_config()
+
+    if config.block_external:
+        logger.info("External tools blocked for KB", kb_path=str(config.kb_path))
+
+    embedder = SentenceTransformerEmbedder(id=config.embedder_id)
+    vector_db = ChromaDb(
+        name="sql-optimizer-kb",
+        path=str(config.chroma_path),
+        persistent_client=True,
+        embedder=embedder,
+    )
+    contents_db = SqliteDb(db_file=str(config.contents_db_file))
+
+    knowledge = Knowledge(
+        name="internal-sql-kb",
+        description="Base de conhecimento interna para otimização de SQL",
+        vector_db=vector_db,
+        contents_db=contents_db,
+        max_results=6,
+    )
+
+    if not config.kb_path.exists():
+        logger.warning("KB path not found; skipping ingest", kb_path=str(config.kb_path))
+        return knowledge
+
+    if config.block_external and not config.kb_path.is_dir():
+        logger.warning("KB path is not a directory; skipping ingest", kb_path=str(config.kb_path))
+        return knowledge
+
+    try:
+        knowledge.insert(
+            path=str(config.kb_path),
+            include=["**/*.md", "**/*.txt", "**/*.sql", "**/*.pdf"],
+            exclude=["**/.git/**", "**/.venv/**", "**/__pycache__/**"],
+            upsert=True,
+            skip_if_exists=True,
+        )
+        logger.info("KB ingest complete", kb_path=str(config.kb_path))
+    except Exception as exc:
+        logger.error("KB ingest failed", error=str(exc))
+
+    return knowledge
+
+
+def attach_internal_knowledge(knowledge: Knowledge, *agents: object) -> None:
+    for agent in agents:
+        try:
+            setattr(agent, "knowledge", knowledge)
+            setattr(agent, "add_knowledge_to_context", True)
+            setattr(agent, "search_knowledge", True)
+            setattr(agent, "update_knowledge", False)
+        except Exception as exc:
+            logger.warning("Failed to attach knowledge", agent=str(agent), error=str(exc))
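The boolean flag `SQL_OPT_BLOCK_EXTERNAL_TOOLS` in `_load_config` is parsed with the `.strip().lower() in {"1", "true", "yes", "on"}` idiom, which recurs elsewhere in this commit. A minimal sketch of how that parsing behaves, pulled into a helper for illustration. The name `env_flag` and the variable `DEMO_FLAG` are hypothetical and not part of the commit.

```python
import os

# Same truthy set the commit uses inline.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: str = "false") -> bool:
    """Hypothetical helper: read an environment variable as a boolean flag."""
    return os.getenv(name, default).strip().lower() in _TRUTHY


# Whitespace and case are ignored.
os.environ["DEMO_FLAG"] = " Yes "
print(env_flag("DEMO_FLAG"))

# Anything outside the truthy set counts as False.
os.environ["DEMO_FLAG"] = "0"
print(env_flag("DEMO_FLAG"))

# An empty value is NOT the same as unset: the default is ignored.
os.environ["DEMO_FLAG"] = ""
print(env_flag("DEMO_FLAG", default="true"))

# Only a truly unset variable falls back to the default.
del os.environ["DEMO_FLAG"]
print(env_flag("DEMO_FLAG", default="true"))
```

One consequence worth noting: with a default of `"true"`, setting the variable to an empty string in `.env` turns the flag off, because `os.getenv` only applies the default when the variable is absent.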
@@ -2,13 +2,12 @@ from agno.team.team import Team
 from agno.os.app import AgentOS
 from agno.db.sqlite import SqliteDb
 from dotenv import load_dotenv
+from sql_optimizer_team.knowledge import build_internal_knowledge, attach_internal_knowledge
 from sql_optimizer_team.tools.engine.model_selector import get_model
-from sql_optimizer_team.agents import (
-    sql_analyst_agent,
-    sql_optimizer_agent,
-    sql_quality_agent,
-    conservative_analysis_agent,
-)
+from sql_optimizer_team.agents.sql_analyst_agent import sql_analyst_agent
+from sql_optimizer_team.agents.sql_optimizer_agent import sql_optimizer_agent
+from sql_optimizer_team.agents.sql_quality_agent import sql_quality_agent
+from sql_optimizer_team.agents.conservative_analysis_agent import conservative_analysis_agent
 import os

 load_dotenv()
@@ -20,6 +19,39 @@ _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {

 db = SqliteDb(db_file=_db_path)

+_kb = build_internal_knowledge()
+attach_internal_knowledge(
+    _kb,
+    sql_analyst_agent,
+    sql_optimizer_agent,
+    sql_quality_agent,
+    conservative_analysis_agent,
+)
+
+_block_external_tools = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
+if _block_external_tools:
+    _allowed_tool_names = {
+        "explain_query_tool",
+        "optimize_query_tool",
+        "load_sql_from_file",
+        "ensure_non_empty",
+        "supported_databases",
+        "diff_sql",
+    }
+
+    def _filter_tools(agent) -> None:
+        if not getattr(agent, "tools", None):
+            return
+        filtered = []
+        for tool in agent.tools:
+            name = getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)
+            if name in _allowed_tool_names:
+                filtered.append(tool)
+        agent.tools = filtered
+
+    for _agent in [sql_analyst_agent, sql_optimizer_agent, sql_quality_agent, conservative_analysis_agent]:
+        _filter_tools(_agent)
+
 sql_optimizer_team = Team(
     name="SQL Optimization Team",
     model=base_model,
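The tool filter in the hunk above resolves each tool's name through `__name__`, then `name`, then `str(tool)`, and keeps only names found in an allowlist. A small self-contained sketch of that lookup order, using stand-in objects rather than real Agno agents. `ALLOWED`, `tool_name`, `filter_tools`, and `web_search` are illustrative names, not taken from the commit.

```python
from types import SimpleNamespace

# Illustrative allowlist; the commit's real set has six entries.
ALLOWED = {"explain_query_tool", "diff_sql"}


def tool_name(tool: object) -> str:
    """Resolve a name in the same order as the filter in the diff."""
    return getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)


def filter_tools(agent: object, allowed: set) -> None:
    """Keep only allowlisted tools; leave agents without tools untouched."""
    tools = getattr(agent, "tools", None)
    if not tools:
        return
    agent.tools = [t for t in tools if tool_name(t) in allowed]


def explain_query_tool(sql: str) -> str:
    # Plain function: matched through __name__.
    return f"EXPLAIN {sql}"


def web_search(query: str) -> str:
    # Not in the allowlist, so it is dropped.
    return f"searching for {query}"


# Object without __name__ that exposes .name instead.
wrapped = SimpleNamespace(name="diff_sql")

agent = SimpleNamespace(tools=[explain_query_tool, web_search, wrapped])
filter_tools(agent, ALLOWED)
print([tool_name(t) for t in agent.tools])  # ['explain_query_tool', 'diff_sql']
```

The filter fails closed: a tool that exposes neither attribute falls back to `str(tool)`, which is unlikely to match any allowlist entry, so it is removed.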
@@ -41,6 +41,7 @@ class AgnoLLMTool(BaseLLMTool):
             result_text = self._extract_text(response)
             validated = self._validate_response(result_text)
             self._log_response(validated)
+            self._log_usage_from_response(response, prompt, validated)
             return validated
         except Exception as e:
             self._log_error(e)
@@ -5,6 +5,8 @@ This module provides a base class with common functionality for all LLM tools.

 from abc import ABC
 from typing import Any
+import math
+import os

 from sql_optimizer_team.tools.engine.tools_api.llm_tool import LLMTool
 from sql_optimizer_team.tools.engine.types.tool_exceptions import LLMProviderError
@@ -91,6 +93,49 @@ class BaseLLMTool(LLMTool, ABC):
             **kwargs,
         )

+    def _estimate_tokens(self, text: str) -> int:
+        """Best-effort token estimate when provider usage is unavailable."""
+        if not text:
+            return 0
+        return max(1, math.ceil(len(text) / 4))
+
+    def _log_usage_from_response(self, response_obj: Any, prompt: str, response_text: str) -> None:
+        """Log token usage and cost if enabled.
+
+        Reads usage from ModelResponse when available, otherwise uses a rough estimate.
+        Cost is computed using env vars LLM_COST_INPUT_PER_1K and LLM_COST_OUTPUT_PER_1K.
+        """
+        enabled = os.getenv("LLM_LOG_USAGE", "true").strip().lower() in {"1", "true", "yes", "on"}
+        if not enabled:
+            return
+
+        input_tokens = getattr(response_obj, "input_tokens", None)
+        output_tokens = getattr(response_obj, "output_tokens", None)
+        total_tokens = getattr(response_obj, "total_tokens", None)
+
+        if input_tokens is None:
+            input_tokens = self._estimate_tokens(prompt)
+        if output_tokens is None:
+            output_tokens = self._estimate_tokens(response_text)
+        if total_tokens is None and input_tokens is not None and output_tokens is not None:
+            total_tokens = input_tokens + output_tokens
+
+        cost_in = float(os.getenv("LLM_COST_INPUT_PER_1K", "0") or 0)
+        cost_out = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0") or 0)
+        cost_usd = None
+        if input_tokens is not None or output_tokens is not None:
+            cost_usd = (input_tokens or 0) * cost_in / 1000 + (output_tokens or 0) * cost_out / 1000
+
+        logger.info(
+            "LLM usage",
+            provider=self.provider_name,
+            model=self._model_name,
+            input_tokens=input_tokens,
+            output_tokens=output_tokens,
+            total_tokens=total_tokens,
+            cost_usd=cost_usd,
+        )
+
     def _log_error(self, error: Exception, **kwargs: Any) -> None:
         """Log LLM error.

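To make the arithmetic in `_estimate_tokens` and the cost formula concrete, here is a worked sketch using the same expressions as standalone functions. The per-1K prices (0.5 and 1.5) are made-up illustration values, not real provider prices, and the function names are hypothetical.

```python
import math


def estimate_tokens(text: str) -> int:
    """Same rule as the diff: roughly 4 characters per token, at least 1."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / 4))


def cost_usd(input_tokens: int, output_tokens: int, in_per_1k: float, out_per_1k: float) -> float:
    """Same formula as the diff: tokens * price-per-1K / 1000, summed."""
    return input_tokens * in_per_1k / 1000 + output_tokens * out_per_1k / 1000


prompt = "SELECT * FROM t"   # 15 characters -> ceil(15 / 4) = 4 tokens
answer = "x" * 400           # 400 characters -> 100 tokens

tokens_in = estimate_tokens(prompt)
tokens_out = estimate_tokens(answer)

# Hypothetical prices: 0.5 USD per 1K input tokens, 1.5 USD per 1K output tokens.
total = cost_usd(tokens_in, tokens_out, in_per_1k=0.5, out_per_1k=1.5)
print(tokens_in, tokens_out, round(total, 6))
```

The 4-characters-per-token rule is only a fallback heuristic. When the provider reports `input_tokens` and `output_tokens` on the response object, the logged figures come from there instead.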
@@ -7,185 +7,17 @@ reducing code duplication and ensuring consistency.
 from abc import ABC, abstractmethod

 from string import Template
+import importlib

 from sql_optimizer_team.tools.engine.tools_api.prompt_tool import PromptGeneratorTool


-SQL_TO_NATURAL_TEMPLATE = Template("""
-You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
-
-$database_name SQL Query:
-```sql
-$query
-```
-
-Your explanation must follow these requirements:
-
-1. **Describe the overall purpose**
-- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
-
-2. **List ALL involved database objects**
-Explicitly list every:
-- Table
-- View
-- CTE (Common Table Expression)
-- Subquery or derived table
-- Function
-- Stored procedure, if referenced
-- Temporary table
-- Schema-qualified object
-Use the exact names as they appear in the query.
-
-3. **Describe all essential operations**
-Explicitly state, using exact column names:
-- Columns retrieved or modified
-- Join types, join conditions, and which objects participate
-- Filters and conditions (WHERE, boolean logic, comparisons)
-- Aggregations (SUM, COUNT, etc.)
-- Grouping and HAVING clauses
-- Sorting (ORDER BY)
-- Window functions
-- DISTINCT, TOP, LIMIT, OFFSET, pagination
-- Any $database_name-specific features used$specific_features
-
-4. **Maintain strict factual accuracy**
-- Do NOT infer business meaning unless directly implied.
-- Do NOT rename or paraphrase column names; repeat them exactly.
-
-5. **Use clear, structured natural language**
-- Provide a step-by-step explanation that makes every operation and purpose explicit.
-- The output must be complete enough that the query can be reconstructed.
-
-6. **⚠️ CRITICAL: Identify Performance Issues**
-Flag any of these CRITICAL performance problems found in the query:
-- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
-* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
-* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
-* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
-* CROSS APPLY/LATERAL with internal WHERE counts as filtered
-* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
-- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
-- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
-- **Implicit conversions**: Type mismatches in comparisons
-- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
-$analysis_requirements
-
-Explanation:
-""")
-
-NATURAL_TO_SQL_TEMPLATE = Template("""
-You are an expert $database_name SQL developer and query performance specialist.
-Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
-
-Description:
-$explanation
-
-⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
-
-1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
-- Every CASE WHEN statement must have IDENTICAL conditions and results
-- Every calculated column must use IDENTICAL formulas
-- Every subquery must query the SAME tables with SAME filters
-- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
-- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
-
-2. **PRESERVE ALL TABLES AND COLUMNS**
-- Include EVERY table mentioned in the description
-- Include EVERY column mentioned in the description
-- Use EXACT column names as described (no renaming)
-- Use EXACT table aliases as described
-
-3. **Translate the full described logic into SQL**
-- Implement all actions, operations, filters, joins, and conditions exactly as stated.
-- Use every object and column referenced in the description, using their exact names.
-- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
-
-4. **Write optimized SQL while preserving semantics**
-- Apply $database_name best practices for performance.
-- Use indexing-aware filtering, efficient join strategies, and clear expressions.
-- Implement aggregations, groupings, window functions, or pagination when described.
-- Prefer performant constructs commonly recommended for $database_name workloads.
-- OPTIMIZATION means structure/hints/indexes - NOT changing logic
-
-5. **Use $database_name-specific syntax and features**
-- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
-- Incorporate $specific_requirements if provided.
-
-6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
-- The SQL must reflect PRECISELY the behavior described
-- Do NOT add logic not explicitly stated
-- Do NOT omit any step described
-- Do NOT infer or assume details beyond what is explicitly stated
-- Do NOT "simplify" complex CASE statements
-- Do NOT merge or combine separate calculated columns
-
-7. **Self-Verification Checklist** (perform before outputting):
-- [ ] All tables from description are present in query
-- [ ] All columns from description are present in SELECT
-- [ ] All CASE conditions match description exactly
-- [ ] All subquery filters match description exactly
-- [ ] All JOIN conditions match description exactly
-- [ ] No business logic was simplified or changed
-
-8. **Output format**
-- Provide ONLY the final, optimized SQL query.
-- Do NOT include explanations, comments, or extra text.
-
-Optimized SQL Query:
-""")
-
-CONSERVATIVE_ANALYSIS_TEMPLATE = Template("""
-You are an expert $database_name database analyst and performance specialist.
-
-Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
-
-⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
-
-$database_name SQL Query:
-```sql
-$query
-```
-
-Query Complexity Information:
-- Columns: $column_count
-- Tables: $table_count
-- Subqueries: $subquery_count
-- CASE statements: $case_count
-- JOINs: $join_count
-- Complexity Level: $complexity_level
-
-Provide your analysis in the following structured format:
-
-## PERFORMANCE ISSUES
-List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
-- [SEVERITY] Issue description
-- [SEVERITY] Issue description
-
-## SUGGESTED INDEXES
-List indexes that could improve this query:
-- CREATE INDEX idx_name ON table(columns) -- Reason
-
-## OPTIMIZATION SUGGESTIONS
-List specific suggestions WITHOUT rewriting the query:
-- Suggestion 1: Description of what could be improved and why
-- Suggestion 2: Description of what could be improved and why
-
-## RISK ASSESSMENT
-- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
-- Missing WHERE clause: [Yes/No] - If yes, explain the impact
-- Implicit conversions: [Yes/No] - If yes, list them
-
-## SUMMARY
-Brief summary of the most important findings and priority order for addressing them.
-
-Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
-""")
-
-
 def _render_sql_to_natural(
     database_name: str, query: str, specific_features: str = "", analysis_requirements: str = ""
 ) -> str:
-    return SQL_TO_NATURAL_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.sql_analyst_agent")
+    template_text = getattr(module, "SQL_TO_NATURAL_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         query=query,
         specific_features=f"\n{specific_features}" if specific_features else "",
@@ -196,7 +28,9 @@ def _render_sql_to_natural(
 def _render_natural_to_sql(
     database_name: str, explanation: str, specific_requirements: str
 ) -> str:
-    return NATURAL_TO_SQL_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.sql_optimizer_agent")
+    template_text = getattr(module, "NATURAL_TO_SQL_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         explanation=explanation,
         specific_requirements="\n".join(
@@ -215,7 +49,9 @@ def _render_conservative_analysis(
     join_count: int = 0,
     complexity_level: str = "unknown",
 ) -> str:
-    return CONSERVATIVE_ANALYSIS_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.conservative_analysis_agent")
+    template_text = getattr(module, "CONSERVATIVE_ANALYSIS_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         query=query,
         column_count=column_count,
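The render helpers above now look up the prompt text on an agent module at call time and feed it to `string.Template`. A minimal sketch of that pattern, using a throwaway in-memory module in place of the real agent modules. `fake_agent_module`, `DEMO_PROMPT`, and `render` are hypothetical names introduced only for this illustration.

```python
import importlib
import sys
import types
from string import Template

# Hypothetical stand-in for an agent module that owns its prompt constant.
fake_agent = types.ModuleType("fake_agent_module")
fake_agent.DEMO_PROMPT = "You are an expert $database_name analyst.\nQuery:\n$query"
sys.modules["fake_agent_module"] = fake_agent


def render(module_name: str, attr: str, **values: str) -> str:
    """Import the module lazily, read the prompt constant, substitute placeholders."""
    module = importlib.import_module(module_name)
    return Template(getattr(module, attr)).substitute(**values)


text = render("fake_agent_module", "DEMO_PROMPT", database_name="PostgreSQL", query="SELECT 1")
print(text)

# substitute() is strict: a missing placeholder raises KeyError instead of
# leaving "$query" in the rendered prompt.
try:
    render("fake_agent_module", "DEMO_PROMPT", database_name="PostgreSQL")
except KeyError as exc:
    print("missing placeholder:", exc)
```

Importing inside the function rather than at module top is commonly done to avoid circular imports; whether that is the motivation here is an assumption, since the commit does not say. Note also that any literal `$` in a prompt must be written as `$$` for `string.Template`.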