feat: Enhance SQL optimization tools with internal knowledge base and observability features
- Updated README.md to include new setup instructions for RAG and observability.
- Added internal knowledge base (KB) setup for SQL optimization team, supporting various document types.
- Implemented token usage logging in LLM tools to track costs and usage.
- Refactored SQL analysis and optimization prompts for clarity and consistency.
- Introduced filtering of external tools based on environment configuration.
- Enhanced conservative analysis agent with structured prompt for performance suggestions.
- Updated requirements.txt to include new dependencies for RAG functionality.
- Added internal KB helpers for building and attaching knowledge to agents.
parent c6dd91810b
commit 80d1f9d26a
14 changed files with 502 additions and 355 deletions
38
README.md
|
|
@ -25,15 +25,31 @@ src/
|
|||
|
||||
1) Create the environment and install dependencies:
   - `pip install -r requirements.txt`
2) Configure environment variables (example in `sample.env`).
2) Configure environment variables (example in `sample.env` or `.env`).
3) Run the server:
   - `PYTHONPATH=src python -m main`
   - `./scripts/start.sh`

Access:

- `http://localhost:8204/docs` (Swagger UI)
- `http://localhost:8204` (basic API information)
|
||||
|
||||
## Local UI (Agent UI)

Use **Agent UI** (agno-agi/agent-ui) as the local front end:

1) Install with the official script:

- `npx create-agent-ui@latest`

2) Start the UI:

- `pnpm dev`

3) Open `http://localhost:3000` and set the endpoint to `http://localhost:8204`.

Optional: if AgentOS uses authentication, configure `OS_SECURITY_KEY` as described in the Agent UI README.

## Team flow

1) **Manager** receives the request and validates the context (database + SQL).
|
||||
|
|
@ -43,7 +59,23 @@ Acesse:
|
|||
5) **Conservative Analyst** (if requested) produces an analysis without rewriting the query.
6) **Manager** consolidates the results and delivers them.
|
||||
|
||||
## RAG (internal KB)

- Place documents in `kb/` (md/txt/sql/pdf).
- The local RAG uses Chroma + SentenceTransformers.
- Main variables:
  - `SQL_OPT_KB_PATH`, `SQL_OPT_KB_CHROMA_PATH`, `SQL_OPT_KB_DB_FILE`
  - `SQL_OPT_KB_EMBEDDER_ID`
- `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true` blocks external tools.
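
The blocking of external tools can be pictured as an allowlist applied to each agent's tool list. The sketch below is a minimal, self-contained illustration of that idea and not the project's actual module: the allowlist contents, `web_search`, and the stub functions are hypothetical names chosen for the example.

```python
from types import SimpleNamespace

# Hypothetical allowlist; the real set of names is defined by the project.
ALLOWED_TOOL_NAMES = {"load_sql_from_file", "ensure_non_empty"}


def tool_name(tool: object) -> str:
    """Resolve a name from a function (__name__), a tool object (name), or str()."""
    return getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)


def filter_tools(tools: list, allowed: set) -> list:
    """Keep only the tools whose resolved name is in the allowlist."""
    return [tool for tool in tools if tool_name(tool) in allowed]


def load_sql_from_file(path: str) -> str:  # stub standing in for an internal tool
    ...


def web_search(query: str) -> str:  # stub standing in for an external tool
    ...


tools = [load_sql_from_file, web_search, SimpleNamespace(name="ensure_non_empty")]
kept = filter_tools(tools, ALLOWED_TOOL_NAMES)
print([tool_name(tool) for tool in kept])
```

Matching by name keeps the check independent of how a tool is wrapped, at the cost of relying on every tool exposing a stable name.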
|
||||
|
||||
## Token/cost observability

- Enable with `LLM_LOG_USAGE=true`.
- Set prices (USD per 1K tokens) with:
  - `LLM_COST_INPUT_PER_1K`
  - `LLM_COST_OUTPUT_PER_1K`
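
As a rough illustration of how the two per-1K prices translate into a request cost, the sketch below reads the variables named above and applies them to token counts. The function name `estimate_cost_usd` and the prices are assumptions made for the example; the commit's actual logging code is not shown in this diff.

```python
import os


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from per-1K-token prices in the environment."""
    input_price = float(os.getenv("LLM_COST_INPUT_PER_1K", "0"))
    output_price = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0"))
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price


# Hypothetical prices, set here only so the example is reproducible.
os.environ["LLM_COST_INPUT_PER_1K"] = "0.5"
os.environ["LLM_COST_OUTPUT_PER_1K"] = "1.5"

print(estimate_cost_usd(2000, 1000))  # 2 * 0.5 + 1 * 1.5 = 2.5
```

With both prices left at `0`, as in `sample.env`, the estimate is always zero, so token counts are still logged but no cost is attributed.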
|
||||
|
||||
## Notes

- Use the model configured in environment variables (e.g., OpenAI, Gemini, Groq).
- Use the provider configured in `.env` (e.g., local Ollama, OpenAI, Gemini, Groq).
- The team is collaborative and keeps its history in SQLite (configurable via env).
|
||||
|
|
|
|||
|
|
@ -124,6 +124,11 @@ Recomendação de ferramentas de mercado:
|
|||
- **Langfuse** or **Phoenix** for tracing prompts, costs, and latency.
- **Grafana/Prometheus** for executive dashboards.

Status in the POC:

- **Token/cost logging** is already implemented via `LLM_LOG_USAGE` and per-1K-token prices.
- Persistent metrics and dashboards (Grafana/Prometheus) remain future work.

Minimum metrics:

- Tokens per request and per area.
|
||||
|
|
@ -150,6 +155,11 @@ Métricas mínimas:
|
|||
- Continuous curation with feedback from the teams to improve relevance.
- **Higher precision**: answers consistent with internal policies and technical standards.

Status in the POC:

- **Local RAG** with an internal base in `kb/` using Chroma + SentenceTransformers.
- **External tools blocked** by default via `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true`.

## 10) Final stack (100% Agno)

- **Agno** as the single framework for orchestration, memory, and tools.
|
||||
|
|
|
|||
18
kb/README.md
Normal file
|
|
@ -0,0 +1,18 @@
|
|||
# Internal Knowledge Base (KB)

Place internal documents here that should be used by the RAG.

Supported (by default):

- Markdown (.md)
- Text (.txt)
- SQL (.sql)
- PDF (.pdf)

Configuration via environment:

- SQL_OPT_KB_PATH (default: kb)
- SQL_OPT_KB_CHROMA_PATH (default: tmp/kb_chroma)
- SQL_OPT_KB_EMBEDDER_ID (default: sentence-transformers/all-MiniLM-L6-v2)
- SQL_OPT_KB_DB_FILE (default: tmp/sql_optimizer_kb.db)
- SQL_OPT_BLOCK_EXTERNAL_TOOLS (default: true)
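
To preview which files under the KB directory would match the supported types, a small standard-library sketch can help. It assumes matching is done purely by file suffix and is case-sensitive; `list_kb_files` is a hypothetical helper, not part of the project, and the demo uses a temporary directory in place of `kb/`.

```python
import tempfile
from pathlib import Path

SUPPORTED_SUFFIXES = {".md", ".txt", ".sql", ".pdf"}


def list_kb_files(kb_path: Path) -> list:
    """Return supported documents under kb_path, or an empty list if it is not a directory."""
    if not kb_path.is_dir():
        return []
    return sorted(
        p for p in kb_path.rglob("*") if p.is_file() and p.suffix in SUPPORTED_SUFFIXES
    )


# Demo on a throwaway directory standing in for kb/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "guide.md").write_text("# Index guidelines")
    (root / "notes.docx").write_text("ignored: unsupported type")
    (root / "sub").mkdir()
    (root / "sub" / "query.sql").write_text("SELECT 1")
    found = [p.relative_to(root).as_posix() for p in list_kb_files(root)]

print(found)
```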
|
||||
|
|
@ -30,3 +30,7 @@ oracledb==3.4.1
|
|||
pymssql==2.3.11
|
||||
sqlparse==0.5.5
|
||||
sqlglot==28.6.0
|
||||
|
||||
# RAG (local KB)
|
||||
chromadb==0.6.3
|
||||
sentence-transformers==3.4.1
|
||||
|
|
|
|||
12
sample.env
|
|
@ -15,3 +15,15 @@
|
|||
# SQL Optimizer Team
|
||||
SQL_OPT_TEAM_DB_FILE=tmp/sql_optimizer_team.db
|
||||
SQL_OPT_TEAM_DEBUG_MODE=false
|
||||
|
||||
# Token/cost observability
|
||||
LLM_LOG_USAGE=true
|
||||
LLM_COST_INPUT_PER_1K=0
|
||||
LLM_COST_OUTPUT_PER_1K=0
|
||||
|
||||
# RAG / internal KB
|
||||
SQL_OPT_KB_PATH=kb
|
||||
SQL_OPT_KB_CHROMA_PATH=tmp/kb_chroma
|
||||
SQL_OPT_KB_DB_FILE=tmp/sql_optimizer_kb.db
|
||||
SQL_OPT_KB_EMBEDDER_ID=sentence-transformers/all-MiniLM-L6-v2
|
||||
SQL_OPT_BLOCK_EXTERNAL_TOOLS=true
|
||||
|
|
|
|||
|
|
@ -7,6 +7,53 @@ import os
|
|||
|
||||
base_model = get_model()
|
||||
|
||||
CONSERVATIVE_ANALYSIS_PROMPT = """
|
||||
You are an expert $database_name database analyst and performance specialist.
|
||||
|
||||
Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
|
||||
|
||||
⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
|
||||
|
||||
$database_name SQL Query:
|
||||
```sql
|
||||
$query
|
||||
```
|
||||
|
||||
Query Complexity Information:
|
||||
- Columns: $column_count
|
||||
- Tables: $table_count
|
||||
- Subqueries: $subquery_count
|
||||
- CASE statements: $case_count
|
||||
- JOINs: $join_count
|
||||
- Complexity Level: $complexity_level
|
||||
|
||||
Provide your analysis in the following structured format:
|
||||
|
||||
## PERFORMANCE ISSUES
|
||||
List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
|
||||
- [SEVERITY] Issue description
|
||||
- [SEVERITY] Issue description
|
||||
|
||||
## SUGGESTED INDEXES
|
||||
List indexes that could improve this query:
|
||||
- CREATE INDEX idx_name ON table(columns) -- Reason
|
||||
|
||||
## OPTIMIZATION SUGGESTIONS
|
||||
List specific suggestions WITHOUT rewriting the query:
|
||||
- Suggestion 1: Description of what could be improved and why
|
||||
- Suggestion 2: Description of what could be improved and why
|
||||
|
||||
## RISK ASSESSMENT
|
||||
- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
|
||||
- Missing WHERE clause: [Yes/No] - If yes, explain the impact
|
||||
- Implicit conversions: [Yes/No] - If yes, list them
|
||||
|
||||
## SUMMARY
|
||||
Brief summary of the most important findings and priority order for addressing them.
|
||||
|
||||
Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
|
||||
""".strip()
|
||||
|
||||
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
|
||||
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
|
||||
|
||||
|
|
@ -30,52 +77,7 @@ conservative_analysis_agent = Agent(
|
|||
"- Solicite banco e SQL se não estiverem presentes.",
|
||||
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
|
||||
"- Use a template oficial abaixo para a análise conservadora (sem reescrever a SQL).",
|
||||
"""
|
||||
You are an expert $database_name database analyst and performance specialist.
|
||||
|
||||
Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
|
||||
|
||||
⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
|
||||
|
||||
$database_name SQL Query:
|
||||
```sql
|
||||
$query
|
||||
```
|
||||
|
||||
Query Complexity Information:
|
||||
- Columns: $column_count
|
||||
- Tables: $table_count
|
||||
- Subqueries: $subquery_count
|
||||
- CASE statements: $case_count
|
||||
- JOINs: $join_count
|
||||
- Complexity Level: $complexity_level
|
||||
|
||||
Provide your analysis in the following structured format:
|
||||
|
||||
## PERFORMANCE ISSUES
|
||||
List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
|
||||
- [SEVERITY] Issue description
|
||||
- [SEVERITY] Issue description
|
||||
|
||||
## SUGGESTED INDEXES
|
||||
List indexes that could improve this query:
|
||||
- CREATE INDEX idx_name ON table(columns) -- Reason
|
||||
|
||||
## OPTIMIZATION SUGGESTIONS
|
||||
List specific suggestions WITHOUT rewriting the query:
|
||||
- Suggestion 1: Description of what could be improved and why
|
||||
- Suggestion 2: Description of what could be improved and why
|
||||
|
||||
## RISK ASSESSMENT
|
||||
- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
|
||||
- Missing WHERE clause: [Yes/No] - If yes, explain the impact
|
||||
- Implicit conversions: [Yes/No] - If yes, list them
|
||||
|
||||
## SUMMARY
|
||||
Brief summary of the most important findings and priority order for addressing them.
|
||||
|
||||
Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
|
||||
""".strip(),
|
||||
CONSERVATIVE_ANALYSIS_PROMPT,
|
||||
"- NÃO reescreva a SQL em hipótese alguma.",
|
||||
],
|
||||
)
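
The `$database_name`, `$query`, and `$..._count` placeholders in `CONSERVATIVE_ANALYSIS_PROMPT` follow the `$name` syntax of Python's `string.Template`. This diff does not show how the template is rendered, so the following is only a sketch, assuming `string.Template` is the mechanism and using a shortened prompt with illustrative values.

```python
from string import Template

# Shortened stand-in for the full prompt, keeping the same placeholder style.
prompt = Template(
    "You are an expert $database_name database analyst.\n"
    "$database_name SQL Query:\n"
    "$query\n"
    "- JOINs: $join_count\n"
    "- Complexity Level: $complexity_level"
)

# safe_substitute fills the known placeholders and leaves unknown ones untouched,
# whereas substitute would raise KeyError for the missing $complexity_level.
rendered = prompt.safe_substitute(
    database_name="PostgreSQL",
    query="SELECT id FROM orders",
    join_count=0,
)
print(rendered)
```

Whether to prefer `substitute` (fail fast on a missing value) or `safe_substitute` (tolerate gaps) depends on whether an incomplete prompt should ever reach the model.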
|
||||
|
|
|
|||
|
|
@ -1,18 +1,101 @@
|
|||
from agno.agent import Agent
|
||||
from agno.db.sqlite import SqliteDb
|
||||
from sql_optimizer_team.tools.engine.model_selector import get_model
|
||||
from sql_optimizer_team.tools.core_tools import explain_query_core
|
||||
from sql_optimizer_team.tools.prompt_tools import supported_databases
|
||||
from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
|
||||
import os
|
||||
|
||||
base_model = get_model()
|
||||
|
||||
SQL_TO_NATURAL_PROMPT = """
|
||||
You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
|
||||
|
||||
$database_name SQL Query:
|
||||
```sql
|
||||
$query
|
||||
```
|
||||
|
||||
Your explanation must follow these requirements:
|
||||
|
||||
1. **Describe the overall purpose**
|
||||
- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
|
||||
|
||||
2. **List ALL involved database objects**
|
||||
Explicitly list every:
|
||||
- Table
|
||||
- View
|
||||
- CTE (Common Table Expression)
|
||||
- Subquery or derived table
|
||||
- Function
|
||||
- Stored procedure, if referenced
|
||||
- Temporary table
|
||||
- Schema-qualified object
|
||||
Use the exact names as they appear in the query.
|
||||
|
||||
3. **Describe all essential operations**
|
||||
Explicitly state, using exact column names:
|
||||
- Columns retrieved or modified
|
||||
- Join types, join conditions, and which objects participate
|
||||
- Filters and conditions (WHERE, boolean logic, comparisons)
|
||||
- Aggregations (SUM, COUNT, etc.)
|
||||
- Grouping and HAVING clauses
|
||||
- Sorting (ORDER BY)
|
||||
- Window functions
|
||||
- DISTINCT, TOP, LIMIT, OFFSET, pagination
|
||||
- Any $database_name-specific features used$specific_features
|
||||
|
||||
4. **Maintain strict factual accuracy**
|
||||
- Do NOT infer business meaning unless directly implied.
|
||||
- Do NOT rename or paraphrase column names; repeat them exactly.
|
||||
|
||||
5. **Use clear, structured natural language**
|
||||
- Provide a step-by-step explanation that makes every operation and purpose explicit.
|
||||
- The output must be complete enough that the query can be reconstructed.
|
||||
|
||||
6. **⚠️ CRITICAL: Identify Performance Issues**
|
||||
Flag any of these CRITICAL performance problems found in the query:
|
||||
- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
|
||||
* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
|
||||
* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
|
||||
* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
|
||||
* CROSS APPLY/LATERAL with internal WHERE counts as filtered
|
||||
* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
|
||||
- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
|
||||
- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
|
||||
- **Implicit conversions**: Type mismatches in comparisons
|
||||
- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
|
||||
$analysis_requirements
|
||||
|
||||
Explanation:
|
||||
""".strip()
|
||||
|
||||
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
|
||||
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
|
||||
|
||||
db = SqliteDb(db_file=_db_path)
|
||||
|
||||
|
||||
async def explain_query_tool(
    database_type: str,
    sql: str,
    provider: str | None = None,
    model: str | None = None,
    temperature: float | None = None,
    max_tokens: int | None = None,
    api_key: str | None = None,
) -> dict[str, str]:
    from sql_optimizer_team.tools.core_tools import explain_query_core

    return await explain_query_core(
        database_type=database_type,
        sql=sql,
        provider=provider,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
    )
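
The wrapper above is a thin async function that forwards its arguments to the core coroutine under a stable tool name. A minimal runnable sketch of the same pattern follows; the `explain_query_core` defined here is a hypothetical stand-in that returns a canned result, not the project's implementation.

```python
import asyncio
from typing import Optional


async def explain_query_core(database_type: str, sql: str, model: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the real core function."""
    return {"database_type": database_type, "explanation": f"Explains: {sql}"}


async def explain_query_tool(database_type: str, sql: str, model: Optional[str] = None) -> dict:
    """Thin wrapper: same arguments, delegated to the core coroutine."""
    return await explain_query_core(database_type=database_type, sql=sql, model=model)


result = asyncio.run(explain_query_tool("postgresql", "SELECT 1"))
print(explain_query_tool.__name__, "->", result["explanation"])
```

One consequence of this design is that the function registered with the agent carries the name `explain_query_tool`, which is what a name-based allowlist would see.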
|
||||
|
||||
sql_analyst_agent = Agent(
|
||||
name="SQL Analyst",
|
||||
role=(
|
||||
|
|
@ -20,7 +103,7 @@ sql_analyst_agent = Agent(
|
|||
"A saída deve seguir exatamente a prompt original (SQL → natural) do projeto oracle-sql-query-optimizer."
|
||||
),
|
||||
model=base_model,
|
||||
-tools=[explain_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
+tools=[explain_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
|
||||
markdown=True,
|
||||
add_history_to_context=True,
|
||||
db=db,
|
||||
|
|
@ -32,67 +115,7 @@ sql_analyst_agent = Agent(
|
|||
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
|
||||
"- Preferência: use explain_query_core(database_type, sql) para gerar a explicação via core de negócio.",
|
||||
"- Use a template oficial abaixo para estruturar a explicação (SQL → natural).",
|
||||
"""
|
||||
You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
|
||||
|
||||
$database_name SQL Query:
|
||||
```sql
|
||||
$query
|
||||
```
|
||||
|
||||
Your explanation must follow these requirements:
|
||||
|
||||
1. **Describe the overall purpose**
|
||||
- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
|
||||
|
||||
2. **List ALL involved database objects**
|
||||
Explicitly list every:
|
||||
- Table
|
||||
- View
|
||||
- CTE (Common Table Expression)
|
||||
- Subquery or derived table
|
||||
- Function
|
||||
- Stored procedure, if referenced
|
||||
- Temporary table
|
||||
- Schema-qualified object
|
||||
Use the exact names as they appear in the query.
|
||||
|
||||
3. **Describe all essential operations**
|
||||
Explicitly state, using exact column names:
|
||||
- Columns retrieved or modified
|
||||
- Join types, join conditions, and which objects participate
|
||||
- Filters and conditions (WHERE, boolean logic, comparisons)
|
||||
- Aggregations (SUM, COUNT, etc.)
|
||||
- Grouping and HAVING clauses
|
||||
- Sorting (ORDER BY)
|
||||
- Window functions
|
||||
- DISTINCT, TOP, LIMIT, OFFSET, pagination
|
||||
- Any $database_name-specific features used$specific_features
|
||||
|
||||
4. **Maintain strict factual accuracy**
|
||||
- Do NOT infer business meaning unless directly implied.
|
||||
- Do NOT rename or paraphrase column names; repeat them exactly.
|
||||
|
||||
5. **Use clear, structured natural language**
|
||||
- Provide a step-by-step explanation that makes every operation and purpose explicit.
|
||||
- The output must be complete enough that the query can be reconstructed.
|
||||
|
||||
6. **⚠️ CRITICAL: Identify Performance Issues**
|
||||
Flag any of these CRITICAL performance problems found in the query:
|
||||
- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
|
||||
* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
|
||||
* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
|
||||
* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
|
||||
* CROSS APPLY/LATERAL with internal WHERE counts as filtered
|
||||
* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
|
||||
- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
|
||||
- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
|
||||
- **Implicit conversions**: Type mismatches in comparisons
|
||||
- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
|
||||
$analysis_requirements
|
||||
|
||||
Explanation:
|
||||
""".strip(),
|
||||
SQL_TO_NATURAL_PROMPT,
|
||||
"- Entregue apenas a explicação natural estruturada conforme a prompt; não reescreva a SQL.",
|
||||
"- Identifique problemas críticos de performance conforme a prompt.",
|
||||
],
|
||||
|
|
|
|||
|
|
@ -1,18 +1,104 @@
|
|||
from agno.agent import Agent
|
||||
from agno.db.sqlite import SqliteDb
|
||||
from sql_optimizer_team.tools.engine.model_selector import get_model
|
||||
from sql_optimizer_team.tools.core_tools import optimize_query_core
|
||||
from sql_optimizer_team.tools.prompt_tools import supported_databases
|
||||
from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
|
||||
import os
|
||||
|
||||
base_model = get_model()
|
||||
|
||||
NATURAL_TO_SQL_PROMPT = """
|
||||
You are an expert $database_name SQL developer and query performance specialist.
|
||||
Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
|
||||
|
||||
Description:
|
||||
$explanation
|
||||
|
||||
⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
|
||||
|
||||
1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
|
||||
- Every CASE WHEN statement must have IDENTICAL conditions and results
|
||||
- Every calculated column must use IDENTICAL formulas
|
||||
- Every subquery must query the SAME tables with SAME filters
|
||||
- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
|
||||
- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
|
||||
|
||||
2. **PRESERVE ALL TABLES AND COLUMNS**
|
||||
- Include EVERY table mentioned in the description
|
||||
- Include EVERY column mentioned in the description
|
||||
- Use EXACT column names as described (no renaming)
|
||||
- Use EXACT table aliases as described
|
||||
|
||||
3. **Translate the full described logic into SQL**
|
||||
- Implement all actions, operations, filters, joins, and conditions exactly as stated.
|
||||
- Use every object and column referenced in the description, using their exact names.
|
||||
- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
|
||||
|
||||
4. **Write optimized SQL while preserving semantics**
|
||||
- Apply $database_name best practices for performance.
|
||||
- Use indexing-aware filtering, efficient join strategies, and clear expressions.
|
||||
- Implement aggregations, groupings, window functions, or pagination when described.
|
||||
- Prefer performant constructs commonly recommended for $database_name workloads.
|
||||
- OPTIMIZATION means structure/hints/indexes - NOT changing logic
|
||||
|
||||
5. **Use $database_name-specific syntax and features**
|
||||
- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
|
||||
- Incorporate $specific_requirements if provided.
|
||||
|
||||
6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
|
||||
- The SQL must reflect PRECISELY the behavior described
|
||||
- Do NOT add logic not explicitly stated
|
||||
- Do NOT omit any step described
|
||||
- Do NOT infer or assume details beyond what is explicitly stated
|
||||
- Do NOT "simplify" complex CASE statements
|
||||
- Do NOT merge or combine separate calculated columns
|
||||
|
||||
7. **Self-Verification Checklist** (perform before outputting):
|
||||
- [ ] All tables from description are present in query
|
||||
- [ ] All columns from description are present in SELECT
|
||||
- [ ] All CASE conditions match description exactly
|
||||
- [ ] All subquery filters match description exactly
|
||||
- [ ] All JOIN conditions match description exactly
|
||||
- [ ] No business logic was simplified or changed
|
||||
|
||||
8. **Output format**
|
||||
- Provide ONLY the final, optimized SQL query.
|
||||
- Do NOT include explanations, comments, or extra text.
|
||||
|
||||
Optimized SQL Query:
|
||||
""".strip()
|
||||
|
||||
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
|
||||
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
|
||||
|
||||
db = SqliteDb(db_file=_db_path)
|
||||
|
||||
|
||||
async def optimize_query_tool(
    database_type: str,
    sql: str,
    provider: str | None = None,
    model: str | None = None,
    temperature: float | None = None,
    max_tokens: int | None = None,
    api_key: str | None = None,
    output_dir: str | None = None,
    no_review: bool = False,
) -> dict[str, str | dict[str, str]]:
    from sql_optimizer_team.tools.core_tools import optimize_query_core

    return await optimize_query_core(
        database_type=database_type,
        sql=sql,
        provider=provider,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
        output_dir=output_dir,
        no_review=no_review,
    )
|
||||
|
||||
sql_optimizer_agent = Agent(
|
||||
name="SQL Optimizer",
|
||||
role=(
|
||||
|
|
@ -20,7 +106,7 @@ sql_optimizer_agent = Agent(
|
|||
"mantendo 100% da lógica e entregando apenas a SQL otimizada."
|
||||
),
|
||||
model=base_model,
|
||||
-tools=[optimize_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
+tools=[optimize_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
|
||||
markdown=True,
|
||||
add_history_to_context=True,
|
||||
db=db,
|
||||
|
|
@ -31,66 +117,7 @@ sql_optimizer_agent = Agent(
|
|||
"- Exija banco alvo e SQL antes de otimizar.",
|
||||
"- Use optimize_query_core(database_type, sql) para executar o core de negócio.",
|
||||
"- Use a template oficial abaixo para reescrever (natural → SQL) mantendo 100% da lógica.",
|
||||
"""
|
||||
You are an expert $database_name SQL developer and query performance specialist.
|
||||
Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
|
||||
|
||||
Description:
|
||||
$explanation
|
||||
|
||||
⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
|
||||
|
||||
1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
|
||||
- Every CASE WHEN statement must have IDENTICAL conditions and results
|
||||
- Every calculated column must use IDENTICAL formulas
|
||||
- Every subquery must query the SAME tables with SAME filters
|
||||
- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
|
||||
- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
|
||||
|
||||
2. **PRESERVE ALL TABLES AND COLUMNS**
|
||||
- Include EVERY table mentioned in the description
|
||||
- Include EVERY column mentioned in the description
|
||||
- Use EXACT column names as described (no renaming)
|
||||
- Use EXACT table aliases as described
|
||||
|
||||
3. **Translate the full described logic into SQL**
|
||||
- Implement all actions, operations, filters, joins, and conditions exactly as stated.
|
||||
- Use every object and column referenced in the description, using their exact names.
|
||||
- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
|
||||
|
||||
4. **Write optimized SQL while preserving semantics**
|
||||
- Apply $database_name best practices for performance.
|
||||
- Use indexing-aware filtering, efficient join strategies, and clear expressions.
|
||||
- Implement aggregations, groupings, window functions, or pagination when described.
|
||||
- Prefer performant constructs commonly recommended for $database_name workloads.
|
||||
- OPTIMIZATION means structure/hints/indexes - NOT changing logic
|
||||
|
||||
5. **Use $database_name-specific syntax and features**
|
||||
- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
|
||||
- Incorporate $specific_requirements if provided.
|
||||
|
||||
6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
|
||||
- The SQL must reflect PRECISELY the behavior described
|
||||
- Do NOT add logic not explicitly stated
|
||||
- Do NOT omit any step described
|
||||
- Do NOT infer or assume details beyond what is explicitly stated
|
||||
- Do NOT "simplify" complex CASE statements
|
||||
- Do NOT merge or combine separate calculated columns
|
||||
|
||||
7. **Self-Verification Checklist** (perform before outputting):
|
||||
- [ ] All tables from description are present in query
|
||||
- [ ] All columns from description are present in SELECT
|
||||
- [ ] All CASE conditions match description exactly
|
||||
- [ ] All subquery filters match description exactly
|
||||
- [ ] All JOIN conditions match description exactly
|
||||
- [ ] No business logic was simplified or changed
|
||||
|
||||
8. **Output format**
|
||||
- Provide ONLY the final, optimized SQL query.
|
||||
- Do NOT include explanations, comments, or extra text.
|
||||
|
||||
Optimized SQL Query:
|
||||
""".strip(),
|
||||
NATURAL_TO_SQL_PROMPT,
|
||||
"- Extraia e devolva SOMENTE optimized_query (sem explicações, sem markdown).",
|
||||
"- Preserve 100% da lógica, colunas, aliases, filtros, joins e subqueries.",
|
||||
],
|
||||
|
|
|
|||
5
src/sql_optimizer_team/knowledge/__init__.py
Normal file
|
|
@ -0,0 +1,5 @@
|
|||
"""Internal knowledge base helpers."""
|
||||
|
||||
from sql_optimizer_team.knowledge.internal_kb import build_internal_knowledge, attach_internal_knowledge
|
||||
|
||||
__all__ = ["build_internal_knowledge", "attach_internal_knowledge"]
|
||||
100
src/sql_optimizer_team/knowledge/internal_kb.py
Normal file
|
|
@ -0,0 +1,100 @@
|
|||
"""Internal KB (RAG) setup for the SQL optimizer team."""
|
||||
|
||||
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
import os

from agno.db.sqlite import SqliteDb
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.embedder.sentence_transformer import SentenceTransformerEmbedder
from agno.vectordb.chroma import ChromaDb

from sql_optimizer_team.tools.engine.config.logger import get_logger

logger = get_logger(__name__)


@dataclass(frozen=True)
class InternalKBConfig:
    kb_path: Path
    chroma_path: Path
    embedder_id: str
    contents_db_file: Path
    block_external: bool


def _load_config() -> InternalKBConfig:
    kb_path = Path(os.getenv("SQL_OPT_KB_PATH", "kb")).resolve()
    chroma_path = Path(os.getenv("SQL_OPT_KB_CHROMA_PATH", "tmp/kb_chroma")).resolve()
    embedder_id = os.getenv(
        "SQL_OPT_KB_EMBEDDER_ID",
        "sentence-transformers/all-MiniLM-L6-v2",
    ).strip()
    contents_db_file = Path(os.getenv("SQL_OPT_KB_DB_FILE", "tmp/sql_optimizer_kb.db")).resolve()
    block_external = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
    return InternalKBConfig(
        kb_path=kb_path,
        chroma_path=chroma_path,
        embedder_id=embedder_id,
        contents_db_file=contents_db_file,
        block_external=block_external,
    )


def build_internal_knowledge() -> Knowledge:
    config = _load_config()

    if config.block_external:
        logger.info("External tools blocked for KB", kb_path=str(config.kb_path))

    embedder = SentenceTransformerEmbedder(id=config.embedder_id)
    vector_db = ChromaDb(
        name="sql-optimizer-kb",
        path=str(config.chroma_path),
        persistent_client=True,
        embedder=embedder,
    )
    contents_db = SqliteDb(db_file=str(config.contents_db_file))

    knowledge = Knowledge(
        name="internal-sql-kb",
        description="Base de conhecimento interna para otimização de SQL",
        vector_db=vector_db,
        contents_db=contents_db,
        max_results=6,
    )

    if not config.kb_path.exists():
        logger.warning("KB path not found; skipping ingest", kb_path=str(config.kb_path))
        return knowledge

    if config.block_external and not config.kb_path.is_dir():
        logger.warning("KB path is not a directory; skipping ingest", kb_path=str(config.kb_path))
        return knowledge

    try:
        knowledge.insert(
            path=str(config.kb_path),
            include=["**/*.md", "**/*.txt", "**/*.sql", "**/*.pdf"],
            exclude=["**/.git/**", "**/.venv/**", "**/__pycache__/**"],
            upsert=True,
            skip_if_exists=True,
        )
        logger.info("KB ingest complete", kb_path=str(config.kb_path))
    except Exception as exc:
        logger.error("KB ingest failed", error=str(exc))

    return knowledge


def attach_internal_knowledge(knowledge: Knowledge, *agents: object) -> None:
    for agent in agents:
        try:
            setattr(agent, "knowledge", knowledge)
            setattr(agent, "add_knowledge_to_context", True)
            setattr(agent, "search_knowledge", True)
            setattr(agent, "update_knowledge", False)
        except Exception as exc:
            logger.warning("Failed to attach knowledge", agent=str(agent), error=str(exc))
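
The truthy-string check on `SQL_OPT_BLOCK_EXTERNAL_TOOLS` in `_load_config` has the same shape as the `SQL_OPT_TEAM_DEBUG_MODE` check in the agent modules. One way to express it once is a small helper; the sketch below uses only the standard library, and `env_flag` and the `DEMO_*` variable names are hypothetical, introduced for illustration.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean, tolerating case and whitespace."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Demo with hypothetical variable names.
os.environ["DEMO_BLOCK_EXTERNAL_TOOLS"] = " Yes "
os.environ["DEMO_DEBUG_MODE"] = "0"
os.environ.pop("DEMO_UNSET_FLAG", None)

print(env_flag("DEMO_BLOCK_EXTERNAL_TOOLS"))      # set to a truthy value
print(env_flag("DEMO_DEBUG_MODE"))                # set to a falsy value
print(env_flag("DEMO_UNSET_FLAG", default=True))  # unset, falls back to the default
```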
@@ -2,13 +2,12 @@ from agno.team.team import Team
 from agno.os.app import AgentOS
 from agno.db.sqlite import SqliteDb
 from dotenv import load_dotenv
+from sql_optimizer_team.knowledge import build_internal_knowledge, attach_internal_knowledge
 from sql_optimizer_team.tools.engine.model_selector import get_model
-from sql_optimizer_team.agents import (
-    sql_analyst_agent,
-    sql_optimizer_agent,
-    sql_quality_agent,
-    conservative_analysis_agent,
-)
+from sql_optimizer_team.agents.sql_analyst_agent import sql_analyst_agent
+from sql_optimizer_team.agents.sql_optimizer_agent import sql_optimizer_agent
+from sql_optimizer_team.agents.sql_quality_agent import sql_quality_agent
+from sql_optimizer_team.agents.conservative_analysis_agent import conservative_analysis_agent
 import os
 
 load_dotenv()
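The boolean flags in this commit (`SQL_OPT_TEAM_DEBUG_MODE`, `SQL_OPT_BLOCK_EXTERNAL_TOOLS`, `LLM_LOG_USAGE`) are all parsed with the same idiom: read the variable, strip, lowercase, and test membership in `{"1", "true", "yes", "on"}`. A minimal sketch of that idiom as one function follows; `env_flag` and the `DEMO_*` variable names are hypothetical and not part of the commit.

```python
import os

# Accepted spellings of "true", copied from the diff.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: str = "false") -> bool:
    """Hypothetical helper: parse a boolean environment variable the way the diff does."""
    return os.getenv(name, default).strip().lower() in TRUTHY


os.environ["DEMO_FLAG"] = " Yes "      # surrounding whitespace and case are ignored
os.environ["DEMO_EMPTY"] = ""          # set but empty: the default is NOT used
os.environ.pop("DEMO_MISSING", None)   # unset: the default applies

print(env_flag("DEMO_FLAG"))
print(env_flag("DEMO_EMPTY", default="true"))
print(env_flag("DEMO_MISSING", default="true"))
```

One consequence worth knowing: a variable that is set to an empty string evaluates to false even when the default is `"true"`, because `os.getenv` only falls back to the default when the variable is absent.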
@@ -20,6 +19,39 @@ _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {
 
 db = SqliteDb(db_file=_db_path)
 
+_kb = build_internal_knowledge()
+attach_internal_knowledge(
+    _kb,
+    sql_analyst_agent,
+    sql_optimizer_agent,
+    sql_quality_agent,
+    conservative_analysis_agent,
+)
+
+_block_external_tools = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
+if _block_external_tools:
+    _allowed_tool_names = {
+        "explain_query_tool",
+        "optimize_query_tool",
+        "load_sql_from_file",
+        "ensure_non_empty",
+        "supported_databases",
+        "diff_sql",
+    }
+
+    def _filter_tools(agent) -> None:
+        if not getattr(agent, "tools", None):
+            return
+        filtered = []
+        for tool in agent.tools:
+            name = getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)
+            if name in _allowed_tool_names:
+                filtered.append(tool)
+        agent.tools = filtered
+
+    for _agent in [sql_analyst_agent, sql_optimizer_agent, sql_quality_agent, conservative_analysis_agent]:
+        _filter_tools(_agent)
+
 sql_optimizer_team = Team(
     name="SQL Optimization Team",
     model=base_model,
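The allowlist filter added in this hunk can be exercised in isolation. The sketch below reuses the same name-resolution order (`__name__`, then `name`, then `str(tool)`) against stand-in objects; `SimpleNamespace` agents and the `web_search` tool are hypothetical placeholders, not real agno objects.

```python
from types import SimpleNamespace

ALLOWED = {"explain_query_tool", "diff_sql"}


def tool_name(tool) -> str:
    # Same resolution order as the diff: function name, then a .name attribute, then str().
    return getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)


def filter_tools(agent, allowed) -> None:
    # Agents without tools (missing attribute, None, or empty list) are left untouched.
    if not getattr(agent, "tools", None):
        return
    agent.tools = [tool for tool in agent.tools if tool_name(tool) in allowed]


def explain_query_tool():   # plain function: resolved through __name__
    return "plan"


def web_search():           # hypothetical external tool that should be dropped
    return "results"


named_tool = SimpleNamespace(name="diff_sql")   # object tool: resolved through .name
agent = SimpleNamespace(tools=[explain_query_tool, web_search, named_tool])

filter_tools(agent, ALLOWED)
print([tool_name(tool) for tool in agent.tools])
```

A tool that exposes neither `__name__` nor `name` falls back to `str(tool)`, which is unlikely to equal an allowlisted name, so such tools are dropped. That is the safe default for a blocklist-by-omission design, but it also means a renamed internal tool disappears silently unless the allowlist is updated.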
@@ -41,6 +41,7 @@ class AgnoLLMTool(BaseLLMTool):
             result_text = self._extract_text(response)
             validated = self._validate_response(result_text)
             self._log_response(validated)
+            self._log_usage_from_response(response, prompt, validated)
             return validated
         except Exception as e:
             self._log_error(e)
@@ -5,6 +5,8 @@ This module provides a base class with common functionality for all LLM tools.
 
 from abc import ABC
 from typing import Any
+import math
+import os
 
 from sql_optimizer_team.tools.engine.tools_api.llm_tool import LLMTool
 from sql_optimizer_team.tools.engine.types.tool_exceptions import LLMProviderError
@@ -91,6 +93,49 @@ class BaseLLMTool(LLMTool, ABC):
             **kwargs,
         )
 
+    def _estimate_tokens(self, text: str) -> int:
+        """Best-effort token estimate when provider usage is unavailable."""
+        if not text:
+            return 0
+        return max(1, math.ceil(len(text) / 4))
+
+    def _log_usage_from_response(self, response_obj: Any, prompt: str, response_text: str) -> None:
+        """Log token usage and cost if enabled.
+
+        Reads usage from ModelResponse when available, otherwise uses a rough estimate.
+        Cost is computed using env vars LLM_COST_INPUT_PER_1K and LLM_COST_OUTPUT_PER_1K.
+        """
+        enabled = os.getenv("LLM_LOG_USAGE", "true").strip().lower() in {"1", "true", "yes", "on"}
+        if not enabled:
+            return
+
+        input_tokens = getattr(response_obj, "input_tokens", None)
+        output_tokens = getattr(response_obj, "output_tokens", None)
+        total_tokens = getattr(response_obj, "total_tokens", None)
+
+        if input_tokens is None:
+            input_tokens = self._estimate_tokens(prompt)
+        if output_tokens is None:
+            output_tokens = self._estimate_tokens(response_text)
+        if total_tokens is None and input_tokens is not None and output_tokens is not None:
+            total_tokens = input_tokens + output_tokens
+
+        cost_in = float(os.getenv("LLM_COST_INPUT_PER_1K", "0") or 0)
+        cost_out = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0") or 0)
+        cost_usd = None
+        if input_tokens is not None or output_tokens is not None:
+            cost_usd = (input_tokens or 0) * cost_in / 1000 + (output_tokens or 0) * cost_out / 1000
+
+        logger.info(
+            "LLM usage",
+            provider=self.provider_name,
+            model=self._model_name,
+            input_tokens=input_tokens,
+            output_tokens=output_tokens,
+            total_tokens=total_tokens,
+            cost_usd=cost_usd,
+        )
+
     def _log_error(self, error: Exception, **kwargs: Any) -> None:
         """Log LLM error.
 
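Worked example for the usage logging added in this hunk: the fallback estimate is one token per four characters (rounded up, minimum one for non-empty text), and cost is priced per 1,000 tokens with separate input and output rates. The standalone functions below restate those two formulas; the rates `0.50` and `1.50` are made-up illustration values, not real provider prices.

```python
import math


def estimate_tokens(text: str) -> int:
    """Same heuristic as _estimate_tokens: about four characters per token."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / 4))


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cost_in_per_1k: float,
    cost_out_per_1k: float,
) -> float:
    """Same formula as the diff: each side is priced per 1,000 tokens."""
    return input_tokens * cost_in_per_1k / 1000 + output_tokens * cost_out_per_1k / 1000


# Hypothetical rates, standing in for LLM_COST_INPUT_PER_1K / LLM_COST_OUTPUT_PER_1K.
RATE_IN, RATE_OUT = 0.50, 1.50

prompt_tokens = estimate_tokens("a" * 4000)      # 4000 chars -> 1000 tokens
response_tokens = estimate_tokens("b" * 2000)    # 2000 chars -> 500 tokens
print(prompt_tokens, response_tokens)
print(estimate_cost_usd(prompt_tokens, response_tokens, RATE_IN, RATE_OUT))
```

Two caveats when reading the logged numbers: the character-based estimate can differ noticeably from a provider's tokenizer, and in the committed code a non-numeric value in either cost variable makes `float(...)` raise `ValueError` inside the logging path.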
@@ -7,185 +7,17 @@ reducing code duplication and ensuring consistency.
 from abc import ABC, abstractmethod
 
 from string import Template
+import importlib
 
 from sql_optimizer_team.tools.engine.tools_api.prompt_tool import PromptGeneratorTool
 
 
-SQL_TO_NATURAL_TEMPLATE = Template("""
-You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
-
-$database_name SQL Query:
-```sql
-$query
-```
-
-Your explanation must follow these requirements:
-
-1. **Describe the overall purpose**
-- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
-
-2. **List ALL involved database objects**
-Explicitly list every:
-- Table
-- View
-- CTE (Common Table Expression)
-- Subquery or derived table
-- Function
-- Stored procedure, if referenced
-- Temporary table
-- Schema-qualified object
-Use the exact names as they appear in the query.
-
-3. **Describe all essential operations**
-Explicitly state, using exact column names:
-- Columns retrieved or modified
-- Join types, join conditions, and which objects participate
-- Filters and conditions (WHERE, boolean logic, comparisons)
-- Aggregations (SUM, COUNT, etc.)
-- Grouping and HAVING clauses
-- Sorting (ORDER BY)
-- Window functions
-- DISTINCT, TOP, LIMIT, OFFSET, pagination
-- Any $database_name-specific features used$specific_features
-
-4. **Maintain strict factual accuracy**
-- Do NOT infer business meaning unless directly implied.
-- Do NOT rename or paraphrase column names; repeat them exactly.
-
-5. **Use clear, structured natural language**
-- Provide a step-by-step explanation that makes every operation and purpose explicit.
-- The output must be complete enough that the query can be reconstructed.
-
-6. **⚠️ CRITICAL: Identify Performance Issues**
-Flag any of these CRITICAL performance problems found in the query:
-- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
-* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
-* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
-* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
-* CROSS APPLY/LATERAL with internal WHERE counts as filtered
-* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
-- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
-- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
-- **Implicit conversions**: Type mismatches in comparisons
-- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
-$analysis_requirements
-
-Explanation:
-""")
-
-NATURAL_TO_SQL_TEMPLATE = Template("""
-You are an expert $database_name SQL developer and query performance specialist.
-Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
-
-Description:
-$explanation
-
-⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
-
-1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
-- Every CASE WHEN statement must have IDENTICAL conditions and results
-- Every calculated column must use IDENTICAL formulas
-- Every subquery must query the SAME tables with SAME filters
-- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
-- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
-
-2. **PRESERVE ALL TABLES AND COLUMNS**
-- Include EVERY table mentioned in the description
-- Include EVERY column mentioned in the description
-- Use EXACT column names as described (no renaming)
-- Use EXACT table aliases as described
-
-3. **Translate the full described logic into SQL**
-- Implement all actions, operations, filters, joins, and conditions exactly as stated.
-- Use every object and column referenced in the description, using their exact names.
-- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
-
-4. **Write optimized SQL while preserving semantics**
-- Apply $database_name best practices for performance.
-- Use indexing-aware filtering, efficient join strategies, and clear expressions.
-- Implement aggregations, groupings, window functions, or pagination when described.
-- Prefer performant constructs commonly recommended for $database_name workloads.
-- OPTIMIZATION means structure/hints/indexes - NOT changing logic
-
-5. **Use $database_name-specific syntax and features**
-- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
-- Incorporate $specific_requirements if provided.
-
-6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
-- The SQL must reflect PRECISELY the behavior described
-- Do NOT add logic not explicitly stated
-- Do NOT omit any step described
-- Do NOT infer or assume details beyond what is explicitly stated
-- Do NOT "simplify" complex CASE statements
-- Do NOT merge or combine separate calculated columns
-
-7. **Self-Verification Checklist** (perform before outputting):
-- [ ] All tables from description are present in query
-- [ ] All columns from description are present in SELECT
-- [ ] All CASE conditions match description exactly
-- [ ] All subquery filters match description exactly
-- [ ] All JOIN conditions match description exactly
-- [ ] No business logic was simplified or changed
-
-8. **Output format**
-- Provide ONLY the final, optimized SQL query.
-- Do NOT include explanations, comments, or extra text.
-
-Optimized SQL Query:
-""")
-
-CONSERVATIVE_ANALYSIS_TEMPLATE = Template("""
-You are an expert $database_name database analyst and performance specialist.
-
-Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
-
-⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
-
-$database_name SQL Query:
-```sql
-$query
-```
-
-Query Complexity Information:
-- Columns: $column_count
-- Tables: $table_count
-- Subqueries: $subquery_count
-- CASE statements: $case_count
-- JOINs: $join_count
-- Complexity Level: $complexity_level
-
-Provide your analysis in the following structured format:
-
-## PERFORMANCE ISSUES
-List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
-- [SEVERITY] Issue description
-- [SEVERITY] Issue description
-
-## SUGGESTED INDEXES
-List indexes that could improve this query:
-- CREATE INDEX idx_name ON table(columns) -- Reason
-
-## OPTIMIZATION SUGGESTIONS
-List specific suggestions WITHOUT rewriting the query:
-- Suggestion 1: Description of what could be improved and why
-- Suggestion 2: Description of what could be improved and why
-
-## RISK ASSESSMENT
-- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
-- Missing WHERE clause: [Yes/No] - If yes, explain the impact
-- Implicit conversions: [Yes/No] - If yes, list them
-
-## SUMMARY
-Brief summary of the most important findings and priority order for addressing them.
-
-Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
-""")
-
-
 def _render_sql_to_natural(
     database_name: str, query: str, specific_features: str = "", analysis_requirements: str = ""
 ) -> str:
-    return SQL_TO_NATURAL_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.sql_analyst_agent")
+    template_text = getattr(module, "SQL_TO_NATURAL_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         query=query,
         specific_features=f"\n{specific_features}" if specific_features else "",
@@ -196,7 +28,9 @@ def _render_sql_to_natural(
 def _render_natural_to_sql(
     database_name: str, explanation: str, specific_requirements: str
 ) -> str:
-    return NATURAL_TO_SQL_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.sql_optimizer_agent")
+    template_text = getattr(module, "NATURAL_TO_SQL_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         explanation=explanation,
         specific_requirements="\n".join(
@@ -215,7 +49,9 @@ def _render_conservative_analysis(
     join_count: int = 0,
     complexity_level: str = "unknown",
 ) -> str:
-    return CONSERVATIVE_ANALYSIS_TEMPLATE.substitute(
+    module = importlib.import_module("sql_optimizer_team.agents.conservative_analysis_agent")
+    template_text = getattr(module, "CONSERVATIVE_ANALYSIS_PROMPT")
+    return Template(template_text).substitute(
         database_name=database_name,
         query=query,
         column_count=column_count,
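The three render functions in this file now share one pattern: import the owning agent module at call time, read a prompt string from it by attribute name, and fill it with `string.Template`. A self-contained sketch of that pattern follows. The module `demo_prompt_owner` and its prompt text are hypothetical stand-ins registered in `sys.modules` so the example runs without the real package; `render_prompt` is also a hypothetical name.

```python
import importlib
import sys
import types
from string import Template

# Stand-in for an agent module that owns its prompt text (hypothetical).
demo_module = types.ModuleType("demo_prompt_owner")
demo_module.SQL_TO_NATURAL_PROMPT = (
    "Explain this $database_name query:\n$query$specific_features"
)
sys.modules["demo_prompt_owner"] = demo_module


def render_prompt(module_name: str, attribute: str, **values: str) -> str:
    """Look up a prompt string on a module at call time and substitute its placeholders."""
    module = importlib.import_module(module_name)   # resolved on each call, not at import time
    template_text = getattr(module, attribute)      # AttributeError if the constant is missing
    return Template(template_text).substitute(**values)  # KeyError if a placeholder is unfilled


rendered = render_prompt(
    "demo_prompt_owner",
    "SQL_TO_NATURAL_PROMPT",
    database_name="PostgreSQL",
    query="SELECT 1",
    specific_features="",
)
print(rendered)
```

Deferring the import to call time is presumably a way to avoid a circular import between the prompt tool and the agent modules, though the commit does not say so. Note that `substitute` is strict: a placeholder without a value raises `KeyError`, and a literal dollar sign in the prompt text must be written as `$$`. `Template.safe_substitute` is the lenient alternative that leaves unknown placeholders in place.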