feat: Enhance SQL optimization tools with internal knowledge base and observability features

- Updated README.md to include new setup instructions for RAG and observability.
- Added internal knowledge base (KB) setup for SQL optimization team, supporting various document types.
- Implemented token usage logging in LLM tools to track costs and usage.
- Refactored SQL analysis and optimization prompts for clarity and consistency.
- Introduced filtering of external tools based on environment configuration.
- Enhanced conservative analysis agent with structured prompt for performance suggestions.
- Updated requirements.txt to include new dependencies for RAG functionality.
- Added internal KB helpers for building and attaching knowledge to agents.
william.dias 2026-01-23 13:02:17 -03:00
parent c6dd91810b
commit 80d1f9d26a
14 changed files with 502 additions and 355 deletions


@ -25,15 +25,31 @@ src/
1) Create the environment and install dependencies:
- `pip install -r requirements.txt`
2) Configure environment variables (example in `sample.env`).
2) Configure environment variables (example in `sample.env` or `.env`).
3) Run the server:
- `PYTHONPATH=src python -m main`
- `./scripts/start.sh`
Access:
- `http://localhost:8204/docs` (Swagger UI)
- `http://localhost:8204` (basic API information)
## Local UI (Agent UI)
Use **Agent UI** (agno-agi/agent-ui) as the local front end:
1) Install it with the official script:
- `npx create-agent-ui@latest`
2) Start the UI:
- `pnpm dev`
3) Open `http://localhost:3000` and set the endpoint to `http://localhost:8204`.
Optional: if AgentOS uses authentication, configure `OS_SECURITY_KEY` as described in the Agent UI README.
## Team workflow
1) **Gestor** (manager) receives the request and validates the context (database + SQL).
@ -43,7 +59,23 @@ Access:
5) **Conservative Analyst** (if requested) produces an analysis without rewriting the query.
6) **Gestor** (manager) consolidates and delivers the result.
## RAG (internal KB)
- Place documents in `kb/` (md/txt/sql/pdf).
- The local RAG uses Chroma + SentenceTransformers.
- Main variables:
- `SQL_OPT_KB_PATH`, `SQL_OPT_KB_CHROMA_PATH`, `SQL_OPT_KB_DB_FILE`
- `SQL_OPT_KB_EMBEDDER_ID`
- `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true` blocks external tools.
## Token/cost observability
- Enable with `LLM_LOG_USAGE=true`.
- Set prices (USD per 1K tokens) with:
- `LLM_COST_INPUT_PER_1K`
- `LLM_COST_OUTPUT_PER_1K`
## Notes
- Use the model configured in environment variables (e.g., OpenAI, Gemini, Groq).
- Use the provider configured in `.env` (e.g., local Ollama, OpenAI, Gemini, Groq).
- The team is collaborative and keeps its history in SQLite (configurable via env).
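
As a rough illustration of how the two per-1K prices turn into a cost figure, the sketch below mirrors the arithmetic used by the usage logger later in this commit. The function name `estimate_cost_usd` and the prices in the example are hypothetical, chosen only for the demonstration.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cost_in_per_1k: float,
    cost_out_per_1k: float,
) -> float:
    """Return the cost in USD when prices are expressed per 1K tokens."""
    return (
        input_tokens * cost_in_per_1k / 1000
        + output_tokens * cost_out_per_1k / 1000
    )


# Hypothetical prices: 0.50 USD per 1K input tokens, 1.50 USD per 1K output tokens.
print(estimate_cost_usd(1500, 500, 0.50, 1.50))  # → 1.5
```

With both variables left at `0` (the values in `sample.env`), the logged cost is always zero, so real prices need to be set before the figure means anything.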


@ -124,6 +124,11 @@ Market tool recommendations:
- **Langfuse** or **Phoenix** for tracing prompts, costs, and latency.
- **Grafana/Prometheus** for executive dashboards.
POC status:
- **Token/cost logging** is already implemented via `LLM_LOG_USAGE` and per-1K-token prices.
- Persistent metrics and dashboards (Grafana/Prometheus) remain future work.
Minimum metrics:
- Tokens per request and per area.
@ -150,6 +155,11 @@ Minimum metrics:
- Continuous curation with feedback from the teams to improve relevance.
- **Higher precision**: answers consistent with internal policies and technical standards.
POC status:
- **Local RAG** over an internal base in `kb/` using Chroma + SentenceTransformers.
- **External tools blocked** by default via `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true`.
## 10) Final stack (100% Agno)
- **Agno** as the single framework for orchestration, memory, and tools.

kb/README.md

@ -0,0 +1,18 @@
# Internal Knowledge Base (KB)
Place here the internal documents that should be used by the RAG.
Supported (by default):
- Markdown (.md)
- Text (.txt)
- SQL (.sql)
- PDF (.pdf)
Configuration via environment:
- SQL_OPT_KB_PATH (default: kb)
- SQL_OPT_KB_CHROMA_PATH (default: tmp/kb_chroma)
- SQL_OPT_KB_EMBEDDER_ID (default: sentence-transformers/all-MiniLM-L6-v2)
- SQL_OPT_KB_DB_FILE (default: tmp/sql_optimizer_kb.db)
- SQL_OPT_BLOCK_EXTERNAL_TOOLS (default: true)
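
To check which files in the KB folder would qualify for ingestion, a small standalone sketch like the following can help. It is not the ingestion code from this commit: `list_kb_documents` and `SUPPORTED_SUFFIXES` are hypothetical names, and the suffix match is case-sensitive on the assumption that the loader's glob patterns are too.

```python
from pathlib import Path

# Suffixes taken from the "supported by default" list above.
SUPPORTED_SUFFIXES = {".md", ".txt", ".sql", ".pdf"}


def list_kb_documents(kb_path: str = "kb") -> list[Path]:
    """Return supported files under kb_path, searched recursively and sorted."""
    root = Path(kb_path)
    if not root.is_dir():
        # A missing or non-directory path yields no documents.
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in SUPPORTED_SUFFIXES
    )


if __name__ == "__main__":
    for doc in list_kb_documents("kb"):
        print(doc)
```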


@ -30,3 +30,7 @@ oracledb==3.4.1
pymssql==2.3.11
sqlparse==0.5.5
sqlglot==28.6.0
# RAG (local KB)
chromadb==0.6.3
sentence-transformers==3.4.1


@ -15,3 +15,15 @@
# SQL Optimizer Team
SQL_OPT_TEAM_DB_FILE=tmp/sql_optimizer_team.db
SQL_OPT_TEAM_DEBUG_MODE=false
# Token/cost observability
LLM_LOG_USAGE=true
LLM_COST_INPUT_PER_1K=0
LLM_COST_OUTPUT_PER_1K=0
# RAG / internal KB
SQL_OPT_KB_PATH=kb
SQL_OPT_KB_CHROMA_PATH=tmp/kb_chroma
SQL_OPT_KB_DB_FILE=tmp/sql_optimizer_kb.db
SQL_OPT_KB_EMBEDDER_ID=sentence-transformers/all-MiniLM-L6-v2
SQL_OPT_BLOCK_EXTERNAL_TOOLS=true
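
The boolean settings above are parsed in several modules of this commit with the same expression (`.strip().lower() in {"1", "true", "yes", "on"}`). One way to keep that rule in a single place is a small helper such as the sketch below; `env_flag` is a hypothetical name, not something this commit defines.

```python
import os

# Values treated as "enabled", matching the set used throughout this commit.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable, ignoring case and surrounding spaces."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


if __name__ == "__main__":
    print(env_flag("SQL_OPT_BLOCK_EXTERNAL_TOOLS", default=True))
```

Any value outside the truthy set, including an empty string, disables the flag; only an unset variable falls back to the default.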


@ -7,30 +7,7 @@ import os
base_model = get_model()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path)
conservative_analysis_agent = Agent(
name="Conservative Analyst",
role=(
"Você realiza análise de performance sem reescrever a query, "
"seguindo a prompt conservadora original do oracle-sql-query-optimizer."
),
model=base_model,
tools=[load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True,
add_history_to_context=True,
db=db,
enable_agentic_memory=True,
enable_user_memories=True,
debug_mode=_debug_mode,
instructions=[
"- Solicite banco e SQL se não estiverem presentes.",
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
"- Use a template oficial abaixo para a análise conservadora (sem reescrever a SQL).",
"""
CONSERVATIVE_ANALYSIS_PROMPT = """
You are an expert $database_name database analyst and performance specialist.
Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
@ -75,7 +52,32 @@ conservative_analysis_agent = Agent(
Brief summary of the most important findings and priority order for addressing them.
Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
""".strip(),
""".strip()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path)
conservative_analysis_agent = Agent(
name="Conservative Analyst",
role=(
"Você realiza análise de performance sem reescrever a query, "
"seguindo a prompt conservadora original do oracle-sql-query-optimizer."
),
model=base_model,
tools=[load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True,
add_history_to_context=True,
db=db,
enable_agentic_memory=True,
enable_user_memories=True,
debug_mode=_debug_mode,
instructions=[
"- Solicite banco e SQL se não estiverem presentes.",
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
"- Use a template oficial abaixo para a análise conservadora (sem reescrever a SQL).",
CONSERVATIVE_ANALYSIS_PROMPT,
"- NÃO reescreva a SQL em hipótese alguma.",
],
)
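
The prompt constant contains `$database_name`-style placeholders, which the prompt module at the end of this commit fills in with `string.Template`. As far as this diff shows, the copy passed directly in `instructions` is not rendered, so the agent would see the literal placeholders. The sketch below uses a shortened, made-up prompt to show the difference between `substitute` and `safe_substitute`.

```python
from string import Template

# Shortened stand-in for the real prompt; the text is illustrative only.
PROMPT = "You are an expert $database_name analyst.\nQuery:\n$query"

# substitute() requires every placeholder and raises KeyError otherwise.
rendered = Template(PROMPT).substitute(
    database_name="Oracle",
    query="SELECT 1 FROM dual",
)

# safe_substitute() leaves unknown placeholders untouched instead of failing.
partial = Template(PROMPT).safe_substitute(database_name="Oracle")

print(rendered)
print(partial)
```

`safe_substitute` is the more forgiving choice when only the database name is known at agent construction time, at the price of leaving `$query` in the text.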


@ -1,38 +1,13 @@
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from sql_optimizer_team.tools.engine.model_selector import get_model
from sql_optimizer_team.tools.core_tools import explain_query_core
from sql_optimizer_team.tools.prompt_tools import supported_databases
from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
import os
base_model = get_model()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path)
sql_analyst_agent = Agent(
name="SQL Analyst",
role=(
"Você recebe a SQL original e o banco alvo e produz a descrição natural detalhada. "
"A saída deve seguir exatamente a prompt original (SQL → natural) do projeto oracle-sql-query-optimizer."
),
model=base_model,
tools=[explain_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True,
add_history_to_context=True,
db=db,
enable_agentic_memory=True,
enable_user_memories=True,
debug_mode=_debug_mode,
instructions=[
"- Solicite banco e SQL se não estiverem presentes. Bancos suportados: use supported_databases().",
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
"- Preferência: use explain_query_core(database_type, sql) para gerar a explicação via core de negócio.",
"- Use a template oficial abaixo para estruturar a explicação (SQL → natural).",
"""
SQL_TO_NATURAL_PROMPT = """
You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
$database_name SQL Query:
@ -92,7 +67,55 @@ sql_analyst_agent = Agent(
$analysis_requirements
Explanation:
""".strip(),
""".strip()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path)
async def explain_query_tool(
database_type: str,
sql: str,
provider: str | None = None,
model: str | None = None,
temperature: float | None = None,
max_tokens: int | None = None,
api_key: str | None = None,
) -> dict[str, str]:
from sql_optimizer_team.tools.core_tools import explain_query_core
return await explain_query_core(
database_type=database_type,
sql=sql,
provider=provider,
model=model,
temperature=temperature,
max_tokens=max_tokens,
api_key=api_key,
)
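
The wrapper above defers its import to call time and exposes its own function name (`explain_query_tool`) to the agent. A minimal sketch of the same pattern, using the standard-library `json` module as a stand-in for the business core, is shown below; `dumps_tool` is a hypothetical name.

```python
import asyncio


async def dumps_tool(payload: dict) -> str:
    """Thin async wrapper whose dependency is imported only when it is called."""
    # Deferred import: resolved on first call, not when this module is imported,
    # which helps avoid circular imports between modules that reference each other.
    from json import dumps

    return dumps(payload, sort_keys=True)


# The wrapper's own name is what name-based tool filtering would see.
print(dumps_tool.__name__)  # → dumps_tool
print(asyncio.run(dumps_tool({"b": 1, "a": 2})))  # → {"a": 2, "b": 1}
```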
sql_analyst_agent = Agent(
name="SQL Analyst",
role=(
"Você recebe a SQL original e o banco alvo e produz a descrição natural detalhada. "
"A saída deve seguir exatamente a prompt original (SQL → natural) do projeto oracle-sql-query-optimizer."
),
model=base_model,
tools=[explain_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True,
add_history_to_context=True,
db=db,
enable_agentic_memory=True,
enable_user_memories=True,
debug_mode=_debug_mode,
instructions=[
"- Solicite banco e SQL se não estiverem presentes. Bancos suportados: use supported_databases().",
"- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
"- Preferência: use explain_query_tool(database_type, sql) para gerar a explicação via core de negócio.",
"- Use a template oficial abaixo para estruturar a explicação (SQL → natural).",
SQL_TO_NATURAL_PROMPT,
"- Entregue apenas a explicação natural estruturada conforme a prompt; não reescreva a SQL.",
"- Identifique problemas críticos de performance conforme a prompt.",
],


@ -1,37 +1,13 @@
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from sql_optimizer_team.tools.engine.model_selector import get_model
from sql_optimizer_team.tools.core_tools import optimize_query_core
from sql_optimizer_team.tools.prompt_tools import supported_databases
from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
import os
base_model = get_model()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path)
sql_optimizer_agent = Agent(
name="SQL Optimizer",
role=(
"Você executa a otimização completa usando o core de negócio do projeto, "
"mantendo 100% da lógica e entregando apenas a SQL otimizada."
),
model=base_model,
tools=[optimize_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True,
add_history_to_context=True,
db=db,
enable_agentic_memory=True,
enable_user_memories=True,
debug_mode=_debug_mode,
instructions=[
"- Exija banco alvo e SQL antes de otimizar.",
"- Use optimize_query_core(database_type, sql) para executar o core de negócio.",
"- Use a template oficial abaixo para reescrever (natural → SQL) mantendo 100% da lógica.",
"""
NATURAL_TO_SQL_PROMPT = """
You are an expert $database_name SQL developer and query performance specialist.
Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
@ -90,7 +66,58 @@ sql_optimizer_agent = Agent(
- Do NOT include explanations, comments, or extra text.
Optimized SQL Query:
""".strip(),
""".strip()
_db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
_debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
db = SqliteDb(db_file=_db_path)
async def optimize_query_tool(
database_type: str,
sql: str,
provider: str | None = None,
model: str | None = None,
temperature: float | None = None,
max_tokens: int | None = None,
api_key: str | None = None,
output_dir: str | None = None,
no_review: bool = False,
) -> dict[str, str | dict[str, str]]:
from sql_optimizer_team.tools.core_tools import optimize_query_core
return await optimize_query_core(
database_type=database_type,
sql=sql,
provider=provider,
model=model,
temperature=temperature,
max_tokens=max_tokens,
api_key=api_key,
output_dir=output_dir,
no_review=no_review,
)
sql_optimizer_agent = Agent(
name="SQL Optimizer",
role=(
"Você executa a otimização completa usando o core de negócio do projeto, "
"mantendo 100% da lógica e entregando apenas a SQL otimizada."
),
model=base_model,
tools=[optimize_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
markdown=True,
add_history_to_context=True,
db=db,
enable_agentic_memory=True,
enable_user_memories=True,
debug_mode=_debug_mode,
instructions=[
"- Exija banco alvo e SQL antes de otimizar.",
"- Use optimize_query_tool(database_type, sql) para executar o core de negócio.",
"- Use a template oficial abaixo para reescrever (natural → SQL) mantendo 100% da lógica.",
NATURAL_TO_SQL_PROMPT,
"- Extraia e devolva SOMENTE optimized_query (sem explicações, sem markdown).",
"- Preserve 100% da lógica, colunas, aliases, filtros, joins e subqueries.",
],


@ -0,0 +1,5 @@
"""Internal knowledge base helpers."""
from sql_optimizer_team.knowledge.internal_kb import build_internal_knowledge, attach_internal_knowledge
__all__ = ["build_internal_knowledge", "attach_internal_knowledge"]


@ -0,0 +1,100 @@
"""Internal KB (RAG) setup for the SQL optimizer team."""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
import os
from agno.db.sqlite import SqliteDb
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.embedder.sentence_transformer import SentenceTransformerEmbedder
from agno.vectordb.chroma import ChromaDb
from sql_optimizer_team.tools.engine.config.logger import get_logger
logger = get_logger(__name__)
@dataclass(frozen=True)
class InternalKBConfig:
kb_path: Path
chroma_path: Path
embedder_id: str
contents_db_file: Path
block_external: bool
def _load_config() -> InternalKBConfig:
kb_path = Path(os.getenv("SQL_OPT_KB_PATH", "kb")).resolve()
chroma_path = Path(os.getenv("SQL_OPT_KB_CHROMA_PATH", "tmp/kb_chroma")).resolve()
embedder_id = os.getenv(
"SQL_OPT_KB_EMBEDDER_ID",
"sentence-transformers/all-MiniLM-L6-v2",
).strip()
contents_db_file = Path(os.getenv("SQL_OPT_KB_DB_FILE", "tmp/sql_optimizer_kb.db")).resolve()
block_external = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
return InternalKBConfig(
kb_path=kb_path,
chroma_path=chroma_path,
embedder_id=embedder_id,
contents_db_file=contents_db_file,
block_external=block_external,
)
def build_internal_knowledge() -> Knowledge:
config = _load_config()
if config.block_external:
logger.info("External tools blocked for KB", kb_path=str(config.kb_path))
embedder = SentenceTransformerEmbedder(id=config.embedder_id)
vector_db = ChromaDb(
name="sql-optimizer-kb",
path=str(config.chroma_path),
persistent_client=True,
embedder=embedder,
)
contents_db = SqliteDb(db_file=str(config.contents_db_file))
knowledge = Knowledge(
name="internal-sql-kb",
description="Base de conhecimento interna para otimização de SQL",
vector_db=vector_db,
contents_db=contents_db,
max_results=6,
)
if not config.kb_path.exists():
logger.warning("KB path not found; skipping ingest", kb_path=str(config.kb_path))
return knowledge
if not config.kb_path.is_dir():
logger.warning("KB path is not a directory; skipping ingest", kb_path=str(config.kb_path))
return knowledge
try:
knowledge.insert(
path=str(config.kb_path),
include=["**/*.md", "**/*.txt", "**/*.sql", "**/*.pdf"],
exclude=["**/.git/**", "**/.venv/**", "**/__pycache__/**"],
upsert=True,
skip_if_exists=True,
)
logger.info("KB ingest complete", kb_path=str(config.kb_path))
except Exception as exc:
logger.error("KB ingest failed", error=str(exc))
return knowledge
def attach_internal_knowledge(knowledge: Knowledge, *agents: object) -> None:
for agent in agents:
try:
setattr(agent, "knowledge", knowledge)
setattr(agent, "add_knowledge_to_context", True)
setattr(agent, "search_knowledge", True)
setattr(agent, "update_knowledge", False)
except Exception as exc:
logger.warning("Failed to attach knowledge", agent=str(agent), error=str(exc))


@ -2,13 +2,12 @@ from agno.team.team import Team
from agno.os.app import AgentOS
from agno.db.sqlite import SqliteDb
from dotenv import load_dotenv
from sql_optimizer_team.knowledge import build_internal_knowledge, attach_internal_knowledge
from sql_optimizer_team.tools.engine.model_selector import get_model
from sql_optimizer_team.agents import (
sql_analyst_agent,
sql_optimizer_agent,
sql_quality_agent,
conservative_analysis_agent,
)
from sql_optimizer_team.agents.sql_analyst_agent import sql_analyst_agent
from sql_optimizer_team.agents.sql_optimizer_agent import sql_optimizer_agent
from sql_optimizer_team.agents.sql_quality_agent import sql_quality_agent
from sql_optimizer_team.agents.conservative_analysis_agent import conservative_analysis_agent
import os
load_dotenv()
@ -20,6 +19,39 @@ _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {
db = SqliteDb(db_file=_db_path)
_kb = build_internal_knowledge()
attach_internal_knowledge(
_kb,
sql_analyst_agent,
sql_optimizer_agent,
sql_quality_agent,
conservative_analysis_agent,
)
_block_external_tools = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
if _block_external_tools:
_allowed_tool_names = {
"explain_query_tool",
"optimize_query_tool",
"load_sql_from_file",
"ensure_non_empty",
"supported_databases",
"diff_sql",
}
def _filter_tools(agent) -> None:
if not getattr(agent, "tools", None):
return
filtered = []
for tool in agent.tools:
name = getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)
if name in _allowed_tool_names:
filtered.append(tool)
agent.tools = filtered
for _agent in [sql_analyst_agent, sql_optimizer_agent, sql_quality_agent, conservative_analysis_agent]:
_filter_tools(_agent)
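
The allowlist filter above can be tried in isolation with stand-in objects. In the sketch below, `SimpleNamespace` plays the role of an agent and of a tool object that exposes `name` instead of `__name__`; `tool_name`, `filter_tools`, and `web_search` are hypothetical names used only for this demonstration.

```python
from types import SimpleNamespace

ALLOWED = {"load_sql_from_file", "ensure_non_empty"}


def tool_name(tool: object) -> str:
    """Resolve a tool's name the same way the filter in this commit does."""
    return getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)


def filter_tools(agent: object, allowed: set[str]) -> None:
    """Keep only the tools whose resolved name is in the allowlist."""
    tools = getattr(agent, "tools", None)
    if not tools:
        return
    agent.tools = [t for t in tools if tool_name(t) in allowed]


def load_sql_from_file(path: str) -> str:  # stand-in for the real tool
    return path


def web_search(query: str) -> str:  # stand-in for an external tool
    return query


agent = SimpleNamespace(
    tools=[load_sql_from_file, web_search, SimpleNamespace(name="ensure_non_empty")]
)
filter_tools(agent, ALLOWED)
print([tool_name(t) for t in agent.tools])  # → ['load_sql_from_file', 'ensure_non_empty']
```

Because matching is purely by name, a tool that is renamed or wrapped under a different name is dropped silently, so the allowlist has to be kept in step with the registered tool names.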
sql_optimizer_team = Team(
name="SQL Optimization Team",
model=base_model,


@ -41,6 +41,7 @@ class AgnoLLMTool(BaseLLMTool):
result_text = self._extract_text(response)
validated = self._validate_response(result_text)
self._log_response(validated)
self._log_usage_from_response(response, prompt, validated)
return validated
except Exception as e:
self._log_error(e)


@ -5,6 +5,8 @@ This module provides a base class with common functionality for all LLM tools.
from abc import ABC
from typing import Any
import math
import os
from sql_optimizer_team.tools.engine.tools_api.llm_tool import LLMTool
from sql_optimizer_team.tools.engine.types.tool_exceptions import LLMProviderError
@ -91,6 +93,49 @@ class BaseLLMTool(LLMTool, ABC):
**kwargs,
)
def _estimate_tokens(self, text: str) -> int:
"""Best-effort token estimate when provider usage is unavailable."""
if not text:
return 0
return max(1, math.ceil(len(text) / 4))
def _log_usage_from_response(self, response_obj: Any, prompt: str, response_text: str) -> None:
"""Log token usage and cost if enabled.
Reads usage from ModelResponse when available, otherwise uses a rough estimate.
Cost is computed using env vars LLM_COST_INPUT_PER_1K and LLM_COST_OUTPUT_PER_1K.
"""
enabled = os.getenv("LLM_LOG_USAGE", "true").strip().lower() in {"1", "true", "yes", "on"}
if not enabled:
return
input_tokens = getattr(response_obj, "input_tokens", None)
output_tokens = getattr(response_obj, "output_tokens", None)
total_tokens = getattr(response_obj, "total_tokens", None)
if input_tokens is None:
input_tokens = self._estimate_tokens(prompt)
if output_tokens is None:
output_tokens = self._estimate_tokens(response_text)
if total_tokens is None and input_tokens is not None and output_tokens is not None:
total_tokens = input_tokens + output_tokens
cost_in = float(os.getenv("LLM_COST_INPUT_PER_1K", "0") or 0)
cost_out = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0") or 0)
cost_usd = None
if input_tokens is not None or output_tokens is not None:
cost_usd = (input_tokens or 0) * cost_in / 1000 + (output_tokens or 0) * cost_out / 1000
logger.info(
"LLM usage",
provider=self.provider_name,
model=self._model_name,
input_tokens=input_tokens,
output_tokens=output_tokens,
total_tokens=total_tokens,
cost_usd=cost_usd,
)
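
The fallback logic in the method above can be exercised on its own. The sketch below reproduces the roughly-four-characters-per-token estimate and the "use provider counts when present" rule with a stand-in response object; `resolve_usage` is a hypothetical helper, and the attribute names are the ones the method reads.

```python
import math
from types import SimpleNamespace


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token, at least one if non-empty."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / 4))


def resolve_usage(response: object, prompt: str, response_text: str) -> tuple[int, int, int]:
    """Prefer provider-reported counts; fall back to estimates when they are missing."""
    input_tokens = getattr(response, "input_tokens", None)
    output_tokens = getattr(response, "output_tokens", None)
    total_tokens = getattr(response, "total_tokens", None)
    if input_tokens is None:
        input_tokens = estimate_tokens(prompt)
    if output_tokens is None:
        output_tokens = estimate_tokens(response_text)
    if total_tokens is None:
        total_tokens = input_tokens + output_tokens
    return input_tokens, output_tokens, total_tokens


# Provider reported only the input count; the output count is estimated from the text.
reported = SimpleNamespace(input_tokens=10, output_tokens=None, total_tokens=None)
print(resolve_usage(reported, "ignored prompt", "abcdefgh"))  # → (10, 2, 12)
```

The character-based estimate is only a coarse approximation and will differ from a real tokenizer, so costs derived from it are best read as order-of-magnitude figures.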
def _log_error(self, error: Exception, **kwargs: Any) -> None:
"""Log LLM error.


@ -7,185 +7,17 @@ reducing code duplication and ensuring consistency.
from abc import ABC, abstractmethod
from string import Template
import importlib
from sql_optimizer_team.tools.engine.tools_api.prompt_tool import PromptGeneratorTool
SQL_TO_NATURAL_TEMPLATE = Template("""
You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
$database_name SQL Query:
```sql
$query
```
Your explanation must follow these requirements:
1. **Describe the overall purpose**
- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
2. **List ALL involved database objects**
Explicitly list every:
- Table
- View
- CTE (Common Table Expression)
- Subquery or derived table
- Function
- Stored procedure, if referenced
- Temporary table
- Schema-qualified object
Use the exact names as they appear in the query.
3. **Describe all essential operations**
Explicitly state, using exact column names:
- Columns retrieved or modified
- Join types, join conditions, and which objects participate
- Filters and conditions (WHERE, boolean logic, comparisons)
- Aggregations (SUM, COUNT, etc.)
- Grouping and HAVING clauses
- Sorting (ORDER BY)
- Window functions
- DISTINCT, TOP, LIMIT, OFFSET, pagination
- Any $database_name-specific features used$specific_features
4. **Maintain strict factual accuracy**
- Do NOT infer business meaning unless directly implied.
- Do NOT rename or paraphrase column names; repeat them exactly.
5. **Use clear, structured natural language**
- Provide a step-by-step explanation that makes every operation and purpose explicit.
- The output must be complete enough that the query can be reconstructed.
6. ** CRITICAL: Identify Performance Issues**
Flag any of these CRITICAL performance problems found in the query:
- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
* ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
* If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
* Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
* CROSS APPLY/LATERAL with internal WHERE counts as filtered
* If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
- **Implicit conversions**: Type mismatches in comparisons
- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
$analysis_requirements
Explanation:
""")
NATURAL_TO_SQL_TEMPLATE = Template("""
You are an expert $database_name SQL developer and query performance specialist.
Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
Description:
$explanation
CRITICAL RULES - READ BEFORE GENERATING SQL:
1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
- Every CASE WHEN statement must have IDENTICAL conditions and results
- Every calculated column must use IDENTICAL formulas
- Every subquery must query the SAME tables with SAME filters
- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
2. **PRESERVE ALL TABLES AND COLUMNS**
- Include EVERY table mentioned in the description
- Include EVERY column mentioned in the description
- Use EXACT column names as described (no renaming)
- Use EXACT table aliases as described
3. **Translate the full described logic into SQL**
- Implement all actions, operations, filters, joins, and conditions exactly as stated.
- Use every object and column referenced in the description, using their exact names.
- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
4. **Write optimized SQL while preserving semantics**
- Apply $database_name best practices for performance.
- Use indexing-aware filtering, efficient join strategies, and clear expressions.
- Implement aggregations, groupings, window functions, or pagination when described.
- Prefer performant constructs commonly recommended for $database_name workloads.
- OPTIMIZATION means structure/hints/indexes - NOT changing logic
5. **Use $database_name-specific syntax and features**
- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
- Incorporate $specific_requirements if provided.
6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
- The SQL must reflect PRECISELY the behavior described
- Do NOT add logic not explicitly stated
- Do NOT omit any step described
- Do NOT infer or assume details beyond what is explicitly stated
- Do NOT "simplify" complex CASE statements
- Do NOT merge or combine separate calculated columns
7. **Self-Verification Checklist** (perform before outputting):
- [ ] All tables from description are present in query
- [ ] All columns from description are present in SELECT
- [ ] All CASE conditions match description exactly
- [ ] All subquery filters match description exactly
- [ ] All JOIN conditions match description exactly
- [ ] No business logic was simplified or changed
8. **Output format**
- Provide ONLY the final, optimized SQL query.
- Do NOT include explanations, comments, or extra text.
Optimized SQL Query:
""")
CONSERVATIVE_ANALYSIS_TEMPLATE = Template("""
You are an expert $database_name database analyst and performance specialist.
Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
$database_name SQL Query:
```sql
$query
```
Query Complexity Information:
- Columns: $column_count
- Tables: $table_count
- Subqueries: $subquery_count
- CASE statements: $case_count
- JOINs: $join_count
- Complexity Level: $complexity_level
Provide your analysis in the following structured format:
## PERFORMANCE ISSUES
List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
- [SEVERITY] Issue description
- [SEVERITY] Issue description
## SUGGESTED INDEXES
List indexes that could improve this query:
- CREATE INDEX idx_name ON table(columns) -- Reason
## OPTIMIZATION SUGGESTIONS
List specific suggestions WITHOUT rewriting the query:
- Suggestion 1: Description of what could be improved and why
- Suggestion 2: Description of what could be improved and why
## RISK ASSESSMENT
- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
- Missing WHERE clause: [Yes/No] - If yes, explain the impact
- Implicit conversions: [Yes/No] - If yes, list them
## SUMMARY
Brief summary of the most important findings and priority order for addressing them.
Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
""")
def _render_sql_to_natural(
database_name: str, query: str, specific_features: str = "", analysis_requirements: str = ""
) -> str:
return SQL_TO_NATURAL_TEMPLATE.substitute(
module = importlib.import_module("sql_optimizer_team.agents.sql_analyst_agent")
template_text = getattr(module, "SQL_TO_NATURAL_PROMPT")
return Template(template_text).substitute(
database_name=database_name,
query=query,
specific_features=f"\n{specific_features}" if specific_features else "",
@ -196,7 +28,9 @@ def _render_sql_to_natural(
def _render_natural_to_sql(
database_name: str, explanation: str, specific_requirements: str
) -> str:
return NATURAL_TO_SQL_TEMPLATE.substitute(
module = importlib.import_module("sql_optimizer_team.agents.sql_optimizer_agent")
template_text = getattr(module, "NATURAL_TO_SQL_PROMPT")
return Template(template_text).substitute(
database_name=database_name,
explanation=explanation,
specific_requirements="\n".join(
@ -215,7 +49,9 @@ def _render_conservative_analysis(
join_count: int = 0,
complexity_level: str = "unknown",
) -> str:
return CONSERVATIVE_ANALYSIS_TEMPLATE.substitute(
module = importlib.import_module("sql_optimizer_team.agents.conservative_analysis_agent")
template_text = getattr(module, "CONSERVATIVE_ANALYSIS_PROMPT")
return Template(template_text).substitute(
database_name=database_name,
query=query,
column_count=column_count,