feat: Enhance SQL optimization tools with internal knowledge base and observability features

- Updated README.md to include new setup instructions for RAG and observability.
- Added internal knowledge base (KB) setup for SQL optimization team, supporting various document types.
- Implemented token usage logging in LLM tools to track costs and usage.
- Refactored SQL analysis and optimization prompts for clarity and consistency.
- Introduced filtering of external tools based on environment configuration.
- Enhanced conservative analysis agent with structured prompt for performance suggestions.
- Updated requirements.txt to include new dependencies for RAG functionality.
- Added internal KB helpers for building and attaching knowledge to agents.
2026-01-23 13:02:17 -03:00 · 2026-01-23 13:02:17 -03:00 · 80d1f9d26a
commit 80d1f9d26a
parent c6dd91810b
14 changed files with 502 additions and 355 deletions
--- a/README.md
+++ b/README.md
@@ -25,15 +25,31 @@ src/
 1) Create the environment and install dependencies:
   - `pip install -r requirements.txt`
-2) Configure the environment variables (example in `sample.env`).
+2) Configure the environment variables (example in `sample.env` or `.env`).
 3) Run the server:
-    - `PYTHONPATH=src python -m main`
+   - `./scripts/start.sh`
 Access:
 - `http://localhost:8204/docs` (Swagger UI)
 - `http://localhost:8204` (basic API information)
 ## Local UI (Agent UI)
 Use the **Agent UI** (agno-agi/agent-ui) as the local front end:
 1) Install it with the official script:
 - `npx create-agent-ui@latest`
 2) Start the UI:
 - `pnpm dev`
 3) Open `http://localhost:3000` and set the endpoint to `http://localhost:8204`.
 Optional: if AgentOS uses authentication, configure `OS_SECURITY_KEY` as described in the Agent UI README.
 ## Team flow
 1) **Manager** receives the request and validates the context (database + SQL).
@@ -43,7 +59,23 @@ Access:
 5) **Conservative Analyst** (if requested) produces an analysis without rewriting the query.
 6) **Manager** consolidates and delivers.
 ## RAG (internal KB)
 - Place documents in `kb/` (md/txt/sql/pdf).
 - The local RAG uses Chroma + SentenceTransformers.
 - Main variables:
  - `SQL_OPT_KB_PATH`, `SQL_OPT_KB_CHROMA_PATH`, `SQL_OPT_KB_DB_FILE`
  - `SQL_OPT_KB_EMBEDDER_ID`
  - `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true` blocks external tools.
 ## Token/cost observability
 - Enable with `LLM_LOG_USAGE=true`.
 - Set prices (USD per 1K tokens) with:
  - `LLM_COST_INPUT_PER_1K`
  - `LLM_COST_OUTPUT_PER_1K`
 ## Notes
-- Use the model configured in environment variables (e.g., OpenAI, Gemini, Groq, etc.).
+- Use the provider configured in `.env` (e.g., local Ollama, OpenAI, Gemini, Groq, etc.).
 - The team is collaborative and keeps its history in SQLite (configurable via env).
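Review note on the token/cost observability section of the README: the sketch below shows one plausible way the two price variables could turn into a per-call cost, assuming the plain reading of "USD per 1K tokens". The name `estimate_cost_usd` and the sample prices are hypothetical; the 4-characters-per-token fallback mirrors `_estimate_tokens` from `base_tool.py` in this commit.

```python
import math
import os


def estimate_tokens(text: str) -> int:
    """Rough fallback: about 4 characters per token (same heuristic as the commit)."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / 4))


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: cost from per-1K-token prices read from the environment."""
    price_in = float(os.getenv("LLM_COST_INPUT_PER_1K", "0") or 0)
    price_out = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0") or 0)
    return (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out


if __name__ == "__main__":
    # Illustrative prices only; real values depend on the provider.
    os.environ["LLM_COST_INPUT_PER_1K"] = "0.5"
    os.environ["LLM_COST_OUTPUT_PER_1K"] = "1.5"
    prompt, answer = "SELECT 1 " * 100, "ok " * 50
    print(estimate_cost_usd(estimate_tokens(prompt), estimate_tokens(answer)))
```

With both prices left at `0`, as in `sample.env`, the computed cost is always zero, so only token counts would be meaningful in the logs.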
--- a/docs/proposta-arquitetura-agno.md
+++ b/docs/proposta-arquitetura-agno.md
@@ -124,6 +124,11 @@ Recommended off-the-shelf tools:
 - **Langfuse** or **Phoenix** for tracing prompts, costs, and latency.
 - **Grafana/Prometheus** for executive dashboards.
 Status in the POC:
 - **Token/cost logging** is already implemented via `LLM_LOG_USAGE` and per-1K-token costs.
 - Persistent metrics and dashboards (Grafana/Prometheus) remain future work.
 Minimum metrics:
 - Tokens per request and per area.
@@ -150,6 +155,11 @@ Minimum metrics:
 - Continuous curation with feedback from the teams to improve relevance.
 - **Higher precision**: answers consistent with internal policies and technical standards.
 Status in the POC:
 - **Local RAG** over an internal base in `kb/` using Chroma + SentenceTransformers.
 - **External tools blocked** by default via `SQL_OPT_BLOCK_EXTERNAL_TOOLS=true`.
 ## 10) Definitive stack (100% Agno)
 - **Agno** as the single framework for orchestration, memory, and tools.
--- a/kb/README.md
+++ b/kb/README.md
@@ -0,0 +1,18 @@
 # Internal Knowledge Base (KB)
 Place here the internal documents that should be used by the RAG.
 Supported (by default):
 - Markdown (.md)
 - Text (.txt)
 - SQL (.sql)
 - PDF (.pdf)
 Configuration via environment:
 - SQL_OPT_KB_PATH (default: kb)
 - SQL_OPT_KB_CHROMA_PATH (default: tmp/kb_chroma)
 - SQL_OPT_KB_EMBEDDER_ID (default: sentence-transformers/all-MiniLM-L6-v2)
 - SQL_OPT_KB_DB_FILE (default: tmp/sql_optimizer_kb.db)
 - SQL_OPT_BLOCK_EXTERNAL_TOOLS (default: true)
--- a/requirements.txt
+++ b/requirements.txt
@@ -30,3 +30,7 @@ oracledb==3.4.1
 pymssql==2.3.11
 sqlparse==0.5.5
 sqlglot==28.6.0
 # RAG (local KB)
 chromadb==0.6.3
 sentence-transformers==3.4.1
--- a/sample.env
+++ b/sample.env
@@ -15,3 +15,15 @@
 # SQL Optimizer Team
 SQL_OPT_TEAM_DB_FILE=tmp/sql_optimizer_team.db
 SQL_OPT_TEAM_DEBUG_MODE=false
 # Token/cost observability
 LLM_LOG_USAGE=true
 LLM_COST_INPUT_PER_1K=0
 LLM_COST_OUTPUT_PER_1K=0
 # RAG / internal KB
 SQL_OPT_KB_PATH=kb
 SQL_OPT_KB_CHROMA_PATH=tmp/kb_chroma
 SQL_OPT_KB_DB_FILE=tmp/sql_optimizer_kb.db
 SQL_OPT_KB_EMBEDDER_ID=sentence-transformers/all-MiniLM-L6-v2
 SQL_OPT_BLOCK_EXTERNAL_TOOLS=true
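Review note on the boolean variables in `sample.env`: every module touched by this commit parses them with the same inline expression (`.strip().lower() in {"1", "true", "yes", "on"}`). A minimal sketch of that parsing as a single helper is shown below; `env_flag` is a hypothetical name, not something the commit defines.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: str = "false") -> bool:
    """Hypothetical helper: read a boolean flag the way the code in this commit does."""
    # Whitespace and letter case are ignored; anything outside _TRUTHY counts as False.
    return os.getenv(name, default).strip().lower() in _TRUTHY


if __name__ == "__main__":
    print(env_flag("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true"))
    print(env_flag("SQL_OPT_TEAM_DEBUG_MODE"))
```

Note that the defaults differ per variable in the diff: `SQL_OPT_BLOCK_EXTERNAL_TOOLS` and `LLM_LOG_USAGE` default to `"true"` when unset, while `SQL_OPT_TEAM_DEBUG_MODE` defaults to `"false"`.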
--- a/src/sql_optimizer_team/agents/conservative_analysis_agent.py
+++ b/src/sql_optimizer_team/agents/conservative_analysis_agent.py
@@ -7,6 +7,53 @@ import os
 base_model = get_model()
 CONSERVATIVE_ANALYSIS_PROMPT = """
 You are an expert $database_name database analyst and performance specialist.
 Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
 ⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
 $database_name SQL Query:
 ```sql
 $query
 ```
 Query Complexity Information:
 - Columns: $column_count
 - Tables: $table_count
 - Subqueries: $subquery_count
 - CASE statements: $case_count
 - JOINs: $join_count
 - Complexity Level: $complexity_level
 Provide your analysis in the following structured format:
 ## PERFORMANCE ISSUES
 List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
 - [SEVERITY] Issue description
 - [SEVERITY] Issue description
 ## SUGGESTED INDEXES
 List indexes that could improve this query:
 - CREATE INDEX idx_name ON table(columns) -- Reason
 ## OPTIMIZATION SUGGESTIONS
 List specific suggestions WITHOUT rewriting the query:
 - Suggestion 1: Description of what could be improved and why
 - Suggestion 2: Description of what could be improved and why
 ## RISK ASSESSMENT
 - WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
 - Missing WHERE clause: [Yes/No] - If yes, explain the impact
 - Implicit conversions: [Yes/No] - If yes, list them
 ## SUMMARY
 Brief summary of the most important findings and priority order for addressing them.
 Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
 """.strip()
 _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
 _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
@@ -30,52 +77,7 @@ conservative_analysis_agent = Agent(
        "- Solicite banco e SQL se não estiverem presentes.",
        "- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
        "- Use a template oficial abaixo para a análise conservadora (sem reescrever a SQL).",
-        """
-        You are an expert $database_name database analyst and performance specialist.
-        Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
-        ⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
-        $database_name SQL Query:
-        ```sql
-        $query
-        ```
-        Query Complexity Information:
-        - Columns: $column_count
-        - Tables: $table_count
-        - Subqueries: $subquery_count
-        - CASE statements: $case_count
-        - JOINs: $join_count
-        - Complexity Level: $complexity_level
-        Provide your analysis in the following structured format:
-        ## PERFORMANCE ISSUES
-        List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
-        - [SEVERITY] Issue description
-        - [SEVERITY] Issue description
-        ## SUGGESTED INDEXES
-        List indexes that could improve this query:
-        - CREATE INDEX idx_name ON table(columns) -- Reason
-        ## OPTIMIZATION SUGGESTIONS
-        List specific suggestions WITHOUT rewriting the query:
-        - Suggestion 1: Description of what could be improved and why
-        - Suggestion 2: Description of what could be improved and why
-        ## RISK ASSESSMENT
-        - WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
-        - Missing WHERE clause: [Yes/No] - If yes, explain the impact
-        - Implicit conversions: [Yes/No] - If yes, list them
-        ## SUMMARY
-        Brief summary of the most important findings and priority order for addressing them.
-        Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
-        """.strip(),
+        CONSERVATIVE_ANALYSIS_PROMPT,
        "- NÃO reescreva a SQL em hipótese alguma.",
    ],
 )
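Review note on the `$database_name`, `$query`, and `$..._count` placeholders in `CONSERVATIVE_ANALYSIS_PROMPT`: they follow the syntax of Python's `string.Template`. The diff does not show where or how the substitution happens, so the sketch below is only an assumption about how such a template could be rendered; `PROMPT` is a shortened stand-in and `render_prompt` is a hypothetical helper.

```python
from string import Template

# Shortened stand-in for CONSERVATIVE_ANALYSIS_PROMPT; only the placeholders matter here.
PROMPT = Template(
    "You are an expert $database_name database analyst.\n"
    "$database_name SQL Query:\n"
    "$query\n"
    "- JOINs: $join_count\n"
    "- Complexity Level: $complexity_level"
)


def render_prompt(**values: object) -> str:
    """Fill in the known placeholders and leave unknown ones untouched."""
    # safe_substitute does not raise KeyError when a placeholder has no value.
    return PROMPT.safe_substitute(**values)


if __name__ == "__main__":
    print(render_prompt(database_name="SQL Server", query="SELECT 1", join_count=0))
```

Because the prompt is passed to the agent as a plain instruction string, any placeholder that is never substituted would reach the model literally (for example `$complexity_level`), which is worth checking when reviewing this change.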
--- a/src/sql_optimizer_team/agents/sql_analyst_agent.py
+++ b/src/sql_optimizer_team/agents/sql_analyst_agent.py
@@ -1,18 +1,101 @@
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from sql_optimizer_team.tools.engine.model_selector import get_model
 from sql_optimizer_team.tools.core_tools import explain_query_core
 from sql_optimizer_team.tools.prompt_tools import supported_databases
 from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
 import os
 base_model = get_model()
 SQL_TO_NATURAL_PROMPT = """
 You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
 $database_name SQL Query:
 ```sql
 $query
 ```
 Your explanation must follow these requirements:
 1. **Describe the overall purpose**
 - Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
 2. **List ALL involved database objects**
 Explicitly list every:
 - Table
 - View
 - CTE (Common Table Expression)
 - Subquery or derived table
 - Function
 - Stored procedure, if referenced
 - Temporary table
 - Schema-qualified object
 Use the exact names as they appear in the query.
 3. **Describe all essential operations**
 Explicitly state, using exact column names:
 - Columns retrieved or modified
 - Join types, join conditions, and which objects participate
 - Filters and conditions (WHERE, boolean logic, comparisons)
 - Aggregations (SUM, COUNT, etc.)
 - Grouping and HAVING clauses
 - Sorting (ORDER BY)
 - Window functions
 - DISTINCT, TOP, LIMIT, OFFSET, pagination
 - Any $database_name-specific features used$specific_features
 4. **Maintain strict factual accuracy**
 - Do NOT infer business meaning unless directly implied.
 - Do NOT rename or paraphrase column names; repeat them exactly.
 5. **Use clear, structured natural language**
 - Provide a step-by-step explanation that makes every operation and purpose explicit.
 - The output must be complete enough that the query can be reconstructed.
 6. **⚠️ CRITICAL: Identify Performance Issues**
 Flag any of these CRITICAL performance problems found in the query:
 - **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
    * ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
    * If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
    * Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
    * CROSS APPLY/LATERAL with internal WHERE counts as filtered
    * If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
 - **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
 - **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
 - **Implicit conversions**: Type mismatches in comparisons
 - **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
 $analysis_requirements
 Explanation:
 """.strip()
 _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
 _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
 db = SqliteDb(db_file=_db_path)
 async def explain_query_tool(
    database_type: str,
    sql: str,
    provider: str | None = None,
    model: str | None = None,
    temperature: float | None = None,
    max_tokens: int | None = None,
    api_key: str | None = None,
 ) -> dict[str, str]:
    from sql_optimizer_team.tools.core_tools import explain_query_core
    return await explain_query_core(
        database_type=database_type,
        sql=sql,
        provider=provider,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
    )
 sql_analyst_agent = Agent(
    name="SQL Analyst",
    role=(
@@ -20,7 +103,7 @@ sql_analyst_agent = Agent(
        "A saída deve seguir exatamente a prompt original (SQL → natural) do projeto oracle-sql-query-optimizer."
    ),
    model=base_model,
-    tools=[explain_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
+    tools=[explain_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
    markdown=True,
    add_history_to_context=True,
    db=db,
@@ -32,67 +115,7 @@ sql_analyst_agent = Agent(
        "- Se o usuário fornecer um caminho de arquivo, use load_sql_from_file().",
        "- Preferência: use explain_query_core(database_type, sql) para gerar a explicação via core de negócio.",
                "- Use a template oficial abaixo para estruturar a explicação (SQL → natural).",
-                """
-                You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
-                $database_name SQL Query:
-                ```sql
-                $query
-                ```
-                Your explanation must follow these requirements:
-                1. **Describe the overall purpose**
-                - Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
-                2. **List ALL involved database objects**
-                Explicitly list every:
-                - Table
-                - View
-                - CTE (Common Table Expression)
-                - Subquery or derived table
-                - Function
-                - Stored procedure, if referenced
-                - Temporary table
-                - Schema-qualified object
-                Use the exact names as they appear in the query.
-                3. **Describe all essential operations**
-                Explicitly state, using exact column names:
-                - Columns retrieved or modified
-                - Join types, join conditions, and which objects participate
-                - Filters and conditions (WHERE, boolean logic, comparisons)
-                - Aggregations (SUM, COUNT, etc.)
-                - Grouping and HAVING clauses
-                - Sorting (ORDER BY)
-                - Window functions
-                - DISTINCT, TOP, LIMIT, OFFSET, pagination
-                - Any $database_name-specific features used$specific_features
-                4. **Maintain strict factual accuracy**
-                - Do NOT infer business meaning unless directly implied.
-                - Do NOT rename or paraphrase column names; repeat them exactly.
-                5. **Use clear, structured natural language**
-                - Provide a step-by-step explanation that makes every operation and purpose explicit.
-                - The output must be complete enough that the query can be reconstructed.
-                6. **⚠️ CRITICAL: Identify Performance Issues**
-                Flag any of these CRITICAL performance problems found in the query:
-                - **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
-                    * ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
-                    * If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
-                    * Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
-                    * CROSS APPLY/LATERAL with internal WHERE counts as filtered
-                    * If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
-                - **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
-                - **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
-                - **Implicit conversions**: Type mismatches in comparisons
-                - **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
-                $analysis_requirements
-                Explanation:
-                """.strip(),
+                SQL_TO_NATURAL_PROMPT,
        "- Entregue apenas a explicação natural estruturada conforme a prompt; não reescreva a SQL.",
        "- Identifique problemas críticos de performance conforme a prompt.",
    ],
--- a/src/sql_optimizer_team/agents/sql_optimizer_agent.py
+++ b/src/sql_optimizer_team/agents/sql_optimizer_agent.py
@@ -1,18 +1,104 @@
 from agno.agent import Agent
 from agno.db.sqlite import SqliteDb
 from sql_optimizer_team.tools.engine.model_selector import get_model
 from sql_optimizer_team.tools.core_tools import optimize_query_core
 from sql_optimizer_team.tools.prompt_tools import supported_databases
 from sql_optimizer_team.tools.sql_tools import load_sql_from_file, ensure_non_empty
 import os
 base_model = get_model()
 NATURAL_TO_SQL_PROMPT = """
 You are an expert $database_name SQL developer and query performance specialist.
 Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
 Description:
 $explanation
 ⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
 1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
 - Every CASE WHEN statement must have IDENTICAL conditions and results
 - Every calculated column must use IDENTICAL formulas
 - Every subquery must query the SAME tables with SAME filters
 - Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
 - If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
 2. **PRESERVE ALL TABLES AND COLUMNS**
 - Include EVERY table mentioned in the description
 - Include EVERY column mentioned in the description
 - Use EXACT column names as described (no renaming)
 - Use EXACT table aliases as described
 3. **Translate the full described logic into SQL**
 - Implement all actions, operations, filters, joins, and conditions exactly as stated.
 - Use every object and column referenced in the description, using their exact names.
 - If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
 4. **Write optimized SQL while preserving semantics**
 - Apply $database_name best practices for performance.
 - Use indexing-aware filtering, efficient join strategies, and clear expressions.
 - Implement aggregations, groupings, window functions, or pagination when described.
 - Prefer performant constructs commonly recommended for $database_name workloads.
 - OPTIMIZATION means structure/hints/indexes - NOT changing logic
 5. **Use $database_name-specific syntax and features**
 - Apply native functions, operators, optimizer behaviors, or hints when appropriate.
 - Incorporate $specific_requirements if provided.
 6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
 - The SQL must reflect PRECISELY the behavior described
 - Do NOT add logic not explicitly stated
 - Do NOT omit any step described
 - Do NOT infer or assume details beyond what is explicitly stated
 - Do NOT "simplify" complex CASE statements
 - Do NOT merge or combine separate calculated columns
 7. **Self-Verification Checklist** (perform before outputting):
 - [ ] All tables from description are present in query
 - [ ] All columns from description are present in SELECT
 - [ ] All CASE conditions match description exactly
 - [ ] All subquery filters match description exactly
 - [ ] All JOIN conditions match description exactly
 - [ ] No business logic was simplified or changed
 8. **Output format**
 - Provide ONLY the final, optimized SQL query.
 - Do NOT include explanations, comments, or extra text.
 Optimized SQL Query:
 """.strip()
 _db_path = os.getenv("SQL_OPT_TEAM_DB_FILE", "tmp/sql_optimizer_team.db")
 _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes", "on"}
 db = SqliteDb(db_file=_db_path)
 async def optimize_query_tool(
    database_type: str,
    sql: str,
    provider: str | None = None,
    model: str | None = None,
    temperature: float | None = None,
    max_tokens: int | None = None,
    api_key: str | None = None,
    output_dir: str | None = None,
    no_review: bool = False,
 ) -> dict[str, str | dict[str, str]]:
    from sql_optimizer_team.tools.core_tools import optimize_query_core
    return await optimize_query_core(
        database_type=database_type,
        sql=sql,
        provider=provider,
        model=model,
        temperature=temperature,
        max_tokens=max_tokens,
        api_key=api_key,
        output_dir=output_dir,
        no_review=no_review,
    )
 sql_optimizer_agent = Agent(
    name="SQL Optimizer",
    role=(
@@ -20,7 +106,7 @@ sql_optimizer_agent = Agent(
        "mantendo 100% da lógica e entregando apenas a SQL otimizada."
    ),
    model=base_model,
-    tools=[optimize_query_core, load_sql_from_file, ensure_non_empty, supported_databases],
+    tools=[optimize_query_tool, load_sql_from_file, ensure_non_empty, supported_databases],
    markdown=True,
    add_history_to_context=True,
    db=db,
@@ -31,66 +117,7 @@ sql_optimizer_agent = Agent(
        "- Exija banco alvo e SQL antes de otimizar.",
        "- Use optimize_query_core(database_type, sql) para executar o core de negócio.",
        "- Use a template oficial abaixo para reescrever (natural → SQL) mantendo 100% da lógica.",
-        """
-        You are an expert $database_name SQL developer and query performance specialist.
-        Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
-        Description:
-        $explanation
-        ⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
-        1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
-        - Every CASE WHEN statement must have IDENTICAL conditions and results
-        - Every calculated column must use IDENTICAL formulas
-        - Every subquery must query the SAME tables with SAME filters
-        - Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
-        - If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
-        2. **PRESERVE ALL TABLES AND COLUMNS**
-        - Include EVERY table mentioned in the description
-        - Include EVERY column mentioned in the description
-        - Use EXACT column names as described (no renaming)
-        - Use EXACT table aliases as described
-        3. **Translate the full described logic into SQL**
-        - Implement all actions, operations, filters, joins, and conditions exactly as stated.
-        - Use every object and column referenced in the description, using their exact names.
-        - If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
-        4. **Write optimized SQL while preserving semantics**
-        - Apply $database_name best practices for performance.
-        - Use indexing-aware filtering, efficient join strategies, and clear expressions.
-        - Implement aggregations, groupings, window functions, or pagination when described.
-        - Prefer performant constructs commonly recommended for $database_name workloads.
-        - OPTIMIZATION means structure/hints/indexes - NOT changing logic
-        5. **Use $database_name-specific syntax and features**
-        - Apply native functions, operators, optimizer behaviors, or hints when appropriate.
-        - Incorporate $specific_requirements if provided.
-        6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
-        - The SQL must reflect PRECISELY the behavior described
-        - Do NOT add logic not explicitly stated
-        - Do NOT omit any step described
-        - Do NOT infer or assume details beyond what is explicitly stated
-        - Do NOT "simplify" complex CASE statements
-        - Do NOT merge or combine separate calculated columns
-        7. **Self-Verification Checklist** (perform before outputting):
-        - [ ] All tables from description are present in query
-        - [ ] All columns from description are present in SELECT
-        - [ ] All CASE conditions match description exactly
-        - [ ] All subquery filters match description exactly
-        - [ ] All JOIN conditions match description exactly
-        - [ ] No business logic was simplified or changed
-        8. **Output format**
-        - Provide ONLY the final, optimized SQL query.
-        - Do NOT include explanations, comments, or extra text.
-        Optimized SQL Query:
-        """.strip(),
+        NATURAL_TO_SQL_PROMPT,
        "- Extraia e devolva SOMENTE optimized_query (sem explicações, sem markdown).",
        "- Preserve 100% da lógica, colunas, aliases, filtros, joins e subqueries.",
    ],
--- a/src/sql_optimizer_team/knowledge/__init__.py
+++ b/src/sql_optimizer_team/knowledge/__init__.py
@@ -0,0 +1,5 @@
 """Internal knowledge base helpers."""
 from sql_optimizer_team.knowledge.internal_kb import build_internal_knowledge, attach_internal_knowledge
 __all__ = ["build_internal_knowledge", "attach_internal_knowledge"]
--- a/src/sql_optimizer_team/knowledge/internal_kb.py
+++ b/src/sql_optimizer_team/knowledge/internal_kb.py
@@ -0,0 +1,100 @@
 """Internal KB (RAG) setup for the SQL optimizer team."""
 from __future__ import annotations
 from dataclasses import dataclass
 from pathlib import Path
 import os
 from agno.db.sqlite import SqliteDb
 from agno.knowledge.knowledge import Knowledge
 from agno.knowledge.embedder.sentence_transformer import SentenceTransformerEmbedder
 from agno.vectordb.chroma import ChromaDb
 from sql_optimizer_team.tools.engine.config.logger import get_logger
 logger = get_logger(__name__)
 @dataclass(frozen=True)
 class InternalKBConfig:
    kb_path: Path
    chroma_path: Path
    embedder_id: str
    contents_db_file: Path
    block_external: bool
 def _load_config() -> InternalKBConfig:
    kb_path = Path(os.getenv("SQL_OPT_KB_PATH", "kb")).resolve()
    chroma_path = Path(os.getenv("SQL_OPT_KB_CHROMA_PATH", "tmp/kb_chroma")).resolve()
    embedder_id = os.getenv(
        "SQL_OPT_KB_EMBEDDER_ID",
        "sentence-transformers/all-MiniLM-L6-v2",
    ).strip()
    contents_db_file = Path(os.getenv("SQL_OPT_KB_DB_FILE", "tmp/sql_optimizer_kb.db")).resolve()
    block_external = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
    return InternalKBConfig(
        kb_path=kb_path,
        chroma_path=chroma_path,
        embedder_id=embedder_id,
        contents_db_file=contents_db_file,
        block_external=block_external,
    )
 def build_internal_knowledge() -> Knowledge:
    config = _load_config()
    if config.block_external:
        logger.info("External tools blocked for KB", kb_path=str(config.kb_path))
    embedder = SentenceTransformerEmbedder(id=config.embedder_id)
    vector_db = ChromaDb(
        name="sql-optimizer-kb",
        path=str(config.chroma_path),
        persistent_client=True,
        embedder=embedder,
    )
    contents_db = SqliteDb(db_file=str(config.contents_db_file))
    knowledge = Knowledge(
        name="internal-sql-kb",
        description="Base de conhecimento interna para otimização de SQL",
        vector_db=vector_db,
        contents_db=contents_db,
        max_results=6,
    )
    if not config.kb_path.exists():
        logger.warning("KB path not found; skipping ingest", kb_path=str(config.kb_path))
        return knowledge
    if config.block_external and not config.kb_path.is_dir():
        logger.warning("KB path is not a directory; skipping ingest", kb_path=str(config.kb_path))
        return knowledge
    try:
        knowledge.insert(
            path=str(config.kb_path),
            include=["**/*.md", "**/*.txt", "**/*.sql", "**/*.pdf"],
            exclude=["**/.git/**", "**/.venv/**", "**/__pycache__/**"],
            upsert=True,
            skip_if_exists=True,
        )
        logger.info("KB ingest complete", kb_path=str(config.kb_path))
    except Exception as exc:
        logger.error("KB ingest failed", error=str(exc))
    return knowledge
 def attach_internal_knowledge(knowledge: Knowledge, *agents: object) -> None:
    for agent in agents:
        try:
            setattr(agent, "knowledge", knowledge)
            setattr(agent, "add_knowledge_to_context", True)
            setattr(agent, "search_knowledge", True)
            setattr(agent, "update_knowledge", False)
        except Exception as exc:
            logger.warning("Failed to attach knowledge", agent=str(agent), error=str(exc))
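Review note on the ingest step in `build_internal_knowledge`: before running it against a real `kb/` folder, it can help to preview which files the include/exclude patterns are meant to pick up. The sketch below approximates those patterns with the standard library only; it does not call Agno, and whether `Knowledge.insert` resolves its `include`/`exclude` globs exactly this way is an assumption. `list_kb_files` is a hypothetical helper.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".md", ".txt", ".sql", ".pdf"}
EXCLUDED_DIRS = {".git", ".venv", "__pycache__"}


def list_kb_files(kb_path: str) -> list[Path]:
    """Hypothetical preview of the files under kb_path that match the ingest rules."""
    root = Path(kb_path)
    if not root.is_dir():
        # Mirrors the early return in the diff when the KB path is missing or not a directory.
        return []
    selected = []
    for path in sorted(root.rglob("*")):
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in EXCLUDED_DIRS for part in parent_parts):
            continue  # skip anything nested under an excluded directory
        if path.is_file() and path.suffix in ALLOWED_SUFFIXES:
            selected.append(path)
    return selected


if __name__ == "__main__":
    for found in list_kb_files("kb"):
        print(found)
```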
--- a/src/sql_optimizer_team/team_app.py
+++ b/src/sql_optimizer_team/team_app.py
@@ -2,13 +2,12 @@ from agno.team.team import Team
 from agno.os.app import AgentOS
 from agno.db.sqlite import SqliteDb
 from dotenv import load_dotenv
 from sql_optimizer_team.knowledge import build_internal_knowledge, attach_internal_knowledge
 from sql_optimizer_team.tools.engine.model_selector import get_model
-from sql_optimizer_team.agents import (
-    sql_analyst_agent,
-    sql_optimizer_agent,
-    sql_quality_agent,
-    conservative_analysis_agent,
-)
+from sql_optimizer_team.agents.sql_analyst_agent import sql_analyst_agent
+from sql_optimizer_team.agents.sql_optimizer_agent import sql_optimizer_agent
+from sql_optimizer_team.agents.sql_quality_agent import sql_quality_agent
+from sql_optimizer_team.agents.conservative_analysis_agent import conservative_analysis_agent
 import os
 load_dotenv()
@@ -20,6 +19,39 @@ _debug_mode = os.getenv("SQL_OPT_TEAM_DEBUG_MODE", "false").strip().lower() in {
 db = SqliteDb(db_file=_db_path)
+_kb = build_internal_knowledge()
+attach_internal_knowledge(
+    _kb,
+    sql_analyst_agent,
+    sql_optimizer_agent,
+    sql_quality_agent,
+    conservative_analysis_agent,
+)
+_block_external_tools = os.getenv("SQL_OPT_BLOCK_EXTERNAL_TOOLS", "true").strip().lower() in {"1", "true", "yes", "on"}
+if _block_external_tools:
+    _allowed_tool_names = {
+        "explain_query_tool",
+        "optimize_query_tool",
+        "load_sql_from_file",
+        "ensure_non_empty",
+        "supported_databases",
+        "diff_sql",
+    }
+    def _filter_tools(agent) -> None:
+        if not getattr(agent, "tools", None):
+            return
+        filtered = []
+        for tool in agent.tools:
+            name = getattr(tool, "__name__", None) or getattr(tool, "name", None) or str(tool)
+            if name in _allowed_tool_names:
+                filtered.append(tool)
+        agent.tools = filtered
+    for _agent in [sql_analyst_agent, sql_optimizer_agent, sql_quality_agent, conservative_analysis_agent]:
+        _filter_tools(_agent)
 sql_optimizer_team = Team(
    name="SQL Optimization Team",
    model=base_model,
--- a/src/sql_optimizer_team/tools/engine/llm_tools/agno_tool.py
+++ b/src/sql_optimizer_team/tools/engine/llm_tools/agno_tool.py
@@ -41,6 +41,7 @@ class AgnoLLMTool(BaseLLMTool):
            result_text = self._extract_text(response)
            validated = self._validate_response(result_text)
            self._log_response(validated)
+            self._log_usage_from_response(response, prompt, validated)
            return validated
        except Exception as e:
            self._log_error(e)
--- a/src/sql_optimizer_team/tools/engine/llm_tools/base_tool.py
+++ b/src/sql_optimizer_team/tools/engine/llm_tools/base_tool.py
@@ -5,6 +5,8 @@ This module provides a base class with common functionality for all LLM tools.
 from abc import ABC
 from typing import Any
+import math
+import os
 from sql_optimizer_team.tools.engine.tools_api.llm_tool import LLMTool
 from sql_optimizer_team.tools.engine.types.tool_exceptions import LLMProviderError
@@ -91,6 +93,49 @@ class BaseLLMTool(LLMTool, ABC):
            **kwargs,
        )
+    def _estimate_tokens(self, text: str) -> int:
+        """Best-effort token estimate when provider usage is unavailable."""
+        if not text:
+            return 0
+        return max(1, math.ceil(len(text) / 4))
+    def _log_usage_from_response(self, response_obj: Any, prompt: str, response_text: str) -> None:
+        """Log token usage and cost if enabled.
+        Reads usage from ModelResponse when available, otherwise uses a rough estimate.
+        Cost is computed using env vars LLM_COST_INPUT_PER_1K and LLM_COST_OUTPUT_PER_1K.
+        """
+        enabled = os.getenv("LLM_LOG_USAGE", "true").strip().lower() in {"1", "true", "yes", "on"}
+        if not enabled:
+            return
+        input_tokens = getattr(response_obj, "input_tokens", None)
+        output_tokens = getattr(response_obj, "output_tokens", None)
+        total_tokens = getattr(response_obj, "total_tokens", None)
+        if input_tokens is None:
+            input_tokens = self._estimate_tokens(prompt)
+        if output_tokens is None:
+            output_tokens = self._estimate_tokens(response_text)
+        if total_tokens is None and input_tokens is not None and output_tokens is not None:
+            total_tokens = input_tokens + output_tokens
+        cost_in = float(os.getenv("LLM_COST_INPUT_PER_1K", "0") or 0)
+        cost_out = float(os.getenv("LLM_COST_OUTPUT_PER_1K", "0") or 0)
+        cost_usd = None
+        if input_tokens is not None or output_tokens is not None:
+            cost_usd = (input_tokens or 0) * cost_in / 1000 + (output_tokens or 0) * cost_out / 1000
+        logger.info(
+            "LLM usage",
+            provider=self.provider_name,
+            model=self._model_name,
+            input_tokens=input_tokens,
+            output_tokens=output_tokens,
+            total_tokens=total_tokens,
+            cost_usd=cost_usd,
+        )
    def _log_error(self, error: Exception, **kwargs: Any) -> None:
        """Log LLM error.
--- a/src/sql_optimizer_team/tools/engine/prompt_tools/base_generator.py
+++ b/src/sql_optimizer_team/tools/engine/prompt_tools/base_generator.py
@@ -7,185 +7,17 @@ reducing code duplication and ensuring consistency.
 from abc import ABC, abstractmethod
 from string import Template
+import importlib
 from sql_optimizer_team.tools.engine.tools_api.prompt_tool import PromptGeneratorTool
 SQL_TO_NATURAL_TEMPLATE = Template("""
 	You are an expert $database_name database analyst and performance specialist. Your task is to translate the SQL query below into a detailed, precise natural-language description that another agent will later use to reconstruct and optimize the query.
 	$database_name SQL Query:
 	```sql
 	$query
 	```
 	Your explanation must follow these requirements:
 	1. **Describe the overall purpose**
 	- Explain clearly what the query is intended to accomplish and why (retrieve data, update rows, aggregate information, validate existence, create structures, etc.).
 	2. **List ALL involved database objects**
 	Explicitly list every:
 	- Table
 	- View
 	- CTE (Common Table Expression)
 	- Subquery or derived table
 	- Function
 	- Stored procedure, if referenced
 	- Temporary table
 	- Schema-qualified object
 	Use the exact names as they appear in the query.
 	3. **Describe all essential operations**
 	Explicitly state, using exact column names:
 	- Columns retrieved or modified
 	- Join types, join conditions, and which objects participate
 	- Filters and conditions (WHERE, boolean logic, comparisons)
 	- Aggregations (SUM, COUNT, etc.)
 	- Grouping and HAVING clauses
 	- Sorting (ORDER BY)
 	- Window functions
 	- DISTINCT, TOP, LIMIT, OFFSET, pagination
 	- Any $database_name-specific features used$specific_features
 	4. **Maintain strict factual accuracy**
 	- Do NOT infer business meaning unless directly implied.
 	- Do NOT rename or paraphrase column names; repeat them exactly.
 	5. **Use clear, structured natural language**
 	- Provide a step-by-step explanation that makes every operation and purpose explicit.
 	- The output must be complete enough that the query can be reconstructed.
 	6. **⚠️ CRITICAL: Identify Performance Issues**
 	Flag any of these CRITICAL performance problems found in the query:
 	- **NO WHERE CLAUSE** (BE CAREFUL - AVOID FALSE POSITIVES):
 	  * ONLY flag if the MAIN/OUTER SELECT has absolutely NO WHERE keyword with filtering conditions
 	  * If query HAS 'WHERE' followed by conditions (even old-style JOINs in WHERE), DO NOT flag
 	  * Subqueries/EXISTS having WHERE does NOT mean main query has no WHERE
 	  * CROSS APPLY/LATERAL with internal WHERE counts as filtered
 	  * If truly no WHERE: Flag as CRITICAL (causes FULL TABLE SCAN, no predicate pushdown)
 	- **Non-SARGable patterns**: Functions on indexed columns in WHERE/JOIN (e.g., YEAR(date), UPPER(col))
 	- **Leading wildcards**: LIKE '%value%' patterns that prevent index usage
 	- **Implicit conversions**: Type mismatches in comparisons
 	- **NOLOCK/WITH (NOLOCK) hints**: If query uses WITH (NOLOCK), WITH (nolock), WITH(NOLOCK), (NOLOCK), (nolock) or NOLOCK/nolock (any case) → DO NOT REMOVE, but FLAG as **CRITICAL RISK**: "⚠️ WITH (NOLOCK) reads uncommitted/dirty data - CRITICAL: may cause INCORRECT FINANCIAL VALUES and data inconsistencies in production"
 	$analysis_requirements
 	Explanation:
 """)
 NATURAL_TO_SQL_TEMPLATE = Template("""
 	You are an expert $database_name SQL developer and query performance specialist.
 	Your task is to write an optimized SQL query based exclusively on the natural-language description provided below.
 	Description:
 	$explanation
 	⚠️ CRITICAL RULES - READ BEFORE GENERATING SQL:
 	1. **PRESERVE ALL BUSINESS LOGIC EXACTLY**
 	- Every CASE WHEN statement must have IDENTICAL conditions and results
 	- Every calculated column must use IDENTICAL formulas
 	- Every subquery must query the SAME tables with SAME filters
 	- Do NOT simplify, merge, or "improve" business logic - even if it looks redundant
 	- If description mentions specific conditions (cd_tp_apolice = 2, etc.), preserve them EXACTLY
 	2. **PRESERVE ALL TABLES AND COLUMNS**
 	- Include EVERY table mentioned in the description
 	- Include EVERY column mentioned in the description
 	- Use EXACT column names as described (no renaming)
 	- Use EXACT table aliases as described
 	3. **Translate the full described logic into SQL**
 	- Implement all actions, operations, filters, joins, and conditions exactly as stated.
 	- Use every object and column referenced in the description, using their exact names.
 	- If the description mentions specific filter values (e.g., cd_tipo_endosso = 0), use those EXACT values
 	4. **Write optimized SQL while preserving semantics**
 	- Apply $database_name best practices for performance.
 	- Use indexing-aware filtering, efficient join strategies, and clear expressions.
 	- Implement aggregations, groupings, window functions, or pagination when described.
 	- Prefer performant constructs commonly recommended for $database_name workloads.
 	- OPTIMIZATION means structure/hints/indexes - NOT changing logic
 	5. **Use $database_name-specific syntax and features**
 	- Apply native functions, operators, optimizer behaviors, or hints when appropriate.
 	- Incorporate $specific_requirements if provided.
 	6. **Ensure logical fidelity - ZERO TOLERANCE FOR CHANGES**
 	- The SQL must reflect PRECISELY the behavior described
 	- Do NOT add logic not explicitly stated
 	- Do NOT omit any step described
 	- Do NOT infer or assume details beyond what is explicitly stated
 	- Do NOT "simplify" complex CASE statements
 	- Do NOT merge or combine separate calculated columns
 	7. **Self-Verification Checklist** (perform before outputting):
 	- [ ] All tables from description are present in query
 	- [ ] All columns from description are present in SELECT
 	- [ ] All CASE conditions match description exactly
 	- [ ] All subquery filters match description exactly
 	- [ ] All JOIN conditions match description exactly
 	- [ ] No business logic was simplified or changed
 	8. **Output format**
 	- Provide ONLY the final, optimized SQL query.
 	- Do NOT include explanations, comments, or extra text.
 	Optimized SQL Query:
 """)
 CONSERVATIVE_ANALYSIS_TEMPLATE = Template("""
 	You are an expert $database_name database analyst and performance specialist.
 	Your task is to ANALYZE the SQL query below and provide SUGGESTIONS for improvement.
 	⚠️ CRITICAL: You must NOT rewrite or modify the query. Only provide analysis and suggestions.
 	$database_name SQL Query:
 	```sql
 	$query
 	```
 	Query Complexity Information:
 	- Columns: $column_count
 	- Tables: $table_count
 	- Subqueries: $subquery_count
 	- CASE statements: $case_count
 	- JOINs: $join_count
 	- Complexity Level: $complexity_level
 	Provide your analysis in the following structured format:
 	## PERFORMANCE ISSUES
 	List each performance issue found, with severity (CRITICAL/HIGH/MEDIUM/LOW):
 	- [SEVERITY] Issue description
 	- [SEVERITY] Issue description
 	## SUGGESTED INDEXES
 	List indexes that could improve this query:
 	- CREATE INDEX idx_name ON table(columns) -- Reason
 	## OPTIMIZATION SUGGESTIONS
 	List specific suggestions WITHOUT rewriting the query:
 	- Suggestion 1: Description of what could be improved and why
 	- Suggestion 2: Description of what could be improved and why
 	## RISK ASSESSMENT
 	- WITH (NOLOCK) usage: [Yes/No] - If yes, explain the risks
 	- Missing WHERE clause: [Yes/No] - If yes, explain the impact
 	- Implicit conversions: [Yes/No] - If yes, list them
 	## SUMMARY
 	Brief summary of the most important findings and priority order for addressing them.
 	Remember: DO NOT provide a rewritten query. Only analysis and suggestions.
 """)
 def _render_sql_to_natural(
 	database_name: str, query: str, specific_features: str = "", analysis_requirements: str = ""
 ) -> str:
-	return SQL_TO_NATURAL_TEMPLATE.substitute(
+	module = importlib.import_module("sql_optimizer_team.agents.sql_analyst_agent")
+	template_text = getattr(module, "SQL_TO_NATURAL_PROMPT")
+	return Template(template_text).substitute(
 		database_name=database_name,
 		query=query,
 		specific_features=f"\n{specific_features}" if specific_features else "",
@@ -196,7 +28,9 @@ def _render_sql_to_natural(
 def _render_natural_to_sql(
 	database_name: str, explanation: str, specific_requirements: str
 ) -> str:
-	return NATURAL_TO_SQL_TEMPLATE.substitute(
+	module = importlib.import_module("sql_optimizer_team.agents.sql_optimizer_agent")
+	template_text = getattr(module, "NATURAL_TO_SQL_PROMPT")
+	return Template(template_text).substitute(
 		database_name=database_name,
 		explanation=explanation,
 		specific_requirements="\n".join(
@@ -215,7 +49,9 @@ def _render_conservative_analysis(
 	join_count: int = 0,
 	complexity_level: str = "unknown",
 ) -> str:
-	return CONSERVATIVE_ANALYSIS_TEMPLATE.substitute(
+	module = importlib.import_module("sql_optimizer_team.agents.conservative_analysis_agent")
+	template_text = getattr(module, "CONSERVATIVE_ANALYSIS_PROMPT")
+	return Template(template_text).substitute(
 		database_name=database_name,
 		query=query,
 		column_count=column_count,