26 lines
No EOL
15 KiB
JSON
{
"b6e1d99d9d5dd236": {
"query_hash": "b6e1d99d9d5dd236",
"original_query": "SELECT DISTINCT\n\nc.customer_id,\n\nc.first_name,\n\nc.last_name,\n\nc.email,\n\nCOUNT(o.order_id) as total_orders,\n\nSUM(CASE WHEN o.status = 'completed' THEN 1 ELSE 0 END) as completed_orders,\n\nSUM(CASE WHEN o.status = 'pending' THEN 1 ELSE 0 END) as pending_orders,\n\nAVG(o.total_amount) as avg_order_value,\n\nMAX(o.order_date) as last_order_date,\n\nCASE\n\nWHEN COUNT(o.order_id) > 50 THEN 'VIP'\n\nWHEN COUNT(o.order_id) > 20 THEN 'Premium'\n\nWHEN COUNT(o.order_id) > 5 THEN 'Regular'\n\nELSE 'New'\n\nEND as customer_tier\n\nFROM customers c\n\nLEFT JOIN orders o ON o.customer_id = c.customer_id\n\nWHERE c.customer_id IN (\n\nSELECT DISTINCT o19.customer_id\n\nFROM orders o19\n\nWHERE o19.order_date >= SYSDATE - 730\n\n)\n\nAND c.email LIKE '%@%'\n\nGROUP BY c.customer_id, c.first_name, c.last_name, c.email\n\nHAVING COUNT(o.order_id) > 0\n\nORDER BY COUNT(o.order_id) DESC",
"explanation": "1. **Overall Purpose**\nThe query is designed to retrieve a list of customers, their order details, and a classification of their customer tier based on the number of orders they have placed. The query is intended to provide a detailed view of customer activity, including the total number of orders, the number of completed and pending orders, the average order value, the date of the last order, and a classification of the customer based on their order volume.\n\n2. **Involved Database Objects**\nThe query involves two tables: `customers` and `orders`. It also includes a subquery on the `orders` table.\n\n3. **Essential Operations**\n- The query retrieves the following columns: `c.customer_id`, `c.first_name`, `c.last_name`, `c.email`, `total_orders`, `completed_orders`, `pending_orders`, `avg_order_value`, `last_order_date`, and `customer_tier`.\n- A LEFT JOIN is performed between the `customers` table (aliased as `c`) and the `orders` table (aliased as `o`) on the condition that `o.customer_id = c.customer_id`.\n- The WHERE clause filters the customers based on two conditions: the customer's ID must be in the list of customer IDs who have placed an order in the last two years (730 days), and the customer's email must contain an '@' symbol.\n- The COUNT function is used to calculate the total number of orders per customer (`total_orders`), and the SUM function is used with a CASE statement to calculate the number of completed and pending orders.\n- The AVG function is used to calculate the average order value (`avg_order_value`), and the MAX function is used to determine the date of the last order (`last_order_date`).\n- A CASE statement is used to classify customers into tiers based on the number of orders they have placed.\n- The GROUP BY clause groups the results by `c.customer_id`, `c.first_name`, `c.last_name`, and `c.email`.\n- The HAVING clause filters out customers who have not placed any orders.\n- The ORDER BY clause sorts the results in 
descending order based on the total number of orders.\n\n4. **Performance Issues**\n- The LIKE condition begins with a wildcard (`c.email LIKE '%@%'`), which can prevent an ordinary index on `email` from being used for this predicate and slow down the query.\n- The `IN` subquery in the WHERE clause may be costly, depending on the size of the `orders` table and the number of distinct customer IDs. Rewriting it as an EXISTS condition or a JOIN may be more efficient.\n- `COUNT(o.order_id)` is repeated in the SELECT list, the CASE expression, the HAVING clause, and the ORDER BY clause. If this proves costly on large row counts, the count can be calculated once in a subquery or CTE and referenced from there.",
"database_type": "postgresql",
"version": "1.1",
"optimized_at": "2026-01-21T17:31:17.127080+00:00"
},
"60243f154ec1c6c5": {
"query_hash": "60243f154ec1c6c5",
"original_query": "SELECT DISTINCT\n\nc.customer_id,\n\nc.first_name,\n\nc.last_name,\n\nc.email,\n\n(SELECT COUNT() FROM orders o1 WHERE o1.customer_id = c.customer_id) as total_orders,\n\n(SELECT COUNT() FROM orders o2 WHERE o2.customer_id = c.customer_id AND o2.status = 'completed') as completed_orders,\n\n(SELECT COUNT() FROM orders o3 WHERE o3.customer_id = c.customer_id AND o3.status = 'pending') as pending_orders,\n\n(SELECT AVG(o5.total_amount) FROM orders o5 WHERE o5.customer_id = c.customer_id) as avg_order_value,\n\n(SELECT MAX(o6.order_date) FROM orders o6 WHERE o6.customer_id = c.customer_id) as last_order_date,\n\nCASE\n\nWHEN (SELECT COUNT() FROM orders o8 WHERE o8.customer_id = c.customer_id) > 50 THEN 'VIP'\n\nWHEN (SELECT COUNT() FROM orders o9 WHERE o9.customer_id = c.customer_id) > 20 THEN 'Premium'\n\nWHEN (SELECT COUNT() FROM orders o10 WHERE o10.customer_id = c.customer_id) > 5 THEN 'Regular'\n\nELSE 'New'\n\nEND as customer_tier\n\nFROM customers c\n\nWHERE c.customer_id IN (\n\nSELECT DISTINCT o19.customer_id\n\nFROM orders o19\n\nWHERE o19.order_date >= SYSDATE - 730\n\n)\n\nAND EXISTS (\n\nSELECT 1 FROM orders o21 WHERE o21.customer_id = c.customer_id\n\n)\n\nAND c.email LIKE '%@%'\n\nORDER BY (SELECT COUNT(*) FROM orders o22 WHERE o22.customer_id = c.customer_id) DESC.",
"explanation": "1. **Overall Purpose**\nThe query retrieves a list of customers and their order details from an Oracle database. It provides a summary of each customer's order history, including the total number of orders, the number of completed and pending orders, the average order value, the date of the last order, and a customer tier based on the total number of orders. The query is designed to only include customers who have placed an order in the last two years, have at least one order in the system, and have a valid email address.\n\n2. **Involved Database Objects**\nThe query involves two tables: `customers` and `orders`. There are no views, CTEs, stored procedures, temporary tables, or schema-qualified objects involved. The query uses several subqueries and a CASE statement.\n\n3. **Essential Operations**\n- Columns retrieved: `customer_id`, `first_name`, `last_name`, `email` from the `customers` table. The query also calculates and retrieves `total_orders`, `completed_orders`, `pending_orders`, `avg_order_value`, `last_order_date`, and `customer_tier`.\n- There are no joins in this query.\n- Filters and conditions: The query filters customers based on whether they have placed an order in the last two years (`o19.order_date >= SYSDATE - 730`), whether they have at least one order in the system (`EXISTS (SELECT 1 FROM orders o21 WHERE o21.customer_id = c.customer_id)`), and whether they have a valid email address (`c.email LIKE '%@%'`).\n- Aggregations: The query uses `COUNT()` to calculate the total, completed, and pending orders for each customer, `AVG()` to calculate the average order value, and `MAX()` to find the date of the last order.\n- Sorting: The query sorts the results by the total number of orders in descending order (`ORDER BY (SELECT COUNT(*) FROM orders o22 WHERE o22.customer_id = c.customer_id) DESC`).\n- The query uses the `DISTINCT` keyword to ensure that each customer appears only once in the results.\n- Oracle-specific features: The 
query uses the `SYSDATE` function to get the current date and time.\n\n4. **Performance Issues**\n- The query uses a leading wildcard in the `LIKE` operator (`c.email LIKE '%@%'`). This can prevent the use of an index and slow down the query.\n- The query uses multiple subqueries in the `SELECT` clause, which can be inefficient. These could potentially be replaced with JOINs or window functions for better performance.\n- The query uses a subquery in the `ORDER BY` clause, which can be inefficient. This could potentially be replaced with a calculated column in the main query.\n- The query uses a subquery in the `WHERE` clause to filter customers based on whether they have placed an order in the last two years. This could potentially be replaced with a JOIN for better performance.",
"database_type": "oracle",
"version": "1.1",
"optimized_at": "2026-01-21T17:34:23.063780+00:00"
},
"172701d41ec9ae46": {
"query_hash": "172701d41ec9ae46",
"original_query": "SELECT DISTINCT c.customer_id, c.first_name, c.last_name, c.email, (SELECT COUNT(*) FROM orders o1 WHERE o1.customer_id = c.customer_id) as total_orders, (SELECT COUNT(*) FROM orders o2 WHERE o2.customer_id = c.customer_id AND o2.status = 'completed') as completed_orders, (SELECT COUNT(*) FROM orders o3 WHERE o3.customer_id = c.customer_id AND o3.status = 'pending') as pending_orders, (SELECT AVG(o5.total_amount) FROM orders o5 WHERE o5.customer_id = c.customer_id) as avg_order_value, (SELECT MAX(o6.order_date) FROM orders o6 WHERE o6.customer_id = c.customer_id) as last_order_date, CASE WHEN (SELECT COUNT(*) FROM orders o8 WHERE o8.customer_id = c.customer_id) > 50 THEN 'VIP' WHEN (SELECT COUNT(*) FROM orders o9 WHERE o9.customer_id = c.customer_id) > 20 THEN 'Premium' WHEN (SELECT COUNT(*) FROM orders o10 WHERE o10.customer_id = c.customer_id) > 5 THEN 'Regular' ELSE 'New' END as customer_tier FROM customers c WHERE c.customer_id IN (SELECT DISTINCT o19.customer_id FROM orders o19 WHERE o19.order_date >= SYSDATE - 730) AND EXISTS (SELECT 1 FROM orders o21 WHERE o21.customer_id = c.customer_id) AND c.email LIKE '%@%' ORDER BY (SELECT COUNT(*) FROM orders o22 WHERE o22.customer_id = c.customer_id) DESC.",
"explanation": "Below is a precise, step‑by‑step natural‑language description of the SQL, including all objects, operations, and required performance flags.\n\n---\n\n## 1. Overall Purpose\nThe query retrieves a **distinct list of customers** from the `customers` table and computes several **per‑customer aggregated metrics** from the `orders` table (counts, average order value, last order date), and classifies each customer into a tier based on number of orders. It **filters** customers to those with at least one order and with an order in the last 730 days, and whose email contains an “@” character. The results are **sorted** by total number of orders in descending order.\n\n---\n\n## 2. All Involved Database Objects\n**Tables**\n- `customers` (alias: `c`)\n- `orders` (multiple correlated subqueries with aliases: `o1`, `o2`, `o3`, `o5`, `o6`, `o8`, `o9`, `o10`, `o19`, `o21`, `o22`)\n\n**Subqueries / Derived Queries**\n- Correlated scalar subqueries in the SELECT list:\n - `SELECT COUNT(*) FROM orders o1 WHERE o1.customer_id = c.customer_id` → `total_orders`\n - `SELECT COUNT(*) FROM orders o2 WHERE o2.customer_id = c.customer_id AND o2.status = 'completed'` → `completed_orders`\n - `SELECT COUNT(*) FROM orders o3 WHERE o3.customer_id = c.customer_id AND o3.status = 'pending'` → `pending_orders`\n - `SELECT AVG(o5.total_amount) FROM orders o5 WHERE o5.customer_id = c.customer_id` → `avg_order_value`\n - `SELECT MAX(o6.order_date) FROM orders o6 WHERE o6.customer_id = c.customer_id` → `last_order_date`\n - `CASE` expression uses multiple correlated counts:\n - `SELECT COUNT(*) FROM orders o8 WHERE o8.customer_id = c.customer_id`\n - `SELECT COUNT(*) FROM orders o9 WHERE o9.customer_id = c.customer_id`\n - `SELECT COUNT(*) FROM orders o10 WHERE o10.customer_id = c.customer_id`\n- Subquery in the WHERE clause:\n - `SELECT DISTINCT o19.customer_id FROM orders o19 WHERE o19.order_date >= SYSDATE - 730`\n- EXISTS subquery in WHERE:\n - `SELECT 1 FROM orders o21 WHERE 
o21.customer_id = c.customer_id`\n- ORDER BY subquery:\n - `SELECT COUNT(*) FROM orders o22 WHERE o22.customer_id = c.customer_id`\n\n**Functions**\n- `COUNT(*)`\n- `AVG(o5.total_amount)`\n- `MAX(o6.order_date)`\n- `SYSDATE` and arithmetic `SYSDATE - 730`\n\n**No CTEs, views, stored procedures, or temporary tables are referenced.**\n\n---\n\n## 3. Essential Operations (Exact Columns and Logic)\n\n### SELECT Clause (output columns)\nThe query returns **DISTINCT** rows with the following columns from `customers c` plus correlated aggregate values:\n1. `c.customer_id`\n2. `c.first_name`\n3. `c.last_name`\n4. `c.email`\n5. `total_orders` \n - computed as `COUNT(*)` from `orders o1` where `o1.customer_id = c.customer_id`\n6. `completed_orders` \n - computed as `COUNT(*)` from `orders o2` where `o2.customer_id = c.customer_id` and `o2.status = 'completed'`\n7. `pending_orders` \n - computed as `COUNT(*)` from `orders o3` where `o3.customer_id = c.customer_id` and `o3.status = 'pending'`\n8. `avg_order_value` \n - computed as `AVG(o5.total_amount)` from `orders o5` where `o5.customer_id = c.customer_id`\n9. `last_order_date` \n - computed as `MAX(o6.order_date)` from `orders o6` where `o6.customer_id = c.customer_id`\n10. `customer_tier` \n - computed by a `CASE` expression using repeated `COUNT(*)` correlated subqueries:\n - If `COUNT(*)` from `orders o8` where `o8.customer_id = c.customer_id` > 50 → `'VIP'`\n - Else if `COUNT(*)` from `orders o9` where `o9.customer_id = c.customer_id` > 20 → `'Premium'`\n - Else if `COUNT(*)` from `orders o10` where `o10.customer_id = c.customer_id` > 5 → `'Regular'`\n - Else → `'New'`\n\n### FROM Clause\n- Base table: `customers c`\n\n### WHERE Clause (filters)\nThe main query includes **three filters**:\n1. `c.customer_id IN (...)` \n - Subquery: `SELECT DISTINCT o19.customer_id FROM orders o19 WHERE o19.order_date >= SYSDATE - 730` \n - Filters customers to those with at least one order in the last 730 days.\n2. 
`EXISTS (SELECT 1 FROM orders o21 WHERE o21.customer_id = c.customer_id)` \n - Ensures that the customer has at least one order.\n3. `c.email LIKE '%@%'` \n - Filters customers whose `email` contains an “@”.\n\n### Aggregations\nAll aggregation is done through **correlated scalar subqueries** (COUNT, AVG, MAX). There is **no GROUP BY** in the outer query.\n\n### DISTINCT\n- `SELECT DISTINCT` is used on the full row (customer columns + computed aggregates).\n\n### ORDER BY\n- Results are sorted by:\n - `ORDER BY (SELECT COUNT(*) FROM orders o22 WHERE o22.customer_id = c.customer_id) DESC`\n - This is another correlated subquery that recomputes order count per customer.\n\n### Oracle‑specific features\n- `SYSDATE` is used for date arithmetic (`SYSDATE - 730`).\n\n---\n\n## 4. Performance Issues (Critical Flags)\n\n**⚠️ CRITICAL: Leading Wildcard in LIKE**\n- `c.email LIKE '%@%'` \n This begins with `%`, which prevents index usage on `email` and makes the predicate non‑sargable.\n\n**Other observations (not flagged as “critical” per instructions, but noteworthy):**\n- The query runs **many redundant correlated subqueries** (multiple `COUNT(*)` on `orders` for the same customer, including in the ORDER BY and CASE). This can lead to repeated scans of `orders` per customer and significant performance cost.\n\n---\n\n## 5. Summary for Reconstruction\nTo reconstruct the query: \n- Start with `SELECT DISTINCT` from `customers c`. \n- Return `c.customer_id`, `c.first_name`, `c.last_name`, `c.email`. \n- Add multiple correlated scalar subqueries on `orders` for counts, averages, and max dates. \n- Build a `CASE` expression that classifies customers based on three separate correlated `COUNT(*)` checks. \n- Apply WHERE conditions: `c.customer_id IN (SELECT DISTINCT o19.customer_id FROM orders o19 WHERE o19.order_date >= SYSDATE - 730)`, `EXISTS (SELECT 1 FROM orders o21 WHERE o21.customer_id = c.customer_id)`, and `c.email LIKE '%@%'`. 
\n- Order by a correlated count of orders descending.",
"database_type": "oracle",
"version": "1.1",
"optimized_at": "2026-01-21T17:49:44.839914+00:00"
}
}