SQL: Auxiliary Tables

Often, a query appears difficult or impossible because there isn’t sufficient information in the database to formulate it successfully. By making use of auxiliary tables, it is often possible to add the information in a suitable form, and hence simplify the query. It is often the failure to add data to a model that results in a non-SQL solution being developed; this solution is typically a serial one, and so on a data warehouse appliance much of the parallel processing power is lost.

For example if you want to see how orders placed during a year are distributed over time, you might analyse the number of orders placed each month by customer. This would identify customers placing a significant number of orders within a given month, but not those placing a similar number over a period that straddles two months.

To find details of orders placed during two-month periods, and identify clusters that extend into a second month, we can use an auxiliary table to introduce grouping that would be difficult to achieve with the original data.

The auxiliary table is created containing sets of 2-month periods and a textual string is included, so we can display the results more clearly:

CREATE TABLE twomonths(
  start_month		INTEGER 	NOT NULL, 
  end_month		INTEGER 	NOT NULL, 
  period		CHAR(7) 	NOT NULL,  
CHECK(start_month < end_month))

INSERT INTO twomonths VALUES(1, 2, 'jan-feb');
INSERT INTO twomonths VALUES(2, 3, 'feb-mar');
INSERT INTO twomonths VALUES(3, 4, 'mar-apr');
...
INSERT INTO twomonths VALUES(9, 10, 'sep-oct');
INSERT INTO twomonths VALUES(10, 11, 'oct-nov');
INSERT INTO twomonths VALUES(11, 12, 'nov-dec');

The time periods overlap, so an order appears in two time periods, unless it is placed in the first or last months (December or January) when it appears only once.

This query extracts customers who have placed more than 3 orders in any two-month period. The technique lets you extract, for example, "those customers who placed more than 3 orders in Feb-March".

SELECT custno, t1.period, t1.start_month, COUNT(*) AS orders
FROM orders, twomonths AS t1 
WHERE EXTRACT(MONTH FROM orderdate) BETWEEN t1.start_month AND t1.end_month
GROUP BY custno, t1.period, t1.start_month  
HAVING COUNT(t1.period) > 3 
ORDER BY 1, 3

Take care when looking at the results of this type of query, because some customer-order groups appear in two periods, even though the orders were placed in one month! The point of the exercise to be able to extract "those customers who placed more than three orders in Feb-March". (This may be because you have an advertising campaign running, for that period.) Summing the results of this query would be inappropriate, due to the duplication.