Pivot Tables in Databases
November 3, 2007
Sometimes there are situations where you have to pivot a table or view. It could be that you need to pivot columns into rows or pivot rows into columns. There are always elegant ways to accomplish database tasks but not all of us are that good. So we resort to brute force hacks. This post will explain the brute force methods which will give you a very good chance to accomplish your task. It won’t be pretty and it may not be efficient, but it gets the job done.
The reason for discussing pivots is to invite everyone to provide better ways to accomplish this cumbersome task. I have employed this technique on MySQL so it should work. If Oracle, MS SQL, DB2 and other commercial DBMSes have better ways to do this stuff, and you know how to, please share with us.
I don’t like to use keywords
INNER JOIN because to me it is more readable to have an equal to sign than inner join keywords.
Pivot Columns into Rows
Say you have a table such as this:
ID | STUDENT | GRADE1 | GRADE2 | GRADE3
Say you want to pivot it so that you get a view such as this:
ID | STUDENT | GRADE1
ID | STUDENT | GRADE2
ID | STUDENT | GRADE3
Your best friend in this case is
UNION. For example,
SELECT ID, STUDENT, GRADE1 FROM SOMETABLE UNION ALL
SELECT ID, STUDENT, GRADE2 FROM SOMETABLE UNION ALL
SELECT ID, STUDENT, GRADE3 FROM SOMETABLE;
There you have it. Columns have been pivoted to rows. It is a brute force method but doesn’t it seem logical enough?
Pivot Rows into Columns
This is where it gets tricky. It is not very easy to pivot rows into columns. However, it can be done. The secret ingredient is the use of aggregate functions along with
group by. The function I like to use is
min() along with an arbitrary rank assigned to every row based on how you want to pivot the data.
Let’s say you have a table such as this:
TARGET_PHONE_NUMBER | SERVICE_PROVIDER | RATE_PER_MINUTE | QUALITY_OF_SERVICE
If you need to generate a horizontal listing of carriers based on what they charge for a particular destination (target_phone_number), from the lowest (to the left of the listing) to the highest (to the right of the listing) cost, first assign a rank to each row.
In MySQL 5, what I did was create a table with the same structure as our example table, plus an
CREATE TABLE RANKING_TABLE LIKE ORIGINAL_TABLE;
ALTER TABLE RANKING_TABLE ADD COLUMN RANK INT AUTO_INCREMENT PRIMARY KEY;
The inserts, however, have to be grouped and ordered according to the way we wish to rank them. For example,
INSERT INTO RANKING_TABLE (TPN, SP, RPM, QOS) SELECT TARGET_PHONE_NUMBER, SERVICE_PROVIDER, RATE_PER_MINUTE, QUALITY_OF_SERVICE FROM SOMETABLE ORDER BY TARGET_PHONE_NUMBER, QUALITY_OF_SERVICE, RATE_PER_MINUTE, SERVICE_PROVIDER;
This will insert all data according to the order you defined and give a rank to each row. You will then have to group your data together based on TPN (TARGET_PHONE_NUMBER).
In other databases that provide you with
rank() functions, you may use those. Consider the brute method of ranking as a poor man’s ranking.
Once you have the rankings, you just need to create a mega-script to pivot based on ranks. The bad thing about this method is that as you add more rows to be pivoted, your script grows as well and it becomes very difficult to maintain after some point.
SELECT TPN, MIN(RANK) FROM RANKING_TABLE GROUP BY TPN;
This was your first pivot, where the one with the lowest rank is supposed to be the best phone company to deal with. The next best carrier we get by:
SELECT A.TPN, B.RANKONE, A.MIN(RANK) RANK_TWO FROM RANKING_TABLE A, (SELECT TPN, MIN(RANK) RANKONE FROM RANKING_TABLE GROUP BY TPN) B WHERE A.TPN = B.TPN AND A.RANK <> B.RANKONE GROUP BY A.TPN;
Here you have created one row with two phone service providers. The left-most is the best and the next one is one notch less. Now we add a third provider to the query.
SELECT C.TPN, D.RANKONE, D.RANKTWO, MIN(C.RANK) RANKTHREE FROM RANKING_TABLE C, (SELECT A.TPN, B.RANKONE, A.MIN(RANK) RANKTWO FROM RANKING_TABLE A, (SELECT TPN, MIN(A.RANK) RANKONE FROM RANKING_TABLE GROUP BY TPN) B WHERE A.TPN = B.TPN
AND A.RANK <> B.RANKONE
GROUP BY A.TPN) D WHERE C.TPN = D.TPN AND D.RANKONE <> C.RANK AND D.RANKTWO <> C.RANK GROUP BY C.TPN;
You can add more layers to this query as you pivot more and more rows. As I said, it can become very unwieldy as you need to create more and more columns. However, once you know one way to do something, you can always fall back on it as default.
People say I like to write mega-queries, and I do. I like the step-by-step way of going over the query later on or for debugging while I am writing it. Using temporary tables is also something I like to do as less of as possible. Why create tables when a query can work just as well?
I have read that DBMSs are now coming out with
pivot operators which are much cleaner to use. As soon as MySQL implements it, I would definitely like to check it out. Otherwise, I know that this method will work with or without pivot operators.
Brute force methods are very much possible to pivot tables. It takes a lot of analysis to come up with better ways to do the same thing, especially when it comes to customizing to your situation. If you know you can always rely on at least one solution to give you accurate results, you can use it to test whatever you need to test. For production, you may come up with a better alternative. If you do know to improve pivoting data in tables, please share it with the world.