Parent-Child Hierarchy

The problem is this: we have various users arranged in a vertical hierarchy. For example, the user roles might be owner, manager, supervisor, and staff. Every user except the owner has a parent, and every user except staff has children. The hierarchy is thus: owner is the parent of manager, manager is the parent of supervisor, and supervisor is the parent of staff. Each user is assigned exactly one role, so a user John holds exactly one of the roles mentioned. If we want to find the children of John, how can we do so?

First we create an example table in our database.

create table users (user varchar(50) unique, parent varchar(50));

Now let’s put some data in there:

insert into users values ('john', 'amy'), ('amy', null), ('adam', 'amy'), ('jane', 'john'), ('jennifer', 'john'), ('jethro', 'jane');

By looking at the data, we can see that John's children are Jane, Jennifer, and Jethro. Why Jethro? Because Jane is a child of John, and Jethro is a child of Jane. So in effect, we are looking for all of John's children, be they his direct children, children of his children, and so on. Please note that we assume there are no many-to-many relationships here: each user has exactly one direct parent, while each parent may have many children (many-to-one).

The Python code uses SQLAlchemy to connect to the database (users below is the mapped class for our table), but the idea remains the same no matter what technology you use. I use a very naive recursive implementation here, which should get you started.

def user_children(user=None, kid_list=None, session=None):
    # Create the list on the first call; a mutable default argument would
    # be shared across calls, and None cannot be appended to.
    if kid_list is None:
        kid_list = []
    query = session.query(users).filter_by(parent=user)
    children_list = []
    for result in query:
        children_list.append(result.user)
        kid_list.append(result.user)
    # Recurse into each direct child to pick up children of children
    for name in children_list:
        user_children(name, kid_list, session)
    return kid_list

def main():
    kid_list = []
    # We assume the SQLAlchemy session setup has already been taken care of
    # prior to the following statement
    session = mydb.mysession()
    all_children = user_children('john', kid_list, session)    
    session.close()
    return all_children

if __name__ == "__main__":
    main()

When we run the above code for user John, we get Jane, Jennifer, and Jethro.
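To see the recursion itself, independent of SQLAlchemy and the database, here is a minimal self-contained sketch (my own illustration, not part of the original code) in which a list of (user, parent) tuples stands in for the users table:

```python
# A list of (user, parent) pairs standing in for the users table above.
ROWS = [
    ("john", "amy"), ("amy", None), ("adam", "amy"),
    ("jane", "john"), ("jennifer", "john"), ("jethro", "jane"),
]

def user_children(user, kid_list=None):
    """Collect all descendants of user: direct children first, then recurse."""
    if kid_list is None:
        kid_list = []
    children = [u for u, parent in ROWS if parent == user]
    kid_list.extend(children)
    for child in children:
        user_children(child, kid_list)
    return kid_list

print(user_children("john"))  # ['jane', 'jennifer', 'jethro']
```

The traversal order mirrors the database version above: all direct children are collected before recursing into each one.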

Similarly, we can implement something that returns all parents (ancestors) of a user. Since each user has only one direct parent, we simply continue upward until we reach a user who has no parent. The implementation is below:

def user_parents(user=None, parent_list=None, session=None):
    if parent_list is None:
        parent_list = []
    query = session.query(users).filter_by(user=user)
    parents_list = []
    for result in query:
        # We don't want a NULL or None parent
        if result.parent:
            parents_list.append(result.parent)
            parent_list.append(result.parent)
    # Walk upward from the parent we found (there is at most one)
    for name in parents_list:
        user_parents(name, parent_list, session)
    return parent_list

def main():
    parent_list = []
    # We assume the SQLAlchemy session setup has already been taken care of
    # prior to the following statement
    session = mydb.mysession()
    all_parents = user_parents('jethro', parent_list, session)    
    session.close()
    return all_parents

if __name__ == "__main__":
    main()

When we run the above code for user Jethro, we get Jane, John, and Amy.
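The upward walk can be sketched the same way, again with an in-memory list of (user, parent) pairs standing in for the table (my own illustration, not the original code):

```python
# The same (user, parent) pairs as in the example data.
ROWS = [
    ("john", "amy"), ("amy", None), ("adam", "amy"),
    ("jane", "john"), ("jennifer", "john"), ("jethro", "jane"),
]

def user_parents(user, parent_list=None):
    """Walk upward, collecting each ancestor until a user with no parent."""
    if parent_list is None:
        parent_list = []
    for u, parent in ROWS:
        if u == user and parent is not None:
            parent_list.append(parent)
            user_parents(parent, parent_list)
    return parent_list

print(user_parents("jethro"))  # ['jane', 'john', 'amy']
```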

This seems to work for me. Of course, the data I tested this code on was very small. Don't judge the code on its efficiency (although any suggestions on how to improve it are always welcome); look instead at the general idea of how you might achieve the same results. I will also try to provide code for when there are many-to-many relationships.
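One improvement worth mentioning: on databases that support recursive common table expressions (e.g., MySQL 8.0+, PostgreSQL, SQLite), the whole traversal can be done in a single SQL query, with no application-side recursion at all. A sketch using Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users ("user" VARCHAR(50) UNIQUE, parent VARCHAR(50));
    INSERT INTO users VALUES
        ('john', 'amy'), ('amy', NULL), ('adam', 'amy'),
        ('jane', 'john'), ('jennifer', 'john'), ('jethro', 'jane');
""")

# All descendants of 'john' in one query: seed with the direct children,
# then repeatedly join back to pick up children of children.
children = [row[0] for row in conn.execute("""
    WITH RECURSIVE tree("user") AS (
        SELECT "user" FROM users WHERE parent = 'john'
        UNION ALL
        SELECT u."user" FROM users u JOIN tree t ON u.parent = t."user"
    )
    SELECT "user" FROM tree
""")]
print(sorted(children))  # ['jane', 'jennifer', 'jethro']
```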

Drop Unique Constraint in MySQL

If you would like to drop a unique constraint from a table in MySQL, there is a very simple way to do it. Thanks to the "how to drop unique constraint" post, you can do so as well. The trick is knowing that you have to drop an index, not a unique constraint. Consider the following example:

create table mytable (
    accountNumber tinyint unsigned not null,
    authCode tinyint unsigned not null,
    authorized enum ('NO', 'YES') not null default 'NO',
    constraint uk_mytable unique (accountNumber, authCode)
);

In this example, I am giving the constraint a name of my own choosing. The reason is that if I ever want to do anything with the constraint, I have an easily recognizable name for it. Now, to drop the unique constraint, you do this (tested in MySQL 5.0 and confirmed to work):

alter table mytable drop index uk_mytable;

Remember, do not do the following (or any variation of it), because it will give you an SQL error:

alter table mytable drop unique constraint uk_mytable;

To recap: to drop a unique constraint in MySQL, you drop the index rather than the constraint itself. The reason, I suspect, is that MySQL implements a unique constraint as an index.
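If you did not give the constraint a name yourself, MySQL generates an index name automatically (typically based on the first column of the constraint). You can list a table's indexes to find the name to drop:

```sql
show index from mytable;
```

The Key_name column in the output is what you pass to drop index.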

MS SQL Server 2000: Create an Off-site Standby Server

This post is an extension to the “Poor Man’s Log Shipping” post written earlier on this blog. To summarize, the main server uses log shipping to maintain a standby server on-site. It also creates a nightly full backup and periodic backups of the transaction logs. I wrote a batch script to FTP these log backups once a day to an off-site location.

The reason for this was to create an off-site standby server. With daily log backups already being received at this site, I just needed to bring over one of the nightly full backups. First things first: how do I get a multi-GB backup file across the Internet? Even on a high-speed connection it would take a long time. I considered several options: FTP, BitTorrent, HTTP, and more. What I liked about FTP was its simplicity; all I had to do was compress the backup file and send it. However, even after compression the file was multiple gigabytes, so I used 7-zip to compress it and split the result into 100MB chunks. Using the built-in command-line FTP client in Windows and the mput command, I was able to transfer the data easily over a period of time. At the receiving end, I used 7-zip again to reassemble and uncompress the data.

The next step was to restore the backups, starting with the full backup. I first went with the GUI method through Enterprise Manager. All the required options are there, but I wanted to understand and control the process, so I abandoned that idea and tried the T-SQL approach instead. It was exactly what I was looking for. The best help came from Microsoft's Transact-SQL Reference for RESTORE.

The first thing I needed to do was get the names of the logical files in the full backup. This is necessary for reasons explained excellently in the Copying Databases article; in my case, the directory structure on this server was different from the server where the backup was created. But how to do it? RESTORE FILELISTONLY was the answer. The actual command I used was this:

RESTORE FILELISTONLY FROM DISK = 'e:\fulldbbackup.bak';

It showed me the logical file names as well as the full paths where the restore would place the physical files. Since those paths did not exist on this server, I had to tell SQL Server exactly where to put the files during the restore. The restore script I used was:

RESTORE DATABASE mydbname
FROM DISK = 'e:\fulldbbackup.bak'
WITH
MOVE 'datafile' TO 'e:\dbdata.mdf' ,
MOVE 'logfile' TO 'e:\dblogs.ldf' ,
STANDBY = 'e:\undofile.dat' ,
STATS = 10

I used STANDBY because I needed to restore subsequent transaction log backups. It took some time, but the restoration completed. Then I needed to restore the log backups. One thing to remember is that logs must be restored, or applied, in the sequence they were created. During my explorations, I noticed that if you try to apply a log backup created before the full backup, SQL Server gives an error and does not proceed. If you apply a backup that has already been applied, it processes the backup but reports that zero pages were processed. So it is my opinion that even if you apply the wrong log backup by mistake, it will not destroy your database. Of course, I never skipped a log backup and applied the next one, so I cannot say what happens in that case. The script to restore one log backup is:

RESTORE LOG mydbname
FROM DISK = 'e:\logs\log1.trn'
WITH STANDBY = 'e:\log_undofile.dat',
STATS = 10

I had approximately two weeks' worth of transaction log backups to restore, and changing the file name in the script by hand for each backup was not practical. My first thought was to write a Python script to read the contents of the e:\logs\ directory and run the restore once per file. Since I am lazy, I sought an easier way and did the following:

In Windows command line, I ran:

dir e:\logs\ > e:\allfiles.txt

This created a list of all the files in that directory, but in the format you normally get from the dir command. So I used my text editor's find-and-replace feature to replace every space with a semicolon, and then to collapse multiple semicolons into one. Something like:

Find: ' ' (a single space, without the quotes)
Replace: ;

And then

Find: ;;
Replace: ;

I continued replacing double semicolons with a single one until only one separated each field. I then opened this CSV-style file in a spreadsheet (actually OpenOffice.org Calc), copied the column with the file names, and saved it in a text file.

Again find and replace came to help out. Each file was named like log1.trn, log2.trn, and so on. So I did this:

Find: log
Replace: RESTORE LOG mydbname FROM DISK = 'e:\logs\log

And another find and replace was:

Find: .trn
Replace: .trn' WITH STANDBY = 'e:\log_undofile.dat', STATS = 10

This created a file with scripts like so:

RESTORE LOG mydbname FROM DISK = 'e:\logs\log1.trn' WITH STANDBY = 'e:\log_undofile.dat', STATS = 10
RESTORE LOG mydbname FROM DISK = 'e:\logs\log2.trn' WITH STANDBY = 'e:\log_undofile.dat', STATS = 10

I saved the file, opened it in Query Analyzer, and ran the script. Since there was a whole bunch of log backup files, the process took quite some time to finish.

After all the existing backups were restored, I made a habit of collecting a week's worth of log backups and applying them in the same fashion.

I know this is a very manual process and I could write a Python script once to do all this stuff for me. I intend to write such a script but right now I do not have the time. Besides, this procedure was just for me to learn how to restore backups and then apply transaction log backups.
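As a starting point for such a script, here is a minimal sketch. It is an assumption on my part that the files are named log1.trn, log2.trn, and so on; note the numeric sort key, since plain alphabetical order would put log10.trn before log2.trn:

```python
import re

def restore_statements(filenames, db="mydbname", logdir="e:\\logs\\",
                       undofile="e:\\log_undofile.dat"):
    """Build one RESTORE LOG statement per .trn file, in numeric order."""
    trn = [f for f in filenames if f.lower().endswith(".trn")]
    # Sort by the number embedded in the name, not alphabetically.
    trn.sort(key=lambda f: int(re.search(r"(\d+)", f).group(1)))
    return ["RESTORE LOG {0} FROM DISK = '{1}{2}' "
            "WITH STANDBY = '{3}', STATS = 10".format(db, logdir, f, undofile)
            for f in trn]

# Against the real directory you would feed it os.listdir(r"e:\logs").
print("\n".join(restore_statements(["log2.trn", "log10.trn", "log1.trn"])))
```

The output is ready to paste into Query Analyzer, one RESTORE LOG per line, in the correct sequence.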

Some good resources include: (a) Using Standby Servers; (b) SQL Server 2000 Backup and Restore; (c) SQL Server 2000 Backup Types and Recovery Models.

Playing With MySQL Storage Engines in Ubuntu

To see which tables use which storage engine, run the following command on your database:

show table status from databasename;

Now you can decide how to manage your tables. A useful comparison of storage engines in MySQL is also available.

Set InnoDB as Default Storage Engine

If you want to use InnoDB as the default storage engine for MySQL in Ubuntu 7.10 (Gutsy Gibbon), you may do so easily. A forum post (Re: default table type = innodb is stable??) helped a lot in learning about this. First, stop the server

sudo /etc/init.d/mysql stop

Read the MySQL configuration file

cat /etc/mysql/my.cnf

Look for the following part in the configuration file

# * Basic Settings
#
user = mysql

Edit the file

sudo vim /etc/mysql/my.cnf

And add the default-storage-engine line so that the section reads

# * Basic Settings
#
user = mysql
default-storage-engine = InnoDB

Start the server

sudo /etc/init.d/mysql start

Now whenever you create a new table, it will use the InnoDB storage engine.
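To confirm that the change took effect, you can check the server variable (it is called storage_engine in MySQL 5.x and default_storage_engine in later versions):

```sql
show variables like '%storage_engine%';
```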

Change Storage Engine for Tables

A very good post (Convert a bunch of tables to InnoDB) helped me gain this knowledge. If you want to change the storage engine for a table, just run the following query on your database. If you want the table to use InnoDB, use

alter table tablename type=InnoDB;

If you want the table to use MyISAM use

alter table tablename type=MyISAM;

Create a Table With Storage Engine

If you want to create a table and specify which storage engine to use, you may do so in the following way

create table dummytable1 (id int, name varchar (10)) type = InnoDB;
create table dummytable2 (id int, name varchar (10)) type = MyISAM;

Type or Engine?

You can replace type = InnoDB with engine = InnoDB; they do the same thing. In fact, ENGINE is the preferred keyword (TYPE is deprecated in newer MySQL versions), so it is probably the safer choice.
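Either way, you can confirm which engine a particular table ended up with; show create table prints the full definition, including the engine clause:

```sql
show create table dummytable1;
```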

Pivot Tables in Databases

Sometimes there are situations where you have to pivot a table or view: columns into rows, or rows into columns. There are always elegant ways to accomplish database tasks, but not all of us are that good, so we resort to brute-force hacks. This post explains brute-force methods that give you a very good chance of accomplishing the task. They won't be pretty and they may not be efficient, but they get the job done.

The reason for discussing pivots is to invite everyone to suggest better ways of accomplishing this cumbersome task. I have used this technique on MySQL, so I know it works there. If Oracle, MS SQL Server, DB2, or other commercial DBMSs have better ways to do this, and you know how, please share with us.

A note on style: I don't like to use the INNER JOIN keywords, because to me an equals sign in the WHERE clause is more readable than an explicit inner join. That is why the queries below use the comma-join syntax.

Pivot Columns into Rows

Say you have a table such as this:

ID | STUDENT | GRADE1 | GRADE2 | GRADE3

Say you want to pivot it so that you get a view such as this:

ID | STUDENT | GRADE1

ID | STUDENT | GRADE2

ID | STUDENT | GRADE3

Your best friend in this case is UNION. For example,

SELECT ID, STUDENT, GRADE1 FROM SOMETABLE UNION ALL
SELECT ID, STUDENT, GRADE2 FROM SOMETABLE UNION ALL
SELECT ID, STUDENT, GRADE3 FROM SOMETABLE;

There you have it: the columns have been pivoted into rows. It is a brute-force method, but doesn't it seem logical enough?
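To convince yourself that it works, the same UNION ALL trick can be run against an in-memory SQLite database from Python. The extra literal column ('GRADE1', and so on) is my own addition so you can still tell which grade each row came from, and the sample data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SOMETABLE (ID INT, STUDENT TEXT,
                            GRADE1 INT, GRADE2 INT, GRADE3 INT);
    INSERT INTO SOMETABLE VALUES (1, 'ann', 90, 85, 70),
                                 (2, 'bob', 60, 75, 80);
""")

# Each branch of the UNION ALL turns one grade column into rows.
rows = conn.execute("""
    SELECT ID, STUDENT, 'GRADE1' AS WHICH, GRADE1 AS GRADE FROM SOMETABLE
    UNION ALL
    SELECT ID, STUDENT, 'GRADE2', GRADE2 FROM SOMETABLE
    UNION ALL
    SELECT ID, STUDENT, 'GRADE3', GRADE3 FROM SOMETABLE
    ORDER BY 1, 3
""").fetchall()
for row in rows:
    print(row)
```

Two students with three grade columns each become six rows, one per (student, grade) pair.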

Pivot Rows into Columns

This is where it gets tricky. Pivoting rows into columns is not easy, but it can be done. The secret ingredient is an aggregate function combined with GROUP BY; I like to use min() together with an arbitrary rank assigned to each row, based on how you want the data pivoted.

Let’s say you have a table such as this:

TARGET_PHONE_NUMBER | SERVICE_PROVIDER | RATE_PER_MINUTE | QUALITY_OF_SERVICE

If you need a horizontal listing of carriers for each destination (TARGET_PHONE_NUMBER), ordered from the lowest rate on the left to the highest on the right, the first step is to assign a rank to each row.

In MySQL 5, what I did was create a table with the same structure as our example table, plus an auto_increment column:

CREATE TABLE RANKING_TABLE LIKE ORIGINAL_TABLE;

ALTER TABLE RANKING_TABLE ADD COLUMN RANK INT AUTO_INCREMENT PRIMARY KEY;

The inserts, however, have to be ordered according to the way we wish to rank the rows. For example (with the column names abbreviated to TPN, SP, RPM, and QOS for readability; substitute whatever your ranking table's columns are actually called):

INSERT INTO RANKING_TABLE (TPN, SP, RPM, QOS)
SELECT TARGET_PHONE_NUMBER, SERVICE_PROVIDER, RATE_PER_MINUTE, QUALITY_OF_SERVICE
FROM SOMETABLE
ORDER BY TARGET_PHONE_NUMBER, QUALITY_OF_SERVICE, RATE_PER_MINUTE, SERVICE_PROVIDER;

This inserts all the data in the order you defined and gives every row a rank. You will then group the data by TPN (TARGET_PHONE_NUMBER).

In databases that provide a rank() function, you may use that instead. Consider this brute-force ranking a poor man's rank().

Once you have the rankings, you just need to create a mega-script that pivots based on rank. The bad thing about this method is that as you pivot more rows into columns, the script grows as well, and past some point it becomes very difficult to maintain.

SELECT TPN, MIN(RANK) FROM RANKING_TABLE GROUP BY TPN;

This was the first pivot column: for each destination, the row with the lowest rank is the best phone company to deal with. We get the next-best carrier with:

SELECT A.TPN, B.RANKONE, MIN(A.RANK) RANKTWO
FROM RANKING_TABLE A,
     (SELECT TPN, MIN(RANK) RANKONE FROM RANKING_TABLE GROUP BY TPN) B
WHERE A.TPN = B.TPN AND A.RANK <> B.RANKONE
GROUP BY A.TPN;

Here you have created one row with two phone service providers: the left-most is the best, and the next is one notch down. Now we add a third provider to the query:

SELECT C.TPN, D.RANKONE, D.RANKTWO, MIN(C.RANK) RANKTHREE
FROM RANKING_TABLE C,
     (SELECT A.TPN, B.RANKONE, MIN(A.RANK) RANKTWO
      FROM RANKING_TABLE A,
           (SELECT TPN, MIN(RANK) RANKONE FROM RANKING_TABLE GROUP BY TPN) B
      WHERE A.TPN = B.TPN AND A.RANK <> B.RANKONE
      GROUP BY A.TPN) D
WHERE C.TPN = D.TPN AND C.RANK <> D.RANKONE AND C.RANK <> D.RANKTWO
GROUP BY C.TPN;

You can add more layers to this query as you pivot more and more rows. As I said, it becomes unwieldy as the number of columns grows; however, once you know one way to do something, you can always fall back on it as a default.
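To check the mechanics on a small dataset, here are the ranking step and the first two pivot columns run against an in-memory SQLite database from Python (the carrier names and rates are made up, and SQLite's AUTOINCREMENT column plays the role of MySQL's auto_increment rank):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE RANKING_TABLE (
    RANK INTEGER PRIMARY KEY AUTOINCREMENT,
    TPN TEXT, SP TEXT, RPM REAL)""")

# Pre-sort by (destination, rate) so that within each TPN a lower RANK
# means a cheaper carrier -- this mirrors the ordered INSERT ... SELECT.
rows = [("555-0100", "CarrierA", 0.05), ("555-0100", "CarrierB", 0.07),
        ("555-0100", "CarrierC", 0.09), ("555-0200", "CarrierB", 0.04),
        ("555-0200", "CarrierA", 0.06)]
conn.executemany("INSERT INTO RANKING_TABLE (TPN, SP, RPM) VALUES (?, ?, ?)",
                 sorted(rows, key=lambda r: (r[0], r[2])))

# First pivot column: the lowest rank (best carrier) per destination.
first = conn.execute(
    "SELECT TPN, MIN(RANK) RANKONE FROM RANKING_TABLE GROUP BY TPN").fetchall()

# Second pivot column via the A.RANK <> B.RANKONE trick from the post.
second = conn.execute("""
    SELECT A.TPN, B.RANKONE, MIN(A.RANK) RANKTWO
    FROM RANKING_TABLE A,
         (SELECT TPN, MIN(RANK) RANKONE FROM RANKING_TABLE GROUP BY TPN) B
    WHERE A.TPN = B.TPN AND A.RANK <> B.RANKONE
    GROUP BY A.TPN
""").fetchall()

# Map the ranks back to carrier names for a readable pivot.
name = dict(conn.execute("SELECT RANK, SP FROM RANKING_TABLE"))
for tpn, r1, r2 in sorted(second):
    print(tpn, name[r1], name[r2])
```

Each printed line is one pivoted row: the destination, its cheapest carrier, and the second cheapest.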

Conclusion

People say I like to write mega-queries, and I do. I like being able to step through the query later, or while debugging as I write it. I also try to use temporary tables as little as possible; why create tables when a query can do the job just as well?

I have read that DBMSs are now coming out with pivot operators, which are much cleaner to use. As soon as MySQL implements one, I will definitely check it out. In the meantime, I know this method works with or without pivot operators.

Brute-force methods are a perfectly workable way to pivot tables. It takes a lot more analysis to come up with better ways of doing the same thing, especially when customizing to your situation. But if you know you can always rely on at least one solution to give accurate results, you can use it to verify whatever you need to verify, and perhaps come up with a better alternative for production. If you do know how to improve on pivoting data in tables, please share it with the world.

Common Queries for MySQL 5

I found a very cool and helpful web page: Common Queries for MySQL 5. The idea is very inspiring, so I would like to emulate it. Whatever queries I write will be posted here; just look for the query category on this blog.