Extract data from PostgreSQL dump file

After taking a database dump from PostgreSQL using pg_dump, you may want to only get the schema or only the data. This script has been created and tested using Python versions 2.7 (Linux) and 3.2 (Windows), using a dump file from PostgreSQL version 9.0 (Linux).

Usage is simple. Provide an input dump file with the -f flag; output file with -o flag; and then choose either to extract/export data with -d flag or schema with -s flag. If you only want to extract data for certain tables, use the -t flag and provide a comma-separated list of table names. These table names should match exactly with what’s in the dump file.

I hope you find this script useful and can modify/extend it to your needs. If you have ideas on how to make this code better, please do not hesitate to share your ideas.


from re import search
import argparse
import codecs

script_version='0.0.1'
parser = argparse.ArgumentParser(
    description='From a pgsql dump file, extract only the data to be inserted', 
    version=script_version)
parser.add_argument('-f', '--file', metavar='in-file', action='store', 
    dest='in_file_name', type=str, required=True, 
    help='Name of pgsql dump file')
parser.add_argument('-o', '--out-file', metavar='out-file', action='store', 
    dest='out_file_name', type=str, required=True, 
    help='Name of output file')
parser.add_argument('-d', '--data-only', action="store_true", default=False, 
    dest='data_only', required=False, 
    help='''Only data is extracted and schema is ignored. 
    If not specified, then -s must be specified.''')
parser.add_argument('-t', '--table-list', metavar='table-name-list', action='store', 
    dest='table_name_list', type=str, required=False, 
    help='''Optional: Command-separated list of table names to process. 
    Works only with -d flag.''')
parser.add_argument('-s', '--schema-only', action="store_true", default=False, 
    dest='schema_only', required=False, 
    help='''Only schema is extracted and data is ignored.
    If not specified, then -d must be specified.''')
args = parser.parse_args()

if args.data_only and args.schema_only:
    print ('Error: You can\'t provide -d and -s flags at the same time; choose only one')
    exit()
elif args.data_only:
    data_only = True
    schema_only = False
    start_copy = False
elif args.schema_only:
    data_only = False
    schema_only = True
    start_copy = True
else:
    print ('Error: Choose one of -d and -s flags')
    exit()

print ('Processing File:', args.in_file_name)
input_file_name = args.in_file_name
output_file_name = args.out_file_name
table_name_list = args.table_name_list

if table_name_list:
    table_list = table_name_list.split(',')
else:
    table_list = None

outfile = codecs.open(output_file_name, "w", encoding="utf-8")
with codecs.open(input_file_name, "r", encoding="utf-8") as infile:
    for line in infile:
        if data_only:
            if (not start_copy) and search('^COPY', line) and table_list:
                for table in table_list:
                    if search(''.join(['^COPY ', table.strip(), ' ']), line):
                        start_copy = True
                        outfile.write(line)
                        break
            elif (not start_copy) and search('^COPY', line) and not table_list:
                start_copy = True
                outfile.write(line)
            elif start_copy and search('^\\\.', line):
                start_copy = False
                outfile.write(line)
            elif start_copy:
                outfile.write(line)
        elif schema_only:
            if start_copy and search('^COPY', line):
                start_copy = False
            elif (not start_copy) and search('^\\\.', line):
                start_copy = True
            elif start_copy:
                outfile.write(line)
print ('Done')
outfile.close()

PostgreSQL Tricks

Bash Alias to Login

I create a bash alias for servers I need to connect to via psql. This is a building block for other tricks in this post. Such an alias follows the following pattern:

alias codeghardb='psql --host 192.168.1.5 --dbname cgdb --username cgdbuser -W'

In the above statement, -W prompts for a password. Now whenever I need to use psql for a cgdb database, I just run codeghardb.

Load Data from file using copy command

First off, please understand that there is a difference between copy and \copy command. From what I understand, you should use \copy because it allows a regular user to insert data from file while copy only allows superuser to insert data from file. Also, copy is a server command and \copy is a client command.

You login to database via psql as below (we created an alias in the first step so we don’t have to enter these details every time)

codeghardb

and get psql prompt as below

cgdb=>

You can run the following command on the psql prompt:

cgdb=> \copy mytable (columnone, columntwo) from insert.txt with delimiter as '!'

We expect to have two columns in insert.txt, each delimited (or separated) by !, for the above command.

Run script from file in psql

Sometimes you create a SQL script with multiple statements and commands which need to be run through psql. We use the alias we created before, put the copy command we create previously in a file called script.sql. We also need to do some post-loading processing. So our script.sql may look like this:

--First insert the file
\copy mytable (columnone, columntwo) from insert.txt with delimiter as '!'
--Now update so that all empty string columns are made NULL
update mytable set columntwo = NULL where columntwo = ''
--We are done

All statements in the SQL script need to be on their own line. Run the script.sql file as below:

codeghar@codeghar:~$ codeghardb < script.sql

I learned this trick from Run external script in psql.

Hat Tips

Many thanks to COPY – PostgreSQL Wiki; Adding Data with INSERT and COPY; Using PostgreSQL’s COPY function effectively; Populating a Database;

Postgresql 8.3 on Debian Lenny

As I write this post, Debian Lenny is still Testing. I installed Lenny in VMware from a testing installer, and changed the sources.list file replacing lenny with testing. I will be installing Postgresql 8.3 by name, instead of the package Postgresql. As I learn more things, I will either create a new post or add to this one.

Install Postgresql

sudo aptitude install postgresql-8.3

This will install postgresql-client and OpenSSL (with some of its other packages). Location of Postgresql’s configuration files is /etc/postgresql/8.3/main/ and data is stored in /var/lib/postgres/8.3/main/.

Log in to Postgresql

It is installed under the user postgres and to use Postgresql client, you may need to be logged in with that user name. Two ways to do this are:

su -
su postgres

And the second way, if the user you are logged in as has sudo capabilities:

sudo su postgres

As you can see, there is no automatic way of using client unless you login as user postgres. But this can be changed. We can allow all users of the system to connect. By default, you should find a line local all all ident sameuser in the pg_hba.conf file, which means all users are allowed to access the database using Unix sockets (local users, not remote users). However, you are still not able to use the database as your own user. The reason is you need to add that user in the database as well.

As I see it, this is a good way of restricting access. Of course, you can still specify in pg_hba.conf file which users exactly are allowed and then also add only them to the database.

To add the user to the database, do the following:

su -
su postgres
createuser -P
[you will be asked to enter the password for the user, and also if this user should be a superuser]
exit

Now you can simply login using your own username. For example, if your user name is codeghar, then once you login to Debian with codeghar, you do this:

psql -W template1

You will be prompted for a password (because of the W) and then you will be using the client in the template1 database, which I think is provided by Postgresql by default.

Hat Tips

These sites helped a lot in not only getting me started, but also in writing this post: Installing Postgresql on Debian; PostgreSQL Database Server Configuration;