Create Sequence Diagrams Using seqdiag

Have you ever needed to create sequence diagrams? If you’re like me, you have to all the time. Until now I had always used a manual process with pen and paper or Visio. But I wanted an easier way and found seqdiag.

I installed it in a virtualenv on Ubuntu 14.04.

codeghar@host~$ virtualenv --system-site-packages virt

I used --system-site-packages because seqdiag relies on the Python Imaging Library (PIL). Of course, make sure you have the python-pil package installed in your OS. I ran into this problem (About the PIL Error — IOError: decoder zip not available) when I didn’t use --system-site-packages. The impression I got was that it’s hard to get PIL installed inside a virtualenv and easier to just use the system-wide PIL.

codeghar@host~$ source virt/bin/activate

(virt)codeghar@host~$ pip install seqdiag

To create a diagram you need to provide a diagram definition in a .diag file. You can look at the sample diagrams for more information. The following will create a new file (or overwrite an existing file) called example.png.

(virt)codeghar@host~$ seqdiag -Tpng --no-transparency example.diag
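The .diag file itself is plain text. A minimal sketch of what example.diag might contain (the participant names here are made up, not from the original post):

```
seqdiag {
  browser -> webserver [label = "GET /index.html"];
  webserver -> database [label = "SELECT"];
  webserver <-- database;
  browser <-- webserver;
}
```

Each `->` edge draws a request arrow and `<--` draws the dashed return arrow, so the diagram reads top to bottom in call order.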

Hat tip to generate uml sequence diagrams with python or perl for bringing seqdiag to my attention.

Beginning AES with Python3

Encryption is a vast field and one post can never do it justice. But I’ll try to provide code examples on how to use the PyCrypto library to work with AES.

Disclaimer: My programming skills might not be up to par when it comes to encryption. Try to learn from my mistakes (when I make them).

Install the library in Fedora:

yum install python3 python3-crypto

Let’s just dive into the code. I have tried to make it simple and clear.

#!/usr/bin/env python3
from Crypto.Cipher import AES
from Crypto import Random
from base64 import b64encode, b64decode
from Crypto.Util import Counter
from binascii import hexlify

print('AES block size: {0}'.format(AES.block_size))
original_key = 'This is my k\u00eay!! The extra stuff will be truncated before using it.'
key = original_key.encode('utf-8')[0:32]
print('Original Key: {0}'.format(original_key))
print('Usable Key: {0}'.format(key))
print('Base64 Encoded key: {0}'.format(b64encode(key).decode('utf-8')))
message = '0123456789'.encode('utf-8')
print('Original Message: {0}'.format(message))


print('```MODE CFB```')
cfb_iv =
print('Base64 Encoded IV: {0}'.format(b64encode(cfb_iv).decode('utf-8')))

cfb_cipher_encrypt =, AES.MODE_CFB, cfb_iv)
cfb_msg_encrypt = b64encode(cfb_cipher_encrypt.encrypt(message))
print('Mode CFB, Base64 Encoded, Encrypted message: {0}'.format(cfb_msg_encrypt.decode('utf-8')))

cfb_cipher_decrypt =, AES.MODE_CFB, cfb_iv)
cfb_msg_decrypt = cfb_cipher_decrypt.decrypt(b64decode(cfb_msg_encrypt)).decode('utf-8')
print('Mode CFB, Decrypted message: {0}'.format(cfb_msg_decrypt))


print('```MODE CTR```')
def ctr_pad_message(in_message):
    # We use PKCS7 padding
    length = 16 - (len(in_message) % 16)
    return in_message + bytes([length]) * length


def ctr_unpad_message(in_message):
    return in_message[:-in_message[-1]]

ctr_iv = int(hexlify(, 16)
print('CTR IV (int): {0}'.format(ctr_iv))
ctr_encrypt_counter =, initial_value=ctr_iv)
ctr_decrypt_counter =, initial_value=ctr_iv)

ctr_padded_message = ctr_pad_message(message)
print('Mode CTR, Padded message: {0}'.format(ctr_padded_message))
ctr_cipher_encrypt =, AES.MODE_CTR, counter=ctr_encrypt_counter)
ctr_msg_encrypt = b64encode(ctr_cipher_encrypt.encrypt(ctr_padded_message))
print('Mode CTR, Base64 Encoded, Encrypted message: {0}'.format(ctr_msg_encrypt.decode('utf-8')))

ctr_cipher_decrypt =, AES.MODE_CTR, counter=ctr_decrypt_counter)
ctr_msg_decrypt = ctr_cipher_decrypt.decrypt(b64decode(ctr_msg_encrypt))
ctr_unpadded_message = ctr_unpad_message(ctr_msg_decrypt)
print('Mode CTR, Decrypted message: {0}'.format(ctr_msg_decrypt))
print('Mode CTR, Unpadded, Decrypted message: {0}'.format(ctr_unpadded_message))

Here I have provided examples for two modes: CFB and CTR. Although neither mode should require fixed-size blocks, for some reason this library expects CTR input to be a multiple of the block size, hence the padding functions above.

A Python and Unicode Ahaa! Moment

EDIT (2013-03-08): Watch the presentation Pragmatic Unicode by Ned Batchelder and try to ignore this post. I wrote it when I had a lesser understanding of Unicode. In other words, this post is deprecated.

I get stumped every time I try to work with Unicode in Python. The biggest problems arise when trying to read files with Unicode data in them. Today was again a day when I found out that everything I know about Unicode is either completely misunderstood or forgotten. But after several hours of looking at various tutorials, code snippets, etc., I finally had my eureka moment.

When I write a text file with Unicode data in it, I always use the symbol (e.g. ㇹ) instead of its code (e.g. \u31f9). When I read this file in Python, I usually get some kind of error. I learned today that, for my sanity, I should use the code and not the symbol when writing Unicode in text files. But which code? I use UTF-8 codes, and Unicode 4.0 / ISO 10646 Plane 0 has a great list of them. Now when I read Unicode from a file in Python, it reads it without problems.

This ties into JSON as well. In your JSON text, instead of writing symbols as we see them, write the hexadecimal code that computers see. I tried this technique with Python 3 on Windows 7 and Windows 2008 R2.
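As a quick illustration of the JSON point, Python's json module decodes the hexadecimal escape into the actual character (a minimal sketch; the key name is made up):

```python
import json

# The JSON text carries the hexadecimal escape, not the symbol itself.
json_text = '{"fraction": "\\u2158"}'
decoded = json.loads(json_text)
print(decoded['fraction'])  # → ⅘
```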

If you want to normalize Unicode data, use unicodedata. The function to use is normalize. I am still unclear on which supported “form” (‘NFC’, ‘NFKC’, ‘NFD’, ‘NFKD’) to use in which situation. But through trial and error I have settled on NFC because it retains the actual character (unlike NFD) and does not substitute a compatibility character with its equivalent (unlike NFKC and NFKD). You really do need to read more about the unicodedata module to understand what I mean.
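To see the difference between the forms concretely, here is a small sketch comparing NFC and NFD on an accented character (my own example, not from the original post):

```python
import unicodedata

composed = '\u00e9'                                  # 'é' as a single code point
decomposed = unicodedata.normalize('NFD', composed)  # 'e' plus a combining accent
print(len(composed), len(decomposed))  # → 1 2
# NFC recombines the decomposed pair back into the single character
print(unicodedata.normalize('NFC', decomposed) == composed)  # → True
```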

But it’s really that simple. Use UTF-8 hexadecimal code when writing text files and use NFC when reading files to normalize data. For example, if your file contains the following data:

\u2158\u31f9
Then your Python script should have something like:

import unicodedata
normalized_unicode = unicodedata.normalize('NFC', '\u2158\u31f9')

And when you display the data, it will show up as:

⅘ㇹ
Generate HTML and PDF from DocBook in Fedora

DocBook is a widely-used format for writing documentation, articles, books, etc. For my purposes, I needed to generate XHTML and PDF files from documentation in DocBook format on a Fedora 16 server.


You need to install the following packages.

sudo yum install libxslt docbook5-style-xsl docbook-utils

Convert single DocBook file to XHTML

Now comes the conversion. Run xsltproc as below and it will create an HTML file (mybook.html in this case) in the current directory.

xsltproc -o mybook.html /usr/share/sgml/docbook/xsl-ns-stylesheets/xhtml-1_1/docbook.xsl mydocbook.xml

You can explore the /usr/share/sgml/docbook/xsl-ns-stylesheets/ path for more options.

Convert modular DocBook file to XHTML

You can create a modular DocBook document (a book in my case) by separating out chapters of the book into separate files and including them in the main file. For example, there’s only one chapter in my book so I’ll have two files: docbook.xml and docbook.chapter.xml. These two files would look something like the following:

An example of file docbook.xml

<?xml version="1.0" encoding="UTF-8"?>
<book xml:id="wikiply_doc" xmlns="" version="5.0" xmlns:xi="">
    <info>
        <title>Sample Book</title>
        <copyright><year>2012</year><holder>Code Ghar</holder></copyright>
        <legalnotice>
            <para>Copyright 2011-2012 Code Ghar. All rights reserved.</para>
            <para>Redistribution and use in source (SGML DocBook) and 'compiled' forms (SGML, HTML, PDF, PostScript, RTF and so forth) with or without modification, are permitted.</para>
        </legalnotice>
    </info>
    <xi:include href="docbook.chapter.xml" />
</book>

An example of file docbook.chapter.xml

<?xml version="1.0" encoding="UTF-8"?>
<chapter xml:id="installation" xmlns="" version="5.0">
    <title>Sample Chapter</title>
    <section xml:id="sample_chapter">
        <title>Sample Chapter</title>
        <para>This is example text in the sample chapter</para>
    </section>
</chapter>

Run xsltproc as below and it will create an HTML file (mybook.html in this case) in the current directory from both files.

xsltproc -xinclude -o mybook.html /usr/share/sgml/docbook/xsl-ns-stylesheets/xhtml-1_1/docbook.xsl docbook.xml

Note the use of the -xinclude flag in the command and the xi:include XML tag in the file. These two things make the magic of modular DocBook possible.

bash alias

Since I work with a DocBook book often, I have created a bash alias as below:

alias dbtohtml="xsltproc -xinclude -o /home/codeghar/book/mybook.html /usr/share/sgml/docbook/xsl-ns-stylesheets/xhtml-1_1/docbook.xsl /home/codeghar/book/docbook.xml; sed -e 's/</\n</g' -e 's/<meta name/\n<meta http-equiv=\"Content-Type\" content=\"text\/html; charset=utf-8\" \/> \n <meta name/g' -i /home/codeghar/book/mybook.html"

The generated file does not have the HTML meta tag identifying it as UTF-8, so some characters (such as non-breaking spaces) display as garbled text in the web browser. Therefore, sed is used to insert the appropriate meta tag into the file.

Convert DocBook to PDF

Using the same example files (docbook.xml and docbook.chapter.xml), we will create a PDF instead of an XHTML file.

You need to install Apache FOP.

sudo yum install fop

Next you need to create an intermediate XSL-FO file (mybook.fo) as below.

xsltproc -xinclude -o mybook.fo /usr/share/sgml/docbook/xsl-ns-stylesheets/fo/docbook.xsl docbook.xml

Finally, run the following command to create the PDF file:

fop -fo mybook.fo -pdf mybook.pdf

Hat Tips

DocBook Ubuntu Documentation; How to generate pdf from docbook 5.0; Getting Started with Docbook Book Authoring on Ubuntu; Writing Documentation; Playing With DocBook 5.0

Introduction to Python subprocess module

First off, head over to subprocess — Subprocess management to get all the details. This post will try to provide a gentle introduction to subprocess and my experience using it. There will be some suggestions here that I *think* are correct but be careful when you implement them in your code. Also remember that I wrote and tested this code using Python 3.1 on Debian Squeeze.

First, I found it better to use the Popen class directly rather than the convenience functions; doing so helped me get a better handle on what’s going on. Second, learn the difference between Popen.wait() and Popen.communicate(). wait() sets Popen.returncode but leaves the stdout and stderr pipes as they are. communicate() also sets Popen.returncode, but it additionally returns stdout and stderr and closes the pipes, so you can’t use them again as stdin for another command.

Third, use the shlex module so that you don’t have to fight with the command while creating a list to feed to args in Popen.
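For instance, shlex.split keeps quoted arguments intact as single list items, unlike a naive str.split() (the sed command here is illustrative):

```python
import shlex

# The quoted sed expression survives as one argument.
command_line = "sed -e 's/dev/production/' settings -i"
print(shlex.split(command_line))
# → ['sed', '-e', 's/dev/production/', 'settings', '-i']
```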

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import subprocess
import shlex

command_line = "sed -e 's/^import dev as settings_file$/import production as settings_file/' test -i"
command_to_run = shlex.split(command_line)
print(command_to_run)
command_run = subprocess.Popen(command_to_run, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
command_run_stdout, command_run_stderr = command_run.communicate()
print(command_run.returncode, command_run_stderr.decode('utf-8'))
print(command_run_stdout.decode('utf-8'))

The preceding code sample is pretty self-explanatory. I used shlex to create a list from my command string, which is then passed to the Popen class. I set both stdout and stderr to send their output to pipes. command_run is an object representing the command I ran. Using communicate(), I get three things: returncode (set automatically), stdout (returned by communicate), and stderr (returned by communicate). Since command_run_stdout and command_run_stderr are byte strings, I decode them from UTF-8 before printing.

I will modify the preceding code so that I can use stdout and stderr as stdin for another command.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import subprocess
import shlex

command_line = "ls -l"
command_to_run = shlex.split(command_line)
print(command_to_run)
command_run = subprocess.Popen(command_to_run, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
command_run.wait()
print(command_run.returncode)
command_to_run_2 = ["grep", "-i", "TOTAL"]
command_run_2 = subprocess.Popen(command_to_run_2, stdin=command_run.stdout)
command_run_2.wait()

The biggest difference here is that I used wait() instead of communicate(), so that the first command’s stdout could be used as stdin for the second command.

If you are able to understand these things, I believe you are on your way to writing basic scripts that call out to the shell to do the task it’s best suited for: running commands.

Delete Large List of Files

It all started when I was reading More Elegant Way To Delete Large List of Files? on reddit. Reading comments on the page led me to Perl to the rescue: case study of deleting a large directory. But me being a Python fan, I wasn’t satisfied with a Perl solution. My search led me to meeb’s comment on Quickest way to delete large amounts of files.

To summarize my quest for knowledge.

Using Perl: perl -e 'chdir "BADnew" or die; opendir D, "."; while ($n = readdir D) { unlink $n }'

Using Python:

#!/usr/bin/env python
import shutil

shutil.rmtree('BADnew')
Using Bash:
Step 0: (optional) Create a list of files to delete (source: valadil’s comment and ensuing discussion). This step will help you figure out exactly what will be deleted.

find . -name "log*.xml" -exec echo rm -f {} \; > test_file;

Step 1: find . -type f -name "log*.xml" -print0 | xargs --null -n 100 rm

If it were up to me, I would use the Bash method as it’s easier for me to understand.

Extract data from PostgreSQL dump file

After taking a database dump from PostgreSQL using pg_dump, you may want to only get the schema or only the data. This script has been created and tested using Python versions 2.7 (Linux) and 3.2 (Windows), using a dump file from PostgreSQL version 9.0 (Linux).

Usage is simple. Provide an input dump file with the -f flag; output file with -o flag; and then choose either to extract/export data with -d flag or schema with -s flag. If you only want to extract data for certain tables, use the -t flag and provide a comma-separated list of table names. These table names should match exactly with what’s in the dump file.

I hope you find this script useful and can modify/extend it to your needs. If you have ideas on how to make this code better, please do not hesitate to share your ideas.

from re import search
import argparse
import codecs

parser = argparse.ArgumentParser(
    description='From a pgsql dump file, extract only the data to be inserted')
parser.add_argument('-f', '--file', metavar='in-file', action='store', 
    dest='in_file_name', type=str, required=True, 
    help='Name of pgsql dump file')
parser.add_argument('-o', '--out-file', metavar='out-file', action='store', 
    dest='out_file_name', type=str, required=True, 
    help='Name of output file')
parser.add_argument('-d', '--data-only', action="store_true", default=False, 
    dest='data_only', required=False, 
    help='''Only data is extracted and schema is ignored. 
    If not specified, then -s must be specified.''')
parser.add_argument('-t', '--table-list', metavar='table-name-list', action='store', 
    dest='table_name_list', type=str, required=False, 
    help='''Optional: Comma-separated list of table names to process. 
    Works only with -d flag.''')
parser.add_argument('-s', '--schema-only', action="store_true", default=False, 
    dest='schema_only', required=False, 
    help='''Only schema is extracted and data is ignored.
    If not specified, then -d must be specified.''')
args = parser.parse_args()

if args.data_only and args.schema_only:
    print('Error: You can\'t provide -d and -s flags at the same time; choose only one')
    raise SystemExit(1)
elif args.data_only:
    data_only = True
    schema_only = False
    start_copy = False
elif args.schema_only:
    data_only = False
    schema_only = True
    start_copy = True
else:
    print('Error: Choose one of -d and -s flags')
    raise SystemExit(1)

print('Processing File:', args.in_file_name)
input_file_name = args.in_file_name
output_file_name = args.out_file_name
table_name_list = args.table_name_list

if table_name_list:
    table_list = table_name_list.split(',')
else:
    table_list = None

outfile =, "w", encoding="utf-8")
with, "r", encoding="utf-8") as infile:
    for line in infile:
        if data_only:
            if (not start_copy) and search('^COPY', line) and table_list:
                for table in table_list:
                    if search(''.join(['^COPY ', table.strip(), ' ']), line):
                        start_copy = True
            elif (not start_copy) and search('^COPY', line) and not table_list:
                start_copy = True
            elif start_copy and search(r'^\\\.', line):
                start_copy = False
            elif start_copy:
                outfile.write(line)
        elif schema_only:
            if start_copy and search('^COPY', line):
                start_copy = False
            elif (not start_copy) and search(r'^\\\.', line):
                start_copy = True
            elif start_copy:
                outfile.write(line)
outfile.close()
print('Done')