Sort String Based on a Substring Key in Python

I have a scenario where I need to sort a file based on a substring in each line. The format is fixed: every line has the same length (with a few exceptions caused by errors), and the file has to be sorted on a substring key that sits at a fixed position in the line. Example lines could be as below:


I tried to use sort in bash, but since the lines have no delimiter to split them into fields, the -k flag did not seem to work. A key such as -k 22,36 selects fields 22 through 36, not character positions, so instead of sorting on that range the command sorted from the first character. I wanted to give awk a try but didn’t have much time to look at it. Since I already work with Python, I thought, why not write a function that sorts a list of strings based on a substring? The following is the result:

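def sort_on_substr(string_list, pos_start, pos_end=None):
    sort_key_list = []
    sort_key_dict = {}
    for one_string in string_list:
        # Clamp the end position so short (malformed) lines do not break the slice.
        if not pos_end:
            pos_end_here = len(one_string)
        elif pos_end > len(one_string) or pos_end < pos_start:
            pos_end_here = len(one_string)
        else:
            pos_end_here = pos_end
        # A start position beyond the line yields an empty key.
        if pos_start > len(one_string):
            pos_start_here = len(one_string)
            pos_end_here = pos_start_here
        else:
            pos_start_here = pos_start
        sort_key = one_string[pos_start_here:pos_end_here]
        if sort_key in sort_key_dict:
            # Key seen before: add this string to the existing list.
            templist = sort_key_dict[sort_key]
            templist.append(one_string)
            sort_key_dict[sort_key] = templist
        else:
            # New key: remember it and start a list for its strings.
            sort_key_list.append(sort_key)
            templist = [one_string]
            sort_key_dict[sort_key] = templist
    sort_key_list_sorted = sorted(sort_key_list)
    return (sort_key_list_sorted, sort_key_dict)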

To use this function, pass a list of strings to sort, the starting position of the substring, and optionally its ending position. The positions follow Python slice conventions: zero-based, with the end position exclusive. The function returns two things: (1) a sorted list of the distinct substring keys; (2) a dictionary that maps each substring key to the strings containing it. You can then iterate over the sorted list and look up each key in the dictionary to get the strings in sorted order. In my use case the same key can appear in more than one string, so the dictionary stores each value as a list of strings. The order within that list does not matter to me: strings with the same key count as equal for sorting purposes, so any one of them may come before the others in the sorted result.
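The lookup step described above can be sketched as follows. This is a minimal, self-contained illustration rather than the function from this post: group_by_substr and flatten_in_key_order are hypothetical helper names, the sample records are made up, and the grouping uses collections.defaultdict for brevity.

```python
from collections import defaultdict


def group_by_substr(lines, start, end=None):
    """Group lines by the slice line[start:end]; return (sorted keys, groups)."""
    groups = defaultdict(list)
    for line in lines:
        groups[line[start:end]].append(line)
    return sorted(groups), dict(groups)


def flatten_in_key_order(sorted_keys, groups):
    """Walk the sorted keys and collect every line stored under each key."""
    result = []
    for key in sorted_keys:
        result.extend(groups[key])
    return result


# Made-up sample records: the key is characters 4..6 (zero-based, end exclusive).
records = ["aaa-300-x", "bbb-100-y", "ccc-200-z", "ddd-100-w"]
keys, groups = group_by_substr(records, 4, 7)
ordered = flatten_in_key_order(keys, groups)
print(ordered)
```

Lines that share a key (here the two "100" records) stay in the order in which they were read, which is fine when that order does not matter.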

An alternative to this approach is to forgo the sorted list of keys and instead sort the dictionary itself. Let’s leave it as an exercise for you in case you want to implement it that way.
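If you would rather see one possible answer, here is a rough sketch. It assumes a dictionary in the shape described earlier (substring key mapped to a list of strings); the sample data is made up for illustration.

```python
# Hypothetical dictionary in the shape described above: key -> list of lines.
groups = {
    "300": ["aaa-300-x"],
    "100": ["bbb-100-y", "ddd-100-w"],
    "200": ["ccc-200-z"],
}

# sorted() over the items orders the (key, lines) pairs by their key,
# so no separate list of keys is needed.
ordered = [
    line
    for _, lines in sorted(groups.items(), key=lambda item: item[0])
    for line in lines
]
print(ordered)
```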

Now all I have to do is supply a list of strings (maybe read from a file) and then give positions for the key. The result is a sorted list which I can then use as required.
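For the read-from-a-file case, a shorter route is also possible. The sketch below swaps the dictionary approach for Python's built-in sort with a key function; sort_file_on_substr, the file names, and the sample lines are all hypothetical, and the files are assumed to be UTF-8 text with one record per line.

```python
import os
import tempfile


def sort_file_on_substr(in_path, out_path, start, end=None):
    """Sort the lines of in_path on line[start:end] and write them to out_path."""
    with open(in_path, encoding="utf-8") as src:
        lines = [line.rstrip("\n") for line in src]
    # list.sort() is stable, so lines with equal keys keep their file order.
    lines.sort(key=lambda line: line[start:end])
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(line + "\n" for line in lines)
    return lines


# Demonstration with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "input.txt")
    out_path = os.path.join(tmp, "sorted.txt")
    with open(in_path, "w", encoding="utf-8") as f:
        f.write("aaa-300-x\nbbb-100-y\nccc-200-z\nddd-100-w\n")
    sorted_lines = sort_file_on_substr(in_path, out_path, 4, 7)
    with open(out_path, encoding="utf-8") as f:
        written = f.read()
```

Slicing past the end of a short line simply yields a shorter or empty key, so malformed lines do not raise an error; lines with an empty key sort to the top.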
