CLI First

Lengthy discussions over the past many days, reading online, and my own experience have taught me one thing about technology: the CLI should be a first-class citizen for every technology, and the GUI should focus on design while using the CLI underneath. Let me explain a little more.

The CLI has many advantages. It can be automated, it scales very nicely, documentation is easier, and it can be mixed and matched by the operator while working rather than having every workflow designed up front. It has more advantages, but these should suffice for now. The GUI, on the other hand, is easier to operate because of two main things: discoverability and guidance. But the GUI doesn’t scale as easily as the CLI does, nor is it as easy to automate.

In my ideal world, all tasks are first conceived and designed for CLI operation. For example, installing an application in your OS would be designed for the CLI only. Then a GUI is built on top to perform the same CLI operations, but in a way that guides the user through the different steps. This is how apt-get and Synaptic relate to each other in the Debian world.

Granted, I’m coming at this from a sysadmin perspective, but I really do believe it’s the right approach, especially in an age when cloud computing is the next big thing. Scalability is the key here, and automation is much more scalable when it is built on the CLI. The CLI is quite efficient for a single system (especially for repeated tasks) and scales very well to hundreds and thousands of systems.

Once the functionality has been implemented in the CLI, designers should take over the GUI and make it as efficient and pretty as possible. Decouple the two domains and let engineers develop the CLI and designers develop the GUI. Lots of collaboration between the two teams wouldn’t hurt either. Right now a lot of enterprise software is designed by engineers and ends up feeling clunky and inefficient. Real designers with good ideas should solve that part. That keeps daily sysadmins happy and occasional sysadmins satisfied.


Delete Large List of Files

It all started when I was reading More Elegant Way To Delete Large List of Files? on reddit. Reading comments on the page led me to Perl to the rescue: case study of deleting a large directory. But me being a Python fan, I wasn’t satisfied with a Perl solution. My search led me to meeb’s comment on Quickest way to delete large amounts of files.

To summarize my quest for knowledge:

Using Perl: perl -e 'chdir "BADnew" or die; opendir D, "."; while ($n = readdir D) { unlink $n }'

Using Python:


#!/usr/bin/env python
import shutil

# Recursively delete the directory and everything in it
shutil.rmtree('/stuff/i/want/to/delete')

Using Bash:
Step 0: (optional) Create a list of files to delete (source: valadil’s comment and ensuing discussion). This step will help you figure out exactly what will be deleted.

find . -name "log*.xml" -exec echo rm -f {} \; > test_file

Step 1: find . -type f -name "log*.xml" -print0 | xargs --null -n 100 rm

If it were up to me, I would use the Bash method as it’s easier for me to understand.

iptables Introduction

I have always wanted to learn how to write iptables rules for Linux. In my quest, I have used these resources to teach me what little I have learned so far: Hardening Linux; Iptables Tutorial; Ubuntu Setup. This is an introduction to iptables.

First, we need to learn how to write an iptables rule. The general format would be

iptables table command chain match target

There are three main tables: filter, nat, and mangle. filter is the default if you do not specify a table. So your first rule would start to look like

iptables -t filter

Commands include, but are not limited to, append, insert, delete, and replace. Let’s say we are adding a new rule so our command now looks like

iptables -t filter -A

There are three main chains: input, output, and forward. Input deals with all traffic coming into the server, output is traffic generated by the server, and forward is traffic not destined for the server itself but for some other machine. Let’s say we need to deal with incoming traffic. Now our command looks like

iptables -t filter -A INPUT

Matching is the heart and soul of the rule. The most common things to match on are the interface, source IP address, source port, destination IP address, destination port, and protocol. Note that the port matches (--sport and --dport) only work after a protocol has been specified with -p. Let’s say our example deals with HTTP traffic arriving on interface eth0 from any computer. Our command may look like

iptables -t filter -A INPUT -i eth0 -p tcp --dport 80

Since any computer may connect, we have left out source IP and source ports.

The last part is the target. The most common targets are accept, reject, and drop. Since we are looking to accept HTTP traffic in our example, we will use accept. Now our command looks like

iptables -t filter -A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT

We have created our first iptables rule. It will accept all incoming web traffic on port 80. See, it isn’t too hard to get started with iptables.
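
Putting a few rules like this together, here is a minimal sketch of a rule set for a simple web server. The loopback, established-connection, and SSH rules and the DROP default policy are illustrative additions, not part of the walkthrough above.

iptables -t filter -A INPUT -i lo -j ACCEPT
iptables -t filter -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -t filter -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
iptables -t filter -A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
iptables -t filter -P INPUT DROP

The last line sets the default policy of the INPUT chain, so anything the earlier rules did not accept gets dropped.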

Location of iptables Rules in CentOS

CentOS stores its rules in /etc/sysconfig/iptables
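
Assuming the standard iptables service scripts that ship with CentOS, the rules currently loaded in the kernel can be written to that file with:

service iptables save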

Location of iptables Rules in Ubuntu

By default, Ubuntu has a policy of accepting all incoming traffic. Therefore, there are no default iptables rules. However, if you want to create your own, put them in a file and modify /etc/network/interfaces by adding the following line:

pre-up iptables-restore < /etc/iptables.up.rules

where iptables.up.rules is the file where all rules were stored.
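
One way to create that file in the first place, assuming the rules you want are already loaded in the running kernel, is iptables-save:

iptables-save > /etc/iptables.up.rules

As a sketch, the relevant stanza of /etc/network/interfaces might then look like this (the eth0 configuration is only an example):

auto eth0
iface eth0 inet dhcp
    pre-up iptables-restore < /etc/iptables.up.rules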

Little Linux Commands

In this post I shall add little commands that one may forget but that could be very useful. My goal is to collect commands for as many distributions as possible. The following distributions are very closely related to each other and, unless otherwise noted, commands specified for one may be run on all of them without modification.

Red Hat: CentOS, Fedora
Ubuntu: Debian

So, for example, if a command is given for Red Hat, it may be run on CentOS and Fedora. If, however, a command is given explicitly for CentOS, it may or may not run on Red Hat and Fedora. If no distribution is given, it is very likely that the command runs on all distributions.

Find Distribution Release Version

If you want to know the release version of a distribution, you may use the following commands.

Red Hat, Ubuntu: tail /etc/issue
Red Hat: tail /etc/redhat-release
SUSE: tail /etc/SuSE-release
Ubuntu (gives more detail): tail /etc/lsb-release

Find Gateway of Network Interface

All: netstat -rn

Runlevel

A runlevel determines what services are started when the computer boots up. To find the runlevel your computer is in at this moment, type the following.

All: runlevel

Output should look something like

N 5

Here N is the previous runlevel (N means there was no previous runlevel) and 5 is the current runlevel.

Packages Installed

Red Hat: yum list installed
Ubuntu: dpkg -l

Split Files

If you want to split a file into many smaller parts, use this (hat tip: How do I open a 2.5 gig .xml file?):

split -l 50 myfile.txt mynewfile

This splits myfile.txt into pieces of 50 lines each, named mynewfileaa, mynewfileab, and so on.

Show All Users

If you want to list all users of the system, whether they are logged in or not, run the following command. It uses the cut command on the /etc/passwd file.

All: cut -d: -f 1 /etc/passwd

Hat tip for this trick: How to list all your users; man cut.

Lock root

If you want to lock or disable root user, or any other user for that matter, do the following (replace root with the user you want to lock):

All: sudo passwd -l root

Another way to lock a user is to do the following:

All: sudo usermod -L root

Similarly, to unlock a user:

All: sudo usermod -U root

Securely Copy Directory from Remote Server

If you want to use SCP to copy a whole directory from a remote server to your current directory on local machine, do the following:

All: scp -r user@host:/home/me/mydir/. .

The first dot in the remote path tells scp to copy all files and folders in the /home/me/mydir/ directory, including hidden files and directories. If you use an asterisk instead of the dot, it will not copy hidden items. The second dot means copy everything to the current directory on the local machine.

Hat tip for this trick: Moving /home data from old system to new Linux system.

Support for Virtualization in Processor

To see whether your current processor supports Intel-VT (vmx) or AMD-V (svm) virtualization, run the following:

All: egrep -e 'vmx|svm' /proc/cpuinfo

Thanks to CentOS 5 Xen Virtualization.

What packages are in package group

In Debian, tasksel has various groups of packages, such as Standard, Laptop, etc. But what do these groups contain? Thanks to a post by yankovic_yeah, we have a way to find out.

Ubuntu: aptitude search $(tasksel --task-packages standard)

Fully Qualified Domain Name

A Fully Qualified Domain Name (FQDN) is the complete and unique name of a specific computer. It consists of two parts: the computer’s host name and the domain name. An example would be myserver.example.com., where myserver is the host name of the computer and example.com is the domain name. There can be many computers called myserver, but there can only be one computer called myserver.example.com.

Note that myserver.example.com. has a period at the end of com. So it is .com. and not .com only. The last period differentiates between an FQDN and a regular domain name. If your domain name is longer, say firstnetwork.example.com, then myserver.firstnetwork.example.com. would be your FQDN.

In Linux, you may edit your /etc/hosts file and put your FQDN in it this way:

127.0.0.1 localhost.localdomain localhost
127.0.0.1 myserver.example.com. myserver

The above example shows that both localhost.localdomain and myserver.example.com. are FQDNs for the same server. Adding localhost and myserver at the end of these lines means we have an alias for the FQDN. This alias has to be the same as your computer’s hostname.
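
A quick way to check what the system itself reports as its host name and FQDN, assuming the standard hostname utility, is:

hostname
hostname --fqdn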

To really comprehend the difference between an unqualified and a qualified domain name, there is a simple test. Using the /etc/hosts as above, you could do the following:

ping -c 3 localhost

ping -c 3 localhost.localdomain

ping -c 3 myserver

ping -c 3 myserver.example.com.

All of the above examples will show the destination IP to be 127.0.0.1, which tells you that the configuration is correct.

ping -c 3 myserver.example.com

Now if you do not have myserver.example.com in the Internet DNS, doing the above will not ping any computer. (Note the absence of the period at the end of the domain name.)

MS SQL Server 2000: Create an Off-site Standby Server

This post is an extension to the “Poor Man’s Log Shipping” post written earlier on this blog. To summarize, the main server uses log shipping to maintain a standby server on-site. It also creates a nightly full backup and periodic backups of the transaction logs. I wrote a batch script to FTP these log backups once a day to an off-site location.

The reason for this was to create an off-site standby server. With daily log backups already being received at this site, I just needed to bring one of the nightly full backups here. First things first: how do I get a multi-GB backup file over the Internet via FTP? Even on a high-speed connection it would take a long time. I looked at several options: FTP, BitTorrent, HTTP, and more. What I liked was the simplicity of FTP. All I had to do was compress the backup file and send it. However, even after compression, the size was multi-GB. So I used 7-zip to compress the file and split the resulting archive into 100MB chunks. Using the built-in command-line FTP client in Windows and the mput command, I was able to transfer the data easily over a period of time. At the receiving end, I again used 7-zip to uncompress the data.
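
As a rough sketch of the compression and transfer step, assuming the 7-Zip command-line client (7z) and using the backup file name that appears later in this post:

7z a -v100m fulldbbackup.7z e:\fulldbbackup.bak

This produces 100MB volumes named fulldbbackup.7z.001, fulldbbackup.7z.002, and so on. In the Windows ftp client, after connecting and logging in, switching to binary mode and turning off prompting lets mput send all the volumes:

binary
prompt
mput fulldbbackup.7z.*

At the receiving end, 7z x fulldbbackup.7z.001 extracts the original backup file from the whole set.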

The next step was to restore the backups. First I needed to restore the full backup. At first I went with the GUI method through Enterprise Manager. All the required options are there, but I wanted to understand and control the process. Therefore, I abandoned that idea and tried the T-SQL approach. This was exactly what I was looking for. I got the best help from Microsoft’s Transact-SQL Reference for Restore.

The first thing I needed to do was to get the names of the logical files in the full backup. This is necessary for reasons explained very well in the Copying Databases article. In my case, it was because the directory structure on this server was different from the server where the backup was created. But how to do it? I got help from RESTORE FILELISTONLY. The actual command I used was this:

RESTORE FILELISTONLY FROM DISK = 'e:\fulldbbackup.bak';

It showed me the logical file names as well as the full paths where the database would put the physical files. Since the paths on this server were different from what the backup wanted, I had to specify exactly where to put the files during the restore so that the database ended up in the correct location for this server. The restore script I used was:

RESTORE DATABASE mydbname
FROM DISK = 'e:\fulldbbackup.bak'
WITH
MOVE 'datafile' TO 'e:\dbdata.mdf' ,
MOVE 'logfile' TO 'e:\dblogs.ldf' ,
STANDBY = 'e:\undofile.dat' ,
STATS = 10

I used STANDBY because I needed to restore subsequent transaction log backups. It took some time but the restoration completed. Then I needed to restore the log backups. One thing to remember is that logs need to be restored or applied in the sequence they were created. During my explorations, I noticed that if you try to apply a log backup that was created before the full backup was created, SQL Server will give an error and not proceed. If you apply a backup that has already been applied, it will process the backup but will also say that zero pages were processed. So it is my opinion that even if you make a mistake in applying the wrong log backup, it will not destroy your database. Of course, I did not skip a log backup and apply the next one so I cannot say what will happen if you do something like that. The script to restore one log backup is:

RESTORE LOG mydbname
FROM DISK = 'e:\logs\log1.trn'
WITH STANDBY = 'e:\log_undofile.dat',
STATS = 10

I had approximately two weeks’ worth of transaction log backups that needed to be restored. I was not going to manually change the log file name in the restore script for each backup. My first thought was to write a Python script to read the contents of the ‘e:\logs\’ directory and run the restore once for each file name in it. Since I am lazy, I sought an even easier way. So I did the following:

In Windows command line, I ran:

dir e:\logs\ > e:\allfiles.txt

This created a list of all the files in that directory. But the format was what you would normally get using the dir command. So I used the find and replace feature of my text editor to replace all spaces with a semi-colon. Then I replaced multiple semi-colons with a single semi-colon. Something like:

Find: ‘ ‘ (it means a single space but without the quotes)
Replace: ;

And then

Find: ;;
Replace: ;

I continued replacing multiple semi-colons with a single semi-colon until I got just one between each field. I then opened this csv-type file in Excel (actually, it was OpenOffice.org’s Calc), copied the column with the file names, and then saved it in a text file.

Again find and replace came to help out. Each file was named like log1.trn, log2.trn, and so on. So I did this:

Find: log
Replace: RESTORE LOG mydbname FROM DISK = 'e:\logs\log

And another find and replace was:

Find: .trn
Replace: .trn' WITH STANDBY = 'e:\log_undofile.dat', STATS = 10

This created a file with scripts like so:

RESTORE LOG mydbname FROM DISK = 'e:\logs\log1.trn' WITH STANDBY = 'e:\log_undofile.dat', STATS = 10
RESTORE LOG mydbname FROM DISK = 'e:\logs\log2.trn' WITH STANDBY = 'e:\log_undofile.dat', STATS = 10

I saved and opened this file in SQL Query Analyzer and ran the script. Since there were a whole bunch of log backup files, it took quite some time to finish the process.

After all the current backups were restored, I made a habit of collecting a week’s worth of log backups and applying them in a similar fashion.

I know this is a very manual process and I could write a Python script once to do all this stuff for me. I intend to write such a script but right now I do not have the time. Besides, this procedure was just for me to learn how to restore backups and then apply transaction log backups.
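
A minimal sketch of such a script might look like this. It prints one RESTORE LOG statement for every .trn file in e:\logs\, using the same database name and undo file as above; the numeric sort is an assumption so that log10.trn is not processed before log2.trn.

#!/usr/bin/env python
import glob
import re

# Collect the transaction log backups and sort them by the number in the
# file name, so log2.trn comes before log10.trn.
logs = glob.glob(r'e:\logs\log*.trn')
logs.sort(key=lambda name: int(re.search(r'(\d+)\.trn$', name).group(1)))

# Print one RESTORE LOG statement per backup. Redirect the output to a .sql
# file and run it in Query Analyzer.
for log in logs:
    print("RESTORE LOG mydbname FROM DISK = '%s' "
          "WITH STANDBY = 'e:\\log_undofile.dat', STATS = 10" % log)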

Some good resources include: (a) Using Standby Servers; (b) SQL Server 2000 Backup and Restore; (c) SQL Server 2000 Backup Types and Recovery Models.

Disk Usage in Ubuntu CLI

If you would like to see disk usage on your Ubuntu computer, you can do so easily through the command line.

du -c -h file...

where file is the file (or files if more than one are specified) of which you want to see the disk usage.
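
For example, to see the usage of a couple of files along with a grand total (the file names are only placeholders):

du -c -h report.txt notes.txt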

If you would like to see free disk space, this is your best bet

df -h

This will present all the disks and filesystems mounted on your system and provide statistics like total space, space used, and so on.

For more information, run the following on your Linux system

man du

or

man df

Or you may search for the manual for these commands through a search engine.