Reinventing the (Wiki) wheel

I have been mulling for some time the idea of creating yet another Wiki engine/application. But before I started building it, I wanted to iron out the reasons why I want to do it. Some reasons seem childish when looked at objectively and others seem frivolous. But when I probe deeper into them, I find myself motivated to at least give it an honest chance.

So far I have asked myself some questions on why not to go ahead with this project. Is there really a need for another Wiki application when a plethora of choices is already available? Isn’t it better to work on a project which doesn’t have as many choices and people working on it? What is it that I can provide in my Wiki application that others haven’t already provided, or have considered but rejected? Who do I want to target as my audience: individuals, developers, small teams, large teams? Do I have the time, drive, and commitment to not only start this project but continue to maintain it if it is ever finished? In short, I have tried to ask myself questions that would discourage me from going ahead, because there’s a side of me that really believes that yet another Wiki engine will not benefit a lot of people. I could instead work on making, for example, a Session Border Controller solution using OpenSIPS and FreeSWITCH, you know, something that a lot more people could find useful.

But there’s another side that really wants to undertake this project, for a few reasons. I tried MoinMoin (in fact, I’m still using it) and the biggest problem for me (and I could be wrong here) was that to change its configuration, I have to change Python code. I am certainly capable of understanding Python code and modifying it to suit my purpose, but the biggest obstacle is first learning how and where to make the changes. That requires excellent documentation, something not easily done when users may be required to do some light programming. I started with MoinMoin precisely because I felt very comfortable with Python and knew that I could adapt it if required. I love that MoinMoin stores content in flat files, which I can back up using rsync from my server to my notebook. But I don’t get that ‘Oh, I get it!’ feeling when trying to change the way MoinMoin works by default. I could try some other Wiki that does these things, but not many are built using Python.

These, and many other, reasons prompted this prospect of re-inventing a Wiki application. Some of the goals, inspirations, and design choices are below.

  • I want to learn how to create a Wiki.
  • Always use the best tool for the job.
  • Leverage already-existing tools rather than re-creating them for no reason.
  • Focus primarily on making a Wiki.
  • First have a functioning Wiki and then think about performance, scalability, serving as a true collaboration platform, plug-in support, etc.
  • The same Wiki should serve multiple groups and users with smart and secure permissions.
  • Security features, such as IP whitelists, IP blacklists, account locking on multiple login failures, etc., should be easy to manage.
  • Allow multiple Wikis to be hosted from the same server using either the same database (shared credentials among the farm) or different databases.
  • Wikis are mostly read-only, with only a small portion being edited at any given time. So use flat files for content and attachments. File systems were designed to handle files well, so let them do their job.
  • Databases are great for structured data, so use them for configuration, settings, rules, etc. This avoids the problem of changing code to maintain policy.
  • Differentiate between the Wiki Management System (WMS) and content. The WMS is all the bits and pieces that make a Wiki function: user management, parsers, etc. Content is what users focus on most, because the Wiki exists to gather and serve content.
  • Advanced, fine-grained user profile and access control management, auditing, and logging.
  • Use Django because that’s what I know best. An additional advantage of using Django is that the project can then be modified by people who know it or it can be integrated/enhanced into a much bigger Django project.
  • Python and Django make it easy to create a multi-platform application but focus primarily on Linux.
  • Keep the project free, open source, and freely available.

Some wishlist items are also included.

  • Add some elements of blogs, such as tags, time-sensitivity of content, discussions, etc.
  • RSS feeds.
  • Simultaneous editing for a true multi-user Wiki.
  • Integration with different authentication/authorization methods, such as RADIUS, LDAP, etc.
  • Use YUI Library to create user-friendly and pretty templates, etc.

Now the question I ask myself is this: how much time and effort will be required to get this project started and rolling? This is a huge list of goals for a single developer (who is a developer-for-fun in the first place). Am I capable of managing this project? Only time will tell. But the last question I ask myself before I start: is it worth it to reinvent the wheel when others have already done the pioneering work before me?

Django, Apache, and X-SendFile in Debian

The idea for this post is to allow only authorized users to download files. The only check for now is that the user is logged in. As I add more code to the Django app, I plan to include the ability to specify which users should have access to which files.

This post assumes that you have a working Apache server with VirtualHost file setup for a Django project. I also assume that you would want to access files using a URL scheme such as http://example.com/download/filename.ext where filename.ext represents the file you want to download.

To install X-SendFile in Apache2, run the following:

sudo aptitude install libapache2-mod-xsendfile

And to make sure it is enabled, run the following:

sudo a2enmod xsendfile

Add the following two lines in your VirtualHost file, like so:

<VirtualHost *:80>
...
XSendFile on
XSendFileAllowAbove on
...
</VirtualHost>
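
On the Django side, the view only needs to check the user and then hand the actual transfer off to Apache by setting the X-Sendfile header. Here is a minimal sketch of such a view, assuming a hypothetical DOWNLOAD_DIR setting that points at the directory holding the files; the authoritative code is in the Launchpad project mentioned below.

# A minimal sketch, not the actual DjangoXSFD code; DOWNLOAD_DIR is a
# hypothetical setting pointing at the directory that holds the files.
import os

from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.http import HttpResponse, Http404

@login_required
def download(request, filename):
    # basename() refuses anything that tries to escape the download directory
    filepath = os.path.join(settings.DOWNLOAD_DIR, os.path.basename(filename))
    if not os.path.isfile(filepath):
        raise Http404
    response = HttpResponse(content_type='application/octet-stream')
    response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(filepath)
    # Apache (mod_xsendfile) sees this header and serves the file itself,
    # so Django never has to read the file into memory
    response['X-Sendfile'] = filepath
    return response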

The code for this has been uploaded to Launchpad under the project Django X-SendFile Download or DjangoXSFD for short. Since it’s a work in progress, you can always get the latest code from there.

Hat tips: Djangocon X-Sendfile Lightning Talk; Fast File Transfer with X-send file; How to make a private download area with django?; Having Django serve downloadable files; Fast private file transfers for Drupal.

Django – Logging in Views

I have searched in as many places as I could, looking for an advisable way of logging within views in Django. I found a lot of advice, apps, code, etc., but I did not find anything easy, beautiful, or even understandable for someone with my limited knowledge. So I did what anyone in need would do: create an app of my own which can be used within views.py for any and all apps and projects.

The main idea was to use Python’s logging module but abstract it enough that any developer can configure it easily. The code has been structured thus: a directory called logbook (a hyphenated name like django-logbook cannot be imported by Python, so the directory must go without the hyphen) which needs to be placed on Python’s path, preferably in the Django project folder. This directory contains at least three files, with more files included depending on the number of apps using Django-LogBook (my name for the app). These three files are: __init__.py, djangologbook.py, and sample.py. __init__.py is empty and is present so that the logbook directory is treated as a package by Python.

djangologbook.py contains the heart of the app: a class called LogBook which can be instantiated by any app. To make things easier in terms of management, I have added a sample.py which does the instantiation. The idea is to create separate log files for separate apps; therefore, each app has its own appname.py in the logbook directory. Just copy sample.py, rename it appname.py, and then import it in your views.py.

You don’t even have to import the whole appname.py in views.py. All you have to do is import the logger object, as below:

from PROJECTNAME.logbook.APPNAME import djangologbookentry

Then within your views, you can log as below:

djangologbookentry.info('My info message')
djangologbookentry.error('OMG an error')

In your appname.py in the logbook directory, you need to provide three things: a unique identifier for the app’s logger, the name of the log file, and the level of log messages (timestamps are added automatically by the logging module, so you never pass them in). By default it is set up to rotate up to 5 log files, each 200,000 bytes in size.
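
To give a feel for what this looks like, here is a minimal sketch of the two files, using the names assumed above; the authoritative version is the code on Launchpad.

# logbook/djangologbook.py -- a minimal sketch, not the actual Launchpad code
import logging
from logging.handlers import RotatingFileHandler

class LogBook(object):
    def __init__(self, name, filename, level=logging.INFO):
        # one logger per app, identified by a unique name
        self.logger = logging.getLogger(name)
        self.logger.setLevel(level)
        # rotate at 200,000 bytes per file, keeping up to 5 files
        handler = RotatingFileHandler(filename, maxBytes=200000, backupCount=5)
        handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
        self.logger.addHandler(handler)

# logbook/APPNAME.py -- copied from sample.py and renamed for the app
from PROJECTNAME.logbook.djangologbook import LogBook
djangologbookentry = LogBook('APPNAME', 'APPNAME.log').logger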

For detailed configuration, you can view and modify the LogBook class.

The code for Django-LogBook is hosted at Launchpad.

When to use GET and POST

In HTTP and/or HTML, there are two (main?) types of submissions: GET and POST. I have always had a hard time determining when to use GET and when to use POST. In other words, what is the main difference between GET and POST?

Quite simply, if the submission is reading data, without making any changes, use GET. If your submission will be making changes, or causing side-effects, use POST. For example, when doing a query, such as a Google search, the form should use GET. If you are creating a new account with Yahoo! mail, the form should use POST. The Google search is reading data, while the new Yahoo! mail account is changing things.
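
To make the rule concrete, here is a minimal Django sketch (the view names and form fields are hypothetical):

# A minimal Django sketch; view names and form fields are hypothetical
from django.http import HttpResponse, HttpResponseNotAllowed

def search(request):
    # reading data with no side-effects: use GET; the query string
    # becomes part of the URL, so the results page can be bookmarked
    query = request.GET.get('q', '')
    return HttpResponse('Results for %s' % query)

def create_account(request):
    # creating an account changes server state: require POST
    if request.method != 'POST':
        return HttpResponseNotAllowed(['POST'])
    username = request.POST.get('username', '')
    # ... create the account here ...
    return HttpResponse('Account created for %s' % username)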

Simple enough, right? Not so, as there are some situations where either could work. One of the best descriptions of the problem I have read is Methods GET and POST in HTML forms – what’s the difference? And here is what I learned from it (you might learn something else reading the same words):

When doing a submission which will not make changes, use GET. When making changes, use POST. But you may want to use POST instead of GET in the following (major?) case: when you don’t want the data you submit to become part of the URL. When using GET, the data in the form you submit becomes part of the URL. So if you have a couple of fields, say myname and mynumber, using GET your URL might look like http://www.mydomain.com/someform.html?myname=codeghar&mynumber=1234 after submission. The biggest benefit is that you can bookmark this URL and visit it in future as just another link. The biggest disadvantage, in my opinion, is that when you have multiple fields, the fields and their data become part of the URL, making for ugly URLs. So instead of a neat-looking, small URL, you get one huge string.

Don’t get me wrong: a useful URL, even if ugly, should be preferred to a pretty but unhelpful one. But if you want to prettify your URLs, use POST instead of GET. And with a pretty URL, your user doesn’t have to know the inner workings of your form. They can still look at the source code of the page to see what’s going on, but only if they want to. Encoding the form data into the URL forces them to deal with the data even if they don’t want to.

The second benefit of using POST is that if your form contains non-ASCII data, it doesn’t become part of the URL, which might be a good thing if your HTTP server is unable to handle such data in a URL. I don’t know; maybe all modern servers and intermediary devices can handle this stuff easily. But better safe than sorry, eh?

So from today my best practice is this: if the form has a small number of fields, showing the submitted data in the URL is not a problem, the URL should be bookmark-able for future reference, and there are no side-effects, use GET. Otherwise, use POST.

One concern I have is: if using HTTPS, is the URL sent in the secure tunnel or is it plaintext for all to see? According to the responses at HTTPS – is URL string itself secure??, the URL is encrypted before being sent to the server, since it travels inside the tunnel rather than as a separate string. Keep in mind, though, that URLs, query strings included, can still end up in browser history and server logs.

Don’t take my understanding of the situation as the final word. Read as much as you can on the subject to form your own best practice. And if you share your understanding and best practice with us, it would help us as well.

Good reads and hat tips: URIs, Addressability, and the use of HTTP GET and POST; Methods GET and POST in HTML forms – what’s the difference?; Post/Redirect/Get.

Django in Debian Lenny

Installing Django in Debian is very similar to installing it in Ubuntu. Here we will use PostgreSQL, so there will be a slight difference from a MySQL-based setup.

sudo aptitude install postgresql python-psycopg2 python-django
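
With the packages in place, point your project’s settings.py at PostgreSQL. Here is a sketch using the Django 1.0-era setting names; the database name, user, and password are placeholders.

# database section of settings.py (Django 1.0-era names); values are placeholders
DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = 'mydatabase'
DATABASE_USER = 'myuser'
DATABASE_PASSWORD = 'mypassword'
DATABASE_HOST = ''   # empty string means localhost
DATABASE_PORT = ''   # empty string means the default port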

Django in CentOS

This post has been written using CentOS 5.2, but these instructions may also work for other versions. Just let us know if they don’t.

Install EPEL

Extra Packages for Enterprise Linux (EPEL) is “a repository of high-quality add-on packages that complement the Fedora-based Red Hat Enterprise Linux […] and its compatible spinoffs such as CentOS or Scientific Linux.” You need this repository to install Django from a package. Another option is to download source and compile yourself. This guide, however, will be using packages. So just follow the instructions to add EPEL repository, which state:

su -c 'rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm'

Install Django

To install Django from EPEL, just run the following command:

sudo yum install Django

Notice the uppercase ‘D’ in the name of the package. Django is installed in /usr/lib/python2.4/site-packages/django/ and it should already be included in the path.

Test Django

To make sure everything was installed properly, try to create a simple new project.

cd /home/me
django-admin.py startproject mynewproject

If a directory named mynewproject containing the standard django files (__init__.py, manage.py, settings.py, and urls.py) is created, then your installation was successful.
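
As a further check, you can start the development server from inside the new project directory and browse to http://127.0.0.1:8000/ to see Django’s default welcome page:

python manage.py runserver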

Choosing a Linux Distribution

Recently I have had more time to work with Linux. I had been using Ubuntu in some way for two years when I needed to set up Linux on a server that was a few years old. Since I was comfortable with Ubuntu, I thought I might as well go ahead and use it. But then I found out that there were other alternatives as well. This caused a headache which still isn’t resolved to this day. Which distribution is the best to get hands-on, real-world experience with?

Comfort

You have to look at your comfort level when choosing a distribution. If you are familiar with something, even in passing, the easier path is to go with what you know. On the other hand, distributions may all be different, but they have more in common than they have differences. So learning another distribution’s way of doing things is not as difficult as one might expect.

Hardware Support

If the distribution you choose is not able to function on the hardware you have available, you should not choose it. If you can get it to work, with or without a lot of effort, all the power to you. If, however, you can’t get it to work, you might as well look for another option. I went ahead with Ubuntu on the server because it supported all of the server’s hardware out of the box; I did not have to tweak anything or waste a lot of time. On the same server I was unable to install CentOS because Red Hat had dropped support for the server’s RAID card in its current distribution.

Purpose

For what purpose are you using a distribution? Is it going to be for starting out, testing, development, or deployment? For all these scenarios, there are many distributions that fit just fine. For starting out, a friendly distribution like Ubuntu could work. If you are testing Linux for its feasibility in your environment, just about any distribution would work. A distribution for doing development work should move fast with new technology so that you can use it to its fullest extent. If it’s for production deployment, being conservative in your selection is recommended.

Cutting Edge Technology

Some distributions strive to be on the cutting edge. I count Fedora, openSUSE, and Ubuntu in this category. They release new stuff every few months, so you get to work with what’s new. For example, on Ubuntu I found Django packages ready to install and use. Since I wanted a package and I found it, I was able to start working; I did not have to jump through hoops just to get to the point where I would be able to work.

Enterprise

Yes, an enterprise version would be more stable and maybe more secure. But it is also less likely to include new technology in an easily accessible format. Taking the example of Django, I have not found any tutorial on the web for installing it on CentOS using an RPM package. All the tutorials I have read ask you to download and install from source. Yes, it’s the traditional way to do things, but if package management is the future, we should look for packages first and source code later. If I am developing and deploying an application built with Django, I want the peace of mind of having installed a package that has been tested to work well with the whole operating system, not something I installed without knowing how it would turn out.

To me this is the most important point after hardware support. I am willing to learn a whole new distribution if it is enterprise level with great hardware support but also keeps up with new technology. Since no one distribution will always fill these requirements, we have to look at the best tool for the job at hand.

Security

I was shocked to learn a few days ago that Ubuntu Server’s default firewall policy is to accept all traffic. CentOS, on the other hand, ships with a pretty aggressive firewall policy. Combined with the recent Debian OpenSSL scandal, this has dented my confidence in Ubuntu. It’s not that Ubuntu is insecure; it’s that the appearance of security in the ecosystem is absent (to me, at least). It’s also not that I cannot rectify these things myself; it’s that I shouldn’t need to take an extra step when a prudent default could have done it for me in the first place.

Another aspect I look at is being root. Does one actually have to be root, or will sudo do? I like the sudo model better since it forces you to explicitly grant permission when doing critical work. Yes, if you are careful, su and su - work as well as sudo, but I like the added carefulness of sudo. So the first thing I do after installing a distribution is check whether it has sudo and then enable it for at least one user.

Support

Support is a very important part of the decision-making process. Support comes in three kinds: from the distribution’s creator, from third-party professionals, and from the community and friends. Support includes help as well as software updates. One can get help from many sources, and the community is an essential part of this support ecosystem: it can get you started and get you out of trouble. Almost all (OK, maybe all) distributions provide software updates. Then there is an extra level of support, which we know as enterprise or corporate support (think Red Hat). It is provided either by the creators and maintainers of the distribution or by third-party entities.

For a home user, software updates and community support should be sufficient. For a business, however, ‘corporate’ support is essential on production systems. Businesses like to pay someone to get extra insurance in case it is needed. If a server is essential to business operations, it is very important that the team running the server knows what it is doing, has community support for minor issues, and corporate support when things go really bad.

Red Hat, Novell, and Canonical provide this kind of support as they create their distributions. Of course, if you have a good team running your servers, you may not need to get corporate support. But if your manager is a non-technical person, she will most probably require it. And if it’s not your money being spent, why argue?

Conclusion

This was meant to be a discussion of factors I would look into when choosing a distribution. Nothing more, nothing less.

Disclaimer: I have edited, and will edit, this post as new arguments come up.