Mikko Kortelainen

Serving Python scripts with Apache mod_wsgi, part II - mod_rewrite

In part I, we learned how to configure Apache to serve any .py file as a web application using mod_wsgi. I promised to tell you more about WebOb, multiprocessing and multithreading, and exception handling. I'll save those topics for later articles. Instead, in this part I will talk about mod_rewrite: whether, why and how to get rid of the .py extension. You will need the test apps from part I to try these out.

Removing the .py extension using mod_rewrite

Many people (myself included) want to get rid of the .py extension in their URLs. There are valid reasons for this:

  • The URLs would be more readable. It is much easier to remember and understand a URL like http://localhost/myapp/myfeature than something like http://localhost/myapp.py?action=myfeature
  • The URLs would be more portable. You could, say, rewrite your page in C or Haskell, or in whatever you desire, and the users would not notice a thing.

There are also valid reasons against rewriting URLs with mod_rewrite:

  • It might be confusing for the developer, or the application integrator, or the webadmin taking care of the production server. If you have both a script called myapp.py and a directory myapp, which one is going to be called?
  • mod_rewrite allows some really nasty tricks in the wrong hands, so for security reasons, it is not a good idea to allow anybody on a shared server to just write their own rules without admin review.

Keeping these things in mind, I'll try to show you how to use mod_rewrite to hide the .py extension. First, enable the rewrite module:

sudo a2enmod rewrite

Then add this to the Apache site configuration (or put the directives, without the <Directory> wrapper, in an .htaccess file in /var/www if AllowOverride for that directory includes FileInfo and Options):

<Directory /var/www/>
  ... other options ...
  Options +FollowSymLinks
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_FILENAME}\.py -f
  RewriteRule ^(.*)$ $1.py [L]
</Directory>

Restart Apache. Now you can browse to http://localhost/python_app/hello or http://localhost/python_app/environ. In the latter, observe that the REQUEST_URI variable has no .py extension, while the SCRIPT_NAME still does:

REQUEST_URI: /python_app/environ
SCRIPT_FILENAME: /var/www/python_app/environ.py
SCRIPT_NAME: /python_app/environ.py
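
If you don't have the environ app from part I at hand, a minimal sketch of a WSGI script that prints just these three variables could look like the following. This is not the app from part I, only an illustration; the file name show_names.py is made up. REQUEST_URI and SCRIPT_FILENAME are supplied by Apache rather than required by the WSGI specification, so the code falls back to a placeholder if a key is missing.

```python
# Hypothetical file: /var/www/python_app/show_names.py
# Prints the three variables discussed above, one per line.

def application(environ, start_response):
    keys = ("REQUEST_URI", "SCRIPT_FILENAME", "SCRIPT_NAME")
    # Use .get() because only SCRIPT_NAME is part of the WSGI spec itself.
    lines = ["%s: %s" % (key, environ.get(key, "(not set)")) for key in keys]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```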

Dissecting the rewrite rules

So what do all those Apache directives do? The first line, Options +FollowSymLinks, is required for rewrite rules inside a <Directory> block or .htaccess file to work; without it Apache will deny the requests. The RewriteEngine on directive is needed in order to have any rewriting take place at all. The real magic happens in the last three lines. According to the documentation, the RewriteCond directive "Defines a condition under which rewriting will take place", while the RewriteRule directive "Defines rules for the rewriting engine". All the conditions preceding the rule must evaluate to true in order for the rule to be followed and the URL to be rewritten.

RewriteCond %{REQUEST_FILENAME} !-d

This states that the requested filename (e.g. "hello") must not (!) be an existing directory (-d).

RewriteCond %{REQUEST_FILENAME}\.py -f

This states that the requested file (e.g. "hello") with a .py extension added ("hello.py") must be an existing file (-f).

RewriteRule ^(.*)$ $1.py [L]

If the above two conditions are met, the requested URL (e.g. "/python_app/hello"), matched here by the regular expression "^(.*)$", is rewritten with a .py extension ("/python_app/hello.py"). The $1 is a backreference to whatever the parenthesized group (.*) captured. The final [L] says that processing should stop here. It makes no difference if you don't have any other rules, but once you add more rules, mod_rewrite will continue evaluating them unless you tell it to stop here.
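
The decision order of the two conditions and the rule can be modelled in a few lines of Python. This is only a sketch to make the logic explicit, not how mod_rewrite is implemented, and it ignores details such as query strings. The function name rewrite_target and the temporary directory standing in for /var/www are made up for the illustration.

```python
import os
import tempfile


def rewrite_target(docroot, url_path):
    """Simplified model of the three rewrite lines discussed above."""
    # REQUEST_FILENAME: the URL path mapped onto the filesystem.
    filename = os.path.join(docroot, url_path.lstrip("/"))
    # RewriteCond %{REQUEST_FILENAME} !-d
    if os.path.isdir(filename):
        return url_path
    # RewriteCond %{REQUEST_FILENAME}\.py -f
    if not os.path.isfile(filename + ".py"):
        return url_path
    # RewriteRule ^(.*)$ $1.py [L]
    return url_path + ".py"


if __name__ == "__main__":
    # A throwaway directory plays the role of the document root.
    with tempfile.TemporaryDirectory() as docroot:
        os.mkdir(os.path.join(docroot, "python_app"))
        open(os.path.join(docroot, "python_app", "hello.py"), "w").close()
        print(rewrite_target(docroot, "/python_app/hello"))    # /python_app/hello.py
        print(rewrite_target(docroot, "/python_app/missing"))  # /python_app/missing
```

A path is only changed when both checks pass; in every other case it comes back untouched and Apache handles it as usual.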

Observing URL dispatch after rewriting

Now, what happens if we create a directory called "hello" in the python_app directory? There's already a "hello.py" file, which we should be able to request without the .py extension thanks to our new rewrite rules. Let's see what happens:

sudo mkdir -p /var/www/python_app/hello

Now, http://localhost/python_app/hello goes to the directory. Is this expected? Well, yes, it is. After all, the first rewrite condition explicitly states that the requested file must not be an existing directory for any rewriting to take place.

If you want to have it the other way around, you need to take the first RewriteCond out and do a bit more configuring. If you look closely, Apache actually redirected you to another URL with an additional slash at the end (http://localhost/python_app/hello/). The "culprit" for the extra slash is mod_dir and its DirectorySlash directive, which is on by default. Turning it off and taking out the first RewriteCond will make the URL without the slash call the script, and the URL with the slash call the directory. Please note that this might be a security risk: with DirectorySlash off, requesting a directory name without the trailing slash can list the files in that directory when directory listings are enabled, even if the directory has an index file. So if you have a directory, but no script with the same name, your directory contents can be listed.

Anyway, I think the latter behaviour is harder to use and understand. I also think it is a good idea to avoid giving directories and scripts the same names. Have Apache handle the URL dispatch up to your script, and handle it from there on in your own code.
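
As a rough illustration of handling dispatch in your own code, here is a minimal sketch of a WSGI script that picks a handler based on an action query parameter, as in the myapp.py?action=myfeature URL mentioned earlier. It assumes the script is reached as /myapp?action=myfeature through the rewrite rule above (mod_rewrite passes the query string through unchanged by default). The handler functions and action names are invented for the example.

```python
from urllib.parse import parse_qs


def show_index(environ):
    return "This is the index.\n"


def show_feature(environ):
    return "This is myfeature.\n"


# Hypothetical action names mapped to handler functions.
HANDLERS = {
    "": show_index,            # no action given
    "myfeature": show_feature,
}


def application(environ, start_response):
    # Apache has already dispatched to this script; the rest is up to us.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    action = query.get("action", [""])[0]
    handler = HANDLERS.get(action)
    if handler is None:
        status, text = "404 Not Found", "Unknown action.\n"
    else:
        status, text = "200 OK", handler(environ)
    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Keeping the mapping in one dictionary means a new feature is added in a single place, without touching the Apache configuration.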

Useful resources: