Tornado Tip: Variables & Functions in Tornado Templates

Tornado has a very fast and flexible template system that is reminiscent of other template systems, such as the one found in web.py. In practical use of the template system, I have found that the documentation is thin in some areas. A good example is which variables and functions are exposed to templates by default.

To that end, here is a list of the variables and functions available to a template as defined in both template.py and web.py:

  • _ (underscore)
    An alias for the locale.translate() function.
  • current_user
    The current_user object as returned by RequestHandler.get_current_user().
  • datetime
    The datetime module. Example: {{ datetime.date.today().year }} returns the current year.
  • escape
    A function that escapes a string so it is valid within XML or XHTML.
  • handler
    The request handler that called self.render() to process the template. Example: {{ handler.static_url('foo') }} calls the static_url() method of RequestHandler, which returns the full URL path to the static file foo.
  • json_encode
    A function that JSON-encodes the given Python object.
  • locale
    The locale value as returned by RequestHandler.get_user_locale().
  • request
    The request object that is passed into the Request Handler.
  • reverse_url 
    Given a full module and class (myapp.Homepage) it will look at the handler to URI mapping and return the URL for the given class.
  • squeeze
    A function that replaces all sequences of whitespace chars with a single space.
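  • static_url
    A shortcut for RequestHandler.static_url(): given a path relative to the static_path application setting, it returns the URL for that static file (the same call as handler.static_url above).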
  • url_escape
    A function that returns a valid URL-encoded version of the given value.
  • xsrf_form_html
    If you are using cross-site request forgery (XSRF) protection, this function returns the hidden input field containing the _xsrf token.

This is the current list as of today's master branch on GitHub. If you're using 0.2, a few of these are not available in that version. Did I miss one or get something wrong? Please let me know.
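To get a feel for what escape, url_escape, json_encode and squeeze do to a value, here is a rough sketch using only Python's standard library. It is an approximation for experimenting outside a template, not Tornado's own code, and the exact escaping (for example, how quote characters are encoded) may differ.

```python
import html
import json
import re
from urllib.parse import quote_plus

# Rough standard-library approximations of a few template helpers.
# These are NOT Tornado's implementations; output details may differ.

def escape(value: str) -> str:
    """Escape &, <, >, " and ' so the text is safe inside XML/XHTML."""
    return html.escape(value, quote=True)

def url_escape(value: str) -> str:
    """URL-encode a value for a query string (spaces become '+')."""
    return quote_plus(value)

def json_encode(value) -> str:
    """JSON-encode a value, escaping '</' so it cannot close a <script> tag."""
    return json.dumps(value).replace("</", "<\\/")

def squeeze(value: str) -> str:
    """Collapse runs of whitespace/control characters into a single space."""
    return re.sub(r"[\x00-\x20]+", " ", value).strip()
```

Inside a template you call the real helpers directly, e.g. {{ url_escape(query) }}, where query is a hypothetical variable passed to render().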

Web Application Development with Tornado

I recently finished a rewrite of privatepaste.com in Python using Tornado as the web framework. There are multiple reasons why I decided to use Tornado instead of something like Django, CherryPy or web.py, all of which I've previously used. One of the main reasons for the switch was Tornado's feature-rich yet lightweight nature. In addition, the benchmarks and the asynchronous HTTP server were intriguing.

Tornado is distributed as a set of loosely coupled Python modules. It's up to the developer to decide which parts of Tornado to use. It's also up to the developer to write the core application, which is responsible for running the web application as a daemon. To create a base-level application, all that is required is tornado.httpserver and tornado.web, plus tornado.ioloop to run the server loop. If you've ever programmed with web.py, many of the conventions should feel familiar.

The base principle in writing an application is to map a URI to a class. In that class you provide methods for the HTTP methods you intend to support. Because Tornado at its core is an HTTP server, you must implement every HTTP method you intend to support for a URI; requests using a method you have not implemented are answered with 405 Method Not Allowed. For example, if you are writing a CMS, you would implement not only get() but also head(), so that browsers can retrieve cache information for your content.
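The following is a toy model of that principle, written with only the standard library so it runs without Tornado. The class names, ROUTES and dispatch() are hypothetical and this is not Tornado's code; in Tornado itself you subclass tornado.web.RequestHandler, define get()/head() and so on, and pass (pattern, class) pairs to tornado.web.Application.

```python
import re
from typing import List, Tuple, Type

# Toy model of "map a URI to a class, one method per HTTP verb".
# Hypothetical names; NOT Tornado's implementation.

class RequestHandler:
    SUPPORTED_METHODS = ("GET", "HEAD", "POST", "DELETE", "PUT")

    def handle(self, method: str) -> Tuple[int, str]:
        # Any verb the subclass does not define is answered with 405.
        responder = getattr(self, method.lower(), None)
        if method not in self.SUPPORTED_METHODS or responder is None:
            return 405, ""
        return 200, responder()

class PageHandler(RequestHandler):
    def get(self) -> str:
        return "<html><body>Hello</body></html>"

    def head(self) -> str:
        # Same status (and, in a real server, headers) as GET, but no body.
        return ""

# URI pattern -> handler class, checked in order.
ROUTES: List[Tuple[str, Type[RequestHandler]]] = [(r"/$", PageHandler)]

def dispatch(path: str, method: str) -> Tuple[int, str]:
    for pattern, handler_class in ROUTES:
        if re.match(pattern, path):
            return handler_class().handle(method)
    return 404, ""
```

With this model, dispatch("/", "POST") returns a 405 because PageHandler never defined post(), which is the situation the paragraph warns about.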

Because Tornado is loosely coupled, it is up to the developer to implement everything from session handling and authentication to localization and the data layer. Some may consider this an issue, but not to worry: Tornado includes modules to help. There are authentication mix-ins for Google, Twitter, Facebook and FriendFeed. To implement authentication with Tornado, you extend the tornado.web.RequestHandler class and override get_current_user() to handle the authentication functionality.

Localization is handled in a similar fashion. While there is some magic under the covers, Tornado generally leaves the localization implementation up to the developer. By overriding get_user_locale(), the developer returns a locale object that has been initialized with the appropriate language. As with the other modules, the meat of the documentation is in the docstrings of the locale and web modules.
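As an illustration of the kind of decision you might put in get_user_locale(), here is a standard-library sketch that prefers a stored user preference and otherwise falls back to the browser's Accept-Language header. The function names, the locale codes and the fallback rules are hypothetical assumptions; in a real handler you might pass the chosen code to tornado.locale.get() and return the result.

```python
from typing import List, Optional, Sequence, Tuple

def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Turn 'fr-CA,fr;q=0.8' into [('fr_CA', 1.0), ('fr', 0.8)], best first."""
    languages = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        code, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        languages.append((code.strip().replace("-", "_"), quality))
    # sorted() is stable, so equal q-values keep the header's order.
    return sorted(languages, key=lambda item: item[1], reverse=True)

def pick_locale(user_pref: Optional[str], accept_language: str,
                supported: Sequence[str], default: str) -> str:
    """Hypothetical helper: user preference, then browser languages, then default."""
    if user_pref in supported:
        return user_pref
    for code, quality in parse_accept_language(accept_language or ""):
        if quality <= 0:
            continue
        if code in supported:
            return code
        # Fall back to any supported locale with the same language prefix.
        language = code.split("_")[0]
        for candidate in supported:
            if candidate.split("_")[0] == language:
                return candidate
    return default
```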

Documentation is one of the key drawbacks of using Tornado. If you cannot dive into other people's code to find what you need, you're going to have a difficult time with Tornado. Much of the initial time that I spent with Tornado was in the Tornado code itself, figuring out how to access different parts of the data within the HTTP request, the templating system and the application class. The documentation provided is deceptively simple, and indeed, for a Hello World application the documentation is sufficient and accurate. It's when you're knee deep in code that you'll find yourself having to go beyond the documentation to get what you need.

The template engine is full-featured and has yet to leave me wanting.  While there has been some back and forth on the mailing lists about the speed of the template engine, it has proven important to turn off debug mode when comparing template engines.  I have found the template engine to be very fast, even when extending other templates and including modules.

The biggest hurdle, which is common to any web application, is right-sizing the processes that serve your application. Because your Tornado app runs as a stand-alone HTTP server that is directly coupled to your application classes, you need to run multiple processes to serve multiple requests at once. Like FriendFeed, I run Tornado behind a web server using a reverse proxy module. However, instead of Nginx, I am using Cherokee. I use Python's multiprocessing module to spawn multiple HTTPServer instances with my application. Each instance has its own port number, and the reverse proxy server uses these backends in a round-robin pool to handle requests.
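Here is a minimal sketch of that layout using only the standard library: http.server stands in for Tornado's HTTPServer so it runs without Tornado, and the round-robin generator only illustrates what the reverse proxy (Cherokee, in my case) does with the pool. The handler, the port numbers and the helper names are hypothetical.

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
from multiprocessing import Process
from typing import Iterable, Iterator, List

class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in for your application; every backend serves the same app."""
    def do_GET(self) -> None:
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int) -> None:
    # One single-threaded server per process, each on its own port.
    HTTPServer(("127.0.0.1", port), HelloHandler).serve_forever()

def backend_ports(base_port: int, count: int) -> List[int]:
    return [base_port + i for i in range(count)]

def spawn_backends(ports: Iterable[int]) -> List[Process]:
    processes = []
    for port in ports:
        process = Process(target=serve, args=(port,), daemon=True)
        process.start()
        processes.append(process)
    return processes

def round_robin(ports: List[int]) -> Iterator[int]:
    # What the proxy does: hand each new request to the next backend.
    return itertools.cycle(ports)
```

Calling spawn_backends(backend_ports(8000, 4)) from your daemon's entry point would start four backends on ports 8000 to 8003, which you would then list as the proxy's backend pool.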

When coming from a CGI-based backend, you have to think a bit more about how to size your backends. Because your web server front end can't spawn new backends on demand, you'll need to make sure that you have enough backends to handle your maximum number of simultaneous requests. There are changes in the master branch of Tornado on GitHub to make Tornado fork on its own, spawning a process per CPU core, but this will not change the scaling concern, as the same principles apply.
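A back-of-the-envelope way to size the pool is Little's law: concurrent requests ≈ arrival rate × average time per request. The sketch below assumes each backend handles one request at a time (blocking handlers) and uses a hypothetical headroom factor as a safety margin; plug in your own measured peak rate and request time.

```python
import math

def backends_needed(peak_requests_per_second: float,
                    avg_request_seconds: float,
                    headroom: float = 1.0) -> int:
    """Estimate backends for one-request-at-a-time servers (Little's law)."""
    concurrent = peak_requests_per_second * avg_request_seconds
    return max(1, math.ceil(concurrent * headroom))
```

For example, 100 requests per second at 0.25 seconds each means about 25 requests in flight, so 25 backends, or 38 with 50% headroom.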

Asynchronous request handling is one of the most touted features of Tornado. It's important to understand exactly how async requests fit into your application development model. Because each Tornado backend is single-threaded, it is important to identify the blocking areas of your application, such as database calls, to determine whether you can benefit from the async functionality. To truly benefit from the async server, you'll need to use a fully asynchronous model for any operation that would normally block.
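The effect of a blocking call on a single-threaded server can be shown with Python's asyncio event loop, used here as a stand-in for Tornado's IOLoop (Tornado of this era uses callbacks rather than async/await). time.sleep plays the role of a blocking database call and asyncio.sleep the role of a non-blocking one; the 0.1 second delay is arbitrary.

```python
import asyncio
import time
from typing import Awaitable, Callable

async def blocking_request() -> None:
    # Blocks the whole event loop, like a synchronous database query.
    time.sleep(0.1)

async def non_blocking_request() -> None:
    # Yields to the loop while "waiting on I/O", so other requests proceed.
    await asyncio.sleep(0.1)

async def serve_two(request: Callable[[], Awaitable[None]]) -> float:
    """Run two 'requests' concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(request(), request())
    return time.perf_counter() - start
```

Running asyncio.run(serve_two(blocking_request)) takes about 0.2 seconds because the two requests run one after the other, while the non-blocking version finishes in about 0.1 seconds.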

An example where the async functionality shines is the authentication mix-ins for Google, Facebook, FriendFeed and Twitter. When you use these mix-ins, you pass a callback function that is called once Tornado's asynchronous HTTP client has returned the result of the request.

Because I use psycopg2, a blocking PostgreSQL driver, for my database access, I generally do not use the async functionality. For me this is not an issue, as in full-featured applications I still see requests complete in as little as 1ms from start to finish. Of course, your own application has as much impact on performance as Tornado itself, if not more.

If you’re just getting started with Tornado, be sure to check out the demo code.  If you’re looking for a little more structure in getting started, check out Tinman, a meta-framework on top of Tornado.

Edit: Changed to reflect a misstatement about the new forking changes coming up for 0.3.

The Attention Deficit Disorder Guide to RabbitMQ

RabbitMQ has been one of my interests of late, as I've identified it as part of our technology path at work. There are other very good resources that dive pretty deep into RabbitMQ and how to use it. The goal of this guide is to help you get on your feet quickly and easily. It assumes a few things:

  • You already know about message queues and have some experience or knowledge on the subject.
  • You know what AMQP is.
  • You are already interested in RabbitMQ enough to try it out.

If you’re good on those things, let’s get started…

RabbitMQ is written in Erlang, so as a first step you should download and install Erlang.

Download and install RabbitMQ, which is pretty easy. I like to set up RabbitMQ in an /opt/rabbitmq directory. To do that, I set some environment variables before compiling (bash assumed):

export TARGET_DIR=/opt/rabbitmq
export SBIN_DIR=/opt/rabbitmq/sbin
export MAN_DIR=/opt/rabbitmq/man

Then I compile and install with “make install”. Because I like to run it as my own user or a service user, I run chown -R myuser /opt/rabbitmq as appropriate.

There are a few other things we need to do, including creating the log directory and the directory RabbitMQ will use to store its data:

mkdir /var/log/rabbitmq
chown myuser /var/log/rabbitmq
mkdir /var/lib/rabbitmq
chown myuser /var/lib/rabbitmq

Now, as “myuser”, we can run “cd /opt/rabbitmq/sbin” and then “./rabbitmq-server”, and you should see:

RabbitMQ 1.6.0 (AMQP 8-0)
Copyright (C) 2007-2009 LShift Ltd., Cohesive Financial Technologies LLC., and Rabbit Technologies Ltd.
Licensed under the MPL. See http://www.rabbitmq.com/

node  : rabbit@binti
log  : /var/log/rabbitmq/rabbit.log
sasl log  : /var/log/rabbitmq/rabbit-sasl.log
database dir: /var/lib/rabbitmq/mnesia/rabbit

starting database …done
starting core processes …done
starting recovery …done
starting persister …done
starting guid generator …done
starting builtin applications …done
starting TCP listeners …done

Once you have the hang of starting RabbitMQ and want to run it in the background, run “./rabbitmq-server -detached” instead.

Once we’ve gotten this far, we’ve got our broker up and running and now we’ll need some way to talk to it. For the purposes of this article, I’m going to talk about amqplib and Python. There are AMQP libraries for just about every relevant language at this point. RabbitMQ 1.6.0 implements the AMQP 0.8 standard. The easiest way to install amqplib is a simple “easy_install amqplib”.

But before we dive into code, there are a few key concepts we need to talk about:

Queues: You should understand these already: a producer puts a message in a queue and a consumer app receives it somewhere else.

Exchanges: These are a little more tricky than queues. I like to think of them as namespaces. One of the keen things about RabbitMQ is that each queue runs in its own Erlang process, which helps make better use of your available hardware resources. There are three types of exchanges that we need to talk about:

Direct: a direct exchange delivers a message to the queues whose binding key exactly matches the message's routing key. If several consumers share one of those queues, each message goes to only one of them.

Fanout: a fanout exchange ignores the routing key and sends a copy of your message to every queue bound to it, so every consumer listening on those queues gets the message.

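Topic: a topic exchange matches the routing key, made of dot-separated words, against wildcard patterns in the bindings, where * matches exactly one word and # matches zero or more words. This allows neat things like one queue receiving messages for many related routing keys (test.* matches test.messages) and other wildcard-type trickery.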

Bindings: In RabbitMQ you bind queues to exchanges, usually with a binding key. These bindings determine which messages are routed to which queues, and therefore to which consumers.
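To make the routing rules concrete, here is a small in-memory simulation of how each exchange type picks the queues that receive a message. It only models the matching rules described above (exact match for direct, every bound queue for fanout, * and # wildcards for topic); it does not talk to RabbitMQ, and the function names, queue names and keys are made up for illustration.

```python
from typing import List, Sequence, Tuple

def topic_matches(pattern: str, routing_key: str) -> bool:
    """'*' matches exactly one word, '#' matches zero or more words."""
    return _match(pattern.split("."), routing_key.split("."))

def _match(pattern: List[str], words: List[str]) -> bool:
    if not pattern:
        return not words
    head = pattern[0]
    if head == "#":
        # Let '#' swallow 0, 1, 2, ... words and try the rest of the pattern.
        return any(_match(pattern[1:], words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    if head == "*" or head == words[0]:
        return _match(pattern[1:], words[1:])
    return False

def route(exchange_type: str, bindings: Sequence[Tuple[str, str]],
          routing_key: str) -> List[str]:
    """Return the queues that get a copy; bindings are (queue, binding_key)."""
    if exchange_type == "fanout":
        return sorted({queue for queue, _ in bindings})
    if exchange_type == "direct":
        return sorted({queue for queue, key in bindings if key == routing_key})
    if exchange_type == "topic":
        return sorted({queue for queue, pattern in bindings
                       if topic_matches(pattern, routing_key)})
    raise ValueError("unknown exchange type: %s" % exchange_type)
```

With bindings [("messages", "test.messages"), ("audit", "test.#")], a direct exchange sends test.messages only to messages, while a topic exchange sends it to both queues.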

Memory: As of RabbitMQ 1.6.0 all messages are kept in memory. If you have nothing consuming your messages and you send too many of them, you’ll run out of memory.

Monitoring: The main install includes the rabbitmqctl tool, which you can use to inspect the various parts of RabbitMQ. This isn't very good for remote monitoring or visualization. For that there's a great project called Alice, which is also Erlang-based.

Speed: There are two ways to get messages from RabbitMQ: basic_get and basic_consume.

basic_get is where your app, on a message by message basis, asks RabbitMQ for a message. This is the slower of the two methods and will not allow single consumer applications to scale to a very high transaction rate.  Note that RabbitMQ will not register these connections as a consumer and you will not see them in list_queues or in Alice as such.

basic_consume is where your app registers itself with RabbitMQ as a consumer and RabbitMQ will send messages to you as fast as you're able to consume them.

Durability: If you want to have the definitions of your queues and exchanges hang around if you have to restart RabbitMQ you need to define them as durable.

Auto-Delete: If you want your queues and exchanges to exist even when there are no consumers waiting for messages on them, you need to turn auto-delete off.

Persistence: If you do not tell RabbitMQ that you want it to hang on to your messages if it reboots, it will not do so. You must set the delivery mode of a message to “2”, and publish it to a durable queue, to tell RabbitMQ to persist it until it is consumed.

Auto-Ack: You can tell RabbitMQ to automatically acknowledge receipt of a message, or you can do it yourself. This is a boolean setting (the no_ack argument in amqplib) that you use when you're consuming messages via basic_get or basic_consume.

Queue and Exchange definitions: Queues and exchanges do not exist until something declares them, typically the consumer when it connects. You can cheat and also declare them in the code that enqueues your messages.

Now that we have that out of the way, here’s some sample Consumer code:

#!/usr/bin/env python
""" Sample Consumer Code """

import amqplib.client_0_8 as amqp

# Rabbit Server to connect to
host = '127.0.0.1'
port = 5672

# Exchange and queue information
exchange_name = 'test'
exchange_type = 'direct'
queue_name = 'messages'
routing_key = 'test.messages'

# Let's set this up by default; the main loop below runs while it is True
process_messages = True

# This is the function that basic_consume will send messages to
def process_message(message):
    """ Callback function used by channel.basic_consume """
    print('Received: %s' % message.body)

# This might go somewhere like a signal handler
def cancel_processing():
    """ Stop consuming messages from RabbitMQ """
    global process_messages

    # Clear the flag the main loop checks so that it exits
    process_messages = False

    # Tell the broker we don't want to consume anymore
    channel.basic_cancel(consumer_tag)

# Connect to Rabbit
connection = amqp.Connection(host='%s:%s' % (host, port),
                             userid='guest',
                             password='guest',
                             ssl=False,
                             virtual_host='/')

# Create a channel to talk to Rabbit on
channel = connection.channel()

# Create our exchange
channel.exchange_declare(exchange=exchange_name,
                         type=exchange_type,
                         durable=True,
                         auto_delete=False)

# Create our queue; auto_delete is off so the queue (and any persistent
# messages in it) survives while no consumer is connected
channel.queue_declare(queue=queue_name,
                      durable=True,
                      exclusive=False,
                      auto_delete=False)

# Bind the queue to the exchange
channel.queue_bind(queue=queue_name,
                   exchange=exchange_name,
                   routing_key=routing_key)

# Let AMQP know to send us messages
consumer_tag = channel.basic_consume(queue=queue_name,
                                     no_ack=True,
                                     callback=process_message)

# Loop while process_messages is True
while process_messages:
    # Block until a message arrives and dispatch it to process_message
    channel.wait()

# Close the channel
channel.close()

# Close our connection
connection.close()

Note that a lot of what is in that example is comments and whitespace for ease of reading; the actual implementation is pretty darn simple.

Now that we have a consumer going, let's send some messages in:

#!/usr/bin/env python
import amqplib.client_0_8 as amqp

# Connect
connection = amqp.Connection(host="localhost:5672",
                             userid="guest",
                             password="guest",
                             virtual_host="/",
                             insist=False)

# Create our channel
channel = connection.channel()

# We've already declared our queue, exchange and binding in our consumer,
# so just send the messages
for i in range(10):
    message = amqp.Message("Test message %i!" % i)
    # delivery_mode 2 marks the message as persistent
    message.properties["delivery_mode"] = 2
    channel.basic_publish(message,
                          exchange="test",
                          routing_key="test.messages")

# Clean up
channel.close()
connection.close()

That’s it! If we did this right, you’ve now set up RabbitMQ, sent some messages and consumed them on the other end of the pipe.

If I’ve kept you this long and you’re still interested, but still have questions, I highly recommend this article which goes much more in depth and has been a valuable guide for me.

If you’re into both python and RabbitMQ, you might want to check out my consumer framework “rejected.py,” it’s on GitHub.

I hope you enjoyed the first of my A.D.D. Guides. I’d be happy to answer any questions and would appreciate feedback so I may improve this and future articles to come.

gdata Python Client Mappings

Today I was tripped up by the fact that Google’s gdata Python library changes the attribute names of the UserEntity class to match PEP 8 naming conventions. Here’s a quick rundown:

  • userName becomes user_name
  • changePasswordAtNextLogin becomes change_password
  • ipWhitelisted becomes ip_whitelisted
  • agreedToTerms becomes agreed_to_terms
  • hashFunctionName becomes hash_function_name

Notice that changePasswordAtNextLogin is shortened to just change_password. Frankly, I’m surprised by the lack of consistency on the part of the gdata Python authors, and by the lack of any mention of it in the Google Apps Provisioning API Developer’s Guide: Python.
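Because of that one exception, a purely mechanical camelCase to snake_case conversion would produce the wrong name. The sketch below keeps the mappings listed above in an explicit table and only falls back to the mechanical rule for names not in it; the helper names are hypothetical and not part of the gdata library.

```python
import re

# API name -> gdata Python attribute name, as listed above.
GDATA_ATTRIBUTE_NAMES = {
    "userName": "user_name",
    "changePasswordAtNextLogin": "change_password",  # the odd one out
    "ipWhitelisted": "ip_whitelisted",
    "agreedToTerms": "agreed_to_terms",
    "hashFunctionName": "hash_function_name",
}

def camel_to_snake(name: str) -> str:
    """Insert '_' before each capital letter (except the first char), lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def python_attribute_name(api_name: str) -> str:
    """Hypothetical helper: explicit mapping first, mechanical rule otherwise."""
    return GDATA_ATTRIBUTE_NAMES.get(api_name, camel_to_snake(api_name))
```

For example, getattr(login, python_attribute_name("changePasswordAtNextLogin")) would read change_password, where login is a hypothetical gdata object.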