I recently finished a rewrite of privatepaste.com in Python using Tornado as the web framework. There are multiple reasons why I decided to use Tornado instead of something like Django, CherryPy or web.py, all of which I’ve previously used. One of the main reasons for switching to Tornado was its feature-rich yet lightweight nature. In addition, the benchmarks and asynchronous HTTP server were intriguing.
Tornado is distributed as a set of loosely coupled Python modules. It’s up to the developer to decide which parts of Tornado to use. It’s also up to the developer to write the core application that is responsible for running the web application as a daemon. To create a base-level application, all that is required is tornado.httpserver and tornado.web. If you’ve ever programmed with web.py, many of the conventions should be familiar to you.
The base principle in writing an application is to map a URI to a class. In that class you provide a method for each HTTP method you intend to support. Because Tornado at its core is an HTTP server, you must implement every HTTP method you intend to support for a URI. For example, if you are writing a CMS, you would not only implement the get() method, but you’d also want to implement a head() method so that browsers can retrieve cache information for your content.
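As a rough illustration of the URI-to-class mapping described above, here is a minimal sketch written against a recent Tornado release. The PageHandler class, the /page/ route, the port and the Cache-Control value are all made up for the example and are not taken from privatepaste.com.

```python
import tornado.httpserver
import tornado.ioloop
import tornado.web


class PageHandler(tornado.web.RequestHandler):
    """Hypothetical handler for one CMS page; the name is illustrative."""

    def _set_cache_headers(self):
        # Assumed caching policy, chosen only for this example.
        self.set_header("Cache-Control", "max-age=300")

    def get(self, slug):
        # Called for GET requests; the regex group arrives as `slug`.
        self._set_cache_headers()
        self.write("Content for page: %s" % slug)

    def head(self, slug):
        # Called for HEAD requests: same headers as GET, but no body.
        self._set_cache_headers()


def make_app():
    # Each tuple maps a URI pattern to the class that handles it.
    return tornado.web.Application([
        (r"/page/([a-z0-9-]+)", PageHandler),
    ])


def main():
    server = tornado.httpserver.HTTPServer(make_app())
    server.listen(8888)  # assumed port
    tornado.ioloop.IOLoop.current().start()
```

Calling main() starts the server. A request method that the class does not define (POST here, for instance) is answered with a 405 response.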
Because Tornado is loosely coupled, it is up to the developer to implement everything from session handling and authentication to localization and the data layer. Some may consider this an issue, but not to worry, Tornado includes modules to help. There are authentication mix-ins for Google, Twitter, Facebook and FriendFeed. To achieve authentication with Tornado, you would extend the tornado.web.RequestHandler class and override get_current_user() to handle the authentication functionality.
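A minimal sketch of the get_current_user() approach might look like the following. It assumes a recent Tornado release and assumes that a login handler (not shown) has stored the user name in a signed cookie called "user"; the handler names, cookie name and settings values are hypothetical.

```python
import tornado.escape
import tornado.web


class BaseHandler(tornado.web.RequestHandler):
    def get_current_user(self):
        # Assumption: a login handler (not shown) stored the user name
        # in a signed cookie called "user".
        cookie = self.get_secure_cookie("user")
        return cookie.decode("utf-8") if cookie else None


class AccountHandler(BaseHandler):
    @tornado.web.authenticated
    def get(self):
        # Only reached when get_current_user() returned a truthy value;
        # otherwise Tornado redirects to the configured login_url.
        name = tornado.escape.xhtml_escape(self.current_user)
        self.write("Signed in as %s" % name)


def make_app():
    return tornado.web.Application(
        [(r"/account", AccountHandler)],
        cookie_secret="replace-with-a-long-random-value",  # placeholder
        login_url="/login",  # assumed login route
    )
```

Putting the override in a shared base class means every handler that inherits from it gets self.current_user for free.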
Localization is handled in a similar fashion. While there is some magic under the covers, Tornado generally leaves the localization implementation up to the developer. By overriding get_user_locale(), the developer returns a locale object that has been initialized with the appropriate language. As with other modules, the meat of the documentation is in the locale and web modules themselves.
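Here is one way the get_user_locale() hook could be used, as a sketch only. It assumes a recent Tornado release and assumes the language is passed in a hypothetical ?lang= query argument; pick_locale() and LocalizedHandler are names invented for the example.

```python
import tornado.locale
import tornado.web


def pick_locale(code):
    # tornado.locale.get() returns the closest loaded locale and falls
    # back to the default (en_US) when nothing matches. Translations
    # would normally be loaded first with
    # tornado.locale.load_translations(directory).
    return tornado.locale.get(code)


class LocalizedHandler(tornado.web.RequestHandler):
    def get_user_locale(self):
        # Assumption: the language is chosen with ?lang=xx_YY.
        # Returning None lets Tornado fall back to the browser's
        # Accept-Language header.
        code = self.get_argument("lang", None)
        return pick_locale(code) if code else None

    def get(self):
        # self.locale is whatever get_user_locale() (or the fallback) chose.
        self.write(self.locale.translate("Hello"))
```

With no translations loaded, translate() simply returns the original string, so the handler still works before any translation files exist.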
Documentation is one of the key drawbacks of using Tornado. If you cannot dive into other people’s code to find what you need, you’re going to have a difficult time with Tornado. Much of the initial time that I spent with Tornado was in the Tornado code itself, figuring out how to access different parts of the data within the HTTP request, the templating system and the application class. The documentation provided is deceptively simple, and indeed, for a Hello World application the documentation is sufficient and accurate. It’s when you’re knee-deep in code that you’ll find yourself having to go beyond the documentation to get what you need.
The template engine is full-featured and has yet to leave me wanting. While there has been some back and forth on the mailing lists about the speed of the template engine, it has proven important to turn off debug mode when comparing template engines. I have found the template engine to be very fast, even when extending other templates and including modules.
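To give a feel for template inheritance, here is a small sketch using tornado.template directly, outside of any request handler. The template names and contents are made up, and DictLoader is used only so the example needs no files on disk; a real application would more likely call render() from a handler with a template_path setting.

```python
import tornado.template

# Hypothetical templates, kept in memory for the example.
TEMPLATES = {
    "base.html": "<title>{{ title }}</title>{% block body %}{% end %}",
    "page.html": (
        '{% extends "base.html" %}'
        "{% block body %}<p>Hello, {{ name }}</p>{% end %}"
    ),
}

loader = tornado.template.DictLoader(TEMPLATES)

# page.html fills in the "body" block defined by base.html.
html = loader.load("page.html").generate(title="Demo", name="world")
print(html.decode("utf-8"))
```

The loader caches compiled templates after the first load, which is part of why comparisons made with debug mode on can be misleading.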
The biggest hurdle, which isn’t uncommon in any web application, is right-sizing the processes that serve your application. Because your Tornado app runs as a stand-alone HTTP server that is directly coupled to your application classes, you need to run multiple processes to serve multiple blocking requests at the same time. Like FriendFeed, I use Tornado behind a web server using a reverse proxy module. However, instead of Nginx, I am using Cherokee. I use Python’s multiprocessing module to spawn multiple HTTPServer instances with my application. Each instance has its own port number, and the reverse proxy server uses these backends in a round-robin pool to provision requests.
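The multi-backend setup described above could be sketched roughly as follows. This is not the privatepaste.com code; the base port, the number of backends and the handler are assumptions, and the code targets a recent Tornado release.

```python
import multiprocessing

import tornado.httpserver
import tornado.ioloop
import tornado.web

BASE_PORT = 8000   # assumed first backend port
NUM_BACKENDS = 4   # assumed pool size


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello from a backend")


def backend_ports(base_port, count):
    # One consecutive port per backend, for the reverse proxy to target.
    return [base_port + offset for offset in range(count)]


def run_backend(port):
    # Each process gets its own application, HTTPServer and IOLoop.
    app = tornado.web.Application([(r"/", MainHandler)])
    server = tornado.httpserver.HTTPServer(app)
    server.listen(port)
    tornado.ioloop.IOLoop.current().start()


def main():
    processes = [
        multiprocessing.Process(target=run_backend, args=(port,))
        for port in backend_ports(BASE_PORT, NUM_BACKENDS)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
```

Call main() from an `if __name__ == "__main__":` guard to start the pool, then list the same ports (8000 to 8003 with these values) as the round-robin backends in the reverse proxy configuration.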
When coming from a CGI-based backend, you have to think a bit more about how you size your backends. Because your web server front end can’t spawn new backends on demand, you’ll need to make sure that you have enough backends to provision your maximum number of simultaneous requests. There are changes in the master branch of Tornado on GitHub to make Tornado fork on its own, spawning a process per CPU core, but this will not change the scaling concern, as the same principles apply.
Asynchronous request handling is one of the more often touted features of Tornado. It’s important to understand exactly how async requests fit into your application development model. Because each Tornado backend is single-threaded, it is important to think about the blocking areas of your application, such as database calls, to determine whether you can benefit from the async functionality. To truly benefit from the async server, you’ll need to use a fully async model for any type of operation that would normally block.
An example where the async functionality shines is the authentication mix-ins for Google, Facebook, FriendFeed and Twitter. When you use these mix-ins, you specify a callback function to call once the HTTPClient class has returned a result from your call.
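The same non-blocking idea can be sketched with Tornado’s asynchronous HTTP client. Note the assumptions: this targets a recent Tornado release, where the pattern is written as a coroutine with await rather than with the explicit callback argument described above, and the handler name, route and URL are placeholders.

```python
import tornado.httpclient
import tornado.web


class FetchHandler(tornado.web.RequestHandler):
    async def get(self):
        client = tornado.httpclient.AsyncHTTPClient()
        # While this fetch is in flight, the single-threaded process is
        # free to serve other requests instead of sitting idle.
        response = await client.fetch("http://example.com/")  # placeholder URL
        self.write("Upstream responded with %d bytes" % len(response.body))


def make_app():
    return tornado.web.Application([(r"/fetch", FetchHandler)])
```

The benefit only appears when the slow operation itself is non-blocking; a blocking call inside the coroutine would still stall the whole process.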
Because I use psycopg2, a blocking PostgreSQL driver, for my database access, I generally do not use the async functionality. For me this is not an issue, as in full-featured applications I still see requests complete in as little as 1 ms from start to finish. Of course, your own application has as much impact on performance as Tornado itself, if not more.
If you’re just getting started with Tornado, be sure to check out the demo code. If you’re looking for a little more structure in getting started, check out Tinman, a meta-framework on top of Tornado.
Edit: Changed to reflect a misstatement about the new forking changes coming up for 0.3.