Tech Notes

New.SavannahNow.com is running on a modified version of Drupal 4.6.8 (as of this writing). We started on 4.6.5 and have applied three security upgrades, the freetagging patch, and a number of custom tweaks to make the system perform the way we want.

Back in my post on Drupal and the enterprise, written right after OSCMS in Vancouver, I made some points about why we developed in 4.6 instead of the then-beta 4.7. I still think those observations are true, particularly the argument that “we don’t have the resources to help get 4.7 out, so we shouldn’t tinker with it.” But that point is infinitely debatable. It comes down to personal (and business) comfort.

That said, we now wish we were on 4.7 (and getting there may be the next step). That’s because 4.7 is better. (Oh, the Forms API alone… and Views….) But what’s done is done. Here are some lessons learned along the way.

PHP paint-by-numbers

Now please, Drupalers, don’t let this one fool you. It’s a compliment. In January, I’d say my PHP skill level was about a 3 on a scale of 1-10 (where 10 is good). Now I’d say it’s at least a 7. And my MySQL skill has gone from a 2 to a 6. Pretty cool.

Why? Because Drupal provides structure and focus for learning common programming tasks. There are APIs, standards, routines and, yes, quirks that let the novice programmer dive in and learn. Funny story: I wrote my second module in plain PHP, then ported it into Drupal. After I did that, I realized that the filesystem-checking code I was using already existed in Drupal.

I could have saved some time, apparently, but didn’t know enough PHP to realize it. While I was developing quickly, it gave me great comfort every time I ran across such things. It made me think that someone (or rather, a group of someones) was looking out for me and making sure my code didn’t suck too badly.

So the irony now is that I can contribute back to 4.7 (now 4.8) development. I didn’t think I could five months ago. But hey, the Drupal roadmap led me somewhere.

Cool Drupal Stuff

Some of my favorite Drupal things that made this project go.

  • NodeQueue — which we use all over the site. If it didn’t exist, I would’ve had to write it.
  • Devel — Absolutely indispensable.
  • BlogAPI — we had to write a ton of custom XML parser/importer routines to process data. We run it all through XMLRPC and the BlogAPI (I think; this was another corner of the development).
  • Flexiblock. Since we’re not using 4.7, Flexiblock makes all our multi-region theming dreams come true.
  • Core stability — there’s too much good in core to single out.
  • Profile — we bolted on a ton of functionality to this, but the solid foundation made it possible.

Actually, that last comment sums up Drupal for me. Standing on the shoulders…

New Stuff

In all our research into Drupal, the one big open question was always scale. Sure, BlufftonToday.com runs fine, but it gets little traffic. SavannahNow.com serves about 5M pages a month.

I’m going to sidestep scale-by-adding-hardware arguments by noting that our current infrastructure serves something like 80M pages per month on a cluster of a dozen servers or fewer. The idea of using more than one server for a single site deeply offends some people I work with. (That said, we’re going to put two load-balanced web servers in front of a dedicated database server next week.)

So from the beginning, we had to look for ways to help Drupal scale (without adding hardware).

1) Generator — I wrote a module that takes any Drupal path in the url_alias table and flattens the page into a static file, either on request or via cron. We moved custom menu functionality out of Drupal’s block system and into a JavaScript layer (in the upper right of the screen, if you’re playing along at home). This JS trick also helps us link together the Drupal and non-Drupal portions of the site. Generator also lets site editors preview site changes made by moving or adding blocks or by editing node queues. It even features one level of restore from backup.

Note: we only flatten the first page of a paginated set, so every generated file is output as a .php file that includes a pagination checker. This little file checks the request for $_GET parameters and, if any are present, uses a header redirect to send the user to the dynamic version of the page.
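The pagination checker's decision is small enough to sketch. The real checker is a PHP include; this Python version only models the logic. The redirect target format (/index.php?q=<alias>&..., Drupal's non-clean-URL form) and the function name pagination_check are assumptions for illustration.

```python
from urllib.parse import urlencode

def pagination_check(path, query):
    """Decide how to serve a flattened page (hypothetical helper).

    path  -- the Drupal alias the static file was generated from, e.g. "news"
    query -- dict of GET parameters on the incoming request
    Returns ("static", None) to serve the flattened first page, or
    ("redirect", url) to send the user to the dynamic Drupal page.
    """
    if not query:
        # No GET parameters: this is page one, which is what we flattened.
        return ("static", None)
    # Any GET parameter (e.g. ?page=2) means the static copy may be wrong,
    # so hand the request to Drupal. The "?q=" URL form is an assumption.
    params = {"q": path}
    params.update(query)
    return ("redirect", "/index.php?" + urlencode(params))
```

For example, a request for the "news" front with ?page=2 gets redirected to /index.php?q=news&page=2, while a bare request is served from disk by Apache.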

* UPDATE 6/23/2006

I originally used a cURL request to do the file generation. Now I use file_get_contents() and file_put_contents() because they are a little faster.

*
Right now, the Generator flattens about 120 pages, and those pages (including the front page) will likely account for 50%-75% of normal site traffic. Serving those straight from Apache and bypassing the Drupal bootstrap gave us a 1000% performance increase in our tests.
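The flattening step (fetch the rendered page, then write it to disk with the pagination checker included) can be sketched as follows. This is a Python illustration of the file_get_contents()/file_put_contents() approach, not the module's code. The output naming scheme (slashes become underscores), the include filename pagination_check.php, and the temp-file-then-rename step are assumptions.

```python
import os
import tempfile
from urllib.request import urlopen

# Hypothetical include line prepended to every flattened page.
PAGINATION_INCLUDE = '<?php include "pagination_check.php"; ?>\n'

def fetch_page(url):
    # Same role as PHP's file_get_contents($url): fetch the rendered page.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")

def flatten(alias, base_url, out_dir, fetch=fetch_page):
    """Render one aliased Drupal path and save it as a static .php file."""
    html = fetch(base_url.rstrip("/") + "/" + alias)
    target = os.path.join(out_dir, alias.replace("/", "_") + ".php")
    # Write to a temp file and rename it, so Apache never serves a half-written page.
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(PAGINATION_INCLUDE + html)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600 files; the web server must be able to read it
    os.replace(tmp, target)
    return target
```

Passing fetch as a parameter keeps the network call swappable, so a cron job can flatten the whole list of aliases while a test can supply canned HTML.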

[And again, please don’t assault me with server optimization questions or tricks. I won’t understand any of it, and that part simply isn’t my job.]

2) Nimex. This stands for NITF (News Industry Text Format) Import Export. It’s a custom module that takes data from the front-end pagination system and loads it into Drupal via XMLRPC. Very cool. It also handles media files (also exported from the print system) and allows editors to create ‘web enhanced’ versions of print stories. Nimex also powers the site’s Flash players, which use XML derived from media attached to a node or from a node queue list.
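Nimex itself is custom and isn't shown here. As a rough illustration of the import path, this Python sketch pulls a headline and body out of a minimal NITF document using standard NITF elements (hedline/hl1, body.content/p). It then serializes a metaWeblog.newPost XML-RPC request of the kind Drupal's BlogAPI module accepts. The blog ID, credentials, and field mapping are placeholders, not the site's actual setup.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client

def parse_nitf(nitf_xml):
    """Extract a headline and body paragraphs from a minimal NITF document."""
    root = ET.fromstring(nitf_xml)  # root element is <nitf>
    headline = root.findtext("body/body.head/hedline/hl1", default="").strip()
    paragraphs = [p.text or "" for p in root.findall("body/body.content/p")]
    return {"title": headline, "body": "\n\n".join(paragraphs)}

def build_new_post_call(story, blog_id, user, password):
    """Serialize a metaWeblog.newPost request (no network I/O happens here)."""
    struct = {"title": story["title"], "description": story["body"]}
    # Params: blogid, username, password, post struct, publish flag.
    return xmlrpc.client.dumps((blog_id, user, password, struct, True),
                               methodname="metaWeblog.newPost")
```

In practice, the request would be sent with something like xmlrpc.client.ServerProxy pointed at the site's xmlrpc.php endpoint. Building the payload separately makes it easy to inspect what an import will post.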

3) XSearch. This extended search module layers a number of different XML feeds into a unified presentation. It also uses Drupal’s search API to find Drupal content. Very cool.
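XSearch's internals aren't described beyond "layering XML feeds," so the following is only a guess at the general shape. It is a Python sketch that merges several RSS 2.0 feeds into one newest-first list and tags each result with its source. It assumes every item has a timezone-qualified pubDate; real feeds would need more defensive parsing.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def feed_items(source, rss_xml):
    """Yield (published, source, title, link) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        # Assumes pubDate exists and carries a timezone (e.g. "GMT" or "-0400").
        published = parsedate_to_datetime(item.findtext("pubDate"))
        yield (published, source,
               item.findtext("title", default=""),
               item.findtext("link", default=""))

def unified_results(feeds, limit=10):
    """Merge several feeds (name -> RSS text) into one list, newest first."""
    items = [it for source, xml in feeds.items() for it in feed_items(source, xml)]
    items.sort(key=lambda it: it[0], reverse=True)
    return items[:limit]
```

Keeping the source name on each result lets the presentation layer label or group hits by feed while still showing them in a single list.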

4) Biznode. A massive calendar effort designed to handle data imports (via XMLRPC) from vendors. The data model required reinventing the wheel in some cases (we don’t even use the Event module anymore). This one was a bear. I also had to write some calendaring and scheduling functions for it. From a Drupal standpoint, we probably goofed on this one. We used our own (really the vendor’s) data model, and that meant we had to reinvent a lot of code.

5) Reputations. This one surprised me. I needed a simple module that let users rate one another and couldn’t find one. So this is the first module I wrote. I didn’t use the Voting API because I wanted to avoid the overhead (and I wanted to write a module entirely from scratch). I like it. I especially like that it builds lists of everyone who rated you and lets you link back to them. It also creates a highest-rated-users page. This module, maybe more than the others, is actually ready for Drupal contrib.
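The data behind those features (a rating per rater, a "who rated you" list, and a highest-rated list) can be sketched in a few lines. The real module stores ratings in MySQL tables. This in-memory Python version is only an illustration, and two of its rules are assumptions rather than documented behavior: one score per rater per user (re-rating replaces the old score), and no self-rating.

```python
from collections import defaultdict

class Reputations:
    """Users rate other users; remembers who rated whom."""

    def __init__(self):
        # ratings[ratee][rater] = score
        self.ratings = defaultdict(dict)

    def rate(self, rater, ratee, score):
        if rater == ratee:
            raise ValueError("users cannot rate themselves")  # assumed rule
        self.ratings[ratee][rater] = score  # re-rating replaces the old score

    def raters_of(self, uid):
        """Everyone who rated this user, for the 'who rated you' list."""
        return sorted(self.ratings.get(uid, {}))

    def average(self, uid):
        scores = self.ratings.get(uid, {}).values()
        return sum(scores) / len(scores) if scores else 0.0

    def highest_rated(self, n=10):
        """User IDs for a highest-rated-users page, best average first."""
        return sorted(self.ratings, key=self.average, reverse=True)[:n]
```

Keying ratings by the rated user makes both the rater list and the average a single lookup, which is roughly what a profile page needs.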

6) Media Handling. Probably the nicest thing we did for the site editors was automating a lot of the media loading process. In particular, Nik wrote a great Flash 8 player that we fed via XML. Very cool. See it on any of the section fronts.

7) Object caching. This one might be controversial (I don’t know), but it makes sense and can speed up page loads by 20% or more. When the filecache patch came out, we tested it (and liked it), but we wondered why Drupal never caches, say, the $user object. So I wrote some code that piggybacks on filecache to create object caches for:

  • blocks
  • nodes
  • users
  • profiles

* UPDATE 06/23/2006
I have extracted the $object_cache code from its dependency on file-based caching.
*

Here’s how it works: on object_load(), we check whether the object exists in the cache. If it does, we load it from the cache (and bypass a potentially large PHP function call stack). If it doesn’t, we generate the object normally and then cache it. The only danger here, I think, is in not clearing the object cache often enough; we clear it, for example, on object_save(). I ran into one nasty case in user_load() that hints at the problems here. The object cache is ID based: if you request node 97, we cache node_load(nid => 97); if you request user 31, we cache user_load(uid => 31).
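The read-through pattern described above (check the cache, fall back to the normal loader, store the result, and clear it on save) looks roughly like this as a Python sketch. The class name ObjectCache, the file naming, and the use of pickle are illustrative assumptions. The loader argument stands in for node_load()/user_load(). Pickle files should only ever be read from a directory the application alone can write to.

```python
import os
import pickle
import tempfile

class ObjectCache:
    """ID-keyed, file-backed cache for loaded objects (nodes, users, ...)."""

    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp(prefix="objcache-")

    def _path(self, kind, obj_id):
        return os.path.join(self.directory, f"{kind}-{obj_id}.cache")

    def load(self, kind, obj_id, loader):
        """Return the cached object, or build it with loader(obj_id) and cache it."""
        path = self._path(kind, obj_id)
        try:
            with open(path, "rb") as f:
                return pickle.load(f)  # hit: skip the expensive loader
        except FileNotFoundError:
            pass
        obj = loader(obj_id)  # miss: build the object normally...
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)  # ...then cache it
        os.replace(tmp, path)  # atomic rename: readers never see a partial file
        return obj

    def clear(self, kind, obj_id):
        """Call on save so the next load rebuilds fresh data."""
        try:
            os.remove(self._path(kind, obj_id))
        except FileNotFoundError:
            pass
```

The risk mentioned in the post shows up directly here: any code path that changes an object without calling clear() leaves a stale file behind.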

In the “I forgot my password” case, however, user_load() isn’t called with a UID. This can result in the wrong user’s cached object being loaded. So I had to add a check to ensure that the arrays passed to object_load() include an ID before the cache is used.
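The guard can be sketched as a function that only produces a cache key when the lookup criteria include the object's ID. Any other lookup, such as finding a user by e-mail address during a password reset, skips the cache entirely. This is a Python illustration; the helper names and the nid/uid field mapping are assumptions modeled on Drupal's node and user tables, and a plain dict stands in for the cache.

```python
def cache_key(kind, criteria):
    """Return a cache key only for ID-based lookups; otherwise None.

    criteria mirrors the array given to user_load()/node_load(),
    e.g. {"uid": 31} or {"mail": "someone@example.com"}.
    """
    id_field = {"node": "nid", "user": "uid"}.get(kind)  # assumed mapping
    if id_field is None or id_field not in criteria:
        return None  # no ID: the cache can't tell which object this is
    return f"{kind}-{criteria[id_field]}"

def load_object(kind, criteria, loader, cache):
    """Read-through load that bypasses the cache for non-ID lookups."""
    key = cache_key(kind, criteria)
    if key is None:
        return loader(criteria)  # e.g. password reset by e-mail
    if key not in cache:
        cache[key] = loader(criteria)
    return cache[key]
```

Without the None branch, an e-mail lookup would have no ID to key on and could be answered with whatever user object happened to be cached.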

As I said, we get a significant performance gain from this cache — which writes to the filesystem, not the database. But we need to keep a careful eye on it.

I should stress that the object cache is very handy in conjunction with the Generator. The pages we generate are typically taxonomy index pages (showing the X most recent nodes). Generating such a page calls node_load() for each of those nodes, which populates the cache. This gives us a dual performance gain: 1) the index page is served by Apache, and 2) the dynamic node pages load from cache.

Potentially very cool, and possibly a candidate for 4.8/5.0.

Shout Outs

Hopefully this all doesn’t seem too self-important. And while I don’t want to mention names (for fear that you’ll want to hire them), let me say that the development team did an awesome job. Without Kelly and Art, I’d never have finished my parts. Tobby, Tim and Libby made it all look the way the designers intended. (For more on design, see Nik’s post.) And thanks to Cameron and David for making it run.

And a big shout out to the Drupal developer community. Without you, this project never gets out the door.

8 Comments so far

Hi, I followed your Drupal profile over here. I was very interested in your user rating module; now I’m interested in all your work, especially your speed enhancements. For me the need to scale will come later, so the promise of static page caching (http://bendiken.net/2006/05/28/static-page-caching-for-drupal) was enough for me to choose Drupal. But it’s great to see you’ve got improvements working now. Please do let me know when you contribute the user-rank module. Thanks!

Congratulations on the roll out! Sounds like a really great learning experience and a cool site. The generator module in particular sounds great for sites that worry about getting slashdotted, etc.

As the author of VotingAPI, my ears perked up when you mentioned not wanting the overhead of that module. Would you be interested in discussing that? Not to ‘convince’ you to use it, but rather to figure out ways the API can be improved. I’d love to chat about ways it can be made lighter without sacrificing too much functionality.

Jeff-

Don’t get me wrong, VotingAPI is cool, but I don’t know that I needed it. Three main reasons I didn’t:

1) I only needed one new Voting app, so an API seemed like overkill. See #2…

2) More modules = more code.

3) Honestly, I wanted to write something from scratch so I could see all the moving parts. Learning the Voting API (which I started) would’ve short-circuited that learning process.

- Ken

Thanks. I like the generator.module a lot. It’s a funny module, though, since I originally wrote it as a PHP form, using no Drupal functions.

I did that to make sure that the Curl trick would work, and because I wasn’t sure which filesystem checks I needed.

It turned out that the file checks I need are all in file.inc (I think). If more work were going to be done on generator, first it would have to be brought into full Drupal mode. Then there are some code redundancies to remove.

For a site with multiple editors (this site has at least 6), generator has the added benefit of ‘preview’ and ‘backup.’ Since we generate static pages from dynamic, when an editor moves blocks around, she can preview what the changes might look like before output.

Currently, the module supports one level of restore from backup, though the code could easily support N levels. Backups are stored in the filesystem, though the database might be better.
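Extending the single restore level to N levels could look like this rotation scheme. It is a Python sketch of a filesystem approach, not the generator's code. The names publish and restore and the live.1 ... live.N backup naming are assumptions.

```python
import os
import shutil

def publish(staged, live, levels=3):
    """Copy a previewed (staged) page over the live one, keeping N backups.

    Backups are named live.1 (newest) through live.N (oldest).
    """
    # Shift existing backups up one slot; the oldest one falls off the end.
    for i in range(levels - 1, 0, -1):
        src = f"{live}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{live}.{i + 1}")
    if os.path.exists(live):
        shutil.copy2(live, f"{live}.1")  # the current live page becomes backup 1
    shutil.copy2(staged, live)

def restore(live, levels=3):
    """Roll back one level: live.1 becomes live, older backups shift down."""
    if not os.path.exists(f"{live}.1"):
        raise FileNotFoundError("no backup to restore")
    os.replace(f"{live}.1", live)
    for i in range(2, levels + 1):
        src = f"{live}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{live}.{i - 1}")
```

With levels=1, this reduces to the current one-level behavior. Moving the backups into the database would change only where the copies are kept, not the rotation logic.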

- Ken

The generator will be (mostly) deprecated in the next version of Drupal. CVS HEAD supports multiple cache backends, and the same can be achieved using the ‘fastpath file caching’ that Jeremy Andrews (CivicSpace) wrote. That patch was also backported to Drupal 4.7, by the way. I’d suggest that you check with them, and explore options to collaborate on that file-caching project.

We’ve been testing the filecache as well, but it hadn’t been released when I wrote the generator.

I sent Jeremy some code last week that removes the ‘wildcard’ barrier from using the file cache, so we’re talking.

The generator does have two additional intended features that filecache doesn’t. First, since the files already exist, it allows site editors (there are 4) to preview changes when moving blocks, promoting stories, et al.

Then, when the editors are happy with the changes, they can ‘publish’ the results. A workflow that supports a ‘staging’ server model would be very welcome at the enterprise level.

Second, the generator serves ‘static’ pages to all users, not just anonymous users. Since the pages it generates are the most popular on the site, it really made our operations guys happy.

That said, I like the fastcache, and we use it as well.

Thanks for the shout out Ken. We could not have done it without you, and I am glad I was able to help :)

Ha Ha.

For those of you playing along at home: every time I say something like “I don’t know how we did that” or “Real developers worked on that part,” I’m probably referring to Kelly. Or Art. Or Tobby.

Especially check out the Search feature on SavannahNow. Kelly wrote that.


