Google Sitemaps

Just wanted to share some code that I wrote to generate a sitemap.xml file in Google's sitemap format. It's a Django application modeled on django.contrib.syndication. Here's the rundown:

First, create a Sitemap subclass. Like Feed, this will be queried for a list of items, then queried for properties for each item returned. For this site, my BlogSitemap subclass looks like this:

class BlogSitemap(Sitemap):
    def items(self):
        # Only published entries belong in the sitemap.
        return Entry.objects.filter(is_draft=False)

    def lastmod(self, obj):
        # Reported as the last-modified date for each entry.
        return obj.mod_date

You can specify location, lastmod, changefreq, and priority in a similar fashion. See Google's documentation for more information. The location property will default to the value returned from the object's get_absolute_url method.
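If the query-then-properties flow is hard to picture, here's a rough, self-contained sketch of it. SketchSitemap and FakeEntry are hypothetical stand-ins I made up for illustration (they are not the real Sitemap base class or my Entry model), and the sketch assumes each property can be either a plain attribute or a method that takes the item, which is how Feed in django.contrib.syndication behaves.

```python
from datetime import date


class SketchSitemap:
    """Hypothetical stand-in for the Sitemap base class (not the real one)."""

    changefreq = None
    priority = None

    def items(self):
        return []

    def location(self, obj):
        # Default: fall back to the object's own get_absolute_url method.
        return obj.get_absolute_url()

    def lastmod(self, obj):
        return None

    def _get(self, name, obj):
        # Assumption: a property is either a plain value or a method taking the item.
        attr = getattr(self, name)
        return attr(obj) if callable(attr) else attr

    def get_urls(self):
        # Query once for the items, then query each property per item.
        return [
            {
                "location": self._get("location", obj),
                "lastmod": self._get("lastmod", obj),
                "changefreq": self._get("changefreq", obj),
                "priority": self._get("priority", obj),
            }
            for obj in self.items()
        ]


class FakeEntry:
    """Hypothetical model stand-in with only the fields used here."""

    def __init__(self, slug, mod_date):
        self.slug = slug
        self.mod_date = mod_date

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug


class BlogSketch(SketchSitemap):
    changefreq = "weekly"  # one value shared by every item
    priority = 0.5

    def items(self):
        return [FakeEntry("hello", date(2006, 8, 29))]

    def lastmod(self, obj):
        return obj.mod_date


urls = BlogSketch().get_urls()
```

BlogSketch never defines location, so each entry's URL comes from get_absolute_url, while changefreq and priority are just class attributes.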

Next, point your urlconf to the appropriate sitemap view (or views). The sitemap views expect a dictionary mapping sitemap names to Sitemap subclasses. idioteque.sitemap.views.sitemap will simply compile a global list of URLs from all entries in the passed-in dictionary. My urls.py looks like this:

sitemaps = { 'blog' : BlogSitemap }
( r'^sitemap\.xml$', 'idioteque.sitemap.views.sitemap', {'sitemaps': sitemaps} )
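
For a feel of what "compile a global list of URLs" amounts to, here's a minimal sketch that uses no Django at all. It is not the actual view code: compile_urls, render_urlset, and blog_entries are hypothetical names, each section is assumed to be a plain callable returning dictionaries, and the namespace is the one from the sitemaps.org 0.9 protocol (the 2006-era Google schema may have used a different one).

```python
from xml.sax.saxutils import escape

# Namespace of the sitemaps.org 0.9 protocol (an assumption for this sketch).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def compile_urls(sitemaps, domain):
    """Flatten every section's entries into one list with absolute locations."""
    urls = []
    for name in sorted(sitemaps):
        # Each value is assumed to be a callable returning a list of entry dicts.
        for entry in sitemaps[name]():
            full = dict(entry)
            full["location"] = "http://%s%s" % (domain, entry["location"])
            urls.append(full)
    return urls


def render_urlset(urls):
    """Render the compiled entries as a <urlset> document."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="%s">' % SITEMAP_NS,
    ]
    for url in urls:
        lines.append("  <url>")
        # Escape the location so characters like & stay valid XML.
        lines.append("    <loc>%s</loc>" % escape(url["location"]))
        if url.get("lastmod"):
            lines.append("    <lastmod>%s</lastmod>" % url["lastmod"])
        lines.append("  </url>")
    lines.append("</urlset>")
    return "\n".join(lines)


def blog_entries():
    # Hypothetical section: one entry whose URL needs escaping.
    return [{"location": "/blog/hello/?a=1&b=2", "lastmod": "2006-08-29"}]


xml = render_urlset(compile_urls({"blog": blog_entries}, "example.com"))
```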

You can also create a sitemap index that references separate files for particular sections. Have a look at my urls.py for an example.
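
In case the index format is unfamiliar, here's a small sketch of the document such a view would produce: one sitemap element per section, each pointing at that section's own file. render_index and the /sitemap-<name>.xml naming pattern are assumptions made up for this example, not what my urls.py actually uses, and the namespace is again the sitemaps.org 0.9 one.

```python
from xml.sax.saxutils import escape

# Namespace of the sitemaps.org 0.9 protocol (an assumption for this sketch).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_index(section_names, domain, pattern="/sitemap-%s.xml"):
    """Render a <sitemapindex> with one <sitemap> entry per section."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="%s">' % SITEMAP_NS,
    ]
    for name in sorted(section_names):
        # Hypothetical URL scheme: each section lives at /sitemap-<name>.xml.
        loc = "http://%s%s" % (domain, pattern % name)
        lines.append("  <sitemap><loc>%s</loc></sitemap>" % escape(loc))
    lines.append("</sitemapindex>")
    return "\n".join(lines)


index_xml = render_index(["blog", "links"], "example.com")
```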

Lastly, there is a convenience function (idioteque.sitemap.ping_google) to ping Google when you want them to re-download your sitemap. For my blog, I overrode Entry's save method to ping Google whenever I post something that isn't a draft:

def save( self ):
    super(Entry, self).save()
    if not self.is_draft:
        ping_google( "theidioteque.net/sitemap.xml" )
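
A ping is just an HTTP GET with the sitemap's address passed as a query parameter. Here's a hedged sketch of building that request URL without actually sending anything. The endpoint below is an assumption (it's the address I believe Google documented for pings around this time, so verify it before relying on it), and full_sitemap_url and build_ping_url are hypothetical helpers, not part of idioteque.sitemap.

```python
from urllib.parse import urlencode

# Assumed endpoint; check Google's documentation before relying on it.
DEFAULT_PING_ENDPOINT = "http://www.google.com/webmasters/sitemaps/ping"


def full_sitemap_url(domain, path="/sitemap.xml", scheme="http"):
    """Build the absolute sitemap URL from a domain and a path."""
    return "%s://%s%s" % (scheme, domain, path)


def build_ping_url(sitemap_url, endpoint=DEFAULT_PING_ENDPOINT):
    """Return the URL to request; urlencode percent-escapes the sitemap address."""
    return "%s?%s" % (endpoint, urlencode({"sitemap": sitemap_url}))


ping_url = build_ping_url(full_sitemap_url("theidioteque.net"))
```

Splitting out something like full_sitemap_url is one way the caller could pass only a path and let the domain come from configuration.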

There are still some rough edges, but I'm using this code. Things that still need to be ironed out include:

  • Caching the sitemap views. This should be as simple as adding a django.views.decorators.cache.cache_page decorator, but it breaks urlresolvers.reverse at the moment. I have a ticket in.
  • Improving the ping_google function, so you don't have to pass the full URL in. This seems like something that should be taken care of automatically; I just haven't thought of a good way to do it yet.

Enjoy!

Comments

Adrian Holovaty — August 29, 2006 at 2:40 p.m.

Nice work! I've long wanted to add something like this to Django (automatically generating Google sitemap stuff) -- great minds think alike, and so do ours. Would you be willing to contribute it to the framework proper?

Dan Watson — August 29, 2006 at 3:04 p.m.

Sure thing. When I get a free minute, I'll package it up, write some documentation, and submit a ticket.

Phil Powell — August 30, 2006 at 2:23 a.m.

This is great! Only yesterday I was taking a look at Google's sitemap.xml, and it hadn't even occurred to me that it would make sense to automate the generation process. Neat work, and thanks for sharing.

tabo — August 30, 2006 at 1:39 p.m.

Thank you!

I was going to write my own sitemap routines for feedjack, you saved me a lot of time!

Adrian Holovaty — September 01, 2006 at 12:02 a.m.

For the record, this is now an official part of Django -- see http://www.djangoproject.com/documentation/sitemaps/ . Thanks, Dan!
