Recently I implemented a sitemap for a Rails application, but once the search engines started hitting my site, I realized that I wasn't giving them all the information they wanted.
- Yahoo was doing simple HTTP HEAD requests on my sitemap and going away:
  184.108.40.206 - - [11/Dec/2007:08:35:49 -0800] "HEAD /sitemap.xml HTTP/1.0" 200 334 "-" "Yahoo! Slurp/Site Explorer"
- Google read my sitemap infrequently but loved to do HEAD requests on my index page:
  220.127.116.11 - - [17/Dec/2007:07:11:24 -0800] "HEAD / HTTP/1.1" 200 372 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
For those of you unfamiliar with HEAD requests, they are requests for a page that don't actually return the page, just the response headers: information like size and date. It's a nice way for search engines and caches to decide if there are any changes they need to pull. For some examples of HTTP requests, check out the wiki page I wrote on making HTTP requests via telnet.
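If you'd rather script it than type into telnet, here is a minimal sketch of a HEAD request in Ruby using Net::HTTP from the standard library. The method name head_request and the host in the usage comment are my own, for illustration only.

```ruby
require "net/http"

# Send a HEAD request and return the response object.
# Only the status line and headers come back; no page body is transferred.
# head_request is a hypothetical helper name, not part of any library.
def head_request(host, path = "/", port = 80)
  # Passing nil as the proxy address skips any proxy set in the environment.
  Net::HTTP.start(host, port, nil) do |http|
    http.head(path)
  end
end

# Usage (hypothetical host):
#   response = head_request("www.example.com", "/sitemap.xml")
#   puts response.code                # status code as a string
#   puts response["Content-Length"]   # size of the page a GET would return
#   puts response["Last-Modified"]    # nil if the server does not send it
```

Header lookups on the response are case-insensitive, so response["last-modified"] works just as well.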
Now what do you think Google and Yahoo were looking for? My assumption is that they are looking for the Last-Modified header in the response to see if they should bother requesting the sitemap. They could compare the size of the response with the size from their last visit, but I doubt they want to track this information.
The Last-Modified date is the ideal field to check.
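To make the idea concrete, here is a rough sketch of the check a crawler could make with that header. This is my guess at the logic, not anything Google or Yahoo have documented, and changed_since? is a hypothetical helper name.

```ruby
require "time"

# Returns true when the resource looks newer than our last visit.
# last_modified is the raw Last-Modified header value (or nil);
# last_visit is a Time. Hypothetical helper, for illustration only.
def changed_since?(last_modified, last_visit)
  # Without the header there is nothing to compare, so assume it changed.
  return true if last_modified.nil?

  Time.httpdate(last_modified) > last_visit
rescue ArgumentError
  # An unparseable date gives us no information, so assume it changed.
  true
end
```

With no Last-Modified header, the only safe answer is "yes, fetch it again", which is exactly the position a default Rails response leaves the crawler in.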
This is all well and good, but guess what: Rails doesn't set that header for you.
    telnet ficlets.com 80
    Trying 18.104.22.168...
    Connected to www.ficlets.com.
    Escape character is '^]'.
    HEAD / HTTP/1.0
    Host: www.ficlets.com

    HTTP/1.1 200 OK
    Date: Wed, 09 Jan 2008 20:22:31 GMT
    Server: Mongrel 1.0.1
    Status: 200 OK
    Cache-Control: no-cache
    Content-Type: text/html; charset=utf-8
    Content-Length: 16355
    Set-Cookie: _session_id=s322clipped; path=/
    Vary: Accept-Encoding
    Connection: close
The good news is this is easy to add! If you look at my wiki example you'll see that I set the header with the time of the latest entry in the sitemap.
headers["Last-Modified"] = @entries.updated_at.httpdate
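If your entries are a plain collection rather than a single object that knows its own updated_at, one way to get the time of the latest entry is to take the maximum. This is a minimal sketch, assuming each entry responds to updated_at with a Time; SitemapEntry and sitemap_last_modified are hypothetical names standing in for a real model and helper.

```ruby
require "time"

# Hypothetical stand-in for a database-backed model.
SitemapEntry = Struct.new(:url, :updated_at)

# Returns the newest updated_at among the entries as an HTTP date string,
# or nil when there are no timestamps to report.
def sitemap_last_modified(entries)
  latest = entries.map(&:updated_at).compact.max
  latest && latest.httpdate
end

entries = [
  SitemapEntry.new("/a", Time.utc(2007, 12, 1, 8, 0, 0)),
  SitemapEntry.new("/b", Time.utc(2008, 1, 9, 20, 22, 31))
]
puts sitemap_last_modified(entries)  # prints "Wed, 09 Jan 2008 20:22:31 GMT"
```

In a controller you would only set the header when the helper returns a value, so an empty sitemap doesn't send a blank Last-Modified.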
Now, it isn't Rails' job to add this header for you, but it would be nice if the scaffolding added it. The standard show/1 actions pull a record from a database, so the action already knows the updated_at value.
I have added this header to my show actions, since I am just pulling a record from a database and already have the modified time!
headers["Last-Modified"] = @business.updated_at.httpdate
That's all. I'm looking forward to any comments and suggestions people may have.