rails

Rack-Cache on Rails 3 – the unadvertised caching option

by on Mar.28, 2012, under rails, tech

Today I was poking around in the Rails tmp/cache directory and saw an entry I did not expect. It appeared that a controller action was being cached, but I was not using page, action, or fragment caching.

I was setting an expiration header on the action response, and as it turns out Rails now ships with a handy gem called rack-cache. Rack-cache creates cache entries for responses that you give a future expiration. Since this isn’t mentioned in the Rails Caching Guide, it took me a little while to track it down.

Here’s a quick example of it in action

Create a dummy app

rails new rack-cache
cd rack-cache
bundle exec rails generate controller RackCache cache_this

Edit the cache_this method

class RackCacheController < ApplicationController
  def cache_this
    render :text => Time.zone.now.to_s
  end
end

Fire it up in production mode so caching is enabled

RAILS_ENV=production bundle exec rails s -p6666

When you hit the page in the browser and hit reload you’ll see the time change

http://localhost:6666/rack_cache/cache_this

2012-03-29 00:40:09 UTC
2012-03-29 00:40:29 UTC

Now let’s change the action slightly

class RackCacheController < ApplicationController
  def cache_this
    expires_in(5.minutes, :public => true)
    render :text => "The time is #{Time.zone.now.to_s}"
  end
end

Now each time you reload the page you get the same time

The time is 2012-03-29 00:58:07 UTC
The time is 2012-03-29 00:58:07 UTC

Without a reload the browser won’t bother asking for the resource for another 5 minutes. With a reload we get a 304 (Not Modified) response.

Normally, clearing your browser cache would get you a new time, but Rack-Cache also cached this response on the server. So clear all you want; you won’t get an update until 5 minutes after the time shown in the window. If 10 people are hitting this page, they will all see the same time. The first person to hit it after expiration will update the time.

The time is 2012-03-29 01:04:01 UTC
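
To make the server-side behaviour easier to picture, here is a minimal sketch of a caching middleware in plain Ruby. It is not rack-cache's real implementation: the TinyHttpCache class, its in-memory Hash store and the injectable clock are all made up for illustration. The lambda stands in for the cache_this action and sets the Cache-Control header that expires_in(5.minutes, :public => true) produces.

```ruby
# Toy Rack-style caching middleware; NOT rack-cache's real implementation.
class TinyHttpCache
  # app:   anything responding to call(env) and returning [status, headers, body]
  # clock: callable returning the current Time (injectable so time can be faked)
  def initialize(app, clock = -> { Time.now })
    @app = app
    @clock = clock
    @store = {} # path => { response: [...], expires_at: Time }
  end

  def call(env)
    return @app.call(env) unless env["REQUEST_METHOD"] == "GET"

    key = env["PATH_INFO"]
    entry = @store[key]
    now = @clock.call
    return entry[:response] if entry && now < entry[:expires_at] # fresh hit

    status, headers, body = @app.call(env) # miss or stale: ask the app
    max_age = shared_max_age(headers["Cache-Control"].to_s)
    if status == 200 && max_age
      @store[key] = { response: [status, headers, body], expires_at: now + max_age }
    end
    [status, headers, body]
  end

  private

  # Returns max-age in seconds when the response is marked public, else nil.
  def shared_max_age(cache_control)
    directives = cache_control.split(",").map(&:strip)
    return nil unless directives.include?("public")

    value = directives.find { |d| d.start_with?("max-age=") }
    value && value.split("=", 2).last.to_i
  end
end

# Stand-in for the cache_this action, with a fake clock we can move forward.
now = Time.utc(2012, 3, 29, 0, 58, 7)
app = lambda do |_env|
  [200, { "Cache-Control" => "max-age=300, public" }, ["The time is #{now}"]]
end
cache = TinyHttpCache.new(app, -> { now })
request = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/rack_cache/cache_this" }

puts cache.call(request).last.first # first hit: the app renders the time
now += 60
puts cache.call(request).last.first # one minute later: same cached body
now += 300
puts cache.call(request).last.first # past five minutes: the app is asked again
```

Every visitor shares the one stored entry, which is why clearing a single browser's cache makes no difference until the entry expires.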

So where is the output cached? By default Rails uses ActiveSupport::Cache::FileStore, which lives in tmp/cache. This is configurable via config.cache_store.

ls -tr tmp/cache/
 assets    B15    2A5    A92    B2C    B31
 # B31 is newest dir
 find tmp/cache/B31
 tmp/cache/B31
 tmp/cache/B31/A20
 tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b

The cached response is in

tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b

cat tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
 o: ActiveSupport::Cache::Entry    :@compressedF:@expires_in0:@created_atf1332983041.303097:
 @value[I"(The time is 2012-03-29 01:04:01 UTC:EF

This is a pretty powerful caching option that Rails developers should understand and use when appropriate. In some cases, not knowing this feature was in place has broken applications. Maybe I'll edit the Rails Guide to include this information.

I noticed that rack-cache was filling our Apache error log with entries like these

cache: [GET /somepath] miss
cache: [GET /anotherpath/logo?1333186605] fresh
cache: [POST /yet/another/path] invalidate, pass

Turn off the verbose option by adding this to your production.rb

config.action_dispatch.rack_cache = { :metastore => "rails:/", :entitystore => "rails:/", :verbose => false }

To disable rack-cache altogether just do this

config.action_dispatch.rack_cache = nil


Setting up Sunspot/Solr for OR queries, stemming and lower memory usage

by on Jan.06, 2011, under rails, tech

As I keep finding in Rails 3, the gems I used in Rails 2 no longer work or have fallen out of favor. In Rails 2, acts_as_ferret met my searching needs, but after submitting some fixes for Rails 3 and Ruby 1.9.2 I was still having issues, so I moved on to Sunspot.

One of the first things I wanted to change with Sunspot was to make the default boolean operator OR. This means that when someone searches for “car window” they get results that match car or window.

Not being a Solr expert, my first thought was that all I needed to do was change

<solrQueryParser defaultOperator="AND"/>

to

<solrQueryParser defaultOperator="OR"/>

But it didn’t work. After some research and digging through the logs I learned that Sunspot uses the dismax request handler. To make a long story short, dismax ignores defaultOperator and uses a minimum-match setting instead (the mm parameter, which Sunspot exposes as minimum_match). The good news is that setting it to 1 in your search query is easy and gives you the same behavior as defaultOperator="OR".

In your controller your search would look something like this.

@articles = Article.search do
  keywords(actual_search) {minimum_match 1}
end
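
As a rough illustration of why a minimum match of 1 behaves like OR, here is a toy model in plain Ruby. It only counts how many query terms appear in a document. It is not how Solr parses or scores queries, and the matches? helper and the sample documents are invented for the example.

```ruby
# Toy model of "minimum should match"; not Solr's real parsing or scoring.
# A document matches when at least `minimum_match` query terms appear in it.
def matches?(document, query, minimum_match)
  doc_terms = document.downcase.split
  hits = query.downcase.split.uniq.count { |term| doc_terms.include?(term) }
  hits >= minimum_match
end

docs = ["red car for sale", "broken window repair", "car window tinting", "garden hose"]

# minimum_match 1: any single term is enough, so this acts like OR
or_like = docs.select { |doc| matches?(doc, "car window", 1) }
# minimum_match 2: both terms are required, so this acts like AND
and_like = docs.select { |doc| matches?(doc, "car window", 2) }

p or_like  # the three documents that mention car or window
p and_like # only "car window tinting"
```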

The next thing I wanted was for a search on car to also return results for cars and other stems. This required a one-line change in schema.xml.

In the <analyzer> block just add <filter class="solr.SnowballPorterFilterFactory" language="English" />

      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
      </analyzer>
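
To show what the analyzer chain buys you, here is a toy version in plain Ruby: tokenize, lowercase, then stem. The naive_stem helper just strips a trailing "s" and is purely an assumption for illustration; the real SnowballPorterFilterFactory applies the much richer Porter stemming rules. The point is only that indexed text and query text end up as the same term.

```ruby
# Toy analysis chain: tokenize, lowercase, then apply a naive stemmer.
# The real Snowball/Porter stemmer handles far more than a trailing "s".
def naive_stem(term)
  term.length > 3 ? term.sub(/s\z/, "") : term
end

def analyze(text)
  text.scan(/[[:alnum:]]+/).map(&:downcase).map { |term| naive_stem(term) }
end

indexed = analyze("Used Cars and windows") # terms stored for the document
query   = analyze("car")                   # terms produced for the search

p indexed
p indexed & query # the shared term is what makes "car" find "Cars"
```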

Finally, because the model I am searching is small and Java eats quite a bit of memory, I wanted to reduce the Solr server’s memory footprint. This may come back to bite me as my dataset grows, but for now it is working fine. To adjust the memory parameters used by rake sunspot:solr:start, just edit your sunspot.yml file and add min_memory and max_memory lines.

development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M

This will result in -Xms64M -Xmx64M being sent to java on startup.
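
The mapping from those two settings to the JVM flags can be sketched in a few lines of Ruby. This is not Sunspot's own code: jvm_memory_flags is a hypothetical helper, and the YAML is inlined only so the example runs on its own.

```ruby
require "yaml"

# Same settings as the sunspot.yml above, inlined so the sketch is standalone.
SUNSPOT_YML = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
      log_level: DEBUG
      min_memory: 64M
      max_memory: 64M
YAML

# Hypothetical helper: turns the memory settings into JVM heap flags.
# -Xms sets the initial heap size, -Xmx the maximum heap size.
def jvm_memory_flags(config, environment)
  solr = config.fetch(environment).fetch("solr")
  flags = []
  flags << "-Xms#{solr['min_memory']}" if solr["min_memory"]
  flags << "-Xmx#{solr['max_memory']}" if solr["max_memory"]
  flags.join(" ")
end

puts jvm_memory_flags(YAML.safe_load(SUNSPOT_YML), "development")
# prints "-Xms64M -Xmx64M"
```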


My sitemap notes are in the Advanced Rails Recipes Book

by on Mar.08, 2008, under rails, sitemap, tech

After I blogged about building a sitemap for Rails I contacted Mike Clark and asked him if he thought it would make a good Recipe for his upcoming book, Advanced Rails Recipes. He thought it was a good fit and it is currently in the Beta version of the book.

I wrote up my notes in the Recipe format, then Mike basically rewrote it for Rails 2.0 and added some additional content. Thanks Mike! I almost feel bad being cited as the author since after editing it is drastically different from the original. :-)

The core concepts are still there, but some thoughts were dropped since Recipes should be short. So here are some elaborations.

The Ping Protocol

There is a warning in the book about excessively pinging Google to have them read your sitemap. I would recommend letting search engines crawl your sitemaps at their own speed. The ping example in the book was a nice overview of when to use an Observer and also provided complete coverage on how to submit sitemaps. Please use ping sparingly, if at all. :-)

Sitemaps with over 50,000 entries

I work on sites where we use sitemap index files because we submit well over 50,000 URLs to the search engines. I didn’t provide an example of how to build these in Rails because I’m not sure they provide any value to the typical site.

My theory is that if you build a sitemap with the 50,000 pages that were most recently updated, you will give the search engines all they need. If a page isn’t updated for a while and it falls off the list, is that really a problem? If the page was worth anything, someone externally would be linking to it before it fell off the list. Now if your site is creating millions of pages a day this may not be the case.

If your pages are islands (no links to them) and you’re afraid they won’t be found unless they are all in the sitemap, I would suggest building the sitemap via a rake task that is kicked off via a cron job. This will also give you an opportunity to gzip the files. I’ll try to write up some example code for this when I find some free time.
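
A rough sketch of the generation step, in plain Ruby, might look like the following. It keeps the most recently updated pages up to the 50,000-URL limit and gzips the result. The Page struct stands in for whatever model holds your URLs, and build_sitemap and gzip are hypothetical helper names. A rake task run from cron could call them and write the output under public/.

```ruby
require "stringio"
require "time"
require "zlib"

MAX_URLS = 50_000 # per-file URL limit in the sitemap protocol
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
}.freeze

# Hypothetical stand-in for a model row with a URL and a timestamp.
Page = Struct.new(:url, :updated_at)

# Builds sitemap XML for the most recently updated pages, newest first.
def build_sitemap(pages, limit = MAX_URLS)
  recent = pages.sort_by(&:updated_at).reverse.first(limit)
  entries = recent.map do |page|
    loc = page.url.gsub(/[&<>"']/, XML_ESCAPES) # escape characters special to XML
    lastmod = page.updated_at.getutc.iso8601
    "  <url><loc>#{loc}</loc><lastmod>#{lastmod}</lastmod></url>"
  end
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
           *entries,
           "</urlset>"]
  lines.join("\n") + "\n"
end

# Gzips a string in memory and returns the compressed bytes.
def gzip(data)
  io = StringIO.new(String.new) # binary buffer for the compressed output
  writer = Zlib::GzipWriter.new(io)
  writer.write(data)
  writer.close
  io.string
end

pages = [
  Page.new("http://example.com/articles/1", Time.utc(2008, 3, 1)),
  Page.new("http://example.com/articles/2?a=1&b=2", Time.utc(2008, 3, 8))
]
xml = build_sitemap(pages)
puts xml
# In a rake task you might finish with:
#   File.binwrite("public/sitemap.xml.gz", gzip(xml))
```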

Do I really need a sitemap?

If your site has navigation to all its pages, then a sitemap will probably not benefit you. I suggest checking what pages the search engines have in their index and if key content is missing then pursue a sitemap. Even if they are finding all your pages a sitemap certainly couldn’t hurt.

Just in case you didn’t know how to find the pages Google knows about on your site, you can simply type site:yourdomain.com in the Google or Yahoo search box.

Example results for my site are here
