Rack-Cache on Rails 3 – the unadvertised caching option
by Tony Primerano on Mar.28, 2012, under rails, tech
Today I was poking around in the Rails tmp/cache directory and I saw an entry I did not expect. It appeared that a controller action was being cached, but I was not using page, action or fragment caching.
I was setting an expiration header on the action response and, as it turns out, there is a handy gem in Rails now called rack-cache. Rack-cache will create cache entries for items that you set a future expiration for. Since this isn’t mentioned in the Rails Caching Guide it took me a little while to track it down.
Here’s a quick example of it in action
Create a dummy app
rails new rack-cache
cd rack-cache
bundle exec rails generate controller RackCache cache_this
Edit the cache_this method
class RackCacheController < ApplicationController
  def cache_this
    render :text => Time.zone.now.to_s
  end
end
Fire it up in production mode so caching is enabled
RAILS_ENV=production bundle exec rails s -p6666
When you hit the page in the browser and hit reload you’ll see the time change
http://localhost:6666/rack_cache/cache_this
2012-03-29 00:40:09 UTC
2012-03-29 00:40:29 UTC
Now let’s change the action slightly
class RackCacheController < ApplicationController
  def cache_this
    expires_in(5.minutes, :public => true)
    render :text => "The time is #{Time.zone.now.to_s}"
  end
end
Now each time you reload the page you get the same time
The time is 2012-03-29 00:58:07 UTC
The time is 2012-03-29 00:58:07 UTC
Without a reload the browser won’t bother asking for the resource for another 5 minutes. With the reload we get a 304 message.
Normally a simple clear of your browser cache would get you a new time, but Rack-Cache also cached this response on the server. So clear all you want, you won’t get an update until 5 minutes after the time shown in the window. If there are 10 folks hitting this page they will all see the same time. The first person to hit it after expiration will update the time.
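To make the five-minute window concrete, here is a minimal sketch of the kind of freshness check a shared cache applies to a stored response. It is an illustration only, not rack-cache's actual code: the method names max_age and fresh? are made up for this example, and the header string assumes expires_in(5.minutes, :public => true) sends something along the lines of "max-age=300, public".

```ruby
# Sketch of an HTTP freshness check, roughly as a shared cache might apply it.
# max_age and fresh? are hypothetical names, not part of rack-cache.

# Pull the number of seconds out of a "max-age=300, public" style header.
def max_age(cache_control)
  match = cache_control.to_s.match(/max-age=(\d+)/)
  match ? match[1].to_i : nil
end

# A stored response is fresh while its age is below max-age.
def fresh?(cache_control, stored_at, now)
  limit = max_age(cache_control)
  return false if limit.nil?
  (now - stored_at) < limit
end

header    = "max-age=300, public" # assumed header for a 5 minute public expiry
stored_at = Time.utc(2012, 3, 29, 0, 58, 7)

puts fresh?(header, stored_at, stored_at + 60)  # one minute later
puts fresh?(header, stored_at, stored_at + 301) # just past five minutes
```

While the check returns true, every visitor gets the stored copy; the first request after it turns false goes through to the controller again.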
The time is 2012-03-29 01:04:01 UTC
So where is the output cached? By default Rails uses the ActiveSupport::Cache::FileStore, which lives in tmp/cache. You can change this via config.cache_store.
ls -tr tmp/cache/
assets B15 2A5 A92 B2C B31   # B31 is newest dir
find tmp/cache/B31
tmp/cache/B31
tmp/cache/B31/A20
tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
The cached response is in
tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
cat tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
o: ActiveSupport::Cache::Entry :@compressedF:@expires_in0:@created_atf1332983041.303097: @value[I"(The time is 2012-03-29 01:04:01 UTC:EF
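The ls and find steps above can also be done from Ruby. The following is a small sketch using only the standard library; newest_cache_file is a hypothetical helper name, and tmp/cache is assumed to be the cache directory relative to where the script runs.

```ruby
require "find"

# Return the most recently modified regular file under root, or nil if the
# directory is missing or empty. newest_cache_file is a hypothetical helper.
def newest_cache_file(root)
  return nil unless File.directory?(root)
  files = []
  Find.find(root) { |path| files << path if File.file?(path) }
  files.max_by { |path| File.mtime(path) }
end

path = newest_cache_file("tmp/cache")
puts path ? "#{path} (#{File.mtime(path)})" : "no cache entries found under tmp/cache"
```

Run it from the application root right after hitting the action and it should point at the entry that was just written.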
This is a pretty powerful caching option that Rails developers should understand and use when appropriate. In some cases, not knowing this feature was in place has broken applications. Maybe I'll edit the Rails Guide to include this information.
I noticed that rack-cache was filling our apache error log with entries
cache: [GET /somepath] miss
cache: [GET /anotherpath/logo?1333186605] fresh
cache: [POST /yet/another/path] invalidate, pass
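Before silencing these entries it can be handy to see how often the cache is actually hit. Here is a rough sketch that tallies trace lines by outcome. The regular expression is inferred from the sample lines above and is an assumption, not a documented log format; tally_outcomes is a made-up name.

```ruby
# Tally rack-cache trace lines by outcome ("miss", "fresh", ...).
# The pattern is inferred from the sample log lines and is an assumption.
TRACE_LINE = /cache: \[(\w+) (\S+)\] (.+)\z/

def tally_outcomes(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = TRACE_LINE.match(line.strip)
    next unless match # skip anything that is not a trace line
    counts[match[3]] += 1
  end
  counts
end

sample = [
  "cache: [GET /somepath] miss",
  "cache: [GET /anotherpath/logo?1333186605] fresh",
  "cache: [POST /yet/another/path] invalidate, pass"
]

tally_outcomes(sample).each { |outcome, count| puts "#{outcome}: #{count}" }
```

Feeding it File.readlines on a real log file would give a quick miss versus fresh count.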
Turn off the verbose option by adding this to your production.rb
config.action_dispatch.rack_cache = {:metastore=>"rails:/", :entitystore=>"rails:/", :verbose=>false}
To disable rack-cache altogether just do this
config.action_dispatch.rack_cache = nil
Setting up Sunspot/Solr for OR queries, stemming and lower memory usage
by Tony Primerano on Jan.06, 2011, under rails, tech
As I keep finding in Rails 3, the Gems I used in Rails 2 no longer work or have fallen out of favor. In Rails 2 acts_as_ferret met my searching needs but after submitting some fixes for Rails 3 and Ruby 1.9.2, I was still having issues so I moved on to Sunspot.
One of the 1st things I wanted to change with Sunspot was to make the default boolean operator OR. This means when someone searches for “car window” they will get results that match car or window.
Not being a Solr expert my 1st thought was that all I needed to do was change
<solrQueryParser defaultOperator="AND"/>
to
<solrQueryParser defaultOperator="OR"/>
But it didn’t work. After some research and digging through the logs I learned that Sunspot uses the dismax request handler. To make a long story short, dismax ignores defaultOperator and relies on a minimum-match setting instead (the mm parameter in Solr, exposed by Sunspot as minimum_match). The good news is that setting it to 1 in your search query is easy and gives you the same behavior as defaultOperator="OR".
In your controller your search would look something like this.
@articles = Article.search do
  keywords(actual_search) { minimum_match 1 }
end
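To see why a minimum match of 1 behaves like OR, here is a toy illustration in plain Ruby. It is not how Solr matches or scores documents; matches? and the sample documents are made up for this example.

```ruby
# Toy illustration of minimum-match semantics; not Solr's implementation.
# A document matches when at least minimum_match query terms appear in it.
def matches?(document, query, minimum_match)
  doc_terms   = document.downcase.split
  query_terms = query.downcase.split
  hits = query_terms.count { |term| doc_terms.include?(term) }
  hits >= minimum_match
end

docs = [
  "red car for sale",
  "broken window repair",
  "car window tinting",
  "garden tools"
]

or_like  = docs.select { |doc| matches?(doc, "car window", 1) } # any term
and_like = docs.select { |doc| matches?(doc, "car window", 2) } # every term

puts "minimum_match 1: #{or_like.inspect}"
puts "minimum_match 2: #{and_like.inspect}"
```

With a two-word query, a minimum of 1 returns anything containing either word, while a minimum of 2 only returns documents containing both.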
Next thing I wanted was for car searches to return results for cars and other stems. This required a 1 line change in schema.xml
In the <analyzer> block just add <filter class="solr.SnowballPorterFilterFactory" language="English" />
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" />
</analyzer>
Finally, because the model I am searching is small and Java eats quite a bit of memory, I wanted to reduce the Solr server’s memory footprint. This may come back to bite me as my dataset grows but for now this is working fine. To adjust the memory parameters used by rake sunspot:solr:start, just edit your sunspot.yml file and add min_memory and max_memory lines.
development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M
This will result in -Xms64M -Xmx64M being sent to java on startup.
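As a rough sketch of that mapping, the snippet below reads a sunspot.yml style config and builds the two heap flags. This is an illustration only and not Sunspot's own code; jvm_memory_flags is a hypothetical helper.

```ruby
require "yaml"

# Build JVM heap flags from a sunspot.yml style config.
# Illustration only; jvm_memory_flags is a hypothetical helper.
def jvm_memory_flags(yaml_text, environment)
  solr = YAML.load(yaml_text).fetch(environment).fetch("solr")
  flags = []
  flags << "-Xms#{solr['min_memory']}" if solr["min_memory"]
  flags << "-Xmx#{solr['max_memory']}" if solr["max_memory"]
  flags
end

config = <<~CONFIG
  development:
    solr:
      hostname: localhost
      port: 8982
      log_level: DEBUG
      min_memory: 64M
      max_memory: 64M
CONFIG

puts jvm_memory_flags(config, "development").join(" ")
```

Leaving out either key simply drops the matching flag, so Java falls back to its own default for that limit.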
My sitemap notes are in the Advanced Rails Recipes Book
by Tony Primerano on Mar.08, 2008, under rails, sitemap, tech
After I blogged about building a sitemap for Rails I contacted Mike Clark and asked him if he thought it would make a good Recipe for his upcoming book, Advanced Rails Recipes. He thought it was a good fit and it is currently in the Beta version of the book.
I wrote up my notes in the Recipe format, then Mike basically rewrote it for Rails 2.0 and added some additional content. Thanks Mike! I almost feel bad being cited as the author since after editing it is drastically different from the original.
The core concepts are still there, but some thoughts were dropped since Recipes should be short. So here are some elaborations.
The Ping Protocol
There is a warning in the book about pinging Google excessively to have them read your sitemap. I would recommend letting search engines crawl your sitemaps at their own pace. The ping example in the book was a nice overview of when to use an Observer, and it also showed in full how to submit sitemaps. Please use ping sparingly, if at all.
Sitemaps with over 50,000 entries
I work on sites where we use sitemap index files because we submit well over 50,000 URLs to the search engines. I didn’t provide an example of how to build these in Rails because I’m not sure they provide any value to the typical site.
My theory is that if you build a sitemap with the 50,000 pages that were most recently updated you will give the search engines all they need. If a page isn’t updated for a while and it falls off the list is that really a problem? If the page was worth anything someone externally would be linking to it before it fell off the list. Now if your site is creating millions of pages a day this may not be the case.
If your pages are islands (no links to them) and you’re afraid they won’t be found unless they are all in the sitemap, I would suggest building the sitemap via a rake task that is kicked off by a cron job. This will also give you an opportunity to gzip the files. I’ll try to write up some example code for this when I find some free time.
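In the meantime, here is a minimal sketch of the idea: take the most recently updated pages, cap the list at 50,000 entries, and write a gzipped sitemap file. It uses only the Ruby standard library. Page is a hypothetical stand-in for whatever model holds your URLs, and sitemap_xml and write_gzipped_sitemap are made-up names, so treat this as a starting point rather than the recipe from the book.

```ruby
require "zlib"
require "time"

MAX_URLS = 50_000 # per-file limit in the sitemap protocol

# Hypothetical stand-in for a model with a URL and an updated_at timestamp.
Page = Struct.new(:url, :updated_at)

XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
}.freeze

def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Build sitemap XML for the most recently updated pages, newest first.
def sitemap_xml(pages)
  recent = pages.sort_by(&:updated_at).reverse.first(MAX_URLS)
  entries = recent.map do |page|
    "  <url><loc>#{xml_escape(page.url)}</loc>" \
      "<lastmod>#{page.updated_at.utc.iso8601}</lastmod></url>"
  end
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  lines.concat(entries)
  lines << "</urlset>"
  lines.join("\n") + "\n"
end

# Write the sitemap gzipped to path and return the path.
def write_gzipped_sitemap(pages, path)
  Zlib::GzipWriter.open(path) { |gz| gz.write(sitemap_xml(pages)) }
  path
end
```

The body of a rake task run from cron could then call write_gzipped_sitemap with your own records and a path such as public/sitemap.xml.gz (both are assumptions about your app).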
Do I really need a sitemap?
If your site has navigation to all its pages, then a sitemap will probably not benefit you. I suggest checking what pages the search engines have in their index and if key content is missing then pursue a sitemap. Even if they are finding all your pages a sitemap certainly couldn’t hurt.
Just in case you didn’t know how to find the pages Google knows about on your site, you can simply type site:yourdomain.com in the Google or Yahoo search box.
Example results for my site are here