Rack-Cache on Rails 3 – the unadvertised caching option
by Tony Primerano on Mar.28, 2012, under rails, tech
Today I was poking around in the Rails tmp/cache directory and saw an entry I did not expect. It appeared that a controller action was being cached, but I was not using page, action or fragment caching.
I was setting an expiration header on the action response and, as it turns out, there is a handy gem in Rails now called rack-cache. Rack-cache will create cache entries for items that you set a future expiration for. Since this isn’t mentioned in the Rails Caching Guide it took me a little while to track it down.
Here’s a quick example of it in action
Create a dummy app
rails new rack-cache
cd rack-cache
bundle exec rails generate controller RackCache cache_this
Edit the cache_this method
class RackCacheController < ApplicationController
def cache_this
render :text => Time.zone.now.to_s
end
end
Fire it up in production mode so caching is enabled
RAILS_ENV=production bundle exec rails s -p6666
When you hit the page in the browser and hit reload you’ll see the time change
http://localhost:6666/rack_cache/cache_this
2012-03-29 00:40:09 UTC
2012-03-29 00:40:29 UTC
Now let’s change the action slightly
class RackCacheController < ApplicationController
def cache_this
expires_in(5.minutes, :public => true)
render :text => "The time is #{Time.zone.now.to_s}"
end
end
Now each time you reload the page you get the same time
The time is 2012-03-29 00:58:07 UTC
The time is 2012-03-29 00:58:07 UTC
Without a reload the browser won’t bother asking for the resource for another 5 minutes. With the reload we get a 304 message.
Normally a simple clear of your browser cache would get you a new time but Rack-Cache also cached this on the server. So clear all you want, you’ll not get an update until 5 minutes after the time in the window. If there are 10 folks hitting this page they will all see the same time. The 1st person to hit it after expiration will update the time.
The time is 2012-03-29 01:04:01 UTC
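If it helps to picture what a shared server-side cache is doing here, below is a rough sketch in plain Ruby. It is not rack-cache's actual code: TinySharedCache and its fetch method are names I made up for illustration, and the clock is injectable only so the example can fake time passing. The point is just that every visitor gets the stored body until max-age seconds have elapsed, and the first request after that regenerates it.

```ruby
# Toy shared cache keyed by request path. This is NOT rack-cache's code,
# only an illustration of max-age freshness on the server side.
class TinySharedCache
  Entry = Struct.new(:body, :stored_at, :max_age)

  def initialize(clock = -> { Time.now })
    @clock = clock   # injectable so the example can fake the passage of time
    @store = {}
  end

  # Returns the cached body while it is fresh; otherwise calls the block
  # (standing in for the controller action) and stores the new body.
  def fetch(path, max_age)
    entry = @store[path]
    now = @clock.call
    return entry.body if entry && (now - entry.stored_at) < entry.max_age

    body = yield
    @store[path] = Entry.new(body, now, max_age)
    body
  end
end

now = Time.utc(2012, 3, 29, 0, 58, 7)
cache = TinySharedCache.new(-> { now })
render = -> { "The time is #{now}" }

first  = cache.fetch('/rack_cache/cache_this', 300, &render)
now += 60    # a second visitor, one minute later
second = cache.fetch('/rack_cache/cache_this', 300, &render)
now += 294   # just under six minutes after the first hit
third  = cache.fetch('/rack_cache/cache_this', 300, &render)

puts first, second, third
```

The first two lines printed are identical; the third shows a new time because the 5 minute window has passed.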
So where is the output cached? By default Rails uses ActiveSupport::Cache::FileStore, which lives in tmp/cache (configurable via config.cache_store).
ls -tr tmp/cache/
assets B15 2A5 A92 B2C B31 # B31 is newest dir
find tmp/cache/B31
tmp/cache/B31
tmp/cache/B31/A20
tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
The cached response is in
tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
cat tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
o: ActiveSupport::Cache::Entry :@compressedF:@expires_in0:@created_atf1332983041.303097: @value[I"(The time is 2012-03-29 01:04:01 UTC:EF
This is a pretty powerful caching option that Rails developers should understand and use when appropriate. In some cases, not knowing this feature was in place has broken applications. Maybe I'll edit the Rails Guide to include this information.
I noticed that rack-cache was filling our apache error log with entries
cache: [GET /somepath] miss
cache: [GET /anotherpath/logo?1333186605] fresh
cache: [POST /yet/another/path] invalidate, pass
Turn off the verbose option by adding this to your production.rb
config.action_dispatch.rack_cache = {:metastore=>"rails:/", :entitystore=>"rails:/", :verbose=>false}
To disable rack-cache altogether just do this
config.action_dispatch.rack_cache = nil
Is Ubuntu Server winning over CentOS folks?
by Tony Primerano on Jun.28, 2011, under tech, work
For years I have used Ubuntu for my desktop environment and CentOS in production. Why? Ubuntu makes a great desktop distro and since CentOS is basically a copy of Red Hat, it is considered an enterprise OS.
The trouble with being an Enterprise OS is you avoid the latest updates to the OS and aggressively patch proven packages. CentOS has worked fine for me up until recently and I suspect all my future deployments will use Ubuntu. CentOS packages are lagging behind and this lag is causing pain. Here are some examples of my recent pain points.
Every 3 months on the dot I fail my PCI compliance scan with the following error.
OpenSSH 4.3 is vulnerable Severity: Critical Problem
OpenSSH is up to version 5.8 but RedHat keeps patching 4.3. It is totally secure and has the latest patches, but every 3 months I need to contact the scan company and prove that I have a patched release. Not fun.
CentOS is using gcc 4.1.2. Gcc 4.1.2 was released in 2007 and many tools are requiring newer versions to work. Most recently I tried using opscode/chef and while the site says it works with CentOS you’ll need to update the compiler to 4.2 or higher. This defeats the purpose of using Chef IMO.
I also find myself building things like git on CentOS that are part of the standard repository on Ubuntu. Sure, I can start adding random repositories to get these things but I’d rather work with an OS that has them in the default/supported repository.
I’ve been talking with colleagues at several other companies over the past few weeks and several are using Ubuntu Server or are planning on getting off CentOS in the near future. A side note on Rails from my talks: there seems to be little excitement about CoffeeScript or Sass in Rails 3.1 (just learn css and js already) and folks prefer test-unit and shoulda over rspec. I totally agree with this sentiment.
Setting up Sunspot/Solr for OR queries, stemming and lower memory usage
by Tony Primerano on Jan.06, 2011, under rails, tech
As I keep finding in Rails 3, the Gems I used in Rails 2 no longer work or have fallen out of favor. In Rails 2 acts_as_ferret met my searching needs but after submitting some fixes for Rails 3 and Ruby 1.9.2, I was still having issues so I moved on to Sunspot.
One of the 1st things I wanted to change with Sunspot was to make the default boolean operator OR. This means when someone searches for “car window” they will get results that match car or window.
Not being a Solr expert my 1st thought was that all I needed to do was change
<solrQueryParser defaultOperator="AND"/>
to
<solrQueryParser defaultOperator="OR"/>
But it didn’t work. After some research and digging through the logs I learned that Sunspot is using the dismax request handler. To make a long story short, dismax ignores the defaultOperator and uses a minimum_match field. The good news here is that setting this field to 1 in your search query is easy and gives you the same function as defaultOperator="OR".
In your controller your search would look something like this.
@articles = Article.search do
keywords(actual_search) {minimum_match 1}
end
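To see why minimum_match 1 behaves like OR, here is a toy sketch in plain Ruby. It is only my illustration of the idea and not how Solr or Sunspot implement it; the matches? helper and the sample documents are made up. A document counts as a hit when at least minimum_match of the query terms appear in it.

```ruby
# Toy illustration of the minimum_match idea, not Solr's implementation:
# a document matches when at least `minimum_match` query terms appear in it.
def matches?(document, query, minimum_match)
  doc_terms = document.downcase.split
  hits = query.downcase.split.uniq.count { |term| doc_terms.include?(term) }
  hits >= minimum_match
end

# Made-up sample documents.
docs = ['red car for sale', 'broken window repair', 'car window tinting', 'garden hose']

or_like  = docs.select { |d| matches?(d, 'car window', 1) }  # either term is enough
and_like = docs.select { |d| matches?(d, 'car window', 2) }  # both terms required

puts "minimum_match 1: #{or_like.inspect}"
puts "minimum_match 2: #{and_like.inspect}"
```

With a two word query, a minimum of 1 returns anything containing car or window, while a minimum of 2 only returns documents containing both.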
Next thing I wanted was for car searches to return results for cars and other stems. This required a 1 line change in schema.xml
In the <analyzer> block just add <filter class="solr.SnowballPorterFilterFactory" language="English" />
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" />
</analyzer>
Finally, because the model I am searching is small and Java eats quite a bit of memory I wanted to reduce the Solr server’s memory footprint. This may come back to bite me as my dataset grows but for now this is working fine. To adjust the memory parameters used when using rake sunspot:solr:start just edit your sunspot.yml file and add min_memory and max_memory lines.
development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M
This will result in -Xms64M -Xmx64M being sent to java on startup.
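As a rough picture of that mapping, here is a small Ruby sketch. It is not Sunspot's own code; jvm_memory_flags is a hypothetical helper I wrote just to show how the two settings would turn into the java options mentioned above.

```ruby
require 'yaml'

# Hypothetical helper, NOT Sunspot's code: turn the optional min_memory and
# max_memory settings into the matching JVM heap flags.
def jvm_memory_flags(solr_config)
  flags = []
  flags << "-Xms#{solr_config['min_memory']}" if solr_config['min_memory']
  flags << "-Xmx#{solr_config['max_memory']}" if solr_config['max_memory']
  flags.join(' ')
end

# Same settings as the sunspot.yml example, inlined so the sketch runs alone.
config = YAML.load(<<~YML)
  development:
    solr:
      hostname: localhost
      port: 8982
      log_level: DEBUG
      min_memory: 64M
      max_memory: 64M
YML

flags = jvm_memory_flags(config['development']['solr'])
puts flags
```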
My WordPress Notes
by Tony Primerano on Feb.22, 2010, under tech, wordpress
While I’ve been using WordPress for my blog for over 5 years I dig into the internals so rarely that I often forget some of the cool things I’ve learned and done. In addition to my blog I’ve used it more like a CMS system where there were only pages and no posts, in this case the site used a static front page. I haven’t written any themes from scratch but I’m pretty comfortable hacking other people’s themes and adding new features.
I should probably do this on my wiki but I’m going to try putting some useful notes here.
Viewing database queries
Recently I tried out a calendar plugin that allowed you to enter and display future events. For some reason it only let you show events 99 days in the future. I hacked the code to let me use 3 digits and it took forever to load my events so I added the following to the bottom of my footer.php (before </body>)
<?php
if (current_user_can('administrator')){
global $wpdb;
echo "<pre>";
print_r($wpdb->queries);
echo "</pre>";
}
?>
This dumps all database queries that are made to the bottom of the page. Note that WordPress only fills $wpdb->queries when SAVEQUERIES is defined as true in wp-config.php. Turned out that the plugin was querying the database once for each date in the future. So when I set it to 200 it was making over 200 queries. Rather than figure out why it was coded this way I switched to using The Events Calendar Plugin. It lets you specify if a post is an event and then you can add details. It works well.
Header and Footer snippets
I had an application that needed to use the header and footer from a live WordPress blog. Any time the blog header/footer was updated the application would update too. The idea here is that WordPress and the application maintain the same look and feel and navigation.
To do this I created a header.php and footer.php file in my wordpress root directory (where wp-config.php is). The files started with
<?php
require('./wp-blog-header.php');
?>
And then I stole code from the theme header.php and footer.php files to build the code snippets needed for the header and footer for my application. I then did server side pulls of these live files, cached them and displayed them as my application header and footer.
I guess instead of copying the code I should try to factor out the common code between the theme files and these files so I don’t repeat it. Has anyone done this? I haven’t tried it yet.
No HTTPS? Use OpenId
I hate sending passwords in the clear but often the cost of SSL is more than the cost of hosting your site so we just do without. Personally I always link my OpenID to my main account and use that to sign in. My assumption is this is much more secure than sending credentials in the clear. I use the OpenID plugin for this. I’m happy that Google now has OpenID support via their profiles.
Caching
DB Cache Reloaded or WP Super Cache?
I’ve used both although there are times when I think a database query is faster than building all these local files. For low traffic sites does the cache help or hurt? I don’t have any stats here. Just thinking out loud.
Broken Link Checker
Some sites have a lot of links. This plugin lets you know when you’re pointing to a dead end. Very handy. http://wordpress.org/extend/plugins/broken-link-checker/
Akismet
Install this or spend all day deleting spam comments.
Backups
Moving my homeowners association web site from static files to WordPress allowed us to have multiple editors but it makes backing up the system much more difficult. Before, the site consisted of about 80 files. Now there is a database and many more things can go wrong. I had downloaded a WordPress backup plugin a few months ago but it wasn’t ready for prime time. At the moment my backup consists of daily mysqldumps via cron that reside with my WordPress installation. I then rsync that (the WordPress install with db backups) to my local machine. It’s not perfect but it works for now. I guess I could share these scripts in another post.
One more thing
I forgot what I was going to say, maybe it will come to me later tonight. This is why I should have done this on the wiki.
I hope someone finds this useful.
My experience with Rackspace Cloud Sites
by Tony Primerano on Nov.04, 2009, under tech
I’ve been using Rackspace’s Cloud Servers for months now and I thought moving some of our standard PHP apps (like wordpress) to Cloud Sites would save me some time as a sysadmin/developer. I also figured I would setup a database there and use it for my Rails application that runs on Cloud servers (instead of building my own or using Amazon’s RDS).
I’m sorry to say that after a few days of working with Cloud Sites, I think managing the sites myself on Cloud Servers would be easier. Fortunately, many of the problems I am having can be easily fixed.
My issues
1) No rsync or scp access. Installing a wordpress site via FTP or Rackspace’s file manager was just plain painful. The file manager didn’t allow me to move files and it always extracted zip files to the root directory. Maybe it works correctly on IE? I only run linux so I have no way of knowing.
2) No easy way to do backups. Cloud Sites allow you to run cron jobs but the example backups build tarballs on the same host that the site is running on. When I saw they had ruby as a cron shell option I assumed they had the CloudFiles gem installed, but they didn’t. I want my backups moved off the host. I don’t want to log into my account and download them manually.
3) No access to database binary logs. I like to take a snapshot every 15 minutes or so in case I lose my database but this is not an option with the cloudsites database. You could do a mysql dump from another host but you probably don’t want to do this every 15 minutes.
4) For my Cloud Server rails app there was no LAN address to access my Cloud Site database so all my queries incurred bandwidth charges.
Feature Requests (easy to hard)
1) Give people a cron job (or simple backup tab like cloud servers has) that can be used to dump their database daily to Cloud Files. It’s a win for everyone. The user gets automated backup and you get more Cloud Files revenue. Here is a script I run from a Cloud Server to back up my Cloud Site database nightly (I can not run it as a Cloud Site job as the CloudFiles gem is not installed on the hosts, as I mentioned above).
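require 'cloudfiles'

# Run a shell command and raise if it does not exit cleanly.
def run(command)
  result = system(command)
  raise("error, process exited with status #{$?.exitstatus}") unless result
end

# The @ variables (account, key, MySQL settings, paths) are set elsewhere in the script.
cf = CloudFiles::Connection.new(@account_name, @cloud_key)
container = cf.container(@directory_name)

# Dump the database over a compressed connection and gzip it locally.
cmd = "mysqldump -C --opt -h #{@mysql_host} -u#{@mysql_user} "
cmd += " -p'#{@mysql_password}'" unless @mysql_password.nil?
cmd += " #{@mysql_database} | gzip > #{@backup_directory}#{@db_file}"
run(cmd)

# Upload the gzipped dump to Cloud Files.
t = container.create_object(@db_file)
t.load_from_filename "#{@backup_directory}#{@db_file}"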
2) Advertise the LAN address of the Database Hosts… of course the address given now is probably a switch that hits several DB machines.
3) Fix the File Manager. Maybe it’s just me but on Firefox/Linux moving files doesn’t work. A crippled version of rsync may also be nice. I suspect there are security issues here but shell access sure would be nice.
4) This is probably hard but it would be nice if I could get access to the MySQL binary logs.
Thanks for listening.
Why I chose The Rackspace Cloud over AWS
by Tony Primerano on Sep.03, 2009, under tech
Last October at BarCamp DC 2 I ran a session called “To Cloud or Not? AWS, EC2, S3 or build your own“. Unfortunately the barcamp wiki died and my notes are gone but at the time it seemed that everyone loved Amazon’s services. I tried using EC2 in April and while the ability to select from several pre-configured AMIs was nice, building your own AMI should have been easier. I wanted to configure my machine and then push a button to have my image created. With Amazon you needed to install tools and go through several steps to create an image.
Then I found Slicehost. It was owned by Rackspace and had servers for as little as $20/month (for a 256MB instance). A few weeks later I stumbled on Mosso, also owned by Rackspace, and it had servers for about $11/month (plus bandwidth). Since my applications were using very little bandwidth, I moved to Mosso which is now called The Rackspace Cloud. With the Rackspace offerings you install your operating system image, configure it and then, from their control panel, back it up with 1 click. You can also schedule backups. This was so much easier than EC2.
Then there is the pricing. Amazon’s small instance is a big virtual machine and at $0.10/hour it runs around $70/month (I think it was $0.12/hour when I 1st started using it). This is probably a good price if you need that much horsepower. What could you possibly run from a 256MB instance anyway? Here’s what I am running.
- A full rails app using Apache/Passenger and MySql (I had to remove several unused modules from apache config and my database is small at the moment)
- Apache PHP — I don’t have a database here but I suspect there is room
I suspect a 512MB instance is a safe bet for most applications and I will likely upgrade as my traffic and database size increase. Depending on the situation, I may just spin up more instances of the same server as redundancy is a good thing. Sure I could run everything on 1 AWS instance but if it dies I’m really SOL.
If you ever need a bigger slice you can upgrade in the control panel with 1 click. All your configurations and IP address are kept the same. I usually make a backup (1 click) before doing this just in case something bad happens.
Rackspace is still making improvements to their APIs and Image Management so while they don’t offer as many services as Amazon, they have offered all the important features to make developers’ lives easier, IMO.
For the record, I actually back up my Rackspace databases to Amazon’s S3. I feel better knowing my backups are in a different datacenter.
If you sign up (and found this post helpful) please use my referral code when creating your account. It is REF-TONYCODE
Don’t let internal wikis leak company secrets.
by Tony Primerano on Feb.01, 2009, under tech, work
Good wiki pages have good titles and the title is usually in the URL. This is all well and good but if you’re creating pages on a corporate wiki that is hidden from the outside world, your title may be giving away too much information.
A few days ago one of my colleagues noticed this post on twitter.
JonGretar: I think AOL is using Erlang for it’s chat system redesign. (about 6 hours ago)
His initial thought was that all our candid discussions about Erlang on twitter gave us away. But this wasn’t the case.
JonGretar: @tumka Because of people viewing my erlang tutorial with referrer: wiki.office.aol.com/wiki/Open_Chat_Backend_Redesign (about 5 hours ago)
Ooops. Fortunately our Chat Backend Redesign is not a secret and our wiki page has links to all sorts of external sites.
For those non-techies out there let me explain what a referrer is.
When you click a link on a web page, the destination site is sent information on what page sent it the traffic. The source page is the referrer (or referer as it is misspelled in the HTTP spec). There are several useful applications that use the referrer information that I won’t discuss here. Naturally, Wikipedia has a good article on the subject. http://en.wikipedia.org/wiki/Referer
What would have been worse is if we had a page called wiki.office.aol.com/wiki/Technology_Acquisitions_In_Process and someone at company X noticed this referrer in their access logs. If company X was a public company that person might run off and buy a pile of stock based on this simple observation.
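To show how little effort it takes for the receiving site to read a page title out of a referrer, here is a small Ruby sketch. The leaked_wiki_title helper is hypothetical, and I am assuming an http:// scheme and a /wiki/Title style URL like the one in the tweet above.

```ruby
require 'uri'

# Returns the page title a wiki referrer URL gives away, or nil when the
# referrer is missing or does not look like a /wiki/Title URL.
def leaked_wiki_title(referrer)
  return nil if referrer.nil? || referrer.empty?

  path = URI.parse(referrer).path.to_s
  match = path.match(%r{\A/wiki/(.+)\z})
  match && match[1].tr('_', ' ')   # underscores in wiki URLs stand for spaces
rescue URI::InvalidURIError
  nil
end

# The http:// scheme is an assumption; the tweet only showed host and path.
title = leaked_wiki_title('http://wiki.office.aol.com/wiki/Open_Chat_Backend_Redesign')
puts title
```

Anyone grepping their access logs for internal-looking hostnames could run something like this over every referrer they see.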
Now most wikis are public and people wouldn’t be creating pages like this in the public space. But more and more companies have internal wikis and it is becoming common to discuss things on these pages that should not be shared with the public. Rather than having to worry about your page titles giving away too much information I have created a mediawiki extension that will prevent referrer information from being sent to external sites.
Information on the extension can be found here.
Good Times at BarCampDC2
by Tony Primerano on Oct.20, 2008, under barcampdc2, tech
On Saturday, I spent the day at BarCampDC2. Like last year there were plenty of great sessions. I really wanted to discuss Amazon’s EC2 as I think it is where most small companies should be moving their sites and I see huge business opportunities in this space. As the session board was being built I kept looking for something on Amazon’s EC2, S3 or AWS. I had not used the services yet, but I was determined to discuss them as I know many of the companies present were using them. In desperation, I grabbed a pen and a post it note and wrote “To Cloud or Not? AWS, EC2, S3 or build your own“. I owned the session and I hoped I would have a few experts in the room so I could act as a moderator instead of presenter.
The room was packed and I started out by telling everyone that I had no presentation or experience with these technologies and hoped we had some experts in the room. As I expected there were plenty of experts in the room and we had a great discussion on what Amazon had to offer and other offerings that companies can leverage.
The room was filled with some of the greatest minds from the DC Tech scene.

More pics are here.
My notes from the session are here. Nikolas Coukouma helped clean them up and added some additional pointers.
After my session I attended several hard core geek sessions. As usual there were many sessions I was unable to attend. Maybe we can videotape the sessions next time?
I attended
- 11AM – Nitrogen Web Framework
- 1PM – Git
- 2PM – MySQL Optimization
- 3PM-5PM wandered through various sessions.
- 5PM – Beer! Great presentation by Chris Williams. I thought this was an early happy hour room but instead Chris schooled us on the history of beer.
Afterward we headed to McFadden’s where I consumed several pints of Guinness. Fortunately I had taken the bus and metro from my house so getting home was a non-issue.
Thanks to Center for Digital Imaging Arts at Boston University for letting us have the conference there and thanks to all the folks that helped put this together. Can we do this again in 6 months?
LifeStream Aggregators
by Tony Primerano on Aug.15, 2008, under tech, work
A few months ago I was asked to do a Technology Due Diligence of SocialThing. My 1st reaction was why do we need another aggregator? We have BuddyFeed and several technologies that are proven to scale. People always snicker when AOL and technology are mentioned in the same sentence but the fact of the matter is that we’ve been building products that need to scale to millions of users, on day one, for well over 15 years. There are some amazing software engineers at AOL.
Back to SocialThing….
As TechCrunch and others have noted, there are several social aggregators out there, but SocialThing found a powerful niche in the LifeStream aggregation market. Let me start by defining what a LifeStream is and then I’ll get to what makes SocialThing unique.
LifeStreams focus on feeds about your life.
- What you are doing right now
- What pictures you recently uploaded
- What you posted to your blog.
Feeds on news and other events that are published by 3rd parties are not part of a LifeStream. They may still be feeds but they are not part of your life.
Life with a single social network
If everyone updated their status, uploaded pictures and blogged on a single social network, LifeStreams would be easy. The standard Facebook News Feed would be all we needed to keep up to date with what our friends were up to. Of course, if there was a single network it would probably be pretty boring. Competition between social networks means we’ll always have the latest and greatest features, and if we don’t we’ll eventually move to where the best features are.
The problem with this competition is that our online life is fragmented and our friends are in various places. If we want to keep up with everyone we need to sign into several services or find a way to aggregate information on the people we care about.
LifeStream Aggregators try to make this easy. I’ll compare three of them here. These notes were made on my wiki a few months ago when I was hashing out what it was we were looking to buy.
FriendFeed
Many of the things our friends do are available via public feeds. This makes pulling them together in a meaningful way easy. FriendFeed makes this easy but you need to define what you want people to see in your feed (Me Feed) and then people can subscribe to your feed.
For example, I can quickly build a feed on FriendFeed that includes my Twitter, Pownce, Flickr and Blog feeds. Then my friends that use FriendFeed can subscribe to my feed to build a LifeStream of people they follow. FriendFeed makes this easy but it’s another account you need to create.
AIM Buddy Feed/ Buddy Updates
AIM has a buddy feed that does almost the same thing as FriendFeed. The AIM Buddy Feed feature is not well advertised but if you set it up your updates will show up in your friends’ buddy lists. Set up your feed here. If AOL promotes this feature and your friends are already on AIM this will save you from registering at yet another site. The aggregation of all LifeStreams of your buddies has been at dashboard.aim.com at times and then disappears. I hope it comes back soon.
Social Thing
SocialThing makes aggregation simpler than FriendFeed and AIM Buddy Feed because they took a different approach. Instead of requiring all your friends to join SocialThing they just pull your friends’ feeds from the networks you already belong to. They also let you post messages to your various networks.
They can do this because they ask you for your name and password on these sites. Personally I find this scary. Some sites like Facebook can give 3rd party sites tokens that they can store and use to access your account so the password is never sent to the 3rd party site. But sites like Twitter do not have this capability so sites like SocialThing need to save your username and password.
What do you get for giving up your usernames and passwords? Power!
Some services like Twitter have APIs to fetch your aggregated LifeStream in a single call. This makes SocialThing’s job easy. Other APIs require a 2 step process to get the aggregated LifeStream.
- Step 1 — Fetch my friends
- Step 2 — Fetch my friend’s “Me” Feeds.
This second scenario presents a scaling nightmare. If I login to SocialThing and I have 5 networks that require a 2 step process and I have approximately 50 friends in each, they need to make 255 calls (5 + 50*5) anytime I visit. They then need to keep polling these services to keep them up to date. This is a lot of work for SocialThing to be doing but it is also beating up on my 5 networks. As SocialThing grows its user base they might find their IP address blocked as they overload the sites they are polling to build LifeStreams.
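The arithmetic is simple enough to sketch in a few lines of Ruby. The calls_per_visit name is mine, and it assumes every network needs the 2 step process and has the same number of friends.

```ruby
# Calls needed to build one user's aggregated stream when every network
# needs the 2 step process: one "fetch my friends" call per network, plus
# one "Me" feed call per friend on each network.
def calls_per_visit(networks, friends_per_network)
  networks + networks * friends_per_network
end

calls = calls_per_visit(5, 50)
puts calls
```

Double the friend count to 100 per network and the same visit costs 505 calls, which is why this approach gets expensive quickly.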
Another nice feature that you get by giving SocialThing your password is the ability to send messages to your various networks from SocialThing. With 1 click you can update your status on Twitter, Facebook and Pownce.
What’s next?
Honestly, I don’t know.
At this point the product folks are in control. We have an opportunity to make a great product even better and to bring our AOL, AIM and Bebo users into the world of LifeStream aggregation.
Building Bebo Apps. Part 2 (Auto-Updating the Profile Page)
by Tony Primerano on Apr.16, 2008, under bebo, tech
Handles are the key to having applications update on the profile page without a user taking action. Some notes on how to do this on Facebook are here. Since Bebo has implemented most of the Facebook API this should be easy but I think some things (like infinite session keys) are still missing. Hoping someone will leave me a pointer proving me wrong on this.
By placing a handle on a user’s page you can (in theory) push updates to a user’s profile page without them interacting with the application. For example, if you had a “Washington Capitals Fans” application that pushed news to every user’s profile that had the application, all these pages would have the same handle. Having a single handle means that a single update of the handle content would update all profiles using the application.
In other cases you might want a unique handle per profile. For example, if your application injected an RSS feed of the user’s choice onto their profile.
Here is a simple example of setting a handle on a user’s page. I use the user’s UID as the handle in this case.
Set a handle on a user’s profile
I then write an update.php file that injects the date on the user’s profile
Update a handle on a user’s profile
That’s all fine and good but the update wasn’t automatic. The user had to hit the update.php file. In a real application we would have a database of all the users and we could periodically update the handles. Assuming there is an infinite session concept in the Bebo API. Since I could not figure out how to implement an infinite session here is my hack.
I place a hidden image on the user’s page that calls the update function.
Now when a user views their profile, the hidden image calls refresh.php and the handle content is updated. Unfortunately, the update happens after the page is rendered so you’ll need to reload to see the change.
The code…
Set a handle on a user’s profile and use hidden image for update
It should also be noted that anytime someone views your page they will call update.php. If they also have the application installed your handle will be updated.
Assuming infinite session keys are unavailable I would leverage app users to update their peers. In your database of users keep track of the last time they were updated. Anytime update.php is called do an update on all handles that haven’t been updated in a specific period of time. As the user base grows this should allow for a more even distribution of updates. This might even be better than the cron jobs that infinite session key users run.
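Here is a rough sketch of that bookkeeping in Ruby rather than PHP, just to show the logic. Everything in it is hypothetical: the hash stands in for the users table, update_stale_handles is a name I made up, and the block stands in for whatever API call actually pushes new handle content.

```ruby
# Hypothetical in-memory stand-in for the users table: uid => last update time.
# The block is a placeholder for the API call that pushes handle content.
def update_stale_handles(last_updated, now, max_age_seconds)
  stale = last_updated.select { |_uid, at| now - at >= max_age_seconds }.keys
  stale.each do |uid|
    yield uid                 # caller pushes new handle content for this uid
    last_updated[uid] = now   # remember when this handle was refreshed
  end
  stale
end

now = Time.utc(2008, 4, 16, 12, 0, 0)
users = {
  '1001' => now - 7200,   # two hours old
  '1002' => now - 600,    # ten minutes old
  '1003' => now - 3600    # exactly one hour old
}

refreshed = []
updated = update_stale_handles(users, now, 3600) { |uid| refreshed << uid }
puts "refreshed handles: #{updated.join(', ')}"
```

Each call only touches handles that are at least an hour old, so the work is spread across whoever happens to trigger the update.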