Customer requested email spoofing: SPF, DKIM and the desire to please your customers

Feb 14, 2014 Published by Tony Primerano

Innocent Beginnings

I have worked on a handful of web applications that send mail as a core feature. In all cases, no matter how much I protest, we end up with customers who insist that mail sent on their behalf come from their domain.

Here is the scenario.  (these domain names are made up but likely active so don't go there)

  • My application runs on wonderfulapp.com
  • For Acme, one of my customers, I run a branded version at acme.wonderfulapp.com
  • Acme's site is at acme.com

Now Acme has several people using acme.wonderfulapp.com and because it has the Acme brand they tend to feel that it is the Acme site.  Let's say acme.wonderfulapp.com has a checkout process where anyone can buy things and they are sent an email receipt.

The customer is always right?

Now in version 1.0 of wonderfulapp.com the email would probably come from no-reply@wonderfulapp.com, with a reply-to line with an email address of someone at acme.com in case anyone replied to the receipt email.  This works fine but the folks at Acme think the email should come from an acme.com email address.  So we release V1.1 where we send the email receipts from receipts@acme.com to avoid any wonderfulapp.com branding confusion.

This is when all hell breaks loose.

The email that claims to be from receipts@acme.com is being sent from our wonderfulapp.com server.   We spent a lot of time ensuring mail sent from this server would not be marked as spam when coming from our @wonderfulapp.com email addresses.  Some of the things we did to prevent mail from being marked as spam were:

  1. Ensure a reverse DNS lookup on the IP address for wonderfulapp.com returns wonderfulapp.com.   When a server gets mail from an IP address, it often does a quick check to see if that IP address is related to the domain the mail claims to come from.
  2. Set up an SPF record for wonderfulapp.com.   An SPF record specifies which IP addresses are allowed to send mail for a specific domain.
  3. Make sure wonderfulapp.com's email server is secure and not accessible to hackers.  This keeps us off blacklists.

In version 1.0 everything was fine, but now in 1.1 we are sending mail on behalf of receipts@acme.com.   When a mail service like Google receives this receipt email, it does some of these things:

  • Look at the IP address of the sender and see if it belongs to the domain in the From line.
    • Our IP address does not belong to acme.com.   So we fail this check
  • See if acme.com has an SPF record stating who is allowed to send mail from acme.com
    • Many orgs don't have an SPF record, so we get no credit on this check.  (And if they did, we would fail harder if we were not included.)

At this point we have 2 strikes against us.  Our server might have a good reputation and the mail might go through regardless.   Success will vary between email services as they all have different spam algorithms.

There is also the possibility that our server's IP could start being blocked as a spammer because we are now sending emails that are being marked spam (the acme.com emails).  This in turn could cause our  emails from wonderfulapp.com to start being marked as spam.

Sending mail as someone you are not is known as spoofing, and us sending mail from receipts@acme.com is clearly spoofing, but it is exactly what our customer wanted.

If you give in to this customer requirement you should agree to spend time with their IT department to get things set up right.  This can be a huge effort as many small businesses will not know where to start setting up SPF records or how to add you to an existing one.

Let's go with the easiest case.   acme.com has an SPF record that is working correctly.  At this point we'll just need their IT guy to add us to their record.   Folks that use Google Apps to send their mail do this all the time.

v=spf1  ip4:1.2.3.4 a mx include:_spf.google.com ~all

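If this was Acme's SPF record, it says: I send mail from 1.2.3.4, from the address acme.com resolves to, from the mail servers in my MX records and from any Google server.  Mail from anywhere else should be treated as a soft fail (~all).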

Now we just need to tell Acme to add us

v=spf1  ip4:1.2.3.4 a mx include:_spf.google.com include:wonderfulapp.com ~all
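The edit Acme's IT person has to make is small enough to sketch in Ruby. This is only an illustration of the text change: add_spf_include is a made-up helper name, it works on the record string alone and does not touch DNS.

```ruby
# Hypothetical helper: insert an include: mechanism into an SPF record,
# just before the trailing "all" mechanism (~all, -all, ?all or +all).
def add_spf_include(record, domain)
  terms = record.split
  include_term = "include:#{domain}"
  return record if terms.include?(include_term) # already authorized

  all_index = terms.index { |t| t =~ /\A[~\-?+]?all\z/ }
  if all_index
    terms.insert(all_index, include_term)
  else
    terms << include_term
  end
  terms.join(" ")
end

acme = "v=spf1 ip4:1.2.3.4 a mx include:_spf.google.com ~all"
puts add_spf_include(acme, "wonderfulapp.com")
```

The include only helps if wonderfulapp.com publishes its own SPF record, since an include: mechanism is evaluated against the included domain's record.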

Good luck finding the person who does this.    But if you can you should have them add you.

If they don't have an SPF record then asking them to add one is not a good route to go.   If they mess it up they could break their mail delivery.    Skip the SPF..  Move on.

What about reply-to??

A compromise I have tried in the past is to tell them: we will use @wonderfulapp.com in the From: line and you can have receipts@acme.com in Reply-To:.   This way if people reply to the email it is sent to Acme.
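As a rough sketch, the headers for this compromise would look like the following. The addresses are the made-up ones from the scenario; receipt_headers, the "Acme Receipts" display name and the subject are assumptions added for illustration.

```ruby
# Build the headers for the compromise: an honest From on our own
# domain, with replies routed to the customer via Reply-To.
# receipt_headers is a hypothetical helper, not part of any mail library.
def receipt_headers(to)
  {
    "From"     => "Acme Receipts <no-reply@wonderfulapp.com>", # assumed display name
    "Reply-To" => "receipts@acme.com",
    "To"       => to,
    "Subject"  => "Your receipt"                               # assumed subject
  }
end

receipt_headers("customer@example.com").each do |name, value|
  puts "#{name}: #{value}"
end
```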

The limitation with this approach shows up when sending to a bad email address.   Say our app sent daily messages to all their customers.  If a customer stopped using that email address, their email server will often reply with an Undeliverable email message, but this goes to the person in the From line.  If that was noreply@wonderfulapp.com they aren't going to see it.

That said, if the from line was receipts@acme.com and the receiving email server doesn't trust that we are a valid sender for acme.com the bounce email will not be sent anyway.

Forwarders?

If it is easy for your IT department to set up forwarders for all your customers this may be the way to go (assuming you don't have 1000s of customers).    Instead of using receipts@acme.com in the From line, set up acme@wonderfulapp.com to forward to receipts@acme.com.   Now you are no longer spoofing any users and you are helping to ensure that mail from your server will not be marked as spam.

DKIM?

If I could finally get wonderfulapp.com to only send email from wonderfulapp.com it would be wise to set up DKIM to help give my server more credibility as a sender of @wonderfulapp.com emails.

That said I have always lost the product argument so the spoofing continues and delivery can be questionable.  :-\

Connecting to customer mail servers

There are many apps that let you enter your email username and password and they will send mail as you.   I never want to be in the business of knowing or managing a customer's email credentials.   Besides the risk of you being hacked these things often expire.   This is a maintenance and security nightmare.

More?

This post is mainly about me getting my thoughts on this subject written down..  I often think of new ideas as I write and I welcome comments and suggestions.     I'm also going to send this to my product folks.

On-Behalf-Of

The On-Behalf-Of header seems to be getting more traction now.   More and more email clients are using it.  It allows you to keep your From header honest while letting receivers know who the email is being sent on behalf of.

Companies like HubSpot use On-Behalf-Of and, because they are all about metrics, they capture bounces via the Return-Path header.    I think Return-Path is added by the receiving server from the SMTP "MAIL FROM" command (the envelope sender) rather than from a header.   I suspect this address has the same domain restrictions via DMARC....

 

still working on this

 


Speeding up Rails apps by tuning the Ruby GC

Apr 29, 2013 Published by Tony Primerano

Ever since moving to Ruby 1.9 I've suffered through slow startup times with Rails apps. The first improvement came when a patch to require was included in 1.9.3. This sped things up a bit but for some reason the GC configs have escaped my radar until now.

Rather than blindly use the configuration variables that have been posted around I wanted to understand them a little before implementing them.

Several posts, including the very good "Improve Rails loading time" on Stack Overflow, talk about the following parameters that can be set in your environment.

RUBY_HEAP_MIN_SLOTS=800000
RUBY_HEAP_FREE_MIN=100000
RUBY_HEAP_SLOTS_INCREMENT=300000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_GC_MALLOC_LIMIT=79000000

The trouble is, only 3 of these 5 exist in 1.9.3, and one of those 3 has been renamed. The other 2 are gone.

Here is what we have

grep getenv gc.c     
malloc_limit_ptr = getenv("RUBY_GC_MALLOC_LIMIT");     
heap_min_slots_ptr = getenv("RUBY_HEAP_MIN_SLOTS");     
free_min_ptr = getenv("RUBY_FREE_MIN");

RUBY_HEAP_FREE_MIN is now just RUBY_FREE_MIN and the two HEAP_SLOTS settings are gone. I haven't looked to see if they were in 1.9.2 or 1.8.7.

Ok.  so we have the 3 settings that are floating around the web.

RUBY_HEAP_MIN_SLOTS=800000   # in code default is HEAP_MIN_SLOTS 10000
RUBY_FREE_MIN=100000   # in code default is FREE_MIN  4096
RUBY_GC_MALLOC_LIMIT=79000000  # in code default is GC_MALLOC_LIMIT 8000000

Your first question, assuming you know a bit about C and heaps, is what is a slot and how big is it.  As it turns out it varies.  On some platforms it is as low as 20 bytes packed.

from gc.c

#if defined(_MSC_VER) || defined(__BORLANDC__) || defined(__CYGWIN__)
#pragma pack(push, 1) /* magic for reducing sizeof(RVALUE): 24 -> 20 */
#endif

Doing a quick build on CentOS we can find the size of the RVALUE.

mkdir ~/ruby
cd ~/ruby
wget ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.3-p392.tar.gz
tar -zxvf ruby-1.9.3-p392.tar.gz
# make sure debug is on
CFLAGS='-g -ggdb' ./configure --enable-debug-env   --prefix=/tmp/ruby --disable-install-doc --with-opt-dir=/tmp/ruby/lib
make 
make install
gdb /tmp/ruby/bin/ruby
p sizeof(RVALUE)
$1 = 40

It is 40 bytes on CentOS.  If you want to verify the environment variables I grep'd above, they are in ~/ruby/ruby-1.9.3-p392/gc.c

Allocating 800,000 40-byte slots would consume 32,000,000 bytes.  This seems a reasonable number as even a basic Rails application will quickly consume 32MB.  Yes, I know 32,000,000 bytes is a bit less than 32MB (about 30.5MB in multiples of 1024).  I'm not sure why we're not working with multiples of 1024.

Ok, next what is the RUBY_GC_MALLOC_LIMIT?

One post I was reading said it was the number of structures allocated. If that was the case I think it would just be called RUBY_HEAP_MAX_SLOTS.. Looking at the code it appears to be the size in bytes before garbage collection kicks in.  I haven't done C code in a while so please correct me.

That leaves the RUBY_FREE_MIN setting. I'll read the code later but apparently this is the target number of free slots after GC is run. If not met a new heap is allocated. A heap by the way is 16K if I am reading the code correctly.

Now let's look at the default settings

HEAP_MIN_SLOTS 10000
FREE_MIN  4096
GC_MALLOC_LIMIT 8000000

Without any changes (using the defaults),

  • We will run GC every time 8MB are allocated.
  • The initial heap will have 10,000 slots which consume 400,000 bytes.
  • After each GC run more heap is added if there are fewer than 4096 free slots.
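The slot arithmetic is easy to check in Ruby. This sketch assumes the 40-byte RVALUE measured above and the 16K heap size mentioned above; slot_memory is a made-up helper, and real heaps may hold slightly fewer slots than this simple division suggests.

```ruby
SLOT_BYTES = 40        # sizeof(RVALUE) from the gdb session above
HEAP_BYTES = 16 * 1024 # assumed heap size (16K, per the reading of gc.c above)

# Hypothetical helper: memory needed for a given number of slots.
def slot_memory(slots)
  bytes = slots * SLOT_BYTES
  {
    bytes: bytes,
    mib:   (bytes / 1024.0 / 1024.0).round(2),
    heaps: (bytes.to_f / HEAP_BYTES).ceil
  }
end

p slot_memory(10_000)  # default HEAP_MIN_SLOTS
p slot_memory(4_096)   # default FREE_MIN
p slot_memory(800_000) # commonly suggested RUBY_HEAP_MIN_SLOTS
```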

In general, for rails apps these are much too low and should be bumped. But how much?

The settings that have been mentioned before are.

  • RUBY_HEAP_MIN_SLOTS=800000
  • RUBY_FREE_MIN=100000
  • RUBY_GC_MALLOC_LIMIT=79000000

I have a MONSTER rails app so I wanted to see what using similar values would get me.

Running

time bundle exec rails runner 'puts "x"'
results in
34.16user 2.17system 0:36.42elapsed 99%CPU (0avgtext+0avgdata 642832maxresident)

RUBY_HEAP_MIN_SLOTS=800000 RUBY_FREE_MIN=100000 RUBY_GC_MALLOC_LIMIT=89000000 time bundle exec rails runner 'puts "x"'
results in
14.41user 2.18system 0:16.67elapsed 99%CPU (0avgtext+0avgdata 884672maxresident)

Wow. 17 seconds is still pretty miserable but not as bad as 36 seconds.

The issue with setting RUBY_GC_MALLOC_LIMIT high is your app will likely consume more memory as the GC is not running as often.

After playing with the values for a while and looking at my app's memory consumption I came up with these values.

export    RUBY_HEAP_MIN_SLOTS=800000  # Start with 800000 40 byte slots for 32M which is about 2000 heaps
export          RUBY_FREE_MIN=32768  # 80 heaps at a time for about 1MB steps  -- good compromise
export RUBY_GC_MALLOC_LIMIT=30000000  # When to start GC, - app is at 5.6% vs 5.1 memory- start time at 20.5 - 10% mem growth.

This cut my startup time by about 40% and only increased memory usage by about 10%.   If I had more free memory I would go higher on the RUBY_GC_MALLOC_LIMIT.
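One way to compare settings without guessing is to count GC runs around a piece of work and run the same script under different environment variable values. A minimal sketch follows; gc_runs_during is a made-up helper and the string-building workload is only a stand-in for loading a real app.

```ruby
# Count how many GC runs a block of work triggers.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

runs = gc_runs_during do
  # Stand-in workload: allocate a lot of short strings.
  200_000.times.map { |i| "string #{i}" }
end

puts "GC ran #{runs} times"
p GC.stat # the key names in this hash differ between Ruby versions
```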

One more thing of note. When using passenger I don't think a wrapper script is needed anymore..

See http://blog.phusion.nl/2008/12/16/passing-environment-variables-to-ruby-from-phusion-passenger/

I simply added the above variables to /etc/profile, did an apache restart and they were in effect.   I suspect newer versions of passenger load the environment of the user they run as.   I need to confirm.  Perhaps it is something with my setup that allows it to work.

 

 

Running a Rails app behind a Proxy/F5

Jul 24, 2012 Published by Tony Primerano

Recently we moved our Rails 3 application into a new datacenter and as part of that move it was placed behind an F5, a proxy and then another F5. Love that security.

As a result our certs were moved to the outer F5 and everything inside the datacenter ran on port 80. This required a few configuration changes, some of which I had to dig through the Rails code to find.

First off, we do some IP whitelisting in parts of our application, so we needed X-Forwarded-For (XFF) set in the F5 and proxy. This was not on by default. Once we got the XFF header, it contained a string of IP addresses representing the machines that had forwarded the request, along with the client IP address we needed. It turned out request.remote_ip was returning the proxy or F5 IP instead of the client IP.

To fix this I needed to mark our proxy/F5 IP addresses as trusted.

config.action_dispatch.trusted_proxies = '^12\.34\.'  # treated as a regex, so escape the dots

ex: if our datacenter machines were all on 12.34.x.x

See actionpack-3.2.6/lib/action_dispatch/middleware/remote_ip.rb
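The idea behind trusted proxies can be sketched in a few lines of plain Ruby. This is a simplified illustration, not the Rails source: client_ip and TRUSTED are made-up names, and the 12.34.x.x range is the example from above.

```ruby
# Simplified sketch: walk X-Forwarded-For from right to left and return
# the first address that is not one of our trusted proxies.
TRUSTED = /\A12\.34\./ # assumes all datacenter machines are on 12.34.x.x

def client_ip(x_forwarded_for, trusted = TRUSTED)
  ips = x_forwarded_for.split(",").map(&:strip)
  ips.reverse.find { |ip| ip !~ trusted }
end

puts client_ip("203.0.113.7, 12.34.0.10, 12.34.0.20")
```

Without the trusted list, the rightmost entry (a proxy or F5) looks like the client, which is the behavior described above.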

Next, I discovered that my application links were all http instead of https because the server was running on port 80 now. After digging around the rails code I found the HTTP header I needed the F5 to send that would tell rails to act like it was running on 443.

X-Forwarded-SSL on

Not true, not 1. It must be on. See rack-1.4.1/lib/rack/request.rb
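The check boils down to a strict string comparison. Here is a simplified sketch of that one comparison, not the Rack source; forwarded_ssl? is a made-up name, and env stands for the Rack environment hash, where the X-Forwarded-SSL header shows up as HTTP_X_FORWARDED_SSL.

```ruby
# Only the exact string "on" counts as SSL.
def forwarded_ssl?(env)
  env["HTTP_X_FORWARDED_SSL"] == "on"
end

%w[on true 1].each do |value|
  puts "X-Forwarded-SSL: #{value} -> #{forwarded_ssl?('HTTP_X_FORWARDED_SSL' => value)}"
end
```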

That's all. Hope I save someone some time. :-)

How often do my Rackspace Cloud servers have issues?

Apr 20, 2012 Published by Tony Primerano

I have about 6 cloud server instances running and they seem to have problems too often. But I never actually listed them out, so going back through my Cloudkick and Rackspace emails, here's what I have for the past 6 months or so.

I have emails going back through Fall of 2009. There is rarely a month without an outage. While most are short, this level of uptime is probably unacceptable to many businesses.

Oddly enough, my server with the highest level of traffic has not died yet. I'm not sure if that is because it is on newer hardware or if something in the OpenStack system doesn't like my mostly idle boxes.

One of the failures earlier this year left me with a corrupted filesystem. While restoring from a backup is simple enough, it took a little while to get my Master/Slave replication on MySQL working again (the slave was ahead of the master). The rest of these issues took no work on my part other than pinging the RS team if they weren't already working on the issue.

Rack-Cache on Rails 3 - the unadvertised caching option

Mar 29, 2012 Published by Tony Primerano

Today I was poking around in the Rails tmp/cache directory and I saw an entry I did not expect. It appeared that a controller action was being cached but I was not using page, action or fragment caching.

I was setting an expiration header on the action response and as it turns out there is a handy gem in Rails now called rack-cache. Rack-cache will create cache entries for items that you set a future expiration for. Since this isn't mentioned in the Rails Caching Guide it took me a little while to track it down.

Here's a quick example of it in action

Create a dummy app

rails new rack-cache
cd rack-cache
bundle exec rails generate controller RackCache cache_this

Edit the cache_this method

class RackCacheController < ApplicationController
  def cache_this
    render :text => Time.zone.now.to_s
  end
end

Fire it up in production mode so caching is enabled

RAILS_ENV=production bundle exec rails s -p6666

When you hit the page in the browser and hit reload you'll see the time change

http://localhost:6666/rack_cache/cache_this

2012-03-29 00:40:09 UTC
2012-03-29 00:40:29 UTC

Now let's change the action slightly

class RackCacheController < ApplicationController
  def cache_this
    expires_in(5.minutes, :public => true)
    render :text => "The time is #{Time.zone.now.to_s}"
  end
end

Now each time you reload the page you get the same time

The time is 2012-03-29 00:58:07 UTC
The time is 2012-03-29 00:58:07 UTC

Without a reload the browser won't bother asking for the resource for another 5 minutes.  With the reload we get a 304 message.

Normally a simple clear of your browser cache would get you a new time but Rack-Cache also cached this on the server.  So clear all you want, you'll not get an update until 5 minutes after the time in the window.   If there are 10 folks hitting this page they will all see the same time.   The 1st person to hit it after expiration will update the time.

The time is 2012-03-29 01:04:01 UTC
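The server-side behavior is easier to picture with a toy version. The sketch below is illustrative only and is not how Rack::Cache is implemented: TinyTtlCache is a made-up class that keeps one shared copy of a response and regenerates it for the first request after the 5 minutes are up. The clock is passed in so the timeline above can be replayed.

```ruby
# Toy shared cache with a time-to-live, mimicking the behavior above.
class TinyTtlCache
  def initialize
    @entries = {}
  end

  # Return the cached value for key, regenerating it with the block
  # once it is ttl seconds old. now is injectable for replaying times.
  def fetch(key, ttl, now = Time.now)
    entry = @entries[key]
    if entry.nil? || now - entry[:created_at] >= ttl
      entry = { value: yield, created_at: now }
      @entries[key] = entry
    end
    entry[:value]
  end
end

cache  = TinyTtlCache.new
path   = "/rack_cache/cache_this"
start  = Time.utc(2012, 3, 29, 0, 58, 7)
render = ->(t) { "The time is #{t.strftime('%Y-%m-%d %H:%M:%S UTC')}" }

first  = cache.fetch(path, 300, start)       { render.call(start) }
second = cache.fetch(path, 300, start + 60)  { render.call(start + 60) }
third  = cache.fetch(path, 300, start + 354) { render.call(start + 354) }
puts first, second, third
```

Every visitor shares the same entry, which is why clearing one browser's cache makes no difference until the entry expires.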

So where is the output cached?  By default Rails uses the ActiveSupport::Cache::FileStore which lives in tmp/cache -- configurable via config.cache_store.

ls -tr tmp/cache/
 assets    B15    2A5    A92    B2C    B31
 # B31 is newest dir
 find tmp/cache/B31
 tmp/cache/B31
 tmp/cache/B31/A20
 tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b

The cached response is in

tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b

cat tmp/cache/B31/A20/1c4fa8db52f778f10f535fe2253f3a63f92fd87b
 o: ActiveSupport::Cache::Entry    :@compressedF:@expires_in0:@created_atf1332983041.303097:
 @value[I"(The time is 2012-03-29 01:04:01 UTC:EF

This is a pretty powerful caching option that Rails developers should understand and use when appropriate.    In some cases, not knowing this feature was in place has broken applications. Maybe I'll edit the Rails Guide to include this information.

I noticed that rack-cache was filling our apache error log with entries

cache: [GET /somepath] miss
cache: [GET /anotherpath/logo?1333186605] fresh
cache: [POST /yet/another/path] invalidate, pass

Turn off the verbose option by adding this to your production.rb

config.action_dispatch.rack_cache =  {:metastore=>"rails:/", :entitystore=>"rails:/", :verbose=>false}

To disable rack-cache altogether just do this

config.action_dispatch.rack_cache =  nil

Simple logic problems from interviews

Jun 30, 2011 Published by Tony Primerano

I love logic/algorithm problems but during an interview I can sometimes just blank out on these questions.   I don't feel like taking a lot of time but at the same time some of the problems are quite complex if not approached in the right way.

Problem 1:

You have an array that contains 0s and 1s.   Order the array.

Now if memory and performance weren't a factor this is as simple as doing a ones_and_zeros.sort in ruby (where ones_and_zeros is the array).

You could also build an array of 0s and an array of 1s and then merge them, but if the array is huge and taking up too much memory this is less than ideal.   You could also add up all the 1s and build a new array but then you are also building a second array and iterating twice.

So the correct answer is to do this inline.

My solution is to loop through the array, replacing any 0 with a 1 and putting 0s on the front of the array.  There is the main loop index and then the zero index that points to the location of the last 0.

 

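def sort_it!(ones_and_zeros)
  zero_index = -1 # position of the last 0 placed at the front
  # Exclusive range so the loop stops at the last element.
  (0...ones_and_zeros.size).each do |i|
    next unless ones_and_zeros[i] == 0
    zero_index += 1
    next if zero_index == i # this 0 is already in place
    # Everything between zero_index and i is a 1, so swap by assignment.
    ones_and_zeros[zero_index] = 0
    ones_and_zeros[i] = 1
  end
  ones_and_zeros
end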

This works fine but I got stuck and came back to this problem later.  :-\   I should probably google this and see if there is a better way.

Problem 2:

Next problem was..  you have 5 bottles of pills,  1 has bad pills.  The bad pills weigh 9oz and good pills are 10oz.  You can weigh pills, bottles or any combination but you can only use the scale once.

Now you could weigh 3 pills from different bottles and narrow it down to 2 or 3 bottles, but we need to know exactly which one has the bad pills.

After what seemed like an eternity I figured it out.   Put 1 pill from bottle #1, 2 from #2, 3 from #3, etc. on the scale.   If all the pills were good these 15 pills would weigh 150oz, but there are bad pills.  If they were in bottle #1 it would weigh 149oz, in #2 148oz, etc.   Isn't this fun!    It is fun unless you are under the pressure of an interview...  That said, I guess this tests how people react under pressure as well as their logic skills.
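The trick can be checked with a quick simulation. bad_bottle is a made-up helper name; the weights are the ones from the problem.

```ruby
GOOD_OZ = 10
BAD_OZ  = 9

# Take n pills from bottle n. Each bad pill is 1oz light, so the
# shortfall from the all-good weight is the bad bottle's number.
def bad_bottle(scale_reading, bottles = 5)
  expected = (1..bottles).reduce(:+) * GOOD_OZ # 15 pills * 10oz = 150oz
  (expected - scale_reading) / (GOOD_OZ - BAD_OZ)
end

# Simulate the single weighing for each possible bad bottle.
(1..5).each do |bad|
  reading = (1..5).reduce(0) { |sum, n| sum + n * (n == bad ? BAD_OZ : GOOD_OZ) }
  puts "scale reads #{reading}oz, so the bad bottle is number #{bad_bottle(reading)}"
end
```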

Final problem

You need to determine the highest floor of a 100 story building that you can drop a marble from before it will shatter on the ground.   You have 2 marbles to work with and you need to minimize the number of tries.   You could go floor 1, 2, 3, 4, etc. but then worst case you have made 100 tries.

I immediately started thinking about dividing the problem like a binary tree.  Try floor 50, then 75 if no breakage, then 62 if it broke.  But you only have 2 marbles, so the worst case is 50 tries:  start at 50, it breaks, go back to 1 and progress to 49.

Now what if I moved up 5 floors at a time?  Worst case is 99 with 24 tries.   5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,96,97,98,99

Ok, how about 10s?  10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 91, 92, 93,94,95,96,97,98,99

19 tries.  better

There must be a formula here..  assuming a fixed increment we have ceiling(floors/increment) + increment -1

This seems to work ceiling(100/10) + 9 = 19 and ceiling(100/5) + 4 = 24 check out.

Ok,  graph this thing and find the lowest point..  turns out with this technique 19 tries is the best by using 10 as the fixed increment.
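Instead of graphing, the formula can simply be tried for every increment. A small sketch, with worst_case as a made-up name for the formula above:

```ruby
# Worst-case drops for a fixed increment: ceiling(floors/increment) + increment - 1
def worst_case(floors, increment)
  (floors.to_f / increment).ceil + increment - 1
end

best = (1..100).min_by { |inc| worst_case(100, inc) }
puts "increment #{best}: #{worst_case(100, best)} tries"
```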

But the real answer is 14 using an increment that changes.   I'm out of time for today but I'll probably dig into that one while I sleep.  :-)
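One well-known way to get to 14, sketched here as an assumption about what the interviewer had in mind: start at floor 14 and shrink the step by one after every drop that survives (14, 27, 39, ...). Each extra drop of the first marble leaves one less floor to walk with the second, so the total never exceeds the first step. With n drops this covers n + (n-1) + ... + 1 floors, and the sketch finds the smallest n that reaches 100. Both method names are made up.

```ruby
# Smallest n such that n + (n-1) + ... + 1 >= floors.
def min_drops(floors)
  n = 0
  covered = 0
  while covered < floors
    n += 1
    covered += n
  end
  n
end

# Floors to try with the first marble: steps of n, n-1, n-2, ...
def first_marble_floors(floors)
  step = min_drops(floors)
  floor = 0
  stops = []
  while step > 0 && floor < floors
    floor = [floor + step, floors].min
    stops << floor
    step -= 1
  end
  stops
end

puts min_drops(100)
p first_marble_floors(100)
```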

 

 

 

Is Ubuntu Server winning over CentOS folks?

Jun 28, 2011 Published by Tony Primerano

For years I have used Ubuntu for my desktop environment and CentOS in production.  Why?   Ubuntu makes a great desktop distro and since CentOS is basically a copy of Red Hat, it is considered an enterprise OS.

The trouble with being an Enterprise OS is you avoid the latest updates to the OS and aggressively patch proven packages.   CentOS has worked fine for me up until recently and I suspect all my future deployments will use Ubuntu.  CentOS packages are lagging behind and this lag is causing pain.  Here are some examples of my recent pain points.

Every 3 months on the dot I fail my PCI compliance scan with the following error.

OpenSSH 4.3 is vulnerable Severity: Critical Problem

OpenSSH is up to version 5.8 but RedHat keeps patching 4.3.  It is totally secure, it has the latest patches but every 3 months I need to contact the scan company and prove that I have a patched release.  Not fun.

CentOS is using gcc 4.1.2.  Gcc 4.1.2 was released in 2007 and many tools are requiring newer versions to work.  Most recently I tried using opscode/chef and while the site says it works with CentOS you'll need to update the compiler to 4.2 or higher.  This defeats the purpose of using Chef IMO.

I also find myself building things like git on CentOS that are part of the standard repository on Ubuntu.  Sure, I can start adding random repositories to get these things but I'd rather work with an OS that has them in the default/supported repository.

I've been talking with colleagues at several other companies over the past few weeks and several are using Ubuntu Server or are planning on getting off CentOS in the near future.   A side note on Rails from my talks,  there seems to be little excitement about CoffeeScript or Sass in Rails 3.1 (just learn css and js already) and folks prefer test-unit and shoulda over rspec.    I totally agree with this sentiment.  :-)

 

 

Getting Rails/Nginx installed on Ubuntu 10.10

Apr 10, 2011 Published by Tony Primerano

I did a fresh install of Ubuntu today and figured I would share the steps and packages necessary to get Ruby on Rails running.    I always miss a package or 2 and need to go back.   Here is what worked for me today.

Rails3 / Ubuntu 10.10 / MySql / Nginx / Ruby 1.9.2

Core packages

  
sudo apt-get install curl # RVM needs it and it is good to have
sudo apt-get install libcurl3-dev  # needed by several gems and nginx i think
sudo apt-get install git # RVM needs it and it is good to have
# Packages needed by rails and some popular Gems (also ssl for nginx)
sudo  apt-get install build-essential bison openssl libreadline6  libreadline6-dev curl git-core zlib1g zlib1g-dev libssl-dev libyaml-dev  libsqlite3-0 libsqlite3-dev sqlite3 libxml2-dev libxslt-dev autoconf  libc6-dev ncurses-dev
# MySql
sudo apt-get install libmysqlclient-dev
sudo apt-get install mysql-server
# needed by nginx
sudo apt-get install libpcre3-dev  #nginx

Install RVM

bash < <(curl -s https://rvm.beginrescueend.com/install/rvm)
#fix .bashrc then
rvm install 1.9.2  # or whatever ruby version you like
rvm --default use 1.9.2
gem install rails

Install Passenger/Nginx with SSL

Grab Nginx tarball

cd /tmp
wget http://sysoev.ru/nginx/nginx-0.8.54.tar.gz
tar -zxvf nginx-0.8.54.tar.gz

# install the gem
gem install passenger

# build Nginx
rvmsudo passenger-install-nginx-module
  • Watch Phusion Passenger do its thing and when it asks you “Automatically download and install Nginx?”, answer 2
  • Specify the directory where you unzipped the nginx source code (Please specify the directory: /tmp/nginx-0.8.54)
  • Specify the directory where you want to install nginx to (/usr/local/nginx in my case)

You'll need an init script for nginx.  Get it here and follow directions

http://wiki.nginx.org/Nginx-init-ubuntu

To start Nginx use

sudo /etc/init.d/nginx start

Now at this point your nginx.conf needs some changes.  You need to point to your Rails apps and set up Passenger.

Here is my http section for /usr/local/nginx/conf/nginx.conf .. this is a dev setup.  don't read too much into it.

 

http {
  passenger_root /home/tony/.rvm/gems/ruby-1.9.2-p180/gems/passenger-3.0.6;
  passenger_ruby /home/tony/.rvm/wrappers/ruby-1.9.2-p180/ruby;
  passenger_user_switching on;
  include       mime.types;
  default_type  application/octet-stream;

  log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" "$http_host"';

  access_log  /tmp/nginx.access.log  main;
  sendfile        on;
  keepalive_timeout  65;

  server {
    listen       *:80;
    server_name  localhost;
    rails_env development;
    root   /<path to my rails app>/public;
    if ($request_method !~ ^(GET|HEAD|POST|PUT|DELETE)$ ) {
      return 444;  # block requests that Rails doesn't handle
    }
    passenger_enabled on;
  }

  # HTTPS server
  server {
    listen       443;
    server_name  localhost;
    rails_env development;
    root   /<path to my rails app>/public;
    if ($request_method !~ ^(GET|HEAD|POST|PUT|DELETE)$ ) {
      return 444;  # block requests that Rails doesn't handle
    }
    ssl                  on;
    ssl_certificate      local.crt;
    ssl_certificate_key  local.key;
    ssl_session_timeout  5m;
    ssl_protocols  SSLv2 SSLv3 TLSv1;
    ssl_ciphers  ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
    ssl_prefer_server_ciphers   on;
    passenger_enabled on;
  }
}

restart your Nginx if it is already running

sudo /etc/init.d/nginx restart

That's all..  hopefully this helps someone.  :-)   My nginx.conf is much longer so hopefully I didn't cut out anything too important.

Setting up Sunspot/Solr for OR queries, stemming and lower memory usage

Jan 6, 2011 Published by Tony Primerano

As I keep finding in Rails 3, the Gems I used in Rails 2 no longer work or have fallen out of favor.   In Rails 2 acts_as_ferret met my searching needs but after submitting some fixes for Rails 3 and Ruby 1.9.2, I was still having issues so I moved on to Sunspot.

One of the 1st things I wanted to change with Sunspot was to make the default boolean operator OR.   This means when someone searches for "car window" they will get results that match car or window.

Not being a Solr expert my 1st thought was that all I needed to do was change

<solrQueryParser defaultOperator="AND"/>

to

<solrQueryParser defaultOperator="OR"/>

But it didn't work.   After some research and digging through the logs I learned that Sunspot is using the dismax request handler.  To make a long story short, dismax ignores the defaultOperator and uses a minimum_match field.   The good news here is that setting this field to 1 in your search query is easy and gives you the same function as  defaultOperator="OR".

In your controller your search would look something like this.

@articles = Article.search do
  keywords(actual_search) {minimum_match 1}
end
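To see why minimum_match 1 behaves like OR, here is a plain Ruby illustration. It is not Solr's matching or scoring, just the counting idea; matches? and the sample documents are made up.

```ruby
# A document matches when at least minimum_match of the query terms
# appear in it. With 1 this acts like OR; with the term count, like AND.
def matches?(text, query, minimum_match)
  words = text.downcase.split
  hits = query.downcase.split.count { |term| words.include?(term) }
  hits >= minimum_match
end

docs = ["red car for sale", "broken window repair", "car window tinting", "garden hose"]

p docs.select { |d| matches?(d, "car window", 1) } # OR-like
p docs.select { |d| matches?(d, "car window", 2) } # AND-like
```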

Next thing I wanted was for car searches to return results for cars and other stems.   This required a 1 line change in schema.xml

In the <analyzer> block just add <filter class="solr.SnowballPorterFilterFactory" language="English" />

      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
      </analyzer>

Finally, because the model I am searching is small and Java eats quite a bit of memory I wanted to reduce the Solr server's memory footprint.  This may come back to bite me as my dataset grows but for now this is working fine.  To adjust the memory parameters used when using rake sunspot:solr:start just edit your sunspot.yml file and add min_memory and max_memory lines.

development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M

This will result in -Xms64M -Xmx64M being sent to java on startup.

My Wordpress Notes

Feb 23, 2010 Published by Tony Primerano

While I've been using Wordpress for my blog for over 5 years I dig into the internals so rarely that I often forget some of the cool things I've learned and done. In addition to my blog I've used it more like a CMS where there were only pages and no posts; in this case the site used a static front page. I haven't written any themes from scratch but I'm pretty comfortable hacking other people's themes and adding new features.

I should probably do this on my wiki but I'm going to try putting some useful notes here.

Viewing database queries

Recently I tried out a calendar plugin that allowed you to enter and display future events.  For some reason it only let you show events 99 days in the future. I hacked the code to let me use 3 digits and it took forever to load my events so I added the following to the bottom of my footer.php (before </body>)

<?php
if (current_user_can('administrator')){
 global $wpdb;
 echo "<pre>";
 print_r($wpdb->queries);
 echo "</pre>";
}
?>

This dumps all database queries that are made to the bottom of the page.  Turned out that the plugin was querying the database once for each date in the future.  So when I set it to 200 it was making over 200 queries.  Rather than figure out why it was coded this way I switched to using The Events Calendar Plugin.  It lets you specify if a post is an event and then you can add details.  It works well.

Header and Footer snippets

I had an application that I wanted to use the header and footer from a live wordpress blog.  Any time the blog header/footer was updated the application would update too.  The idea here is that wordpress and the application maintain the same look and feel and navigation.

To do this I created a header.php and footer.php file in my wordpress root directory (where wp-config.php is).  The files started with

<?php
require('./wp-blog-header.php');
?>

And then I stole code from the theme header.php and footer.php files to build the code snippets needed for the header and footer for my application.  I then did server side pulls of these live files, cached them and displayed them as my application header and footer.

I guess instead of copying the code I should try to factor out the common code between the theme files and these files so I don't repeat it.  Has anyone done this?  I haven't tried it yet.

No HTTPS?  Use OpenId

I hate sending passwords in the clear but often the cost of SSL is more than the cost of hosting your site so we just do without.  Personally I always link my openId to my main account and use that to sign in.  My assumption is this is much more secure than sending credentials in the clear.  I use the OpenID plugin for this.  I'm happy that Google now has OpenId support via their profiles.

Caching

DB Cache Reloaded or WP Super Cache?

I've used both although there are times when I think a database query is faster than building all these local files.  For low traffic sites does the cache help or hurt?  I don't have any stats here.  Just thinking out loud.  :-)

Broken Link Checker

Some sites have a lot of links.  This plugin lets you know when you're pointing to a dead end.  Very handy.  http://wordpress.org/extend/plugins/broken-link-checker/

Akismet

Install this or spend all day deleting spam comments.

Backups

Moving my homeowners association web site from static files to wordpress allowed us to have multiple editors but it makes backing up the system much more difficult.  Before, the site consisted of about 80 files.   Now there is a database and many more things can go wrong.   I had downloaded a wordpress backup plugin a few months ago but it wasn't ready for prime time.   At the moment my backup consists of daily mysqldumps via cron that reside with my wordpress installation.  I then rsync that (the wordpress install with db backups) to my local machine.   It's not perfect but it works for now. I guess I could share these scripts in another post.

One more thing

I forgot what I was going to say,  maybe it will come to me later tonight.   This is why I should have done this on the wiki.  :-)  I hope someone finds this useful.