Setting up Sunspot/Solr for OR queries, stemming and lower memory usage

by on Jan.06, 2011, under rails, tech

As I keep finding in Rails 3, the Gems I used in Rails 2 no longer work or have fallen out of favor.   In Rails 2 acts_as_ferret met my searching needs but after submitting some fixes for Rails 3 and Ruby 1.9.2, I was still having issues so I moved on to Sunspot.

One of the 1st things I wanted to change with Sunspot was to make the default boolean operator OR.   This means when someone searches for “car window” they will get results that match car or window.

Not being a Solr expert my 1st thought was that all I needed to do was change

<solrQueryParser defaultOperator="AND"/>

to

<solrQueryParser defaultOperator="OR"/>

But it didn’t work.   After some research and digging through the logs I learned that Sunspot is using the dismax request handler.  To make a long story short, dismax ignores the defaultOperator and uses a minimum_match field.   The good news here is that setting this field to 1 in your search query is easy and gives you the same function as  defaultOperator=”OR”.

In your controller your search would look something like this.

@articles = Article.search do
  keywords(actual_search) {minimum_match 1}
end

Next thing I wanted was for car searches to return results for cars and other stems.   This required a 1 line change in schema.xml

In the <analyzer> block just add <filter class=”solr.SnowballPorterFilterFactory” language=”English” />

      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
      </analyzer>

Finally, because the model I am searching is small and Java eats quite a bit of memory I wanted to reduce the Solr server’s memory footprint.  This may come back to bite me as my dataset grows but for now this is working fine.  To adjust the memory parameters used when using rake sunspot:solr:start just edit your sunspot.yml file and add min_memory and max_memory lines.

development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M

This will result in -Xms64M -Xmx64M being sent to java on startup.

      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
      </analyzer>

8 Comments for this entry

Leave a Reply