Setting up Sunspot/Solr for OR queries, stemming and lower memory usage

Jan 6, 2011 Published by Tony Primerano

As I keep finding in Rails 3, the gems I used in Rails 2 no longer work or have fallen out of favor. In Rails 2, acts_as_ferret met my search needs, but even after submitting some fixes for Rails 3 and Ruby 1.9.2 I was still having issues, so I moved on to Sunspot.

One of the first things I wanted to change with Sunspot was to make the default boolean operator OR. This means that when someone searches for "car window" they get results that match car or window.

Not being a Solr expert, my first thought was that all I needed to do was change

<solrQueryParser defaultOperator="AND"/>

to

<solrQueryParser defaultOperator="OR"/>

But it didn't work. After some research and digging through the logs, I learned that Sunspot uses the dismax request handler. To make a long story short, dismax ignores defaultOperator and uses a minimum-match setting instead (Solr's mm parameter, which Sunspot exposes as minimum_match). The good news is that setting it to 1 in your search query is easy and gives you the same behavior as defaultOperator="OR".

In your controller, the search would look something like this (the Article model name is assumed here from the @articles variable).

@articles = Article.search do
  keywords(actual_search) { minimum_match 1 }
end
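
To make the effect of minimum_match more concrete, here is a minimal sketch of the kind of parameters a dismax request carries. It is an illustration only, not Sunspot's own code: the dismax_params helper is hypothetical, and the exact set of parameters Sunspot sends may differ. The only names assumed are Solr's standard q, defType and mm parameters.

```ruby
require "uri"

# Hypothetical helper (not part of Sunspot): builds the query string
# for a dismax request.
# mm is dismax's "minimum should match" parameter. With mm=1 a document
# only has to match one of the search terms, which behaves like OR.
def dismax_params(keywords, minimum_match: 1)
  URI.encode_www_form(
    "q"       => keywords,
    "defType" => "dismax",
    "mm"      => minimum_match
  )
end

puts dismax_params("car window")
# prints: q=car+window&defType=dismax&mm=1
```

Raising minimum_match makes the search stricter. With two search terms, a value of 2 would require both to match, which is the AND behavior.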

The next thing I wanted was for a search for car to also return results for cars and other words with the same stem. This required a one-line change in schema.xml.

In the <analyzer> block, just add <filter class="solr.SnowballPorterFilterFactory" language="English" /> so that it looks like this:

        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
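
The reason this works is that both the indexed text and the search keywords are reduced to the same stem before they are compared. The sketch below illustrates the idea with a toy analyzer. It is not the Snowball algorithm and not how Solr is implemented; toy_analyze is a hypothetical stand-in that only lowercases, tokenizes and strips a trailing "s".

```ruby
# Toy analyzer: lowercases, splits on non-letters and strips a trailing "s".
# This is NOT the Snowball stemmer, only a stand-in to show the idea.
def toy_analyze(text)
  text.downcase.scan(/[a-z]+/).map { |token| token.sub(/s\z/, "") }
end

indexed = toy_analyze("Used cars for sale") # terms stored for the document
query   = toy_analyze("car")                # terms produced from the search

# The document matches when it shares at least one term with the query.
matched = (indexed & query).any?
puts matched
# prints: true
```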

Finally, because the model I am searching is small and Java eats quite a bit of memory, I wanted to reduce the Solr server's memory footprint. This may come back to bite me as my dataset grows, but for now it is working fine. To adjust the memory settings used by rake sunspot:solr:start, just edit your sunspot.yml file and add min_memory and max_memory lines.

    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M

This results in -Xms64M -Xmx64M being passed to Java on startup.
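
As a rough sketch of that mapping, the snippet below reads the two settings from a sample config and builds the heap flags. It is not Sunspot's actual startup code: the development/solr nesting is an assumption about how the rest of the file is laid out, and the flag-building logic is hypothetical.

```ruby
require "yaml"

# Sample config mirroring the settings above. The development/solr
# nesting is an assumption about the surrounding file layout.
config = YAML.load(<<~YML)
  development:
    solr:
      hostname: localhost
      port: 8982
      log_level: DEBUG
      min_memory: 64M
      max_memory: 64M
YML

solr = config["development"]["solr"]

# Hypothetical mapping from the two settings to JVM heap flags:
# -Xms sets the initial heap size, -Xmx the maximum heap size.
jvm_args = []
jvm_args << "-Xms#{solr["min_memory"]}" if solr["min_memory"]
jvm_args << "-Xmx#{solr["max_memory"]}" if solr["max_memory"]

puts jvm_args.join(" ")
# prints: -Xms64M -Xmx64M
```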

  • Tom says:

    Hey Tony, thanks a lot for this blogpost! I was trying to set up Solr with the AND query you're describing, but it didn't work by changing the solrconfig. {minimum_match 1} fixed it, so that's really cool! Thanks :)

  • Tomas says:

    Hello, thank you for this article. I wanted OR in query too without success. This article helped me.

  • Wow Thank you very very much for this!

  • John Wang says:

    Amazing. Just the two things I've been scouring the web for. Where do you go to find this stuff? I can't find a good API reference anywhere.

  • Hey John, It has been a while since I wrote this so I don't recall where I got all the information. :-) I suspect it was a combination of reading the code, posting to a listserv (or 2) and a bit of googling. I'm glad I posted my notes as several folks seem to be benefiting from them.

  • Jakub Godawa says:

    Thanks a lot man... I was just about to build 10 dismaxes for 10 different languages. This saved me a lot of work as a user experience after all is not so bad :D

  • santosh says:

    Thanks a lot, this info saved me a lot of time.

  • Relicset says:

    I need to search each and every keyword using Rails Sunspot Solr, but I'm having lots of problems solving it. Here is my SO link which describes my problem. Can you solve it?