Setting up Sunspot/Solr for OR queries, stemming and lower memory usage
by Tony Primerano on Jan.06, 2011, under rails, tech
As I keep finding in Rails 3, the gems I used in Rails 2 either no longer work or have fallen out of favor. In Rails 2, acts_as_ferret met my search needs, but even after submitting some fixes for Rails 3 and Ruby 1.9.2 I was still having issues, so I moved on to Sunspot.
One of the first things I wanted to change with Sunspot was to make the default boolean operator OR. That way, a search for “car window” returns results that match either car or window.
Not being a Solr expert, my first thought was that all I needed to do was change
<solrQueryParser defaultOperator="AND"/>
to
<solrQueryParser defaultOperator="OR"/>
But it didn’t work. After some research and digging through the logs, I learned that Sunspot uses the dismax query parser. To make a long story short, dismax ignores defaultOperator and instead relies on its mm (“minimum should match”) parameter, which Sunspot exposes as minimum_match. The good news is that setting minimum_match to 1 in your search query is easy and gives you the same behavior as defaultOperator="OR".
In your controller, the search would look something like this:
@articles = Article.search do
  # minimum_match 1: a document only needs to match one of the search terms (OR)
  keywords(actual_search) { minimum_match 1 }
end
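To see why minimum_match 1 behaves like OR, here is a toy Ruby model of the dismax “minimum should match” rule. It is a simplification under stated assumptions: it ignores field boosts, scoring and Solr's percentage forms of mm, and the dismax_match? method is hypothetical, not part of Sunspot or Solr.

```ruby
require 'set'

# Hypothetical helper (not part of Sunspot or Solr): a simplified model of
# dismax's "minimum should match" (mm) rule. A document matches when at
# least `minimum_match` of the query terms appear in it.
def dismax_match?(doc_terms, query_terms, minimum_match)
  doc = doc_terms.to_set
  matched = query_terms.count { |term| doc.include?(term) }
  matched >= minimum_match
end

query = %w[car window]
docs = {
  'car ad'    => %w[red car for sale],
  'window ad' => %w[broken window repair],
  'both'      => %w[car window tint]
}

# minimum_match 1: any single term is enough, i.e. OR semantics
or_hits = docs.select { |_, terms| dismax_match?(terms, query, 1) }.keys

# minimum_match equal to the number of terms: every term is required, i.e. AND
and_hits = docs.select { |_, terms| dismax_match?(terms, query, query.size) }.keys

puts "mm=1: #{or_hits.inspect}"
puts "mm=#{query.size}: #{and_hits.inspect}"
```

With mm=1 all three sample documents match, while requiring both terms leaves only the one containing car and window.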
Next, I wanted a search for “car” to also return results containing “cars” and other words with the same stem. This required a one-line change in schema.xml.
In the <analyzer> block, just add <filter class="solr.SnowballPorterFilterFactory" language="English" />. The resulting block looks like this:
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" />
</analyzer>
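The reason this works is that an <analyzer> without a type attribute runs at both index time and query time, so “cars” in a document and “car” in a query end up as the same token. Below is a toy Ruby model of that chain. The naive_stem method is a hypothetical stand-in that only strips a trailing “s”; it is far cruder than the Snowball Porter stemmer and is only meant to show the matching mechanics.

```ruby
# Toy model of the analyzer chain above.
# Assumption: naive_stem is a hypothetical stand-in for
# SnowballPorterFilterFactory that only strips a trailing "s";
# the real Porter/Snowball stemmer handles many more suffixes.
def tokenize(text)
  text.scan(/[[:alnum:]]+/) # rough stand-in for StandardTokenizerFactory
end

def naive_stem(token)
  if token.length > 3 && token.end_with?('s') && !token.end_with?('ss')
    token[0...-1]
  else
    token
  end
end

# The same chain (tokenize, lowercase, stem) runs at index and query time.
def analyze(text)
  tokenize(text).map(&:downcase).map { |t| naive_stem(t) }
end

indexed = analyze('Cheap Cars and Trucks')
query   = analyze('car')
matched = (query & indexed).any?

puts indexed.inspect
puts matched
```

Because both sides are stemmed the same way, the indexed “Cars” becomes car and the query car finds it.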
Finally, because the model I am searching is small and Java eats quite a bit of memory, I wanted to reduce the Solr server’s memory footprint. This may come back to bite me as my dataset grows, but for now it is working fine. To adjust the memory settings used by rake sunspot:solr:start, edit your sunspot.yml file and add min_memory and max_memory lines:
development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M
This results in -Xms64M -Xmx64M (the initial and maximum heap sizes) being passed to Java on startup.
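For reference, here is roughly how those two settings map onto JVM flags. The sketch parses the sunspot.yml snippet with Ruby's standard YAML library and builds the -Xms/-Xmx arguments; the jvm_memory_flags helper is hypothetical and is not Sunspot's actual code.

```ruby
require 'yaml'

# The sunspot.yml snippet from this post, inlined so the sketch is self-contained.
SUNSPOT_YML = <<~YAML
  development:
    solr:
      hostname: localhost
      port: 8982
      log_level: DEBUG
      min_memory: 64M
      max_memory: 64M
YAML

# Hypothetical helper (not Sunspot's API): map min_memory/max_memory to the
# JVM's initial (-Xms) and maximum (-Xmx) heap size flags, skipping unset keys.
def jvm_memory_flags(solr_config)
  flags = []
  flags << "-Xms#{solr_config['min_memory']}" if solr_config['min_memory']
  flags << "-Xmx#{solr_config['max_memory']}" if solr_config['max_memory']
  flags
end

config = YAML.safe_load(SUNSPOT_YML)
flags  = jvm_memory_flags(config['development']['solr'])
puts flags.join(' ')
```

Setting min_memory and max_memory to the same value fixes the heap size, so the JVM never grows beyond it.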
February 25th, 2011 on 1:03 pm
Hey Tony, thanks a lot for this blogpost!
I was trying to set up Solr with the OR query you’re describing, but changing the solrconfig didn’t work. {minimum_match 1} fixed it, so that’s really cool!
Thanks
July 12th, 2011 on 3:18 am
Hello, thank you for this article. I wanted OR queries too and had no success until this article helped me.
September 15th, 2011 on 2:55 pm
Wow Thank you very very much for this!
November 5th, 2011 on 2:31 pm
Amazing. Just the two things I’ve been scouring the web for. Where do you go to find this stuff? I can’t find a good API reference anywhere.
November 7th, 2011 on 10:58 pm
Hey John, it has been a while since I wrote this, so I don’t recall where I got all the information.
I suspect it was a combination of reading the code, posting to a listserv (or 2) and a bit of googling.
I’m glad I posted my notes as several folks seem to be benefiting from them.
December 15th, 2011 on 7:59 am
Thanks a lot, man… I was just about to build 10 dismax handlers for 10 different languages. This saved me a lot of work, and the user experience is not so bad after all.
April 17th, 2013 on 4:02 am
Thanks a lot, this info saved me a lot of time.