
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tony&#039;s Place &#187; rails</title>
	<atom:link href="http://blog.tonycode.com/archives/category/rails/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.tonycode.com</link>
	<description>Random thoughts</description>
	<lastBuildDate>Wed, 01 Feb 2012 01:12:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Setting up Sunspot/Solr for OR queries, stemming and lower memory usage</title>
		<link>http://blog.tonycode.com/archives/192</link>
		<comments>http://blog.tonycode.com/archives/192#comments</comments>
		<pubDate>Thu, 06 Jan 2011 18:15:04 +0000</pubDate>
		<dc:creator>Tony Primerano</dc:creator>
				<category><![CDATA[rails]]></category>
		<category><![CDATA[tech]]></category>

		<guid isPermaLink="false">http://www.tonycode.com/blog/?p=192</guid>
		<description><![CDATA[As I keep finding in Rails 3, the Gems I used in Rails 2 no longer work or have fallen out of favor.   In Rails 2 acts_as_ferret met my searching needs but after submitting some fixes for Rails 3 and Ruby 1.9.2, I was still having issues so I moved on to Sunspot. One of [...]]]></description>
			<content:encoded><![CDATA[<p>As I keep finding in Rails 3, the gems I used in Rails 2 either no longer work or have fallen out of favor. In Rails 2, <a href="https://github.com/jkraemer/acts_as_ferret" target="_blank">acts_as_ferret</a> met my search needs, but even after submitting some fixes for Rails 3 and Ruby 1.9.2 I was still having issues, so I moved on to <a href="https://github.com/outoftime/sunspot" target="_blank">Sunspot</a>.</p>
<p>One of the first things I wanted to change with Sunspot was to make the default boolean operator OR. This means that when someone searches for &#8220;car window&#8221; they get results that match car or window.</p>
<p>Not being a Solr expert, my first thought was that all I needed to do was change</p>
<pre>&lt;solrQueryParser defaultOperator="AND"/&gt;</pre>
<p>to</p>
<pre>&lt;solrQueryParser defaultOperator="OR"/&gt;</pre>
<p>But it didn&#8217;t work. After some research and digging through the logs, I learned that Sunspot uses the dismax request handler. To make a long story short, dismax ignores defaultOperator and relies on a minimum-match parameter instead (mm in Solr, minimum_match in Sunspot). The good news is that setting it to 1 in your search query is easy and gives you the same behavior as defaultOperator="OR".</p>
<p>In your controller, your search would look something like this:</p>
<pre>@articles = Article.search do
  # Match documents containing at least one of the search terms (OR behavior)
  keywords(actual_search) { minimum_match 1 }
end</pre>
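<p>To make the effect of minimum_match concrete, here is a toy model in plain Ruby. It is only a sketch of the idea, not how Solr or Sunspot implement it; DOCS, term_hits and search are hypothetical names made up for this illustration.</p>

```ruby
# Toy model of dismax "minimum match": a document matches when at least
# minimum_match of the query terms appear in it. This is NOT Solr's code.
DOCS = ["red car", "broken window", "car window tint", "bicycle"].freeze

# Count how many of the query terms occur in the document text.
def term_hits(text, query)
  doc_terms = text.downcase.split
  query.downcase.split.count { |term| doc_terms.include?(term) }
end

# Keep only the documents that reach the required number of term hits.
def search(docs, query, minimum_match)
  docs.select { |text| term_hits(text, query) >= minimum_match }
end

p search(DOCS, "car window", 1) # OR-like: any one term is enough
p search(DOCS, "car window", 2) # AND-like: both terms are required
```

<p>With a minimum match of 1, the first call returns every document that mentions car or window; with 2, only the document that contains both terms remains.</p>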
<p>The next thing I wanted was for searches on car to also return results for cars and other words with the same stem. This required a one-line change in schema.xml.</p>
<p>In the &lt;analyzer&gt; block, just add &lt;filter class="solr.SnowballPorterFilterFactory" language="English" /&gt;</p>
<pre>      &lt;analyzer&gt;
        &lt;tokenizer class="solr.StandardTokenizerFactory"/&gt;
        &lt;filter class="solr.StandardFilterFactory"/&gt;
        &lt;filter class="solr.LowerCaseFilterFactory"/&gt;
        &lt;filter class="solr.SnowballPorterFilterFactory" language="English" /&gt;
      &lt;/analyzer&gt;
</pre>
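<p>Why does a stemming filter help? Roughly, both the indexed text and the query are reduced to stems before they are compared. The sketch below illustrates that idea in plain Ruby with a deliberately naive stemmer that only strips a trailing s. It is not the Snowball algorithm, and toy_stem and stemmed_match? are hypothetical helpers.</p>

```ruby
# Naive stand-in for a stemmer: lowercases and strips one trailing "s".
# The real Snowball (Porter) filter applies a much larger rule set.
def toy_stem(word)
  word.downcase.sub(/s\z/, "")
end

# Stem both sides, then require every query stem to appear in the text.
def stemmed_match?(text, query)
  indexed = text.split.map { |word| toy_stem(word) }
  query.split.all? { |word| indexed.include?(toy_stem(word)) }
end

p stemmed_match?("Used cars for sale", "car")  # singular query, plural text
p stemmed_match?("Used car for sale", "cars")  # plural query, singular text
p stemmed_match?("carpet cleaning", "car")     # different stem, no match
```

<p>Because the stems are computed at index time, documents indexed before the schema change most likely need to be reindexed (for example with rake sunspot:reindex) before stemmed searches find them.</p>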
<p>Finally, because the model I am searching is small and Java eats quite a bit of memory, I wanted to reduce the Solr server&#8217;s memory footprint. This may come back to bite me as my dataset grows, but for now it is working fine. To adjust the memory settings used by rake sunspot:solr:start, just edit your sunspot.yml file and add min_memory and max_memory lines.</p>
<pre>development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    min_memory: 64M
    max_memory: 64M
</pre>
<p>This results in -Xms64M -Xmx64M being passed to Java on startup.</p>
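<p>The mapping from the YAML settings to JVM flags can be sketched in a few lines of Ruby. This is an illustration of the mapping only, not Sunspot&#8217;s actual startup code; SUNSPOT_YML and jvm_memory_flags are hypothetical names, and the YAML is inlined so the example is self-contained.</p>

```ruby
require "yaml"

# Same memory settings as the sunspot.yml above, inlined for the sketch.
SUNSPOT_YML = [
  "development:",
  "  solr:",
  "    hostname: localhost",
  "    port: 8982",
  "    min_memory: 64M",
  "    max_memory: 64M"
].join("\n")

# Hypothetical helper: turns the two memory settings into JVM heap flags.
# A setting that is absent simply produces no flag.
def jvm_memory_flags(yaml_text, environment)
  solr = YAML.load(yaml_text).fetch(environment).fetch("solr")
  flags = []
  flags.push("-Xms#{solr['min_memory']}") if solr["min_memory"]
  flags.push("-Xmx#{solr['max_memory']}") if solr["max_memory"]
  flags
end

p jvm_memory_flags(SUNSPOT_YML, "development")
```

<p>Setting both values to the same size, as above, fixes the heap size so the JVM neither starts smaller nor grows beyond it.</p>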
]]></content:encoded>
			<wfw:commentRss>http://blog.tonycode.com/archives/192/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>My sitemap notes are in the Advanced Rails Recipes Book</title>
		<link>http://blog.tonycode.com/archives/72</link>
		<comments>http://blog.tonycode.com/archives/72#comments</comments>
		<pubDate>Sat, 08 Mar 2008 23:05:01 +0000</pubDate>
		<dc:creator>Tony Primerano</dc:creator>
				<category><![CDATA[rails]]></category>
		<category><![CDATA[sitemap]]></category>
		<category><![CDATA[tech]]></category>
		<category><![CDATA[code]]></category>

		<guid isPermaLink="false">http://www.tonycode.com/blog/archives/72</guid>
		<description><![CDATA[After I blogged about building a sitemap for Rails I contacted Mike Clark and asked him if he thought it would make a good Recipe for his upcoming book, Advanced Rails Recipes. He thought it was a good fit and it is currently in the Beta version of the book. I wrote up my notes [...]]]></description>
			<content:encoded><![CDATA[<p>After I <a target="_blank" href="http://www.tonycode.com/blog/archives/68">blogged about building a sitemap</a> for Rails I contacted <a target="_blank" href="http://clarkware.com/about.html">Mike Clark</a> and asked him if he thought it would make a good Recipe for his upcoming book, <a target="_blank" href="http://www.pragprog.com/titles/fr_arr">Advanced Rails Recipes</a>.  He thought it was a good fit and it is currently in the Beta version of the book.</p>
<p>I wrote up my notes in the recipe format, then Mike basically rewrote them for Rails 2.0 and added some additional content. Thanks, Mike! I almost feel bad being cited as the author, since after editing it is drastically different from the original.  <img src='http://blog.tonycode.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>The core concepts are still there, but some thoughts were dropped since recipes should be short. So here are some elaborations.</p>
<p><strong>The Ping Protocol</strong></p>
<p>There is a warning in the book about pinging Google excessively to have them read your sitemap. I would recommend letting search engines crawl your sitemaps at their own pace. The ping example in the book is a nice overview of when to use an Observer and also provides complete coverage of how to submit sitemaps. Please use ping sparingly, if at all.  <img src='http://blog.tonycode.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><strong>Sitemaps with over 50,000 entries</strong></p>
<p>I work on sites where we use sitemap index files because we submit well over 50,000 URLs to the search engines. I didn&#8217;t provide an example of how to build these in Rails because I&#8217;m not sure they provide any value to the typical site.</p>
<p>My theory is that if you build a sitemap with the 50,000 most recently updated pages, you will give the search engines all they need. If a page isn&#8217;t updated for a while and falls off the list, is that really a problem? If the page was worth anything, someone externally would be linking to it before it fell off the list. Now, if your site is creating millions of pages a day, this may not be the case.</p>
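<p>Picking the most recently updated pages is a small amount of code. Here is a minimal plain-Ruby sketch; the Page struct, SITEMAP_LIMIT and sitemap_entries are hypothetical names used only for this illustration.</p>

```ruby
# Maximum number of URLs allowed in a single sitemap file.
SITEMAP_LIMIT = 50_000

# Hypothetical stand-in for a model with a URL and a last-modified time.
Page = Struct.new(:url, :updated_at)

# Return the most recently updated pages, newest first, capped at the limit.
def sitemap_entries(pages, limit = SITEMAP_LIMIT)
  pages.sort_by { |page| page.updated_at }.reverse.first(limit)
end

pages = [
  Page.new("/a", Time.at(1)),
  Page.new("/b", Time.at(3)),
  Page.new("/c", Time.at(2))
]
p sitemap_entries(pages, 2).map { |page| page.url }
```

<p>In a Rails app you would more likely let the database do the sorting, with something along the lines of Page.order("updated_at DESC").limit(50_000), assuming a model with an updated_at column.</p>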
<p>If your pages are islands (no links to them) and you&#8217;re afraid they won&#8217;t be found unless they are all in the sitemap, I would suggest building the sitemap via a rake task that is kicked off by a cron job. This will also give you an opportunity to gzip the files. I&#8217;ll try to write up some example code for this when I find some free time.</p>
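<p>As a rough idea of the gzip step, here is a sketch using Ruby&#8217;s standard zlib library. To keep it short it uses the plain-text sitemap format (one URL per line) rather than the XML format from the recipe. gzipped_text_sitemap is a hypothetical helper and the example.com URLs are placeholders.</p>

```ruby
require "zlib"

# Hypothetical helper: builds a plain-text sitemap (one URL per line)
# and returns it gzip-compressed as a binary string.
def gzipped_text_sitemap(urls)
  Zlib.gzip(urls.join("\n") + "\n")
end

urls = ["http://example.com/a", "http://example.com/b"]
compressed = gzipped_text_sitemap(urls)

# Round trip to confirm the compressed data still holds the URL list.
puts Zlib.gunzip(compressed)
```

<p>Inside a rake task run from cron, the compressed string would then be written to a file under public/ so that search engines can fetch it.</p>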
<p><strong>Do I really need a sitemap?</strong></p>
<p>If your site has navigation to all its pages, then a sitemap will probably not benefit you. I suggest checking which pages the search engines have in their index and, if key content is missing, then pursuing a sitemap. Even if they are finding all your pages, a sitemap certainly couldn&#8217;t hurt.</p>
<p>In case you don&#8217;t know how to find the pages Google knows about on your site, simply type site:yourdomain.com in the Google or Yahoo search box.</p>
<p>Example results for my site are <a target="_blank" href="http://www.google.com/search?q=site%3Atonycode.com">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tonycode.com/archives/72/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

