LRBlog

Logical Reality Design: Web Design and Software Development

Fixing problems with sphinx search

July 24, 2008

I've been working a lot this week with sphinx and ultrasphinx on a project that's a fork of Insoshi.    Insoshi is in the process of switching search from ferret to sphinx, and sphinx has been integrated into the Insoshi edge branch.

I've had dozens of problems, in fact it's fair to say I've spent upwards of 15 hours just debugging ultrasphinx and getting my tests to pass.   There were several problems; here are the main three and how I fixed each one.

This should be useful to anyone upgrading Insoshi to the sphinx version, or to anyone else trying to get ultrasphinx working in their Rails project. I definitely don't recommend starting with this post if you're just starting out with sphinx. Instead, go read this much better introductory tutorial from the guys over at Insoshi. Then if you have problems, come back here and you may find solutions.

Getting search tests (or specs) to pass with sphinx

This one is pretty simple, in retrospect, but it can be frustrating and opaque if you are used to ferret.  Unlike ferret, sphinx (at least via ultrasphinx) runs only via a daemon.   Where acts_as_ferret uses a daemon only for the production environment and just accesses the index files directly in test or development, ultrasphinx can only get to the indexes through the daemon.

So, to run your tests, you just build up the indexes for test and run them.  In this case, I'm running the specs for Insoshi's searches controller:

From the command line in $RAILS_ROOT:

rake db:test:prepare
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb

The problem, of course, is that it doesn't work!   The reason is that db:test:prepare creates the structure of your database, but doesn't load any of your fixtures as data: the test db is empty..  So when you run the index command, an empty index is built.   You can see this from the output of that first index command, which will look something like this:

collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.078 sec, 0.00 bytes/sec, 0.00 docs/sec

Ultrasphinx has built an empty index.

The solution

The solution, believe it or not, is to run the tests, let them fail, re-index, and run the tests again (Many thanks to Long Nguyen at Insoshi for helping me figure this one out):

rake db:test:prepare
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb   #FAIL!!
rake ultrasphinx:index RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb   #PASS!!

The first attempt to run the specs loads the fixtures, and leaves them in the database, thus letting the subsequent index command build an actual index.

Running sphinx for both test and development environments at the same time

The next big challenge was enabling behavior-driven development. I like to work with autotest and growl running constantly in the background. But this was tough to do with sphinx, because the daemon needed to be stopped and re-started, and the index re-created for each environment, alternately running all of the above commands either with or without RAILS_ENV=test.

The solution is to set up your ultrasphinx base configuration to completely separate both the test and development indexes and to let the daemons for the two environments listen on different ports. I had tried something like this and come close, but not quite, when Long at Insoshi again bailed me out. You need to change the port (in two places), and the paths of the logs, pidfile, and index directories so that test and development daemons are using entirely separate resources. Here's a diff of my test.conf and default.conf:

33c33
<   port = 3312
---
>   port = 3322
35c35
<   log = log/searchd.log
---
>   log = log/searchd_test.log
39c39
<   pid_file = log/searchd.pid
---
>   pid_file = log/searchd_test.pid
50c50
<   server_port = 3312
---
>   server_port = 3322
57c57
<   sql_range_step = 5000   
---
>   sql_range_step = 999999999   
64c64
<   path = sphinx
---
>   path = sphinx_test

The sql_range_step is related to the next issue, which is that sphinx does not play well with foxy fixtures. Anyway, make the above changes and you should be able to run test and development sphinx daemons at the same time:

rake db:test:prepare
rake ultrasphinx:configure
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start
rake ultrasphinx:daemon:start RAILS_ENV=test

If it worked, you should see separate indexes in $RAILS_ROOT/sphinx and $RAILS_ROOT/sphinx_test, and two daemons running, which you can confirm with ps waux | grep searchd:

evan      1339   0.0  0.0    78100    292 s000  S     5:37PM   0:00.52 searchd --config <YOUR_RAILS_ROOT>/config/ultrasphinx/test.conf
evan      1326   0.0  0.0    78100    292 s000  S     5:36PM   0:00.68 searchd --config <YOUR_RAILS_ROOT>/config/ultrasphinx/development.conf

Getting sphinx to play well with foxy fixtures

The next problem I discovered was that on some machines, but not others, running my search specs would result in these weird errors:

1)
ActiveRecord::RecordNotFound in 'SearchesController Person searches should search by name'
Couldn't find Person with ID=328556765
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:308:in `reify_results'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:286:in `each'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:286:in `reify_results'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search.rb:362:in `run'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:352:in `perform_action_with_retries'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search.rb:342:in `run'
/var/www/domains/unithrive/app/controllers/searches_controller.rb:38:in `index'
./spec/controllers/searches_controller_spec.rb:51:
script/spec:4:

When I poked into this "Couldn't find Person with ID=328556765" error, it seemed like sphinx was almost working. The index was set up, and the search was finding someone in the index during the test. Ultrasphinx was passing back the id 328556765, which didn't exist in the database. So why would Sphinx "find" a record in its index but then pass back an ID for a database record that didn't exist?

And furthermore, why would it work on one machine, but not on another?

The brainstorm came when I checked what the actual database IDs were for this particular record, with Person.find_by_name("fixtures' name").id. On machines where it worked, the id was a huge number (is it generally is with foxy fixtures), but on machines where it didn't work, the id was an even huger number.

Sphinx tries to make sure that all items that get indexed have a different index in sphinx, and it does this by multiplying all of your id's by N, where N is the number of models getting indexed, and adding an offset of 0 for the first model, 1 for the second, etc. This guarantees that every record from every table will have a unique id. In the case of this application, all of my Person records were getting indexed by sphinx as (Person#id * 4 + 2).

Danger, Will Robinson: 32-bit int rollover!

The problem is that foxy fixtures generate their own ids from a hash of the fixture label, and those ids can be anywhere in the 32-bit unsigned integer space. But Sphinx also stores ids as 32-bit unsigned integers. This means if you happen to get a large fixture id, and then sphinx multiplies it by 4 (or whatever; it could be higher if you have more indexed models), your id will rollover and come out as (id * N + n) % (2^32). Sphinx will store that result, and then when it finds the record in a search, it will try to recreate the original id by subtracting n and dividing by N ... giving you the wrong id. Your test will fail to find the record.

Incidentally, this problem with foxy fixtures is why your test.base file needs the line sql_range_step = 999999999. Sphinx builds indexes by searching a few ids at a time. But the ids generated by foxy fixtures are so big that if sphinx only collects them in ranges of 5000 at a time, it will take forever to find them all.

After some googling, I found that these issues are discussed in a thread over at RubyForge.

The solution

I'm working on a plugin that monkeypatches foxy fixtures to create sequential, low-numbered IDs. In the meantime, you can just compile sphinx to support 64-bit ids, which should give you plenty of headroom to handle foxy fixture ids multiplied by N in sphinx*:

In your sphinx source directory:

configure --enable-id64
make
sudo make install

That should do it. Let me know in comments if any of this information helped you.

*At least until you start approaching 2^32 models in your application, that is.

  1. Neil Said,

    I’m glad I went back to the comments on the Thinking Sphinx Railscast episode now - it’s looks like you may have a fix for the DOCID MAX errors I was seeing during indexing with Thinking Sphinx (I did manage to work out it was the size of the IDs that was causing the error). I haven’t worked out a clean solution yet (I imagine things will be a little different in TS as opposed to Ultrasphinx) - going through all my fixtures and manually setting an ID doesn’t appeal, but I would be interested in trying your plugin when it’s ready.

  2. Michael Hartl Said,

    Thanks for the simultaneous test/development tip. I hadn’t needed it until today, but it was very helpful to know it was sitting here waiting for me.

Add A Comment