LRBlog

Logical Reality Design: Web Design and Software Development

Fixing problems with sphinx search

July 24, 2008

I've been working a lot this week with sphinx and ultrasphinx on a project that's a fork of Insoshi.    Insoshi is in the process of switching search from ferret to sphinx, and sphinx has been integrated into the Insoshi edge branch.

I've had dozens of problems, in fact it's fair to say I've spent upwards of 15 hours just debugging ultrasphinx and getting my tests to pass.   There were several problems; here are the main three and how I fixed each one.

This should be useful to anyone upgrading Insoshi to the sphinx version, or to anyone else trying to get ultrasphinx working in their Rails project. I definitely don't recommend starting with this post if you're just starting out with sphinx. Instead, go read this much better introductory tutorial from the guys over at Insoshi. Then if you have problems, come back here and you may find solutions.

Getting search tests (or specs) to pass with sphinx

This one is pretty simple, in retrospect, but it can be frustrating and opaque if you are used to ferret.  Unlike ferret, sphinx (at least via ultrasphinx) runs only via a daemon.   Where acts_as_ferret uses a daemon only for the production environment and just accesses the index files directly in test or development, ultrasphinx can only get to the indexes through the daemon.

So, to run your tests, you just build up the indexes for test and run them.  In this case, I'm running the specs for Insoshi's searches controller:

From the command line in $RAILS_ROOT:

rake db:test:prepare
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb

The problem, of course, is that it doesn't work!   The reason is that db:test:prepare creates the structure of your database, but doesn't load any of your fixtures as data: the test db is empty..  So when you run the index command, an empty index is built.   You can see this from the output of that first index command, which will look something like this:

collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.078 sec, 0.00 bytes/sec, 0.00 docs/sec

Ultrasphinx has built an empty index.

The solution

The solution, believe it or not, is to run the tests, let them fail, re-index, and run the tests again (Many thanks to Long Nguyen at Insoshi for helping me figure this one out):

rake db:test:prepare
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb #FAIL!!
rake ultrasphinx:index RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb #PASS!!

The first attempt to run the specs loads the fixtures, and leaves them in the database, thus letting the subsequent index command build an actual index.

Running sphinx for both test and development environments at the same time

The next big challenge was enabling behavior-driven development. I like to work with autotest and growl running constantly in the background. But this was tough to do with sphinx, because the daemon needed to be stopped and re-started, and the index re-created for each environment, alternately running all of the above commands either with or without RAILS_ENV=test.

The solution is to set up your ultrasphinx base configuration to completely separate both the test and development indexes and to let the daemons for the two environments listen on different ports. I had tried something like this and come close, but not quite, when Long at Insoshi again bailed me out. You need to change the port (in two places), and the paths of the logs, pidfile, and index directories so that test and development daemons are using entirely separate resources. Here's a diff of my test.conf and default.conf:

33c33
< port = 3312
---
> port = 3322
35c35
< log = log/searchd.log
---
> log = log/searchd_test.log
39c39
< pid_file = log/searchd.pid
---
> pid_file = log/searchd_test.pid
50c50
< server_port = 3312
---
> server_port = 3322
57c57
< sql_range_step = 5000
---
> sql_range_step = 999999999
64c64
< path = sphinx
---
> path = sphinx_test

The sql_range_step is related to the next issue, which is that sphinx does not play well with foxy fixtures. Anyway, make the above changes and you should be able to run test and development sphinx daemons at the same time:

rake db:test:prepare
rake ultrasphinx:configure
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start
rake ultrasphinx:daemon:start RAILS_ENV=test

If it worked, you should see separate indexes in $RAILS_ROOT/sphinx and $RAILS_ROOT/sphinx_test, and two daemons running, which you can confirm with ps waux | grep searchd:
evan 1339 0.0 0.0 78100 292 s000 S 5:37PM 0:00.52 searchd --config /config/ultrasphinx/test.conf
evan 1326 0.0 0.0 78100 292 s000 S 5:36PM 0:00.68 searchd --config /config/ultrasphinx/development.conf

Getting sphinx to play well with foxy fixtures

The next problem I discovered was that on some machines, but not others, running my search specs would result in these weird errors:
1)
ActiveRecord::RecordNotFound in 'SearchesController Person searches should search by name'
Couldn't find Person with ID=328556765
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:308:in `reify_results'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:286:in `each'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:286:in `reify_results'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search.rb:362:in `run'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:352:in `perform_action_with_retries'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search.rb:342:in `run'
/var/www/domains/unithrive/app/controllers/searches_controller.rb:38:in `index'
./spec/controllers/searches_controller_spec.rb:51:
script/spec:4:

When I poked into this "Couldn't find Person with ID=328556765" error, it seemed like sphinx was almost working. The index was set up, and the search was finding someone in the index during the test. Ultrasphinx was passing back the id 328556765, which didn't exist in the database. So why would Sphinx "find" a record in its index but then pass back an ID for a database record that didn't exist?

And furthermore, why would it work on one machine, but not on another?

The brainstorm came when I checked what the actual database IDs were for this particular record, with Person.find_by_name("fixtures' name").id. On machines where it worked, the id was a huge number (is it generally is with foxy fixtures), but on machines where it didn't work, the id was an even huger number.

Sphinx tries to make sure that all items that get indexed have a different index in sphinx, and it does this by multiplying all of your id's by N, where N is the number of models getting indexed, and adding an offset of 0 for the first model, 1 for the second, etc. This guarantees that every record from every table will have a unique id. In the case of this application, all of my Person records were getting indexed by sphinx as (Person#id * 4 + 2).

Danger, Will Robinson: 32-bit int rollover!

The problem is that foxy fixtures generate their own ids from a hash of the fixture label, and those ids can be anywhere in the 32-bit unsigned integer space. But Sphinx also stores ids as 32-bit unsigned integers. This means if you happen to get a large fixture id, and then sphinx multiplies it by 4 (or whatever; it could be higher if you have more indexed models), your id will rollover and come out as (id * N + n) % (2^32). Sphinx will store that result, and then when it finds the record in a search, it will try to recreate the original id by subtracting n and dividing by N ... giving you the wrong id. Your test will fail to find the record.

Incidentally, this problem with foxy fixtures is why your test.base file needs the line sql_range_step = 999999999. Sphinx builds indexes by searching a few ids at a time. But the ids generated by foxy fixtures are so big that if sphinx only collects them in ranges of 5000 at a time, it will take forever to find them all.

After some googling, I found that these issues are discussed in a thread over at RubyForge.

The solution

I'm working on a plugin that monkeypatches foxy fixtures to create sequential, low-numbered IDs. In the meantime, you can just compile sphinx to support 64-bit ids, which should give you plenty of headroom to handle foxy fixture ids multiplied by N in sphinx*:

In your sphinx source directory:

configure --enable-id64
make
sudo make install

That should do it. Let me know in comments if any of this information helped you.

*At least until you start approaching 2^32 models in your application, that is.

Getting mysql and sphinx to work on OS X 10.4 on a G5

May 16, 2008

After some significant problems using acts_as_ferret (for some unknown reason, it absolutely will not run on my deployment server; the models will load in test but not in development or production if they use acts_as_ferret), I decided to try sphinx on the recommendation of Michael Hartl. Unfortunately, I had some problems there too: sphinx would not compile on my development workstation, a Dual G5 mac running OS X 10.4.

I did manage to solve it, though, by switching from the package-installed version of MySQL to a compiled-from-source copy. In case anyone else runs into the problem, here's a record of how I solved it.

Symptoms of the problem

Sphinx would not compile on my mac. The error seemed to be a complaint about the header libraries, certain MySQL symbols were not getting correctly defined:

[20:05:10] sphinx-0.9.8-rc2$ make
Making all in src
g++ -o indexer indexer.o libsphinx.a -L/usr/local/mysql/lib -lmysqlclient -lz
/usr/libexec/gcc/powerpc-apple-darwin8/4.0.1/ld: Undefined symbols:
_mysql_close
_mysql_errno
_mysql_error
_mysql_fetch_fields
_mysql_fetch_row
_mysql_free_result
_mysql_init
_mysql_num_fields
_mysql_query
_mysql_real_connect
_mysql_use_result
collect2: ld returned 1 exit status
make[1]: * [indexer] Error 1
make: * [all-recursive] Error 1

It also fail to build with MacPorts (port install sphinx), with similar errors. So, after some digging I decided to try swapping out MySQL, which I'd installed from the .dmg package installer, to a copy compiled from source, hoping that this would get the header libraries set up correctly. It worked, and in the process I upgraded from 5.0.45 to the current 5.0.51b. Here's the sequence:

First, upgrade mysql to a hand-built one.

These instructions are straight from the HiveLogic instructions on building MySQL on OS X. All I've added is the backup and restore of databases plus a newer URLs for the most recent MySQL 5.0 release (as of May 16, 2008).

Back up and turn off the old server:

First, back up all your existing databases using mysqldump. I did it to a bzipped file in my home directory.


[11:39:03] evan$ mysqldump --all-databases -p | bzip2 -c > databasebackup.sql.bz2

This could take a while, depending on how many databases you have, and how large. About ten minutes, for me. Now, shut down the server and remove the symlink /usr/local/mysql:


[11:49:13] evan$ sudo mysqladmin shutdown
[11:51:56] bin$ cd /usr/local
[11:52:14] local$ sudo rm mysql

Now download and build the new mysql:


[11:47:10] evan$ cd ~/src
[11:47:23] src$ wget http://dev.mysql.com/get/Downloads/MySQL-5.0/mysql-5.0.51b.tar.gz/from/http://mysql.he.net/
[11:48:09] src$ tar xzf mysql-5.0.51b.tar.gz
[11:48:25] src$ cd mysql-5.0.51b

Configure and compile:

[11:49:10] mysql-5.0.51b$ CC=gcc CFLAGS="-O3 -fno-omit-frame-pointer" CXX=gcc \
CXXFLAGS="-O3 -fno-omit-frame-pointer -felide-constructors \
-fno-exceptions -fno-rtti" \
[11:49:32] mysql-5.0.51b$ ./configure --prefix=/usr/local/mysql \
--with-extra-charsets=complex --enable-thread-safe-client \
--enable-local-infile --enable-shared
[11:49:32] mysql-5.0.51b$ make; sudo make install;

This takes a while. I went to take a shower. Fortunately, for me, it worked the first time and required no hackery, which is a pleasant surprise when building OSS software. Now setup the initial directories and databases:


[12:10:17] mysql-5.0.51b$ cd /usr/local/mysql
[12:10:21] mysql-5.0.51b$ sudo ./bin/mysql_install_db --user=mysql
[12:10:28] mysql-5.0.51b$ sudo chown -R mysql ./var

And finally start the new server and import the backed-up databases:

cd /usr/local/mysql/bin
sudo ./mysqld_safe &
bzcat ~/databasebackup.sql.bz2 | mysql

After another ten-minute import, I'm back up and running with a fresh build of MySQL.

Installing Sphinx

After that, sphinx (0.9.8-rc2) compiled normally:


[13:20:51] src$ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8-rc2.tar.gz
[13:20:56] src$ tar xzf sphinx-0.9.8-rc2.tar.gz
[13:21:09] src$ cd sphinx-0.9.8-rc2
[13:21:12] sphinx-0.9.8-rc2$ ./configure; make
[13:23:30] sphinx-0.9.8-rc2$ make install
... success ...