LRBlog

Logical Reality Design: Web Design and Software Development

Archive for the ‘Development’ Category

Transactional Testing for Multiple Databases in ActiveRecord

March 18, 2010

We've been working on an app that needs to stand astride two databases - one local DB for the app itself, and another with restrictive policies about modifications that is nonetheless authoritative on many subjects. There's a fair amount of tricky interaction between the two, and testing has been a delightful challenge.

We're using the use_db plugin, and all it takes to make testing transactions happen around multiple DBs is:

In: spec/spec_helper.rb

require 'override_test_callbacks'

My concern comes from the fact that this is a direct and unfiltered monkeypatch on ActiveRecord::TestFixtures. So it relies on use_transactional_fixtures (which could certainly be used without using actual fixtures, granted), and if the test transaction code moves within Rails, that's another integration to worry about. Or if we add a spec that doesn't wind up making ActiveRecord::TestFixtures load... Or if we decide to use something other than use_db...

So instead I'm using:

Spec::Runner.configure do |config|
  config.prepend_before do
    (UseDbPlugin.all_use_dbs - [ActiveRecord::Base]).each do |db|
      db.connection.increment_open_transactions
      db.connection.transaction_joinable = false
      db.connection.begin_db_transaction
    end
  end
 
  config.append_after do
    (UseDbPlugin.all_use_dbs - [ActiveRecord::Base]).reverse.each do |db|
      db.connection.rollback_db_transaction
      db.connection.decrement_open_transactions
    end
  end
end

If we weren't already using transactional fixtures, I might pull out the - [ActiveRecord::Base]. And if we were to change off of use_db, there's one place to change the transaction code. Finally, there's much less dependence on the innards of ActiveRecord - only its published API.

Danger: ActiveRecord, param hashes, and symbol keys

March 10, 2010

Here's a little foible of ActiveRecord that cost me over an hour today. AR accepts both symbol keys and string keys when specifying attributes. Both of these are valid ways of mass assigning attributes to a Rails model:

MyModel.new(:field_1 => 'foo', :field_2 => 'bar')
MyModel.new('field_1' => 'foo', 'field_2' => 'bar')

It's convenient, often, to not have to worry about whether your keys are symbols or strings, since they get converted around a bit when you pass parameters. The downside of this, however, is that it will happily accept BOTH without complaining, and will quietly default to the symbol key regardless of the order you specify them in:

>> model = MyModel.new(:field_1 => 'foo', 'field_1' => 'bar'); nil;
>> model.field_1
=> 'foo'
>> model = MyModel.new('field_1' => 'foo', :field_1 => 'bar'); nil;
>> model.field_1
=> 'bar'

Okay, so that's kinda sloppy. Bad ActiveRecord! No Biscuit!

This can cause serious confusion for the unwary. When ActionController hands us a params hash, it always has String keys, like this:

>> eval params
=>  { 'article' => { 'title' => 'Awesome blog post', 'body' => 'I will make you smart' } }

But most of us, canonically, specify params and default AR values with symbols, like this:

   post :article => {:title => 'Awesome blog post', :body => 'I will make you smart'}

So we get used to thinking about them as symbols.

This means we can make mistakes like this one I made recently. Consider this block of code for a shopping cart model that pre-fills some fields for an associated Payment by pulling the address from the user's profile, to save the user re-typing their address:

class ShoppingCart < ActiveRecord::Base
  has_one :payment
 
  def build_default_payment(options = {}) 
    #prepopulate the billing address from the profile and merge
    #with params passed into options
    build_payment(prepopulated_fields.merge!(options))
  end
 
  def prepopulated_fields
    if (addr = self.person.address)
      {
        :billing_address_1 => addr.line_1,
        :billing_address_2 => addr.line_2,
        :city => addr.city,
        :state => addr.state,
        :zip => addr.zipcode
      }
    else
      {}
    end
  end
end

Looks great, right? And if the user's address has a nil field (like no city, or no line_1), it will get overwritten by the hash merge.

Except not. I specified symbol keys in prepopulated_fields, but the hash getting passed to build_default_payment's 'options' argument has string keys, because it's coming from params. So the merge doesn't overwrite the value for :line_1; it simply adds a new key 'line_1'. So, if a user has a profile address but hadn't entered a line_1 (just city and state), and then manually entered line_1 in the payment form to submit, the Payment built during the create action was getting this hash:

build_payment({
   :line_1 => nil,
   :city => 'Pasadena',
   :state => 'CA',
   :zipcode => '91106',
   'line_1' => '100 Main St.'
})

ActiveRecord was respecting the :line_1 => nil from the profile, and not the 'line_1' => '100 Main St.' from params. This meant that the user couldn't make a payment! The payment had validates_presence_of line_1, and even though the value was typed into the form, it was getting ignored because of the nil from the profile address. Very frustrating for a user to manually type in a billing address and get back "Address Line 1 can't be blank." on every submit!
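The root of the problem can be reproduced in plain Ruby, without Rails. The sketch below uses made-up values and a hypothetical stringify helper (in a Rails app, ActiveSupport's Hash#stringify_keys would be the natural choice); it shows the merge keeping both keys, and one possible way to avoid that by normalizing the keys first:

```ruby
# Symbol and string keys are distinct hash keys, so merging string-keyed
# input (like params) into symbol-keyed defaults adds entries instead of
# overriding them.
defaults    = { :line_1 => nil, :city => 'Pasadena' }
from_params = { 'line_1' => '100 Main St.' }

merged = defaults.merge(from_params)
# merged now holds BOTH :line_1 (nil) and 'line_1' ('100 Main St.')

# One possible fix: convert every key to a string before merging.
# stringify is a hypothetical helper written for this example.
def stringify(hash)
  hash.each_with_object({}) { |(key, value), out| out[key.to_s] = value }
end

fixed = stringify(defaults).merge(stringify(from_params))
# fixed has a single 'line_1' entry holding the value the user typed
```

With the keys normalized, the value from the form wins, which is what the merge was meant to do in the first place.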

Nasty ... this one took a while to figure out. Beware of this little foible of ActiveRecord!

HOWTO: Setting up CruiseControl.rb on Slicehost

February 20, 2010

Continuous Integration is a key tool for collaborative development, and CruiseControl.rb is the tool of choice for many Ruby and Rails teams, including us at Logical Reality.

Unfortunately, setting up CC.rb for a team can be a relatively frustrating experience: this guide (the first of a series of HOWTOs by LRD) will walk you through every step of setting up a team instance of CruiseControl.rb on a low-cost server from Slicehost.

Step 1: Lease an Ubuntu Slicehost account

I recommend a 384 slice or a 512 slice, as 256MB of RAM is pretty light for anything involving a Rails application. Our CI server runs on a 512 slice; if you are running it on a smaller slice, please let us know how it performs.

I used Ubuntu 9.10 (Karmic) for this post.

Step 2: Create a working user

Slicehost configures slices with an active root account - definitely an Ubuntu no-no - and no user account. Ick! Let's start by creating a user account with sudo access to do everything from. Log in as root using the information Slicehost sends you, run this (replace 'username' with whatever name you like) and fill in the information it asks for:

 # adduser username

Then edit /etc/sudoers and add this line to the bottom of the file:

username    ALL=(ALL) ALL

Log out, and log back in as the user you've now configured, to make sure it works.

Step 3: Installing packages and gems

Reset your timezone:

sudo dpkg-reconfigure tzdata

Install a whole bunch of packages you'll want for running Rails applications and hosting CruiseControl:

sudo aptitude install locate emacs git-core ruby build-essential \
libopenssl-ruby ruby1.8-dev irb  apache2 apache2-mpm-prefork \
apache2-prefork-dev sqlite3 rubygems mysql-server mysql-client

Go grab a cup of coffee while those install. The mysql install will ask you to set a root password. Do so, and write it down for later use. When all the installs are done, come back and install the ruby gems you'll be needing:

sudo gem install sqlite3-ruby passenger mysql metric_fu reek roodi

Step 4: Assorted server configuration

Add this line to the bottom of your ~/.profile to put your gems in your path:

PATH="$PATH:/var/lib/gems/1.8/bin/"

And source it:

. ~/.profile

Some assorted config: set up the passenger module for Apache, set your hostname, and make /etc/hosts readable. (For some bizarre reason, /etc/hosts was only readable by root on my slice, and that has a tendency to break things down the road).

sudo /var/lib/gems/1.8/bin/passenger-install-apache2-module
sudo emacs /etc/hostname  # set it to "your.hostname.com"
sudo /bin/hostname -F /etc/hostname
sudo chmod a+r /etc/hosts

Step 5: Configure Passenger and Apache

We'll run CruiseControl.rb with Apache and Passenger. Start by enabling the Passenger module. The command below will walk you through a super-easy configuration:

sudo /var/lib/gems/1.8/bin/passenger-install-apache2-module

When the command completes, it will give you three lines to paste into your Apache config; they should look pretty much like the ones below. Put these lines at the top of /etc/apache2/apache2.conf. I included the hostname I set in the previous step as ServerName.

LoadModule passenger_module /var/lib/gems/1.8/gems/passenger-2.2.8/ext/apache2/mod_passenger.so
PassengerRoot /var/lib/gems/1.8/gems/passenger-2.2.8
PassengerRuby /usr/bin/ruby1.8   
 
ServerName your.hostname.com

To set up the application itself, edit /etc/apache2/sites-available/default to look like this:

<VirtualHost *:80>
        ServerAdmin administrator@your-email-domain.com
        DocumentRoot /u/apps/cruisecontrol/public
        RailsEnv production
        RailsBaseURI /
        ServerName <IP Address from Slicehost>
        ServerAlias your.hostname.com
        SetEnv PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/var/lib/gems/1.8/bin/
 
        ErrorLog /var/log/apache2/error.log
 
        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel warn
 
        CustomLog /var/log/apache2/access.log combined
</VirtualHost>

Make a home for the app. (I use /u/apps/ as a convention for apps in apache. Use whatever you like, but make sure your DocumentRoot in your config file above matches.)

sudo mkdir -p /u/apps

Step 6: Download and install CruiseControl.rb

Download cruisecontrol.rb from RubyForge (Check for the current version first; it was 1.4.0 when I installed), and give ownership to the web user www-data:

cd /u/apps
sudo wget http://rubyforge.org/frs/download.php/59598/cruisecontrol-1.4.0.tgz
sudo tar -zxf cruisecontrol-1.4.0.tgz     
sudo mv cruisecontrol-1.4.0 cruisecontrol   
sudo chown -R :www-data cruisecontrol

Give environment.rb to the web user; this prevents an Errno::EACCES accessing environment.rb from Passenger (see discussion at this forum post).

cd /u/apps/cruisecontrol
sudo chown www-data:www-data config/environment.rb

Turn off the built-in .htaccess, since it will break Passenger, and create a site config from the example file:

sudo mv public/.htaccess public/.htaccess-disabled
cd config  
sudo cp site_config.rb_example site_config.rb

Step 7: Setting up the user environment

CruiseControl.rb prefers, by default, to put project builds in the running user's ~/.cruise directory. This is unfortunate because the standard user for running Apache, www-data, doesn't have a user directory! There are ways to override this, but I've found that they cause significant problems down the line.

An example of the problem is letting CC.rb check out your source code. If you authenticate access to GitHub or another code repository with SSH, CC.rb, running as www-data, won't be able to access your repo since www-data doesn't have a ~/.ssh directory to put the keys in!

After much hacking, I came to the unhappy conclusion that the best solution is simply to let CruiseControl.rb have its way and give user www-data a home directory. Boo, hiss, but here we go:

sudo /etc/init.d/apache2 stop   
sudo usermod -d /home/www-data www-data      
sudo usermod -s /bin/bash www-data
sudo /etc/init.d/apache2 start

If you give www-data standard config files as well, then you can set the PATH so that user www-data can find your gems, and you can set up ssh keys so that CruiseControl.rb can securely check out projects from GitHub or whatever source code repository you're using:

sudo cp -r /etc/skel /home/www-data 
sudo chown -R www-data:www-data /home/www-data
sudo su www-data               
cd
mkdir ~/.ssh  
cd ~/.ssh
ssh-keygen -t rsa
cat id_rsa.pub

Add this line to the bottom of ~www-data/.profile:

PATH="$PATH:/var/lib/gems/1.8/bin/"

Re-start Apache:

sudo /etc/init.d/apache2 restart

At this point, you should be able to load CruiseControl.rb in a web browser at the IP address given to you by Slicehost, or at the domain name if you've set up DNS and it's resolving. Congratulations, you have CC.rb up and running! One last thing to configure.

Running CruiseControl.rb will have created a configuration directory, ~www-data/.cruise. You'll want to edit ~www-data/.cruise/site_config.rb to set two options. Uncomment and set appropriate values for these lines:

 Configuration.email_from = 'cruisecontrolrb@mydomain.com'
 Configuration.dashboard_url = 'http://my.cruisecontrolrb.host/'

Okay, it's time to get a project installed!

Step 8: Setting up your first project

I'll use Logical Reality's open-source project, Convection, as an example project for CruiseControl.rb. This works best if you run it as user 'www-data'.

The command for adding a new project is really simple:

cd /u/apps/cruisecontrol 
sudo su www-data
./cruise add Convection -r git://github.com/LRDesign/Convection.git -s git

This will set up the build in ~www-data/.cruise/projects/Convection.

Create a test database for the application. For Convection, I'm going to use mysql, and prefix my database name with 'ci' for Continuous Integration.

mysqladmin -u root -p create ci_convection

We don't want to put a functioning database.yml in our GitHub repository, but at the same time we want CruiseControl.rb to be able to build and test the app without help from the user. For all our Rails projects, we use a custom rake task that generates a database.yml from command-line arguments, then rebuilds the database, runs the specs, and generates output with metric_fu. For an example of how to do this, look at our integration.rake and ERB database.yml template from Convection.
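To give a feel for the database.yml generation step, here is a minimal sketch using only Ruby's standard library. It is not the actual integration.rake or template from Convection; the template text, the render_database_yml method, and the sample values are all assumptions made for illustration:

```ruby
require 'erb'
require 'yaml'

# Hypothetical template; the real one in Convection may differ.
DATABASE_YML_TEMPLATE = <<-TEMPLATE
test:
  adapter: mysql
  host: <%= host %>
  username: <%= username %>
  password: <%= password %>
  database: <%= database %>
TEMPLATE

# Render the template with the four values that would arrive as rake
# task arguments (host, username, password, database name).
def render_database_yml(host, username, password, database)
  ERB.new(DATABASE_YML_TEMPLATE).result(binding)
end

yaml_text = render_database_yml('localhost', 'root', 'secret', 'ci_convection')

# Parse the result back to confirm it is valid YAML with the expected keys.
config = YAML.load(yaml_text)
```

A rake task built on this idea would write yaml_text to config/database.yml before rebuilding the database. Passwords containing YAML special characters would need quoting in the template.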

To configure CruiseControl.rb to run Convection this way, we need to add that task to the configuration file for this project. Edit ~www-data/.cruise/projects/Convection/cruise_config.rb so that it looks like this:

 
Project.configure do |project|
 
  # Send email notifications about broken and fixed builds to email1@your.site, email2@your.site (default: send to nobody)
  project.email_notifier.emails = ['sysadmin@lrdesign.com', 'judson@lrdesign.com']
 
  # Set email 'from' field
  project.email_notifier.from = 'sysadmin@lrdesign.com'
 
  # Build the project by invoking rake task 'custom'
  # project.rake_task = 'custom'
 
  # Build the project by invoking shell script "build_my_app.sh". Keep in mind that when the script is invoked,
  # current working directory is [cruise data]/projects/your_project/work, so if you do not keep build_my_app.sh
  # in version control, it should be '../build_my_app.sh' instead
  # project.build_command = 'build_my_app.sh'
  project.build_command = 'rake ci:run[localhost,root,<YOUR_MYSQL_ROOT_PASSWORD>,ci_convection] --trace RAILS_ENV=test'
 
  # Ping Subversion for new revisions every 5 minutes (default: 30 seconds)
  # project.scheduler.polling_interval = 5.minutes
end

Step 9: There is no step nine!

Okay, so it's not the simplest thing in the world to set up. But if you've done everything above correctly, you should have a running server your team can use for continuous integration. If you've included metric_fu in your build task, you should get both test output and a wealth of useful code metrics.

Did this sequence work for you? Did I omit a step or misspell a command? Let me know in comments, and I'll update/correct the post.

RailsTutorial.org launched

December 14, 2009

The new Ruby on Rails Tutorial book and website by Michael Hartl has launched at RailsTutorial.org. Hartl is the author of RailsSpace and cofounder of the Insoshi Ruby on Rails social networking platform.

Logical Reality did the logo and layout design work for Rails Tutorial.

 

Using link_to (or other helper methods) in a controller

May 6, 2009

This one was a big aggravator to me lately. I have one controller that needs to call link_to and url_for, which are normally helper methods you'd call from a view. However, in this case during certain modifications to a record, I actually need to append user-visible HTML links to a block of HTML stored in that object, or possibly another one.

Specifically, I needed to put annotations in the description of a work order object that said, for example, "this work order was escalated from Problem Report 293." This was done in a create action that redirected at the end and never rendered a view, so I really did need to generate that link in the controller. And for consistency with the rest of the application, I wanted to generate the link with link_to(@task).

Now, ActionView::Helpers::UrlHelper is not loaded in a Rails controller, even if you've put helper :all in application.rb (application_controller.rb in newer versions). So, when I tried to use link_to in the controller, I got an error:

NoMethodError: undefined method `link_to' for #
/Users/evan/Development/Ruby/eclipticdb/app/helpers/tasks_helper.rb:64:in `task_link'
/Users/evan/Development/Ruby/eclipticdb/app/controllers/tasks_controller.rb:103:in `escalate'
... etc ...

The first fix - but with a problem

A year ago, I fixed this just by adding include ActionView::Helpers::UrlHelper at the top of that controller. This worked great ... for a while.

Lately, I've been rewriting this application into a RESTful style - it had previously been a controller/action style application. In the process, I started linking things with resource paths and polymorphic paths ... a lot of link_to @task and edit_polymorphic_path(@task) sorts of bits. And these started breaking. I began seeing this mysterious error:

Error:

You have a nil object when you didn't expect it!
The error occurred while evaluating nil.url_for

... some code here that calls a link_to ...

Trace of template inclusion: /tasks/_task_panel.html.erb, /tasks/_task_tabbed_panel.html.erb, /tasks/index.html.erb

RAILS_ROOT: /Users/evan/Development/Ruby/eclipticdb
Application Trace | Framework Trace | Full Trace

vendor/rails/actionpack/lib/action_view/helpers/url_helper.rb:71:in `send'
vendor/rails/actionpack/lib/action_view/helpers/url_helper.rb:71:in `url_for'

This one was a real bitch to debug, I have to say. The line in question that was failing in url_helper.rb said this: url = @controller.send(:url_for, options). Clearly, @controller was nil ... which was very bizarre, because I never interact with that instance variable anywhere.

I thrashed around trying to find the cause of this error for quite some time. Eventually I realized that the link_to method was only failing when called from a view in TasksController, and not from any other controller. And then I realized that TasksController was the one where, a year ago, I'd put include ActionView::Helpers::UrlHelper at the top. Somehow, including that helper in the controller was nullifying @controller when those helper methods were called from within the view. I removed the include and my polymorphic and resource links all started working again.

Now back to the original problem!

Of course, that then left me back with the problem I'd had a year ago ... needing to use link_to from within the controller and having no way to do it. After a fair bit of googling around I found this post from Neeraj, which had an interesting approach -- but a commenter had suggested a much easier solution:

self.class.helpers.link_to

I'm not certain where one would find this in the docs, but it does seem to have solved my problem for now. Onward and upward!

Single Table Inheritance and RESTful Routes

March 17, 2009

I'm converting an old, controller/action/id style Rails application to a more RESTful way of doing things, and ran into a brief roadblock: one of my main tables uses single table inheritance to generate three subclasses of items. I never actually use the superclass "task", I only use the three subclasses "action item", "work order", and "problem report".

So, I ran into this little challenge: all three STI subclasses use the same controller, "tasks", because they all have essentially the same behavior and differ only in minor details. But, when I do a resources map:

map.resources :tasks

Then I get errors in much of my code when I say things like redirect_to @task, because if that task happens to be an ActionItem, it's trying to call action_item_path(@task), which doesn't exist.

I googled around a bit to no result. Striking out on my own, I found that the answer is as simple as mapping each resource independently, and just overriding the controller in map.resources:

In config/routes.rb

map.resources :tasks
map.resources :action_items, :controller => 'tasks'
map.resources :work_orders, :controller => 'tasks'
map.resources :problem_reports, :controller => 'tasks'

Now, redirect_to @task works just fine regardless of which subclass @task happens to be.
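To see why the extra mappings are needed, it helps to picture how the helper name is derived from the record's class. The sketch below is a simplified imitation of that naming convention in plain Ruby, not Rails' actual implementation; route_helper_name is a hypothetical method, and the empty classes stand in for the STI models:

```ruby
# Stand-ins for the STI hierarchy described above.
class Task; end
class ActionItem < Task; end
class WorkOrder < Task; end

# Hypothetical helper: CamelCase class name -> snake_case + "_path".
# This mimics the convention only; Rails does considerably more.
def route_helper_name(record)
  snake = record.class.name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  "#{snake}_path"
end

route_helper_name(Task.new)        # "task_path"
route_helper_name(ActionItem.new)  # "action_item_path"
route_helper_name(WorkOrder.new)   # "work_order_path"
```

Because the name comes from the subclass rather than the superclass, each subclass needs its own resource mapping for the matching helper to exist.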

Bypassing mass assignment for update_attributes

March 14, 2009

I've been following this excellent post by M. Hartl and this post by E. Chapweske to banish mass assignment from one of my Rails applications, which is due to launch soon.

I'm following Chapweske's approach of blocking mass assignment by default in all models, by putting this line in an initializer:

ActiveRecord::Base.send(:attr_accessible, nil)

This had the expected side effect of breaking several zillion tests, because tests frequently use things like Model.build() and Model.create!() to generate on-demand fixtures during testing. Hartl has a great bit of code that creates unsafe_build() and unsafe_create() methods in ActiveRecord. You can use these methods instead of build() and create() to function as expected in your tests.

This works great, except that I also use the mass-assignment method update_attributes! in my tests and specs frequently, particularly when I want to spec the effect a change on one model has on an associated model's methods. So, I expanded on Hartl's helper code a bit, to give myself the necessary methods. In case it helps anyone else:

/lib/initializers/unsafe_build_and_create.rb

class ActiveRecord::Base

  # Build and create records unsafely, bypassing attr_accessible.
  # These methods are especially useful in tests and in the console.

  def self.unsafe_build(attrs)
    record = new
    record.unsafe_attributes = attrs
    record
  end

  def self.unsafe_create(attrs)
    record = unsafe_build(attrs)
    record.save
    record
  end

  def self.unsafe_create!(attrs)
    unsafe_build(attrs).save!
  end

  def unsafe_update_attributes!(attrs)
    self.unsafe_attributes = attrs
    self.save!
  end

  def unsafe_update_attributes(attrs)
    self.unsafe_attributes = attrs
    self.save
  end

  def unsafe_attributes=(attrs)
    attrs.each do |k, v|
      send("#{k}=", v)
    end
  end
end
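The core trick is in unsafe_attributes=: each attribute's writer method is called directly, so the mass-assignment filter is never consulted. Here is the same technique in a self-contained sketch that runs without Rails; Widget and its attributes are hypothetical stand-ins for a model:

```ruby
# Stand-in for a model; Widget, name, and price are made up for this example.
class Widget
  attr_accessor :name, :price

  # Call each attribute's writer method directly instead of going
  # through mass assignment. Works with symbol or string keys.
  def unsafe_attributes=(attrs)
    attrs.each do |key, value|
      send("#{key}=", value)
    end
  end
end

widget = Widget.new
widget.unsafe_attributes = { :name => 'Sprocket', 'price' => 5 }
```

One consequence of this approach is that a key with no matching writer raises NoMethodError rather than being silently dropped, which is arguably what you want in a test.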

Don’t overwrite Rails’ built-in instance variables

February 13, 2009

So I'm hammering away at a project tonight, writing a few specifications for a module.  I've changed very little - or so I think - when five of my specifications start reporting this error:

NoMethodError in 'LoansController POST 'create' with valid parameters should succeed'
undefined method `env' for

This happens on the line where I call post :create in a controller spec.   Undefined method 'env'?   What's that about? I'm certainly not trying to call a method named "env".

It took me a little bit to figure out what was going on.  See, this series of tests needed access to a particular LoanRequest object I was pulling out of fixtures.   So I'd put above the tests:

before(:each) do
  ... some other stuff ...
  @request = loan_requests(:johns_loan_request) # fetch fixture
end

Well, kids, it just so turns out that it's a bad idea to overwrite the @request instance variable in any rails context. Who knew?
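The mechanism is easy to reproduce outside Rails. In the sketch below, FakeRequest and FakeControllerSpec are hypothetical stand-ins for the framework's test harness, not real Rails classes; the point is only that setup code and framework code share one object, so they also share its instance variables:

```ruby
# Stand-in for the framework's request object.
class FakeRequest
  attr_reader :env

  def initialize
    @env = {}
  end
end

# Stand-in for the test harness that a controller spec runs inside.
class FakeControllerSpec
  def initialize
    @request = FakeRequest.new
  end

  # "Framework" method that relies on @request responding to #env.
  def post(action)
    @request.env['REQUEST_METHOD'] = 'POST'
    "POST #{action}"
  end

  # User-written setup that reuses the same variable name.
  def clobbering_setup
    @request = 'a fixture record'
  end
end

spec = FakeControllerSpec.new
first_result = spec.post(:create)  # works: @request is still a FakeRequest

spec.clobbering_setup
begin
  spec.post(:create)
rescue NoMethodError => error
  message = error.message  # complains that env is undefined
end
```

Renaming the variable in the setup code (to @loan_request, say) is enough to avoid the collision.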

Come to think of it, it would be nice to change the accessibility and/or mutability of Rails' basic instance variables and classes to prevent this kind of accidental overwrite by the programmer. Because when you make that mistake, it's invariably a bit of a pain to figure out because the error it causes is obscure.

Maybe I'll have to dig into the code one of these days to see if anything can be done about it.

In Defense of Sass

February 11, 2009

I've been playing with Sass and Haml in my rails projects the last few months. While I'm a bit ambivalent about Haml, I've wholeheartedly adopted Sass. A friend just forwarded me this post at fecklessmind, which excoriates Sass as a maintainability nightmare.

While I understand the guy's complaints, I have to say I disagree. I think he's complaining about a code convention that he shouldn't be following in the first place, rather than the underlying language, and he's ignoring some of the other most useful things Sass brings to the table.

Nesting in Sass

One of the most common problems I've faced over the last eight years of writing stylesheets is interfering selectors. When you have a complex cascading selector, it's often not obvious exactly where it will apply, because of the way priorities work. So a hundred times I've set some styling on UL's and LI's (thinking of ones in my #content block), only to have them accidentally interfere with the layout of my suckerfish dropdowns back in #nav.

That's an easy case, but sometimes with complex selectors it can be hard to figure out who's interfering with whom. However, once you've found the culprit, the solution is generally to go back and wrap all of the rules in an outer selector, changing all of my li {rule} selectors to #nav li selectors, or whatever. When you have twenty different rules in that section, doing this is a royal pain in the butt. Especially when you have multiple tag selectors on one line: it's seriously annoying to change the nice clean h1, h2, h3 to #content h1, #content h2, #content h3!

When you do need these wraps, Sass makes it super easy via auto-nesting:

#nav
  li
    :color #whatever
    :float left

  a
    :whatever etc

will compile to:
#nav li {
  color: #whatever;
  float: left;
}
#nav a {
  whatever: etc;
}

Now, the author of fecklessmind is complaining about how this makes rules harder to find, and how it slows down parsing. Both of these can be true, if you overdo it. But I don't - Sass doesn't force you to wrap your rules this way, and I frequently don't when it doesn't provide any benefit or when it would cause me to write redundant rules. I can and frequently do write single-line cascading selectors, and rules without wraps at all - the very things fecklessmind is complaining that Sass takes away from him.

Nothing about Sass prevents me from writing things like this:
body #nav ul li a
  :float left

or even
#content h1, #nav h2, .article h3, p h4
  :font-weight bold

If that's really what I want to do. I learned how and when to wrap selectors with a near-decade of writing CSS, and I apply those same guidelines when I write Sass - Sass just makes it easier when I do want to do it.

The benefits of nesting early

While I don't use Sass nesting everywhere I possibly could, I do often use it slightly more than would be absolutely required.

The reason is that it heads off a lot of annoying bugs with interfering selectors. For example, say I wrote a rule for .article .body p, and it's not getting applied. After some sleuthwork (long, painful, frustrating sleuthwork if I'm on a platform without Firebug, like IE), it turns out this is because there's a #content p rule 2000 lines earlier in the CSS file that's overriding it. When I nest things in Sass to create a clean cascade hierarchy, this kind of interference is far less likely to occur in the first place.

Meanwhile, the other benefits of Sass

CSS is riddled with problems, and Sass solves two of the most egregious: magic numbers (no constants) and the lack of compiled, server-side imports.

Eliminating magic numbers in CSS

For constants, Sass lets me define commonly used tokens (like colors, for example), and reuse them throughout my stylesheets. This means if I want to adjust a color, I can change it in only one place and the result is reflected throughout my code. Very handy:

!main_link_color = #48950a

a
  :color= !main_link_color

#content h1
  :border-width 0 0 1px 0
  :border-color= !main_link_color

Now, if the client says "make the links blue, not green", I can change that constant and it gets automatically reflected everywhere else. Brilliant.

Organizing my code

fecklessmind says this:

... imagine that the stylesheet is 5000-lines long and you’re looking for p selector, rather than #article. In classic CSS you could just search for #main p, but in Sass they are miles apart. Swell, isn’t it?

A five-thousand-line file? You're doing it wrong. No code should ever look like that. CSS is the only major language that compels you to work that way, and Sass fixes it.

Every good programming language lets me put my code across multiple files, in a nice, organized hierarchy. One class per file and all that: essential for readability and maintainability. But if I use CSS, I can't very well organize my stylesheets into multiple files. If I do, I have to import them client-side, which generates extra hits for the user's browser and extra load for my server. As a result, CSS files tend to be monolithic multiple-kiloline monstrosities.

Sass fixes this. If I use @import to import a Sass file into another Sass file, Sass automatically and transparently compiles that server-side and ships out a single file to the user.

The result is that, writing Sass, I often have 20-30 files containing only a page or so of code, each for a specific feature or layout section. The client still only sees screen.css (and maybe print.css, mobile.css, and ie6.css), but screen.css contains the compiled contents of layout.sass, nav.sass, links.sass, content.sass, footer.sass, etc. In case I need to scan through the compiled screen.css and figure out where a rule came from, I start each file with a single comment containing the name of the file; /*------ nav.sass and such.

If the rule is for paragraphs that could appear anywhere in #main, in my Sass code it would be a file called main.sass, which is usually a relatively short file; 50-60 lines. (All the things destined for other elements would appear in articles.sass, or calendar.sass, or data_tables.sass, keeping main.sass short for only the universal elements).

That logical grouping — the way every other programming language does it — helps me find my CSS rules much more quickly, I think, than fecklessmind's "search for one-line selectors" would. Because with his approach, I might think I'm searching for #main p, when in fact what I really want is #main .section p, and thus my search won't find it.

In reality, there's no way to make a single 5000-line file easily maintainable, period. fecklessmind's little tricks are just that: tricks built from years of experience working in a broken system. Better to use logical organization to solve the problem, and Sass lets me do this.

The bottom line is, badly-written Sass could be horrible to maintain, and maybe worse in some ways than badly-written CSS (but better in others, particularly in weird cross-reactions between unwrapped css selectors). But the same is true of badly-written code of any type. And in my experience Sass gives me much better tools to write maintainable stylesheets than CSS alone does.

Fixing problems with sphinx search

July 24, 2008

I've been working a lot this week with sphinx and ultrasphinx on a project that's a fork of Insoshi.    Insoshi is in the process of switching search from ferret to sphinx, and sphinx has been integrated into the Insoshi edge branch.

I've had dozens of problems; in fact, it's fair to say I've spent upwards of 15 hours just debugging ultrasphinx and getting my tests to pass. There were several problems; here are the main three and how I fixed each one.

This should be useful to anyone upgrading Insoshi to the sphinx version, or to anyone else trying to get ultrasphinx working in their Rails project. I definitely don't recommend starting with this post if you're just starting out with sphinx. Instead, go read this much better introductory tutorial from the guys over at Insoshi. Then if you have problems, come back here and you may find solutions.

Getting search tests (or specs) to pass with sphinx

This one is pretty simple, in retrospect, but it can be frustrating and opaque if you are used to ferret. Unlike ferret, sphinx (at least via ultrasphinx) runs only via a daemon. Where acts_as_ferret uses a daemon only for the production environment and just accesses the index files directly in test or development, ultrasphinx can only get to the indexes through the daemon.

So, to run your tests, you build the indexes for the test environment, start the daemon, and then run the specs. In this case, I'm running the specs for Insoshi's searches controller:

From the command line in $RAILS_ROOT:

rake db:test:prepare
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb

The problem, of course, is that it doesn't work! The reason is that db:test:prepare creates the structure of your database, but doesn't load any of your fixtures as data: the test db is empty. So when you run the index command, an empty index is built. You can see this from the output of the index command, which will look something like this:

collected 0 docs, 0.0 MB
total 0 docs, 0 bytes
total 0.078 sec, 0.00 bytes/sec, 0.00 docs/sec

Ultrasphinx has built an empty index.

The solution

The solution, believe it or not, is to run the tests, let them fail, re-index, and run the tests again (many thanks to Long Nguyen at Insoshi for helping me figure this one out):

rake db:test:prepare
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb #FAIL!!
rake ultrasphinx:index RAILS_ENV=test
script/spec spec/controllers/searches_controller_spec.rb #PASS!!

The first attempt to run the specs loads the fixtures, and leaves them in the database, thus letting the subsequent index command build an actual index.
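If you'd rather not retype that sequence every time, it can be wrapped in a few lines of Ruby. This is only a sketch: the rake tasks and the spec path are the ones listed above, but the method names (sphinx_test_bootstrap_commands, run_all) are made up for illustration, and it assumes the first spec run fails only because the index is still empty.

```ruby
# Hypothetical helper: lists the workaround commands in order.
# The commands themselves are the ones shown above; the method names are invented.
def sphinx_test_bootstrap_commands(spec = "spec/controllers/searches_controller_spec.rb")
  [
    "rake db:test:prepare",
    "rake ultrasphinx:configure RAILS_ENV=test",
    "rake ultrasphinx:index RAILS_ENV=test",        # builds an empty index
    "rake ultrasphinx:daemon:start RAILS_ENV=test",
    "script/spec #{spec}",                          # expected to fail, but loads the fixtures
    "rake ultrasphinx:index RAILS_ENV=test",        # now there is data to index
    "script/spec #{spec}"                           # should pass
  ]
end

# Runs each command in order and returns [command, result] pairs.
# It deliberately does not stop on failure, because the first spec run is expected to fail.
# The runner defaults to Kernel#system; pass a lambda to do a dry run.
def run_all(commands, runner = method(:system))
  commands.map { |command| [command, runner.call(command)] }
end

# From $RAILS_ROOT:
#   run_all(sphinx_test_bootstrap_commands)
```

Passing the runner in as an argument is just a convenience so the sequence can be checked without actually touching the database or the daemon.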

Running sphinx for both test and development environments at the same time

The next big challenge was enabling behavior-driven development. I like to work with autotest and growl running constantly in the background. But this was tough to do with sphinx, because the daemon needed to be stopped and re-started, and the index re-created for each environment, alternately running all of the above commands either with or without RAILS_ENV=test.

The solution is to set up your ultrasphinx base configuration to completely separate the test and development indexes and to let the daemons for the two environments listen on different ports. I had tried something like this and come close, but not quite, when Long at Insoshi again bailed me out. You need to change the port (in two places), and the paths of the logs, pidfile, and index directories so that the test and development daemons use entirely separate resources. Here's a diff of my default.conf (the < lines) against my test.conf (the > lines):

33c33
< port = 3312
---
> port = 3322
35c35
< log = log/searchd.log
---
> log = log/searchd_test.log
39c39
< pid_file = log/searchd.pid
---
> pid_file = log/searchd_test.pid
50c50
< server_port = 3312
---
> server_port = 3322
57c57
< sql_range_step = 5000
---
> sql_range_step = 999999999
64c64
< path = sphinx
---
> path = sphinx_test

The sql_range_step is related to the next issue, which is that sphinx does not play well with foxy fixtures. Anyway, make the above changes and you should be able to run test and development sphinx daemons at the same time:

rake db:test:prepare
rake ultrasphinx:configure
rake ultrasphinx:configure RAILS_ENV=test
rake ultrasphinx:index
rake ultrasphinx:index RAILS_ENV=test
rake ultrasphinx:daemon:start
rake ultrasphinx:daemon:start RAILS_ENV=test

If it worked, you should see separate indexes in $RAILS_ROOT/sphinx and $RAILS_ROOT/sphinx_test, and two daemons running, which you can confirm with ps waux | grep searchd:

evan 1339 0.0 0.0 78100 292 s000 S 5:37PM 0:00.52 searchd --config /config/ultrasphinx/test.conf
evan 1326 0.0 0.0 78100 292 s000 S 5:36PM 0:00.68 searchd --config /config/ultrasphinx/development.conf
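
If the two daemons still step on each other, it's worth checking that the two config files really share nothing. Here's a rough sketch of such a check in Ruby. It assumes the configs use plain "key = value" lines like the ones in the diff above and only looks at the five keys that appear there. The method names are hypothetical, and this is not part of ultrasphinx.

```ruby
# The settings that the diff above changes between environments.
RESOURCE_KEYS = %w[port log pid_file server_port path]

# Collects the values of the resource keys from "key = value" lines.
# A key may appear more than once in a config, so values are kept in arrays.
def resource_settings(config_text)
  settings = Hash.new { |hash, key| hash[key] = [] }
  config_text.each_line do |line|
    match = line.match(/\A\s*(\w+)\s*=\s*(.+?)\s*\z/)
    next unless match && RESOURCE_KEYS.include?(match[1])
    settings[match[1]] << match[2]
  end
  settings
end

# Returns a hash of key => values that both configs have in common.
# An empty hash means the two environments use separate resources.
def shared_resources(config_a, config_b)
  a = resource_settings(config_a)
  b = resource_settings(config_b)
  RESOURCE_KEYS.each_with_object({}) do |key, shared|
    overlap = a[key] & b[key]
    shared[key] = overlap unless overlap.empty?
  end
end

# Example use, with whatever paths your generated configs live at:
#   shared_resources(File.read(development_conf_path), File.read(test_conf_path))
```

Anything that shows up in the result (a port, a pid file, an index path) is something both daemons are fighting over.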

Getting sphinx to play well with foxy fixtures

The next problem I discovered was that on some machines, but not others, running my search specs would result in these weird errors:

1)
ActiveRecord::RecordNotFound in 'SearchesController Person searches should search by name'
Couldn't find Person with ID=328556765
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:308:in `reify_results'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:286:in `each'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:286:in `reify_results'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search.rb:362:in `run'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search/internals.rb:352:in `perform_action_with_retries'
/var/www/domains/unithrive/vendor/plugins/ultrasphinx/lib/ultrasphinx/search.rb:342:in `run'
/var/www/domains/unithrive/app/controllers/searches_controller.rb:38:in `index'
./spec/controllers/searches_controller_spec.rb:51:
script/spec:4:

When I poked into this "Couldn't find Person with ID=328556765" error, it seemed like sphinx was almost working. The index was set up, and the search was finding someone in the index during the test. Ultrasphinx was passing back the id 328556765, which didn't exist in the database. So why would Sphinx "find" a record in its index but then pass back an ID for a database record that didn't exist?

And furthermore, why would it work on one machine, but not on another?

The brainstorm came when I checked what the actual database IDs were for this particular record, with Person.find_by_name("fixtures' name").id. On machines where it worked, the id was a huge number (as it generally is with foxy fixtures), but on machines where it didn't work, the id was an even huger number.

Sphinx needs every item that gets indexed to have a different id within sphinx, and ultrasphinx arranges this by multiplying all of your ids by N, where N is the number of models getting indexed, and adding an offset of 0 for the first model, 1 for the second, etc. This guarantees that every record from every table will have a unique id. In the case of this application, all of my Person records were getting indexed by sphinx as (Person#id * 4 + 2).

Danger, Will Robinson: 32-bit int rollover!

The problem is that foxy fixtures generate their own ids from a hash of the fixture label, and those ids can be anywhere in the 32-bit unsigned integer space. But Sphinx also stores ids as 32-bit unsigned integers. This means if you happen to get a large fixture id, and then sphinx multiplies it by 4 (or whatever; it could be higher if you have more indexed models), your id will roll over and come out as (id * N + n) % (2^32). Sphinx will store that result, and then when it finds the record in a search, it will try to recreate the original id by subtracting n and dividing by N ... giving you the wrong id. Your test will fail to find the record.
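
To make the rollover concrete, here's a minimal sketch of the arithmetic in Ruby. It uses N = 4 and n = 2 as in the Person example above. The method names are made up for illustration and are not ultrasphinx's own, and the "big" id is an illustrative value I picked because it happens to roll over to the id in the error message. It is not necessarily the id the fixture actually had.

```ruby
SPHINX_ID_SPACE = 2**32 # ids stored as 32-bit unsigned integers

# What gets stored in the index: (id * N + n), wrapped to 32 bits.
def sphinx_document_id(record_id, model_count, model_offset)
  (record_id * model_count + model_offset) % SPHINX_ID_SPACE
end

# What comes back out of a search: subtract n, divide by N.
def recovered_record_id(document_id, model_count, model_offset)
  (document_id - model_offset) / model_count
end

model_count  = 4 # N: number of indexed models
model_offset = 2 # n: offset for Person

small = 12_345
big   = 1_402_298_589 # illustrative value, larger than 2**32 / 4

[small, big].each do |id|
  stored    = sphinx_document_id(id, model_count, model_offset)
  recovered = recovered_record_id(stored, model_count, model_offset)
  puts "#{id} -> stored as #{stored} -> recovered as #{recovered}"
end
# 12345 -> stored as 49382 -> recovered as 12345
# 1402298589 -> stored as 1314227062 -> recovered as 328556765
```

Any id at or above 2^32 / N (1073741824 when N is 4) can no longer make the round trip, which would explain why only the machines with the "even huger" ids failed.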

Incidentally, this problem with foxy fixtures is why your test.base file needs the line sql_range_step = 999999999. Sphinx builds indexes by searching a few ids at a time. But the ids generated by foxy fixtures are so big that if sphinx only collects them in ranges of 5000 at a time, it will take forever to find them all.
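
For a rough sense of scale, here's a back-of-the-envelope calculation in Ruby. It assumes sphinx walks from the smallest id to the largest in chunks of sql_range_step, issuing one query per chunk, and that the fixture ids span most of the 32-bit range. The method name is hypothetical and the numbers are estimates, not measurements.

```ruby
# Number of range queries needed to cover min_id..max_id in chunks of `step` ids.
def range_query_count(min_id, max_id, step)
  span = max_id - min_id + 1
  (span + step - 1) / step # integer ceiling division
end

max_fixture_id = 2**32 - 1 # worst case for a 32-bit unsigned id

puts range_query_count(1, max_fixture_id, 5_000)       # => 858994
puts range_query_count(1, max_fixture_id, 999_999_999) # => 5
```

Nearly all of those 858,994 queries would come back empty, since a handful of fixtures are scattered across the whole range.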

After some googling, I found that these issues are discussed in a thread over at RubyForge.

The solution

I'm working on a plugin that monkeypatches foxy fixtures to create sequential, low-numbered IDs. In the meantime, you can just compile sphinx to support 64-bit ids, which should give you plenty of headroom to handle foxy fixture ids multiplied by N in sphinx*:

In your sphinx source directory:

./configure --enable-id64
make
sudo make install

That should do it. Let me know in comments if any of this information helped you.

*At least until you start approaching 2^32 models in your application, that is.