Full-Text Search in Rails using elasticsearch | Adventures In Coding

  • Kevin Faustino

Search matters on a website for content discovery and usability: it lets readers decide how to look for content instead of navigating through menus. Since this blog was missing full-text search, I decided to implement it using elasticsearch. Here is a detailed step-by-step guide to how I did that using outside-in development.

elasticsearch

elasticsearch is a distributed, open source search server built on Apache Lucene. It provides near real-time search and scales horizontally through shards and replicas. Getting started is easy because elasticsearch is schema-free: you only have to send it a typed JSON document and it will be indexed automatically, with field types inferred by the server. You can also define your own mappings to set boost levels, analyzers, and types.
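
To make "send it a typed JSON document" concrete, here is a minimal Ruby sketch that builds, but does not send, the HTTP request that would index one post. elasticsearch's REST API addresses a document as /index/type/id; the index name (blog-engine-development), type (post), id (1), and the localhost:9200 address are illustrative assumptions, and index_document_request is a hypothetical helper.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds a PUT request that would index a single
# document at /index/type/id. Nothing is sent over the network here.
def index_document_request(index, type, id, document)
  uri = URI("http://localhost:9200/#{index}/#{type}/#{id}")
  request = Net::HTTP::Put.new(uri.request_uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(document)
  request
end

request = index_document_request(
  'blog-engine-development', 'post', 1,
  { title: 'Demon in a Bottle', content: 'An example post body.' }
)

puts request.method # PUT
puts request.path   # /blog-engine-development/post/1
puts request.body
```

Actually sending it would be a matter of Net::HTTP.start('localhost', 9200) { |http| http.request(request) }; with no mapping defined, the server would infer string types for both fields.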

Blog Engine Background

The code examples in this post are extracted from the Adventures in Coding blog engine source code. It is a Ruby on Rails 3.1 application backed by a MongoDB database. All testing is done with Cucumber and RSpec, and views and CSS are generated with Haml and Sass respectively.

Getting Started

The first step is running an elasticsearch server locally for development. I do all my development on a Mac, so I installed it with Homebrew:

$ brew install elasticsearch

If you are on another platform, you can build elasticsearch from source.

Once it was installed, I launched the server with:

$ elasticsearch -f -D es.config=/usr/local/Cellar/elasticsearch/0.19.3/config/elasticsearch.yml
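
Before moving on, it helps to confirm the server is actually answering: a GET on the root URL (http://localhost:9200/) returns a small JSON status document. The sketch below wraps that check in a hypothetical elasticsearch_running? helper that uses only Ruby's standard library; the one-second timeouts are arbitrary choices.

```ruby
require 'net/http'
require 'timeout'

# Hypothetical helper: true if an HTTP server at host:port answers GET /
# with a 2xx response; false if it refuses the connection or times out.
def elasticsearch_running?(host = 'localhost', port = 9200)
  response = Net::HTTP.start(host, port, open_timeout: 1, read_timeout: 1) do |http|
    http.get('/')
  end
  response.is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, IOError, Timeout::Error
  # Connection refused, DNS failure, dropped connection, or timeout.
  false
end

puts elasticsearch_running? ? 'elasticsearch is up' : 'elasticsearch is not reachable'
```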

Tire

Knowing that I would need an elasticsearch client, I selected Tire, the most popular Ruby client. It is actively maintained, has a large user base, and integrates nicely with Rails and ActiveModel.

To add it to the project, I added it to my Gemfile and ran bundle:

# Gemfile
gem 'tire'

Search Feature

Now that everything was set up, I defined the Cucumber "Search" feature and scenarios that would drive development.

# features/search.feature
Feature: Search
  As a User
  I want to search for content
  In order to find content easily over paginating

  Background:
   Given I am on the home page

  Scenario: Find posts by content
    Given the following posts:
      | title             |
      | Demon in a Bottle |
      | Extremis          |
    When I search for "Demon"
    Then I should be on the search page
    And I should see the post called "Demon in a Bottle" in the post list
    But I should not see the post called "Extremis" in the post list

  Scenario: No posts found
    When I search for "Armor Wars"
    Then I should be on the search page
    And I should see a message indicating no posts were found

When I ran cucumber, my search scenarios failed because two of their steps had no step definitions yet. Cucumber reported this with the following output:

You can implement step definitions for undefined steps with these snippets:

When /^I search for "([^"]*)"$/ do |arg1|
  pending # express the regexp above with the code you wish you had
end

Then /^I should see a message indicating no posts were found$/ do
  pending # express the regexp above with the code you wish you had
end

To keep all search-related step definitions together, I created the file features/step_definitions/search_steps.rb:

# features/step_definitions/search_steps.rb
When /^I search for "([^"]*)"$/ do |query|
  fill_in 'query', with: query
  click_button 'Search'
end

Then /^I should see a message indicating no posts were found$/ do
  within('section.posts') do
    page.should have_content('No posts were found')
  end
end

Running cucumber again showed that Capybara couldn't find a query text field in the markup:

When I search for "Demon" # features/step_definitions/search_steps.rb:1
  cannot fill in, no text field, text area or password field with id, name, or label 'query' found (Capybara::ElementNotFound)

To solve this, I created a search controller, a route, a form to submit the search query, a helper, and a search index view. With these in place, Cucumber could interact with the search form and endpoint. However, since I hadn't wired the Post model up to Tire yet, the scenarios were still failing.

Search Form

%section{ id: 'search' }
  = form_tag search_path, method: :get do
    = text_field_tag :query, params[:query], autocomplete: :off, placeholder: 'e.g. Ruby'
    = submit_tag('Search')

Controller

# app/controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    @posts = []
  end
end

Route

# config/routes.rb
get 'search', to: 'search#index'

Helper

# app/helpers/posts_helper.rb
module PostsHelper
  def render_posts(posts)
    if posts.to_a.size > 0
      render(posts)
    else
      content_tag(:div, "No posts were found", class: 'message')
    end
  end
end

Search index view

%section.posts= render_posts(@posts)

Adding Tire to ActiveModel

To make my Post model searchable, I included two modules:

  include Tire::Model::Search
  include Tire::Model::Callbacks

With Tire::Model::Callbacks included, saving a post automatically creates or updates its document in the elasticsearch index, and destroying a post removes it.

Since I run elasticsearch in both my test and development environments, I needed a unique index name per environment. This ensured that running my Cucumber features did not wipe out my development index. I accomplished this by setting index_name to "blog-engine-#{Rails.env}".

I also wanted control over which attributes are sent to elasticsearch. Tire serializes a model for indexing with its to_indexed_json method, so defining my own version in the model lets me pick exactly what is indexed. I decided to index the post's _id, title, content, and published_at date.

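Finally, instead of relying on dynamic mapping, I defined an explicit mapping that controls how posts are analyzed and scored:

  • Fields such as _id and published_at are indexed but not analyzed, so they are stored as exact values instead of being split into terms.
  • Post titles are given a boost of 100 instead of the default 1.0, so a match in the title counts far more than a match in the content.
  • The title and content text fields are analyzed with the snowball analyzer, which lowercases the text, removes stop words, and stems each term.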
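
For reference, here is a Ruby hash approximating the create-index body that the mapping block in the model code below describes, in elasticsearch 0.19's format where mappings are keyed by document type. Treat it as a sketch: the post type name, the explicit "string" types (my understanding is that Tire fills these in for fields declared without a type), and the exact placement of _id are assumptions, and Tire's real output may differ.

```ruby
require 'json'

# Approximate create-index body for the Post mapping. Assumption: Tire
# defaults fields declared without a type to "string".
POST_INDEX_BODY = {
  mappings: {
    post: {
      properties: {
        _id:          { type: 'string', index: 'not_analyzed' },
        title:        { type: 'string', analyzer: 'snowball', boost: 100 },
        content:      { type: 'string', analyzer: 'snowball' },
        published_at: { type: 'date',   index: 'not_analyzed' }
      }
    }
  }
}.freeze

puts JSON.pretty_generate(POST_INDEX_BODY)
```

Printing the hash this way makes it easy to compare against what the server reports for the index at GET /blog-engine-development/_mapping.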

Here is the final app/models/post.rb code:

class Post
    ...
  include Tire::Model::Search
  include Tire::Model::Callbacks

  index_name "blog-engine-#{Rails.env}"

  mapping do
    indexes :_id, index: :not_analyzed
    indexes :title, analyzer: 'snowball', boost: 100
    indexes :content, analyzer: 'snowball'
    indexes :published_at, type: 'date', index: :not_analyzed
  end

  def to_indexed_json
    {
      _id: _id,
      title: title,
      content: content,
      published_at: published_at
    }.to_json
  end

    ...
end
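
To see what to_indexed_json produces, here is a plain-Ruby sketch that uses a Struct as a stand-in for the Mongoid model (an assumption so it runs without Rails or MongoDB); internal_notes is a hypothetical extra attribute included only to show that unlisted fields stay out of the index. One detail worth knowing: elasticsearch's default date format expects ISO 8601. Inside Rails, ActiveSupport already encodes times that way in to_json, but with only Ruby's json library Time#to_json falls back to Time#to_s, so the sketch calls iso8601 explicitly.

```ruby
require 'json'
require 'time'

# Stand-in for the Mongoid Post model. internal_notes is hypothetical and
# should NOT end up in the indexed document.
PostStandIn = Struct.new(:_id, :title, :content, :published_at, :internal_notes) do
  # Same shape as the post's to_indexed_json, except published_at is
  # formatted explicitly as ISO 8601 for elasticsearch's default date parser.
  def to_indexed_json
    {
      _id: _id,
      title: title,
      content: content,
      published_at: published_at.iso8601
    }.to_json
  end
end

post = PostStandIn.new('4fa1', 'Demon in a Bottle', 'Post body',
                       Time.utc(2012, 5, 1, 12, 0, 0), 'not for the index')
puts post.to_indexed_json
# {"_id":"4fa1","title":"Demon in a Bottle","content":"Post body","published_at":"2012-05-01T12:00:00Z"}
```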

Handling requests to elasticsearch from RSpec

To ensure my specs make no real HTTP requests to elasticsearch, I added webmock as a dependency.

To install webmock, add it to your Gemfile under the test group:

# Gemfile
group :test do
  gem 'webmock', require: false
end

In my spec/spec_helper.rb file, I required the webmock library and, in a before hook, stubbed out any POST request to the elasticsearch server (http://localhost:9200).

Here is an abbreviated version of my spec/spec_helper.rb:

ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)
require 'rspec/rails'
require 'webmock/rspec'
...
RSpec.configure do |config|
  config.before :each do
    stub_request(:post, /.*localhost:9200\/.*/).to_return(body: "{}")
    Mongoid.purge!
  end
end
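
Because the stub matches URLs with a regular expression, it is worth checking which requests it actually covers. This plain-Ruby snippet runs the same pattern against a few illustrative URLs. Note that stub_request(:post, ...) only intercepts POST requests; webmock also accepts :any as the method if GET or DELETE calls need stubbing too.

```ruby
# The same pattern passed to stub_request in spec/spec_helper.rb.
ELASTICSEARCH_URL_PATTERN = /.*localhost:9200\/.*/

[
  'http://localhost:9200/blog-engine-test/post/4fa1', # indexing a post
  'http://localhost:9200/blog-engine-test/_refresh',
  'http://localhost:3000/search?query=Demon',         # the Rails app itself
  'http://example.com/'
].each do |url|
  status = ELASTICSEARCH_URL_PATTERN.match?(url) ? 'stubbed' : 'not stubbed'
  puts "#{url} -> #{status}"
end
```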

If you are interested in finding out more about webmock, be sure to read the README on GitHub.

Handling search calls to elasticsearch

Finally, I needed to implement a method that actually used elasticsearch to search for posts. I didn't want the method to live in my Post model, so I created a PostRepository class. This keeps view-specific logic, such as pagination, out of my domain models.

PostRepository takes the params hash from a controller and extracts the query and the requested page of results from it.

# app/models/post_repository.rb
class PostRepository
  POSTS_PER_PAGE = 5

  attr_accessor :params

  def initialize(params)
    self.params = params
  end

  def search
    query = params[:query]
    model.tire.search(load: true, page: params[:page], per_page: POSTS_PER_PAGE) do
      query { string query, default_operator: "AND" } if query.present?
      filter :range, published_at: { lte: Time.zone.now }
      sort { by :published_at, "desc" } if query.blank?
    end
  end

  protected

  def model
    Post
  end
end

Here is a breakdown of what occurs in the search method:

  • I call Tire's search method through the tire proxy on Post (returned by the protected model method).
  • Passing load: true makes Tire load the matching Post documents from the database instead of returning the raw elasticsearch results. If you store every attribute you need in elasticsearch, you can skip this extra database query.
  • query { string query, default_operator: "AND" } if query.present?: If the user entered a query, I send it to elasticsearch as a query string query. Setting default_operator to AND means a post must match every term in the query, not just one of them.
  • filter :range, published_at: { lte: Time.zone.now }: I ensure that only published articles are returned in the results by filtering for dates less than or equal to the current time.
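  • sort { by :published_at, "desc" } if query.blank?: If the user enters no query string, every published post matches and the results are returned newest first. When there is a query, no sort is applied, so results stay in elasticsearch's default relevance order.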
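
To make the breakdown concrete, here is a plain-Ruby sketch of roughly the request body the search block corresponds to, in the elasticsearch 0.19-era query DSL that Tire targeted. The helper name search_body and its arguments are hypothetical, and Tire's real JSON may differ in detail. It also shows how page and per_page map onto elasticsearch's from and size; treating a missing or blank page as page 1 is my assumption.

```ruby
require 'json'
require 'time'

# Hypothetical helper approximating the JSON built by PostRepository#search.
def search_body(query, now:, page: nil, per_page: 5)
  page = page.to_i
  page = 1 if page < 1 # nil or "" from params becomes page 1
  body = {
    # Only published posts: published_at must be on or before "now".
    filter: { range: { published_at: { lte: now.iso8601 } } },
    from: (page - 1) * per_page, # e.g. page 2 with 5 per page skips 5 hits
    size: per_page
  }
  if query.to_s.strip.empty?
    # Without a query clause elasticsearch matches all documents;
    # show the newest posts first.
    body[:sort] = [{ published_at: 'desc' }]
  else
    # query_string with AND: every term in the user's query must match.
    body[:query] = { query_string: { query: query, default_operator: 'AND' } }
  end
  body
end

now = Time.utc(2012, 6, 1)
puts JSON.pretty_generate(search_body('Demon', now: now))
puts JSON.pretty_generate(search_body(nil, now: now, page: '2'))
```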

I also updated SearchController to use the new search method:

# app/controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    @posts = PostRepository.new(params).search
  end
end

Returning to Cucumber

Running the Cucumber search scenarios still resulted in failure. One of the main reasons was that I was not creating the Post index before each scenario. To fix this, I added a Before hook that deletes the existing Post index and recreates it from my mapping.

# features/support/hooks.rb
Before do
  Post.tire.index.delete
  Post.create_elasticsearch_index
end

The features ran again, but still failed. Indexing a document does not make it searchable immediately: elasticsearch only exposes new documents to searches after it refreshes the index, which by default happens about once per second. My scenarios, however, searched right after creating their posts. The solution was to add a synchronization point somewhere in the code, and the best place I could think of was just before any search executes. I updated the step definition /^I search for "([^"]*)"$/ to call Post.tire.index.refresh first. Since this step runs after any data has been set up in a Given step, the search always sees the freshly indexed posts.

# features/step_definitions/search_steps.rb
When /^I search for "([^"]*)"$/ do |query|
  Post.tire.index.refresh
  fill_in 'query', with: query
  click_button 'Search'
end
...
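
Post.tire.index.refresh roughly corresponds to a call to elasticsearch's _refresh endpoint for the index, which makes everything indexed so far visible to search. As an illustration using only Ruby's standard library, the sketch below builds (without sending) that request; refresh_request is a hypothetical helper, and the localhost:9200 address and blog-engine-test index name follow the setup above but are assumptions here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a POST /<index>/_refresh request, which asks
# elasticsearch to make recently indexed documents searchable.
def refresh_request(index, base_url: 'http://localhost:9200')
  uri = URI("#{base_url}/#{index}/_refresh")
  Net::HTTP::Post.new(uri.request_uri)
end

request = refresh_request('blog-engine-test')
puts "#{request.method} #{request.path}" # POST /blog-engine-test/_refresh
```

Sending it would be a matter of Net::HTTP.start('localhost', 9200) { |http| http.request(request) }. Refreshing before every search is cheap for a small test index; in production you would normally rely on the periodic refresh instead.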

Running cucumber again resulted in success:

Feature: Search
  As a User
  I want to search for content
  In order to find content easily over paginating

  Background:
    Given I am on the home page 

  Scenario: Find posts by content 
    Given the following posts:   
      | title             |
      | Demon in a Bottle |
      | Extremis          |
    When I search for "Demon"    
    Then I should be on the search page 
    And I should see the post called "Demon in a Bottle" in the post list
    But I should not see the post called "Extremis" in the post list

  Scenario: No posts found  
    When I search for "Armor Wars" 
    Then I should be on the search page 
    And I should see a message indicating no posts were found

2 scenarios (2 passed)
10 steps (10 passed)
0m1.315s

Afterthoughts

I love developing outside-in because it shows you your progress at every step of the development process. The process in this post was not a typical Cucumber example: I always try to avoid making my integration test suite depend on external services. However, the confidence of always knowing whether my features work outweighs that drawback.

Also, be sure to read about using elasticsearch on Heroku.
