Full-Text Search in Rails using elasticsearch
Search matters on a website for both content discovery and usability. It lets readers look for content on their own terms instead of navigating through menus. Since this blog was missing full-text search, I decided to implement it using elasticsearch. Here is a detailed step-by-step guide to how I did that using outside-in development.
elasticsearch
elasticsearch is a distributed, open source search server based on Apache Lucene. It offers near real-time search and scales easily through replicas. Getting started is easy because elasticsearch is schemaless: you only have to pass it a typed JSON document and it is indexed automatically, with field types determined by the server. It also allows you to define your own mappings to set boost levels, analyzers, and types.
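As a rough illustration of what a "typed JSON document" looks like, the sketch below builds one with Ruby's standard json library. The field names are hypothetical and no request is sent to a server; it only shows the kind of payload from which elasticsearch would infer field types.

```ruby
require 'json'
require 'time'

# Hypothetical post attributes; the field names are illustrative only.
def sample_post_document
  {
    title: 'Demon in a Bottle',                 # serialized as a JSON string
    comment_count: 3,                            # serialized as a JSON number
    published: true,                             # serialized as a JSON boolean
    published_at: Time.utc(2012, 5, 1).iso8601   # ISO 8601 string, which can be detected as a date
  }
end

# Serialize the document the way it would appear in a request body.
def sample_post_json
  JSON.generate(sample_post_document)
end

puts sample_post_json
```

Because each value already carries a JSON type, no schema has to be declared before the first document is indexed.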
Blog Engine Background
The code examples in this post are extracted from the Adventures in Coding blog engine source code. It is a Ruby on Rails 3.1 application with a MongoDB database. All testing is done via Cucumber and RSpec. Views and CSS are generated with Haml and Sass respectively.
Getting Started
The first step is to run an elasticsearch server locally for development. I do all my development on a Mac, so I installed it with Homebrew:
$ brew install elasticsearch
If you are on another platform, you can build elasticsearch from source.
Once it was installed, I launched the server by running:
$ elasticsearch -f -D es.config=/usr/local/Cellar/elasticsearch/0.19.3/config/elasticsearch.yml
Tire
Knowing that I would need an elasticsearch client, I selected Tire, the most popular Ruby client. It is actively maintained, has a large number of users, and integrates nicely with Rails and ActiveModel.
To add it to the project, I added it to my Gemfile, and ran bundle.
# Gemfile
gem 'tire'
Search Feature
Now that everything was set up, I defined my Cucumber "Search" feature and the scenarios that would drive development.
# features/search.feature
Feature: Search
  As a User
  I want to search for content
  In order to find content easily over paginating

  Background:
    Given I am on the home page

  Scenario: Find posts by content
    Given the following posts:
      | title             |
      | Demon in a Bottle |
      | Extremis          |
    When I search for "Demon"
    Then I should be on the search page
    And I should see the post called "Demon in a Bottle" in the post list
    But I should not see the post called "Extremis" in the post list

  Scenario: No posts found
    When I search for "Armor Wars"
    Then I should be on the search page
    And I should see a message indicating no posts were found
When I ran cucumber, my search scenarios failed because I hadn't defined two of the steps. Cucumber informed me of this with the following output:
You can implement step definitions for undefined steps with these snippets:
When /^I search for "([^"]*)"$/ do |arg1|
  pending # express the regexp above with the code you wish you had
end

Then /^I should see a message indicating no posts were found$/ do
  pending # express the regexp above with the code you wish you had
end
To keep all search-related step definitions together, I created the file features/step_definitions/search_steps.rb:
# features/step_definitions/search_steps.rb
When /^I search for "([^"]*)"$/ do |query|
  fill_in 'query', with: query
  click_button 'Search'
end

Then /^I should see a message indicating no posts were found$/ do
  within('section.posts') do
    page.should have_content('No posts were found')
  end
end
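To make the step pattern a little more concrete, here is a small plain-Ruby sketch of how the regular expression captures the quoted text that ends up in the query block argument. The constant and helper names are hypothetical and exist only for this illustration; Cucumber does the matching itself.

```ruby
# The same pattern used in the "I search for" step definition.
SEARCH_STEP = /^I search for "([^"]*)"$/

# Returns the captured query, or nil when the step text does not match.
def captured_query(step_text)
  match = SEARCH_STEP.match(step_text)
  match && match[1]
end

p captured_query('I search for "Demon"')
p captured_query('I search for Demon')
```

The first call captures the text between the quotes, while the second returns nil because the pattern requires the quotes to be present.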
Running cucumber again indicated that it couldn't find a query text field in the markup.
When I search for "Demon" # features/step_definitions/search_steps.rb:1
cannot fill in, no text field, text area or password field with id, name, or label 'query' found (Capybara::ElementNotFound)
To solve this, I needed to create a new search controller, a form to submit the search query, and a search index view.
Once these were implemented, Cucumber was able to interact with the search form and endpoint. Since I hadn't wired up the Post model to Tire yet, the scenarios still failed.
Search Form
%section{ id: 'search' }
  = form_tag search_path, method: :get do
    = text_field_tag :query, params[:query], autocomplete: :off, placeholder: 'e.g. Ruby'
    = submit_tag('Search')
Controller
# app/controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    @posts = []
  end
end
Route
# config/routes.rb:
get 'search', to: 'search#index'
Helper
# app/helpers/posts_helper.rb:
module PostsHelper
  def render_posts(posts)
    if posts.to_a.size > 0
      render(posts)
    else
      content_tag(:div, "No posts were found", class: 'message')
    end
  end
end
Search index view
%section.posts= render_posts(@posts)
Adding Tire to ActiveModel
To make my Post model searchable, I included two modules:
include Tire::Model::Search
include Tire::Model::Callbacks
Now, whenever I save a post, Tire automatically creates or updates the corresponding document in the elasticsearch index.
Since I am running elasticsearch in both my test and development environments, I needed a unique elasticsearch index name per environment. This ensured that running my Cucumber features did not wipe out my development indexes. I accomplished this by setting index_name to "blog-engine-#{Rails.env}".
I also wanted control of what attributes are sent to Tire for indexing. If Tire finds a to_indexed_json method on your model, it will use that over to_json. I decided to index the model _id, title, content, and published_at date.
Finally, instead of relying on a dynamic schema, I defined an explicit mapping that controls how posts are analyzed and scored.
- Fields such as _id and published_at are indexed, but not analyzed.
- Post titles are given a boost of 100, compared with the default of 1.0.
- Blog post text fields are analyzed with a snowball analyzer.
Here is the final app/models/post.rb code:
class Post
  ...
  include Tire::Model::Search
  include Tire::Model::Callbacks

  index_name "blog-engine-#{Rails.env}"

  mapping do
    indexes :_id, index: :not_analyzed
    indexes :title, analyzer: 'snowball', boost: 100
    indexes :content, analyzer: 'snowball'
    indexes :published_at, type: 'date', index: :not_analyzed
  end

  def to_indexed_json
    {
      _id: _id,
      title: title,
      content: content,
      published_at: published_at
    }.to_json
  end
  ...
end
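To show the effect of to_indexed_json and the per-environment index name in isolation, here is a minimal plain-Ruby sketch. SketchPost is a hypothetical stand-in for the Mongoid-backed Post model, author_email is an invented attribute used only to demonstrate that unlisted fields are left out, and index_name_for is a hypothetical helper. Outside Rails the date has to be formatted explicitly, so the sketch calls iso8601.

```ruby
require 'json'
require 'time'

# Stand-in for the real Post model; not the actual class.
SketchPost = Struct.new(:_id, :title, :content, :published_at, :author_email) do
  # Only the listed attributes are serialized; author_email is omitted.
  def to_indexed_json
    {
      _id: _id,
      title: title,
      content: content,
      published_at: published_at.iso8601
    }.to_json
  end
end

# Builds an environment-specific index name in the same format as index_name.
def index_name_for(env)
  "blog-engine-#{env}"
end

post = SketchPost.new('abc123', 'Extremis', 'Body text', Time.utc(2012, 5, 1), 'tony@example.com')
puts post.to_indexed_json
puts index_name_for('test')
```

Keeping the serialization in one method makes it easy to see exactly which attributes ever leave the application.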
Handling requests to elasticsearch from RSpec
To ensure no outside HTTP requests to elasticsearch are made when running specs, I added a dependency on webmock.
To install webmock, add it to your Gemfile under the test group:
# Gemfile
group :test do
  gem 'webmock', require: nil
end
In my spec/spec_helper.rb file, I required the webmock library and stubbed out any request to the elasticsearch server (http://localhost:9200) in a before filter.
Here is an abbreviated version of my spec/spec_helper.rb:
ENV["RAILS_ENV"] ||= 'test'
require File.expand_path("../../config/environment", __FILE__)
require 'rspec/rails'
require 'webmock/rspec'
...
RSpec.configure do |config|
  config.before :each do
    stub_request(:post, /.*localhost:9200\/.*/).to_return(body: "{}")
    Mongoid.purge!
  end
end
If you are interested in finding out more about webmock, be sure to read the README on GitHub.
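For a feel of which URLs the stub's pattern covers, the plain-Ruby sketch below applies the same regular expression to a couple of example URLs. The helper name and the URLs are hypothetical, and the check only mirrors the URL part of the stub; webmock also matches on the HTTP method, so with the stub as written only POST requests are covered.

```ruby
# The URL pattern passed to stub_request in the spec helper.
ES_URL_PATTERN = /.*localhost:9200\/.*/

# True when a request URL falls under the stub's URL pattern.
def matches_stub?(url)
  !!(url =~ ES_URL_PATTERN)
end

p matches_stub?('http://localhost:9200/blog-engine-test/post/1')
p matches_stub?('http://example.com/search')
```

If a spec ever triggers another verb against elasticsearch, such as a DELETE when a post is destroyed, a broader stub would probably be needed.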
Handling search calls to elasticsearch
Finally, I needed to implement a method that actually uses elasticsearch to search for posts. I didn't want the method to live in my Post model, so I created a PostRepository class. I did this to keep view-specific logic, such as pagination, away from my domain models.
PostRepository takes a hash of params from a controller. Using those params, it will extract the query and current page of results.
# app/models/post_repository.rb
class PostRepository
  POSTS_PER_PAGE = 5

  attr_accessor :params

  def initialize(params)
    self.params = params
  end

  def search
    query = params[:query]
    model.tire.search(load: true, page: params[:page], per_page: POSTS_PER_PAGE) do
      query { string query, default_operator: "AND" } if query.present?
      filter :range, published_at: { lte: Time.zone.now }
      sort { by :published_at, "desc" } if query.blank?
    end
  end

  protected

  def model
    Post
  end
end
Here is a breakdown of what occurs in the search method:
- I accessed the Tire search method through the Post.tire accessor.
- Passing the option load: true to search queries the database for the models. If you store all your attributes in elasticsearch, this isn't needed.
- query { string query, default_operator: "AND" } if query.present?: I query elasticsearch for the query string passed by the user. The terms in this string are combined using the AND operator.
- filter :range, published_at: { lte: Time.zone.now }: I ensure that only published articles are returned in the results by filtering for dates less than or equal to the current time.
- sort { by :published_at, "desc" } if query.blank?: If a user enters no query string, I return the results in descending order of their published date.
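The breakdown above can be sketched in plain Ruby as a hash that approximates the search definition. This is an illustration under assumptions, not the exact JSON Tire sends: the helper names are hypothetical, and the translation of page and per_page into from and size reflects my understanding of how Tire handles those options.

```ruby
# Converts a 1-based page number (possibly nil or a string) into a result offset.
def offset_for(page, per_page)
  page = page.to_i
  page <= 1 ? 0 : (page - 1) * per_page
end

# Approximates the search definition built in PostRepository#search.
def search_definition(query, page, now, per_page = 5)
  blank = query.nil? || query.strip.empty?
  definition = {
    # Only posts published at or before the current time are returned.
    filter: { range: { published_at: { lte: now } } },
    from: offset_for(page, per_page),
    size: per_page
  }
  # A query string is only sent when the user typed something.
  definition[:query] = { query_string: { query: query, default_operator: 'AND' } } unless blank
  # Without a query there is no relevance score, so sort by date instead.
  definition[:sort] = [{ published_at: 'desc' }] if blank
  definition
end

p search_definition('Demon', '2', '2012-05-01T00:00:00Z')
p search_definition('', nil, '2012-05-01T00:00:00Z')
```

Sorting only when the query is blank lets results be ordered by relevance whenever the user does supply search terms.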
I also updated the search_controller to use the new search method:
# app/controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    @posts = PostRepository.new(params).search
  end
end
Returning to Cucumber
Running the Cucumber search scenarios still resulted in failure. One of the main reasons was that I was not recreating the Post index on each scenario run. To fix this, I added a Before hook that deletes the existing Post index and recreates it from my mapping.
# features/support/hooks.rb
Before do
  Post.tire.index.delete
  Post.create_elasticsearch_index
end
The features ran again, but still failed. Newly indexed documents only become searchable after elasticsearch refreshes the index, which happens asynchronously, whereas my scenarios were executed synchronously. The solution was to add a synchronization point somewhere in the code, and the best place seemed to be right before any search is executed. I updated the step definition /^I search for "([^"]*)"$/ to include an index refresh call, Post.tire.index.refresh. This step definition is executed after any data has been set up in a Given step.
# features/step_definitions/search_steps.rb
When /^I search for "([^"]*)"$/ do |query|
  Post.tire.index.refresh
  fill_in 'query', with: query
  click_button 'Search'
end
...
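To illustrate why the refresh call matters, here is a toy in-memory model of the behaviour. ToyIndex is entirely hypothetical and is not how elasticsearch is implemented; it only mimics the idea that writes are buffered and become visible to search after a refresh.

```ruby
# Toy model of near-real-time indexing, for illustration only.
class ToyIndex
  def initialize
    @pending = []     # documents written but not yet searchable
    @searchable = []  # documents visible to search
  end

  # Writes are buffered, much like documents waiting for the next refresh.
  def store(title)
    @pending << title
  end

  # Makes all buffered documents visible to search.
  def refresh
    @searchable.concat(@pending)
    @pending.clear
  end

  # Case-insensitive substring match over searchable documents only.
  def search(term)
    @searchable.select { |title| title.downcase.include?(term.downcase) }
  end
end

index = ToyIndex.new
index.store('Demon in a Bottle')
p index.search('demon')
index.refresh
p index.search('demon')
```

In the toy model the first search finds nothing and the second finds the post, which mirrors the failure I saw before adding the refresh to the step definition.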
Running cucumber again resulted in success:
Feature: Search
  As a User
  I want to search for content
  In order to find content easily over paginating

  Background:
    Given I am on the home page

  Scenario: Find posts by content
    Given the following posts:
      | title             |
      | Demon in a Bottle |
      | Extremis          |
    When I search for "Demon"
    Then I should be on the search page
    And I should see the post called "Demon in a Bottle" in the post list
    But I should not see the post called "Extremis" in the post list

  Scenario: No posts found
    When I search for "Armor Wars"
    Then I should be on the search page
    And I should see a message indicating no posts were found

2 scenarios (2 passed)
10 steps (10 passed)
0m1.315s
Afterthoughts
I love developing outside-in as it allows you to know your progress every step of the development process. The process shown in this post was not your typical Cucumber example. Adding external services as a dependency to your integration test suite is something I always try to avoid. However, having the confidence that I will always know if my features are working outweighs that negative.
Also, be sure to read about using elasticsearch within Heroku.