lucene-solr-user mailing list archives

From Erik Hatcher <>
Subject Re: Faceted search problem
Date Wed, 17 Jan 2007 06:36:24 GMT

On Jan 16, 2007, at 10:05 PM, Peter McPeterson wrote:
> Hi all, I'm trying this solr ruby DSL called Flare/solrb and I  
> don't really know how the faceted search works because I cant add  
> whatever fields I want to to the index. This is currently not working:
> conn = Solr::Connection.new('http://localhost:8983/solr')
> doc = {:id => 1, :cat => 'eletronics', :features => 'video,  
> music', :product => 'iPod'}
> conn.send(Solr::Request::AddDocument.new(doc))
> => #<Solr::Response::AddDocument:0x554c2c  
> @status_message="ERROR:unknown field 'cat'", @status_code="400",  
> @raw_response="<result status=\"400\">ERROR:unknown field 'cat'</ 
> result>", @doc=<UNDEFINED> ... </>>
> In case that if it was working, what I'd like to do is:
> (pseudo-code)
> request = Solr::Request::Standard.new(
> :query => 'ipod',
> :facets => {
>  :fields => :cat
>  }
> )
> Any help would be appreciated.

I'm copying in Ed Summers, who may not be on solr-user now, but is a  
key contributor to solrb at the moment also.

Good question Peter.  Bear with this, as I want to detail lots here  
so folks understand what is going on with solrb a bit more clearly  
than svn commits and brief allusions.

There are a couple of important things to note here, specifically
about Solr itself.  It is driven by a schema (see
solr/solr/conf/schema.xml) which defines how fields are handled
within Solr/Lucene.  Solr needs to know what to do with field text
when it gets it from an <add>.  The solrb version of Solr's schema,
which varies from the schema that ships with the Solr example
application, locks things down to only three field naming
possibilities (id, *_text, and *_facet).  I intentionally started it
as simple as I could for now, knowing that opening up the schema is
inevitable and that we want to do it wisely, with a bit more
knowledge of how we want Ruby and Solr to interoperate.

Two relatively quick fix options to get you started:

   (A) difficulty: easy.  Rename your non-id fields to *_text and
*_facet.  For example:

        doc = {:id => 1, :cat_facet => 'eletronics',
               :features_facet => 'video, music', :product_text => 'iPod'}

   (B) difficulty: solr experienced only.  You're welcome to tweak  
the schema.xml and go to town with Request::AddDocument and any field  
names you want.  Be sure you know what you're doing with faceting,  
tokenization, and sorting though.

-- NOTE: If you're familiar with Solr, this will make sense as a  
difference to the Solr proper example schema --
   id: is mandatory and is a unique identifier for a document.  It
can be any string you like.  How searchable this id is depends on
what characters it contains; minimizing special characters makes it
easier to search for a specific id without worrying about query
parser syntax conflicts.

   *_text: is tokenized and copied into the "text" field (so the
client doesn't need to, and shouldn't, send a "text" field, only
*_text field names).  The default search field is "text" and includes
text from all *_text fields.

   *_facet: is not tokenized, and is suitable for use with the
faceting features that Solr supports.

The faceting feature is only starting to come together through the  
API, and so it's not yet easily exposed.  In fact, only earlier
today did the response handling refactoring allow for facets to be  

*** Sidebar ***
Why does the facet data come back as outside the 'response'  
structure?  Here's an example:

	'q'=>'[* TO *]',
	 '20th century.'=>1251,
	 '20th century'=>1250,
	 'History and criticism.'=>1769,

   (yes, i'm refactoring to add Yonik's latest facet changes in now!)

Have a look at the latest API, thanks in large part to Ed's ideas on  
where a Sol.rb DSL should head:


Here's the example pasted below:

   require 'solr'  # load the library
   include Solr    # Allow Solr:: to be omitted from class/module references

   # connect to the solr instance
   conn = Connection.new('http://localhost:8983/solr', :autocommit => :on)

   # add a document to the index
   conn.add(:id => 123, :title_text => 'Lucene in Action')

   # update the document
   conn.update(:id => 123, :title_text => 'Solr in Action')

   # print out the first hit in a query for 'action'
   response = conn.query('action')
   print response.hits[0]

   # iterate through all the hits for 'action'
   conn.query('action') do |hit|
     puts hit.inspect
   end

   # delete document by id
   conn.delete(123)

We'll expand this short little example to include a facet or two as  
well for demo purposes.  I'll do that in a day or so, after I upgrade  
solrb to Yonik's latest trunk changes for faceting.

In order to get facets from the trunk solrb API, I'm currently doing
this in Flare (code not yet checked in, in a Rails action):

     field = "#{params[:value]}_facet"
     req = Solr::Request::Standard.new(:query => "[* TO *]",
        :facets => {:fields => [field],
                    :limit => -1, :zeros => false, :missing => true},
        :rows => 0)

     results = SOLR.send(req)

     @facets = results.data['facet_counts']

In your data, name the field cat_facet and if you've indexed various  
categories, you'll see a dump of how many of each unique category  
there are in all of the data.  To constrain, add :filter_queries or  
adjust the main :query parameters to Request::Standard.
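
As a rough sketch of what you might do with that dump, the snippet
below sorts the counts for one field and builds a filter query string
for a chosen value.  The shape of facet_counts here (a 'facet_fields'
hash of value => count) is assumed from the sidebar dump and the
sample numbers are invented; top_facets and facet_filter are
hypothetical helpers, not solrb API.

```ruby
# Assumed shape, modeled on the sidebar dump: facet_counts holds a
# 'facet_fields' hash mapping each field name to {value => count}.
# The counts below are made-up sample data.
facet_counts = {
  'facet_fields' => {
    'cat_facet' => {'electronics' => 12, 'books' => 30, 'music' => 7}
  }
}

# Return [value, count] pairs for one facet field, highest count first.
# Ties are broken alphabetically; a nil key (from :missing) sorts as "".
def top_facets(facet_counts, field)
  counts = facet_counts.fetch('facet_fields', {}).fetch(field, {})
  counts.sort_by { |value, count| [-count, value.to_s] }
end

# Build a filter query string that constrains results to one facet value.
# Quoting the value keeps spaces and punctuation from being parsed as
# query syntax; embedded quotes and backslashes are backslash-escaped.
def facet_filter(field, value)
  escaped = value.to_s.gsub(/(["\\])/) { "\\#{$1}" }
  %(#{field}:"#{escaped}")
end

top = top_facets(facet_counts, 'cat_facet')
top.each { |value, count| puts "#{value} (#{count})" }
puts facet_filter('cat_facet', 'History and criticism.')
```

A string like the one facet_filter returns is the sort of thing you
would pass along in :filter_queries when a user clicks a facet value.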

Disclaimer: all of the API we've seen thus far is being tweaked
daily as we feel our way through this.  Early adopters who want to
tinker are welcome.  I don't envision this stabilizing into something
of 1.0 release quality for a couple to a few months, and by then
we'll have ironed out lots about field naming conventions (or
schema.xml generation from a Ruby model perhaps; maybe both
dynafields and schema generation are worth having).  I am aiming
for field name mapping magic to occur at some layer above the raw
solrb stuff you're using now, so you're talking closer to the metal
now than most RubySolrists will be in the near future.  Here's one
vision of a possible straw man future:

Welcome, Peter!

