lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Splitting fields
Date Tue, 31 May 2011 19:34:57 GMT
Hi,

Write a custom UpdateRequestProcessor, which gives you full control of the SolrInputDocument
before it is indexed. Ideally, write a generic FieldSplitterProcessor that lets you configure
which field to take as input, which delimiter or regex to split on, and which fields to write
the results to. That way others can reuse your code for their own splitting needs.

See http://wiki.apache.org/solr/UpdateRequestProcessor and http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
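
As a rough illustration of the splitting logic such a processor could contain, here is a
minimal sketch in plain Java. It deliberately uses a java.util.Map as a stand-in for the
SolrInputDocument so that it runs without Solr on the classpath. The class name FieldSplitter,
the field names custom1, customer_id and customer_name, and the "|" delimiter are all
assumptions made for the example, not anything Solr ships.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: splits one source field into several target fields.
// A Map stands in for the Solr document here; in a real update processor the
// same logic would run against the document inside processAdd().
final class FieldSplitter {
    private final String sourceField;
    private final Pattern delimiter;
    private final List<String> targetFields;

    FieldSplitter(String sourceField, String delimiterRegex, List<String> targetFields) {
        if (targetFields.isEmpty()) {
            throw new IllegalArgumentException("at least one target field is required");
        }
        this.sourceField = sourceField;
        this.delimiter = Pattern.compile(delimiterRegex);
        this.targetFields = targetFields;
    }

    // Writes the i-th piece of the source value into the i-th target field.
    // The source field itself is left untouched.
    void process(Map<String, Object> doc) {
        Object raw = doc.get(sourceField);
        if (raw == null) {
            return; // nothing to split
        }
        // The limit caps the number of pieces at the number of target fields,
        // so any extra delimiters stay inside the last piece.
        String[] parts = delimiter.split(raw.toString(), targetFields.size());
        for (int i = 0; i < parts.length; i++) {
            doc.put(targetFields.get(i), parts[i].trim());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("custom1", "42|Acme Corp"); // assumed "id|name" layout
        FieldSplitter splitter = new FieldSplitter(
                "custom1", "\\|", Arrays.asList("customer_id", "customer_name"));
        splitter.process(doc);
        System.out.println(doc);
    }
}
```

In an actual processor, the source field, the regex and the target fields would be read
from the processor's configuration in solrconfig.xml rather than hard-coded, and the target
fields would need to exist in schema.xml (or match a dynamic field).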

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 27 May 2011, at 15:47, Joe Fitzgerald wrote:

> Hello,
> 
> 
> 
> I am in an odd position.  The application server I use has built-in
> integration with SOLR.  Unfortunately, its native capabilities are
> fairly limited, specifically, it only supports a standard/pre-defined
> set of fields which can be indexed.  As a result, it has left me
> kludging how I work with Solr and doing things like putting what I'd
> like to be multiple, separate fields into a single Solr field.
> 
> 
> 
> As an example, I may put a customer id and name into a single field
> called 'custom1'.  Ideally, I'd like this information to be returned in
> separate fields...and even better would be for them to be indexed as
> separate fields but I can live without the latter.  Currently, I'm
> building out a json representation of this information which makes it
> easy for me to deal with when I extract the results...but it all feels
> wrong.
> 
> 
> 
> I do have complete control over the actual Solr installation (just not
> the indexing call to Solr), so I was hoping there may be a way to
> configure Solr to take my single field and split it up into a different
> field for each key in my json representation.
> 
> 
> 
> I don't see anything native to Solr that would do this for me but there
> are a few features that I thought sounded similar and was hoping to get
> some opinions on how I may be able to move forward with this...
> 
> 
> 
> Poly fields, such as the spatial location, might help?  Can I build my
> own poly-field that would split up the main field into subfields?  Do
> poly-fields let me return the subfields?  I don't quite have my head
> around polyfields yet.
> 
> 
> 
> Another option although I suspect this won't be considered a good
> approach, but what about extending the copyField functionality of
> schema.xml to support my needs?  It would seem not entirely unreasonable
> that copyField would provide a means to extract only a portion of the
> contents of the source field to place in the destination field, no?  I'm
> sure people more familiar with Solr's architecture could explain why
> this isn't really an appropriate thing for Solr to handle (just because
> it could doesn't mean it should)...
> 
> The other - and probably best -- option would be to leverage Solr
> directly, bypassing the native integration of my application server,
> which we've already done for most cases.  I'd love to go this route but
> I'm having a hard time figuring out how to "easily" accomplish the same
> functionality provided by my app server integration...perhaps someone on
> the list could help me with this path forward?  Here is what I'm trying
> to accomplish:
> 
> 
> 
> I'm indexing documents (text, pdf, html...) but I need to include fields
> in the results of my searches which are only available from a db query.
> I know how to have Solr index results from a db query, but I'm having
> trouble getting it to index the documents that are associated to each
> record of that query (full path/filename is one of the fields of that
> query).
> 
> 
> 
> I started to try to use the dataImport handler to do this, by setting up
> a FileDataSource in addition to my jdbc data source.  I tried to
> leverage the filedatasource to populate a sub-entity based on the db
> field that contains the full path/filename, but I wasn't sure how to
> specify the db field from the root query/entity.  Before I spent too
> much time, I also realized I wasn't sure how to get Solr to deal with
> binary file types this way either which upon further reading seemed like
> I would need to leverage Tika - can that be done within the confines of
> dataimporthandler?
> 
> 
> 
> Any advice is greatly appreciated.  Thanks in advance,
> 
> 
> 
> Joe
> 

