lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: Splitting fields
Date Tue, 31 May 2011 19:41:45 GMT
I'd go for this option as well. The example update processor couldn't make it
any easier, and it's a very flexible approach. Judging from the patch in
SOLR-2105, it should still work with the current 3.2 branch.

https://issues.apache.org/jira/browse/SOLR-2105
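
A minimal sketch of the splitting logic that the generic FieldSplitterProcessor Jan describes could run. To keep it compilable without Solr on the classpath, it models the document as a plain Map of field names to values instead of Solr's SolrInputDocument. The class name FieldSplitter, the "|" delimiter, the target field names, and the rule that the n-th token goes to the n-th target field are all illustrative assumptions. In a real processor, this logic would typically be invoked from processAdd() on the document returned by AddUpdateCommand.getSolrInputDocument().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical, Solr-free sketch of a configurable field splitter.
 * The document is modelled as a Map of field name to values; a real
 * UpdateRequestProcessor would operate on a SolrInputDocument instead.
 */
public class FieldSplitter {
    private final String sourceField;
    private final Pattern delimiter;
    private final List<String> targetFields;

    public FieldSplitter(String sourceField, String delimiterRegex, List<String> targetFields) {
        if (targetFields.isEmpty()) {
            throw new IllegalArgumentException("at least one target field is required");
        }
        this.sourceField = sourceField;
        this.delimiter = Pattern.compile(delimiterRegex);
        this.targetFields = List.copyOf(targetFields);
    }

    /** Splits every value of the source field and appends token i to target field i. */
    public void apply(Map<String, List<String>> doc) {
        List<String> values = doc.get(sourceField);
        if (values == null) {
            return; // source field absent: leave the document unchanged
        }
        // Iterate over a copy in case a target field is the source field itself.
        for (String value : new ArrayList<>(values)) {
            // A positive limit caps the number of tokens; any surplus
            // delimiters stay inside the last token.
            String[] parts = delimiter.split(value, targetFields.size());
            for (int i = 0; i < parts.length; i++) {
                doc.computeIfAbsent(targetFields.get(i), k -> new ArrayList<>())
                   .add(parts[i].trim());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("custom1", new ArrayList<>(List.of("42|Acme Corp")));
        new FieldSplitter("custom1", "\\|", List.of("customer_id", "customer_name")).apply(doc);
        System.out.println(doc);
        // prints {custom1=[42|Acme Corp], customer_id=[42], customer_name=[Acme Corp]}
    }
}
```

Passing the regex and target fields as constructor arguments mirrors the configurability Jan asks for; in Solr they would presumably come from the processor factory's init arguments in solrconfig.xml.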


> Hi,
> 
> Write a custom UpdateProcessor, which gives you full control of the
> SolrInputDocument prior to indexing. Ideally, you would write a generic
> FieldSplitterProcessor that is configurable in terms of which field to
> take as input, which delimiter or regex to split on and, finally, which
> fields to write the results to. That way others can reuse your code for
> their splitting needs.
> 
> See http://wiki.apache.org/solr/UpdateRequestProcessor and
> http://wiki.apache.org/solr/SolrConfigXml#UpdateRequestProcessorChain_section
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
> 
> On 27 May 2011, at 15:47, Joe Fitzgerald wrote:
> > Hello,
> >
> > I am in an odd position.  The application server I use has built-in
> > integration with Solr.  Unfortunately, its native capabilities are
> > fairly limited; specifically, it only supports a standard, pre-defined
> > set of fields that can be indexed.  As a result, I've been left
> > kludging how I work with Solr, doing things like putting what I'd
> > like to be multiple, separate fields into a single Solr field.
> >
> > As an example, I may put a customer id and name into a single field
> > called 'custom1'.  Ideally, I'd like this information to be returned in
> > separate fields... and, even better, indexed as separate fields, but I
> > can live without the latter.  Currently, I'm building a JSON
> > representation of this information, which makes it easy for me to deal
> > with when I extract the results... but it all feels wrong.
> >
> > I do have complete control over the actual Solr installation (just not
> > the indexing call to Solr), so I was hoping there might be a way to
> > configure Solr to take my single field and split it into a separate
> > field for each key in my JSON representation.
> >
> > I don't see anything native to Solr that would do this for me, but
> > there are a few features that sounded similar, and I was hoping to get
> > some opinions on how I might move forward with this...
> >
> > Might poly fields, such as the spatial location type, help?  Can I
> > build my own poly field that would split the main field into
> > subfields?  Do poly fields let me return the subfields?  I don't quite
> > have my head around poly fields yet.
> >
> > Another option, although I suspect it won't be considered a good
> > approach: what about extending the copyField functionality of
> > schema.xml to support my needs?  It doesn't seem entirely unreasonable
> > for copyField to provide a means of copying only a portion of the
> > source field's contents into the destination field, no?  I'm sure
> > people more familiar with Solr's architecture could explain why this
> > isn't really an appropriate thing for Solr to handle (just because it
> > could doesn't mean it should)...
> > 
> > The other, and probably best, option would be to use Solr directly,
> > bypassing the native integration of my application server, which
> > we've already done for most cases.  I'd love to go this route, but I'm
> > having a hard time figuring out how to "easily" replicate the
> > functionality provided by my app server integration... perhaps someone
> > on the list could help me with this path forward?  Here is what I'm
> > trying to accomplish:
> >
> > I'm indexing documents (text, PDF, HTML...), but I need to include
> > fields in my search results that are only available from a DB query.
> > I know how to have Solr index the results of a DB query, but I'm having
> > trouble getting it to also index the documents associated with each
> > record of that query (the full path/filename is one of the fields
> > returned by the query).
> >
> > I started trying to use the DataImportHandler for this by setting up a
> > FileDataSource in addition to my JDBC data source.  I tried to use the
> > FileDataSource to populate a sub-entity based on the DB field that
> > contains the full path/filename, but I wasn't sure how to reference
> > that field from the root query/entity.  Before I spent too much time on
> > it, I also realized I wasn't sure how to get Solr to handle binary file
> > types this way either; further reading suggests I would need to
> > leverage Tika.  Can that be done within the confines of the
> > DataImportHandler?
> >
> > Any advice is greatly appreciated.  Thanks in advance,
> >
> > Joe
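
For Joe's case, where custom1 holds a JSON object, the target field names could come from the JSON keys rather than from configuration. This minimal sketch assumes a flat object whose values are plain strings or numbers with no escaped quotes or nesting, and extracts them with a regular expression only to stay dependency-free; a real processor should use a proper JSON parser. The class name FlatJsonFieldSplitter, the custom1_ prefix, and the example JSON are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: turn a flat JSON object into one field per key. */
public class FlatJsonFieldSplitter {
    // Matches "key": "string value"  or  "key": number.
    // Assumption: flat object, no nested objects/arrays, no escaped quotes.
    private static final Pattern PAIR = Pattern.compile(
        "\"([^\"]+)\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+(?:\\.\\d+)?))");

    /** Returns prefix+key -> value for every key/value pair found. */
    public static Map<String, String> split(String prefix, String json) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            // group 2 holds a quoted string value, group 3 a numeric one
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            fields.put(prefix + m.group(1), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String custom1 = "{\"id\": 42, \"name\": \"Acme Corp\"}";
        System.out.println(split("custom1_", custom1));
        // prints {custom1_id=42, custom1_name=Acme Corp}
    }
}
```

Prefixing the keys keeps the generated names in a predictable namespace, so a single dynamic field such as custom1_* declared in schema.xml could cover whatever keys appear.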

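For the direct-to-Solr route, one alternative to the DataImportHandler (a swapped-in technique, not something proposed in this thread) is Solr Cell, the Tika-based ExtractingRequestHandler usually mapped at /update/extract: for each database row, send the associated file and pass the row's columns as literal.<field> parameters. This minimal sketch only builds the request URL for one row. It assumes the file is readable on the Solr server's own filesystem via the stream.file parameter, which requires remote streaming to be enabled in solrconfig.xml. The class name, base URL, file path, and example row are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: build a Solr Cell request for one DB row. */
public class ExtractRequestBuilder {
    /**
     * stream.file points Solr at the file on the server's disk, and each
     * DB column becomes a literal.<column> parameter on the indexed document.
     */
    public static String build(String solrBase, String filePath, Map<String, String> dbRow) {
        StringJoiner params = new StringJoiner("&");
        params.add("stream.file=" + enc(filePath));
        for (Map.Entry<String, String> col : dbRow.entrySet()) {
            params.add("literal." + enc(col.getKey()) + "=" + enc(col.getValue()));
        }
        return solrBase + "/update/extract?" + params;
    }

    // Form-encodes a parameter name or value as UTF-8 (space becomes '+').
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "doc-17");
        row.put("customer_name", "Acme Corp");
        System.out.println(build("http://localhost:8983/solr", "/data/docs/report.pdf", row));
    }
}
```

Actually issuing the request (for example with an HTTP POST, or through SolrJ's ContentStreamUpdateRequest) and committing afterwards are left out of the sketch.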