lucene-solr-user mailing list archives

From George Everitt <gever...@appliedrelevance.com>
Subject Re: Can you parse the contents of a field to populate other fields?
Date Thu, 08 Nov 2007 05:42:20 GMT
I'm not sure I fully understand your ultimate goal or Yonik's  
response.  However, in the past I've been able to represent  
hierarchical data as a simple enumeration of delimited paths:

<field name="taxonomy">root</field>
<field name="taxonomy">root/region</field>
<field name="taxonomy">root/region/north america</field>
<field name="taxonomy">root/region/south america</field>

Then, at response time, you can walk the result facet and build a  
hierarchy with counts that can be put into a tree view.  The tree can  
be any arbitrary depth, and documents can live in any combination of  
nodes on the tree.
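
As a rough sketch of that walk (in Python, purely for illustration), the function below nests delimited paths into a tree with counts. It assumes the facet counts have already been pulled out of the response into a plain path-to-count mapping. The names build_tree and render and the sample counts are hypothetical, not part of Solr.

```python
def build_tree(path_counts, sep="/"):
    """Nest delimited facet paths into a dict tree that carries counts."""
    tree = {}
    for path, count in path_counts.items():
        node = tree
        parts = path.split(sep)
        for i, part in enumerate(parts):
            # Create intermediate nodes on demand; a parent that never
            # appears as its own facet value keeps a count of 0.
            entry = node.setdefault(part, {"count": 0, "children": {}})
            if i == len(parts) - 1:
                entry["count"] = count
            node = entry["children"]
    return tree


def render(tree, depth=0):
    """Return indented 'name (count)' lines, suitable for a tree view."""
    lines = []
    for name in sorted(tree):
        entry = tree[name]
        lines.append("  " * depth + f"{name} ({entry['count']})")
        lines.extend(render(entry["children"], depth + 1))
    return lines


# Hypothetical facet counts for the "taxonomy" field (made-up numbers).
facet_counts = {
    "root": 12,
    "root/region": 9,
    "root/region/north america": 6,
    "root/region/south america": 4,
}

tree = build_tree(facet_counts)
print("\n".join(render(tree)))
```

Because every ancestor path is indexed as its own value, each node's count comes straight from the facet, and nothing has to be summed up the tree.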

In addition, you can represent any arbitrary name/value pair
(attribute/tuple) as a two-level tree.  That way, you can put any
combination of attributes in the facet and parse them out when you
render the results list.  For example, you might be indexing computer
hardware.  Memory, bus speed, and resolution may be valid for some
objects but not for others.  Just put them in a facet and specify a
separator:

<field name="attribute">memory:1GB</field>
<field name="attribute">busspeed:133Mhz</field>
<field name="attribute">voltage:110/220</field>
<field name="attribute">manufacturer:Shiangtsu</field>


When you run a facet query, you can easily display the categories
appropriate to the object, and offer facet selections like "show me
all green things" and "show me all size 4 things".
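
Parsing the pairs back out might look something like the Python sketch below. It assumes the facet counts for the "attribute" field are already in a value-to-count mapping and that ":" is the separator. The function name group_attributes and the sample counts are made up for illustration.

```python
from collections import defaultdict


def group_attributes(facet_counts, sep=":"):
    """Group 'name:value' facet entries into {name: {value: count}}."""
    grouped = defaultdict(dict)
    for raw, count in facet_counts.items():
        # Split on the first separator only, so values that contain
        # other punctuation (e.g. "110/220") stay intact.
        name, found, value = raw.partition(sep)
        if not found:
            continue  # skip entries that are not name/value pairs
        grouped[name][value] = count
    return dict(grouped)


# Hypothetical facet counts for the "attribute" field (made-up numbers).
attribute_counts = {
    "memory:1GB": 7,
    "memory:2GB": 3,
    "busspeed:133Mhz": 5,
    "voltage:110/220": 2,
    "manufacturer:Shiangtsu": 10,
}

grouped = group_attributes(attribute_counts)
for name in sorted(grouped):
    print(name, grouped[name])
```

Each key of the result becomes one facet category in the display. A selection would then presumably be a filter query along the lines of fq=attribute:"memory:1GB", assuming the field is an untokenized string type so the whole pair is indexed as a single term.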


Even if that's not your goal, this might help someone else.


George Everitt







On Nov 7, 2007, at 3:15 PM, Kristen Roth wrote:

> So, I think I have things set up correctly in my schema, but it doesn't
> appear that any logic is being applied to my Category_# fields - they
> are being populated with the full string copied from the Category field
> (facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.
>
> I have several different field types, each with a different regex to
> match a specific part of the input string.  In this example, I'm
> matching facet1 in input string facet1::facet2::facet3...facetn
>
>    <fieldtype name="cat1str" class="solr.TextField">
>      <analyzer type="index">
>        <tokenizer class="solr.PatternTokenizerFactory"
>                   pattern="^([^:]+)" group="1"/>
>      </analyzer>
>    </fieldtype>
>
> I have copyfields set up for each Category_# field.  Anything
> obviously wrong?
>
> Thanks!
> Kristen
>
> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
> Sent: Wednesday, November 07, 2007 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Can you parse the contents of a field to populate other fields?
>
> On 11/6/07, Kristen Roth <kristen.roth@molecular.com> wrote:
>> Yonik - thanks so much for your help!  Just to clarify; where should
>> the regex go for each field?
>
> Each field should have a different FieldType (referenced by the "type"
> XML attribute).  Each fieldType can have its own analyzer.  You can
> use a different PatternTokenizer (which specifies a regex) for each
> analyzer.
>
> -Yonik
>

