lucene-java-user mailing list archives

From Alice Wong <airwayw...@gmail.com>
Subject Re: Associated values for a field and its value
Date Fri, 04 Oct 2013 15:08:40 GMT
Okay, it makes complete sense. Thanks.


On Fri, Oct 4, 2013 at 5:15 AM, Michael Sokolov <
msokolov@safaribooksonline.com> wrote:

>  On 10/3/13 6:04 PM, Alice Wong wrote:
>
>  Mike,
>
>  That's an interesting idea. The only drawback is that we have to re-parse the
> doc and find where it matches and what the associated values are. It could
> become a performance issue as the doc gets bigger and more complex.
>
> It's true there is some overhead for document-oriented processing.  Lux
> ameliorates this by storing a predigested binary xml form that can be
> traversed efficiently without the need for xml parsing.  However,
>
>
>  I am wondering if there is a way to index a value a1 for a field A and
> store a different value "1,2" associated with a1 in Lucene. Or there might
> be a hack for this?
>
> If you want to use only low-level Lucene constructs, I think payloads
> and/or complicated field values are the way to go.  You could, for example,
> index, for document D, a field called "extra" with values like "a1:1,2" and
> "a2:3,10".  I think that's what Aditya suggested. You still have to parse
> these though, so why not use a prebuilt flexible parsing infrastructure?
>
>
>  Thanks.
>
>
> On Thu, Oct 3, 2013 at 1:49 PM, Michael Sokolov <
> msokolov@safaribooksonline.com> wrote:
>
>>  On 10/02/2013 07:12 PM, Alice Wong wrote:
>>
>>> Hello,
>>>
>>> We would like to index some documents. Each field of a document may have
>>> multiple values, and each (field, value) pair has some associated values.
>>> These associated values are only for retrieval, not for searching.
>>>
>>> For example, a document D could have a field named A. This field has two
>>> values a1 and a2.
>>>
>>> It is easy to index D, adding term a1 and a2 to field A, so either query
>>> "A=a1" or "A=a2" will return D.
>>>
>>> Assume we have other values associated with (A,a1) and (A,a2) for D. We
>>> would like to retrieve these associated values depending on whether "A=a1"
>>> or "A=a2" is queried.
>>>
>>> For example, if query "A=a1" returns D, we would like to return values 1
>>> and 2. And if query "A=a2" returns D, we want to return values 3 and 10.
>>>
>>> Is it possible to do this with Lucene? Initially we wanted to hack the
>>> postings to return associated values, but this seems quite complex.
>>>
>>> Thanks!
>>>
>> Why not store a (non-indexed) text field with some internal structure
>> (XML, JSON, CSV) that you can analyze after retrieving it?  For example,
>>
>> <D>
>>   <A>
>>      <value>a1</value>
>>      <associated-values>
>>        ... whatever you want ...
>>      </associated-values>
>>   </A>
>> </D>
>>
>> If you use Lux (luxdb.org), which is XML query support on top of Lucene,
>> you can do this all automatically, and retrieve the results with a simple
>> query like:
>>
>> /D/A[value='a1']/associated-values
>>
>> Plus, if you want to pull out the values and manipulate them, you have
>> XQuery to do it with.
>>
>> -Mike
>>
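Outside Lux, the same lookup can be sketched with the JDK's built-in XPath support. This assumes the non-indexed stored field holds XML shaped like the <D> example above, and that <associated-values> contains hypothetical <v> children (the message leaves its contents open). The path filters each A element on its value child and compares it with a string, binding the queried term as the XPath variable $term rather than splicing it into the expression. The class and method names are illustrative only, not part of Lucene or Lux.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AssociatedValues {
    // Hypothetical stored-field content for document D; the <v> layout
    // inside <associated-values> is an assumption for illustration.
    static final String STORED_XML =
        "<D>"
      + "<A><value>a1</value><associated-values><v>1</v><v>2</v></associated-values></A>"
      + "<A><value>a2</value><associated-values><v>3</v><v>10</v></associated-values></A>"
      + "</D>";

    // Returns the associated values of the A element whose <value> equals term.
    static List<String> lookup(String xml, String term) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $term to the queried value (avoids quoting problems in the expression).
        xpath.setXPathVariableResolver(name -> term);
        NodeList nodes = (NodeList) xpath.evaluate(
            "/D/A[value=$term]/associated-values/v", doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup(STORED_XML, "a1")); // [1, 2]
        System.out.println(lookup(STORED_XML, "a2")); // [3, 10]
    }
}
```

In a real setup the XML string would come from the stored field of each hit returned by the "A=a1" or "A=a2" query, so only matching documents are parsed.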
>
>
>
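The "extra" field approach from the later reply can be sketched in plain Java. In Lucene 4.x each encoded string would typically be added to the document as a StoredField named "extra" next to the indexed field A, and read back from a hit with Document.getValues("extra"); the code below leaves Lucene out and shows only the encode and decode steps. The ExtraField class, its methods, and the "value:v1,v2" layout are assumptions for illustration, and the layout breaks if a value itself contains ':' or ','.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExtraField {
    // Encode one (value, associated values) pair, e.g. "a1:1,2".
    // Assumes neither the value nor the associated values contain ':' or ','.
    static String encode(String value, List<String> associated) {
        return value + ":" + String.join(",", associated);
    }

    // Given all stored "extra" strings of a hit, return the associated
    // values for the term that was queried (empty list if none match).
    static List<String> associatedFor(String[] extraValues, String term) {
        for (String entry : extraValues) {
            int colon = entry.indexOf(':');
            if (colon < 0) {
                continue; // skip malformed entries
            }
            if (entry.substring(0, colon).equals(term)) {
                String rest = entry.substring(colon + 1);
                return rest.isEmpty() ? new ArrayList<>() : Arrays.asList(rest.split(","));
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        String[] stored = {
            encode("a1", List.of("1", "2")),
            encode("a2", List.of("3", "10")),
        };
        System.out.println(Arrays.toString(stored));    // [a1:1,2, a2:3,10]
        System.out.println(associatedFor(stored, "a1")); // [1, 2]
        System.out.println(associatedFor(stored, "a2")); // [3, 10]
    }
}
```

Because the query term already tells you which value matched, the decode step is a simple prefix match over the stored strings; no re-parsing of a larger document is needed.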
