lucene-solr-user mailing list archives

From: Saïd Radhouani <r.steve....@gmail.com>
Subject: Re: Setting many properties for a multivalued field. Schema.xml ? External file?
Date: Sat, 26 Jun 2010 12:25:57 GMT
Thanks Geert-Jan, this is indeed very helpful.

The delimiters I gave were just for the sake of the example. I will use infrequent delimiters.
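For illustration, a minimal Java sketch of the client-side encode/decode step for the single concatenated field (Java because SolrJ is the client discussed in this thread). The class name PicCodec and the use of the ASCII unit separator (U+001F) and record separator (U+001E) as the "infrequent delimiters" are assumptions for this example, not choices made in the thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical codec for storing all pictures of a doc in one stored field.
// Assumption: U+001F separates the properties of one picture and U+001E
// separates pictures; neither should occur in URLs, captions or descriptions.
final class PicCodec {
    static final String FIELD_SEP = "\u001F";
    static final String RECORD_SEP = "\u001E";

    private PicCodec() {}

    /** Joins each {url, caption, description} triple into one string. */
    static String encode(List<String[]> pictures) {
        List<String> records = new ArrayList<>();
        for (String[] pic : pictures) {
            for (String part : pic) {
                // Refuse values that would make decoding ambiguous
                if (part.contains(FIELD_SEP) || part.contains(RECORD_SEP)) {
                    throw new IllegalArgumentException("value contains a delimiter: " + part);
                }
            }
            records.add(String.join(FIELD_SEP, pic));
        }
        return String.join(RECORD_SEP, records);
    }

    /** Splits the stored string back into {url, caption, description} triples. */
    static List<String[]> decode(String stored) {
        List<String[]> pictures = new ArrayList<>();
        if (stored == null || stored.isEmpty()) {
            return pictures;
        }
        for (String record : stored.split(Pattern.quote(RECORD_SEP), -1)) {
            // Limit -1 keeps empty trailing properties, e.g. a missing description
            pictures.add(record.split(Pattern.quote(FIELD_SEP), -1));
        }
        return pictures;
    }
}
```

Rejecting delimiter-containing values at encode time keeps decoding unambiguous; an escaping scheme would be the alternative if such values can occur in the data.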

Cheers,
-Saïd

On Jun 26, 2010, at 1:53 PM, Geert-Jan Brits wrote:

>> If I understand your suggestion correctly, you said that there's NO need to
>> have many Dynamic Fields; instead, we can have one definitive field name,
>> which can store a long string (concatenation of information about tens of
>> pictures), e.g., using "-" and "%" delimiters:
>> pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...
>> I don't clearly see the reason for doing this. Is there a gain in terms of
>> performance? Or does this make programming on the client side easier? Or
>> something else?
> 
> I think you should ask the exact opposite question. If you don't use these
> fields for anything Solr is particularly good at (searching, filtering,
> faceting, sorting), why go through the trouble of creating dynamic fields?
> More fields means more overhead and bookkeeping cost, no matter how you look
> at it.
> 
> Moreover, from the client's point of view the way I suggested is easier,
> since otherwise you:
> - would have to ask (through SolrJ) for all the dynamic fields to be
> returned via the fl parameter
> (http://wiki.apache.org/solr/CommonQueryParameters#fl). This is difficult,
> because a priori you don't know how many dynamic fields to request. In
> other words, you can't just ask Solr (through SolrJ, as you asked) to
> return all dynamic fields beginning with pic_* (afaik).
> - your client code that loops over the pics is a bit more involved.
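To make the second point concrete, here is a rough Java sketch of the client-side loop needed when pictures are spread over numbered dynamic fields. It assumes the returned document is available as a plain Map from field name to stored value, that all of its stored fields were returned, and that fields follow the pic_url_N / pic_caption_N / pic_description_N naming from Otis's example; the class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: regrouping numbered dynamic fields into pictures on the client.
// Assumption: doc maps field name -> stored value and contains every pic_* field.
final class DynamicPicReader {
    private static final Pattern PIC_URL = Pattern.compile("pic_url_(\\d+)");

    private DynamicPicReader() {}

    /** Returns {url, caption, description} per picture, ordered by picture number. */
    static List<String[]> readPictures(Map<String, Object> doc) {
        // TreeMap keyed by the numeric suffix, so pic_url_10 sorts after pic_url_2
        TreeMap<Integer, String[]> byNumber = new TreeMap<>();
        for (String field : doc.keySet()) {
            Matcher m = PIC_URL.matcher(field);
            if (!m.matches()) {
                continue; // not a picture URL field (e.g. "id")
            }
            String n = m.group(1);
            byNumber.put(Integer.parseInt(n), new String[] {
                String.valueOf(doc.get(field)),
                stringOrEmpty(doc.get("pic_caption_" + n)),
                stringOrEmpty(doc.get("pic_description_" + n))
            });
        }
        return new ArrayList<>(byNumber.values());
    }

    private static String stringOrEmpty(Object value) {
        return value == null ? "" : value.toString();
    }
}
```

Note the numeric ordering: sorting the field names as strings would put pic_url_10 before pic_url_2. With a single concatenated field, this whole loop reduces to splitting one string.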
> 
> HTH, Cheers,
> 
> Geert-Jan
> 
> 2010/6/26 Saïd Radhouani <r.steve.pdx@gmail.com>
> 
>> Thanks Geert-Jan for the detailed answer. Actually, I don't search at all
>> on these fields. I'm only filtering (w/ vs. w/o pic) and sorting (based on
>> the number of pictures). Thus, your suggestion of adding an extra field
>> NrOfPics [0,N] would be the best solution.
>> 
>> Regarding the other suggestion:
>> 
>>> If you don't need search at all on these fields, the best thing imo is to
>>> store all pic-related info of all pics together by concatenating them with
>>> some delimiter which you know how to separate at the client side.
>>> That, or just store it in an external RDB, since Solr is just sitting on
>>> the data and not doing anything intelligent with it.
>> 
>> If I understand your suggestion correctly, you said that there's NO need to
>> have many Dynamic Fields; instead, we can have one definitive field name,
>> which can store a long string (concatenation of information about tens of
>> pictures), e.g., using "-" and "%" delimiters:
>> pic_url_value1-pic_caption_value1-pic_description_value1%pic_url_value2-pic_caption_value2-pic_description_value2%...
>> 
>> I don't clearly see the reason for doing this. Is there a gain in terms of
>> performance? Or does this make programming on the client-side easier? Or
>> something else?
>> 
>> 
>> My other question was: in case we use Dynamic Fields, is there any
>> documentation about using SolrJ for this purpose?
>> 
>> Thanks
>> -Saïd
>> 
>> On Jun 26, 2010, at 12:29 PM, Geert-Jan Brits wrote:
>> 
>>> You can treat dynamic fields like any other field, so you can facet,
>>> sort, filter, etc. on these fields (afaik).
>>>
>>> I believe the confusion arises because the use case for dynamic fields is
>>> sometimes misunderstood, i.e., people expect to use them for some kind of
>>> wildcard search, e.g., searching for a value in all of the dynamic fields
>>> at once, like pic_url_*. This, however, is NOT possible.
>>> 
>>> As far as your question goes:
>>> 
>>>> Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc
>>>> w/o pic
>>>> To the best of my knowledge, everyone is saying that faceting cannot be
>>>> done on dynamic fields (only on definitive field names). Thus, I tried the
>>>> following and it's working: I assume that the stored pictures have a
>>>> sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index,
>>>> it means that the underlying doc has at least one picture:
>>>> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
>>>> While this is working fine, I'm wondering whether there's a cleaner way to
>>>> do the same thing without assuming that pictures have a sequential number.
>>> 
>>> If I understand your question correctly: faceting on docs with and
>>> without pics could of course be done like you mention; however, it would
>>> be more efficient to define an extra field, hasAtLeastOnePic, with values
>>> (0 | 1) and use that to facet / filter on.
>>>
>>> You can extend this to NrOfPics [0,N) if you need to filter / facet on
>>> docs with a certain number of pics.
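A sketch of that suggestion in Java, assuming NrOfPics is declared in schema.xml as an indexed numeric field type that supports range queries, and that the indexing code fills it with the picture count. It only uses the standard Solr request parameters facet.query, fq and sort; the class and method names are made up for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

// Sketch of the "extra field" approach: compute the picture count once at
// index time, then facet / filter / sort on that single field.
// Assumption: NrOfPics is an indexed, single-valued numeric field.
final class NrOfPicsQuery {
    private NrOfPicsQuery() {}

    /** Value to index in NrOfPics; hasAtLeastOnePic would be nrOfPics(...) > 0. */
    static int nrOfPics(List<?> pictures) {
        return pictures == null ? 0 : pictures.size();
    }

    /** Facet counts for "w/o pic" (NrOfPics:0) vs. "w/ pic" (NrOfPics:[1 TO *]). */
    static String facetParams(String q) {
        return join(new String[][] {
            {"q", q},
            {"facet", "on"},
            {"facet.query", "NrOfPics:0"},
            {"facet.query", "NrOfPics:[1 TO *]"},
        });
    }

    /** Keeps only docs with at least one picture, most pictures first. */
    static String withPicsParams(String q) {
        return join(new String[][] {
            {"q", q},
            {"fq", "NrOfPics:[1 TO *]"},
            {"sort", "NrOfPics desc"},
        });
    }

    // URL-encodes each value and joins name=value pairs with '&'
    private static String join(String[][] params) {
        StringBuilder sb = new StringBuilder();
        for (String[] p : params) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(p[0]).append('=').append(encode(p[1]));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM must support UTF-8
        }
    }
}
```

The facet counts are requested separately because Solr computes facet counts over the filtered result set, so combining them with the fq above would always report zero documents without pictures. In SolrJ the same parameters can be set on a SolrQuery (for example via addFacetQuery and addFilterQuery).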
>>> 
>>> Also, I wondered what else you wanted to do with this pic-related info. Do
>>> you want to search on pic-description / pic-caption, for instance? In that
>>> case the dynamic-fields approach may not be what you want: how would you
>>> know in which dynamic field to search for a particular term? Would it be
>>> pic_desc_1 or pic_desc_x? Of course you could OR over all dynamic fields,
>>> but you would need to know an upper bound for the number of pics, and it
>>> really doesn't feel right, to me at least.
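A small Java sketch of the OR-over-dynamic-fields query described above, to show why it needs an upper bound. The pic_desc_ prefix follows the field name used in this message; the helper name and the maxPics parameter are hypothetical, and the term is assumed to need no query-syntax escaping:

```java
import java.util.StringJoiner;

// Builds "<prefix>1:<term> OR <prefix>2:<term> OR ..." for pictures 1..maxPics.
// Assumption: term is a single word that needs no escaping of query syntax.
final class DynamicFieldOrQuery {
    private DynamicFieldOrQuery() {}

    static String orOverDynamicFields(String prefix, String term, int maxPics) {
        if (maxPics < 1) {
            throw new IllegalArgumentException("maxPics must be at least 1");
        }
        StringJoiner clauses = new StringJoiner(" OR ");
        for (int i = 1; i <= maxPics; i++) {
            // One clause per possible picture number; numbers above maxPics are never searched
            clauses.add(prefix + i + ":" + term);
        }
        return clauses.toString();
    }
}
```

Any picture numbered above maxPics is silently missed. With a single (multi-valued) pic_description field, the same search is simply pic_description:sunset.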
>>> 
>>> If you need search on pic_description, for instance, but don't mind which
>>> pic matches, you could create a single field pic_description, put the
>>> concatenation of all pic descriptions in it and search on that, or just
>>> make it a multi-valued field.
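A minimal Java sketch of the multi-valued alternative, where all descriptions of a document go into one pic_description field. It assumes pic_description is declared with multiValued="true" in schema.xml, represents the document as a plain Map from field name to values, and uses a hypothetical class name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: one multi-valued field holds every picture's description.
// Assumption: pic_description is multiValued="true" in schema.xml.
final class MultiValuedPicDoc {
    private MultiValuedPicDoc() {}

    /** pictures holds {url, caption, description}; returns field -> values. */
    static Map<String, List<String>> toFields(String id, List<String[]> pictures) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        List<String> ids = new ArrayList<>();
        ids.add(id);
        fields.put("id", ids);
        List<String> descriptions = new ArrayList<>();
        for (String[] pic : pictures) {
            descriptions.add(pic[2]); // the description is the third property
        }
        fields.put("pic_description", descriptions);
        return fields;
    }
}
```

With SolrJ, each value would typically be added with repeated SolrInputDocument.addField("pic_description", value) calls; a query on pic_description then matches whichever picture contains the term.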
>>> 
>>> If you don't need search at all on these fields, the best thing imo is to
>>> store all pic-related info of all pics together by concatenating them with
>>> some delimiter which you know how to separate at the client side.
>>> That, or just store it in an external RDB, since Solr is just sitting on
>>> the data and not doing anything intelligent with it.
>>> 
>>> I assume, btw, that you don't want to sort / facet on pic_desc /
>>> pic_caption / pic_url either (I have a hard time thinking of a useful use
>>> case for that).
>>> 
>>> HTH,
>>> 
>>> Geert-Jan
>>> 
>>> 
>>> 
>>> 2010/6/26 Saïd Radhouani <r.steve.pdx@gmail.com>
>>> 
>>>> Thanks so much Otis. This is working great.
>>>> 
>>>> Now, I'm trying to make facets on pictures: display doc w/ pic vs. doc
>>>> w/o pic
>>>> 
>>>> To the best of my knowledge, everyone is saying that faceting cannot be
>>>> done on dynamic fields (only on definitive field names). Thus, I tried the
>>>> following and it's working: I assume that the stored pictures have a
>>>> sequential number (_1, _2, etc.), i.e., if pic_url_1 exists in the index,
>>>> it means that the underlying doc has at least one picture:
>>>> 
>>>> ...&facet=on&facet.field=pic_url_1&facet.mincount=1&fq=pic_url_1:*
>>>> 
>>>> While this is working fine, I'm wondering whether there's a cleaner way to
>>>> do the same thing without assuming that pictures have a sequential number.
>>>> 
>>>> Also, do you have any documentation about handling Dynamic Fields using
>>>> SolrJ? So far, I have only found JIRA issues about it, but no
>>>> documentation.
>>>> 
>>>> Thanks a lot.
>>>> 
>>>> -Saïd
>>>> 
>>>> On Jun 26, 2010, at 1:18 AM, Otis Gospodnetic wrote:
>>>> 
>>>>> Saïd,
>>>>> 
>>>>> Dynamic fields could help here, for example imagine a doc with:
>>>>> id
>>>>> pic_url_*
>>>>> pic_caption_*
>>>>> pic_description_*
>>>>> 
>>>>> See http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
>>>>> 
>>>>> So, for you:
>>>>> 
>>>>> <dynamicField name="pic_url_*" type="string" indexed="true" stored="true"/>
>>>>> <dynamicField name="pic_caption_*" type="text" indexed="true" stored="true"/>
>>>>> <dynamicField name="pic_description_*" type="text" indexed="true" stored="true"/>
>>>>> 
>>>>> Then you can add docs with an unlimited number of
>>>>> pic_(url|caption|description)_* fields, e.g.
>>>>> 
>>>>> id
>>>>> pic_url_1
>>>>> pic_caption_1
>>>>> pic_description_1
>>>>> 
>>>>> id
>>>>> pic_url_2
>>>>> pic_caption_2
>>>>> pic_description_2
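A minimal Java sketch of how indexing code could flatten a document's pictures into the numbered dynamic fields declared above. It assumes each picture is a {url, caption, description} array and that numbering starts at 1, as in the example; the class name is hypothetical, and a plain Map stands in for the Solr document:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: flattening pictures into pic_url_N / pic_caption_N / pic_description_N.
// Assumption: each picture is {url, caption, description}; numbering starts at 1.
final class DynamicPicFields {
    private DynamicPicFields() {}

    static Map<String, String> toFields(String id, List<String[]> pictures) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", id);
        for (int i = 0; i < pictures.size(); i++) {
            String n = Integer.toString(i + 1); // 1-based suffix matching the dynamic field pattern
            String[] pic = pictures.get(i);
            fields.put("pic_url_" + n, pic[0]);
            fields.put("pic_caption_" + n, pic[1]);
            fields.put("pic_description_" + n, pic[2]);
        }
        return fields;
    }
}
```

With SolrJ, each entry would typically be passed to SolrInputDocument.addField(name, value).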
>>>>> 
>>>>> 
>>>>> Otis
>>>>> ----
>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>> 
>>>>> 
>>>>> 
>>>>> ----- Original Message ----
>>>>>> From: Saïd Radhouani <r.steve.pdx@gmail.com>
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Sent: Fri, June 25, 2010 6:01:13 PM
>>>>>> Subject: Setting many properties for a multivalued field. Schema.xml ? External file?
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to index data containing a multivalued field "picture",
>>>>>> that has three properties: url, caption and description:
>>>>>>
>>>>>> <picture>
>>>>>>     <url/>
>>>>>>     <caption/>
>>>>>>     <description/>
>>>>>> </picture>
>>>>>>
>>>>>> Thus, each indexed document might have many pictures, each of them with
>>>>>> a url, a caption, and a description.
>>>>>>
>>>>>> I wonder whether it's possible to store this data using only
>>>>>> schema.xml. I couldn't figure it out so far. Instead, I'm thinking of
>>>>>> using an external file to store the properties of each picture, but I
>>>>>> haven't tried this solution yet; I'm waiting for your suggestions...
>>>>>>
>>>>>> Thanks,
>>>>>> -Saïd
>>>> 
>>>> 
>> 
>> 

