lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Applying Tokenizers and Filters to CopyFields
Date Wed, 25 Mar 2015 22:43:10 GMT
Martin:
Perhaps this would help

indexed=true, stored=true
The field can be searched. The raw input (not analyzed in any way) can be
shown to the user in the results list.

indexed=true, stored=false
The field can be searched. However, the field can't be returned in the
results list with the document.

indexed=false, stored=true
The field cannot be searched, but the contents can be returned in the
results list with the document. There are some use-cases where this is
desirable behavior.

indexed=false, stored=false
The entire field is thrown out; it's just as if you didn't send the
field to be indexed at all.
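
In schema.xml terms, the four combinations above might look like the
sketch below. The field names are made up for illustration, and
"text_general" and "string" are assumed to be field types your schema
already defines (they are in the stock example schema), so adjust as
needed.

```xml
<!-- searchable, and the original input is returned with results -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- searchable, but never returned with results -->
<field name="body_search" type="text_general" indexed="true" stored="false"/>

<!-- returned with results, but not searchable -->
<field name="thumbnail_url" type="string" indexed="false" stored="true"/>

<!-- neither searchable nor returned: effectively ignored -->
<field name="scratch" type="string" indexed="false" stored="false"/>
```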

And one other thing: the copyField gets the _raw_ data, not the
analyzed data. Let's say you have two fields, "src" and "dst".
Copying from src to dst in schema.xml is identical to
<add>
  <doc>
    <field name="src">original text</field>
    <field name="dst">original text</field>
  </doc>
</add>

Note also that copyField directives are not chained: the output of one
copyField is never used as the source of another.
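
As a rough sketch of how this could look in schema.xml for the
case-folding scenario: "text_lower" and "third" are names invented for
this example, and "string" is assumed to be defined as in the stock
schema. The lower-casing happens when dst is analyzed at index time,
not when the value is copied.

```xml
<!-- hypothetical field type that only tokenizes and lower-cases -->
<fieldType name="text_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="src"   type="string"     indexed="true" stored="true"/>
<field name="dst"   type="text_lower" indexed="true" stored="false"/>
<field name="third" type="text_lower" indexed="true" stored="false"/>

<!-- dst receives the raw value sent for src, then applies its own analyzer -->
<copyField source="src" dest="dst"/>

<!-- not chained: third only receives values sent directly for dst,
     not what was copied into dst from src -->
<copyField source="dst" dest="third"/>
```

With this setup a search on dst matches the lower-cased tokens in the
index, while the stored value of src still comes back verbatim.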

Also, watch out for your query syntax. Michael's comments are spot-on,
I'd just add this:

http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true

is kind of odd. Let's assume you mean "qf" rather than "fq" ("fq" is a
filter query, which is something else entirely). "qf" _only_ matters if
your query parser is "dismax" or "edismax"; it'll be ignored in this
case, I believe.

You'd want something like
q=src:Sprache
or
q=dst:Sprache
or even
http://localhost:8983/solr/windex/select?q=Sprache&df=src
http://localhost:8983/solr/windex/select?q=Sprache&df=dst

where "df" is "default field" and the search is applied against that
field in the absence of a field qualification like my first two
examples.
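
If you don't want to pass these on every request, the same parameters
can be set as handler defaults in solrconfig.xml. This is only a sketch:
the "/select-edismax" handler name is invented for the example, and your
existing "/select" definition may already carry other defaults.

```xml
<!-- default field for unqualified terms with the standard query parser -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">src</str>
  </lst>
</requestHandler>

<!-- hypothetical second handler where "qf" is actually honored -->
<requestHandler name="/select-edismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">src dst</str>
  </lst>
</requestHandler>
```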

Best,
Erick

On Wed, Mar 25, 2015 at 2:52 PM, Michael Della Bitta
<michael.della.bitta@appinions.com> wrote:
> I agree the terminology is possibly a little confusing.
>
> Stored refers to values that are stored verbatim. You can retrieve them
> verbatim. Analysis does not affect stored values.
> Indexed values are tokenized/transformed and stored inverted. You can't
> recover the literal analyzed version (at least, not easily).
>
> If what you really want is to store and retrieve case folded versions of
> your data as well as the original, you need to use something like an
> UpdateRequestProcessor, which I personally am less familiar with.
>
>
> On Wed, Mar 25, 2015 at 5:28 PM, Martin Wunderlich <martin_wu@gmx.net>
> wrote:
>
>> So, the pre-processing steps are applied under <analyzer type="index">.
>> And this point is not quite clear to me: Assuming that I have a simple
>> case-folding step applied to the target of the copyField: How or where are
>> the lower-case tokens stored, if the text isn’t added to the index? How is
>> the query supposed to retrieve the lower-case version?
>> (sorry, if this sounds like a naive question, but I have a feeling that I
>> am missing something really basic here).
>>
>
>
> Michael Della Bitta
