lucene-solr-user mailing list archives

From Michael Della Bitta <michael.della.bi...@appinions.com>
Subject Re: Applying Tokenizers and Filters to CopyFields
Date Thu, 26 Mar 2015 15:24:48 GMT
Glad you are sorted out!

Michael Della Bitta

Senior Software Engineer

o: +1 646 532 3062

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
plus.google.com/appinions
<https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts>
w: appinions.com <http://www.appinions.com/>

On Thu, Mar 26, 2015 at 10:09 AM, Martin Wunderlich <martin_wu@gmx.net>
wrote:

> Thanks so much, Erick and Michael, for all the additional explanation.
> In the end, the crucial piece of information was the one about the
> default search field ("df"). In solrconfig.xml this parameter pointed
> to the field holding the original text, which is why the queries against the
> expanded text didn't work. When I set the df parameter to one of the fields
> with the expanded text, the search works fine. I have also removed the
> copyField declarations.
>
> It’s all working as expected now. Thanks again for the help.
>
> Cheers,
>
> Martin
>
>
>
>
> > On 25.03.2015 at 23:43, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > Martin:
> > Perhaps this will help:
> >
> > indexed=true, stored=true
> > The field can be searched, and the raw input (not analyzed in any way)
> > can be shown to the user in the results list.
> >
> > indexed=true, stored=false
> > The field can be searched. However, the field can't be returned with
> > the document in the results list.
> >
> > indexed=false, stored=true
> > The field cannot be searched, but its contents can be returned with the
> > document in the results list. There are some use cases where this is
> > desirable behavior.
> >
> > indexed=false, stored=false
> > The entire field is thrown out; it's just as if you hadn't sent the
> > field to be indexed at all.
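To make the four cases concrete, here is a toy Python model (not Solr code) of what each indexed/stored combination permits. `FieldFlags` and `capabilities` are hypothetical names, and the model deliberately ignores docValues and other per-field options that can also affect what comes back in results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldFlags:
    indexed: bool
    stored: bool


def capabilities(flags: FieldFlags) -> dict[str, bool]:
    """Summarize what a field with these flags allows (toy model, not Solr API)."""
    return {
        "searchable": flags.indexed,  # only indexed fields can be queried
        "returnable": flags.stored,   # only stored fields come back with the document
        "discarded": not (flags.indexed or flags.stored),  # neither: value is effectively dropped
    }


if __name__ == "__main__":
    for indexed in (True, False):
        for stored in (True, False):
            print(f"indexed={indexed}, stored={stored}:",
                  capabilities(FieldFlags(indexed, stored)))
```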
> >
> > And one other thing: the copyField gets the _raw_ data, not the
> > analyzed data. Let's say you have two fields, "src" and "dst".
> > A <copyField source="src" dest="dst"/> directive in schema.xml is
> > identical to sending
> > <add>
> >   <doc>
> >     <field name="src">original text</field>
> >     <field name="dst">original text</field>
> >   </doc>
> > </add>
> >
> > That is, copyField directives are not chained: if another copyField
> > used "dst" as its source, it would not pick up the value copied into "dst".
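The copyField behavior can be sketched as a toy Python function, assuming a hypothetical `apply_copy_fields` helper and a flat document of single-valued string fields. It copies raw values only from the fields the client actually sent, so a rule whose source is itself a copyField destination never fires.

```python
def apply_copy_fields(
    doc: dict[str, str], rules: list[tuple[str, str]]
) -> dict[str, list[str]]:
    """Toy copyField model: copy raw values only from fields the client sent."""
    out: dict[str, list[str]] = {name: [value] for name, value in doc.items()}
    for source, dest in rules:
        # Only the incoming document is consulted, so values that arrived in a
        # field via another rule are never copied again (no chaining).
        if source in doc:
            # The raw, unanalyzed value is copied; each field analyzes it later.
            out.setdefault(dest, []).append(doc[source])
    return out


rules = [("src", "dst"), ("dst", "other")]
print(apply_copy_fields({"src": "Original Text"}, rules))
# {'src': ['Original Text'], 'dst': ['Original Text']}
```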
> >
> > Also, watch out for your query syntax. Michael's comments are spot-on,
> > I'd just add this:
> >
> >
> > http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true
> >
> > is kind of odd. Let's assume you mean "qf" rather than "fq". That
> > _only_ matters if your query parser is "edismax"; I believe it'll be
> > ignored in this case.
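For comparison, a request that actually relies on "qf" would also select the parser with defType. The sketch below only builds such a URL with the standard library's `urllib.parse.urlencode`; the host and core name come from the example above, and `edismax_url` is a hypothetical helper. (fq, by contrast, is a filter query, so fq=original restricts results to documents matching the term "original" rather than choosing a field to search.)

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/windex/select"


def edismax_url(query: str, fields: list[str]) -> str:
    """Build a select URL that searches `fields` via the edismax parser."""
    params = {
        "q": query,
        "defType": "edismax",    # qf is only honored by the (e)dismax parsers
        "qf": " ".join(fields),  # space-separated list of fields to search
        "wt": "json",
    }
    return BASE + "?" + urlencode(params)


print(edismax_url("Sprache", ["src", "dst"]))
# http://localhost:8983/solr/windex/select?q=Sprache&defType=edismax&qf=src+dst&wt=json
```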
> >
> > You'd want something like
> > q=src:Sprache
> > or
> > q=dst:Sprache
> > or even
> > http://localhost:8983/solr/windex/select?q=Sprache&df=src
> > http://localhost:8983/solr/windex/select?q=Sprache&df=dst
> >
> > where "df" is the "default field": the search is applied against that
> > field when the query has no field qualification like the ones in my
> > first two examples.
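The field-qualified and df variants can be built the same way. Again this is only a URL-building sketch using the standard library; `select_url` is a hypothetical helper. Note that urlencode percent-encodes the colon in "dst:Sprache" as %3A, which is decoded again on the server side.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/windex/select"


def select_url(**params: str) -> str:
    """Append the given request parameters to the select handler URL."""
    return BASE + "?" + urlencode(params)


# Field named explicitly in q: matches only terms indexed in "dst".
fielded = select_url(q="dst:Sprache")
# Unqualified term: the parser searches the field given by df.
defaulted = select_url(q="Sprache", df="dst")

print(fielded)    # http://localhost:8983/solr/windex/select?q=dst%3ASprache
print(defaulted)  # http://localhost:8983/solr/windex/select?q=Sprache&df=dst
```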
> >
> > Best,
> > Erick
> >
> > On Wed, Mar 25, 2015 at 2:52 PM, Michael Della Bitta
> > <michael.della.bitta@appinions.com> wrote:
> >> I agree the terminology is possibly a little confusing.
> >>
> >> Stored refers to values that are kept verbatim, and you retrieve them
> >> verbatim; analysis does not affect stored values.
> >> Indexed values are tokenized/transformed and stored in an inverted index.
> >> You can't recover the literal analyzed version (at least, not easily).
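Martin's question about where the lower-case tokens live comes down to this split. The toy Python model below keeps the stored value verbatim and puts only the analyzed, lower-cased terms in an inverted index; the query is run through the same toy analyzer, which is roughly what happens when a Solr field type's query analyzer also lowercases. `ToyIndex` and `analyze` are hypothetical names, and the whitespace-plus-lowercase analyzer merely stands in for a real Solr analysis chain.

```python
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Toy analyzer: whitespace tokenizer followed by a lowercase filter."""
    return [token.lower() for token in text.split()]


class ToyIndex:
    def __init__(self) -> None:
        self.stored: dict[int, str] = {}                      # doc id -> verbatim value
        self.inverted: dict[str, set[int]] = defaultdict(set)  # term -> doc ids

    def add(self, doc_id: int, text: str) -> None:
        self.stored[doc_id] = text        # stored side: kept as-is, no analysis
        for term in analyze(text):        # indexed side: analyzed terms only
            self.inverted[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        """Return stored values of docs containing every analyzed query term."""
        matches = None
        for term in analyze(query):       # the query goes through the same analysis
            hits = self.inverted.get(term, set())
            matches = set(hits) if matches is None else matches & hits
        return [self.stored[i] for i in sorted(matches or set())]


idx = ToyIndex()
idx.add(1, "Die deutsche Sprache")
print(idx.search("SPRACHE"))      # ['Die deutsche Sprache']
print("Sprache" in idx.inverted)  # False: only the lower-cased term is indexed
```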
> >>
> >> If what you really want is to store and retrieve case-folded versions of
> >> your data as well as the original, you need to use something like an
> >> UpdateRequestProcessor, which I personally am less familiar with.
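An update processor, by contrast, changes the document itself before it is indexed and stored, so the stored value can be the case-folded one. Real UpdateRequestProcessors are Java classes configured in solrconfig.xml; the Python below is only a toy illustration of that ordering, and `add_lowercase_copy`, `run_chain`, and the field names are hypothetical.

```python
from typing import Callable

Doc = dict[str, str]
Processor = Callable[[Doc], Doc]


def add_lowercase_copy(source: str, dest: str) -> Processor:
    """Hypothetical processor: put a case-folded copy of `source` into `dest`."""

    def process(doc: Doc) -> Doc:
        if source in doc:
            # Return a new dict so the caller's document is not mutated.
            return {**doc, dest: doc[source].casefold()}
        return doc

    return process


def run_chain(doc: Doc, chain: list[Processor]) -> Doc:
    """Apply each processor in order, before the document would be stored."""
    for processor in chain:
        doc = processor(doc)
    return doc


folded = run_chain({"original": "Die deutsche Sprache"},
                   [add_lowercase_copy("original", "folded")])
print(folded)  # {'original': 'Die deutsche Sprache', 'folded': 'die deutsche sprache'}
```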
> >>
> >>
> >> On Wed, Mar 25, 2015 at 5:28 PM, Martin Wunderlich <martin_wu@gmx.net>
> >> wrote:
> >>
> >>> So, the pre-processing steps are applied under <analyzer type="index">.
> >>> This is the point that isn't quite clear to me: assuming that I have a
> >>> simple case-folding step applied to the target of the copyField, how or
> >>> where are the lower-case tokens stored, if the text isn't added to the
> >>> index? How is the query supposed to retrieve the lower-case version?
> >>> (Sorry if this sounds like a naive question, but I have a feeling that I
> >>> am missing something really basic here.)
> >>>
> >>
>
>
