lucene-java-user mailing list archives

From Aurélien MAZOYER <aurelien.mazo...@gmail.com>
Subject Re: Running query against a single document
Date Thu, 27 Sep 2018 08:58:28 GMT
Hi Tom and Erick,
Thank you very much for your answers.

@Tom: Yes, we have considered MemoryIndex. But as far as I understand, we
would have to create a MemoryIndex containing a single document every time
we want to test our query against a document. I think we will need to run
some tests to be sure that this is efficient enough.
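
A minimal sketch of that per-document check, assuming Lucene 6.5.1 with the
lucene-memory module on the classpath. SingleDocMatcher, matches and
saintQuentinQuery are hypothetical names made up for illustration, and the
choice of StandardAnalyzer is an assumption; only the Lucene classes come
from the library. The sketch reuses one MemoryIndex through reset() rather
than allocating a new one per document:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Hypothetical helper: tests a query against one document at a time.
public class SingleDocMatcher {
    private final MemoryIndex index = new MemoryIndex();
    private final Analyzer analyzer;

    public SingleDocMatcher(Analyzer analyzer) {
        this.analyzer = analyzer;
    }

    // Returns true if the query matches a document made of this single field.
    public boolean matches(Query query, String fieldName, String text) {
        index.reset();                               // drop the previous document
        index.addField(fieldName, text, analyzer);   // index the new one in memory
        return index.search(query) > 0.0f;           // a score of 0.0 means no match
    }

    // Builds the query from the original question:
    // +spanFirst(field:saint, 1) +spanNear([field:saint, field:quentin], 0, true)
    public static Query saintQuentinQuery(String field) {
        SpanQuery saint = new SpanTermQuery(new Term(field, "saint"));
        SpanQuery quentin = new SpanTermQuery(new Term(field, "quentin"));
        return new BooleanQuery.Builder()
            .add(new SpanFirstQuery(saint, 1), BooleanClause.Occur.MUST)
            .add(new SpanNearQuery(new SpanQuery[] {saint, quentin}, 0, true),
                 BooleanClause.Occur.MUST)
            .build();
    }

    public static void main(String[] args) {
        SingleDocMatcher matcher = new SingleDocMatcher(new StandardAnalyzer());
        Query query = saintQuentinQuery("field");
        System.out.println("D1: " + matcher.matches(query, "field", "eglise saint quentin"));
        System.out.println("D2: " + matcher.matches(query, "field", "saint quentin deladadoupa"));
    }
}
```

With the two example documents from our first message, D1 should be
reported as a non-match and D2 as a match. Whether this is fast enough is
something we still have to measure.
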
@Erick:
We use this piece of code to run the highlighter directly on a TokenStream
created from a text string (fieldTextValue):

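// Scores fragments against the query we want to test
QueryScorer queryScorer = new QueryScorer(luceneQuery);
// Analyzes the raw text directly; no index is involved
TokenStream stream = TokenSources.getTokenStream(fieldName, fieldTextValue, analyzer);
Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), queryScorer);
// Merges contiguous fragments and returns at most 1000 of them
TextFragment[] frag = highlighter.getBestTextFragments(stream, fieldTextValue, true, 1000);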

It seems to work pretty well for some queries, but I am afraid it works on
a per-token basis and does not consider the context (I mean the adjacent
terms) when deciding whether a term is part of the match.
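
To spell out what we mean by context: for our example query, both span
conditions have to hold on the same document before any term should be
highlighted. The toy sketch below is plain Java without Lucene. SpanToy
and its methods are made up for illustration and only mimic the semantics
of this one query on whitespace-separated tokens; this is not how the
highlighter or the span queries are implemented.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of +spanFirst(saint, 1) +spanNear([saint, quentin], 0, true).
public class SpanToy {

    // True if the term occurs at a position p such that p + 1 <= end.
    static boolean spanFirst(List<String> tokens, String term, int end) {
        for (int i = 0; i < Math.min(end, tokens.size()); i++) {
            if (tokens.get(i).equals(term)) {
                return true;
            }
        }
        return false;
    }

    // True if 'first' is immediately followed by 'second' (in order, slop 0).
    static boolean adjacentInOrder(List<String> tokens, String first, String second) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(first) && tokens.get(i + 1).equals(second)) {
                return true;
            }
        }
        return false;
    }

    // Both clauses are mandatory, so the document matches only if both hold.
    static boolean matches(String text) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\s+"));
        return spanFirst(tokens, "saint", 1)
            && adjacentInOrder(tokens, "saint", "quentin");
    }

    public static void main(String[] args) {
        System.out.println("D1: " + matches("eglise saint quentin"));
        System.out.println("D2: " + matches("saint quentin deladadoupa"));
    }
}
```

In D1 the adjacency condition holds but "saint" is not the first token, so
the document as a whole does not match, which is why we did not expect any
highlighting there.
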
The Lucene explainer could fully address our needs, but as far as I know
it is not very efficient in terms of performance. We will test it as
well.
We could also combine it with Tom's suggestion: put each document in a
MemoryIndex and then run the explainer on that index.
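
A rough sketch of that combination, again assuming Lucene 6.5.1 with the
lucene-memory module. SingleDocExplainer and its explain method are
hypothetical names, and the TermQuery in main is only a stand-in for the
real query; the single document in a MemoryIndex is assumed to have doc
id 0:

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper: explains a query against one in-memory document.
public class SingleDocExplainer {

    public static Explanation explain(Query query, String fieldName, String text,
                                      Analyzer analyzer) throws IOException {
        MemoryIndex index = new MemoryIndex();
        index.addField(fieldName, text, analyzer);
        IndexSearcher searcher = index.createSearcher();
        // The MemoryIndex holds exactly one document, addressed here as doc 0.
        return searcher.explain(query, 0);
    }

    public static void main(String[] args) throws IOException {
        Query query = new TermQuery(new Term("field", "saint"));
        Explanation explanation =
            explain(query, "field", "saint quentin deladadoupa", new StandardAnalyzer());
        System.out.println("match: " + explanation.isMatch());
        System.out.println(explanation);   // per-clause breakdown of the score
    }
}
```

Explanation.isMatch() gives the yes/no answer, and the printed tree shows
which clauses contributed, at whatever cost the explain call turns out to
have in our tests.
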

Aurelien and Andrey
Tchiota GMBH

On Fri, 21 Sept 2018 at 16:57, Erick Erickson <erickerickson@gmail.com>
wrote:

> bq. We would like to know if there is a way to test a query against a
> document without creating an index. We were thinking that maybe we could
> use lucene highlighter component to achieve this,
>
> I don't really understand this at all. How are you using the
> highlighter component without creating an index? Custom code?
>
> But that aside, there are dozens, if not hundreds of examples of this
> in the Solr test code. You could write a Solr junit test, which
> is "just some Java code" and run that.
>
> To execute this within the test framework, you have two options:
> 1> from the top level "ant -Dtestcase=custom_test test", which takes a
> long time to run
> 2> from solr/core "ant -Dtestcase=custom_test test-nocompile". You
> have to have compiled your code of course for this to work.
>
> BTW, if you skip all that and just use a Solr instance, one very
> useful trick is to use &debug=true&debug.explainOther
> (https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html).
> That will show you exactly how the doc was
> scored _whether or not_ it would have been returned by the primary query.
>
> Best,
> Erick
> On Fri, Sep 21, 2018 at 6:16 AM Tom Mortimer <tom@flax.co.uk> wrote:
> >
> > Hi,
> >
> > Have you considered using MemoryIndex
> > <https://lucene.apache.org/core/6_5_1/memory/org/apache/lucene/index/memory/MemoryIndex.html>?
> >
> > cheers,
> > Tom
> >
> >
> > tel +44 8700 118334 : mobile +44 7876 741014 : skype tommortimer
> >
> >
> > On Fri, 21 Sep 2018 at 13:58, Aurélien MAZOYER
> > <aurelien.mazoyer@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > We would like to know if there is a way to test a query against a
> > > document without creating an index. We were thinking that maybe we
> > > could use lucene highlighter component to achieve this, but it seems
> > > it doesn't work as expected with complex queries.
> > > For example, we created a SpanQuery (+spanFirst(field:saint, 1)
> > > +spanNear([field:saint, field:quentin], 0, true)) and tested it
> > > against two documents:
> > > D1={field=eglise saint quentin}
> > > D2={field=saint quentin deladadoupa}
> > > We expected to get these entries from the highlighter:
> > > D1 eglise saint quentin
> > > D2 <B>saint</B> <B>quentin</B> deladadoupa
> > > But we got
> > > eglise <B>saint</B> <B>quentin</B> for D1, which is unexpected from
> > > our perspective because D1 doesn't match our SpanQuery.
> > > Do you have any idea whether this approach is correct, or whether we
> > > should use some other way to achieve this functionality?
> > > FYI we use Lucene 6.5.1.
> > >
> > > Thank you for your help,
> > >
> > > Regards,
> > >
> > > Aurelien and Andrey
> > > Tchiota GMBH
> > >
