lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: field:* vs field:[* TO *]
Date Thu, 18 Apr 2019 07:13:12 GMT
Hi,

> I was pointed to Lucene from the Solr list. I am wondering if the
> performance of the below two queries is expected to be quite different and
> would they return the same set of results?
> 
> field:*
> field:[* TO *]

From the Lucene side they are identical, but it depends on the implementation in Solr's
query parser. They both iterate all terms in the field (if it’s a string field).

> The use case I am trying to optimize is returning all documents that
> contain any value for a given field, and I've noticed the queries to be
> quite slow especially for fields that have a large number of distinct
> values.

Unfortunately Solr has no optimized support for that. There are 2 issues open:

https://issues.apache.org/jira/browse/SOLR-11437
https://issues.apache.org/jira/browse/SOLR-12488

The approach proposed in those issues is the one Elasticsearch uses today. I can look into
implementing this (it's on my TODO list of issues).

In the meantime there is another efficient way to do this, but it requires you to index an
additional field. The nice thing about it is that it does not depend on the field properties
being correct (e.g., it does not need to differentiate between field types, or whether there
are docvalues or norms). The idea also came from Elasticsearch, which had this from the
first day. Until they switched to the above approach using DocValues/NormsExistsQuery,
Elasticsearch indexed a hidden internal field (invisible to the user) that powered the exists
query. This field was basically (in Solr speak) a "multivalued, non-tokenized, string" field.
It just contains the names of all fields that have a value. E.g., if you have a document:

{ "foo": "hello", "bar": 20, "text": "all fine" }

Your indexing code would extend this document to add an additional field (Solr won't do this
automatically like Elasticsearch):

{ "foo": "hello", "bar": 20, "text": "all fine", "fields": ["foo", "bar", "text"] }

Then you can query: &fq=fields:bar to filter to all documents that have a value in "bar".
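
As a rough illustration of the indexing-side step, here is a minimal Python sketch of how
client code might extend a document before sending it to Solr. The helper names
(add_field_names, has_value) and the rule for what counts as "having a value" are my own
assumptions for illustration, not something Solr or Elasticsearch prescribes; adapt them to
your schema.

```python
from typing import Any, Dict

# Name of the extra multivalued string field, taken from the example above.
# Any name works as long as the schema and the queries agree on it.
MARKER_FIELD = "fields"


def has_value(value: Any) -> bool:
    """Assumption: None, empty strings and empty collections count as 'no value'."""
    if value is None:
        return False
    if isinstance(value, (str, list, tuple, dict)):
        return len(value) > 0
    return True  # numbers and booleans (including 0 and False) count as values


def add_field_names(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of doc with MARKER_FIELD listing every field that has a value."""
    names = [name for name, value in doc.items()
             if name != MARKER_FIELD and has_value(value)]
    extended = dict(doc)  # shallow copy, the caller's document is left untouched
    extended[MARKER_FIELD] = names
    return extended


if __name__ == "__main__":
    print(add_field_names({"foo": "hello", "bar": 20, "text": "all fine"}))
```

The extended document is what you would index; the filter query on the marker field then
only has to look up a single term instead of iterating all terms of the original field.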

Uwe

