lucene-solr-user mailing list archives

From Joel Bernstein <joels...@gmail.com>
Subject Re: Question regarding the SQL interface
Date Fri, 20 May 2016 01:35:25 GMT
I just reviewed the testPredicate method in the test cases:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/handler/TestSQLHandler.java

All the test cases in testPredicate() are formatted like regular SQL. Given
the way things are designed, I don't think you can write a valid query that
searches across combined fields the way a typical search does.
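
Since each field needs its own predicate, the OR clause can be generated on the client side. The following is only a minimal sketch: the helper names `multi_field_predicate` and `build_select` are hypothetical, the collection and field names come from the question quoted below, and quote escaping is deliberately refused rather than guessed because it is not covered in this thread.

```python
def multi_field_predicate(fields, term):
    """Build "f1 = 'term' OR f2 = 'term' ..." for a SQL WHERE clause."""
    if not fields:
        raise ValueError("at least one field is required")
    if "'" in term:
        # Escaping rules are not discussed in this thread, so refuse
        # terms containing a quote instead of guessing.
        raise ValueError("term must not contain a single quote")
    return " OR ".join(f"{field} = '{term}'" for field in fields)


def build_select(columns, collection, fields, term):
    """Assemble a full SELECT statement that matches term in any field."""
    where = multi_field_predicate(fields, term)
    return f"SELECT {', '.join(columns)} FROM {collection} WHERE {where}"


stmt = build_select(["x", "y", "z"], "collection1", ["title", "description"], "abc")
print(stmt)
# SELECT x, y, z FROM collection1 WHERE title = 'abc' OR description = 'abc'
```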

You will have to make separate calls for each aggregation. To get faceting
performance you would use the facet aggregationMode.
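
As a rough illustration of "one call per aggregation", the sketch below builds one request per facet field for the /sql handler, passing the statement as `stmt` and selecting facet mode with `aggregationMode=facet`. The base URL, the helper name `facet_sql_request`, and the LIMIT of 10 are assumptions for the example; the collection and field names come from the question quoted below. It only constructs the requests and does not send them.

```python
from urllib.parse import urlencode

# Assumption: a Solr node listening on the default local port.
SOLR_BASE = "http://localhost:8983/solr"


def facet_sql_request(collection, facet_field, predicate, limit=10):
    """Return (url, body, stmt) for one GROUP BY aggregation in facet mode."""
    stmt = (
        f"SELECT {facet_field}, count(*) FROM {collection} "
        f"WHERE {predicate} "
        f"GROUP BY {facet_field} ORDER BY count(*) desc LIMIT {limit}"
    )
    url = f"{SOLR_BASE}/{collection}/sql?" + urlencode({"aggregationMode": "facet"})
    body = urlencode({"stmt": stmt}).encode("utf-8")
    return url, body, stmt


# One request per facet field from the original /select query.
requests_to_send = [
    facet_sql_request("collection1", field, "title = 'abc'")
    for field in ("xyz", "def")
]

for url, body, stmt in requests_to_send:
    print(url)
    print(stmt)
```

Each pair could then be posted with something like `urllib.request.urlopen(url, data=body)`. The query that returns the matching documents themselves would be a further, separate call.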

The SQL predicate gets rewritten to a valid Solr query, and then gets
handled by the QueryComponent, like a regular query. So any field
definitions should work fine. But scoring is only performed for queries
with a LIMIT clause.

With the cardinality issue you'll need to experiment a little to see where
the facet mode starts to slow down and lose accuracy. In the future we'll
be moving to streaming facets so cardinality won't be an issue even in
facet mode. So in future releases MapReduce will only be used to handle
distributed joins.

Facet mode uses the JSON facet API. It scales reasonably well, but I
don't believe it provides fully accurate counts because it doesn't do the
refinement step. In my testing I didn't push it far enough to make it fall
over, but eventually it will, because it keeps all the aggregation buckets
in memory at once. MapReduce mode is always accurate no matter how high the
cardinality gets.
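
To see why skipping a refinement step can cost accuracy, here is a toy model in which each shard reports only its top-N buckets and the coordinator simply sums what it receives. This is not Solr's actual code, and the shard data and function names are made up for illustration. A value that falls just outside one shard's top-N gets undercounted in the merged result.

```python
from collections import Counter

# Hypothetical per-shard bucket counts for one field.
shard_counts = [
    Counter({"a": 10, "b": 9, "c": 8}),
    Counter({"c": 10, "d": 9, "a": 1}),
]


def merge_top_n_without_refinement(shards, n):
    """Sum only the top-n buckets each shard reports (no second pass)."""
    merged = Counter()
    for shard in shards:
        for value, count in shard.most_common(n):
            merged[value] += count
    return merged


def exact_counts(shards):
    """Sum every bucket from every shard."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total


approx = merge_top_n_without_refinement(shard_counts, n=2)
exact = exact_counts(shard_counts)

for value in sorted(exact):
    print(value, "approx:", approx[value], "exact:", exact[value])
# a approx: 10 exact: 11
# b approx: 9 exact: 9
# c approx: 10 exact: 18
# d approx: 9 exact: 9
```

A refinement pass would go back to the first shard and ask for its count of "c", which is the piece missing from the merged total here.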





Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 4:05 PM, Vachon, Jean-Sébastien <
jvachon@cebglobal.com> wrote:

> Hi all,
>
> I am planning on migrating our application from SolrJ to the SQL
> interface and I have some questions regarding some of Solr's features…
>
>
>   *   How can we specify multiple search fields for a keyword? Do we have
> to handle everything ourselves, like in regular SQL?
>
> SELECT x,y,z FROM collection1 WHERE title='abc' OR description='abc'
>
> Is there a special syntax for searching multiple fields at once?
>
>
>   *   Do you have to generate separate requests to get faceting
> information? Would translating the following query into its SQL equivalent
> require 3 queries?
>
> /select?q=title:abc&facet=true&facet.field=xyz&facet.field=def
>
>
>   *   If our schema contains a fieldType using a custom similarity class…
> will the SQL interface honour that mapping?
>
>   *   The documentation for Streaming Expressions and the SQL interface
> refers to terms like “high cardinality” and “very high cardinality”.
> What exactly do they mean? Are we talking about hundreds, thousands or
> millions of different values? Does this depend on other aspects of the
> collection, like the size of the documents?
>
> Thanks for your input and guidance
