lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: docValues
Date Fri, 24 Nov 2017 18:16:17 GMT
Kojo:

bq: My question is, isn't it too
expensive in terms of memory consumption to enable docValues on fields that
I don't need to facet, search, etc.?

Well, yes and no. The memory consumed is mostly OS memory (docValues are
memory-mapped, so they live in the OS page cache) plus a small amount of
control structures on your Java heap. Your _index_ size on disk will
increase significantly, which can look scary, but your Java heap
requirements won't grow correspondingly.

But there's a bigger issue here. Streaming is built to handle very
large result sets in a map/reduce style, i.e. by subdividing the work
among lots of nodes. If you want to return _all_ the records to the
user along with description information and the like, what are they
going to do with them? 10,000,000 rows (small by some streaming
operations standards) is far too many to, say, display in a browser.
And it's an anti-pattern to ask for, say, 10,000,000 rows with the
select handler.

You can page through these results, but it'll take a long time. So
basically my question is whether this capability is useful enough to
spend time on. If it is, and you are going to return lots of rows,
consider paging through them with cursorMark, see:
https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
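A minimal Python sketch of the cursorMark loop, for illustration only. It
assumes a caller-supplied `fetch_page` function (hypothetical) that sends one
/select request with the given parameters and returns the decoded JSON
response, and it assumes the collection's uniqueKey field is `id`. The first
request uses `cursorMark=*`, the sort includes the uniqueKey as a tie-breaker,
and the loop stops when Solr hands back the same `nextCursorMark` it was sent.

```python
from typing import Callable, Dict, Iterator

# Hypothetical signature: fetch_page(params) performs one /select request
# (e.g. with an HTTP client, wt=json) and returns the decoded JSON as a dict.
FetchPage = Callable[[Dict[str, str]], dict]


def iterate_with_cursor(fetch_page: FetchPage, query: str,
                        fields: str, rows: int = 1000) -> Iterator[dict]:
    """Yield every matching document, one page at a time, using cursorMark."""
    cursor = "*"  # '*' asks Solr for the first page
    while True:
        params = {
            "q": query,
            "fl": fields,
            "rows": str(rows),
            # The sort must include the uniqueKey field (assumed to be 'id')
            # so that every document has a stable, total position.
            "sort": "id asc",
            "cursorMark": cursor,
        }
        response = fetch_page(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        # Solr returns the same cursor mark once there are no more results.
        if next_cursor == cursor:
            break
        cursor = next_cursor
```

Because this goes through /select rather than /export, the fields listed in
`fl` can be ordinary stored fields and don't need docValues. Note that
cursorMark requests can't be combined with a non-zero `start`.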

Best,
Erick

On Fri, Nov 24, 2017 at 9:38 AM, Kojo <rbsnkjmr@gmail.com> wrote:
> I think I found the solution: after the analysis, switch from the /export
> request handler to the /select request handler to obtain the other fields.
> I will try that.
>
>
>
> 2017-11-24 15:15 GMT-02:00 Kojo <rbsnkjmr@gmail.com>:
>
>> Thank you very much for your answer, Shawn.
>>
>> That's it: I was looking for another way to include non-docValues fields
>> in the filtered result documents.
>> I can enable docValues on other fields and reindex everything if necessary.
>> Let me describe the use case, because I am not sure that I am on the
>> right track.
>>
>> As I said before, I am using Streaming Expressions to deal with different
>> collections. So far, we have decided to use this approach.
>>
>> The goal is to provide our users with a web interface where they can run
>> queries. The backend will fetch Solr data through the Streaming Expressions
>> REST API and return rolled-up data to the frontend, which will display
>> charts and aggregated data.
>> After that, the end user may want the data used to generate this
>> aggregated information (not all fields of the filtered documents, only the
>> fields used to aggregate it), combined with some other fields
>> (the title and description of the document, for example) which do not have
>> docValues. As you said, I need to add docValues to them. My question is,
>> isn't it too expensive in terms of memory consumption to enable docValues
>> on fields that I don't need to facet, search, etc.?
>>
>> I think reconstructing a standard query that reproduces the results of a
>> complex Streaming Expression is not simple. This is why I want to use the
>> same query used for the analysis to return the full data via the /export
>> handler.
>>
>> I am sorry if this is confusing.
>>
>> Thank you,
>>
>>
>>
>>
>> 2017-11-24 12:36 GMT-02:00 Shawn Heisey <apache@elyograg.org>:
>>
>>> On 11/23/2017 1:51 PM, Kojo wrote:
>>>
>>>> I am working on Solr to develop a tool for analysis. I am using the
>>>> search function of Streaming Expressions, which requires a field to have
>>>> docValues enabled so that I can retrieve it.
>>>>
>>>> Suppose that someone finishes the analysis and would like to get other
>>>> fields of the result set that are not docValues-enabled. How can that be
>>>> done?
>>>>
>>>
>>> We did get this message, but it's confusing as to exactly what you're
>>> asking, which is why nobody responded.
>>>
>>> If you're saying that this theoretical person wants to use another field
>>> with the streaming expression analysis you have provided, and that field
>>> does not have docValues, then you'll need to add docValues to the field and
>>> completely reindex.
>>>
>>> If you're asking something else, then you're going to need to provide
>>> more details so we can actually know what you want to have happen.
>>>
>>> Thanks,
>>> Shawn
>>>
>>
>>
