lucene-solr-user mailing list archives

From Kojo <rbsnk...@gmail.com>
Subject Re: docValues
Date Fri, 24 Nov 2017 19:46:30 GMT
Erick,
thanks for explaining the memory aspects.

Regarding the end user perspective, our intention is to provide a first
layer of filtering, where data will be rolled up in some buckets and be
displayed in charts and tables.
When I mentioned providing access to "full" documents, it was not to display
them on the web, but to let the researcher download the data so he can dive
into it with his own tools (R, SPSS, whatever).

With this in mind, using the /select handler is the only solution I can see
for getting data with fields that are not docValues.
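
A minimal sketch of that download path, paging through /select with
cursorMark as Erick suggested, rather than asking for all rows in one request.
The collection URL, the field names and the uniqueKey name `id` are
assumptions for illustration, not taken from the actual schema:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(base_url, params):
    """Send one /select request and decode the JSON response."""
    with urlopen(base_url + "?" + urlencode(params)) as resp:
        return json.load(resp)


def iter_docs(base_url, query, fields, rows=1000, fetch=http_fetch):
    """Yield every matching document, one cursorMark page at a time."""
    cursor = "*"  # starting value for a new cursor
    while True:
        params = {
            "q": query,
            "fl": ",".join(fields),
            "rows": rows,
            "sort": "id asc",  # assumes the uniqueKey field is named "id"
            "cursorMark": cursor,
            "wt": "json",
        }
        page = fetch(base_url, params)
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Something like iter_docs("http://localhost:8983/solr/mycollection/select",
"*:*", ["id", "title", "description"]) could then be written out row by row
to a CSV file for the researcher, so the full result set never has to sit in
memory at once.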

Now that it is a bit clearer to me that memory will not be heavily
affected if I use docValues, I will start thinking about disk usage growth
and how much it impacts the infrastructure.

Thanks again,

2017-11-24 16:16 GMT-02:00 Erick Erickson <erickerickson@gmail.com>:

> Kojo:
>
> bq: My question is, isn't it too
> expensive in terms of memory consumption to enable docValues on fields that
> I don't need to facet, search etc?
>
> Well, yes and no. The memory consumed is your OS memory space and a
> small bit of control structures on your Java heap. It's a bit scary
> that your _index_ size will increase significantly on disk, but your
> Java heap requirements won't be correspondingly large.
>
> But there's a bigger issue here. Streaming is built to handle very
> large result sets in a map/reduce style form, i.e. subdivide the work
> amongst lots of nodes. If you want to return _all_ the records to the
> user along with description information and the like, what are they
> going to do with them? 10,000,000 rows (small by some streaming
> operations standards) is far too many to, say, display in a browser.
> And it's an anti-pattern to ask for, say, 10,000,000 rows with the
> select handler.
>
> You can page through these results, but it'll take a long time. So
> basically my question is whether this capability is useful enough to
> spend time on. If it is and you are going to return lots of rows
> consider paging through with cursorMark capabilities, see:
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> Best,
> Erick
>
> On Fri, Nov 24, 2017 at 9:38 AM, Kojo <rbsnkjmr@gmail.com> wrote:
> > I think I found the solution. After the analysis, I will change from the
> > /export request handler to the /select request handler in order to obtain
> > the other fields.
> > I will try that.
> >
> >
> >
> > 2017-11-24 15:15 GMT-02:00 Kojo <rbsnkjmr@gmail.com>:
> >
> >> Thank you very much for your answer, Shawn.
> >>
> >> That is it, I was looking for another way to include non-docValues fields
> >> in the filtered result documents.
> >> I can enable docValues on other fields and reindex everything if necessary.
> >> I will tell you about the use case, because I am not sure that I am on the
> >> right track.
> >>
> >> As I said before, I am using Streaming Expressions to deal with different
> >> collections. Up to this moment, it is decided that we will use this
> >> approach.
> >>
> >> The goal is to provide our users a web interface where they can make some
> >> queries. The backend will get Solr data using the Streaming Expressions
> >> REST API and will return rolled-up data to the frontend, which will display
> >> some charts and aggregated data.
> >> After that, the end user may want the data used to generate this
> >> aggregated information (not all fields of the filtered documents, but the
> >> fields used to aggregate information), combined with some other fields
> >> (title, description of document for example) which are not docValues. As
> >> you said, I need to add docValues to them. My question is, isn't it too
> >> expensive in terms of memory consumption to enable docValues on fields that
> >> I don't need to facet, search etc?
> >>
> >> I think that reconstructing a standard query that achieves the results
> >> of a complex Streaming Expression is not simple. This is why I want to
> >> use the same query used for the analysis to return full data via the
> >> export handler.
> >>
> >> I am sorry if this is so confusing.
> >>
> >> Thank you,
> >>
> >>
> >>
> >>
> >> 2017-11-24 12:36 GMT-02:00 Shawn Heisey <apache@elyograg.org>:
> >>
> >>> On 11/23/2017 1:51 PM, Kojo wrote:
> >>>
> >>>> I am working on Solr to develop a tool for analysis. I am using the
> >>>> search function of Streaming Expressions, which requires a field to be
> >>>> indexed with docValues enabled, so I can get it.
> >>>>
> >>>> Suppose that after someone finishes the analysis, they would like to get
> >>>> other fields of the result set that are not docValues enabled. How can
> >>>> it be done?
> >>>>
> >>>
> >>> We did get this message, but it's confusing as to exactly what you're
> >>> asking, which is why nobody responded.
> >>>
> >>> If you're saying that this theoretical person wants to use another field
> >>> with the streaming expression analysis you have provided, and that field
> >>> does not have docValues, then you'll need to add docValues to the field
> >>> and completely reindex.
> >>>
> >>> If you're asking something else, then you're going to need to provide
> >>> more details so we can actually know what you want to have happen.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>
> >>
>
