lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: List all Collections together with number of records
Date Mon, 08 Jun 2015 12:39:58 GMT
We're thinking of writing a custom request handler to do that, although the
handler will also query all the collections at the backend.

Will this lead to a faster response speed for the user?

Regards,
Edwin


On 8 June 2015 at 00:06, Erick Erickson <erickerickson@gmail.com> wrote:

> bq: we still need those information to be stored in a separate collection
> for security reasons.
>
> Not necessarily. I've seen lots of installations where "auth tokens" are
> embedded in the document (say groups that can see this doc). Then
> the front-end simply attaches &fq=auth_field:(groups each user belongs to)
> to every query to restrict access.
>
> That said, some organizations aren't comfortable with this and demand
> separate collections, in which case you're stuck.
>
> You've defined an architecture though, and one of the consequences
> of that is if you have many collections, you'll have to fire off many
> queries (perhaps in parallel, but still). There's no magic to get around
> that. And it really doesn't matter, because in what you've described
> what has to happen is one query has to be fired to each collection.
> It doesn't matter whether Solr does that for you or you spawn a bunch
> of threads on the client, the same work has to happen somewhere.
>
> You also have to figure out how to present the results to the user.
> If it's a simple count you're OK, but scores will _not_ be comparable
> across the various collections, so the presentation will be challenging.
>
> Best,
> Erick
>
> On Sun, Jun 7, 2015 at 6:29 AM, Zheng Lin Edwin Yeo
> <edwinyeozl@gmail.com> wrote:
> > The reason we want to have different collections is that each of the
> > collections has different fields, and some collections will contain
> > information that is more sensitive than others.
> >
> > As such, we may need to restrict access to certain collections for some
> > users. Although the restriction will be done on the front end client side,
> > but we still need those information to be stored in a separate collection
> > for security reasons.
> >
> > Regards,
> > Edwin
> >
> >
> > On 7 June 2015 at 12:23, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> bq: Yup this information will need to be collected each time the user
> >> search for a query, as we want to show the number of records that
> >> matches the search query in each of the collections.
> >>
> >> You're looking at something akin to "federated search". About all you
> >> can do is send out parallel queries to each collection.
> >>
> >> This is an "interesting" requirement, and I really question whether
> >> it's a wise thing to insist on. I'd really think about going back to
> >> the design. For instance, could you consolidate all these collections
> >> into a single one, with perhaps a collection_id? Then the problem is
> >> relatively simple, use field collapsing (aka "grouping").
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Jun 6, 2015 at 6:40 PM, Zheng Lin Edwin Yeo
> >> <edwinyeozl@gmail.com> wrote:
> >> > Yup this information will need to be collected each time the user
> >> > search for a query, as we want to show the number of records that
> >> > matches the search query in each of the collections.
> >> >
> >> > Currently I only have 6 collections, but it could increase to
> >> > hundreds of collections in the future. So I'm worried that it could
> >> > slow down the system a lot if we have to pass hundreds of queries for
> >> > each search request.
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 5 June 2015 at 21:00, Upayavira <uv@odoko.co.uk> wrote:
> >> >
> >> >> I'm not so sure this is as bad as it sounds. When your collection is
> >> >> sharded, no single node knows about the documents in other
> >> >> shards/nodes, so to find the total number, a query will need to go
> >> >> to every node.
> >> >>
> >> >> Trying to work out something to do a single request to every node,
> >> >> combine their collection statistics and aggregate them into a single
> >> >> result sounds very complicated, and likely overkill.
> >> >>
> >> >> Are you needing to collect this information often? Do you have a lot
> >> >> of collections?
> >> >>
> >> >> Upayavira
> >> >>
> >> >>
> >> >> On Fri, Jun 5, 2015, at 06:29 AM, Zheng Lin Edwin Yeo wrote:
> >> >> > I'm trying to write a SolrJ program in Java to read and consolidate
> >> >> > all the information into a JSON file. The client will just need to
> >> >> > call this SolrJ program and read this JSON file to get the details.
> >> >> > But the problem is we are still querying Solr once for each
> >> >> > collection, just that this time it is done in the SolrJ program in
> >> >> > a for-loop, while previously it was done on the client side. I'm
> >> >> > not sure whether this will lead to a performance improvement.
> >> >> >
> >> >> > For your suggestion on spawning a bunch of threads, does it mean
> >> >> > the same thing as I did?
> >> >> >
> >> >> > Regards,
> >> >> > Edwin
> >> >> >
> >> >> >
> >> >> > On 5 June 2015 at 12:03, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >> >
> >> >> > > Have you considered spawning a bunch of threads, one per collection
> >> >> > > and having them all run in parallel?
> >> >> > >
> >> >> > > Best,
> >> >> > > Erick
> >> >> > >
> >> >> > > On Thu, Jun 4, 2015 at 4:52 PM, Zheng Lin Edwin Yeo
> >> >> > > <edwinyeozl@gmail.com> wrote:
> >> >> > > > The reason we wanted to do a single call is to improve
> >> >> > > > performance, as our application needs to list the total number
> >> >> > > > of records in each of the collections, and the number of records
> >> >> > > > that match the query in each of the collections.
> >> >> > > >
> >> >> > > > Currently we are querying each collection one by one to retrieve
> >> >> > > > the numFound value and display them, but this can slow down the
> >> >> > > > system significantly when the number of collections grows. So we
> >> >> > > > are thinking of ways to improve the speed in this area.
> >> >> > > >
> >> >> > > > Are there any other methods you can suggest to overcome this
> >> >> > > > speed problem?
> >> >> > > >
> >> >> > > > Regards,
> >> >> > > > Edwin
> >> >> > > > On 5 Jun 2015 00:16, "Erick Erickson" <erickerickson@gmail.com> wrote:
> >> >> > > >
> >> >> > > >> Not in a single call that I know of. These are really orthogonal
> >> >> > > >> concepts. Getting the cluster status merely involves reading the
> >> >> > > >> Zookeeper clusterstate whereas getting the total number of docs
> >> >> > > >> for each would involve querying each collection, i.e. going to
> >> >> > > >> the Solr nodes themselves. I'd guess it's unlikely to be combined.
> >> >> > > >>
> >> >> > > >> Best,
> >> >> > > >> Erick
> >> >> > > >>
> >> >> > > >> On Thu, Jun 4, 2015 at 7:47 AM, Zheng Lin Edwin Yeo
> >> >> > > >> <edwinyeozl@gmail.com> wrote:
> >> >> > > >> > Hi,
> >> >> > > >> >
> >> >> > > >> > Would like to check, are we able to use the Collection API or
> >> >> > > >> > any other method to list all the collections in the cluster
> >> >> > > >> > together with the number of records in each of the collections
> >> >> > > >> > in one output?
> >> >> > > >> >
> >> >> > > >> > Currently, I only know of the List Collections
> >> >> > > >> > /admin/collections?action=LIST. However, this only lists the
> >> >> > > >> > names of the collections that are in the cluster, but not the
> >> >> > > >> > number of records.
> >> >> > > >> >
> >> >> > > >> > Is there a way to show the number of records in each of the
> >> >> > > >> > collections as well?
> >> >> > > >> >
> >> >> > > >> > Regards,
> >> >> > > >> > Edwin
> >> >> > > >>
> >> >> > >
> >> >>
> >>
>
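
For readers following the thread, the "one query per collection, run in parallel" idea might look roughly like the sketch below. It is only an illustration, not code from any of the posters: the base URL, the helper names (http_get_json, count_url, count_matches) and the worker count are all assumptions. It relies on the standard select handler returning response.numFound in its JSON output, and asks for rows=0 so that only the count comes back.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of a Solr node; adjust for your own cluster.
SOLR_BASE = "http://localhost:8983/solr"


def http_get_json(url):
    """Default transport (hypothetical helper): GET a URL and decode JSON."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def count_url(collection, query):
    """Build a select URL that requests only the hit count (rows=0)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{SOLR_BASE}/{collection}/select?{params}"


def count_matches(collections, query, fetch=http_get_json, max_workers=8):
    """Return {collection: numFound}, sending one query per collection in parallel."""

    def one(collection):
        body = fetch(count_url(collection, query))
        return collection, body["response"]["numFound"]

    # Each collection still receives its own query; the threads only overlap
    # the waiting time instead of running the requests back to back.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(one, collections))
```

The collection names could be taken from /admin/collections?action=LIST, and passing "*:*" as the query would give the total number of records per collection. As noted in the thread, this does not reduce the work Solr has to do; at best the user waits for roughly the slowest collection rather than the sum of all of them.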
