lucene-solr-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Dynamic collections in SolrCloud for log indexing
Date Thu, 27 Dec 2012 15:55:09 GMT
Added https://issues.apache.org/jira/browse/SOLR-4237

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html



On Tue, Dec 25, 2012 at 9:13 PM, Mark Miller <markrmiller@gmail.com> wrote:

> I've been thinking about aliases for a while as well. They seem very handy and
> fairly easy to implement. So far there have just always been higher-priority
> things (need to finish collection api responses this week…) but this is
> something I'd definitely help work on.
>
> - Mark
>
> On Dec 25, 2012, at 1:49 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com>
> wrote:
>
> > Hi,
> >
> > Right, this is not really about routing in the ElasticSearch sense.
> > What's handy for indexing logs are index aliases... which I thought I had
> > added to JIRA a while back, but it looks like I have not.
> > Index aliases would let you keep a "last 7 days" alias fixed while
> > underneath you push and pop an index every day without the client app
> > having to adjust.
> >
> > Otis
> > --
> > Performance Monitoring - http://sematext.com/spm/index.html
> > Search Analytics - http://sematext.com/search-analytics/index.html
> >
> >
> >
> > On Mon, Dec 24, 2012 at 4:30 AM, Per Steffensen <steff@designware.dk> wrote:
> >
> >> I believe it is a misunderstanding to use custom routing (or sharding, as
> >> Erick calls it) for this kind of thing. Custom routing is nice if you want
> >> to control which slice/shard under a collection a specific document goes
> >> to: mainly to ensure that two (or more) documents are indexed on the same
> >> slice/shard, but also just to control on which slice/shard a specific
> >> document is indexed. Knowing/controlling this can be used for a lot of
> >> nice purposes. But you don't want to move slices/shards around among
> >> collections, or delete/add slices from/to a collection, unless it's for
> >> elasticity reasons.
> >>
> >> I think you should fill a collection every week/month and just keep those
> >> collections as is. Instead of ending up with one big "historic" collection
> >> containing many slices/shards/cores (one for each historic week/month),
> >> you will end up with many historic collections (one for each historic
> >> week/month). To search historic data you will have to cross-search those
> >> historic collections, but that is no problem at all. If SolrCloud is made
> >> as it is supposed to be made (and I believe it is), it shouldn't require
> >> more resources or be harder in any way to cross-search X slices across
> >> many collections than it is to cross-search X slices under the same
> >> collection.
> >>
> >> Besides that, see my answer in the topic "Will SolrCloud always slice by ID
> >> hash?" a few days back.
> >>
> >> Regards, Per Steffensen
> >>
> >>
> >> On 12/24/12 1:07 AM, Erick Erickson wrote:
> >>
> >>> I think this is one of the primary use-cases for custom sharding. Solr 4.0
> >>> doesn't really lend itself to this scenario, but I _believe_ that the
> >>> patch for custom sharding has been committed...
> >>>
> >>> That said, I'm not quite sure how you drop off the old shard if you don't
> >>> need to keep old data. I'd guess it's possible, but haven't implemented
> >>> anything like that myself.
> >>>
> >>> FWIW,
> >>> Erick
> >>>
> >>>
> >>> On Fri, Dec 21, 2012 at 12:17 PM, Upayavira <uv@odoko.co.uk> wrote:
> >>>
> >>>> I'm working on a system for indexing logs. We're probably looking at
> >>>> filling one core every month.
> >>>>
> >>>> We'll maintain a short term index containing the last 7 days - that one
> >>>> is easy to handle.
> >>>>
> >>>> For the longer term stuff, we'd like to maintain a collection that will
> >>>> query across all the historic data, but that means every month we need
> >>>> to add another core to an existing collection, which as I understand it
> >>>> in 4.0 is not possible.
> >>>>
> >>>> How do people handle this sort of situation where you have rolling new
> >>>> content arriving? I'm sure I've heard of people using SolrCloud for this
> >>>> sort of thing.
> >>>>
> >>>> Given it is logs, distributed IDF has no real bearing.
> >>>>
> >>>> Upayavira
> >>>>
> >>>>
> >>
>
>
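
For readers following the thread, the "last 7 days" alias idea can be modelled in a few lines. This is only an in-memory sketch of the push/pop behaviour described above, not Solr code: the class name RollingAlias, the alias name "last_7_days" and the "logs_YYYYMMDD" collection naming are all made up for illustration, and nothing here talks to a Solr server.

```python
from collections import deque
from datetime import date, timedelta


class RollingAlias:
    """Hypothetical in-memory model of an alias over the newest N daily collections."""

    def __init__(self, name, window_days):
        self.name = name
        # A bounded deque discards the oldest entry once the window is full.
        self.collections = deque(maxlen=window_days)

    def roll(self, day):
        """Push the collection for `day`; return the name that fell out, or None."""
        dropped = None
        if len(self.collections) == self.collections.maxlen:
            dropped = self.collections[0]
        # "logs_YYYYMMDD" is an assumed naming scheme, not something from the thread.
        self.collections.append(f"logs_{day:%Y%m%d}")
        return dropped

    def targets(self):
        """Collections a query against the alias would fan out to."""
        return list(self.collections)


alias = RollingAlias("last_7_days", window_days=7)
start = date(2012, 12, 20)
# Roll eight consecutive days; only the eighth roll pushes a collection out.
dropped = [alias.roll(start + timedelta(days=i)) for i in range(8)]
```

The client keeps querying the fixed name "last_7_days"; only the list behind it changes each day, which is the property that makes aliases attractive for log indexing.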
