lucene-solr-user mailing list archives

From Per Steffensen <st...@designware.dk>
Subject Re: CoreAdmin STATUS performance
Date Thu, 10 Jan 2013 16:09:39 GMT
The collections are created dynamically, though not on update. We use
one collection per month, and we have a timer-job running (every hour
or so) which checks whether all collections that need to exist actually
do exist - if not, it creates the missing collection(s). The rule is
that the collection for "next month" has to exist as soon as we enter
the "current month", so the first time the timer-job runs on e.g. July
1st it will create the August collection. We never get data with a
timestamp in the future, so as long as the timer-job gets to run at
least once every month we will always have the needed collections
ready.
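
To illustrate, here is a minimal sketch of the creation part of such a
timer-job. The "coll_" + yyyyMM collection naming and the
ensureCollectionExists helper are assumptions made up for the example,
not our actual code:

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimerTask;

public class CollectionCreatorJob extends TimerTask {
    private static final SimpleDateFormat FMT = new SimpleDateFormat("yyyyMM");

    @Override
    public void run() {
        Calendar cal = Calendar.getInstance();
        // The collection for the current month must always exist ...
        ensureCollectionExists("coll_" + FMT.format(cal.getTime()));
        // ... and as soon as we enter a month, next month's collection
        // must exist too, so it is ready before any data for it arrives.
        cal.add(Calendar.MONTH, 1);
        ensureCollectionExists("coll_" + FMT.format(cal.getTime()));
    }

    private void ensureCollectionExists(String name) {
        // Check whether the collection is already there (e.g. by reading
        // /clusterstate.json from ZooKeeper) and, if it is not, create it
        // via the Collection API - see the sketch further down.
    }
}

The job itself can be scheduled with a plain java.util.Timer, e.g.
new Timer().schedule(new CollectionCreatorJob(), 0, 3600 * 1000);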

We create collections using the new Collection API in Solr. We used to
manage the creation of every single Shard/Replica/Core of the
collections through the Core Admin API in Solr, but since the
Collection API was introduced we decided we had better use that. In 4.0
it did not have the features we needed, which triggered SOLR-4114,
SOLR-4120 and SOLR-4140, which will be available in 4.1. With those
features we are now using the Collection API.
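
For the actual CREATE we just hit the Collection API over HTTP. A
sketch - the config set name and the shard/replica counts below are
example values only:

import java.net.HttpURLConnection;
import java.net.URL;

public class CollectionAdmin {
    // Any node in the cluster can receive Collection API calls
    private static final String SOLR_URL = "http://localhost:8983/solr";

    // One call creates the whole collection (all shards and replicas),
    // instead of one Core Admin CREATE per individual core as before.
    public static void createCollection(String name) throws Exception {
        call(SOLR_URL + "/admin/collections?action=CREATE"
                + "&name=" + name
                + "&numShards=4"
                + "&replicationFactor=2"
                + "&collection.configName=myconfig");
    }

    static void call(String url) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        int rc = con.getResponseCode();
        if (rc != 200) {
            throw new RuntimeException("Collection API call failed (" + rc + "): " + url);
        }
    }
}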

BTW, our timer-job also handles deletion of "old" collections. In our
system you can configure how many historic month-collections to keep
before it is OK to delete them. Let's say this is configured to 3: as
soon as July 1st arrives, the timer-job will delete the March
collection (the historic collections to keep will just have become the
April, May and June collections). This way we always have at least 3
months of historic data, and late in a month close to 4 months of
history. It does not matter that we keep a little too much history, as
long as we do not go below the lower limit on the length of historic
data. We also use the new Collection API for deletion.
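
The deletion could then be a method next to createCollection in the
sketch above - monthsToKeep and the collection naming are again just
assumptions for the example:

    // With monthsToKeep = 3, running this on July 1st deletes the March
    // collection (April, May and June are the ones that are kept). A real
    // job would also sweep up anything older that is still around.
    public static void deleteExpiredCollection(int monthsToKeep) throws Exception {
        java.util.Calendar cal = java.util.Calendar.getInstance();
        cal.add(java.util.Calendar.MONTH, -(monthsToKeep + 1));
        String name = "coll_"
                + new java.text.SimpleDateFormat("yyyyMM").format(cal.getTime());
        call(SOLR_URL + "/admin/collections?action=DELETE&name=" + name);
    }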

Regards, Per Steffensen

On 1/10/13 3:04 PM, Shahar Davidson wrote:
> Hi Per,
>
> Thanks for your reply!
>
> That's a very interesting approach.
>
> In your system, how are the collections created? In other words, are
> the collections created dynamically upon an update (for example, per
> new day)?
> If they are created dynamically, who handles their creation
> (client/server) and how is it done?
>
> I'd love to hear more about it!
>
> Appreciate your help,
>
> Shahar.
>
> -----Original Message-----
> From: Per Steffensen [mailto:steff@designware.dk]
> Sent: Thursday, January 10, 2013 1:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: CoreAdmin STATUS performance
>
> On 1/10/13 10:09 AM, Shahar Davidson wrote:
>> search request, the system must be aware of all available cores in
>> order to execute distributed search on _all_ relevant cores
> For this purpose I would definitely recommend that you go "SolrCloud".
>
> Furthermore we do something "extra":
> We have several collections, each containing data from a specific
> period in time - the timestamp of incoming data decides which
> collection it is indexed into. One important search criterion for our
> clients is searching on a timestamp interval. Therefore most searches
> can be restricted to only consider a subset of all our collections.
> Instead of having the logic that calculates the subset of collections
> to search (given the timestamp search-interval) in the clients, we
> just let clients do "dumb" searches by giving the timestamp-interval.
> The subset of collections to search is calculated on the server side
> from the timestamp-interval in the search query. We handle this in a
> Solr SearchComponent which we place "early" in the chain of
> SearchComponents. Maybe you can get some inspiration from this
> approach, if it is also relevant for you.
>
> Regards, Per Steffensen
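
For reference, a minimal sketch of the kind of SearchComponent
described in the quoted mail. The ts.from/ts.to request parameters and
the "coll_" + yyyyMM collection naming are assumptions made up for the
example, not the actual implementation:

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

// Placed "early" in the component chain: it rewrites the "collection"
// parameter so the distributed search only hits the month-collections
// that the requested timestamp interval can touch.
public class TimeRoutingComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        SolrParams params = rb.req.getParams();
        String fromStr = params.get("ts.from"); // epoch millis (assumed params)
        String toStr = params.get("ts.to");
        if (fromStr == null || toStr == null) {
            return; // no interval given - leave the request untouched
        }

        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMM");
        Calendar cal = Calendar.getInstance();
        cal.setTimeInMillis(Long.parseLong(fromStr));
        String endMonth = fmt.format(new Date(Long.parseLong(toStr)));
        String month = fmt.format(cal.getTime());

        List<String> collections = new ArrayList<String>();
        while (month.compareTo(endMonth) <= 0) { // yyyyMM sorts chronologically
            collections.add("coll_" + month);
            cal.add(Calendar.MONTH, 1);
            month = fmt.format(cal.getTime());
        }
        if (collections.isEmpty()) {
            return;
        }

        ModifiableSolrParams newParams = new ModifiableSolrParams(params);
        newParams.set("collection", join(collections));
        rb.req.setParams(newParams);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do here - all the work happens in prepare()
    }

    private static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (sb.length() > 0) sb.append(',');
            sb.append(p);
        }
        return sb.toString();
    }

    @Override
    public String getDescription() {
        return "Maps a timestamp interval to the month-collections to search";
    }

    @Override
    public String getSource() {
        return null; // SolrInfoMBean boilerplate in 4.x
    }
}

Such a component would be registered in solrconfig.xml and listed under
first-components for the search handler, so it runs before the standard
query component.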

