lucene-solr-user mailing list archives

From Ксения Баталова <batalova...@gmail.com>
Subject Re: Solr Atomic Updates
Date Thu, 04 Jun 2015 17:04:35 GMT
Erick,

Thank you so much. It became a bit clearer.

It was decided to upgrade Solr to 5.2 and use SolrCloud in our next release.

I expect I'll write here again about how it goes :)

_ _

Batalova Kseniya


I have to ask then why you're not using SolrCloud with multiple shards? It
seems to me that that gives you the indexing throughput you need (be sure to
use CloudSolrServer from your client). At 300M complex documents, you
pretty much certainly will need to shard anyway so in some sense you're
re-inventing the wheel here.

You can host multiple shards on the same machine, and these _are_ separate
Solr cores under the covers, so your problem with atomic updates disappears.
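
A minimal sketch of what this could look like from the client side, assuming a SolrCloud collection named `mycollection` (a hypothetical name) at `http://localhost:8983/solr`. The atomic-update body from the original question is posted to the collection rather than to an individual core, and SolrCloud routes the document to its shard by id. The helper `build_atomic_update` is illustrative only; it builds the request without sending it, and Python's standard library stands in for SolrJ to keep the sketch dependency-free.

```python
import json
import urllib.request


def build_atomic_update(base_url, collection, doc_id, changes):
    """Build (but do not send) an atomic-update request for one document.

    `changes` maps field names to atomic-update operations,
    e.g. {"title": {"set": "test1"}, "revision": {"inc": 3}}.
    """
    doc = {"id": doc_id}
    doc.update(changes)
    body = json.dumps([doc]).encode("utf-8")
    # The collection name in the URL is an assumption for illustration.
    url = f"{base_url}/{collection}/update"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_atomic_update(
    "http://localhost:8983/solr",
    "mycollection",
    "TestDoc1",
    {"title": {"set": "test1"}, "revision": {"inc": 3}},
)
```

Passing `req` to `urllib.request.urlopen` would send it; the client never needs to know which shard holds `TestDoc1`.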

Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
voted on even now and should be out in a week or so barring problems).

Best,
Erick

On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова <batalova.ks@gmail.com>
wrote:
> Jack,
>
> The decision to use several cores was made to increase indexing and
> searching performance (determined experimentally).
>
> In my project the index is about 300-500 million documents (each document
> has a rather complex structure) and it may grow larger.
>
> So, during indexing, documents are added to different cores by
> a number of threads.
>
> In other words, each thread collects the necessary information for a list of
> documents and generates a create-documents query to a specific core.
>
> At the moment it doesn't matter (and it can't be determined) which
> document ends up in which core.
>
> And now it is necessary to update this index (using atomic updates).
>
> Something like this..
>
> _ _
>
> Batalova Kseniya
>
>
> Explain a little about why you have separate cores, and how you decide
> which core a new document should reside in. Your scenario still seems a bit
> odd, so help us understand.
>
>
> -- Jack Krupansky
>
> On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова <batalova.ks@gmail.com>
> wrote:
>
>> Hi!
>>
>> Thanks for your quick reply.
>>
>> The problem is that my index consists of several parts (several cores),
>>
>> and when updating I don't know in advance which part (which core) holds
>> the document with the specified id.
>>
>> For example, I have two cores (*Core1* and *Core2*) and I want to
>> update the document with id *Id1*, and I don't know where this document
>> is.
>>
>> So I have to run two select queries against my cores to find out where it is.
>>
>> And then send an update query to the right core.
>>
>> What am I doing wrong?
>>
>> As a reminder, I'm using Solr 4.4.0.
>>
>> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>> Best regards,
>> Batalova Kseniya
>>
>>
>> What exactly is the problem? And why do you care about cores, per se -
>> other than to send the update to the core/collection you are trying to
>> update? You should specify the core/collection name in the URL.
>>
>> You should also be using the Solr reference guide rather than the (old)
>> wiki:
>>
>> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова <batalova.ks@gmail.com>
>> wrote:
>>
>> > Hi!
>> >
>> > I'm using *SOLR 4.4.0* for searching in my project.
>> > Now I am facing a problem with atomic updates across multiple cores.
>> > From the wiki:
>> >
>> > curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
>> > [
>> >  {
>> >   "id"        : "TestDoc1",
>> >   "title"     : {"set":"test1"},
>> >   "revision"  : {"inc":3},
>> >   "publisher" : {"add":"TestPublisher"}
>> >  },
>> >  {
>> >   "id"        : "TestDoc2",
>> >   "publisher" : {"add":"TestPublisher"}
>> >  }
>> > ]'
>> >
>> > As far as I understand, this means that the document with id
>> > *TestDoc1*, for example, will be looked up for updating *only in one core*.
>> > And if there is no document with id *TestDoc1*, the document will be
>> > created.
>> > Can I somehow specify a *list of cores* to search, and then
>> > update the matching document with the specified id?
>> >
>> > It would be something like the *shards* parameter in a *select* query.
>> > From the wiki:
>> >
>> > # now do a distributed search across both servers with your browser or curl
>> > curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
>> >
>> > Or is this planned for the future?
>> >
>> > Thanks in advance.
>> >
>> > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>> >
>> > Best regards,
>> > Batalova Kseniya
>> >
>>
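
For the standalone multi-core setup discussed in this thread (Solr 4.4.0 without SolrCloud), the look-up-then-update approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the core URLs reuse the *Core1* and *Core2* names from the example and are hypothetical, `find_core` and `http_get_json` are made-up helper names, and document ids are assumed not to contain double quotes. Each core is asked, via a `select` query with `rows=0`, whether it holds the id; the first core reporting a match is the one to send the atomic update to.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical core URLs, named after the Core1/Core2 example in the thread.
CORES = [
    "http://localhost:8983/solr/Core1",
    "http://localhost:8983/solr/Core2",
]


def http_get_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_core(doc_id, core_urls, fetch=http_get_json):
    """Return the URL of the first core that contains doc_id, or None.

    `fetch` is injectable so the lookup can be exercised without a
    running Solr instance.
    """
    query = urllib.parse.urlencode(
        {"q": 'id:"%s"' % doc_id, "rows": 0, "wt": "json"}
    )
    for core_url in core_urls:
        result = fetch(f"{core_url}/select?{query}")
        if result["response"]["numFound"] > 0:
            return core_url
    return None
```

With the core found, the atomic-update JSON would be posted to that core's `/update` handler. This costs one extra query per core for every update, which is the overhead that sharding under SolrCloud avoids.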
