lucene-solr-user mailing list archives

From Sourajit Basak <sourajit.ba...@gmail.com>
Subject Re: Solr: separating index and storage
Date Thu, 06 Jun 2013 12:02:53 GMT
Each day the index grows by ~250 MB; however, I am anticipating that this
growth will slow down because there will be repetitions (just a guess). The
issue is not the rate of growth but the limits of our infrastructure.
Basically a budgetary constraint :-)
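
A rough back-of-the-envelope sketch of what ~250 MB/day means for disk
space. The 500 GB budget used below is purely a hypothetical figure, not
our actual capacity, and the numbers are an upper bound since they assume
the growth never slows down.

```python
# Rough disk-space projection for an index growing at a fixed daily rate.
# DAILY_GROWTH_MB comes from the ~250 MB/day figure in this thread; the
# disk budget passed in below is a hypothetical value for illustration.

DAILY_GROWTH_MB = 250


def size_after_days_gb(days, daily_growth_mb=DAILY_GROWTH_MB):
    """Index size in GB after `days` of constant growth (1 GB = 1024 MB)."""
    return days * daily_growth_mb / 1024


def days_until_full(free_disk_gb, daily_growth_mb=DAILY_GROWTH_MB):
    """Whole days of constant growth that fit into `free_disk_gb`."""
    return (free_disk_gb * 1024) // daily_growth_mb


if __name__ == "__main__":
    print("GB after one year:", round(size_after_days_gb(365), 1))
    print("Days until a 500 GB disk is full:", days_until_full(500))
```

If repetitions do slow the growth, the real numbers should come out
lower than this.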

Apparently there is no problem other than disk space. So we will go
ahead with the idea of stored fields.
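
A minimal sketch of how a query could ask Solr for only the stored fields
it needs, using the standard `fl` (field list) request parameter. The base
URL, the core name `collection1`, and the field names `id` and `sentence`
are assumptions for illustration; only `word` comes from the use case
described in this thread.

```python
# Build a Solr /select URL that returns only selected stored fields.
# Assumptions (hypothetical, not from the thread): the base URL, the
# core name, and the stored field names "id" and "sentence".
from urllib.parse import urlencode


def build_select_url(base_url, word, fields, rows=10):
    """Return a /select URL searching the `word` field for `word`."""
    params = {
        "q": f"word:{word}",      # search the indexed 'word' field
        "fl": ",".join(fields),   # limit the stored fields returned
        "rows": rows,             # number of documents to return
        "wt": "json",             # response format
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url(
        "http://localhost:8983/solr/collection1",
        "lucene",
        ["id", "sentence"],
    ))
```

Keeping `fl` short and `rows` small should limit how much stored data is
read per request.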




On Thu, Jun 6, 2013 at 5:03 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> By and large, stored fields are pretty irrelevant for resource
> consumption _except_ for disk space consumed. Sharded systems work
> fine; the stored data is kept in the index files (*.fdt and *.fdx)
> in each segment on each shard.
>
> But you haven't told us anything about your data. How much are
> you talking about here? 100s of GB? Terabytes? Other than disk
> space, you may well be anticipating problems that don't exist...
>
> Now, when _returning_ documents the fields must be read, so
> there is some resource consumption there which you can
> mitigate with lazy field loading. But this is usually just a few docs
> so often isn't a problem.
>
> Best
> Erick
>
> On Thu, Jun 6, 2013 at 3:34 AM, Sourajit Basak <sourajit.basac@gmail.com>
> wrote:
> > Absolutely. Solr will return the reference along with the docs/results;
> > those references may be used to look up the actual content. Such use
> > cases aren't hard to solve.
> >
> > If the use case demands returning the actual content alongside the
> > results, it becomes non-trivial, especially under high load.
> >
> > To avoid this and do a quick implementation, I can judiciously create
> > stored fields and see how it performs. I will need to figure out what
> > happens if the volume growth of stored fields is high, how much disk
> > I/O is involved, and what happens to the stored fields if we shard
> > the index.
> >
> > Best,
> > Sourajit
> >
> >
> >
> >
> > On Tue, Jun 4, 2013 at 5:31 PM, Erick Erickson <erickerickson@gmail.com>
> > wrote:
> >
> >> You have to index something with your Solr documents that
> >> has meaning in _your_ system so you can find the
> >> original record. You don't search this field, you just
> >> return it with the search results and then use it to get
> >> the original document.
> >>
> >> If you're storing the original in a DB, this can be the PK.
> >> If on a file system, the path, etc.
> >>
> >> Essentially, since the association is specific to your environment
> >> you need to handle it explicitly...
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Jun 3, 2013 at 11:56 AM, Sourajit Basak
> >> <sourajit.basac@gmail.com> wrote:
> >> > Consider the following use case.
> >> >
> >> > Certain words are extracted from a document and indexed. The exact
> >> > sentence containing the word cannot be stored alongside the
> >> > extracted word because of the volume at which the documents grow.
> >> > How can the index and, let's call them, doc servers be separated?
> >> >
> >> > An option is to store the sentences in MongoDB or an RDBMS. But there
> >> > seems to be a schema-level design issue. Assuming 'word' to be a
> >> > multivalued field, how do we associate with it a reference to the
> >> > corresponding entry in the doc server?
> >> >
> >> > We may create (word_1, ref_1) tuples. Is there any other built-in
> >> > feature?
> >> >
> >> > Is there any related project which separates index & doc servers?
> >> >
> >> > Thanks,
> >> > Sourajit
> >>
>
