lucene-solr-user mailing list archives

From Joe Zhang <smartag...@gmail.com>
Subject Re: processing documents in solr
Date Sat, 27 Jul 2013 06:30:03 GMT
On Fri, Jul 26, 2013 at 11:18 PM, Shawn Heisey <solr@elyograg.org> wrote:

> On 7/26/2013 11:50 PM, Joe Zhang wrote:
> > ==> Essentially we are doing pagination here, right? If performance is
> > not the concern, given that the index is dynamic, does the order of
> > entries remain stable over time?
>
> Yes, it's pagination.  Just like the other method that I've described in
> detail, you'd have to avoid updating the index while you were getting
> information.  Unless you can come up with a sort parameter that's
> guaranteed to make sure that new documents are at the end, any changes
> to the index during the retrieval process will make it impossible to
> retrieve every document.
>
==> What I can guarantee is that there is no deletion, but I guess this is
not equivalent to "newly added docs are at the end", right?

==> I believe you are right about performance. The retrieved set becomes
larger and larger.

>
> > ==> This approach seems to require that the id field is numerical,
> > right? I have a text-based id that is unique.
>
> StrField types work perfectly with range queries.  As long as it's not a
> tokenized field, TextField works properly with range queries too.
> KeywordTokenizer is OK, as long as you don't use filters that create
> additional tokens.  Some examples that create additional tokens are
> WordDelimiterFilter and EdgeNGramFilter.
>
>
==> So a "url" field would work fine?


>
>
> > ==> I'm not sure I understand the "q={XXX TO *}" part --> wouldn't the
> > query be matched against the default search field, which could be
> > "content", for example? How would that do the job?
>
> You are correct, I was too hasty in constructing the query.  That should
> be:
> q=id:{XXX TO *}&rows=NNNNNN&sort=id asc
>
> You could speed things up if you don't need to see all stored fields in
> the response by using the fl parameter to only return the fields that
> you need.
>
> Responding to your additional message about an autoincrement field -
> that would only be possible if you are importing from a data source that
> supports autoincrement, like MySQL.  Solr itself has no support for
> autoincrement.
>
> Thanks,
> Shawn
>
>
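
For reference, the corrected query above (q=id:{XXX TO *} with sort=id asc, rows, and fl) can be driven in a loop, where XXX is the last id seen on the previous page. The following is only a minimal sketch in Python under some assumptions: the helper names build_params, solr_fetch_page, and fetch_all_ids are made up for illustration, the id is wrapped in double quotes on the assumption that the standard query parser is in use (so that a URL-like id containing ':' or '/' is not read as query syntax), and wt=json is assumed to be available.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_params(last_id, rows, fields="id"):
    """Build Solr request parameters for one page, sorted by id ascending."""
    if last_id is None:
        q = "*:*"  # first page: match everything
    else:
        # Quote the id so characters such as ':' and '/' in a URL-like id
        # are not parsed as query syntax (assumes the standard query parser).
        escaped = last_id.replace("\\", "\\\\").replace('"', '\\"')
        q = 'id:{"%s" TO *}' % escaped  # curly brace = exclusive lower bound
    # fl must include the id field, since the loop below reads doc["id"].
    return {"q": q, "sort": "id asc", "rows": rows, "fl": fields, "wt": "json"}


def solr_fetch_page(select_url):
    """Return a function that fetches one page of docs from a select URL."""
    def fetch_page(params):
        with urlopen(select_url + "?" + urlencode(params)) as resp:
            return json.load(resp)["response"]["docs"]
    return fetch_page


def fetch_all_ids(fetch_page, rows=1000):
    """Yield every id, resuming each page just after the last id seen."""
    last_id = None
    while True:
        docs = fetch_page(build_params(last_id, rows))
        if not docs:
            break
        for doc in docs:
            yield doc["id"]
        last_id = docs[-1]["id"]
```

Usage would look something like fetch_all_ids(solr_fetch_page("http://localhost:8983/solr/collection1/select")), where the URL is a placeholder for your own core. Because every page resumes strictly after the last id seen, each request stays small instead of growing. The caveat from the thread still applies: a document added during the walk whose id sorts before the current position will be missed.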
