lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Number of values limitation in multivalued field
Date Tue, 19 Jan 2010 22:32:12 GMT
As far as I know, there's no underlying difference between
adding all 42K tokens one at a time (multivalued)
or all at once (single-valued), with one rather technical
difference: if you've changed the positionIncrementGap
to something other than "1" in your schema, then the
token position delta between successive value adds will be
something other than one. Put another way, there's
no difference if you leave positionIncrementGap="1".
And even that doesn't matter if you're not doing
proximity queries on that field.
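
To make the position point concrete, here is a minimal Python sketch of a
simplified model. It is not Lucene code. The function name token_positions and
the parameter boundary_delta (the position delta between the last token of one
value and the first token of the next) are hypothetical. How boundary_delta
maps onto positionIncrementGap in a given Lucene/Solr version is an assumption
to verify, not something the sketch asserts.

```python
def token_positions(values, boundary_delta=1):
    """Assign a position to every whitespace-separated token across values.

    Simplified model (assumption): tokens inside one value are one position
    apart, and the first token of each later value sits boundary_delta
    positions after the last token of the previous value.
    """
    positions = []
    next_pos = 0
    for index, value in enumerate(values):
        if index > 0:
            # Widen the step across the value boundary from 1 to boundary_delta.
            next_pos += boundary_delta - 1
        for token in value.split():
            positions.append((token, next_pos))
            next_pos += 1
    return positions


# One value holding everything versus several values with a delta of 1:
single = token_positions(["zip1 zip2 zip3"])
multi = token_positions(["zip1 zip2", "zip3"], boundary_delta=1)

# A larger delta pushes the next value's tokens further away.
gapped = token_positions(["zip1 zip2", "zip3"], boundary_delta=100)
```

With a delta of 1 the two layouts give the same positions. With a larger delta,
only queries that depend on the distance between tokens can tell them apart.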

You could even batch them up in chunks, e.g.:
<zip>zip1 zip2 zip3</zip>
<zip>zip4 zip5 zip6</zip>
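
A rough Python sketch of that batching, assuming the zip codes are already in
a list. The helper name chunk_values and the chunk size of 3 are illustrative
only, and the <zip> element mirrors the shorthand above rather than the exact
Solr update XML.

```python
def chunk_values(tokens, chunk_size):
    """Join tokens into space-separated strings of at most chunk_size tokens."""
    return [
        " ".join(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), chunk_size)
    ]


# Placeholder tokens standing in for real zip codes.
zips = [f"zip{n}" for n in range(1, 7)]

# One <zip> entry per chunk, in the same shorthand as the example above.
lines = [f"<zip>{value}</zip>" for value in chunk_values(zips, 3)]
print("\n".join(lines))
```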

You're only talking 2.5M tokens or so, right? I predict
you'll never notice the data duplication etc. I'd guess that
it's too small of a data set to worry about...

HTH
Erick

On Tue, Jan 19, 2010 at 3:15 PM, SHS SOLR <shssolr@gmail.com> wrote:

> Thanks Erik,
>
> I was not aware of the maxFieldLength.
>
> * Query performance compared to storing data by zipcode. Schema to
> accommodate this would have 42K * 60 documents approx. Also to consider
> duplicate document data with varying zipcode in the index.
>
> Hope this makes sense. We however wanted to understand if it is a good
> practice to dump 42K tokens in a multivalued field.
>
> Thanks,
> Pavan.
>
> On Tue, Jan 19, 2010 at 1:56 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > You should be able to do this no problem. Do be aware of
> > maxFieldLength though; it defaults to 10,000 tokens but you
> > can change it in your solrconfig.xml. Beware, there are TWO
> > instances of this in that file. See:
> >
> > http://search.lucidimagination.com/search/document/30616a061f8c4bf6/solr_ignoring_maxfieldlength
> >
> > What do you mean by index/search performance impact? As
> > compared to what?
> >
> > I think the impacts will be negligible when compared to putting all
> > the zip codes into the field at once, and search time should be
> > unaffected over that alternative.
> >
> > HTH
> > Erick
> >
> > On Tue, Jan 19, 2010 at 12:11 PM, SHS SOLR <shssolr@gmail.com> wrote:
> >
> > > * Can we define a field in our schema as multiValued (with stored=false,
> > > indexed=true) that will hold up to 42K zipcode values associated with each
> > > document?
> > > * Is there any query-time performance impact with this?
> > > * Is there any impact on index time?
> > >
> > > The number of documents we are talking about here is not more than 100 right
> > > now. There is no requirement to facet or highlight or even show this field in
> > > the search results. We only want to enable zipcode searches that would return
> > > matching docs.
> > >
> > > Thanks,
> > >
> >
>
