lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: date issues
Date Thu, 23 Feb 2012 13:26:28 GMT
1> Don't use sint; it's being deprecated, and it'll take up more space
than a TrieDateField.
2> Precision. Sure, use the coarsest time you can; normalizing
everything to day resolution would be a good thing.
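
A minimal sketch of normalizing to day resolution on the client side before
indexing. It assumes the timestamps arrive as UTC ISO-8601 strings like the
ones in the original question and uses java.time (Java 8+, which postdates
this thread). The class and method names are hypothetical, not part of any
Solr or Lucene API.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DayNormalizer {
    // Hypothetical helper: zero out the hour/minute/second portion of a UTC
    // ISO-8601 timestamp so every document from the same day gets the same value.
    static String toDay(String isoTimestamp) {
        return Instant.parse(isoTimestamp)
                .truncatedTo(ChronoUnit.DAYS)
                .toString();
    }

    public static void main(String[] args) {
        // Prints 2012-02-22T00:00:00Z
        System.out.println(toDay("2012-02-22T21:11:14Z"));
    }
}
```

The output is still a full timestamp string, so it can go into the existing
TrieDateField unchanged; only the time portion becomes constant.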

You won't get any space savings by storing to day resolution; it's
just a long under the covers. But depending on how you're doing your
query, you may get much lower memory usage, since some searches are
sensitive to the number of *unique* terms in a field, and normalizing
to day reduces that number.
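
A rough illustration of the point above, assuming a field like the one in the
question (precisionStep="0", so roughly one indexed term per distinct value):
each value is a long either way, but truncating to the day collapses many
distinct values into one. The sample timestamps are made up for illustration;
the class is hypothetical and requires Java 9+ for List.of.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UniqueTermsDemo {
    // Count distinct epoch-millisecond values, optionally truncated to the day.
    // Each value is a long (8 bytes) in both cases; only the number of
    // distinct values changes.
    static int distinctValues(List<String> timestamps, boolean truncateToDay) {
        Set<Long> distinct = new HashSet<>();
        for (String ts : timestamps) {
            Instant instant = Instant.parse(ts);
            if (truncateToDay) {
                instant = instant.truncatedTo(ChronoUnit.DAYS);
            }
            distinct.add(instant.toEpochMilli());
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        // Hypothetical sample: three documents on one day, one on the next.
        List<String> timestamps = List.of(
                "2012-02-22T21:11:14Z",
                "2012-02-22T08:30:00Z",
                "2012-02-22T23:59:59Z",
                "2012-02-23T00:00:01Z");
        System.out.println("second resolution: " + distinctValues(timestamps, false)); // 4
        System.out.println("day resolution:    " + distinctValues(timestamps, true));  // 2
    }
}
```

With 400m documents spread over a month, day resolution would leave only
about 30 distinct values in the field instead of potentially millions.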

But without some idea of the queries you're running, it's hard to say whether
this will help.

Best
Erick

On Thu, Feb 23, 2012 at 1:25 AM, Jason Toy <jasontoy@gmail.com> wrote:
> I have a Solr instance with about 400m docs. For text searches it is perfectly fine.
> When I do searches that calculate the number of times a word appeared in the doc set for
> every day of a month, it usually causes Solr to crash with out-of-memory errors.
> I calculate this by running ~30 queries, one for each day, to see the count for that
> day.
> Is there a better way I could do this?
>
> Currently the date fields are stored as:
> <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0"
> positionIncrementGap="0"/>
>
> and the timestamps are stored in the format of:
> 2012-02-22T21:11:14Z
>
> We have no need to store anything beyond the date. Will just changing the time portion
> to zeros make things faster?
> 2012-02-22T00:00:00Z
>
> I thought that to optimize this, there would be an actual date type that doesn't store
> the time component, but looking through the Solr docs, I don't see anything specifically for
> a date as opposed to a timestamp. Would it be faster for me to store dates in an sint format?
> What is the optimal format I should use? If the answer is to keep using TrieDateField,
> isn't it a waste to store the hours/minutes/seconds when they aren't being used?
>
> Is there anything else I can do to make this more efficient?
>
> I have looked around on the mailing list and on Google and am not sure what to use; I would
> appreciate any pointers. Thanks.
>
> Jason
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
