lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: DocumentsWriter.checkMaxTermLength issues
Date Mon, 31 Dec 2007 16:10:28 GMT
On Dec 31, 2007 5:53 AM, Michael McCandless <lucene@mikemccandless.com> wrote:
> Doron Cohen <cdoronc@gmail.com> wrote:
> > I like the approach of configuration of this behavior in Analysis
> > (and so IndexWriter can throw an exception on such errors).
> >
> > It seems that this should be a property of Analyzer vs.
> > just StandardAnalyzer, right?
> >
> > It can probably be a "policy" property, with two parameters:
> > 1) maxLength, 2) action: chop/split/ignore/raiseException when
> > generating too long tokens.
>
> Agreed, this should be generic/shared to all analyzers.
>
> But maybe for 2.3, we just truncate any too-long term to the max
> allowed size, and then after 2.3 we make this a settable "policy"?

But we already have a nice component model for analyzers...
why not just encapsulate truncation/discarding in a TokenFilter?

-Yonik
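
A minimal sketch of the filter idea suggested above, in plain Java. It does not use Lucene's actual TokenFilter API: the TokenStream interface here is a hypothetical, stripped-down stand-in (next() returns null at end of stream), and the names truncate, discard, fromList, and drain are illustrative assumptions. It only shows how truncation and discarding could each be a small wrapper around an upstream token source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class TermLengthFilters {

    /** Hypothetical stand-in for an analyzer token stream (not Lucene's class). */
    interface TokenStream {
        /** Returns the next token text, or null when the stream is exhausted. */
        String next();
    }

    /** Token source backed by a list, used only to drive the example. */
    static TokenStream fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Filter that chops any token longer than maxLength down to maxLength chars. */
    static TokenStream truncate(TokenStream in, int maxLength) {
        return () -> {
            String t = in.next();
            if (t != null && t.length() > maxLength) {
                // Note: cuts on char boundaries, so it may split a surrogate pair.
                return t.substring(0, maxLength);
            }
            return t;
        };
    }

    /** Filter that silently drops any token longer than maxLength chars. */
    static TokenStream discard(TokenStream in, int maxLength) {
        return () -> {
            String t = in.next();
            // Skip over-long tokens until an acceptable one (or end of stream).
            while (t != null && t.length() > maxLength) {
                t = in.next();
            }
            return t;
        };
    }

    /** Collects every remaining token of a stream into a list. */
    static List<String> drain(TokenStream stream) {
        List<String> out = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("lucene", "supercalifragilistic", "index");
        System.out.println(drain(truncate(fromList(tokens), 8))); // [lucene, supercal, index]
        System.out.println(drain(discard(fromList(tokens), 8)));  // [lucene, index]
    }
}
```

Because each behavior is its own wrapper, the choice between chopping and dropping becomes a matter of which filter is placed in the analysis chain, rather than a setting on the writer.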


