lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Lucene search in attachments
Date Tue, 10 Feb 2015 09:15:15 GMT
Hi,

Lucene has no 10,000-character limit, and it never had one. Much older
Lucene versions had an implicit limit of 10,000 TERMS (not characters) per field.
This is no longer the case. If you still want such a limit, you have to wrap your Analyzer: http://goo.gl/SRf45A

If you are hitting a 10,000-character limit somewhere, it might be in your Tika text extraction.
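
To make the difference between the two kinds of limit concrete, here is a minimal plain-Java sketch. It does not use Lucene or Tika classes: LimitDemo, capChars and capTerms are hypothetical names made up for illustration, and the whitespace split is only a stand-in for a real Analyzer. In Lucene 4.x itself, the wrapper I would reach for is LimitTokenCountAnalyzer from the analyzers-common module, assuming that is the kind of term cap you want.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these names come from Lucene or Tika.
class LimitDemo {

    // Character cap: keep at most maxChars characters of the extracted text.
    // This is the kind of limit a text-extraction step might apply.
    static String capChars(String text, int maxChars) {
        return text.length() <= maxChars ? text : text.substring(0, maxChars);
    }

    // Term cap: split on whitespace and keep at most maxTerms terms.
    // Roughly what a token-count-limiting analyzer wrapper does, with a
    // whitespace split standing in for the real analysis chain.
    static List<String> capTerms(String text, int maxTerms) {
        List<String> terms = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (terms.size() == maxTerms) {
                break;
            }
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "lucene indexes every term of this attachment";
        System.out.println(capChars(text, 10)); // cut mid-word, by characters
        System.out.println(capTerms(text, 3));  // cut on a term boundary
    }
}
```

A character cap can cut a word in half and depends only on text length, while a term cap always stops on a term boundary. If large attachments lose content after a fixed number of characters, that points at the extraction step rather than at the indexing step.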

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: sreedevi s [mailto:sreedevi.payikkad@gmail.com]
> Sent: Tuesday, February 10, 2015 9:53 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene search in attachments
> 
> Thank you David. Yes, it has a restriction of characters to 10000.
> But for large files, what could be done in that case?
> 
> Best Regards,
> Sreedevi S
> 
> On Tue, Feb 10, 2015 at 2:04 PM, David Pilato <david@pilato.fr> wrote:
> 
> > If you don’t index content, you won’t be able to search for it I guess.
> > That said, Tika can limit the number of extracted characters. See
> > indexedChars below:
> >
> > tika().parseToString(new BytesStreamInput(content, false), metadata,
> > indexedChars);
> >
> > [1] https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456
> >
> > --
> > David Pilato | Technical Advocate | Elasticsearch.com @dadoonet
> > <https://twitter.com/dadoonet> | @elasticsearchfr <
> > https://twitter.com/elasticsearchfr> | @scrutmydocs <
> > https://twitter.com/scrutmydocs>
> >
> >
> >
> > > On 10 Feb 2015 at 09:24, sreedevi s <sreedevi.payikkad@gmail.com> wrote:
> > >
> > > Hi,
> > >    Which is the best method to search in attachments in lucene? I am
> > > new to lucene and I am using version 4.10.2. By making use of Tika,
> > > I know I can convert files to text and then index it as another
> > > field. But for large files that will not be the ideal solution. I
> > > believe the maximum characters per field is 10,000. So, what can be
> > > ideal method to search attachments then
> > >
> > >
> > > Best Regards,
> > > Sreedevi S
> >
> >



