lucene-java-user mailing list archives

From sreedevi s <sreedevi.payik...@gmail.com>
Subject Re: Lucene search in attachments
Date Tue, 10 Feb 2015 09:13:32 GMT
No, David. I can increase the value, or set it to -1 to make it unlimited,
but I still cannot be sure that my whole text will be searchable. That is
still a problem with large files, because only the part that is indexed
will be searchable.
I was looking for some alternatives.

Best Regards,
Sreedevi S
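
For readers following the thread, the effect of the character limit discussed above can be sketched without Tika or Lucene. The block below is only a stand-in written against java.io: the class name LimitedExtract and the method extract are hypothetical, and it merely mimics the behaviour described in the thread (a positive limit truncates the extracted text, a negative limit such as -1 means no limit). It is not the code of Tika or of the attachment mapper.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for a text extractor with a character limit.
class LimitedExtract {

    // Reads at most maxChars characters from the reader.
    // A negative maxChars (for example -1) is treated as "no limit".
    static String extract(Reader reader, int maxChars) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[4096];
        while (maxChars < 0 || out.length() < maxChars) {
            // Never request more characters than the limit still allows.
            int want = maxChars < 0
                    ? buf.length
                    : Math.min(buf.length, maxChars - out.length());
            int n = reader.read(buf, 0, want);
            if (n < 0) {
                break; // end of input
            }
            out.append(buf, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "alpha beta gamma delta";

        // With a limit of 10 characters only "alpha beta" is kept,
        // so a term that occurs later in the text is never seen by the indexer.
        String limited = extract(new StringReader(text), 10);
        System.out.println(limited.contains("delta")); // false

        // With -1 the whole text is kept and the later term is present.
        String full = extract(new StringReader(text), -1);
        System.out.println(full.contains("delta")); // true
    }
}
```

Whatever falls beyond the limit never reaches the index, so it cannot match a query. With the limit disabled, the whole extracted text is returned as one string, so memory use grows with the size of the file.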

On Tue, Feb 10, 2015 at 2:26 PM, David Pilato <david@pilato.fr> wrote:

> I don’t understand.
> If you don’t raise this restriction to a higher value (or to -1), not all
> of the text will be extracted, so only a subset of the text will be indexed.
> Non-indexed parts of the text won’t be searchable.
>
> Did I misunderstand your question?
>
> --
> David Pilato | Technical Advocate | Elasticsearch.com
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <
> https://twitter.com/elasticsearchfr> | @scrutmydocs <
> https://twitter.com/scrutmydocs>
>
>
>
> > On 10 Feb 2015, at 09:52, sreedevi s <sreedevi.payikkad@gmail.com> wrote:
> >
> > Thank you, David. Yes, it has a limit of 10,000 characters.
> > But what can be done for large files in that case?
> >
> > Best Regards,
> > Sreedevi S
> >
> > On Tue, Feb 10, 2015 at 2:04 PM, David Pilato <david@pilato.fr> wrote:
> >
> >> If you don’t index content, you won’t be able to search for it, I guess.
> >> That said, Tika can apply a limit on the number of extracted characters.
> >> See indexedChars below:
> >>
> >> tika().parseToString(new BytesStreamInput(content, false), metadata,
> >> indexedChars);
> >>
> >> [1]
> >> https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456
> >>
> >>
> >>
> >>
> >>> On 10 Feb 2015, at 09:24, sreedevi s <sreedevi.payikkad@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>   What is the best method to search in attachments in Lucene? I am new
> >>> to Lucene and I am using version 4.10.2. Using Tika, I know I can
> >>> convert files to text and then index that text as another field. But for
> >>> large files that will not be the ideal solution. I believe the maximum
> >>> number of characters per field is 10,000. So what would be the ideal
> >>> method to search attachments then?
> >>>
> >>>
> >>> Best Regards,
> >>> Sreedevi S
> >>
> >>
>
>
