lucene-java-user mailing list archives

From Jebarlin Robertson <jebar...@gmail.com>
Subject Re: Regarding Compression Tool
Date Thu, 19 Sep 2013 03:44:20 GMT
Hi,

Thanks, Mark Miller, for your advice.

I had missed part of it, which is why I could not get the proper value.
For the compressed content I should read the binary value instead of
calling get().
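
To make the round trip concrete, here is a minimal sketch of what storing and reading back a compressed value involves. It uses java.util.zip directly as a stand-in; the class and method names are hypothetical, and the idea that CompressionTools does essentially the same thing internally is an assumption, not something confirmed in this thread. The point it illustrates is that the stored value is a byte[], so it has to be read as a binary value and decompressed before it can be used as text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene: it only mimics a compress/decompress round trip.
class CompressedValueSketch {

    // Compress the UTF-8 bytes of the text; a byte[] like this is what a binary stored field would hold.
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate the stored bytes and decode them back into the original string.
    static String decompress(byte[] stored) {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("incomplete compressed data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid compressed data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Repetitive log-like text, similar in spirit to the log files discussed in this thread.
        StringBuilder log = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            log.append("ERROR disk full on /data\n");
        }
        String contents = log.toString();
        byte[] stored = compress(contents);
        System.out.println(contents.length() + " chars -> " + stored.length + " bytes");
        System.out.println("round trip ok: " + contents.equals(decompress(stored)));
    }
}
```

With Lucene 2.9.x itself, the corresponding calls would be CompressionTools.compressString(...) when building the stored field and CompressionTools.decompressString(...) on the bytes returned by the document's binary-value getter.
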
I tested all the scenarios and I have some doubts:
1. I observed that the highlighter has a case-sensitivity problem when
searching. Should we handle this on our side, or will Lucene itself take
care of it for the Highlighter feature (not for indexing or search)?
2. When I test highlighting with both the normal Highlighter and
FastVectorHighlighter, the normal Highlighter takes less time than
FastVectorHighlighter. But the documentation says FastVectorHighlighter
will be 2.5 times faster than the normal Highlighter. I have used both as
shown in the examples given in the Lucene in Action book.
3. I was not able to get the proper highlighted sentences with the normal
Highlighter class for all the files, even though the search query occurs
in the file. The ArrayList<TextFragment> size in Highlighter.java is
always stuck at 513 for any file when I use getBestTextFragments().
4. Why are all these fragment and TokenStream techniques required, which
analyze and tokenize the content and then create fragments? What if we
directly find the index of the search query in the content and take the
text before and after it, using some specified offset values?
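
The direct approach described in point 4 can be sketched as follows. This is only an illustration under stated assumptions: the class name, method name, and parameters are hypothetical, and it matches raw characters (ignoring case, which also touches on point 1) rather than analyzed terms. Because of that, it would presumably miss anything the analyzer changes at index time, such as stemmed words, and it cannot handle phrase or wildcard queries, which is the part the fragment and TokenStream machinery covers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: plain substring search with a fixed context window.
class NaiveSnippets {

    // Returns up to maxSnippets pieces of content, each holding one case-insensitive
    // occurrence of query plus up to `context` characters on either side.
    static List<String> snippets(String content, String query, int context, int maxSnippets) {
        List<String> result = new ArrayList<String>();
        if (query.isEmpty()) {
            return result;
        }
        int i = 0;
        while (i + query.length() <= content.length() && result.size() < maxSnippets) {
            // regionMatches with ignoreCase=true compares in place, so offsets stay valid.
            if (content.regionMatches(true, i, query, 0, query.length())) {
                int start = Math.max(0, i - context);
                int end = Math.min(content.length(), i + query.length() + context);
                result.add(content.substring(start, end));
                i += query.length(); // continue after this hit
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "The Disk is full. Retry later. disk error again.";
        for (String s : snippets(content, "disk", 10, 5)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

The window here is cut at fixed character offsets, so snippets can start or end in the middle of a word; extending each side to the nearest sentence boundary would be a separate step.
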


Regards,
Jebarlin Robertson.R

On Tue, Sep 17, 2013 at 3:07 PM, Jebarlin Robertson <jebarlin@gmail.com> wrote:

> Thanks Mark.
>
> I know all these scenarios about battery and space, but at the same time
> I am only checking the feasibility.
> Actually, I started this thread to ask how to use CompressionTools to
> compress the data and store it in the index.
> I observed the things below and tried it this way:
>
>   Field field = new Field("contents", contents, Field.Store.NO,
>       Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
>   Field field1 = new Field("contents",
>       CompressionTools.compressString(contents), Field.Store.YES);
>
> I am able to search, but when I try to get the stored content from the
> document, it returns null.
> So could you please give me some sample code for using CompressionTools?
>
> public static final Field.Store
> <http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/Field.Store.html>
> COMPRESS
>
> Deprecated. Please use CompressionTools
> <http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/CompressionTools.html>
> instead. For string fields that were previously indexed and stored using
> compression, the new way to achieve this is: First add the field
> indexed-only (no store) and additionally using the same field name as a
> binary, stored field with
> CompressionTools.compressString(java.lang.String)
> <http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/CompressionTools.html#compressString(java.lang.String)>.
>
> Store the original field value in the index in a compressed form. This
> is useful for long documents and for binary valued fields.
>
>
> On Mon, Sep 16, 2013 at 9:56 PM, Mark Miller <
> developmentalmadness@gmail.com> wrote:
>
>> Have you considered storing your indexes server-side? I haven't used
>> compression but usually the trade-off of compression is CPU usage which
>> will also be a drain on battery life. Or maybe consider how important the
>> highlighter is to your users - is it worth the trade-off of either disk
>> space or battery life? If it's more of a nice-to-have then maybe hold off
>> on the feature for a later release until you've had some feedback and some
>> more time to figure out the best solution. Of course I don't know much
>> about your application, so take my advice with a grain of salt.
>>
>>
>> On Mon, Sep 16, 2013 at 2:22 AM, Jebarlin Robertson <jebarlin@gmail.com> wrote:
>>
>> > I am using Apache Lucene on Android. I have around 1 GB of text
>> > documents (logs). When I index these text documents using
>> > new Field(ContentIndex.KEY_TEXTCONTENT, contents, Field.Store.YES,
>> > Field.Index.ANALYZED, TermVector.WITH_POSITIONS_OFFSETS), the index
>> > directory takes up 1.59 GB.
>> > Without storing the field, the index is around 0.59 GB. If Lucene
>> > needs this much space to create the index and to store the original
>> > text just for the highlight feature, it will be a big problem for
>> > mobile devices. So I would like some help: is there any alternative
>> > way to use the highlight feature on Android-powered devices without
>> > occupying so much space?
>> >
>> >
>> > On Sun, Sep 15, 2013 at 3:26 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>> >
>> > > bq: I thought that I can use the CompressionTool to minimize the
>> > > memory size.
>> > >
>> > > This doesn't make a lot of sense. Highlighting needs the raw data to
>> > > figure out what to highlight, so I don't see how the CompressionTool
>> > > will help you there.
>> > >
>> > > And unless you have a huge document and only a very few of them, then
>> > > the memory occupied by the uncompressed data should be trivial
>> > > compared to the various low-level caches. This really is seeming like
>> > > an XY problem. Perhaps if you backed up and explained _why_ this
>> > > seems important to do people could be more helpful.
>> > >
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > >
>> > > On Sat, Sep 14, 2013 at 12:21 PM, Jebarlin Robertson <jebarlin@gmail.com> wrote:
>> > >
>> > > > Thank you very much, Erick. Actually, I was using the Highlighter
>> > > > tool, which needs the entire data to be stored in order to get
>> > > > the relevant searched sentence.
>> > > > But when I use that, it consumes more memory (indexed data size +
>> > > > Store.YES, i.e. the entire content) than the actual document size.
>> > > > I thought that I can use the CompressionTool to minimize the
>> > > > memory size.
>> > > > Please help if there is any possibility or way to store the
>> > > > entire content and still use the highlighter feature.
>> > > >
>> > > > Thank you
>> > > >
>> > > >
>> > > > On Fri, Sep 13, 2013 at 6:54 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>> > > >
>> > > > > Compression is for the _stored_ data, which is not searched.
>> > > > > Ignore the compression and ensure that you index the data.
>> > > > >
>> > > > > The compressing/decompressing for looking at stored
>> > > > > values is, I believe, done at a very low level that you don't
>> > > > > need to care about at all.
>> > > > >
>> > > > > If you index the data in the field, you shouldn't have to do
>> > > > > anything special to search it.
>> > > > >
>> > > > > Best,
>> > > > > Erick
>> > > > >
>> > > > >
>> > > > > On Fri, Sep 13, 2013 at 1:19 AM, Jebarlin Robertson <jebarlin@gmail.com> wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I am trying to store all the Field values using CompressionTool,
>> > > > > > but when I search for any content, it does not find any results.
>> > > > > >
>> > > > > > Can you help me with how to create the Field with
>> > > > > > CompressionTool, add it to the Document, and decompress it
>> > > > > > when searching for any content in it?
>> > > > > >
>> > > > > > --
>> > > > > > Thanks & Regards,
>> > > > > > Jebarlin Robertson.R
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > Thanks & Regards,
>> > > > Jebarlin Robertson.R
>> > > > GSM: 91-9538106181.
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Jebarlin Robertson.R
>> > GSM: 91-9538106181.
>> >
>>
>>
>>
>> --
>> Mark J. Miller
>> Blog: http://www.developmentalmadness.com
>> LinkedIn: http://www.linkedin.com/in/developmentalmadness
>>
>
>
>
> --
> Thanks & Regards,
> Jebarlin Robertson.R
> GSM: 91-9538106181.
>



-- 
Thanks & Regards,
Jebarlin Robertson.R
GSM: 91-9538106181.
