lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: new to Lucene
Date Fri, 07 Aug 2015 15:35:04 GMT
2. Is the "Index" saved as a file or loaded into the memory?

Adding to Modassar's comments:

Almost all "real" implementations save the index to disk and
read selected portions back into memory as needed; otherwise
the data isn't permanent. In the Lucene world, I'd start with
NRTCachingDirectory, which wraps an on-disk directory such as
FSDirectory and caches small, newly written segments in RAM.
It'll write data to disk for you and generally "do the right thing"
to make that data available to queries.
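A minimal sketch of that setup, assuming Lucene 8.x/9.x (lucene-core) on the classpath. The class and method names, the `content` field and the 5 MB / 60 MB cache limits are illustrative choices, not recommendations from this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class OpenIndex {

    /** Opens an on-disk index whose small, freshly flushed segments are cached in RAM. */
    static Directory openDirectory(Path indexPath) throws IOException {
        // FSDirectory.open picks a platform-appropriate implementation (e.g. MMapDirectory).
        Directory onDisk = FSDirectory.open(indexPath);
        // Cache new segments up to 5 MB each and 60 MB in total (illustrative values).
        return new NRTCachingDirectory(onDisk, 5.0, 60.0);
    }

    /** Indexes one document per chapter text and returns how many a reader can see. */
    static int indexChapters(Path indexPath, String... chapters) throws IOException {
        try (Directory dir = openDirectory(indexPath);
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String text : chapters) {
                Document doc = new Document();
                doc.add(new TextField("content", text, Field.Store.NO));
                writer.addDocument(doc);
            }
            // A near-real-time reader sees the writer's uncommitted changes.
            int visible;
            try (DirectoryReader reader = DirectoryReader.open(writer)) {
                visible = reader.numDocs();
            }
            // commit() syncs the segment files, moving any cached ones to the on-disk delegate.
            writer.commit();
            return visible;
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexPath = Files.createTempDirectory("chapters-index");
        int count = indexChapters(indexPath, "Text of chapter one.", "Text of chapter two.");
        System.out.println("documents visible: " + count);
    }
}
```

The RAM cache only holds small, freshly flushed segments; once committed, the index data lives in the directory passed to FSDirectory.open, so it survives a restart.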

RAMDirectory is, indeed, RAM-only, but I _strongly_ recommend you
do not use it unless you deeply understand the underlying use-case
as it's pretty special-purpose.

Indexing a PDF is usually done with Tika; here's a sample program,
admittedly written with Solr in mind, but the Tika parsing is easily
transferable: searchhub.org/2012/02/14/indexing-with-solrj/
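A minimal sketch of the Tika side on its own, assuming tika-core plus the Tika parser modules (which include the PDF parser) are on the classpath. It uses the `org.apache.tika.Tika` facade rather than the SolrJ program linked above; `chapter01.pdf` is a hypothetical file name.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

public class ExtractChapterText {

    /** Returns the plain text of one chapter file (PDF or any other format Tika detects). */
    static String extractText(Path file) throws IOException, TikaException {
        Tika tika = new Tika();
        // The facade truncates output at 100,000 characters by default; -1 removes the limit.
        tika.setMaxStringLength(-1);
        // Detects the file type, picks a matching parser and returns the extracted text.
        return tika.parseToString(file.toFile());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default path; pass the real chapter file as the first argument.
        Path chapter = Paths.get(args.length > 0 ? args[0] : "chapter01.pdf");
        String text = extractText(chapter);
        System.out.println(text.length() + " characters extracted");
    }
}
```

The returned string is what would go into a Lucene TextField, one document per chapter.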

But this raises the question of why you don't use Solr, which uses
Lucene? There are many reasons one might want to use Lucene
directly, but I thought I'd ask.

Best,
Erick


On Fri, Aug 7, 2015 at 6:28 AM, Modassar Ather <modather1981@gmail.com> wrote:
> Please see my comments in-line.
>
> 1. For the indexing of these chapters, how many fields need to be
> declared? Can I just declare only one field for the contents?
>
> This depends on what you need to search on. E.g. if only the plain content
> (chapters) is to be searched, then one indexed field is sufficient.
> Also, if you want to update the index, then an id field is required per
> Lucene document.
> If chapter titles should also be searchable, a title field can be
> added.
>
> 2. Is the "Index" saved as a file or loaded into the memory?
> I think it depends on the type of Lucene Directory used. E.g. RAMDirectory
> is an in-memory implementation, whereas FSDirectory stores the index on the
> file system.
>
> 3. Can we use multiple terms for the user query, such as "Information
> Technology in Education", or are we only allowed to use a single term?
> Lucene supports single-term search and phrase search too.
> "Information Technology in Education", as in your question, can be searched
> as a phrase query.
>
> Regards,
> Modassar
>
>
> On Fri, Aug 7, 2015 at 1:07 PM, Nantha Kumar Subramaniam <
> nanthakumar@oum.edu.my> wrote:
>
>> Good day
>> I am new to Lucene and have started to explore Lucene.
>>
>> I have questions:
>>
>> I have a book in which all the chapters are in PDF. I plan to index all
>> these individual chapters in Lucene using Tika for the text extraction.
>>
>> 1. For the indexing of these chapters, how many fields need to be
>> declared? Can I just declare only one field for the contents?
>>
>> 2. Is the "Index" saved as a file or loaded into the memory?
>>
>> 3. Can we use multiple terms for the user query, such as "Information
>> Technology in Education", or are we only allowed to use a single term?
>>
>>
>> Thank you.
>>
>> Regards,
>>
>>
>> Assoc Prof Dr Nantha Kumar Subramaniam
>> *Head of E-Learning*
>> Open University Malaysia (OUM)
>>
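Modassar's answer to the first question (one indexed content field, an id field if documents will be updated, and optionally a title field) can be sketched as follows, again assuming Lucene 8.x/9.x. The field names `id`, `title` and `content`, the class name and the chapter values are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ChapterFields {

    /** One Lucene document per chapter: a unique id, a searchable title and the body text. */
    static Document chapterDoc(String id, String title, String content) {
        Document doc = new Document();
        // StringField indexes the whole value as one untokenized term: suited to exact ids.
        doc.add(new StringField("id", id, Field.Store.YES));
        // TextField runs the analyzer, so individual words become searchable.
        doc.add(new TextField("title", title, Field.Store.YES));
        doc.add(new TextField("content", content, Field.Store.NO));
        return doc;
    }

    /** Indexes a chapter, re-indexes it under the same id, and returns the live document count. */
    static int addThenUpdate(Path indexPath) throws IOException {
        try (Directory dir = FSDirectory.open(indexPath);
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(chapterDoc("ch01", "Introduction", "First draft of chapter one."));
            // updateDocument deletes every document containing the term, then adds the new one.
            writer.updateDocument(new Term("id", "ch01"),
                    chapterDoc("ch01", "Introduction", "Revised text of chapter one."));
            writer.commit();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return reader.numDocs(); // one chapter, not two
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexPath = Files.createTempDirectory("chapter-fields");
        System.out.println("live documents: " + addThenUpdate(indexPath));
    }
}
```

Without the id field there would be no term to pass to updateDocument, so re-indexing a revised chapter would leave the old version in the index as a duplicate.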
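For the question about multi-word queries, here is a sketch contrasting a single-term query with a phrase query, assuming Lucene 8 or later, where StandardAnalyzer no longer drops stop words such as "in" by default (with older defaults the four-term phrase would not match). The sample chapter sentences are made up for illustration. For raw user input, the classic QueryParser in the separate lucene-queryparser module turns quoted text such as "Information Technology in Education" into a phrase query; the code below builds the queries directly to stay within lucene-core.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class PhraseSearch {

    // Hypothetical chapter texts used only for this demonstration.
    static final String[] CHAPTERS = {
        "Information Technology in Education is the topic of this chapter.",
        "Education policy and information about technology budgets."
    };

    /** Returns {termQueryHits, phraseQueryHits} over the sample chapters. */
    static int[] countHits(Path indexPath) throws IOException {
        try (Directory dir = FSDirectory.open(indexPath);
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (String text : CHAPTERS) {
                Document doc = new Document();
                doc.add(new TextField("content", text, Field.Store.NO));
                writer.addDocument(doc);
            }
            writer.commit();
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // Single-term query: matches any chapter containing "education".
                Query term = new TermQuery(new Term("content", "education"));
                // Phrase query: all terms must appear adjacent and in this order.
                // Terms are lowercase because StandardAnalyzer lowercases at index time.
                Query phrase = new PhraseQuery("content", "information", "technology", "in", "education");
                return new int[] {searcher.count(term), searcher.count(phrase)};
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] hits = countHits(Files.createTempDirectory("phrase-demo"));
        System.out.println("term hits: " + hits[0] + ", phrase hits: " + hits[1]);
    }
}
```

Both sample chapters contain "education", so the term query matches both; only the first contains the four words adjacent and in order, so the phrase query matches just that one.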

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

