lucene-java-user mailing list archives

From gtkesh <gtk...@gmail.com>
Subject Help with document design for indexing/searching
Date Wed, 03 Jul 2013 16:52:11 GMT
Hi everyone! This is my first post here and I'm new to Lucene, so I would
appreciate your feedback on the Lucene document design I came up with.

*My goal*

I'm trying to index a collection of XML documents that all have the same
structure:

<doc>
	<title></title>
	<sections>
		<section>
			<title></title>
			<text></text>
		</section>
	</sections>
</doc>

Each <section> element can itself contain a <sections> element, which in
turn contains <section> elements, and so on. The maximum nesting depth is 3.

So I decided to use these separate fields:

"pageTitle" - doc/title
"sectionTitle" - doc/sections/section/title
"sectionText" - doc/sections/section/text
"subSectionTitle" - doc/sections/section/sections/section/title
"subSectionText" - doc/sections/section/sections/section/text
"subSubSectionTitle" - ...
"subSubSectionText" - ...
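A minimal sketch of the field mapping listed above. It assumes the JDK's built-in DOM parser (javax.xml.parsers) and uses a plain Map<String, String> as a stand-in for a Lucene Document, so it is not Lucene code. It emits one record per <section>, names the fields by nesting depth (section, subSection, subSubSection), and copies the page title into every record. Keeping a section's title and text together in one record is an assumption of this sketch; it is what lets a query match on the title and return the text. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class SectionFieldMapper {
    // Field-name prefixes by nesting depth: 0 = doc/sections/section, up to depth 2.
    private static final String[] PREFIXES = {"section", "subSection", "subSubSection"};

    /** Returns one field map per <section>; every map carries the shared pageTitle. */
    static List<Map<String, String>> map(String xml) {
        Element doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        String pageTitle = childText(doc, "title");
        List<Map<String, String>> out = new ArrayList<>();
        walk(firstChild(doc, "sections"), 0, pageTitle, out);
        return out;
    }

    // Visits each <section> under a <sections> element, then recurses one level deeper.
    private static void walk(Element sections, int depth, String pageTitle,
                             List<Map<String, String>> out) {
        if (sections == null || depth >= PREFIXES.length) return;
        for (Node n = sections.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !((Element) n).getTagName().equals("section")) continue;
            Element section = (Element) n;
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("pageTitle", pageTitle);
            fields.put(PREFIXES[depth] + "Title", childText(section, "title"));
            fields.put(PREFIXES[depth] + "Text", childText(section, "text"));
            out.add(fields);
            walk(firstChild(section, "sections"), depth + 1, pageTitle, out);
        }
    }

    // First direct child element with the given tag name, or null.
    private static Element firstChild(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && ((Element) n).getTagName().equals(tag)) return (Element) n;
        }
        return null;
    }

    // Trimmed text of a direct child element, or "" if it is missing.
    private static String childText(Element parent, String tag) {
        Element e = firstChild(parent, tag);
        return e == null ? "" : e.getTextContent().trim();
    }
}
```

In a real indexer, each map would become one Lucene document.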

Currently, each Lucene document I index is a separate section (or
sub-section) title or text, and of course they all share the same pageTitle
field. Is this a good way to index the documents for searching? Below I
describe *how I'm going to search*:

In the real documents, pageTitle is the disease name, and sectionTitle is
something like "Definition" or "Treatment". So when the user asks a question
such as "What are the treatments for disease x?", I classify the question as
the "treatment" type. I then want to search for the disease name in the
Lucene index, but specifically retrieve the section whose title is
"Treatment".
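To make the intended retrieval rule concrete before expressing it as a Lucene query, here is a plain-Java filter over records that carry the pageTitle, sectionTitle and sectionText fields from the list above: keep the records whose pageTitle is the disease and whose sectionTitle is the classified question type, and return their text. This is only a stand-in for the index, not Lucene code. The exact, case-insensitive string comparison is an assumption; Lucene would compare analyzed terms and rank partial matches instead. The class name SectionLookup is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SectionLookup {
    /**
     * Returns the sectionText of every record whose pageTitle names the disease
     * and whose sectionTitle equals the classified question type.
     * Records without a sectionTitle (e.g. sub-sections) never match.
     */
    static List<String> find(List<Map<String, String>> records, String disease, String type) {
        return records.stream()
                .filter(r -> disease.equalsIgnoreCase(r.get("pageTitle")))
                .filter(r -> type.equalsIgnoreCase(r.get("sectionTitle")))
                .map(r -> r.get("sectionText"))
                .collect(Collectors.toList());
    }
}
```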

Is this a good indexing approach? Also, how would you recommend constructing
the search query, given that I want the disease name to carry more weight and
the question type ("treatment") relatively less?
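One common way to express "the disease matters more than the type" is a boolean query with two optional clauses in which the pageTitle clause carries a larger boost. The sketch below builds such a query as a string in Lucene's classic QueryParser syntax (field:"phrase" for a phrase on a field, ^n for a boost), which could then be passed to QueryParser's parse method, ideally with the same analyzer used at index time. The boost value of 4 is an arbitrary assumption to tune, and BoostedQueryBuilder is a hypothetical name. Inside a quoted phrase, only the backslash and the double quote need escaping.

```java
final class BoostedQueryBuilder {
    /** Wraps text in double quotes, escaping the characters special inside a phrase. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Builds a classic-QueryParser string with two optional (SHOULD) clauses:
     * the disease phrase on pageTitle, boosted, and the question type on sectionTitle.
     */
    static String build(String disease, String type, float diseaseBoost) {
        return "pageTitle:" + quote(disease) + "^" + diseaseBoost
                + " sectionTitle:" + quote(type);
    }
}
```

For example, build("influenza", "treatment", 4.0f) yields pageTitle:"influenza"^4.0 sectionTitle:"treatment". Prefixing the sectionTitle clause with + would make it required, so only sections whose title matches the type are returned, while the pageTitle clause decides which disease's section ranks first.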

Thanks in advance!

--
View this message in context: http://lucene.472066.n3.nabble.com/Help-with-document-design-for-indexing-searching-tp4075228.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.