lucene-general mailing list archives

From Chris Hostetter
Subject Re: Newbie question - use of SOLR
Date Mon, 05 Dec 2011 23:11:17 GMT

The overall problem you seem to be describing is how to parse your 
files to extract structured data and then index that data in discrete 
fields of Solr documents.
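
To illustrate what "discrete fields" could look like, here is a minimal 
Python sketch. The field names (id, header, body, footer) are hypothetical 
and would need to be declared in your own Solr schema. The code only builds 
a JSON payload and query parameters; it does not contact a Solr server, and 
the exact update format and endpoint depend on your Solr version.

```python
import json
from urllib.parse import urlencode


def make_solr_doc(doc_id, header, body, footer):
    """Build one document as a dict with one key per (hypothetical) field."""
    return {"id": doc_id, "header": header, "body": body, "footer": footer}


def body_only_query(word):
    """Build query parameters that match `word` in the body field only.

    Uses the standard Lucene/Solr `field:term` query syntax.
    """
    return urlencode({"q": f"body:{word}", "fl": "id"})


doc = make_solr_doc("doc-1", "Invoice 2011", "payment received", "page 1")
payload = json.dumps([doc])          # JSON text you could send to Solr
params = body_only_query("payment")  # query string restricted to the body
```

Because each part of the file lands in its own field, a query such as 
body:payment ignores matches that occur only in the header or footer.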

How you should go about that depends largely on the formats of these files 
and how well structured they are - for instance: it's a lot easier to 
extract documents and fields out of well structured XML files than it is 
out of plain text -- but if the plain text files are really uniform and 
every one has the exact same layout, then the problem gets easier.
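
As a rough sketch of the uniform plain text case, assume (purely for 
illustration) that the first line of every file is the header, the last 
line is the footer, and everything in between is the body. Your real layout 
will differ, and split_sections is a hypothetical helper, not part of Solr.

```python
def split_sections(text):
    """Split a uniformly laid out text file into header, body and footer.

    Assumed layout (hypothetical): first line = header, last line = footer,
    all lines in between = body.
    """
    lines = [ln.rstrip() for ln in text.strip().splitlines()]
    if len(lines) < 3:
        raise ValueError("expected at least header, body and footer lines")
    return {
        "header": lines[0],
        "body": "\n".join(lines[1:-1]),
        "footer": lines[-1],
    }


sample = """ACME Corp - Invoice
Payment received for order 42.
Thank you.
Page 1 of 1
"""

fields = split_sections(sample)
```

Each key of the resulting dict could then be mapped to its own field of a 
Solr document.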

I would suggest you send more details about the types of files you are 
working with and the types of fields you'd like to search on to the 
solr-user@lucene mailing list.  That list is more suitable for discussions 
about how to "use" solr to achieve goals; this general@lucene list is 
primarily for broad discussions about the project as a whole, (and for 
people with no idea what lucene is to have a place to start with their 

Good Luck!

: I'm totally new to SOLR and Lucene. So, for now I would really appreciate
: some feedback from experienced people.
: I'm thinking of using the SOLR/Lucene engine for archiving documents. Documents
: typically have a similar layout with different values in specified positions.
: I would like to keep all the docs fully indexed. Once a user performs a search,
: I'd like to return a list of documents that contain the searched criterion.
: Question - how can I specify the area of a document to search in?
: Let's say I have 1000 fully indexed documents, each one with similar data
: in it, with differences in only a few areas (which I'd like to search). For
: instance, a document may have some text in the header, body and footer. Each of
: these pieces may contain the searched word, but I'd like to find only those
: docs that contain the searched word in the body only.
: Please give me some suggestions on where to start and what features of
: SOLR/Lucene I should look into and learn.

