lucene-solr-user mailing list archives

From Swapna Vuppala
Subject RE: Mapping and Capture in ExtractingRequestHandler
Date Wed, 21 Dec 2011 05:04:32 GMT
Hi Erick,

Can you please give me a little more information about writing a SolrJ program and how to use
it to construct a Solr document?

Thanks and Regards,

-----Original Message-----
From: Erick Erickson
Sent: Wednesday, December 21, 2011 2:28 AM
Subject: Re: Mapping and Capture in ExtractingRequestHandler

When you start getting into complex HTML extraction, you're probably
better off using a SolrJ program with a forgiving HTML parser,
extracting the relevant bits yourself, and constructing a
Solr document.
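
A minimal sketch of that approach, under stated assumptions: it uses the
lenient HTML parser that ships with the JDK (javax.swing.text.html.parser)
so that it runs without extra jars, and it collects the text of each div
whose class is mapped to a field. The class name DivClassExtractor and the
field names body_txt and attachment_txt are hypothetical. The resulting map
stands in for the Solr document; in a SolrJ program you would copy each
entry into a SolrInputDocument with addField(name, value) and send it with
the client's add() and commit(). Client class names differ between SolrJ
versions, so that part is left out here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class DivClassExtractor {

    /**
     * Returns field name -> extracted text for every div whose class
     * attribute appears as a key in classToField (hypothetical mapping).
     */
    static Map<String, String> extract(String html, Map<String, String> classToField) {
        Map<String, StringBuilder> buffers = new LinkedHashMap<>();
        // One entry per open div: the target field, or "" if the div is unmapped.
        Deque<String> openDivs = new ArrayDeque<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.DIV) {
                    Object cls = attrs.getAttribute(HTML.Attribute.CLASS);
                    String field = (cls == null) ? null : classToField.get(cls.toString());
                    openDivs.push(field == null ? "" : field);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.DIV && !openDivs.isEmpty()) {
                    openDivs.pop();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                String text = new String(data).trim();
                if (text.isEmpty()) {
                    return;
                }
                // Credit the text to the innermost enclosing div that is mapped.
                for (String field : openDivs) {
                    if (!field.isEmpty()) {
                        StringBuilder sb = buffers.computeIfAbsent(field, k -> new StringBuilder());
                        if (sb.length() > 0) {
                            sb.append(' ');
                        }
                        sb.append(text);
                        return;
                    }
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }

        Map<String, String> fields = new LinkedHashMap<>();
        buffers.forEach((name, sb) -> fields.put(name, sb.toString()));
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("message-body", "body_txt");            // hypothetical field
        mapping.put("attachment-entry", "attachment_txt");  // hypothetical field

        // Deliberately sloppy HTML: the <b> tag is never closed.
        String html = "<html><body>"
                + "<div class=\"message-body\">Hello <b>world</div>"
                + "<div class=\"attachment-entry\">report.pdf</div>"
                + "<div class=\"footer\">ignored</div>"
                + "</body></html>";

        System.out.println(extract(html, mapping));
    }
}
```

Each div class gets its own field, which is the per-class split that a
plain capture/fmap setting on the tag name alone does not express. A
dedicated HTML library could replace the JDK parser without changing the
overall shape.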


On Tue, Dec 20, 2011 at 12:54 AM, Swapna Vuppala wrote:
> Hi,
> I understand that we can specify parameters for ExtractingRequestHandler in solrconfig.xml
> to capture HTML tags of a particular type and map them to the desired Solr fields, something
> like:
> <str name="capture">div</str>
> <str name="fmap.div">mysolrfield</str>
> The above setting will capture the content of "div" tags and copy it to the Solr field "mysolrfield".
> What I am interested in is capturing div tags with a particular class name into a Solr field.
> When extracting content from Outlook messages, I would like the content within
> <div class="message-body"> to go into one Solr field and the content within
> <div class="attachment-entry"> to go into another Solr field.
> Can someone please let me know how to achieve this?
> Thanks and Regards,
> Swapna.
