lucene-solr-user mailing list archives

From Swapna Vuppala <Swapna.Vupp...@arup.com>
Subject Mapping and Capture in ExtractingRequestHandler
Date Tue, 20 Dec 2011 05:54:21 GMT
Hi,

I understand that we can specify parameters for the ExtractingRequestHandler in solrconfig.xml
to capture HTML tags of a particular type and map them to the desired Solr fields, something
like the following:

<str name="capture">div</str>
<str name="fmap.div">mysolrfield</str>

The above setting captures the content of "div" tags and copies it to the Solr field "mysolrfield".
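
For context, here is a minimal sketch of where these two parameters would sit in solrconfig.xml. It assumes the handler is registered at "/update/extract" and that "mysolrfield" is defined in the schema; both names are only illustrative.

```xml
<!-- Sketch only: the handler path and the field name are assumptions -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Capture the text inside every <div> element separately -->
    <str name="capture">div</str>
    <!-- Map the captured "div" content to the Solr field "mysolrfield" -->
    <str name="fmap.div">mysolrfield</str>
  </lst>
</requestHandler>
```

Note that this matches on the element name only, so every "div" ends up in the same field regardless of its class attribute.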

What I am interested in is capturing div tags with a particular class name into a Solr field.
When extracting content from Outlook messages, I would like the content within
<div class="message-body"> to go into one Solr field and the content within <div class="attachment-entry">
to go into another Solr field.

Can someone please let me know how to achieve this?

Thanks and Regards,
Swapna.
