lucene-solr-user mailing list archives

From Dave Searle <dave.sea...@magicalia.com>
Subject RE: faceted search with job title
Date Wed, 21 Jul 2010 15:42:55 GMT
You'd probably need to do some post-processing on the pages and set up rules for each website
to grab that specific bit of data. You could load the HTML into an XML parser, then use XPath
to grab content from a particular tag with a class or id, depending on the website.
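
A rough sketch of that idea in Python, using only the standard library. The host names, the
XPath rules and the function name are made up for illustration, and it assumes the page is
well-formed XHTML; real job-board HTML usually needs a tolerant parser or a tidy step first.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical per-site rules: host name -> XPath of the element that holds the job title.
# ElementTree only supports a subset of XPath, which is enough for tag + class/id lookups.
SITE_RULES = {
    "jobs.example.com": ".//h1[@class='job-title']",
    "careers.example.org": ".//span[@id='position']",
}

def extract_job_title(url, xhtml):
    """Return the job title for a page, or None if no rule exists or nothing matches."""
    rule = SITE_RULES.get(urlparse(url).hostname)
    if rule is None:
        return None  # no rule has been written for this site yet
    root = ET.fromstring(xhtml)  # raises ParseError if the markup is not well-formed
    node = root.find(rule)
    if node is None:
        return None
    # Join the text of the element and its children, then collapse runs of whitespace.
    title = " ".join("".join(node.itertext()).split())
    return title or None
```

The extracted value would then go into its own field in the index (for example a plain string
field) so it can be faceted on.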



-----Original Message-----
From: Savannah Beckett [mailto:savannah_beckett30@yahoo.com] 
Sent: 21 July 2010 16:38
To: solr-user@lucene.apache.org
Subject: faceted search with job title

Hi,
  I am currently using Nutch to crawl some job pages from job boards.  They are 
in my Solr index now.  I want to do faceted search on the job titles.  How?  
The job titles can be anywhere on the page, e.g. title, header, 
content...   If I use an indexing filter in Nutch to search the content for job titles, 
there are hundreds of thousands of job titles, and I can't hard-code them all.  Do 
you have a better idea?  I think I need the job title in a separate field in the 
index to make it work with Solr faceted search, am I right?
Thanks.