nutch-dev mailing list archives

From lewis john mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Questions about Jena in Nutch
Date Wed, 01 Jun 2011 13:06:13 GMT
My advice would be to first make sure that you have a well-tuned Nutch
configuration which maintains a healthy representation of the web graph.
Once this is in place, configuring Nutch-1.2 to use the ontology plug-in is
not difficult. I would mention the following points, though.

> If you look at the Jena code within the ontology plug-in, you will see that
the parser is only called to retrieve classes and subclasses within your
ontology (which needs to be in RDF/XML). For example, say your user searched
for 'money': the plug-in would display all of your classes which match this
term or are similar. If your user then clicked on 'money', the search JSP
uses this as the next query, returns results, and refines your user's next
query using subclasses of 'money', which may be 'dollar', 'yuan', 'euro', etc.

> As my first point highlights, I think the main thing to take into
consideration here is the quality and usefulness of your ontology files.
You will undoubtedly get help on the construction of these on
jena-users@incubator.apache.org or, of course, on whichever of the Protege
user lists you are working with.

> The underlying purpose of the ontology plug-in is to provide
domain-specific query refinement/expansion for search engine users. If you
have a finely tuned Nutch configuration crawling for pages, as well as a
carefully considered and methodically constructed ontology (or ontologies),
then of course the blueprint is reasonable.
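
To make the class/subclass lookup from the first point a bit more concrete,
here is a rough sketch of the refinement flow. It is not the plug-in's actual
code and it does not use Jena: the class name OntologyRefinementSketch, the
methods matchingClasses and refinements, and the hard-coded 'money' hierarchy
are all made up for illustration, and a plain substring match stands in for
whatever "match or similar" logic the plug-in really applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a hand-built class hierarchy instead of an RDF/XML
// ontology parsed with Jena.
public class OntologyRefinementSketch {

    // Hypothetical ontology: class name -> direct subclasses.
    private static final Map<String, List<String>> SUBCLASSES = new LinkedHashMap<>();
    static {
        SUBCLASSES.put("money", Arrays.asList("dollar", "yuan", "euro"));
        SUBCLASSES.put("dollar", Collections.<String>emptyList());
        SUBCLASSES.put("yuan", Collections.<String>emptyList());
        SUBCLASSES.put("euro", Collections.<String>emptyList());
    }

    // Step 1: list the ontology classes whose name contains the search term.
    // (Assumption: substring matching as a stand-in for "match or similar".)
    public static List<String> matchingClasses(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String name : SUBCLASSES.keySet()) {
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(name);
            }
        }
        return matches;
    }

    // Step 2: once the user picks a class, offer its direct subclasses as
    // suggestions for refining the next query.
    public static List<String> refinements(String className) {
        List<String> subs = SUBCLASSES.get(className);
        return subs == null ? Collections.<String>emptyList() : subs;
    }

    public static void main(String[] args) {
        System.out.println("Classes matching 'money': " + matchingClasses("money"));
        System.out.println("Refinements for 'money': " + refinements("money"));
    }
}
```

In the real plug-in the hierarchy would come from your RDF/XML ontology rather
than a hard-coded map, which is why the quality of the ontology matters so
much for the suggestions your users see.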

If you require general guidance please post on the Nutch user list, and I
will try my best to provide a solution or at least help.

Lewis


lfs
Wed, 01 Jun 2011 00:59:27 -0700

Thank you, Lewis! Your information is very helpful to me.


My idea is to use Nutch to build a Chinese financial blog information search
engine. My plan is to first use Protege to build a financial ontology and
then do some localization on top of Jena. In this way, at least an
experimental Chinese financial blog information search engine can be built.
Can you tell me whether this plan is likely to succeed? I want to make sure
the blueprint is reasonable before I get started.


-- 
*Lewis*
