jena-dev mailing list archives

From Paolo Castagna <castagna.li...@googlemail.com>
Subject Inference with MapReduce (a la RIOT infer command)
Date Tue, 01 Nov 2011 15:08:32 GMT
Hi,
I just want to share an approach to doing inference in the style of the RIOT
infer command-line tool, but faster (i.e. using MapReduce).

I've done only limited testing, but it should work. It's quite simple: it
is just a map-only job.

The driver is InferDriver.java [1] and the map function is InferMapper.java
[2]. Now, I am interested in what parts of OWL can be done in a similar way.
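
To illustrate why a map-only job is enough, here is a minimal sketch. It is
not the code in InferMapper.java [2]; the class name, the string-encoded
triples and the in-memory maps are all hypothetical. It assumes the RDFS
vocabulary (subClassOf, subPropertyOf, domain, range) is small, has already
been transitively closed, and is available in memory on every mapper. Under
those assumptions each input triple can be expanded on its own, so no reduce
step is needed. It also ignores the check that a range should not be applied
to a literal object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-triple RDFS expansion; not the real InferMapper.
class RdfsMapSketch {
    static final String TYPE = "rdf:type";

    // Vocabulary lookups, assumed to be transitively closed before the job starts.
    final Map<String, Set<String>> superClasses = new HashMap<>();
    final Map<String, Set<String>> superProperties = new HashMap<>();
    final Map<String, Set<String>> domains = new HashMap<>();
    final Map<String, Set<String>> ranges = new HashMap<>();

    // Helper to register one vocabulary entry.
    static void put(Map<String, Set<String>> m, String key, String value) {
        m.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(value);
    }

    private static Set<String> get(Map<String, Set<String>> m, String key) {
        Set<String> s = m.get(key);
        return s == null ? Collections.<String>emptySet() : s;
    }

    private static String triple(String s, String p, String o) {
        return s + " " + p + " " + o;
    }

    // Adds "x rdf:type c" plus one type triple per known superclass of c.
    private void addType(Set<String> out, String x, String c) {
        out.add(triple(x, TYPE, c));
        for (String sup : get(superClasses, c)) {
            out.add(triple(x, TYPE, sup));
        }
    }

    // The "map" function: one triple in, that triple plus its inferences out.
    // It only reads the vocabulary, never other instance triples.
    Set<String> map(String s, String p, String o) {
        Set<String> out = new LinkedHashSet<>();
        out.add(triple(s, p, o));

        List<String> props = new ArrayList<>();
        props.add(p);
        props.addAll(get(superProperties, p));

        for (String q : props) {
            out.add(triple(s, q, o));                       // subPropertyOf
            for (String d : get(domains, q)) addType(out, s, d); // domain
            for (String r : get(ranges, q)) addType(out, o, r);  // range
        }
        if (p.equals(TYPE)) {
            addType(out, s, o);                             // subClassOf
        }
        return out;
    }

    public static void main(String[] args) {
        RdfsMapSketch v = new RdfsMapSketch();
        put(v.superClasses, "ex:Student", "ex:Person");
        put(v.domains, "ex:enrolledIn", "ex:Student");
        put(v.ranges, "ex:enrolledIn", "ex:Course");
        for (String t : v.map("ex:alice", "ex:enrolledIn", "ex:cs101")) {
            System.out.println(t);
        }
    }
}
```

In a Hadoop job the body of map() would sit inside the mapper, with the
vocabulary loaded once per task; how the real code distributes the
vocabulary is not shown here.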

In comparison to other (very interesting) approaches (for example: [3]) this
is extremely simple, but its simplicity is a very big plus in practice.
It also satisfies a lot of use cases.

The next step is: how do I do the same when I receive a (typically small)
update? How do I intercept the update?
What if the update deletes data, where the deleted data is either 1)
vocabulary data or 2) instance data? I think 2) is more likely to happen in
practice.

Cheers,
Paolo

PS:
I've been using Apache Whirr to test this and it works perfectly with small
Hadoop clusters (i.e. < 10 nodes). Unfortunately, I am having issues with
larger clusters (i.e. > 20 nodes) [4]. Apache Whirr just came out of
incubation and it's a really great project. I really recommend you look at it
if you ever need to have a Hadoop cluster running on EC2. Whirr is also not
limited to Hadoop.

  [1] https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferDriver.java
  [2] https://github.com/castagna/tdbloader3/blob/master/src/main/java/org/apache/jena/tdbloader3/InferMapper.java
  [3] http://www.few.vu.nl/~jui200/webpie.html
  [4] http://markmail.org/thread/tseifrs7y3kiebih
