nutch-dev mailing list archives

From "Enis Soztutar (JIRA)" <>
Subject [jira] Commented: (NUTCH-808) Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs
Date Tue, 13 Apr 2010 09:50:49 GMT


Enis Soztutar commented on NUTCH-808:

bq. What do you mean by current implementation? NutchBase?
Indeed. In package deals with ORM (though not all classes)

bq. I know that Cascading has various Tap/Sink implementations, including JDBC and HBase but
also SimpleDB. Maybe it would be worth having a look at how they do it?
Cascading does this by converting Tuples (the Cascading data structure) to HBase/JDBC
records. The schema for HBase/JDBC is given as metadata. Since they only deal with a
tuple -> table row mapping, it is not that difficult. But again, Cascading does not allow
for mapping lists to columns, etc.
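
To make the tuple -> table row idea concrete, here is a minimal sketch in plain Java. It
does not use Cascading's API: the class name TupleRowMapper, its column list, and the choice
to reject nested lists are all assumptions made up for illustration. It only shows why a flat
mapping driven by schema metadata is simple, and where it stops being enough.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Cascading's API: column metadata drives the mapping.
final class TupleRowMapper {
    // Ordered column names; position i names the column for tuple field i.
    private final List<String> columns;

    TupleRowMapper(List<String> columns) {
        this.columns = new ArrayList<>(columns);
    }

    // Converts one flat tuple into one table row (column name -> value).
    Map<String, Object> toRow(List<?> tuple) {
        if (tuple.size() != columns.size()) {
            throw new IllegalArgumentException("tuple arity does not match schema");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            Object value = tuple.get(i);
            // Assumption for illustration: a flat mapping has no rule for
            // spreading a list over several columns, so it is rejected here.
            if (value instanceof List) {
                throw new IllegalArgumentException("nested list in field " + columns.get(i));
            }
            row.put(columns.get(i), value);
        }
        return row;
    }
}
```

A mapper built with the columns "url" and "status" turns the tuple ("http://example.org/", 200)
into a two-column row. A field holding a list has no obvious target column, which is the
kind of case an ORM layer would need to handle.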

bq. My gut feeling would be to write a custom framework instead of relying on DataNucleus
and use AVRO if possible. I really think that HBase support is urgently needed but am less
convinced that we need MySQL in the very short term. 
Yeah, the more I think about it, the more I come to terms with a custom implementation. However,
I think we might benefit a lot from the ideas in JDO in the long term. Also, a JDBC implementation
may not be relevant for large-scale deployments, but it will be a very nice side effect of
the ORM layer: it will allow easy deployment, which in turn will hopefully bring more users.
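
As a rough sketch of why a JDBC backend could fall out of the ORM layer almost for free,
consider the plain Java below. Every name in it (DataStore, MemoryStore, FetchStatusRecorder)
is hypothetical and comes from neither NutchBase nor JDO. The in-memory class only stands in
for a real backend, on the assumption that an HBase or JDBC class would implement the same
interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: none of these names come from NutchBase or JDO.
interface DataStore<K, V> {
    void put(K key, V value);

    // Returns null when the key is absent.
    V get(K key);
}

// Stand-in backend for illustration; an HBase or JDBC class is assumed
// to implement the same interface.
final class MemoryStore<K, V> implements DataStore<K, V> {
    private final Map<K, V> rows = new HashMap<>();

    @Override
    public void put(K key, V value) {
        rows.put(key, value);
    }

    @Override
    public V get(K key) {
        return rows.get(key);
    }
}

// Caller code depends only on the interface, so the backend can be
// chosen at deployment time.
final class FetchStatusRecorder {
    private final DataStore<String, Integer> store;

    FetchStatusRecorder(DataStore<String, Integer> store) {
        this.store = store;
    }

    void record(String url, int status) {
        store.put(url, status);
    }

    Integer statusOf(String url) {
        return store.get(url);
    }
}
```

With this shape, a small deployment could pass in a JDBC-backed store and a large one an
HBase-backed store without touching FetchStatusRecorder.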

> Evaluate ORM Frameworks which support non-relational column-oriented datastores and RDBMs
> ------------------------------------------------------------------------------------------
>                 Key: NUTCH-808
>                 URL:
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0
> We have an ORM layer in the NutchBase branch, which uses the Avro Specific Compiler to compile
> class definitions given in JSON. Before moving on with this, we might benefit from evaluating
> other frameworks to see whether they suit our needs.
> We want at least the following capabilities:
> - Using POJOs 
> - Able to persist objects to at least HBase, Cassandra, and RDBMs 
> - Able to efficiently serialize objects as task outputs from Hadoop jobs
> - Allow native queries, along with standard queries 
> Any comments, suggestions for other frameworks are welcome.
