james-server-dev mailing list archives

From "ioan eugen stan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAILBOX-44) [gsoc2011] Design and implement a distributed mailbox using Hadoop
Date Tue, 14 Jun 2011 22:28:47 GMT

    [ https://issues.apache.org/jira/browse/MAILBOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049480#comment-13049480 ]

ioan eugen stan commented on MAILBOX-44:
----------------------------------------

Thank you for the input. I appreciate it and will look into it; it seems very promising.

My first idea was to store all of a user's emails in a single row, but I couldn't figure out
how to access the emails efficiently.
I hope to get my hands on that book soon, but until then I will see what I can get from
other sources.
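
To make the access problem concrete, here is a minimal sketch of one common alternative to
the single-row idea: one row per message, with a composite row key. It relies only on the
fact that HBase keeps rows sorted by the unsigned byte order of their keys, so all messages
of one mailbox become a contiguous, UID-ordered key range. The class name, the field order
and the 0x00 separator are assumptions made for illustration. This is not the schema being
worked out in MAILBOX-72, and it uses no HBase API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout: one row per message, keyed by user, mailbox, and UID.
// None of these names come from MAILBOX-72; they are assumptions for illustration.
public class MessageRowKey {

    // Builds: user bytes, 0x00, mailbox bytes, 0x00.
    // Assumes neither name contains a 0x00 byte, otherwise the separator is ambiguous.
    public static byte[] mailboxPrefix(String user, String mailbox) {
        byte[] u = user.getBytes(StandardCharsets.UTF_8);
        byte[] m = mailbox.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(u.length + m.length + 2)
                .put(u).put((byte) 0)
                .put(m).put((byte) 0)
                .array();
    }

    // Appends the UID as 8 big-endian bytes so that, for non-negative UIDs,
    // byte order matches numeric order.
    public static byte[] of(String user, String mailbox, long uid) {
        byte[] prefix = mailboxPrefix(user, mailbox);
        return ByteBuffer.allocate(prefix.length + Long.BYTES)
                .put(prefix)
                .putLong(uid)
                .array();
    }

    // Unsigned lexicographic byte comparison, the ordering HBase applies to row keys.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] first = of("alice", "INBOX", 1L);
        byte[] second = of("alice", "INBOX", 300L);
        // Reports whether the lower UID sorts before the higher one.
        System.out.println(compare(first, second) < 0);
    }
}
```

With keys shaped like this, reading a mailbox would be a range scan starting at
mailboxPrefix(user, mailbox) rather than a read of one very wide row. Whether that fits the
James mailbox API is exactly the kind of question left to the MAILBOX-72 discussion.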

We are currently discussing the requirements and constraints for building a NoSQL storage
here: https://issues.apache.org/jira/browse/MAILBOX-72. For now, the discussion targets
HBase, but I think it can be adapted to other NoSQL implementations. We will publish the schema
details there.



> [gsoc2011] Design and implement a distributed mailbox using Hadoop
> ------------------------------------------------------------------
>
>                 Key: MAILBOX-44
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-44
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>              Labels: gsoc2011
>             Fix For: 0.3
>
>
> Context: The mailbox subproject (http://james.apache.org/mailbox/) supports maildir,
SQL database (via JPA) and Java Content Repository (JCR) as technologies for mail storage. This
flexibility is achieved thanks to an API design that abstracts mail storage from the mail protocols.
> Task: We need to implement mailbox storage as a distributed system on top of Hadoop HDFS.
The James mailbox API will be used. A first step is to design how to interact with Hadoop
(native api, gora incubator at apache,...) and deal with specific performance questions related
to mail loading/parsing in a distributed system (use map/reduce or not, use existing local
lucene indexes for search,...). The second step is to implement the HDFS mailbox (maildir
mailbox is similar because it stores each mail as a file and can serve as inspiration). A single
James server will still be deployed because we don't have any distributed UID generation.
> Mentor: eric at apache dot org
> Complexity: medium 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

