hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Apte <technicalarchitect2...@gmail.com>
Subject Re: Modelling threaded messages
Date Sat, 02 Apr 2011 03:10:59 GMT
Have you considered using Cassandra?

Kevin


On Fri, Apr 1, 2011 at 11:01 PM, M. C. Srivas <mcsrivas@gmail.com> wrote:

> Ted, this is a pretty clever idea.
>
> On Thu, Mar 31, 2011 at 9:27 PM, Ted Dunning <tdunning@maprtech.com>
> wrote:
>
> > Solr/Elastic search is a fine solution, but probably won't be quite as
> fast
> > as a well-tuned hbase solution.
> >
> > One key assumption you seem to be making is that you will store messages
> > only once.  If you are willing to make multiple updates to tables, then
> you
> > can arrange the natural ordering of the table to get what you want.  For
> > instance, you could keep the most recent messages (say the last 10 from
> > each
> > of the 1000 most recently updated threads) in an in memory table.  Then
> you
> > could store messages in a thread table indexed by thread:timestamp.
> >  Finally
> > you could store messages in a table indexed by user:thread or
> > user:timestamp.  This would allow you to display the most recent messages
> > or
> > thread in near zero time, to display all or the most recent messages from
> a
> > particular thread with only one retrieval and all of the messages from a
> > particular user in time order in one retrieval.
> >
> > On Thu, Mar 31, 2011 at 5:56 PM, Mark Jarecki <mjarecki@bigpond.net.au
> > >wrote:
> >
> > > Hi all,
> > >
> > > I'm modelling a schema for storing and retrieving threaded messages,
> > where,
> > > for planning purposes:
> > >
> > >        - there are many millions of users.
> > >        - a user might have up to 1000 threads.
> > >        - each thread might have up to 50000 messages (with some threads
> > > being sparse with only a few messages).
> > >        - the Stargate REST interface is used.
> > >
> > > I want to be able to execute the following queries:
> > >
> > >        - retrieve x latest active threads, with the latest message.
> > >        - retrieve x latest active threads, with the latest message,
> > offset
> > > by y.
> > >        - retrieve x latest messages from a thread.
> > >        - retrieve x latest messages from a thread, offset by y.
> > >
> > > I've come up with a few possible methods for modelling this. But any
> > > insights would be greatly appreciated.
> > >
> > > Thanks in advance,
> > >
> > > Mark
> > >
> > >
> > > Possible solution 1:
> > >
> > > TABLE:          threads
> > > KEY:            userID : threadID
> > > COLUMN:         latest_message
> > >
> > > TABLE:          messages
> > > KEY:            userID : threadID : timestamp
> > > COLUMN:         message
> > >
> > > Messages are first written to the messages table, and then the threads
> > > table's thread is updated with the latest message.
> > >
> > > To fetch the latest x active threads, with the latest message:
> > >
> > >        - I retrieve all threads and then sort and reduce the results on
> > the
> > > client.
> > >
> > > A concern with this is the fetching of all threads to sort on each
> > request.
> > > This could be unwieldy!
> > >
> > >
> > > Possible solution 2:
> > >
> > > TABLE:          threads
> > > KEY:            userID : timestamp : threadID
> > > COLUMN:         latest_message
> > >
> > > TABLE:          messages
> > > KEY:            userID : threadID : timestamp
> > > COLUMN:         message
> > >
> > > Messages are first written to the messages table, and then the threads
> > > table's is updated with the latest message. The previous latest message
> > is
> > > then deleted from the threads table.
> > >
> > > To fetch the latest x active threads, with the latest message:
> > >
> > >        - I scan the threads table until I get x unique threads.
> > >
> > > A concern with this could be the issue of keeping the threads table in
> > sync
> > > with the messages table - especially with the deletion of old latest
> > > messages.
> > >
> > >
> > > Possible solution 3:
> > >
> > > TABLE:          messages
> > > KEY:            userID : timestamp : threadID
> > > COLUMN:         message
> > >
> > > To fetch the latest x active threads, with the latest message:
> > >
> > >        - I scan the messages table until I get x unique threads.
> > >
> > > One of my concerns with this method is that some threads will be busier
> > > than others, forcing a scan through nearly all of a user's messages.
> And
> > > there will be an ever increasing number of messages. A periodic
> archiving
> > > process - moving older messages to another table - might alleviate
> things
> > > here.
> > >
> > >
> > > Possible solution 4:
> > >
> > > Use SOLR/Elastic search or equivalent.
> > >
> > >
> > >
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message