hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "M. C. Srivas" <mcsri...@gmail.com>
Subject Re: Modelling threaded messages
Date Sat, 02 Apr 2011 03:01:48 GMT
Ted, this is a pretty clever idea.

On Thu, Mar 31, 2011 at 9:27 PM, Ted Dunning <tdunning@maprtech.com> wrote:

> Solr/Elastic search is a fine solution, but probably won't be quite as fast
> as a well-tuned hbase solution.
>
> One key assumption you seem to be making is that you will store messages
> only once.  If you are willing to make multiple updates to tables, then you
> can arrange the natural ordering of the table to get what you want.  For
> instance, you could keep the most recent messages (say the last 10 from
> each
> of the 1000 most recently updated threads) in an in memory table.  Then you
> could store messages in a thread table indexed by thread:timestamp.
>  Finally
> you could store messages in a table indexed by user:thread or
> user:timestamp.  This would allow you to display the most recent messages
> or
> thread in near zero time, to display all or the most recent messages from a
> particular thread with only one retrieval and all of the messages from a
> particular user in time order in one retrieval.
>
> On Thu, Mar 31, 2011 at 5:56 PM, Mark Jarecki <mjarecki@bigpond.net.au
> >wrote:
>
> > Hi all,
> >
> > I'm modelling a schema for storing and retrieving threaded messages,
> where,
> > for planning purposes:
> >
> >        - there are many millions of users.
> >        - a user might have up to 1000 threads.
> >        - each thread might have up to 50000 messages (with some threads
> > being sparse with only a few messages).
> >        - the Stargate REST interface is used.
> >
> > I want to be able to execute the following queries:
> >
> >        - retrieve x latest active threads, with the latest message.
> >        - retrieve x latest active threads, with the latest message,
> offset
> > by y.
> >        - retrieve x latest messages from a thread.
> >        - retrieve x latest messages from a thread, offset by y.
> >
> > I've come up with a few possible methods for modelling this. But any
> > insights would be greatly appreciated.
> >
> > Thanks in advance,
> >
> > Mark
> >
> >
> > Possible solution 1:
> >
> > TABLE:          threads
> > KEY:            userID : threadID
> > COLUMN:         latest_message
> >
> > TABLE:          messages
> > KEY:            userID : threadID : timestamp
> > COLUMN:         message
> >
> > Messages are first written to the messages table, and then the threads
> > table's thread is updated with the latest message.
> >
> > To fetch the latest x active threads, with the latest message:
> >
> >        - I retrieve all threads and then sort and reduce the results on
> the
> > client.
> >
> > A concern with this is the fetching of all threads to sort on each
> request.
> > This could be unwieldy!
> >
> >
> > Possible solution 2:
> >
> > TABLE:          threads
> > KEY:            userID : timestamp : threadID
> > COLUMN:         latest_message
> >
> > TABLE:          messages
> > KEY:            userID : threadID : timestamp
> > COLUMN:         message
> >
> > Messages are first written to the messages table, and then the threads
> > table's is updated with the latest message. The previous latest message
> is
> > then deleted from the threads table.
> >
> > To fetch the latest x active threads, with the latest message:
> >
> >        - I scan the threads table until I get x unique threads.
> >
> > A concern with this could be the issue of keeping the threads table in
> sync
> > with the messages table - especially with the deletion of old latest
> > messages.
> >
> >
> > Possible solution 3:
> >
> > TABLE:          messages
> > KEY:            userID : timestamp : threadID
> > COLUMN:         message
> >
> > To fetch the latest x active threads, with the latest message:
> >
> >        - I scan the messages table until I get x unique threads.
> >
> > One of my concerns with this method is that some threads will be busier
> > than others, forcing a scan through nearly all of a user's messages. And
> > there will be an ever increasing number of messages. A periodic archiving
> > process - moving older messages to another table - might alleviate things
> > here.
> >
> >
> > Possible solution 4:
> >
> > Use SOLR/Elastic search or equivalent.
> >
> >
> >
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message