hbase-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: Modelling threaded messages
Date Fri, 01 Apr 2011 04:27:14 GMT
Solr/Elasticsearch is a fine solution, but it probably won't be quite as
fast as a well-tuned HBase solution.

One key assumption you seem to be making is that you will store messages
only once.  If you are willing to make multiple updates to tables, then you
can arrange the natural ordering of the table to get what you want.  For
instance, you could keep the most recent messages (say, the last 10 from each
of the 1000 most recently updated threads) in an in-memory table.  Then you
could store messages in a thread table indexed by thread:timestamp.  Finally,
you could store messages in a table indexed by user:thread or
user:timestamp.  This would let you display the most recent messages or
threads in near-zero time, display all (or just the most recent) messages
from a particular thread with a single retrieval, and display all of a
particular user's messages in time order with a single retrieval.
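The multi-table layout described above can be sketched with an in-memory stand-in for HBase. This is a minimal sketch, not HBase client code: SortedTable is a hypothetical toy that keeps row keys in lexicographic order (as HBase does for byte keys), and MAX_TS, rev_ts and MessageStore are illustrative names. It assumes timestamps are stored reversed and zero-padded, a common HBase key-design idiom, so that a prefix scan returns the newest rows first; it picks the user:timestamp variant (with the thread appended) for the per-user table.

```python
import bisect
from collections import OrderedDict
from itertools import islice

MAX_TS = 10**13  # assumption: 0 < millisecond timestamp < MAX_TS


def rev_ts(ts):
    # Reversed, zero-padded timestamp: newer rows sort first lexicographically.
    return f"{MAX_TS - ts:013d}"


class SortedTable:
    """Toy stand-in for an HBase table: rows kept in lexicographic key order."""

    def __init__(self):
        self.keys, self.rows = [], {}

    def put(self, key, value):
        if key not in self.rows:
            bisect.insort(self.keys, key)
        self.rows[key] = value

    def scan_prefix(self, prefix):
        # One "retrieval": rows whose key starts with prefix, in key order.
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i], self.rows[self.keys[i]]
            i += 1


class MessageStore:
    # Assumptions: ids contain no ':'; posts arrive in timestamp order;
    # two posts to one thread with the same ts would overwrite each other.
    def __init__(self, max_threads=1000, per_thread=10):
        self.by_thread = SortedTable()  # key: thread:revts
        self.by_user = SortedTable()    # key: user:revts:thread
        self.recent = OrderedDict()     # in-memory: thread -> newest messages
        self.max_threads, self.per_thread = max_threads, per_thread

    def post(self, user, thread, ts, text):
        # The same message is written once per access path (denormalized).
        self.by_thread.put(f"{thread}:{rev_ts(ts)}", text)
        self.by_user.put(f"{user}:{rev_ts(ts)}:{thread}", text)
        msgs = self.recent.pop(thread, [])
        # Re-inserting moves the thread to the "most recently updated" end.
        self.recent[thread] = ([text] + msgs)[: self.per_thread]
        while len(self.recent) > self.max_threads:
            self.recent.popitem(last=False)  # evict least recently updated

    def latest_in_thread(self, thread, x, offset=0):
        rows = self.by_thread.scan_prefix(f"{thread}:")
        return [text for _, text in islice(rows, offset, offset + x)]

    def user_messages(self, user):
        return [text for _, text in self.by_user.scan_prefix(f"{user}:")]

    def recent_threads(self, x):
        # Most recently updated threads first, each with its latest message.
        return [(t, msgs[0]) for t, msgs in islice(reversed(self.recent.items()), x)]
```

Each read is either one prefix scan or an in-memory lookup; the cost moves to write time, where one message becomes two table rows plus a cache update. The "offset by y" queries map to skipping y rows of the same scan.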

On Thu, Mar 31, 2011 at 5:56 PM, Mark Jarecki <mjarecki@bigpond.net.au> wrote:

> Hi all,
>
> I'm modelling a schema for storing and retrieving threaded messages, where,
> for planning purposes:
>
>        - there are many millions of users.
>        - a user might have up to 1000 threads.
>        - each thread might have up to 50000 messages (with some threads
> being sparse with only a few messages).
>        - the Stargate REST interface is used.
>
> I want to be able to execute the following queries:
>
>        - retrieve x latest active threads, with the latest message.
>        - retrieve x latest active threads, with the latest message, offset
> by y.
>        - retrieve x latest messages from a thread.
>        - retrieve x latest messages from a thread, offset by y.
>
> I've come up with a few possible methods for modelling this. But any
> insights would be greatly appreciated.
>
> Thanks in advance,
>
> Mark
>
>
> Possible solution 1:
>
> TABLE:          threads
> KEY:            userID : threadID
> COLUMN:         latest_message
>
> TABLE:          messages
> KEY:            userID : threadID : timestamp
> COLUMN:         message
>
> Messages are first written to the messages table, and then the thread's
> row in the threads table is updated with the latest message.
>
> To fetch the latest x active threads, with the latest message:
>
>        - I retrieve all threads and then sort and reduce the results on the
> client.
>
> A concern with this is having to fetch all of a user's threads and sort
> them on every request. This could be unwieldy!
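Solution 1's client-side step can be sketched as follows, assuming the threads rows for one user have already been fetched (for example with a single scan over the userID prefix) as (threadID, latest_ts, latest_message) tuples; the tuple shape and the latest_threads name are hypothetical. With the stated cap of 1000 threads per user, this is a top-(y + x) selection over at most 1000 rows, which heapq.nlargest handles without a full sort.

```python
import heapq


def latest_threads(thread_rows, x, offset=0):
    """Return the x most recently active threads, skipping the first `offset`.

    thread_rows: iterable of (thread_id, latest_ts, latest_message) tuples,
    i.e. every row of the (hypothetical) threads table for a single user.
    """
    # Only offset + x rows are needed, so keep a bounded heap instead of
    # sorting the whole list.
    top = heapq.nlargest(offset + x, thread_rows, key=lambda row: row[1])
    return top[offset:offset + x]
```

In this sketch the selection itself is cheap; the likely dominant cost is transferring up to 1000 rows on every request, which is the concern raised above.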
>
>
> Possible solution 2:
>
> TABLE:          threads
> KEY:            userID : timestamp : threadID
> COLUMN:         latest_message
>
> TABLE:          messages
> KEY:            userID : threadID : timestamp
> COLUMN:         message
>
> Messages are first written to the messages table, and then the threads
> table is updated with the latest message. The previous latest-message entry
> is then deleted from the threads table.
>
> To fetch the latest x active threads, with the latest message:
>
>        - I scan the threads table until I get x unique threads.
>
> A concern with this is keeping the threads table in sync with the
> messages table, especially when deleting the old latest-message entries.
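Solution 2's update-then-delete cycle and its "scan until x unique threads" read can be sketched with a toy sorted table. This is a minimal sketch under stated assumptions: ThreadIndex and index_key are hypothetical, the timestamp in the key is stored reversed so a user's newest rows sort first, and last_ts stands in for however the client learns a thread's previous timestamp (for example by first reading the thread's newest row in the messages table). That read-then-delete step is exactly where the two tables can drift; delete_old=False simulates the delete failing after the put succeeded.

```python
import bisect

MAX_TS = 10**13  # assumption: 0 < millisecond timestamp < MAX_TS


def index_key(user, ts, thread):
    # userID : reversed timestamp : threadID, so a user's newest rows sort first.
    return f"{user}:{MAX_TS - ts:013d}:{thread}"


class ThreadIndex:
    """Toy sorted stand-in for solution 2's threads table (ids contain no ':')."""

    def __init__(self):
        self.keys, self.rows = [], {}
        # Hypothetical: where the client learns each thread's previous timestamp.
        self.last_ts = {}

    def update(self, user, thread, ts, message, delete_old=True):
        key = index_key(user, ts, thread)
        if key not in self.rows:
            bisect.insort(self.keys, key)
        self.rows[key] = message
        old_ts = self.last_ts.get((user, thread))
        self.last_ts[(user, thread)] = ts
        # delete_old=False simulates the delete failing after the put succeeded.
        if delete_old and old_ts is not None and old_ts != ts:
            old_key = index_key(user, old_ts, thread)
            self.keys.remove(old_key)
            del self.rows[old_key]

    def latest_threads(self, user, x):
        # Scan newest first; the first row seen per thread is its newest one,
        # so a stale row left by a failed delete is skipped, not returned.
        prefix = f"{user}:"
        seen, out = set(), []
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix) and len(out) < x:
            thread = self.keys[i].rsplit(":", 1)[1]
            if thread not in seen:
                seen.add(thread)
                out.append((thread, self.rows[self.keys[i]]))
            i += 1
        return out
```

In this sketch a missed delete does not produce a wrong answer, because the stale row always sorts after the thread's current row; it only adds rows that later scans must step over.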
>
>
> Possible solution 3:
>
> TABLE:          messages
> KEY:            userID : timestamp : threadID
> COLUMN:         message
>
> To fetch the latest x active threads, with the latest message:
>
>        - I scan the messages table until I get x unique threads.
>
> One of my concerns with this method is that some threads will be busier
> than others, forcing a scan through nearly all of a user's messages, and
> the number of messages will keep growing. A periodic archiving process
> (moving older messages to another table) might alleviate this.
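The busy-thread cost in solution 3 can be made concrete with a small sketch. It assumes the scan over userID : timestamp : threadID yields one user's messages newest first (for instance with a reversed timestamp in the key), modelled here as a plain list of (threadID, message) pairs; scan_unique_threads is a hypothetical helper that also counts how many rows it had to read.

```python
def scan_unique_threads(rows, x):
    """Scan rows newest first until x distinct threads have been seen.

    rows: (thread_id, message) pairs for one user, newest first, as a scan
    of the messages table keyed userID : reversed timestamp : threadID
    would return them. Returns (threads_found, rows_scanned).
    """
    seen = {}  # thread_id -> its newest message, in order of first sighting
    scanned = 0
    for thread, message in rows:
        scanned += 1
        if thread not in seen:
            seen[thread] = message
            if len(seen) == x:
                break
    return list(seen.items()), scanned
```

With one thread holding the 1000 most recent messages and a second, quieter thread behind it, finding the 2 latest active threads reads 1001 rows. The scan length grows with the number of messages newer than the x-th thread's latest message, not with the number of threads, which is why busy threads hurt this layout.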
>
>
> Possible solution 4:
>
> Use Solr/Elasticsearch or an equivalent.
>
