mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Replacing the Netflix data set
Date Fri, 07 May 2010 20:25:37 GMT
If you're willing to live with a social network graph for this example, then
even bigger than LJ, but still public, is the Twitter social graph, available
as a torrent <http://an.kaist.ac.kr/traces/WWW2010.html>, which I've also put
on S3 and just need to make public at some point. It has 1.47 billion
connections on 47 million nodes.

  -jake

On Fri, May 7, 2010 at 8:38 AM, Sean Owen <srowen@gmail.com> wrote:

> Cool, yeah I'm looking for something even larger, since this is small
> enough that processing it easily fits on one computer. The chapter in
> question is about distributing via Hadoop.
>
> My current next-best option, if it can be used, is the LiveJournal
> network data here:
> http://snap.stanford.edu/data/index.html
>
> On Fri, May 7, 2010 at 4:29 PM, Pedro Oliveira <cpdomina@gmail.com> wrote:
> > This dataset seems to have a few million <user, artist, plays> triples
> > from last.fm:
> > http://mtg.upf.edu/node/1671
>
