mahout-user mailing list archives

From Karl Wettin <karl.wet...@gmail.com>
Subject Re: Clustering Demo
Date Sat, 17 May 2008 19:22:29 GMT

On 17 May 2008, at 21:01, Isabel Drost wrote:
> On Saturday 17 May 2008, Lukas Vlcek wrote:
>> http://archive.ics.uci.edu/ml/datasets.html?format=&task=clu&att=&area=&numAtt=&numIns=&type=&sort=taskUp&view=table
>>
>> Some of those data sets are reasonably small so that they could be
>> integrated into Mahout unit tests by default (sounds like crazy idea?).
>
> Hmm. If we want to integrate them in unit tests, we should have a look at the
> license of these datasets. But for examples, it might be ok, if users simply
> download the dataset from the uci web page themselves.

+1

Actually, the Taste code already contains an example that depends on a
data set from GroupLens, which the user must download. For examples I
don't mind that at all, especially if the data is good. For unit tests,
though, I really think we want data that we can redistribute ourselves.

Once again I'm taking the opportunity to point out that the synthetic
data generator at http://www.datasetgenerator.com/ is excellent for unit
and load testing. The generator's C source code has been donated to
Mahout by Gabor Melli (thanks again) and is available in the issue
tracker!


       karl
