samoa-dev mailing list archives

From Matthieu Morel <matthieu.mo...@gmail.com>
Subject Re: New Instances
Date Fri, 23 Jan 2015 17:38:57 GMT
I took a shot at drafting a simplified API for instances.
https://github.com/matthieumorel/samoa/tree/new-instances

As pointed out in this thread, the current API is too exhaustive, too
tied to a specific implementation, and too tied to WEKA/MOA APIs.

In addition, I feel the header/information does not belong to the
instance. It is used when parsing ARFF files, where static
information about the stream is available from the start. But for a
real streaming use case, we should not make such an assumption.
Whatever is known at the beginning should be loaded by the topology,
but not included in the instances. ARFF files can still be loaded and
generate instances in the new format; only the headers should be
parsed separately.
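
To make the separation concrete, here is a minimal sketch of how it
could look. All names here (StreamHeader, SimpleInstance, indexOfLabel)
are hypothetical and chosen for illustration; they are not taken from
the branch linked above. The sketch assumes numeric attributes and a
single string label, and lets the header learn new class labels as
they show up in the stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is not the API from the new-instances branch.

/** Static stream metadata, e.g. parsed once from an ARFF header and held by the topology. */
final class StreamHeader {
    private final List<String> attributeNames;
    private final List<String> classLabels = new ArrayList<>();

    StreamHeader(List<String> attributeNames, List<String> initialLabels) {
        this.attributeNames = new ArrayList<>(attributeNames);
        this.classLabels.addAll(initialLabels);
    }

    /** Returns the index of the label, registering it first if it was not known yet. */
    int indexOfLabel(String label) {
        int idx = classLabels.indexOf(label);
        if (idx < 0) {
            classLabels.add(label);
            idx = classLabels.size() - 1;
        }
        return idx;
    }

    int numClasses() { return classLabels.size(); }

    int numAttributes() { return attributeNames.size(); }
}

/** A single-label instance: attribute values plus a label, with no reference to a header. */
final class SimpleInstance {
    private final double[] values;
    private final String label;

    SimpleInstance(double[] values, String label) {
        this.values = values.clone(); // defensive copy so the instance stays immutable
        this.label = label;
    }

    double value(int i) { return values[i]; }

    int numValues() { return values.length; }

    String label() { return label; }
}
```

With this shape, the topology would own one StreamHeader and pass only
SimpleInstance objects downstream; a label not listed in the header is
simply registered when it is first seen, so the number of classes is
not fixed up front.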

This proposal is a draft and single-label only. It should be easy to
add the functionality suggested by Albert for multi-label instances.

Feel free to comment!

Matthieu




On Wed, Jan 21, 2015 at 2:31 AM, Albert Bifet <abifet@waikato.ac.nz> wrote:
> 1/ Learners such as decision trees can deal with new instances that
> arrive with more class labels. New instances can arrive with new headers.
>
> 2/ To change class labels dynamically, we need to add a method
> "setValue(int, string)" to the Attribute class so that new attribute
> values can be added dynamically.
>
> 3/ The current design is compatible with the methods in WEKA
> instances. It would be nice to have a fresher design. I will need some
> help to come up with a simplified and fresher design of the instances,
> as I'm a bit conditioned by the previous instance usage :)
>
> Thanks,
>
> Albert
>
>
>
> On Wed, Jan 21, 2015 at 2:33 AM, Olivier Van Laere
> <oliviervanlaere@gmail.com> wrote:
>> Hey Matthieu,
>>
>>> On Jan 20, 2015, at 1:47 AM, Matthieu Morel <matthieu.morel@gmail.com> wrote:
>>>
>>> I'm confused. From what I see the number of classes is currently fixed
>>> in the instance header. As if it was static. I suppose you can work
>>> around that limitation with some hacks but I want to use a clean API
>>> for that.
>>>
>>> Or is there a recommended way I'm missing?
>>
>> Ah, I think I remember now what happened. When I encountered this
>> situation, the data was in, say, .arff format with a header stating the
>> number of class values, and the instance header was read from that,
>> while the instances were then read line by line.
>>
>> I worked around that by just storing the class labels seen in the
>> instances on the fly when building a model, and ignoring that field of
>> the instance header. Sorry for the confusion!
>>
>> Cheers,
>> Olivier
>>
>>
