samoa-dev mailing list archives

From Albert Bifet <abi...@waikato.ac.nz>
Subject Re: New Instances
Date Mon, 26 Jan 2015 06:30:00 GMT
Nice and simple API! A few comments:

- How can we handle discrete attributes, for example a class attribute
with the values "+" and "-"?

- In sparse instances, is the performance of a map similar to the
performance of two arrays, one for indices and one for values?
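
To make the second question concrete, here is a minimal sketch of the two
layouts being compared. The class and method names (SparseInstanceSketch,
MapSparse, ArraySparse) are hypothetical and are not taken from the
new-instances branch; absent attributes are assumed to read as 0.0.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: two ways to store the non-zero values of a sparse instance.
public class SparseInstanceSketch {

    /** Layout 1: a hash map from attribute index to value. */
    static final class MapSparse {
        private final Map<Integer, Double> values = new HashMap<>();

        void set(int index, double value) {
            values.put(index, value);
        }

        /** Absent attributes are assumed to be 0.0. */
        double get(int index) {
            return values.getOrDefault(index, 0.0);
        }
    }

    /** Layout 2: two parallel arrays, with indices sorted in ascending order. */
    static final class ArraySparse {
        private final int[] indices;
        private final double[] values;

        ArraySparse(int[] sortedIndices, double[] values) {
            if (sortedIndices.length != values.length) {
                throw new IllegalArgumentException("indices and values must have the same length");
            }
            this.indices = sortedIndices.clone();
            this.values = values.clone();
        }

        /** Binary search over the sorted indices; absent attributes read as 0.0. */
        double get(int index) {
            int pos = Arrays.binarySearch(indices, index);
            return pos >= 0 ? values[pos] : 0.0;
        }
    }

    public static void main(String[] args) {
        MapSparse asMap = new MapSparse();
        asMap.set(2, 1.0);
        asMap.set(7, 2.5);

        ArraySparse asArrays = new ArraySparse(new int[] {2, 7}, new double[] {1.0, 2.5});

        System.out.println(asMap.get(7) + " " + asArrays.get(7));
        System.out.println(asMap.get(3) + " " + asArrays.get(3));
    }
}
```

The trade-off in this sketch: the map gives expected constant-time lookup
and cheap insertion but boxes every index and value (Integer, Double),
while the arrays stay primitive and compact and iterate in index order,
at the cost of a logarithmic lookup and an expensive insert. Which one is
faster in practice would need to be measured on real streams.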

Albert

On Sat, Jan 24, 2015 at 1:38 AM, Matthieu Morel <matthieu.morel@gmail.com>
wrote:

> I took a shot at drafting a simplified API for instances.
> https://github.com/matthieumorel/samoa/tree/new-instances
>
> As pointed out in this thread, the current API is too exhaustive, too
> tied to a specific implementation, and too tied to WEKA/MOA APIs.
>
> In addition, I feel the header/information does not belong to the
> instance. This is something which is used when parsing arff files
> where static information about the stream is available from the start.
> But for a real streaming use case, we should not make such an assumption.
> Whatever is known at the beginning should be loaded by the topology,
> but not included in the instances. Arff files can still be loaded and
> generate instances in the new format. Only the headers should be
> parsed separately.
>
> This proposal is a draft and single label only. It should be easy to
> add functionality suggested by Albert for multi labels.
>
> Feel free to comment!
>
> Matthieu
>
>
>
>
> On Wed, Jan 21, 2015 at 2:31 AM, Albert Bifet <abifet@waikato.ac.nz>
> wrote:
> > 1/ Learners such as decision trees can deal with new instances that arrive
> > with more label classes. New instances can arrive with new headers.
> >
> > 2/ To change class labels dynamically, we need to add a method
> > "setValue(int, string)" in the Attribute class to add dynamically new
> > attribute values.
> >
> > 3/ The current design is compatible with the methods in weka
> > instances. It could be nice to have a fresher design. I will need some
> > help to have a simplified and fresher design of the instances as I'm a
> > bit conditioned by the previous instance usage :)
> >
> > Thanks,
> >
> > Albert
> >
> >
> >
> > On Wed, Jan 21, 2015 at 2:33 AM, Olivier Van Laere
> > <oliviervanlaere@gmail.com> wrote:
> >> Hey Matthieu,
> >>
> >>> On Jan 20, 2015, at 1:47 AM, Matthieu Morel <matthieu.morel@gmail.com>
> >>> wrote:
> >>>
> >>> I'm confused. From what I see the number of classes is currently fixed
> >>> in the instance header. As if it was static. I suppose you can work
> >>> around that limitation with some hacks but I want to use a clean API
> >>> for that.
> >>>
> >>> Or is there a recommended way I'm missing?
> >>
> >> Ah, I think I remember now what happened. As far as I encountered this
> >> situation, the data had say an .arff format with a header stating the
> >> number of class values, and the instance header was read from that, while
> >> the instances were then read by the line.
> >>
> >> I worked around that by just storing the class label seen in the
> >> instances on the fly when building a model, and ignored that field of the
> >> instance header. Sorry for the confusion!
> >>
> >> Cheers,
> >> Olivier
> >>
> >>
>
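
The dynamic handling of class labels discussed in the quoted thread
(adding attribute values as they appear, or recording labels on the fly
while building a model) could look roughly like the sketch below. The
class NominalAttributeSketch and its methods are hypothetical; they are
not the SAMOA or WEKA Attribute API, and the choice to assign indices in
order of first appearance is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a nominal attribute whose set of labels grows as new
// labels are seen in the stream, instead of being fixed by a header.
public class NominalAttributeSketch {
    private final String name;
    private final List<String> labels = new ArrayList<>();
    private final Map<String, Integer> indexByLabel = new HashMap<>();

    public NominalAttributeSketch(String name) {
        this.name = name;
    }

    /** Returns the index of the label, registering it first if it is unseen. */
    public int indexOf(String label) {
        Integer existing = indexByLabel.get(label);
        if (existing != null) {
            return existing;
        }
        int index = labels.size(); // assumption: indices follow first appearance
        labels.add(label);
        indexByLabel.put(label, index);
        return index;
    }

    public String labelAt(int index) {
        return labels.get(index);
    }

    public int numLabels() {
        return labels.size();
    }

    public String name() {
        return name;
    }

    public static void main(String[] args) {
        NominalAttributeSketch cls = new NominalAttributeSketch("class");
        System.out.println(cls.indexOf("+")); // 0: first label seen
        System.out.println(cls.indexOf("-")); // 1: second label seen
        System.out.println(cls.indexOf("+")); // 0: already registered
        System.out.println(cls.numLabels());  // 2
    }
}
```

With this shape an instance only needs to carry the integer index, and a
learner that can cope with new classes asks the attribute for numLabels()
when it needs the current count.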
