spark-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: Writing custom Transformers and Estimators like Tokenizer in spark ML
Date Tue, 02 Aug 2016 00:25:31 GMT
UnaryTransformer’s scaladoc says “Abstract class for transformers that take one input column,
apply transformation, and output the result as a new column.”

If you want to allow specification of more than one input column, or if your output column
already exists, or you want multiple output columns, then you can’t use UnaryTransformer.
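
For the multi-column case, here is a rough sketch of extending Transformer directly. It assumes the Spark 2.x signature transform(Dataset[_]); the class name ColumnJoiner, its two params, and the space-joining behavior are all made up for illustration and are not part of Spark:

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap, StringArrayParam}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, concat_ws}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical example: joins several string columns into one new column.
class ColumnJoiner(override val uid: String) extends Transformer {

  def this() = this(Identifiable.randomUID("columnJoiner"))

  // With plain Transformer you declare the column params yourself.
  final val inputCols = new StringArrayParam(this, "inputCols", "input column names")
  final val outputCol = new Param[String](this, "outputCol", "output column name")

  def setInputCols(value: Array[String]): this.type = set(inputCols, value)
  def setOutputCol(value: String): this.type = set(outputCol, value)

  override def transform(dataset: Dataset[_]): DataFrame = {
    transformSchema(dataset.schema) // fail fast on a bad schema
    dataset.withColumn($(outputCol), concat_ws(" ", $(inputCols).map(col): _*))
  }

  // Schema validation is also your job: check inputs, then append the output field.
  override def transformSchema(schema: StructType): StructType = {
    $(inputCols).foreach { name =>
      require(schema(name).dataType == StringType, s"Column $name must be StringType")
    }
    require(!schema.fieldNames.contains($(outputCol)),
      s"Output column ${$(outputCol)} already exists")
    StructType(schema.fields :+ StructField($(outputCol), StringType, nullable = true))
  }

  override def copy(extra: ParamMap): ColumnJoiner = defaultCopy(extra)
}
```

The params, the schema check, and the column wiring are exactly the parts UnaryTransformer would otherwise supply for the one-in, one-out case.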
 

If none of those cases applies (one input column, one new output column), though, UnaryTransformer
will simplify your subclass.

BTW the scaladocs for StringType say “The data type representing `String` values. Please use
the singleton [[DataTypes.StringType]].” Do that instead of calling StringType’s constructor:
from Scala, write `StringType` (or `DataTypes.StringType`) rather than `new StringType`.
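
To make that concrete, here is a minimal sketch of what a Capitalizer built on UnaryTransformer could look like. I'm guessing at the intended behavior (upper-casing a string column), and the uid prefix and null handling are my own assumptions, so treat it as an outline rather than a drop-in fix:

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Sketch only: the upper-casing logic is an assumption about what Capitalizer should do.
class Capitalizer(override val uid: String)
    extends UnaryTransformer[String, String, Capitalizer] {

  def this() = this(Identifiable.randomUID("capitalizer"))

  // The per-row function; UnaryTransformer wraps it in a UDF for you.
  // Nulls are passed through instead of throwing a NullPointerException.
  override protected def createTransformFunc: String => String =
    s => Option(s).map(_.toUpperCase).orNull

  // Reject non-string input columns at schema-validation time.
  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType, s"Input type must be StringType but got $inputType")

  // Use the StringType singleton object; the class constructor is private.
  override protected def outputDataType: DataType = StringType

  override def copy(extra: ParamMap): Capitalizer = defaultCopy(extra)
}
```

Usage would be along the lines of new Capitalizer().setInputCol("text").setOutputCol("upper").transform(df), where df and the column names are placeholders.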

--
Steve
www.lucidworks.com

> On Aug 1, 2016, at 2:30 PM, janardhan shetty <janardhanp22@gmail.com> wrote:
> 
> What is the difference between the UnaryTransformer and Transformer classes? In which scenarios
> should we use one or the other?
> 
> On Sun, Jul 31, 2016 at 8:27 PM, janardhan shetty <janardhanp22@gmail.com> wrote:
> Developing in Scala, but any help with the difference between UnaryTransformer (is this still
> experimental?) and the Transformer class is appreciated.
> 
> Right now I am encountering an error in code that extends UnaryTransformer:
> override protected def outputDataType: DataType = new StringType
> 
> Error:(26, 53) constructor StringType in class StringType cannot be accessed in class Capitalizer
>   override protected def outputDataType: DataType = new StringType
>                                                     ^
> 
> 
> On Thu, Jul 28, 2016 at 8:20 PM, Phuong LE-HONG <phuonglh@gmail.com> wrote:
> Hi,
> 
> I've developed a simple ML estimator (in Java) that implements
> conditional Markov model for sequence labelling in Vitk toolkit. You
> can check it out here:
> 
> https://github.com/phuonglh/vn.vitk/blob/master/src/main/java/vn/vitk/tag/CMM.java
> 
> Phuong Le-Hong
> 
> On Fri, Jul 29, 2016 at 9:01 AM, janardhan shetty
> <janardhanp22@gmail.com> wrote:
> > Thanks Steve.
> >
> > Any pointers to custom estimators development as well ?
> >
> > On Wed, Jul 27, 2016 at 11:35 AM, Steve Rowe <sarowe@gmail.com> wrote:
> >>
> >> You can see the source for my transformer, a configurable bridge to Lucene
> >> analysis components, here, in my company Lucidworks’ spark-solr project:
> >> <https://github.com/lucidworks/spark-solr/blob/master/src/main/scala/com/lucidworks/spark/ml/feature/LuceneTextAnalyzerTransformer.scala>.
> >>
> >> Here’s a blog post I wrote about using this transformer, as well as
> >> non-ML-context use in Spark of the underlying analysis component:
> >> <https://lucidworks.com/blog/2016/04/13/spark-solr-lucenetextanalyzer/>.
> >>
> >> --
> >> Steve
> >> www.lucidworks.com
> >>
> >> > On Jul 27, 2016, at 1:31 PM, janardhan shetty <janardhanp22@gmail.com>
> >> > wrote:
> >> >
> >> > 1.  Any links or blogs to develop custom transformers ? ex: Tokenizer
> >> >
> >> > 2. Any links or blogs to develop custom estimators ? ex: any ml
> >> > algorithm
> >>
> >
> 
> 



