spark-user mailing list archives

From Meeraj Kunnumpurath <mee...@servicesymphony.com>
Subject Re: Logistic Regression Match Error
Date Sat, 19 Nov 2016 19:26:05 GMT
Thank you, it was the escape character. Adding .option("escape", "\"") to the reader fixed it.
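
For anyone finding this thread later, here is a minimal sketch of the reader with that option applied. It assumes the same file path and column names as the original code quoted further down; the `ReadReviews` object name and the null-rating check are illustrative additions, not part of the original code. Spark's CSV reader uses a backslash as its default escape character, while this file escapes quotes by doubling them (as in ""soft"" in the offending record), so setting the escape character to the quote character lets the reader treat the doubled quotes as literal quotes inside the quoted field.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only to make the sketch self-contained.
object ReadReviews {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Logistic Regression")
      .master("local")
      .getOrCreate()

    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("quote", "\"")   // the default, stated explicitly
      .option("escape", "\"")  // treat "" inside a quoted field as a literal quote
      .csv("data/amazon_baby.csv")

    // Sanity check (an assumption about how the problem shows up):
    // count rows whose rating did not parse and came back as null.
    println(df.filter(df("rating").isNull).count())

    spark.stop()
  }
}
```

If the count printed at the end is not zero, some records are probably still being split incorrectly and are worth inspecting before training.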

Regards

On Sat, Nov 19, 2016 at 11:10 PM, Meeraj Kunnumpurath <
meeraj@servicesymphony.com> wrote:

> I tried .option("quote", "\""), which I believe is the default, but I still
> get the same error. This is the offending record.
>
> Primo 4-In-1 Soft Seat Toilet Trainer and Step Stool White with Pastel
> Blue Seat,"I chose this potty for my son because of the good reviews. I do
> not like it. I'm honestly baffled by all the great reviews now that I have
> this thing in front of me.1)It is made of cheap material, feels flimsy, the
> grips on the bottom of the thing do nothing to keep it in place when the
> child sits on it.2)It comes apart into 5 or 6 different pieces and all my
> son likes to do is take it apart. I did not want a potty that would turn
> into a toy, and this has just become like a puzzle for him, with all the
> different pieces.3)It is a little big for him. He is young still but he's a
> big boy for his age. I looked at one of the pictures posted and he looks
> about the same size as the curly haired kid reading the book, but the potty
> in that picture is NOT this potty! This one is a little bigger and he can't
> get quite touch his feet on the ground, which is important.4)And one final
> thing, maybe most importantly, the ""soft"" seat is not so soft. Doesn't
> seem very comfortable to me. It's just plastic on top of plastic... and
> after my son sits on it for just a few minutes his butt has horrible red
> marks all over it! Definitely not comfortable.So, overall, i'm not
> impressed at all.I gave it 2 stars because... it gets the job done I
> suppose, and for a child a little bit older than my son it might fit a
> little better. Also I really liked the idea that it was 4-in-1.Overall
> though, I do not suggest getting this potty. Look elseware!It's probably
> best to actually go to a store and look at them first hand, and not order
> online. That's what I should have done in the first place.",2
>
> On Sat, Nov 19, 2016 at 10:59 PM, Meeraj Kunnumpurath <
> meeraj@servicesymphony.com> wrote:
>
>> Digging through, it looks like an issue with reading the CSV. Some of the
>> data have embedded commas, and these fields are correctly quoted. However,
>> the CSV reader seems to get into a pickle when the records contain both
>> quoted and unquoted data. Fields are quoted only when they contain commas;
>> otherwise they are unquoted.
>>
>> Regards
>> Meeraj
>>
>> On Sat, Nov 19, 2016 at 10:10 PM, Meeraj Kunnumpurath <
>> meeraj@servicesymphony.com> wrote:
>>
>>> Hello,
>>>
>>> I have the following code that trains a classifier mapping review text to
>>> ratings. I use a tokenizer to split each review into words, and a count
>>> vectorizer to turn those words into count features. However, when I train
>>> the classifier I get a match error. Any pointers will be very helpful.
>>>
>>> The code is below,
>>>
>>> import org.apache.spark.ml.classification.LogisticRegression
>>> import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}
>>> import org.apache.spark.sql.SparkSession
>>> import org.apache.spark.sql.functions.udf
>>>
>>> val spark = SparkSession.builder().appName("Logistic Regression").master("local").getOrCreate()
>>> import spark.implicits._
>>>
>>> val df = spark.read.option("header", "true").option("inferSchema", "true").csv("data/amazon_baby.csv")
>>> val tk = new Tokenizer().setInputCol("review").setOutputCol("words")
>>> val cv = new CountVectorizer().setInputCol("words").setOutputCol("features")
>>>
>>> // Label a review as good (1) when its rating is 4 or higher, else 0
>>> val isGood = udf((x: Int) => if (x >= 4) 1 else 0)
>>>
>>> val words = tk.transform(df.withColumn("label", isGood('rating)))
>>> val Array(training, test) = cv.fit(words).transform(words).randomSplit(Array(0.8, 0.2), 1)
>>>
>>> val classifier = new LogisticRegression()
>>>
>>> training.show(10)
>>>
>>> val simpleModel = classifier.fit(training)
>>> simpleModel.evaluate(test).predictions.select("words", "label", "prediction", "probability").show(10)
>>>
>>>
>>> And the error I get is below.
>>>
>>> 16/11/19 22:06:45 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 9)
>>> scala.MatchError: [null,1.0,(257358,[0,1,2,3,4,5,6,7,8,9,10,13,15,16,20,25,27,29,34,37,40,42,45,48,49,52,58,68,71,76,77,86,89,93,98,99,100,108,109,116,122,124,129,169,208,219,221,235,249,255,260,353,355,371,431,442,641,711,972,1065,1411,1663,1776,1925,2596,2957,3355,3828,4860,6288,7294,8951,9758,12203,18319,21779,48525,72732,75420,146476,192184],[3.0,8.0,1.0,1.0,4.0,2.0,7.0,4.0,2.0,1.0,1.0,2.0,1.0,4.0,3.0,1.0,1.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>>> at org.apache.spark.ml.classification.LogisticRegression$$anonfun$6.apply(LogisticRegression.scala:266)
>>> at org.apache.spark.ml.classification.LogisticRegression$$anonfun$6.apply(LogisticRegression.scala:266)
>>> at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>> at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
>>> at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:919)
>>> at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
>>> at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
>>> at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
>>> at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
>>> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
>>>
>>> Many thanks
>>> --
>>> *Meeraj Kunnumpurath*
>>>
>>> *Director and Executive Principal*
>>> *Service Symphony Ltd*
>>> *00 44 7702 693597*
>>> *00 971 50 409 0169*
>>> *meeraj@servicesymphony.com*
>>>



