spark-user mailing list archives

From Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
Subject Re: How to identify erroneous input record ?
Date Wed, 24 Dec 2014 16:35:55 GMT
DOH! Looks like I did not have enough coffee before I asked this :-) I added the if statement:

var demoRddFilter = demoRdd.filter(line => !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") && !line.contains("primaryid$caseid$caseversion"))
var demoRddFilterMap = demoRddFilter.flatMap(line => {
  val fields = line.split('$')  // split once and reuse the result
  if (fields.length >= 13) {
    Some(fields(0) + "~" + fields(5) + "~" + fields(11) + "~" + fields(12))
  } else {
    None  // skip records with fewer than 13 fields
  }
})
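
Since the subject of the thread is identifying the erroneous record, not only skipping it, here is a minimal sketch of one way to keep the rejected lines around for inspection. It uses a plain Scala `Seq` with made-up sample data in place of `demoRdd`, so it runs without a cluster; `FieldCountCheck`, `MinFields`, and `project` are hypothetical names, not anything from the original code.

```scala
object FieldCountCheck {
  // Assumption: the projection reads index 12, so at least 13 fields are needed.
  val MinFields = 13

  // Right(projected line) for a well-formed record, Left(original line) otherwise.
  def project(line: String): Either[String, String] = {
    val fields = line.split('$')
    if (fields.length >= MinFields)
      Right(Seq(fields(0), fields(5), fields(11), fields(12)).mkString("~"))
    else
      Left(line)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines standing in for the real input.
    val lines = Seq(
      (0 to 12).map(i => "f" + i).mkString("$"), // 13 fields
      "only$three$fields"                         // too short
    )
    val results = lines.map(project)
    val good = results.collect { case Right(out) => out }
    val bad = results.collect { case Left(raw) => raw }
    good.foreach(println)
    bad.foreach(raw => println("rejected: " + raw))
  }
}
```

An RDD offers `map` and `collect` with a partial function as well, so the same shape should carry over, with the rejected lines written to a separate output instead of printed.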

From: Sanjay Subramanian <sanjaysubramanian@yahoo.com.INVALID>
To: "user@spark.apache.org" <user@spark.apache.org>
Sent: Wednesday, December 24, 2014 8:28 AM
Subject: How to identify erroneous input record ?

hey guys 
One of my input records has a problem that makes the code fail.

var demoRddFilter = demoRdd.filter(line => !line.contains("ISR$CASE$I_F_COD$FOLL_SEQ") && !line.contains("primaryid$caseid$caseversion"))

var demoRddFilterMap = demoRddFilter.map(line => line.split('$')(0) + "~" + line.split('$')(5) + "~" + line.split('$')(11) + "~" + line.split('$')(12))
demoRddFilterMap.saveAsTextFile("/data/aers/msfx/demo/" + outFile)

This is possibly happening because one input record may not have 13 fields. If this were Hadoop mapper code, I would have 2 ways to solve this:
1. test the number of fields of each line before applying the map function
2. enclose the mapping function in a try catch block so that the mapping function only fails for the erroneous record

How do I implement 1. or 2. in the Spark code?
Thanks
sanjay
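
For option 2 in the question, one possible sketch wraps the per-record projection in `scala.util.Try`, so an exception affects only the record that caused it and the offending line is kept next to the error. As an assumption for illustration, it runs on a plain `Seq` with made-up sample data, and `TryPerRecord`, `project`, and `safeProject` are hypothetical names.

```scala
import scala.util.{Failure, Success, Try}

object TryPerRecord {
  // Same projection as in the question, with no length check. It throws
  // ArrayIndexOutOfBoundsException when a line has fewer than 13 fields.
  def project(line: String): String = {
    val fields = line.split('$')
    fields(0) + "~" + fields(5) + "~" + fields(11) + "~" + fields(12)
  }

  // Capture a failure together with the line that caused it.
  def safeProject(line: String): Either[(String, Throwable), String] =
    Try(project(line)) match {
      case Success(out) => Right(out)
      case Failure(e)   => Left((line, e))
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample lines standing in for the real input.
    val lines = Seq(
      (0 to 12).map(i => "f" + i).mkString("$"), // 13 fields
      "a$b"                                       // too short, will fail
    )
    lines.map(safeProject).foreach {
      case Right(out)      => println(out)
      case Left((raw, ex)) => println("failed on '" + raw + "': " + ex)
    }
  }
}
```

Applied to the RDD, this would presumably be `demoRddFilter.map(TryPerRecord.safeProject)`, followed by separating the `Right` and `Left` results before saving.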
