spark-user mailing list archives

From Neville Li <neville....@gmail.com>
Subject Re: hadoopRDD stalls reading entire directory
Date Mon, 16 Jun 2014 02:12:24 GMT
We use Avro 1.7.4 and specific records only. It looks like this, as I replied
earlier:

import org.apache.spark.serializer.KryoRegistrator
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.avro.AvroSerializer

class AvroRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[SomeAvroClass],
      AvroSerializer.SpecificRecordBinarySerializer[SomeAvroClass])
  }
}
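
A rough sketch of wiring this up in code rather than through a properties file (the same two spark.* properties appear in the spark-shell.conf further down; it assumes the class above lives in com.spotify.spark as in that conf, the app name is a placeholder, and the master URL is the one used later in this thread):

import org.apache.spark.{SparkConf, SparkContext}

// Enable Kryo and point it at the registrator defined above.
val conf = new SparkConf()
  .setMaster("spark://hivecluster2:7077")
  .setAppName("avro-kryo-example")  // placeholder name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.spotify.spark.AvroRegistrator")
val sc = new SparkContext(conf)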


On Sun, Jun 15, 2014 at 9:18 PM, Russell Jurney <russell.jurney@gmail.com>
wrote:

> Neville, can you please share com.spotify.spark.AvroRegistrator?
>
>
> On Tue, Jun 3, 2014 at 10:40 PM, Neville Li <neville.lyh@gmail.com> wrote:
>
>> I managed to pass a custom registrator via a conf file and some extra
>> flags. The extra jar contains the compiled Avro classes and the custom registrator.
>>
>> spark-shell --properties-file spark-shell.conf --jars <JAR_FILE>
>>
>> # spark-shell.conf
>> spark.serializer        org.apache.spark.serializer.KryoSerializer
>> spark.kryo.registrator  com.spotify.spark.AvroRegistrator
>>
>>
>> On Wed, Jun 4, 2014 at 1:28 AM, Matei Zaharia <matei.zaharia@gmail.com>
>> wrote:
>>
>>> BTW why did hadoopFile not work on a directory? You should be able to
>>> just put in the path of the directory AFAIK (or maybe do directory/*.avro),
>>> though there might be something special with Avro files.
>>>
>>> Matei
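
A sketch of what that suggestion would look like, reusing the classes and the HDFS path that appear later in this thread (whether AvroInputFormat actually accepts the glob here is exactly the open question):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable

// Point hadoopFile at a glob over the directory instead of a single file.
val rdd = sc.hadoopFile(
  "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/*.avro",
  classOf[AvroInputFormat[GenericRecord]],
  classOf[AvroWrapper[GenericRecord]],
  classOf[NullWritable]
).map(_._1.datum)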
>>>
>>> On Jun 3, 2014, at 9:53 PM, Russell Jurney <russell.jurney@gmail.com>
>>> wrote:
>>>
>>> Having a little trouble getting my serializer registered:
>>>
>>> scala> class MyAvroRegistrator extends KryoRegistrator {
>>>      |   override def registerClasses(kryo: Kryo) {
>>>      |     kryo.register(classOf[GenericRecord],
>>> AvroSerializer.GenericRecordSerializer[GenericRecord])
>>>      |     // more Avro types...
>>>      |   }
>>>      | }
>>> <console>:36: error: missing arguments for method
>>> GenericRecordSerializer in object AvroSerializer;
>>> follow this method with `_' if you want to treat it as a partially
>>> applied function
>>>            kryo.register(classOf[GenericRecord],
>>> AvroSerializer.GenericRecordSerializer[GenericRecord])
>>>
>>>
>>> Looking at the class for AvroSerializer, I see:
>>>
>>>   def GenericRecordSerializer[T <: GenericRecord : Manifest](schema:
>>> Schema = null): KSerializer[T] = {
>>>     implicit val inj =  GenericAvroCodecs[T](schema)
>>>     InjectiveSerializer.asKryo
>>>   }
>>>
>>>
>>> Now I'm stuck. It looks like I should be able to specify a null schema
>>> but I can't make it accept that.
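
One way to read that error is that the method just needs to be called with parentheses; a sketch that passes an explicit Schema instead of relying on the null default (the schema file path is a placeholder):

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.avro.AvroSerializer
import org.apache.spark.serializer.KryoRegistrator

class MyAvroRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    // Placeholder: load the writer schema from wherever your application keeps it.
    val schema: Schema = new Schema.Parser().parse(new File("/path/to/schema.avsc"))
    kryo.register(classOf[GenericRecord],
      AvroSerializer.GenericRecordSerializer[GenericRecord](schema))
  }
}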
>>>
>>>
>>> hadoopFile actually works for me already - but it can't load an entire
>>> directory of avros. I have to use hadoopRDD with a Hadoop JobConf:
>>>
>>> val rdd = sc.hadoopRDD(
>>>   jobConf,
>>>   classOf[org.apache.avro.mapred.AvroInputFormat[GenericRecord]],
>>>   classOf[org.apache.avro.mapred.AvroWrapper[GenericRecord]],
>>>   classOf[org.apache.hadoop.io.NullWritable],
>>>   1)
>>>
>>>
>>> Sorry I'm so slow on the uptake, and thanks!
>>>
>>>
>>> On Tue, Jun 3, 2014 at 9:39 PM, Neville Li <neville.lyh@gmail.com>
>>> wrote:
>>>
>>>> This works for us. I've never tried generic record but it should work.
>>>> sc.hadoopFile(path, classOf[AvroInputFormat[MyAvroRecord]],
>>>> classOf[AvroWrapper[MyAvroRecord]], classOf[NullWritable]).map(_._1.datum)
>>>>
>>>> Also, there's an issue where you can get back identical records (the record
>>>> reader reuses objects), so you might wanna do another map with explicit copying.
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/spark-user/201312.mbox/%3CCAES1rLT4fXCYnMLANXk6xjR=hBTj_KXwq9M1LBpqzxsj-HtcsA@mail.gmail.com%3E
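
A sketch of that extra copy, building on Neville's snippet above (MyAvroRecord and path stand in for the generated record class and the input location; newBuilder(...).build() is one way to force a fresh object per record):

import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable

val records = sc.hadoopFile(path, classOf[AvroInputFormat[MyAvroRecord]],
    classOf[AvroWrapper[MyAvroRecord]], classOf[NullWritable])
  .map(_._1.datum)
  .map(r => MyAvroRecord.newBuilder(r).build())  // explicit per-record copy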
>>>>
>>>>
>>>> On Wed, Jun 4, 2014 at 12:31 AM, Russell Jurney <
>>>> russell.jurney@gmail.com> wrote:
>>>>
>>>>> One other thing - having registered my classes - how do I actually
>>>>> load the data?
>>>>>
>>>>>
>>>>> On Tue, Jun 3, 2014 at 9:23 PM, Russell Jurney <
>>>>> russell.jurney@gmail.com> wrote:
>>>>>
>>>>>> Thanks! Will this work with GenericRecords?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 3, 2014 at 9:00 PM, Neville Li <neville.lyh@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Your registrator should look like this:
>>>>>>>
>>>>>>> import org.apache.spark.serializer.KryoRegistrator
>>>>>>> import com.esotericsoftware.kryo.Kryo
>>>>>>> import com.twitter.chill.avro.AvroSerializer
>>>>>>>
>>>>>>> class MyAvroRegistrator extends KryoRegistrator {
>>>>>>>   override def registerClasses(kryo: Kryo) {
>>>>>>>     kryo.register(classOf[MyAvroRecord],
>>>>>>> AvroSerializer.SpecificRecordBinarySerializer[MyAvroRecord])
>>>>>>>     // more Avro types...
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> You might need twitter/chill >= 0.3.6 though, and that's only in
>>>>>>> spark >= 1.0.0.
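
For reference, a build.sbt sketch of pulling in chill-avro at that version (assumes Scala 2.10; adjust to your own build):

libraryDependencies += "com.twitter" %% "chill-avro" % "0.3.6"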
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 3, 2014 at 11:45 PM, Russell Jurney <
>>>>>>> russell.jurney@gmail.com> wrote:
>>>>>>>
>>>>>>>> I keep searching, but I can't find anything that comes close to
>>>>>>>> explaining how to use Kryo with Avro. Anyone got a link to an example,
>>>>>>>> anything? I see the chill project has chill-avro, but how that would map to
>>>>>>>> Spark is... I'm lost.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 3, 2014 at 12:10 PM, Russell Jurney <
>>>>>>>> russell.jurney@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Any pointers to get me started with Kryo and Avro?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 3, 2014 at 9:03 AM, Andrew Ash <andrew@andrewash.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> You can configure Kryo to delegate to Avro for serialization
>>>>>>>>>> though, so you end up having objects serialized in Avro flying around the
>>>>>>>>>> network.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 3, 2014 at 8:56 AM, Sandy Ryza <
>>>>>>>>>> sandy.ryza@cloudera.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Russell,
>>>>>>>>>>>
>>>>>>>>>>> You'll need to use Kryo to get around this.  Unfortunately Avro
>>>>>>>>>>> classes are not marked serializable.
>>>>>>>>>>>
>>>>>>>>>>> -Sandy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 2, 2014 at 3:55 PM, Russell Jurney <
>>>>>>>>>>> russell.jurney@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I rebooted my master and then restarted each Worker. Now I see
>>>>>>>>>>>> multiple nodes under the executor tab at hivecluster2:4040.
>>>>>>>>>>>>
>>>>>>>>>>>> Now 6/18 tasks work, but one task fails and it stops progress.
>>>>>>>>>>>> Its error message is this:
>>>>>>>>>>>>
>>>>>>>>>>>> java.io.NotSerializableException
>>>>>>>>>>>> (java.io.NotSerializableException: org.apache.avro.mapred.AvroWrapper)
>>>>>>>>>>>>
>>>>>>>>>>>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
>>>>>>>>>>>> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
>>>>>>>>>>>> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
>>>>>>>>>>>> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
>>>>>>>>>>>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
>>>>>>>>>>>> java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
>>>>>>>>>>>> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
>>>>>>>>>>>> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
>>>>>>>>>>>> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:28)
>>>>>>>>>>>> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:48)
>>>>>>>>>>>> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:223)
>>>>>>>>>>>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
>>>>>>>>>>>> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
>>>>>>>>>>>> java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>>> javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>>>>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>>>>>>>>> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>>>>>>>>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>>>>> java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 2, 2014 at 3:42 PM, Russell Jurney <
>>>>>>>>>>>> russell.jurney@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, that clears up the dual port 4040 server issue. The
>>>>>>>>>>>>> system still just pauses and waits forever, however:
>>>>>>>>>>>>>
>>>>>>>>>>>>> scala> rdd.first
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>> paths to process : 17
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO spark.SparkContext: Starting job: first
>>>>>>>>>>>>> at <console>:46
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Got job 0
>>>>>>>>>>>>> (first at <console>:46) with 1 output partitions (allowLocal=true)
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Final stage:
>>>>>>>>>>>>> Stage 0 (first at <console>:46)
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Parents of
>>>>>>>>>>>>> final stage: List()
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Missing
>>>>>>>>>>>>> parents: List()
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Computing the
>>>>>>>>>>>>> requested partition locally
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO rdd.HadoopRDD: Input split:
>>>>>>>>>>>>> hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO spark.SparkContext: Job finished: first
>>>>>>>>>>>>> at <console>:46, took 0.387464389 s
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO spark.SparkContext: Starting job: first
>>>>>>>>>>>>> at <console>:46
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Got job 1
>>>>>>>>>>>>> (first at <console>:46) with 16 output partitions (allowLocal=true)
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Final stage:
>>>>>>>>>>>>> Stage 1 (first at <console>:46)
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Parents of
>>>>>>>>>>>>> final stage: List()
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Missing
>>>>>>>>>>>>> parents: List()
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Submitting
>>>>>>>>>>>>> Stage 1 (HadoopRDD[0] at hadoopRDD at <console>:43), which has no missing
>>>>>>>>>>>>> parents
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.DAGScheduler: Submitting 16
>>>>>>>>>>>>> missing tasks from Stage 1 (HadoopRDD[0] at hadoopRDD at <console>:43)
>>>>>>>>>>>>> 14/06/02 15:36:00 INFO scheduler.TaskSchedulerImpl: Adding
>>>>>>>>>>>>> task set 1.0 with 16 tasks
>>>>>>>>>>>>> 14/06/02 15:36:15 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:36:30 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:36:45 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:37:00 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:37:15 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:37:30 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:37:45 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:38:00 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:38:15 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:38:30 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:38:45 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:39:00 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:39:15 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:39:30 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:39:45 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:40:00 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:40:15 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:40:30 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>> 14/06/02 15:40:45 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 2, 2014 at 2:57 PM, Aaron Davidson <
>>>>>>>>>>>>> ilikerps@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ah, I apologize! I didn't realize you were running from the
>>>>>>>>>>>>>> spark-shell. The shell has already created its own SparkContext, so you can
>>>>>>>>>>>>>> just do
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> sc.addJar("avro-1.7.6.jar")
>>>>>>>>>>>>>> sc.addJar("avro-mapred-1.7.6.jar")
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The previous instructions would have worked if you were
>>>>>>>>>>>>>> running your own Spark application where you control the creation of the
>>>>>>>>>>>>>> SparkContext.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jun 2, 2014 at 2:02 PM, Russell Jurney <
>>>>>>>>>>>>>> russell.jurney@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nothing appears to be running on hivecluster2:8080.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 'sudo jps' does show
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [hivedata@hivecluster2 ~]$ sudo jps
>>>>>>>>>>>>>>> 9953 PepAgent
>>>>>>>>>>>>>>> 13797 JournalNode
>>>>>>>>>>>>>>> 7618 NameNode
>>>>>>>>>>>>>>> 6574 Jps
>>>>>>>>>>>>>>> 12716 Worker
>>>>>>>>>>>>>>> 16671 RunJar
>>>>>>>>>>>>>>> 18675 Main
>>>>>>>>>>>>>>> 18177 JobTracker
>>>>>>>>>>>>>>> 10918 Master
>>>>>>>>>>>>>>> 18139 TaskTracker
>>>>>>>>>>>>>>> 7674 DataNode
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I kill all processes listed. I restart Spark Master on
>>>>>>>>>>>>>>> hivecluster2:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [hivedata@hivecluster2 ~]$ sudo
>>>>>>>>>>>>>>> /opt/cloudera/parcels/SPARK/lib/spark/sbin/start-master.sh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> starting org.apache.spark.deploy.master.Master, logging to
>>>>>>>>>>>>>>> /var/log/spark/spark-root-org.apache.spark.deploy.master.Master-1-hivecluster2.out
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I run the spark shell again:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [hivedata@hivecluster2 ~]$ spark-shell -usejavacp
>>>>>>>>>>>>>>> -classpath "*.jar"
>>>>>>>>>>>>>>> 14/06/02 13:52:13 INFO spark.HttpServer: Starting HTTP Server
>>>>>>>>>>>>>>> 14/06/02 13:52:13 INFO server.Server: jetty-7.6.8.v20121106
>>>>>>>>>>>>>>> 14/06/02 13:52:13 INFO server.AbstractConnector: Started
>>>>>>>>>>>>>>> SocketConnector@0.0.0.0:52814
>>>>>>>>>>>>>>> Welcome to
>>>>>>>>>>>>>>>       ____              __
>>>>>>>>>>>>>>>      / __/__  ___ _____/ /__
>>>>>>>>>>>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>>>>>>>>>>>    /___/ .__/\_,_/_/ /_/\_\   version 0.9.0
>>>>>>>>>>>>>>>       /_/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Using Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server
>>>>>>>>>>>>>>> VM, Java 1.6.0_31)
>>>>>>>>>>>>>>> Type in expressions to have them evaluated.
>>>>>>>>>>>>>>> Type :help for more information.
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO Remoting: Starting remoting
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO Remoting: Remoting started; listening
>>>>>>>>>>>>>>> on addresses :[akka.tcp://spark@hivecluster2:46033]
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO Remoting: Remoting now listens on
>>>>>>>>>>>>>>> addresses: [akka.tcp://spark@hivecluster2:46033]
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO spark.SparkEnv: Registering
>>>>>>>>>>>>>>> BlockManagerMaster
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO storage.DiskBlockManager: Created
>>>>>>>>>>>>>>> local directory at /tmp/spark-local-20140602135219-bd8a
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO storage.MemoryStore: MemoryStore
>>>>>>>>>>>>>>> started with capacity 294.4 MB.
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO network.ConnectionManager: Bound
>>>>>>>>>>>>>>> socket to port 50645 with id = ConnectionManagerId(hivecluster2,50645)
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO storage.BlockManagerMaster: Trying to
>>>>>>>>>>>>>>> register BlockManager
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO
>>>>>>>>>>>>>>> storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager
>>>>>>>>>>>>>>> hivecluster2:50645 with 294.4 MB RAM
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO storage.BlockManagerMaster:
>>>>>>>>>>>>>>> Registered BlockManager
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO spark.HttpServer: Starting HTTP Server
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO server.Server: jetty-7.6.8.v20121106
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO server.AbstractConnector: Started
>>>>>>>>>>>>>>> SocketConnector@0.0.0.0:36103
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO broadcast.HttpBroadcast: Broadcast
>>>>>>>>>>>>>>> server started at http://10.10.30.211:36103
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO spark.SparkEnv: Registering
>>>>>>>>>>>>>>> MapOutputTracker
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO spark.HttpFileServer: HTTP File
>>>>>>>>>>>>>>> server directory is /tmp/spark-ecce4c62-fef6-4369-a3d5-e3d7cbd1e00c
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO spark.HttpServer: Starting HTTP Server
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO server.Server: jetty-7.6.8.v20121106
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO server.AbstractConnector: Started
>>>>>>>>>>>>>>> SocketConnector@0.0.0.0:37662
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO server.Server: jetty-7.6.8.v20121106
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/storage/rdd,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/storage,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/stages/stage,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/stages/pool,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/stages,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/environment,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/executors,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/metrics/json,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/static,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO handler.ContextHandler: started
>>>>>>>>>>>>>>> o.e.j.s.h.ContextHandler{/,null}
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO server.AbstractConnector: Started
>>>>>>>>>>>>>>> SelectChannelConnector@0.0.0.0:4040
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO ui.SparkUI: Started Spark Web UI at http://hivecluster2:4040
>>>>>>>>>>>>>>> 14/06/02 13:52:19 INFO client.AppClient$ClientActor:
>>>>>>>>>>>>>>> Connecting to master spark://hivecluster2:7077...
>>>>>>>>>>>>>>> 14/06/02 13:52:20 INFO cluster.SparkDeploySchedulerBackend:
>>>>>>>>>>>>>>> Connected to Spark cluster with app ID app-20140602135220-0000
>>>>>>>>>>>>>>> Created spark context..
>>>>>>>>>>>>>>> Spark context available as sc.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that the Spark Web UI is running at hivecluster2:4040,
>>>>>>>>>>>>>>> I get the UI when I go there. I verify again that nothing exists at
>>>>>>>>>>>>>>> hivecluster2:8080.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I try to run my code:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> val sparkConf = new SparkConf()
>>>>>>>>>>>>>>> sparkConf.setMaster("spark://hivecluster2:7077")
>>>>>>>>>>>>>>> sparkConf.setAppName("Test Spark App")
>>>>>>>>>>>>>>> sparkConf.setJars(Array("avro-1.7.6.jar",
>>>>>>>>>>>>>>> "avro-mapred-1.7.6.jar"))
>>>>>>>>>>>>>>> val sc = new SparkContext(sparkConf)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This produces a new spark server(!) at port 4041:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 14/06/02 13:55:31 INFO server.AbstractConnector: Started
>>>>>>>>>>>>>>> SelectChannelConnector@0.0.0.0:4041
>>>>>>>>>>>>>>> 14/06/02 13:55:31 INFO ui.SparkUI: Started Spark Web UI at
>>>>>>>>>>>>>>> http://hivecluster2:4041
>>>>>>>>>>>>>>> 14/06/02 13:55:31 INFO spark.SparkContext: Added JAR
>>>>>>>>>>>>>>> avro-1.7.6.jar at
>>>>>>>>>>>>>>> http://10.10.30.211:49845/jars/avro-1.7.6.jar with
>>>>>>>>>>>>>>> timestamp 1401742531616
>>>>>>>>>>>>>>>  14/06/02 13:55:31 INFO spark.SparkContext: Added JAR
>>>>>>>>>>>>>>> avro-mapred-1.7.6.jar at
>>>>>>>>>>>>>>> http://10.10.30.211:49845/jars/avro-mapred-1.7.6.jar with
>>>>>>>>>>>>>>> timestamp 1401742531617
>>>>>>>>>>>>>>> 14/06/02 13:55:31 INFO client.AppClient$ClientActor:
>>>>>>>>>>>>>>> Connecting to master spark://hivecluster2:7077...
>>>>>>>>>>>>>>> 14/06/02 13:55:31 INFO cluster.SparkDeploySchedulerBackend:
>>>>>>>>>>>>>>> Connected to Spark cluster with app ID app-20140602135531-0001
>>>>>>>>>>>>>>> sc: org.apache.spark.SparkContext =
>>>>>>>>>>>>>>> org.apache.spark.SparkContext@2e9329e9
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I run the rest of my code...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> val input = "hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/*.avro" //part-m-000{15,16}.avro"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> val jobConf= new JobConf(sc.hadoopConfiguration)
>>>>>>>>>>>>>>>  jobConf.setJobName("Test Scala Job")
>>>>>>>>>>>>>>> FileInputFormat.setInputPaths(jobConf, input)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> val rdd = sc.hadoopRDD(
>>>>>>>>>>>>>>>   //confBroadcast.value.value,
>>>>>>>>>>>>>>>   jobConf,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> classOf[org.apache.avro.mapred.AvroInputFormat[GenericRecord]],
>>>>>>>>>>>>>>>   classOf[org.apache.avro.mapred.AvroWrapper[GenericRecord]],
>>>>>>>>>>>>>>>   classOf[org.apache.hadoop.io.NullWritable],
>>>>>>>>>>>>>>>   1)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> val f1 = rdd.first
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I get this:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths to process : 17
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO spark.SparkContext: Starting job:
>>>>>>>>>>>>>>> first at <console>:47
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Got job 0
>>>>>>>>>>>>>>> (first at <console>:47) with 1 output partitions (allowLocal=true)
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Final stage:
>>>>>>>>>>>>>>> Stage 0 (first at <console>:47)
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Parents of
>>>>>>>>>>>>>>> final stage: List()
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Missing
>>>>>>>>>>>>>>> parents: List()
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Computing the
>>>>>>>>>>>>>>> requested partition locally
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO rdd.HadoopRDD: Input split:
>>>>>>>>>>>>>>> hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO spark.SparkContext: Job finished:
>>>>>>>>>>>>>>> first at <console>:47, took 0.374416468 s
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO spark.SparkContext: Starting job:
>>>>>>>>>>>>>>> first at <console>:47
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Got job 1
>>>>>>>>>>>>>>> (first at <console>:47) with 16 output partitions (allowLocal=true)
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Final stage:
>>>>>>>>>>>>>>> Stage 1 (first at <console>:47)
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Parents of
>>>>>>>>>>>>>>> final stage: List()
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Missing
>>>>>>>>>>>>>>> parents: List()
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Submitting
>>>>>>>>>>>>>>> Stage 1 (HadoopRDD[0] at hadoopRDD at <console>:45), which has no missing
>>>>>>>>>>>>>>> parents
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.DAGScheduler: Submitting 16
>>>>>>>>>>>>>>> missing tasks from Stage 1 (HadoopRDD[0] at hadoopRDD at <console>:45)
>>>>>>>>>>>>>>> 14/06/02 14:00:36 INFO scheduler.TaskSchedulerImpl: Adding
>>>>>>>>>>>>>>> task set 1.0 with 16 tasks
>>>>>>>>>>>>>>> 14/06/02 14:00:51 WARN scheduler.TaskSchedulerImpl: Initial
>>>>>>>>>>>>>>> job has not accepted any resources; check your cluster UI to ensure that
>>>>>>>>>>>>>>> workers are registered and have sufficient memory
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I see my job at http://hivecluster2:4041, but not at
>>>>>>>>>>>>>>> hivecluster2:4040. Task succeeded, 0/16.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How do I instantiate a new SparkContext without creating a
>>>>>>>>>>>>>>> new web server thing? That seems to be the issue.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Russ
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jun 2, 2014 at 1:19 PM, Aaron Davidson <
>>>>>>>>>>>>>>> ilikerps@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You may have to do "sudo jps", because it should definitely
>>>>>>>>>>>>>>>> list your processes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What does hivecluster2:8080 look like? My guess is it says
>>>>>>>>>>>>>>>> there are 2 applications registered, and one has taken all the executors.
>>>>>>>>>>>>>>>> There must be two applications running, as those are the only things that
>>>>>>>>>>>>>>>> keep open those 4040/4041 ports.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jun 2, 2014 at 11:32 AM, Russell Jurney <
>>>>>>>>>>>>>>>> russell.jurney@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If it matters, I have servers running at
>>>>>>>>>>>>>>>>> http://hivecluster2:4040/stages/ and
>>>>>>>>>>>>>>>>> http://hivecluster2:4041/stages/
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When I run rdd.first, I see an item at
>>>>>>>>>>>>>>>>> http://hivecluster2:4041/stages/ but no tasks are
>>>>>>>>>>>>>>>>> running. Stage ID 1,
>>>>>>>>>>>>>>>>> first at <console>:46, Tasks: Succeeded/Total 0/16.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jun 2, 2014 at 10:09 AM, Russell Jurney
>>>>>>>>>>>>>>>>> <russell.jurney@gmail.com> wrote:
>>>>>>>>>>>>>>>>> > Looks like just worker and master processes are running:
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > [hivedata@hivecluster2 ~]$ jps
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > 10425 Jps
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > [hivedata@hivecluster2 ~]$ ps aux|grep spark
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > hivedata 10424  0.0  0.0 103248   820 pts/3    S+
>>>>>>>>>>>>>>>>> 10:05   0:00 grep spark
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > root     10918  0.5  1.4 4752880 230512 ?      Sl
>>>>>>>>>>>>>>>>> May27  41:43 java -cp
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> :/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/conf:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/core/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/repl/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/examples/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/bagel/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/mllib/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/streaming/lib/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/*:/etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib/jline.jar
>>>>>>>>>>>>>>>>> > -Dspark.akka.logLifecycleEvents=true
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> -Djava.library.path=/opt/cloudera/parcels/SPARK-0.9.0-1.cdh4.6.0.p0.98/lib/spark/lib:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
>>>>>>>>>>>>>>>>> > -Xms512m -Xmx512m org.apache.spark.deploy.master.Master
>>>>>>>>>>>>>>>>> --ip hivecluster2
>>>>>>>>>>>>>>>>> > --port 7077 --webui-port 18080
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > root     12715  0.0  0.0 148028   656 ?        S
>>>>>>>>>>>>>>>>>  May27   0:00 sudo
>>>>>>>>>>>>>>>>> > /opt/cloudera/parcels/SPARK/lib/spark/bin/spark-class
>>>>>>>>>>>>>>>>> > org.apache.spark.deploy.worker.Worker
>>>>>>>>>>>>>>>>> spark://hivecluster2:7077
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > root     12716  0.3  1.1 4155884 191340 ?      Sl
>>>>>>>>>>>>>>>>> May27  30:21 java -cp
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> :/opt/cloudera/parcels/SPARK/lib/spark/conf:/opt/cloudera/parcels/SPARK/lib/spark/core/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/repl/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/examples/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/bagel/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/mllib/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/streaming/lib/*:/opt/cloudera/parcels/SPARK/lib/spark/lib/*:/etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/SPARK/lib/spark/lib/scala-library.jar:/opt/cloudera/parcels/SPARK/lib/spark/lib/scala-compiler.jar:/opt/cloudera/parcels/SPARK/lib/spark/lib/jline.jar
>>>>>>>>>>>>>>>>> > -Dspark.akka.logLifecycleEvents=true
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> -Djava.library.path=/opt/cloudera/parcels/SPARK/lib/spark/lib:/opt/cloudera/parcels/CDH/lib/hadoop/lib/native
>>>>>>>>>>>>>>>>> > -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker
>>>>>>>>>>>>>>>>> > spark://hivecluster2:7077
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > On Sun, Jun 1, 2014 at 7:41 PM, Aaron Davidson <
>>>>>>>>>>>>>>>>> ilikerps@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> Sounds like you have two shells running, and the first one is taking all
>>>>>>>>>>>>>>>>> one is talking all
>>>>>>>>>>>>>>>>> >> your resources. Do a "jps" and kill the other guy, then
>>>>>>>>>>>>>>>>> try again.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> By the way, you can look at http://localhost:8080
>>>>>>>>>>>>>>>>> (replace localhost with
>>>>>>>>>>>>>>>>> >> the server your Spark Master is running on) to see what
>>>>>>>>>>>>>>>>> applications are
>>>>>>>>>>>>>>>>> >> currently started, and what resource allocations they
>>>>>>>>>>>>>>>>> have.
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >> On Sun, Jun 1, 2014 at 6:47 PM, Russell Jurney <
>>>>>>>>>>>>>>>>> russell.jurney@gmail.com>
>>>>>>>>>>>>>>>>> >> wrote:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> Thanks again. Run results here:
>>>>>>>>>>>>>>>>> >>> https://gist.github.com/rjurney/dc0efae486ba7d55b7d5
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> This time I get a port already in use exception on
>>>>>>>>>>>>>>>>> 4040, but it isn't
>>>>>>>>>>>>>>>>> >>> fatal. Then when I run rdd.first, I get this over and
>>>>>>>>>>>>>>>>> over:
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> 14/06/01 18:35:40 WARN scheduler.TaskSchedulerImpl:
>>>>>>>>>>>>>>>>> Initial job has not
>>>>>>>>>>>>>>>>> >>> accepted any resources; check your cluster UI to
>>>>>>>>>>>>>>>>> ensure that workers are
>>>>>>>>>>>>>>>>> >>> registered and have sufficient memory
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> On Sun, Jun 1, 2014 at 3:09 PM, Aaron Davidson <
>>>>>>>>>>>>>>>>> ilikerps@gmail.com>
>>>>>>>>>>>>>>>>> >>> wrote:
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> You can avoid that by using the constructor that
>>>>>>>>>>>>>>>>> takes a SparkConf, a la
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> val conf = new SparkConf()
>>>>>>>>>>>>>>>>> >>>> conf.setJars("avro.jar", ...)
>>>>>>>>>>>>>>>>> >>>> val sc = new SparkContext(conf)
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>> On Sun, Jun 1, 2014 at 2:32 PM, Russell Jurney
>>>>>>>>>>>>>>>>> >>>> <russell.jurney@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> Followup question: the docs to make a new
>>>>>>>>>>>>>>>>> SparkContext require that I
>>>>>>>>>>>>>>>>> >>>>> know where $SPARK_HOME is. However, I have no idea.
>>>>>>>>>>>>>>>>> Any idea where that
>>>>>>>>>>>>>>>>> >>>>> might be?
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> On Sun, Jun 1, 2014 at 10:28 AM, Aaron Davidson <
>>>>>>>>>>>>>>>>> ilikerps@gmail.com>
>>>>>>>>>>>>>>>>> >>>>> wrote:
>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>> >>>>>> Gotcha. The easiest way to get your dependencies to
>>>>>>>>>>>>>>>>> your Executors
>>>>>>>>>>>>>>>>> >>>>>> would probably be to construct your SparkContext
>>>>>>>>>>>>>>>>> with all necessary jars
>>>>>>>>>>>>>>>>> >>>>>> passed in (as the "jars" parameter), or inside a
>>>>>>>>>>>>>>>>> SparkConf with setJars().
>>>>>>>>>>>>>>>>> >>>>>> Avro is a "necessary jar", but it's possible your
>>>>>>>>>>>>>>>>> application also needs to
>>>>>>>>>>>>>>>>> >>>>>> distribute other ones to the cluster.
>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>> >>>>>> An easy way to make sure all your dependencies get
>>>>>>>>>>>>>>>>> shipped to the
>>>>>>>>>>>>>>>>> >>>>>> cluster is to create an assembly jar of your
>>>>>>>>>>>>>>>>> application, and then you just
>>>>>>>>>>>>>>>>> >>>>>> need to tell Spark about that jar, which includes
>>>>>>>>>>>>>>>>> all your application's
>>>>>>>>>>>>>>>>> >>>>>> transitive dependencies. Maven and sbt both have
>>>>>>>>>>>>>>>>> pretty straightforward ways
>>>>>>>>>>>>>>>>> >>>>>> of producing assembly jars.
>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>> >>>>>> On Sat, May 31, 2014 at 11:23 PM, Russell Jurney
>>>>>>>>>>>>>>>>> >>>>>> <russell.jurney@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>> Thanks for the fast reply.
>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>> I am running CDH 4.4 with the Cloudera Parcel of
>>>>>>>>>>>>>>>>> Spark 0.9.0, in
>>>>>>>>>>>>>>>>> >>>>>>> standalone mode.
>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>> On Saturday, May 31, 2014, Aaron Davidson <
>>>>>>>>>>>>>>>>> ilikerps@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> First issue was because your cluster was
>>>>>>>>>>>>>>>>> configured incorrectly. You
>>>>>>>>>>>>>>>>> >>>>>>>> could probably read 1 file because that was done
>>>>>>>>>>>>>>>>> on the driver node, but
>>>>>>>>>>>>>>>>> >>>>>>>> when it tried to run a job on the cluster, it
>>>>>>>>>>>>>>>>> failed.
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> Second issue, it seems that the jar containing
>>>>>>>>>>>>>>>>> avro is not getting
>>>>>>>>>>>>>>>>> >>>>>>>> propagated to the Executors. What version of
>>>>>>>>>>>>>>>>> Spark are you running on? What
>>>>>>>>>>>>>>>>> >>>>>>>> deployment mode (YARN, standalone, Mesos)?
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> On Sat, May 31, 2014 at 9:37 PM, Russell Jurney
>>>>>>>>>>>>>>>>> >>>>>>>> <russell.jurney@gmail.com> wrote:
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> Now I get this:
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> scala> rdd.first
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext:
>>>>>>>>>>>>>>>>> Starting job: first at
>>>>>>>>>>>>>>>>> >>>>>>>> <console>:41
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Got job 4 (first at
>>>>>>>>>>>>>>>>> >>>>>>>> <console>:41) with 1 output partitions
>>>>>>>>>>>>>>>>> (allowLocal=true)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Final stage: Stage 4
>>>>>>>>>>>>>>>>> >>>>>>>> (first at <console>:41)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Parents of final
>>>>>>>>>>>>>>>>> >>>>>>>> stage: List()
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Missing parents:
>>>>>>>>>>>>>>>>> >>>>>>>> List()
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Computing the
>>>>>>>>>>>>>>>>> >>>>>>>> requested partition locally
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO rdd.HadoopRDD: Input split:
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> hdfs://hivecluster2/securityx/web_proxy_mef/2014/05/29/22/part-m-00000.avro:0+3864
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext: Job
>>>>>>>>>>>>>>>>> finished: first at
>>>>>>>>>>>>>>>>> >>>>>>>> <console>:41, took 0.037371256 s
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO spark.SparkContext:
>>>>>>>>>>>>>>>>> Starting job: first at
>>>>>>>>>>>>>>>>> >>>>>>>> <console>:41
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Got job 5 (first at
>>>>>>>>>>>>>>>>> >>>>>>>> <console>:41) with 16 output partitions
>>>>>>>>>>>>>>>>> (allowLocal=true)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Final stage: Stage 5
>>>>>>>>>>>>>>>>> >>>>>>>> (first at <console>:41)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Parents of final
>>>>>>>>>>>>>>>>> >>>>>>>> stage: List()
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Missing parents:
>>>>>>>>>>>>>>>>> >>>>>>>> List()
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Submitting Stage 5
>>>>>>>>>>>>>>>>> >>>>>>>> (HadoopRDD[0] at hadoopRDD at <console>:37),
>>>>>>>>>>>>>>>>> which has no missing parents
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.DAGScheduler:
>>>>>>>>>>>>>>>>> Submitting 16 missing
>>>>>>>>>>>>>>>>> >>>>>>>> tasks from Stage 5 (HadoopRDD[0] at hadoopRDD at
>>>>>>>>>>>>>>>>> <console>:37)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO
>>>>>>>>>>>>>>>>> scheduler.TaskSchedulerImpl: Adding task set
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0 with 16 tasks
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:0
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 92 on executor 2: hivecluster3 (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:0 as 1294 bytes in 1 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:3
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 93 on executor 1: hivecluster5.labs.lan
>>>>>>>>>>>>>>>>> (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:3 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:1
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 94 on executor 4: hivecluster4 (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:1 as 1294 bytes in 1 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:2
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 95 on executor 0: hivecluster6.labs.lan
>>>>>>>>>>>>>>>>> (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:2 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:4
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 96 on executor 3: hivecluster1.labs.lan
>>>>>>>>>>>>>>>>> (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:4 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:6
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 97 on executor 2: hivecluster3 (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:6 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:5
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 98 on executor 1: hivecluster5.labs.lan
>>>>>>>>>>>>>>>>> (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:5 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:8
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 99 on executor 4: hivecluster4 (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:8 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:7
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 100 on executor 0: hivecluster6.labs.lan
>>>>>>>>>>>>>>>>> (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:7 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:10 as TID 101 on executor 3:
>>>>>>>>>>>>>>>>> hivecluster1.labs.lan (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:10 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:14 as TID 102 on executor 2: hivecluster3
>>>>>>>>>>>>>>>>> (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:14 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task 5.0:9
>>>>>>>>>>>>>>>>> >>>>>>>> as TID 103 on executor 1: hivecluster5.labs.lan
>>>>>>>>>>>>>>>>> (NODE_LOCAL)
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Serialized task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:9 as 1294 bytes in 0 ms
>>>>>>>>>>>>>>>>> >>>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>> 14/05/31 21:36:28 INFO scheduler.TaskSetManager:
>>>>>>>>>>>>>>>>> Starting task
>>>>>>>>>>>>>>>>> >>>>>>>> 5.0:11 as TID 104 on executor 4: hivecluster4 (N
>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>>
>>>>>>>>>>>>>>>>> >>>>>>> --
>>>>>>>>>>>>>>>>> >>>>>>> Russell Jurney twitter.com/rjurney
>>>>>>>>>>>>>>>>> russell.jurney@gmail.com
>>>>>>>>>>>>>>>>> >>>>>>> datasyndrome.com
>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>>>> >>>>> --
>>>>>>>>>>>>>>>>> >>>>> Russell Jurney twitter.com/rjurney
>>>>>>>>>>>>>>>>> russell.jurney@gmail.com
>>>>>>>>>>>>>>>>> >>>>> datasyndrome.com
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>>>> >>> --
>>>>>>>>>>>>>>>>> >>> Russell Jurney twitter.com/rjurney
>>>>>>>>>>>>>>>>> russell.jurney@gmail.com
>>>>>>>>>>>>>>>>> >>> datasyndrome.com
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>> > --
>>>>>>>>>>>>>>>>> > Russell Jurney twitter.com/rjurney
>>>>>>>>>>>>>>>>> russell.jurney@gmail.com datasyndrome.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Russell Jurney twitter.com/rjurney
>>>>>>>>>>>>>>>>> russell.jurney@gmail.com datasyndrome.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>>>>>>>>>>> datasyndrome.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>>>>>>>>> datasyndrome.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>>>>>>>> datasyndrome.com
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>>>>> datasyndrome.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>>>> datasyndrome.com
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>>> datasyndrome.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com
>>>>> datasyndrome.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>>>
>>>
>>>
>>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>
