spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Structured streaming (Kafka) error with Union
Date Fri, 25 Aug 2017 07:12:57 GMT
Can you share the query with the union?

Better still, can you run `dataFrame.explain(true)` and paste the output
here (where `dataFrame` is the same one on which you are calling
writeStream).
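
For concreteness, here is a minimal sketch of what I mean, using the names
from the script quoted below (in PySpark the flag is a Python boolean):

    selected_all = selected_lr.union(selected_nb)
    # Prints the parsed/analyzed/optimized logical plans and the physical
    # plan for the unioned streaming DataFrame, before the query starts.
    selected_all.explain(True)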

TD

On Tue, Aug 22, 2017 at 10:04 PM, upkar_kohli <upkar.kohli@gmail.com> wrote:

> Hi,
>
> After the change, the code has started intermittently failing with the
> error below (only with the union):
>
> This is on a standalone node running the latest Spark 2.2 release.
>
> May be related to https://issues.apache.org/jira/browse/SPARK-19185 ?
>
> 17/08/23 09:52:38 ERROR StreamExecution: Query [id = 98c3ce45-9bf8-4d99-b115-b95b78cc06d1, runId = 9c0e333b-a846-460c-b744-0acf2c1248cf] terminated with error
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 23.0 failed 1 times, most recent failure: Lost task 0.0 in stage 23.0 (TID 39, localhost, executor driver): java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
>         at org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:1431)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.seek(KafkaConsumer.java:1132)
>         at org.apache.spark.sql.kafka010.CachedKafkaConsumer.seek(CachedKafkaConsumer.scala:294)
>         at org.apache.spark.sql.kafka010.CachedKafkaConsumer.org$apache$spark$sql$kafka010$CachedKafkaConsumer$$fetchData(CachedKafkaConsumer.scala:205)
>         at org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:117)
>         at org.apache.spark.sql.kafka010.CachedKafkaConsumer$$anonfun$get$1.apply(CachedKafkaConsumer.scala:106)
>         at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:85)
>         at org.apache.spark.sql.kafka010.CachedKafkaConsumer.runUninterruptiblyIfPossible(CachedKafkaConsumer.scala:68)
>         at org.apache.spark.sql.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:106)
>         at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:157)
>         at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:148)
>         at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>
>
>
> The configuration suggested in the thread below (for a similar issue)
> didn't help:
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Streaming-ConcurrentModificationExceptions-when-Windowing-td20548.html
>
> Running with:
>
> spark-submit --conf spark.executor.cores=1 --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 kafka_ml.py
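>
> One untested idea, if the contention comes from both union branches
> sharing one cached Kafka consumer, would be to give each pipeline its
> own source DataFrame instead of branching a single readStream (the
> helper name is illustrative; options as in the script quoted below):
>
>     def kafka_source(spark):
>         # Each call builds an independent streaming source, so the two
>         # union branches should not share a consumer.
>         return (spark.readStream.format("kafka")
>                 .option("kafka.bootstrap.servers", "localhost:9092")
>                 .option("subscribe", "spark-stream")
>                 .load())
>
>     test_ml_lr = kafka_source(spark).selectExpr("1 as id", "CAST(value AS STRING) as text")
>     test_ml_nb = kafka_source(spark).selectExpr("1 as id", "CAST(value AS STRING) as text")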
>
>
> Regards,
> Upkar
>
> On Wed, Aug 23, 2017 at 8:57 AM, <upkar.kohli@gmail.com> wrote:
>
>> Thanks. This was the issue.
>>
>> Regards,
>> Upkar
>>
>> Sent from my iPhone
>>
>> On 23-Aug-2017, at 01:29, Shixiong(Ryan) Zhu <shixiong@databricks.com>
>> wrote:
>>
>> My hunch is that you were using the same checkpoint location for
>> "selected_lr" and "selected_all". When you switched to "selected_all",
>> the query picked up the checkpoint generated by "selected_lr", which
>> caused it to fail.
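>>
>> A minimal sketch of the fix, assuming the script quoted below: point
>> each query (and each variant of the query plan) at its own checkpoint
>> directory. The path here is illustrative.
>>
>>     kafka_write = (selected_all
>>         .selectExpr("CAST(id AS STRING) as key",
>>                     "CONCAT(CAST(text AS STRING), ':', CAST(prediction AS STRING)) as value")
>>         .writeStream.format("kafka")
>>         # A checkpoint directory not shared with the lr-only run; the
>>         # checkpoint stores offsets/state for this query plan only.
>>         .option("checkpointLocation", "/tmp/kaf_tgt_union")
>>         .option("kafka.bootstrap.servers", "localhost:9092")
>>         .option("topic", "topic_tgt")
>>         .start())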
>>
>> On Tue, Aug 22, 2017 at 2:12 AM, upkar_kohli <upkar.kohli@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I am doing a PoC on ML using Structured Streaming.
>>>
>>> The code below runs fine if I run the streaming script using
>>> selected_lr.selectExpr (just logistic regression), but fails when I run
>>> selected_all.selectExpr (which is a union of the naive Bayes and
>>> logistic regression pipelines).
>>>
>>> Could you please help me see what I am doing wrong?
>>>
>>> Regards,
>>> Upkar
>>>
>>> **************************************************************************************************
>>> Code Starts --------------------------------------------------------------------------------------
>>>
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.classification import LogisticRegression, NaiveBayes
>>> from pyspark.ml.feature import HashingTF, Tokenizer, StopWordsRemover, CountVectorizer, IDF
>>> from pyspark import SparkConf, SparkContext
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.functions import concat, col, lit
>>>
>>> if __name__ == "__main__":
>>>     spark = SparkSession\
>>>         .builder\
>>>         .appName("KafkaSentiment")\
>>>         .getOrCreate()
>>>
>>>     # $example on$
>>>     # Prepare training documents from a list of (id, text, label) tuples.
>>>     training = spark.createDataFrame([
>>>         (0, "a b c d e spark", 1.0),
>>>         (1, "b d", 0.0),
>>>         (2, "spark f g h", 1.0),
>>>         (3, "hadoop mapreduce", 0.0)
>>>     ], ["id", "text", "label"])
>>>
>>>     # Configure two ML pipelines, each with five stages: tokenizer,
>>>     # remover, hashingTF, idf, and a classifier (lr or nb).
>>>     tokenizer = Tokenizer(inputCol="text", outputCol="words")
>>>     remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="relevant_words")
>>>     hashingTF = HashingTF(inputCol=remover.getOutputCol(), outputCol="rawFeatures")
>>>     idf = IDF(inputCol="rawFeatures", outputCol="features")
>>>     lr = LogisticRegression(maxIter=10, regParam=0.001)
>>>     nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
>>>     pipeline_lr = Pipeline(stages=[tokenizer, remover, hashingTF, idf, lr])
>>>     pipeline_nb = Pipeline(stages=[tokenizer, remover, hashingTF, idf, nb])
>>>
>>>     # Fit the pipelines to the training documents.
>>>     model_lr = pipeline_lr.fit(training)
>>>     model_nb = pipeline_nb.fit(training)
>>>
>>>     testing_data = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "spark-stream").load()
>>>     test_ml = testing_data.selectExpr("1 as id", "CAST(value AS STRING) as text")
>>>
>>>     # Make predictions on test documents and print columns of interest.
>>>     prediction_lr = model_lr.transform(test_ml)
>>>     prediction_nb = model_nb.transform(test_ml)
>>>     selected_lr = prediction_lr.select("id", "text", "prediction")
>>>     selected_nb = prediction_nb.select("id", "text", "prediction")
>>>     print("printing schema")
>>>     selected_lr.printSchema()
>>>     selected_nb.printSchema()
>>>     test_ml.printSchema()
>>>
>>>     query = test_ml.writeStream.outputMode("append").format("console").start()
>>>     selected_all = selected_lr.union(selected_nb)
>>>     selected_all.printSchema()
>>>     kafka_write = selected_all.selectExpr("CAST(id AS STRING) as key", "CONCAT(CAST(text AS STRING), ':', CAST(prediction AS STRING)) as value").writeStream.format("kafka").option("checkpointLocation", "/tmp/kaf_tgt").option("kafka.bootstrap.servers", "localhost:9092").option("topic", "topic_tgt").start()
>>>
>>>     kafka_write.awaitTermination()
>>>
>>>     query.awaitTermination()
>>>
>>> Code Ends ----------------------------------------------------------------------------------------
>>> **************************************************************************************************
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> jatin@Sandbox.RHEL:/home/jatin/projects/app# spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 kafka_ml.py
>>> Ivy Default Cache set to: /home/jatin/.ivy2/cache
>>> The jars for the packages stored in: /home/jatin/.ivy2/jars
>>> :: loading settings :: url = jar:file:/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
>>> org.apache.spark#spark-sql-kafka-0-10_2.11 added as a dependency
>>> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>>>         confs: [default]
>>>         found org.apache.spark#spark-sql-kafka-0-10_2.11;2.2.0 in central
>>>         found org.apache.kafka#kafka-clients;0.10.0.1 in central
>>>         found net.jpountz.lz4#lz4;1.3.0 in central
>>>         found org.xerial.snappy#snappy-java;1.1.2.6 in central
>>>         found org.slf4j#slf4j-api;1.7.16 in central
>>>         found org.apache.spark#spark-tags_2.11;2.2.0 in central
>>>         found org.spark-project.spark#unused;1.0.0 in central
>>> :: resolution report :: resolve 9475ms :: artifacts dl 23ms
>>>         :: modules in use:
>>>         net.jpountz.lz4#lz4;1.3.0 from central in [default]
>>>         org.apache.kafka#kafka-clients;0.10.0.1 from central in [default]
>>>         org.apache.spark#spark-sql-kafka-0-10_2.11;2.2.0 from central in [default]
>>>         org.apache.spark#spark-tags_2.11;2.2.0 from central in [default]
>>>         org.slf4j#slf4j-api;1.7.16 from central in [default]
>>>         org.spark-project.spark#unused;1.0.0 from central in [default]
>>>         org.xerial.snappy#snappy-java;1.1.2.6 from central in [default]
>>>         ---------------------------------------------------------------------
>>>         |                  |            modules            ||   artifacts   |
>>>         |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>>>         ---------------------------------------------------------------------
>>>         |      default     |   7   |   2   |   2   |   0   ||   7   |   0   |
>>>         ---------------------------------------------------------------------
>>> :: retrieving :: org.apache.spark#spark-submit-parent
>>>         confs: [default]
>>>         0 artifacts copied, 7 already retrieved (0kB/15ms)
>>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>> 17/08/22 14:20:15 INFO SparkContext: Running Spark version 2.2.0
>>> 17/08/22 14:20:15 WARN NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> 17/08/22 14:20:16 INFO SparkContext: Submitted application:
>>> KafkaSentiment
>>> 17/08/22 14:20:16 INFO SecurityManager: Changing view acls to: jatin
>>> 17/08/22 14:20:16 INFO SecurityManager: Changing modify acls to: jatin
>>> 17/08/22 14:20:16 INFO SecurityManager: Changing view acls groups to:
>>> 17/08/22 14:20:16 INFO SecurityManager: Changing modify acls groups to:
>>> 17/08/22 14:20:16 INFO SecurityManager: SecurityManager: authentication
>>> disabled; ui acls disabled; users  with view permissions: Set(jatin);
>>> groups with view permissions: Set(); users  with modify permissions:
>>> Set(jatin); groups with modify permissions: Set()
>>> 17/08/22 14:20:16 INFO Utils: Successfully started service 'sparkDriver'
>>> on port 41554.
>>> 17/08/22 14:20:16 INFO SparkEnv: Registering MapOutputTracker
>>> 17/08/22 14:20:17 INFO SparkEnv: Registering BlockManagerMaster
>>> 17/08/22 14:20:17 INFO BlockManagerMasterEndpoint: Using
>>> org.apache.spark.storage.DefaultTopologyMapper for getting topology
>>> information
>>> 17/08/22 14:20:17 INFO BlockManagerMasterEndpoint:
>>> BlockManagerMasterEndpoint up
>>> 17/08/22 14:20:17 INFO DiskBlockManager: Created local directory at
>>> /tmp/blockmgr-41217e8a-3b1f-498c-a749-8c51564e0ebe
>>> 17/08/22 14:20:17 INFO MemoryStore: MemoryStore started with capacity
>>> 366.3 MB
>>> 17/08/22 14:20:17 INFO SparkEnv: Registering OutputCommitCoordinator
>>> 17/08/22 14:20:17 INFO Utils: Successfully started service 'SparkUI' on
>>> port 4040.
>>> 17/08/22 14:20:17 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
>>> http://192.168.25.187:4040
>>> 17/08/22 14:20:17 INFO SparkContext: Added JAR file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar at spark://192.168.25.187:41554/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar with timestamp 1503391817919
>>> 17/08/22 14:20:17 INFO SparkContext: Added JAR file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar at spark://192.168.25.187:41554/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar with timestamp 1503391817924
>>> 17/08/22 14:20:17 INFO SparkContext: Added JAR file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar at spark://192.168.25.187:41554/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar with timestamp 1503391817924
>>> 17/08/22 14:20:17 INFO SparkContext: Added JAR file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at spark://192.168.25.187:41554/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1503391817924
>>> 17/08/22 14:20:17 INFO SparkContext: Added JAR file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar at spark://192.168.25.187:41554/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp 1503391817924
>>> 17/08/22 14:20:17 INFO SparkContext: Added JAR file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar at spark://192.168.25.187:41554/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with timestamp 1503391817924
>>> 17/08/22 14:20:17 INFO SparkContext: Added JAR file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar at spark://192.168.25.187:41554/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp 1503391817925
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/projects/app/kafka_ml.py at file:/home/jatin/projects/app/kafka_ml.py with timestamp 1503391818468
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/projects/app/kafka_ml.py to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/kafka_ml.py
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar at file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar with timestamp 1503391818514
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar at file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar with timestamp 1503391818538
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar at file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar with timestamp 1503391818567
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1503391818573
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar at file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp 1503391818581
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar at file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with timestamp 1503391818602
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar
>>> 17/08/22 14:20:18 INFO SparkContext: Added file file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar at file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp 1503391818631
>>> 17/08/22 14:20:18 INFO Utils: Copying /home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar
>>> 17/08/22 14:20:18 INFO Executor: Starting executor ID driver on host
>>> localhost
>>> 17/08/22 14:20:18 INFO Utils: Successfully started service
>>> 'org.apache.spark.network.netty.NettyBlockTransferService' on port
>>> 46647.
>>> 17/08/22 14:20:18 INFO NettyBlockTransferService: Server created on
>>> 192.168.25.187:46647
>>> 17/08/22 14:20:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy
>>> for block replication policy
>>> 17/08/22 14:20:18 INFO BlockManagerMaster: Registering BlockManager
>>> BlockManagerId(driver, 192.168.25.187, 46647, None)
>>> 17/08/22 14:20:18 INFO BlockManagerMasterEndpoint: Registering block
>>> manager 192.168.25.187:46647 with 366.3 MB RAM, BlockManagerId(driver,
>>> 192.168.25.187, 46647, None)
>>> 17/08/22 14:20:18 INFO BlockManagerMaster: Registered BlockManager
>>> BlockManagerId(driver, 192.168.25.187, 46647, None)
>>> 17/08/22 14:20:18 INFO BlockManager: Initialized BlockManager:
>>> BlockManagerId(driver, 192.168.25.187, 46647, None)
>>> 17/08/22 14:20:19 INFO SharedState: Setting hive.metastore.warehouse.dir
>>> ('null') to the value of spark.sql.warehouse.dir
>>> ('file:/home/jatin/projects/app/spark-warehouse/').
>>> 17/08/22 14:20:19 INFO SharedState: Warehouse path is
>>> 'file:/home/jatin/projects/app/spark-warehouse/'.
>>> 17/08/22 14:20:20 INFO StateStoreCoordinatorRef: Registered
>>> StateStoreCoordinator endpoint
>>> 17/08/22 14:20:26 INFO CodeGenerator: Code generated in 504.736602 ms
>>> 17/08/22 14:20:27 INFO SparkContext: Starting job: treeAggregate at
>>> IDF.scala:54
>>> 17/08/22 14:20:27 INFO DAGScheduler: Got job 0 (treeAggregate at
>>> IDF.scala:54) with 2 output partitions
>>> 17/08/22 14:20:27 INFO DAGScheduler: Final stage: ResultStage 0
>>> (treeAggregate at IDF.scala:54)
>>> 17/08/22 14:20:27 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:27 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:27 INFO DAGScheduler: Submitting ResultStage 0
>>> (MapPartitionsRDD[10] at treeAggregate at IDF.scala:54), which has no
>>> missing parents
>>> 17/08/22 14:20:27 INFO MemoryStore: Block broadcast_0 stored as values
>>> in memory (estimated size 26.0 KB, free 366.3 MB)
>>> 17/08/22 14:20:27 INFO MemoryStore: Block broadcast_0_piece0 stored as
>>> bytes in memory (estimated size 11.5 KB, free 366.3 MB)
>>> 17/08/22 14:20:27 INFO BlockManagerInfo: Added broadcast_0_piece0 in
>>> memory on 192.168.25.187:46647 (size: 11.5 KB, free: 366.3 MB)
>>> 17/08/22 14:20:27 INFO SparkContext: Created broadcast 0 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:27 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 0 (MapPartitionsRDD[10] at treeAggregate at IDF.scala:54)
>>> (first 15 tasks are for partitions Vector(0, 1))
>>> 17/08/22 14:20:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 2
>>> tasks
>>> 17/08/22 14:20:27 INFO TaskSetManager: Starting task 0.0 in stage 0.0
>>> (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:27 INFO TaskSetManager: Starting task 1.0 in stage 0.0
>>> (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:28 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
>>> 17/08/22 14:20:28 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar with timestamp 1503391818538
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1503391818573
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar with timestamp 1503391818514
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/projects/app/kafka_ml.py with timestamp 1503391818468
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/projects/app/kafka_ml.py has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/kafka_ml.py
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp 1503391818631
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp 1503391818581
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with timestamp 1503391818602
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar
>>> 17/08/22 14:20:28 INFO Executor: Fetching file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar with timestamp 1503391818567
>>> 17/08/22 14:20:28 INFO Utils: /home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Fetching spark://192.168.25.187:41554/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp 1503391817925
>>> 17/08/22 14:20:28 INFO TransportClientFactory: Successfully created connection to /192.168.25.187:41554 after 95 ms (0 ms spent in bootstraps)
>>> 17/08/22 14:20:28 INFO Utils: Fetching spark://192.168.25.187:41554/jars/org.slf4j_slf4j-api-1.7.16.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp9222131133573862634.tmp
>>> 17/08/22 14:20:28 INFO Utils: /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp9222131133573862634.tmp has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar
>>> 17/08/22 14:20:28 INFO Executor: Adding file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar to class loader
>>> 17/08/22 14:20:28 INFO Executor: Fetching spark://192.168.25.187:41554/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar with timestamp 1503391817919
>>> 17/08/22 14:20:28 INFO Utils: Fetching spark://192.168.25.187:41554/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp3539601253503251859.tmp
>>> 17/08/22 14:20:28 INFO Utils: /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp3539601253503251859.tmp has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Adding file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar to class loader
>>> 17/08/22 14:20:28 INFO Executor: Fetching spark://192.168.25.187:41554/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar with timestamp 1503391817924
>>> 17/08/22 14:20:28 INFO Utils: Fetching spark://192.168.25.187:41554/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp6110236546923979105.tmp
>>> 17/08/22 14:20:28 INFO Utils: /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp6110236546923979105.tmp has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar
>>> 17/08/22 14:20:28 INFO Executor: Adding file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar to class loader
>>> 17/08/22 14:20:28 INFO Executor: Fetching spark://192.168.25.187:41554/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp 1503391817924
>>> 17/08/22 14:20:28 INFO Utils: Fetching spark://192.168.25.187:41554/jars/net.jpountz.lz4_lz4-1.3.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp7615793935390489572.tmp
>>> 17/08/22 14:20:28 INFO Utils: /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp7615793935390489572.tmp has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Adding file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar to class loader
>>> 17/08/22 14:20:28 INFO Executor: Fetching spark://192.168.25.187:41554/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with timestamp 1503391817924
>>> 17/08/22 14:20:28 INFO Utils: Fetching spark://192.168.25.187:41554/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp4037226777938816116.tmp
>>> 17/08/22 14:20:28 INFO Utils: /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp4037226777938816116.tmp has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar
>>> 17/08/22 14:20:28 INFO Executor: Adding file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar to class loader
>>> 17/08/22 14:20:28 INFO Executor: Fetching spark://192.168.25.187:41554/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1503391817924
>>> 17/08/22 14:20:28 INFO Utils: Fetching spark://192.168.25.187:41554/jars/org.spark-project.spark_unused-1.0.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp5870103019955438186.tmp
>>> 17/08/22 14:20:28 INFO Utils: /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp5870103019955438186.tmp has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Adding file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar to class loader
>>> 17/08/22 14:20:28 INFO Executor: Fetching spark://192.168.25.187:41554/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar with timestamp 1503391817924
>>> 17/08/22 14:20:28 INFO Utils: Fetching spark://192.168.25.187:41554/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp344232108189163563.tmp
>>> 17/08/22 14:20:28 INFO Utils: /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp344232108189163563.tmp has been previously copied to /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar
>>> 17/08/22 14:20:28 INFO Executor: Adding file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar to class loader
>>> 17/08/22 14:20:29 INFO CodeGenerator: Code generated in 32.736524 ms
>>> 17/08/22 14:20:29 INFO CodeGenerator: Code generated in 25.33048 ms
>>> 17/08/22 14:20:31 INFO PythonRunner: Times: total = 436, boot = 401,
>>> init = 33, finish = 2
>>> 17/08/22 14:20:31 INFO PythonRunner: Times: total = 439, boot = 406,
>>> init = 32, finish = 1
>>> 17/08/22 14:20:31 INFO MemoryStore: Block taskresult_1 stored as bytes
>>> in memory (estimated size 2.0 MB, free 364.3 MB)
>>> 17/08/22 14:20:31 INFO BlockManagerInfo: Added taskresult_1 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 364.3 MB)
>>> 17/08/22 14:20:31 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1).
>>> 2109385 bytes result sent via BlockManager)
>>> 17/08/22 14:20:31 INFO MemoryStore: Block taskresult_0 stored as bytes
>>> in memory (estimated size 2.0 MB, free 362.2 MB)
>>> 17/08/22 14:20:31 INFO BlockManagerInfo: Added taskresult_0 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 362.3 MB)
>>> 17/08/22 14:20:31 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
>>> 2109385 bytes result sent via BlockManager)
>>> 17/08/22 14:20:31 INFO TransportClientFactory: Successfully created
>>> connection to /192.168.25.187:46647 after 2 ms (0 ms spent in
>>> bootstraps)
>>> 17/08/22 14:20:31 INFO TaskSetManager: Finished task 0.0 in stage 0.0
>>> (TID 0) in 3689 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:31 INFO TaskSetManager: Finished task 1.0 in stage 0.0
>>> (TID 1) in 3682 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:31 INFO BlockManagerInfo: Removed taskresult_0 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.3 MB)
>>> 17/08/22 14:20:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:31 INFO BlockManagerInfo: Removed taskresult_1 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.3 MB)
>>> 17/08/22 14:20:31 INFO DAGScheduler: ResultStage 0 (treeAggregate at
>>> IDF.scala:54) finished in 3.776 s
>>> 17/08/22 14:20:31 INFO DAGScheduler: Job 0 finished: treeAggregate at
>>> IDF.scala:54, took 4.418658 s
>>> 17/08/22 14:20:32 INFO CodeGenerator: Code generated in 173.848512 ms
>>> 17/08/22 14:20:32 INFO CodeGenerator: Code generated in 84.504745 ms
>>> 17/08/22 14:20:32 INFO Instrumentation: LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1: training: numPartitions=2 storageLevel=StorageLevel(disk, memory, deserialized, 1 replicas)
>>> 17/08/22 14:20:32 INFO Instrumentation: LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1: {"elasticNetParam":0.0,"fitIntercept":true,"standardization":true,"threshold":0.5,"regParam":0.001,"tol":1.0E-6,"maxIter":10}
>>> 17/08/22 14:20:32 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:517
>>> 17/08/22 14:20:32 INFO DAGScheduler: Got job 1 (treeAggregate at
>>> LogisticRegression.scala:517) with 2 output partitions
>>> 17/08/22 14:20:32 INFO DAGScheduler: Final stage: ResultStage 1
>>> (treeAggregate at LogisticRegression.scala:517)
>>> 17/08/22 14:20:32 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:32 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:32 INFO DAGScheduler: Submitting ResultStage 1
>>> (MapPartitionsRDD[20] at treeAggregate at LogisticRegression.scala:517),
>>> which has no missing parents
>>> 17/08/22 14:20:32 INFO MemoryStore: Block broadcast_1 stored as values
>>> in memory (estimated size 2.0 MB, free 364.2 MB)
>>> 17/08/22 14:20:32 INFO MemoryStore: Block broadcast_1_piece0 stored as
>>> bytes in memory (estimated size 24.3 KB, free 364.2 MB)
>>> 17/08/22 14:20:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.3 KB, free: 366.3 MB)
>>> 17/08/22 14:20:33 INFO SparkContext: Created broadcast 1 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:33 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 1 (MapPartitionsRDD[20] at treeAggregate at
>>> LogisticRegression.scala:517) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:33 INFO TaskSchedulerImpl: Adding task set 1.0 with 2
>>> tasks
>>> 17/08/22 14:20:33 INFO TaskSetManager: Starting task 0.0 in stage 1.0
>>> (TID 2, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:33 INFO TaskSetManager: Starting task 1.0 in stage 1.0
>>> (TID 3, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:33 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
>>> 17/08/22 14:20:33 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
>>> 17/08/22 14:20:33 INFO CodeGenerator: Code generated in 42.221867 ms
>>> 17/08/22 14:20:33 INFO PythonRunner: Times: total = 53, boot = -3759,
>>> init = 3812, finish = 0
>>> 17/08/22 14:20:33 INFO MemoryStore: Block rdd_19_1 stored as values in
>>> memory (estimated size 272.0 B, free 364.2 MB)
>>> 17/08/22 14:20:33 INFO PythonRunner: Times: total = 47, boot = -3782,
>>> init = 3829, finish = 0
>>> 17/08/22 14:20:33 INFO MemoryStore: Block rdd_19_0 stored as values in
>>> memory (estimated size 288.0 B, free 364.2 MB)
>>> 17/08/22 14:20:33 INFO BlockManagerInfo: Added rdd_19_1 in memory on
>>> 192.168.25.187:46647 (size: 272.0 B, free: 366.3 MB)
>>> 17/08/22 14:20:33 INFO BlockManagerInfo: Added rdd_19_0 in memory on
>>> 192.168.25.187:46647 (size: 288.0 B, free: 366.3 MB)
>>> 17/08/22 14:20:33 INFO ContextCleaner: Cleaned accumulator 1
>>> 17/08/22 14:20:33 INFO ContextCleaner: Cleaned accumulator 2
>>> 17/08/22 14:20:33 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
>>> 192.168.25.187:46647 in memory (size: 11.5 KB, free: 366.3 MB)
>>> 17/08/22 14:20:33 INFO MemoryStore: Block taskresult_2 stored as bytes
>>> in memory (estimated size 16.1 MB, free 348.2 MB)
>>> 17/08/22 14:20:33 INFO BlockManagerInfo: Added taskresult_2 in memory on
>>> 192.168.25.187:46647 (size: 16.1 MB, free: 350.2 MB)
>>> 17/08/22 14:20:33 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2).
>>> 16862200 bytes result sent via BlockManager)
>>> 17/08/22 14:20:33 INFO MemoryStore: Block taskresult_3 stored as bytes
>>> in memory (estimated size 16.1 MB, free 332.1 MB)
>>> 17/08/22 14:20:33 INFO BlockManagerInfo: Added taskresult_3 in memory on
>>> 192.168.25.187:46647 (size: 16.1 MB, free: 334.1 MB)
>>> 17/08/22 14:20:33 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3).
>>> 16862200 bytes result sent via BlockManager)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_2 on
>>> 192.168.25.187:46647 in memory (size: 16.1 MB, free: 350.2 MB)
>>> 17/08/22 14:20:34 INFO TaskSetManager: Finished task 0.0 in stage 1.0
>>> (TID 2) in 1056 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_3 on
>>> 192.168.25.187:46647 in memory (size: 16.1 MB, free: 366.3 MB)
>>> 17/08/22 14:20:34 INFO TaskSetManager: Finished task 1.0 in stage 1.0
>>> (TID 3) in 1094 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:34 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:34 INFO DAGScheduler: ResultStage 1 (treeAggregate at
>>> LogisticRegression.scala:517) finished in 1.100 s
>>> 17/08/22 14:20:34 INFO DAGScheduler: Job 1 finished: treeAggregate at
>>> LogisticRegression.scala:517, took 1.217516 s
>>> 17/08/22 14:20:34 INFO Instrumentation: LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1: {"numClasses":2}
>>> 17/08/22 14:20:34 INFO Instrumentation: LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1: {"numFeatures":262144}
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_2 stored as values
>>> in memory (estimated size 2.0 MB, free 362.2 MB)
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_2_piece0 stored as
>>> bytes in memory (estimated size 10.2 KB, free 362.2 MB)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_2_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.2 KB, free: 366.3 MB)
>>> 17/08/22 14:20:34 INFO SparkContext: Created broadcast 2 from broadcast
>>> at LogisticRegression.scala:600
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_3 stored as values
>>> in memory (estimated size 2.0 MB, free 360.2 MB)
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_3_piece0 stored as
>>> bytes in memory (estimated size 10.1 KB, free 360.2 MB)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_3_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.1 KB, free: 366.3 MB)
>>> 17/08/22 14:20:34 INFO SparkContext: Created broadcast 3 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:34 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:34 INFO DAGScheduler: Got job 2 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:34 INFO DAGScheduler: Final stage: ResultStage 2
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:34 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:34 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:34 INFO DAGScheduler: Submitting ResultStage 2
>>> (MapPartitionsRDD[21] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_4 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_4_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 358.2 MB)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_4_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:34 INFO SparkContext: Created broadcast 4 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:34 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 2 (MapPartitionsRDD[21] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:34 INFO TaskSchedulerImpl: Adding task set 2.0 with 2
>>> tasks
>>> 17/08/22 14:20:34 INFO TaskSetManager: Starting task 0.0 in stage 2.0
>>> (TID 4, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:34 INFO TaskSetManager: Starting task 1.0 in stage 2.0
>>> (TID 5, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:34 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
>>> 17/08/22 14:20:34 INFO Executor: Running task 1.0 in stage 2.0 (TID 5)
>>> 17/08/22 14:20:34 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:34 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:34 INFO MemoryStore: Block taskresult_5 stored as bytes
>>> in memory (estimated size 2.0 MB, free 356.2 MB)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Added taskresult_5 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:34 INFO MemoryStore: Block taskresult_4 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:34 INFO Executor: Finished task 1.0 in stage 2.0 (TID 5).
>>> 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Added taskresult_4 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:34 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4).
>>> 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_5 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:34 INFO TaskSetManager: Finished task 1.0 in stage 2.0
>>> (TID 5) in 124 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:34 INFO TaskSetManager: Finished task 0.0 in stage 2.0
>>> (TID 4) in 147 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:34 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_4 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:34 INFO DAGScheduler: ResultStage 2 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.151 s
>>> 17/08/22 14:20:34 INFO DAGScheduler: Job 2 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.210300 s
>>> 17/08/22 14:20:34 WARN BLAS: Failed to load implementation from:
>>> com.github.fommil.netlib.NativeSystemBLAS
>>> 17/08/22 14:20:34 WARN BLAS: Failed to load implementation from:
>>> com.github.fommil.netlib.NativeRefBLAS
>>> 17/08/22 14:20:34 INFO TorrentBroadcast: Destroying Broadcast(3) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Removed broadcast_3_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.1 KB, free: 366.2 MB)
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_5 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_5_piece0 stored as
>>> bytes in memory (estimated size 10.2 KB, free 358.2 MB)
>>> 17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_5_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.2 KB, free: 366.2 MB)
>>> 17/08/22 14:20:34 INFO SparkContext: Created broadcast 5 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:34 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:34 INFO DAGScheduler: Got job 3 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:34 INFO DAGScheduler: Final stage: ResultStage 3
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:34 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:34 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:34 INFO DAGScheduler: Submitting ResultStage 3
>>> (MapPartitionsRDD[22] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_6 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:34 INFO MemoryStore: Block broadcast_6_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 356.1 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_6_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO SparkContext: Created broadcast 6 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:35 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 3 (MapPartitionsRDD[22] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:35 INFO TaskSchedulerImpl: Adding task set 3.0 with 2
>>> tasks
>>> 17/08/22 14:20:35 INFO TaskSetManager: Starting task 0.0 in stage 3.0
>>> (TID 6, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:35 INFO TaskSetManager: Starting task 1.0 in stage 3.0
>>> (TID 7, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:35 INFO Executor: Running task 0.0 in stage 3.0 (TID 6)
>>> 17/08/22 14:20:35 INFO Executor: Running task 1.0 in stage 3.0 (TID 7)
>>> 17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:35 INFO MemoryStore: Block taskresult_6 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_6 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:35 INFO MemoryStore: Block taskresult_7 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.1 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_7 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:35 INFO Executor: Finished task 0.0 in stage 3.0 (TID 6).
>>> 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:35 INFO Executor: Finished task 1.0 in stage 3.0 (TID 7).
>>> 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_7 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:35 INFO TaskSetManager: Finished task 1.0 in stage 3.0
>>> (TID 7) in 177 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_4_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 364.2 MB)
>>> 17/08/22 14:20:35 INFO TaskSetManager: Finished task 0.0 in stage 3.0
>>> (TID 6) in 223 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:35 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:35 INFO DAGScheduler: ResultStage 3 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.225 s
>>> 17/08/22 14:20:35 INFO DAGScheduler: Job 3 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.255439 s
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_6 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO TorrentBroadcast: Destroying Broadcast(5) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_5_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.2 KB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:35 INFO LBFGS: Val and Grad Norm: 0.317022 (rel: 0.543)
>>> 0.358320
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_7 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_7_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 358.2 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_7_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO SparkContext: Created broadcast 7 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:35 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:35 INFO DAGScheduler: Got job 4 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:35 INFO DAGScheduler: Final stage: ResultStage 4
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:35 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:35 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:35 INFO DAGScheduler: Submitting ResultStage 4
>>> (MapPartitionsRDD[23] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_8 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_8_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 356.1 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_8_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO SparkContext: Created broadcast 8 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:35 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 4 (MapPartitionsRDD[23] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:35 INFO TaskSchedulerImpl: Adding task set 4.0 with 2
>>> tasks
>>> 17/08/22 14:20:35 INFO TaskSetManager: Starting task 0.0 in stage 4.0
>>> (TID 8, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:35 INFO TaskSetManager: Starting task 1.0 in stage 4.0
>>> (TID 9, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:35 INFO Executor: Running task 0.0 in stage 4.0 (TID 8)
>>> 17/08/22 14:20:35 INFO Executor: Running task 1.0 in stage 4.0 (TID 9)
>>> 17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:35 INFO MemoryStore: Block taskresult_9 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:35 INFO MemoryStore: Block taskresult_8 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.1 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_9 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:35 INFO Executor: Finished task 1.0 in stage 4.0 (TID 9).
>>> 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_9 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO TaskSetManager: Finished task 1.0 in stage 4.0
>>> (TID 9) in 103 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_8 in memory on
>>> 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:35 INFO Executor: Finished task 0.0 in stage 4.0 (TID 8).
>>> 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_8 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO TaskSetManager: Finished task 0.0 in stage 4.0
>>> (TID 8) in 136 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:35 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:35 INFO DAGScheduler: ResultStage 4 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.140 s
>>> 17/08/22 14:20:35 INFO DAGScheduler: Job 4 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.182699 s
>>> 17/08/22 14:20:35 INFO TorrentBroadcast: Destroying Broadcast(7) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_7_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:35 INFO LBFGS: Val and Grad Norm: 0.145089 (rel: 0.542)
>>> 0.164387
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_9 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_9_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 356.1 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_9_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO SparkContext: Created broadcast 9 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:35 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:35 INFO DAGScheduler: Got job 5 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:35 INFO DAGScheduler: Final stage: ResultStage 5
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:35 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:35 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:35 INFO DAGScheduler: Submitting ResultStage 5
>>> (MapPartitionsRDD[24] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_10 stored as values
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:35 INFO MemoryStore: Block broadcast_10_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 354.1 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_10_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:35 INFO SparkContext: Created broadcast 10 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:35 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 5 (MapPartitionsRDD[24] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:35 INFO TaskSchedulerImpl: Adding task set 5.0 with 2
>>> tasks
>>> 17/08/22 14:20:35 INFO TaskSetManager: Starting task 0.0 in stage 5.0
>>> (TID 10, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:35 INFO TaskSetManager: Starting task 1.0 in stage 5.0
>>> (TID 11, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:35 INFO Executor: Running task 0.0 in stage 5.0 (TID 10)
>>> 17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:35 INFO Executor: Running task 1.0 in stage 5.0 (TID 11)
>>> 17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:35 INFO MemoryStore: Block taskresult_10 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.0 MB)
>>> 17/08/22 14:20:35 INFO MemoryStore: Block taskresult_11 stored as bytes
>>> in memory (estimated size 2.0 MB, free 350.0 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_10 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_11 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:35 INFO Executor: Finished task 1.0 in stage 5.0 (TID
>>> 11). 2110611 bytes result sent via BlockManager)
>>> 17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_6_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 362.2 MB)
>>> 17/08/22 14:20:35 INFO Executor: Finished task 0.0 in stage 5.0 (TID
>>> 10). 2110611 bytes result sent via BlockManager)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Finished task 1.0 in stage 5.0
>>> (TID 11) in 197 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_11 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Finished task 0.0 in stage 5.0
>>> (TID 10) in 217 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:36 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_10 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO DAGScheduler: ResultStage 5 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.236 s
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_8_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO DAGScheduler: Job 5 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.282317 s
>>> 17/08/22 14:20:36 INFO TorrentBroadcast: Destroying Broadcast(9) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:36 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:36 INFO LBFGS: Val and Grad Norm: 0.0635321 (rel: 0.562)
>>> 0.0740437
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_9_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_11 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_11_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 358.2 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_11_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO SparkContext: Created broadcast 11 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:36 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:36 INFO DAGScheduler: Got job 6 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:36 INFO DAGScheduler: Final stage: ResultStage 6
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:36 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:36 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:36 INFO DAGScheduler: Submitting ResultStage 6
>>> (MapPartitionsRDD[25] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_12 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_12_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 356.1 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_12_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO SparkContext: Created broadcast 12 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:36 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 6 (MapPartitionsRDD[25] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:36 INFO TaskSchedulerImpl: Adding task set 6.0 with 2
>>> tasks
>>> 17/08/22 14:20:36 INFO TaskSetManager: Starting task 0.0 in stage 6.0
>>> (TID 12, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Starting task 1.0 in stage 6.0
>>> (TID 13, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:36 INFO Executor: Running task 0.0 in stage 6.0 (TID 12)
>>> 17/08/22 14:20:36 INFO Executor: Running task 1.0 in stage 6.0 (TID 13)
>>> 17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:36 INFO MemoryStore: Block taskresult_13 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_13 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block taskresult_12 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.1 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_12 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:36 INFO Executor: Finished task 0.0 in stage 6.0 (TID
>>> 12). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Finished task 0.0 in stage 6.0
>>> (TID 12) in 79 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:36 INFO Executor: Finished task 1.0 in stage 6.0 (TID
>>> 13). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_12 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Finished task 1.0 in stage 6.0
>>> (TID 13) in 102 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:36 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:36 INFO DAGScheduler: ResultStage 6 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.106 s
>>> 17/08/22 14:20:36 INFO DAGScheduler: Job 6 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.143539 s
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_13 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO TorrentBroadcast: Destroying Broadcast(11) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_11_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:36 INFO LBFGS: Val and Grad Norm: 0.0334526 (rel: 0.473)
>>> 0.0350590
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_13 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_13_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 356.1 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_13_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO SparkContext: Created broadcast 13 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:36 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:36 INFO DAGScheduler: Got job 7 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:36 INFO DAGScheduler: Final stage: ResultStage 7
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:36 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:36 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:36 INFO DAGScheduler: Submitting ResultStage 7
>>> (MapPartitionsRDD[26] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_14 stored as values
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_14_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 354.1 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_14_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO SparkContext: Created broadcast 14 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:36 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 7 (MapPartitionsRDD[26] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:36 INFO TaskSchedulerImpl: Adding task set 7.0 with 2
>>> tasks
>>> 17/08/22 14:20:36 INFO TaskSetManager: Starting task 0.0 in stage 7.0
>>> (TID 14, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Starting task 1.0 in stage 7.0
>>> (TID 15, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:36 INFO Executor: Running task 0.0 in stage 7.0 (TID 14)
>>> 17/08/22 14:20:36 INFO Executor: Running task 1.0 in stage 7.0 (TID 15)
>>> 17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:36 INFO MemoryStore: Block taskresult_15 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.0 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block taskresult_14 stored as bytes
>>> in memory (estimated size 2.0 MB, free 350.0 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_15 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:36 INFO Executor: Finished task 1.0 in stage 7.0 (TID
>>> 15). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_14 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_10_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 362.2 MB)
>>> 17/08/22 14:20:36 INFO Executor: Finished task 0.0 in stage 7.0 (TID
>>> 14). 2110611 bytes result sent via BlockManager)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_15 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Finished task 1.0 in stage 7.0
>>> (TID 15) in 387 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.3 KB, free: 364.2 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_14 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_12_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.3 MB)
>>> 17/08/22 14:20:36 INFO TaskSetManager: Finished task 0.0 in stage 7.0
>>> (TID 14) in 416 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:36 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:36 INFO DAGScheduler: ResultStage 7 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.418 s
>>> 17/08/22 14:20:36 INFO DAGScheduler: Job 7 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.459555 s
>>> 17/08/22 14:20:36 INFO TorrentBroadcast: Destroying Broadcast(13) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_13_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.3 MB)
>>> 17/08/22 14:20:36 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:36 INFO LBFGS: Val and Grad Norm: 0.0200470 (rel: 0.401)
>>> 0.0164153
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_15 stored as values
>>> in memory (estimated size 2.0 MB, free 360.2 MB)
>>> 17/08/22 14:20:36 INFO MemoryStore: Block broadcast_15_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 360.2 MB)
>>> 17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_15_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.3 MB)
>>> 17/08/22 14:20:36 INFO SparkContext: Created broadcast 15 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:37 INFO DAGScheduler: Got job 8 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 8
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 8
>>> (MapPartitionsRDD[27] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_16 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_16_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 358.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_16_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO SparkContext: Created broadcast 16 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:37 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 8 (MapPartitionsRDD[27] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:37 INFO TaskSchedulerImpl: Adding task set 8.0 with 2
>>> tasks
>>> 17/08/22 14:20:37 INFO TaskSetManager: Starting task 0.0 in stage 8.0
>>> (TID 16, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Starting task 1.0 in stage 8.0
>>> (TID 17, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:37 INFO Executor: Running task 0.0 in stage 8.0 (TID 16)
>>> 17/08/22 14:20:37 INFO Executor: Running task 1.0 in stage 8.0 (TID 17)
>>> 17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:37 INFO MemoryStore: Block taskresult_16 stored as bytes
>>> in memory (estimated size 2.0 MB, free 356.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_16 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:37 INFO Executor: Finished task 0.0 in stage 8.0 (TID
>>> 16). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block taskresult_17 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_17 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_16 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:37 INFO Executor: Finished task 1.0 in stage 8.0 (TID
>>> 17). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Finished task 0.0 in stage 8.0
>>> (TID 16) in 67 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_17 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Finished task 1.0 in stage 8.0
>>> (TID 17) in 110 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:37 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:37 INFO DAGScheduler: ResultStage 8 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.118 s
>>> 17/08/22 14:20:37 INFO DAGScheduler: Job 8 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.149447 s
>>> 17/08/22 14:20:37 INFO TorrentBroadcast: Destroying Broadcast(15) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_15_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:37 INFO LBFGS: Val and Grad Norm: 0.0146788 (rel: 0.268)
>>> 0.00726759
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_17 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_17_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 358.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_17_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO SparkContext: Created broadcast 17 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:37 INFO DAGScheduler: Got job 9 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 9
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 9
>>> (MapPartitionsRDD[28] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_18 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_18_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 356.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_18_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO SparkContext: Created broadcast 18 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:37 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 9 (MapPartitionsRDD[28] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:37 INFO TaskSchedulerImpl: Adding task set 9.0 with 2
>>> tasks
>>> 17/08/22 14:20:37 INFO TaskSetManager: Starting task 0.0 in stage 9.0
>>> (TID 18, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Starting task 1.0 in stage 9.0
>>> (TID 19, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:37 INFO Executor: Running task 0.0 in stage 9.0 (TID 18)
>>> 17/08/22 14:20:37 INFO Executor: Running task 1.0 in stage 9.0 (TID 19)
>>> 17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:37 INFO MemoryStore: Block taskresult_19 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_19 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_16_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 364.2 MB)
>>> 17/08/22 14:20:37 INFO Executor: Finished task 1.0 in stage 9.0 (TID
>>> 19). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block taskresult_18 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_18 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_19 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Finished task 1.0 in stage 9.0
>>> (TID 19) in 197 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:37 INFO Executor: Finished task 0.0 in stage 9.0 (TID
>>> 18). 2110611 bytes result sent via BlockManager)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_18 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Finished task 0.0 in stage 9.0
>>> (TID 18) in 233 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:37 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:37 INFO DAGScheduler: ResultStage 9 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.236 s
>>> 17/08/22 14:20:37 INFO DAGScheduler: Job 9 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.274679 s
>>> 17/08/22 14:20:37 INFO TorrentBroadcast: Destroying Broadcast(17) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_17_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:37 INFO LBFGS: Val and Grad Norm: 0.0127666 (rel: 0.130)
>>> 0.00298415
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_19 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_19_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 358.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_19_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO SparkContext: Created broadcast 19 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:37 INFO DAGScheduler: Got job 10 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 10
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 10
>>> (MapPartitionsRDD[29] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_20 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_20_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 356.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_20_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO SparkContext: Created broadcast 20 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:37 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 10 (MapPartitionsRDD[29] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:37 INFO TaskSchedulerImpl: Adding task set 10.0 with 2
>>> tasks
>>> 17/08/22 14:20:37 INFO TaskSetManager: Starting task 0.0 in stage 10.0
>>> (TID 20, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Starting task 1.0 in stage 10.0
>>> (TID 21, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:37 INFO Executor: Running task 0.0 in stage 10.0 (TID 20)
>>> 17/08/22 14:20:37 INFO Executor: Running task 1.0 in stage 10.0 (TID 21)
>>> 17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:37 INFO MemoryStore: Block taskresult_21 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_21 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:37 INFO MemoryStore: Block taskresult_20 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_20 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:37 INFO Executor: Finished task 1.0 in stage 10.0 (TID
>>> 21). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:37 INFO Executor: Finished task 0.0 in stage 10.0 (TID
>>> 20). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_21 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Finished task 1.0 in stage 10.0
>>> (TID 21) in 95 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_20 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO TaskSetManager: Finished task 0.0 in stage 10.0
>>> (TID 20) in 115 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:37 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:37 INFO DAGScheduler: ResultStage 10 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.118 s
>>> 17/08/22 14:20:37 INFO DAGScheduler: Job 10 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.143837 s
>>> 17/08/22 14:20:37 INFO TorrentBroadcast: Destroying Broadcast(19) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_19_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:37 INFO LBFGS: Val and Grad Norm: 0.0122267 (rel: 0.0423)
>>> 0.00119244
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_21 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_21_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 356.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_21_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:37 INFO SparkContext: Created broadcast 21 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:37 INFO DAGScheduler: Got job 11 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 11
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 11
>>> (MapPartitionsRDD[30] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_22 stored as values
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:37 INFO MemoryStore: Block broadcast_22_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 354.1 MB)
>>> 17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_22_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO SparkContext: Created broadcast 22 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:38 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 11 (MapPartitionsRDD[30] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:38 INFO TaskSchedulerImpl: Adding task set 11.0 with 2
>>> tasks
>>> 17/08/22 14:20:38 INFO TaskSetManager: Starting task 0.0 in stage 11.0
>>> (TID 22, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:38 INFO TaskSetManager: Starting task 1.0 in stage 11.0
>>> (TID 23, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:38 INFO Executor: Running task 0.0 in stage 11.0 (TID 22)
>>> 17/08/22 14:20:38 INFO Executor: Running task 1.0 in stage 11.0 (TID 23)
>>> 17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:38 INFO MemoryStore: Block taskresult_23 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.0 MB)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_23 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:38 INFO Executor: Finished task 1.0 in stage 11.0 (TID
>>> 23). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:38 INFO TaskSetManager: Finished task 1.0 in stage 11.0
>>> (TID 23) in 85 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_18_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 364.2 MB)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_23 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_20_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO MemoryStore: Block taskresult_22 stored as bytes
>>> in memory (estimated size 2.0 MB, free 356.2 MB)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_22 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:38 INFO Executor: Finished task 0.0 in stage 11.0 (TID
>>> 22). 2110611 bytes result sent via BlockManager)
>>> 17/08/22 14:20:38 INFO TaskSetManager: Finished task 0.0 in stage 11.0
>>> (TID 22) in 118 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:38 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:38 INFO DAGScheduler: ResultStage 11 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.122 s
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_22 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO DAGScheduler: Job 11 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.143423 s
>>> 17/08/22 14:20:38 INFO TorrentBroadcast: Destroying Broadcast(21) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_21_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:38 INFO LBFGS: Val and Grad Norm: 0.0120434 (rel: 0.0150)
>>> 0.000857606
>>> 17/08/22 14:20:38 INFO MemoryStore: Block broadcast_23 stored as values
>>> in memory (estimated size 2.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:38 INFO MemoryStore: Block broadcast_23_piece0 stored as
>>> bytes in memory (estimated size 10.3 KB, free 358.2 MB)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Added broadcast_23_piece0 in
>>> memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO SparkContext: Created broadcast 23 from broadcast
>>> at LogisticRegression.scala:1879
>>> 17/08/22 14:20:38 INFO SparkContext: Starting job: treeAggregate at
>>> LogisticRegression.scala:1892
>>> 17/08/22 14:20:38 INFO DAGScheduler: Got job 12 (treeAggregate at
>>> LogisticRegression.scala:1892) with 2 output partitions
>>> 17/08/22 14:20:38 INFO DAGScheduler: Final stage: ResultStage 12
>>> (treeAggregate at LogisticRegression.scala:1892)
>>> 17/08/22 14:20:38 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:38 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:38 INFO DAGScheduler: Submitting ResultStage 12
>>> (MapPartitionsRDD[31] at treeAggregate at LogisticRegression.scala:1892),
>>> which has no missing parents
>>> 17/08/22 14:20:38 INFO MemoryStore: Block broadcast_24 stored as values
>>> in memory (estimated size 2.0 MB, free 356.1 MB)
>>> 17/08/22 14:20:38 INFO MemoryStore: Block broadcast_24_piece0 stored as
>>> bytes in memory (estimated size 24.6 KB, free 356.1 MB)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Added broadcast_24_piece0 in
>>> memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO SparkContext: Created broadcast 24 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:38 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 12 (MapPartitionsRDD[31] at treeAggregate at
>>> LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
>>> 1))
>>> 17/08/22 14:20:38 INFO TaskSchedulerImpl: Adding task set 12.0 with 2
>>> tasks
>>> 17/08/22 14:20:38 INFO TaskSetManager: Starting task 0.0 in stage 12.0
>>> (TID 24, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:38 INFO TaskSetManager: Starting task 1.0 in stage 12.0
>>> (TID 25, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:38 INFO Executor: Running task 0.0 in stage 12.0 (TID 24)
>>> 17/08/22 14:20:38 INFO Executor: Running task 1.0 in stage 12.0 (TID 25)
>>> 17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_0 locally
>>> 17/08/22 14:20:38 INFO MemoryStore: Block taskresult_24 stored as bytes
>>> in memory (estimated size 2.0 MB, free 354.1 MB)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_24 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:38 INFO Executor: Finished task 0.0 in stage 12.0 (TID
>>> 24). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_1 locally
>>> 17/08/22 14:20:38 INFO MemoryStore: Block taskresult_25 stored as bytes
>>> in memory (estimated size 2.0 MB, free 352.1 MB)
>>> 17/08/22 14:20:38 INFO TaskSetManager: Finished task 0.0 in stage 12.0
>>> (TID 24) in 57 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_25 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:38 INFO Executor: Finished task 1.0 in stage 12.0 (TID
>>> 25). 2110568 bytes result sent via BlockManager)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_24 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
>>> 17/08/22 14:20:38 INFO TaskSetManager: Finished task 1.0 in stage 12.0
>>> (TID 25) in 94 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:38 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_25 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO DAGScheduler: ResultStage 12 (treeAggregate at
>>> LogisticRegression.scala:1892) finished in 0.098 s
>>> 17/08/22 14:20:38 INFO DAGScheduler: Job 12 finished: treeAggregate at
>>> LogisticRegression.scala:1892, took 0.125965 s
>>> 17/08/22 14:20:38 INFO TorrentBroadcast: Destroying Broadcast(23) (from
>>> destroy at LogisticRegression.scala:1933)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_23_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO LBFGS: Step Size: 1.000
>>> 17/08/22 14:20:38 INFO LBFGS: Val and Grad Norm: 0.0118476 (rel: 0.0163)
>>> 0.00102213
>>> 17/08/22 14:20:38 INFO LBFGS: Converged because max iterations reached
>>> 17/08/22 14:20:38 INFO TorrentBroadcast: Destroying Broadcast(2) (from
>>> destroy at LogisticRegression.scala:796)
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
>>> 192.168.25.187:46647 in memory (size: 10.2 KB, free: 366.2 MB)
>>> 17/08/22 14:20:38 INFO MapPartitionsRDD: Removing RDD 19 from
>>> persistence list
>>> 17/08/22 14:20:38 INFO BlockManager: Removing RDD 19
>>> 17/08/22 14:20:38 INFO CodeGenerator: Code generated in 69.50928 ms
>>> 17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_24_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.3 MB)
>>> 17/08/22 14:20:39 INFO BlockManagerInfo: Removed broadcast_22_piece0 on
>>> 192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.3 MB)
>>> 17/08/22 14:20:39 INFO Instrumentation:
>>> LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1: training finished
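The trace above is the batch LogisticRegression fit that the script runs on the driver before any streaming query starts. A minimal sketch of the kind of code that produces such a trace (the pipeline, column names, and maxIter are assumptions, not taken from the original script):

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.master("local[2]").getOrCreate()

    # Toy stand-in for the real training corpus (the actual data is not in the log).
    training_df = spark.createDataFrame(
        [(0, "spark streaming kafka", 1.0), (1, "hello world", 0.0)],
        ["id", "text", "label"])

    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    hashing_tf = HashingTF(inputCol="words", outputCol="features")  # default numFeatures = 262144
    lr = LogisticRegression(maxIter=10)  # some small maxIter is assumed: the trace ends with
                                         # "Converged because max iterations reached"
    lr_model = Pipeline(stages=[tokenizer, hashing_tf, lr]).fit(training_df)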
>>> 17/08/22 14:20:39 INFO SparkContext: Starting job: treeAggregate at
>>> IDF.scala:54
>>> 17/08/22 14:20:39 INFO DAGScheduler: Got job 13 (treeAggregate at
>>> IDF.scala:54) with 2 output partitions
>>> 17/08/22 14:20:39 INFO DAGScheduler: Final stage: ResultStage 13
>>> (treeAggregate at IDF.scala:54)
>>> 17/08/22 14:20:39 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:39 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:39 INFO DAGScheduler: Submitting ResultStage 13
>>> (MapPartitionsRDD[42] at treeAggregate at IDF.scala:54), which has no
>>> missing parents
>>> 17/08/22 14:20:39 INFO MemoryStore: Block broadcast_25 stored as values
>>> in memory (estimated size 26.0 KB, free 364.2 MB)
>>> 17/08/22 14:20:39 INFO MemoryStore: Block broadcast_25_piece0 stored as
>>> bytes in memory (estimated size 11.6 KB, free 364.2 MB)
>>> 17/08/22 14:20:39 INFO BlockManagerInfo: Added broadcast_25_piece0 in
>>> memory on 192.168.25.187:46647 (size: 11.6 KB, free: 366.3 MB)
>>> 17/08/22 14:20:39 INFO SparkContext: Created broadcast 25 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:39 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 13 (MapPartitionsRDD[42] at treeAggregate at IDF.scala:54)
>>> (first 15 tasks are for partitions Vector(0, 1))
>>> 17/08/22 14:20:39 INFO TaskSchedulerImpl: Adding task set 13.0 with 2
>>> tasks
>>> 17/08/22 14:20:39 INFO TaskSetManager: Starting task 0.0 in stage 13.0
>>> (TID 26, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:39 INFO TaskSetManager: Starting task 1.0 in stage 13.0
>>> (TID 27, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
>>> 17/08/22 14:20:39 INFO Executor: Running task 0.0 in stage 13.0 (TID 26)
>>> 17/08/22 14:20:39 INFO Executor: Running task 1.0 in stage 13.0 (TID 27)
>>> 17/08/22 14:20:39 INFO PythonRunner: Times: total = 52, boot = -6281,
>>> init = 6333, finish = 0
>>> 17/08/22 14:20:39 INFO PythonRunner: Times: total = 58, boot = -6294,
>>> init = 6351, finish = 1
>>> 17/08/22 14:20:39 INFO MemoryStore: Block taskresult_26 stored as bytes
>>> in memory (estimated size 2.0 MB, free 362.2 MB)
>>> 17/08/22 14:20:39 INFO BlockManagerInfo: Added taskresult_26 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 364.3 MB)
>>> 17/08/22 14:20:39 INFO MemoryStore: Block taskresult_27 stored as bytes
>>> in memory (estimated size 2.0 MB, free 360.2 MB)
>>> 17/08/22 14:20:39 INFO BlockManagerInfo: Added taskresult_27 in memory
>>> on 192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:39 INFO Executor: Finished task 1.0 in stage 13.0 (TID
>>> 27). 2109342 bytes result sent via BlockManager)
>>> 17/08/22 14:20:39 INFO Executor: Finished task 0.0 in stage 13.0 (TID
>>> 26). 2109342 bytes result sent via BlockManager)
>>> 17/08/22 14:20:39 INFO TaskSetManager: Finished task 0.0 in stage 13.0
>>> (TID 26) in 156 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:39 INFO BlockManagerInfo: Removed taskresult_26 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.3 MB)
>>> 17/08/22 14:20:39 INFO BlockManagerInfo: Removed taskresult_27 on
>>> 192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.3 MB)
>>> 17/08/22 14:20:39 INFO TaskSetManager: Finished task 1.0 in stage 13.0
>>> (TID 27) in 179 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:39 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:39 INFO DAGScheduler: ResultStage 13 (treeAggregate at
>>> IDF.scala:54) finished in 0.185 s
>>> 17/08/22 14:20:39 INFO DAGScheduler: Job 13 finished: treeAggregate at
>>> IDF.scala:54, took 0.204303 s
>>> 17/08/22 14:20:39 INFO Instrumentation: NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2:
>>> training: numPartitions=2 storageLevel=StorageLevel(1 replicas)
>>> 17/08/22 14:20:39 INFO Instrumentation: NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2:
>>> {"smoothing":1.0,"featuresCol":"features","modelType":"multi
>>> nomial","labelCol":"label","predictionCol":"prediction","raw
>>> PredictionCol":"rawPrediction","probabilityCol":"probability"}
>>> 17/08/22 14:20:39 INFO CodeGenerator: Code generated in 49.860121 ms
>>> 17/08/22 14:20:40 INFO SparkContext: Starting job: head at
>>> NaiveBayes.scala:154
>>> 17/08/22 14:20:40 INFO DAGScheduler: Got job 14 (head at
>>> NaiveBayes.scala:154) with 1 output partitions
>>> 17/08/22 14:20:40 INFO DAGScheduler: Final stage: ResultStage 14 (head
>>> at NaiveBayes.scala:154)
>>> 17/08/22 14:20:40 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:40 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:40 INFO DAGScheduler: Submitting ResultStage 14
>>> (MapPartitionsRDD[49] at head at NaiveBayes.scala:154), which has no
>>> missing parents
>>> 17/08/22 14:20:40 INFO MemoryStore: Block broadcast_26 stored as values
>>> in memory (estimated size 2.0 MB, free 362.2 MB)
>>> 17/08/22 14:20:40 INFO MemoryStore: Block broadcast_26_piece0 stored as
>>> bytes in memory (estimated size 22.0 KB, free 362.2 MB)
>>> 17/08/22 14:20:40 INFO BlockManagerInfo: Added broadcast_26_piece0 in
>>> memory on 192.168.25.187:46647 (size: 22.0 KB, free: 366.2 MB)
>>> 17/08/22 14:20:40 INFO SparkContext: Created broadcast 26 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:40 INFO DAGScheduler: Submitting 1 missing tasks from
>>> ResultStage 14 (MapPartitionsRDD[49] at head at NaiveBayes.scala:154)
>>> (first 15 tasks are for partitions Vector(0))
>>> 17/08/22 14:20:40 INFO TaskSchedulerImpl: Adding task set 14.0 with 1
>>> tasks
>>> 17/08/22 14:20:40 INFO TaskSetManager: Starting task 0.0 in stage 14.0
>>> (TID 28, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
>>> 17/08/22 14:20:40 INFO Executor: Running task 0.0 in stage 14.0 (TID 28)
>>> 17/08/22 14:20:40 INFO PythonRunner: Times: total = 42, boot = -541,
>>> init = 583, finish = 0
>>> 17/08/22 14:20:40 INFO Executor: Finished task 0.0 in stage 14.0 (TID
>>> 28). 1713 bytes result sent to driver
>>> 17/08/22 14:20:40 INFO TaskSetManager: Finished task 0.0 in stage 14.0
>>> (TID 28) in 78 ms on localhost (executor driver) (1/1)
>>> 17/08/22 14:20:40 INFO TaskSchedulerImpl: Removed TaskSet 14.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:40 INFO DAGScheduler: ResultStage 14 (head at
>>> NaiveBayes.scala:154) finished in 0.081 s
>>> 17/08/22 14:20:40 INFO DAGScheduler: Job 14 finished: head at
>>> NaiveBayes.scala:154, took 0.129266 s
>>> 17/08/22 14:20:40 INFO Instrumentation: NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2:
>>> {"numFeatures":262144}
>>> 17/08/22 14:20:40 INFO SparkContext: Starting job: collect at
>>> NaiveBayes.scala:174
>>> 17/08/22 14:20:40 INFO DAGScheduler: Registering RDD 54 (map at
>>> NaiveBayes.scala:162)
>>> 17/08/22 14:20:40 INFO DAGScheduler: Got job 15 (collect at
>>> NaiveBayes.scala:174) with 2 output partitions
>>> 17/08/22 14:20:40 INFO DAGScheduler: Final stage: ResultStage 16
>>> (collect at NaiveBayes.scala:174)
>>> 17/08/22 14:20:40 INFO DAGScheduler: Parents of final stage:
>>> List(ShuffleMapStage 15)
>>> 17/08/22 14:20:40 INFO DAGScheduler: Missing parents:
>>> List(ShuffleMapStage 15)
>>> 17/08/22 14:20:40 INFO DAGScheduler: Submitting ShuffleMapStage 15
>>> (MapPartitionsRDD[54] at map at NaiveBayes.scala:162), which has no missing
>>> parents
>>> 17/08/22 14:20:40 INFO BlockManagerInfo: Removed broadcast_26_piece0 on
>>> 192.168.25.187:46647 in memory (size: 22.0 KB, free: 366.3 MB)
>>> 17/08/22 14:20:40 INFO MemoryStore: Block broadcast_27 stored as values
>>> in memory (estimated size 6.0 MB, free 358.2 MB)
>>> 17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 373
>>> 17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 374
>>> 17/08/22 14:20:40 INFO BlockManagerInfo: Removed broadcast_25_piece0 on
>>> 192.168.25.187:46647 in memory (size: 11.6 KB, free: 366.3 MB)
>>> 17/08/22 14:20:40 INFO MemoryStore: Block broadcast_27_piece0 stored as
>>> bytes in memory (estimated size 45.0 KB, free 358.2 MB)
>>> 17/08/22 14:20:40 INFO BlockManagerInfo: Added broadcast_27_piece0 in
>>> memory on 192.168.25.187:46647 (size: 45.0 KB, free: 366.2 MB)
>>> 17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 346
>>> 17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 345
>>> 17/08/22 14:20:40 INFO SparkContext: Created broadcast 27 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:40 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ShuffleMapStage 15 (MapPartitionsRDD[54] at map at NaiveBayes.scala:162)
>>> (first 15 tasks are for partitions Vector(0, 1))
>>> 17/08/22 14:20:40 INFO TaskSchedulerImpl: Adding task set 15.0 with 2
>>> tasks
>>> 17/08/22 14:20:40 INFO TaskSetManager: Starting task 0.0 in stage 15.0
>>> (TID 29, localhost, executor driver, partition 0, PROCESS_LOCAL, 4872 bytes)
>>> 17/08/22 14:20:40 INFO TaskSetManager: Starting task 1.0 in stage 15.0
>>> (TID 30, localhost, executor driver, partition 1, PROCESS_LOCAL, 4881 bytes)
>>> 17/08/22 14:20:40 INFO Executor: Running task 1.0 in stage 15.0 (TID 30)
>>> 17/08/22 14:20:40 INFO Executor: Running task 0.0 in stage 15.0 (TID 29)
>>> 17/08/22 14:20:40 INFO PythonRunner: Times: total = 55, boot = -340,
>>> init = 394, finish = 1
>>> 17/08/22 14:20:40 INFO PythonRunner: Times: total = 47, boot = -962,
>>> init = 1009, finish = 0
>>> 17/08/22 14:20:40 INFO Executor: Finished task 1.0 in stage 15.0 (TID
>>> 30). 1867 bytes result sent to driver
>>> 17/08/22 14:20:40 INFO Executor: Finished task 0.0 in stage 15.0 (TID
>>> 29). 1867 bytes result sent to driver
>>> 17/08/22 14:20:40 INFO TaskSetManager: Finished task 1.0 in stage 15.0
>>> (TID 30) in 220 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:40 INFO TaskSetManager: Finished task 0.0 in stage 15.0
>>> (TID 29) in 224 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:40 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:40 INFO DAGScheduler: ShuffleMapStage 15 (map at
>>> NaiveBayes.scala:162) finished in 0.227 s
>>> 17/08/22 14:20:40 INFO DAGScheduler: looking for newly runnable stages
>>> 17/08/22 14:20:40 INFO DAGScheduler: running: Set()
>>> 17/08/22 14:20:40 INFO DAGScheduler: waiting: Set(ResultStage 16)
>>> 17/08/22 14:20:40 INFO DAGScheduler: failed: Set()
>>> 17/08/22 14:20:40 INFO DAGScheduler: Submitting ResultStage 16
>>> (ShuffledRDD[55] at aggregateByKey at NaiveBayes.scala:163), which has no
>>> missing parents
>>> 17/08/22 14:20:40 INFO MemoryStore: Block broadcast_28 stored as values
>>> in memory (estimated size 6.0 MB, free 352.1 MB)
>>> 17/08/22 14:20:40 INFO MemoryStore: Block broadcast_28_piece0 stored as
>>> bytes in memory (estimated size 45.3 KB, free 352.1 MB)
>>> 17/08/22 14:20:40 INFO BlockManagerInfo: Added broadcast_28_piece0 in
>>> memory on 192.168.25.187:46647 (size: 45.3 KB, free: 366.2 MB)
>>> 17/08/22 14:20:40 INFO SparkContext: Created broadcast 28 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:40 INFO DAGScheduler: Submitting 2 missing tasks from
>>> ResultStage 16 (ShuffledRDD[55] at aggregateByKey at NaiveBayes.scala:163)
>>> (first 15 tasks are for partitions Vector(0, 1))
>>> 17/08/22 14:20:40 INFO TaskSchedulerImpl: Adding task set 16.0 with 2
>>> tasks
>>> 17/08/22 14:20:40 INFO TaskSetManager: Starting task 1.0 in stage 16.0
>>> (TID 31, localhost, executor driver, partition 1, PROCESS_LOCAL, 4621 bytes)
>>> 17/08/22 14:20:40 INFO TaskSetManager: Starting task 0.0 in stage 16.0
>>> (TID 32, localhost, executor driver, partition 0, ANY, 4621 bytes)
>>> 17/08/22 14:20:40 INFO Executor: Running task 0.0 in stage 16.0 (TID 32)
>>> 17/08/22 14:20:40 INFO Executor: Running task 1.0 in stage 16.0 (TID 31)
>>> 17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty
>>> blocks out of 2 blocks
>>> 17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Started 0 remote
>>> fetches in 13 ms
>>> 17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty
>>> blocks out of 2 blocks
>>> 17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Started 0 remote
>>> fetches in 19 ms
>>> 17/08/22 14:20:40 INFO Executor: Finished task 1.0 in stage 16.0 (TID
>>> 31). 1888 bytes result sent to driver
>>> 17/08/22 14:20:40 INFO TaskSetManager: Finished task 1.0 in stage 16.0
>>> (TID 31) in 99 ms on localhost (executor driver) (1/2)
>>> 17/08/22 14:20:40 INFO MemoryStore: Block taskresult_32 stored as bytes
>>> in memory (estimated size 4.0 MB, free 348.1 MB)
>>> 17/08/22 14:20:40 INFO BlockManagerInfo: Added taskresult_32 in memory
>>> on 192.168.25.187:46647 (size: 4.0 MB, free: 362.2 MB)
>>> 17/08/22 14:20:40 INFO Executor: Finished task 0.0 in stage 16.0 (TID
>>> 32). 4217031 bytes result sent via BlockManager)
>>> 17/08/22 14:20:40 INFO TaskSetManager: Finished task 0.0 in stage 16.0
>>> (TID 32) in 217 ms on localhost (executor driver) (2/2)
>>> 17/08/22 14:20:40 INFO TaskSchedulerImpl: Removed TaskSet 16.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:40 INFO DAGScheduler: ResultStage 16 (collect at
>>> NaiveBayes.scala:174) finished in 0.225 s
>>> 17/08/22 14:20:40 INFO DAGScheduler: Job 15 finished: collect at
>>> NaiveBayes.scala:174, took 0.589796 s
>>> 17/08/22 14:20:40 INFO BlockManagerInfo: Removed taskresult_32 on
>>> 192.168.25.187:46647 in memory (size: 4.0 MB, free: 366.2 MB)
>>> 17/08/22 14:20:40 INFO Instrumentation: NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2:
>>> {"numClasses":2}
>>> 17/08/22 14:20:41 INFO Instrumentation: NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2:
>>> training finished
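The second batch model is the NaiveBayes whose parameters appear in the Instrumentation JSON above (multinomial, smoothing 1.0, 262144 hashed features, 2 classes). Reusing the names from the earlier sketch, and again only as an illustration:

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import NaiveBayes

    # smoothing / modelType are the values shown in the Instrumentation line above;
    # tokenizer, hashing_tf and training_df are from the previous sketch.
    nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
    nb_model = Pipeline(stages=[tokenizer, hashing_tf, nb]).fit(training_df)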
>>> 17/08/22 14:20:41 INFO SparkSqlParser: Parsing command: 1 as id
>>> 17/08/22 14:20:41 INFO SparkSqlParser: Parsing command: CAST(value AS
>>> STRING) as text
>>> 17/08/22 14:20:42 INFO BlockManagerInfo: Removed broadcast_27_piece0 on
>>> 192.168.25.187:46647 in memory (size: 45.0 KB, free: 366.2 MB)
>>> 17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 372
>>> 17/08/22 14:20:42 INFO BlockManagerInfo: Removed broadcast_28_piece0 on
>>> 192.168.25.187:46647 in memory (size: 45.3 KB, free: 366.3 MB)
>>> 17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 371
>>> 17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 400
>>> 17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 399
>>> 17/08/22 14:20:42 INFO ContextCleaner: Cleaned shuffle 0
>>> printing schema
>>> root
>>>  |-- id: integer (nullable = false)
>>>  |-- text: string (nullable = true)
>>>  |-- prediction: double (nullable = true)
>>>
>>> root
>>>  |-- id: integer (nullable = false)
>>>  |-- text: string (nullable = true)
>>>  |-- prediction: double (nullable = true)
>>>
>>> root
>>>  |-- id: integer (nullable = false)
>>>  |-- text: string (nullable = true)
>>>
>>> 17/08/22 14:20:42 INFO StreamExecution: Starting [id =
>>> db779c29-4790-4b60-b035-f5701f6cc8dd, runId =
>>> 1996e5c6-aac3-45b2-871e-c2ba22c2da92]. Use
>>> /tmp/temporary-5d7c55ec-4704-48ce-bf49-bb900f85a284 to store the query
>>> checkpoint.
>>> root
>>>  |-- id: integer (nullable = false)
>>>  |-- text: string (nullable = true)
>>>  |-- prediction: double (nullable = true)
>>>
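The printSchema() output above shows two prediction streams with the identical (id, text, prediction) layout, the raw (id, text) input, and the unioned result; the "1 as id" / "CAST(value AS STRING) as text" parse commands earlier are the projection applied to the Kafka stream. A sketch of that wiring, with the topic and all variable names as assumptions:

    # Illustrative only; the topic and variable names are not in the log.
    stream_df = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # per the ConsumerConfig below
        .option("subscribe", "input_topic")                   # topic not shown in the log
        .option("startingOffsets", "earliest")                # matches auto.offset.reset below
        .load()
        .selectExpr("1 as id", "CAST(value AS STRING) as text"))

    pred_lr = lr_model.transform(stream_df).select("id", "text", "prediction")
    pred_nb = nb_model.transform(stream_df).select("id", "text", "prediction")
    pred_all = pred_lr.union(pred_nb)  # union resolves columns by position, which is why
                                       # the matching schemas above are what matters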
>>> 17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
>>>         metric.reporters = []
>>>         metadata.max.age.ms = 300000
>>>         partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
>>>         reconnect.backoff.ms = 50
>>>         sasl.kerberos.ticket.renew.window.factor = 0.8
>>>         max.partition.fetch.bytes = 1048576
>>>         bootstrap.servers = [localhost:9092]
>>>         ssl.keystore.type = JKS
>>>         enable.auto.commit = false
>>>         sasl.mechanism = GSSAPI
>>>         interceptor.classes = null
>>>         exclude.internal.topics = true
>>>         ssl.truststore.password = null
>>>         client.id =
>>>         ssl.endpoint.identification.algorithm = null
>>>         max.poll.records = 1
>>>         check.crcs = true
>>>         request.timeout.ms = 40000
>>>         heartbeat.interval.ms = 3000
>>>         auto.commit.interval.ms = 5000
>>>         receive.buffer.bytes = 65536
>>>         ssl.truststore.type = JKS
>>>         ssl.truststore.location = null
>>>         ssl.keystore.password = null
>>>         fetch.min.bytes = 1
>>>         send.buffer.bytes = 131072
>>>         value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
>>>         group.id = spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
>>>         retry.backoff.ms = 100
>>>         sasl.kerberos.kinit.cmd = /usr/bin/kinit
>>>         sasl.kerberos.service.name = null
>>>         sasl.kerberos.ticket.renew.jitter = 0.05
>>>         ssl.trustmanager.algorithm = PKIX
>>>         ssl.key.password = null
>>>         fetch.max.wait.ms = 500
>>>         sasl.kerberos.min.time.before.relogin = 60000
>>>         connections.max.idle.ms = 540000
>>>         session.timeout.ms = 30000
>>>         metrics.num.samples = 2
>>>         key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
>>>         ssl.protocol = TLS
>>>         ssl.provider = null
>>>         ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>>>         ssl.keystore.location = null
>>>         ssl.cipher.suites = null
>>>         security.protocol = PLAINTEXT
>>>         ssl.keymanager.algorithm = SunX509
>>>         metrics.sample.window.ms = 30000
>>>         auto.offset.reset = earliest
>>>
>>> 17/08/22 14:20:42 INFO SparkSqlParser: Parsing command: CAST(id AS
>>> STRING) as key
>>> 17/08/22 14:20:42 INFO SparkSqlParser: Parsing command: CONCAT(CAST(text
>>> AS STRING),':', CAST(prediction AS STRING)) as value
>>> 17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
>>>         metric.reporters = []
>>>         metadata.max.age.ms = 300000
>>>         partition.assignment.strategy = [org.apache.kafka.clients.consumer.RangeAssignor]
>>>         reconnect.backoff.ms = 50
>>>         sasl.kerberos.ticket.renew.window.factor = 0.8
>>>         max.partition.fetch.bytes = 1048576
>>>         bootstrap.servers = [localhost:9092]
>>>         ssl.keystore.type = JKS
>>>         enable.auto.commit = false
>>>         sasl.mechanism = GSSAPI
>>>         interceptor.classes = null
>>>         exclude.internal.topics = true
>>>         ssl.truststore.password = null
>>>         client.id = consumer-1
>>>         ssl.endpoint.identification.algorithm = null
>>>         max.poll.records = 1
>>>         check.crcs = true
>>>         request.timeout.ms = 40000
>>>         heartbeat.interval.ms = 3000
>>>         auto.commit.interval.ms = 5000
>>>         receive.buffer.bytes = 65536
>>>         ssl.truststore.type = JKS
>>>         ssl.truststore.location = null
>>>         ssl.keystore.password = null
>>>         fetch.min.bytes = 1
>>>         send.buffer.bytes = 131072
>>>         value.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
>>>         group.id = spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
>>>         retry.backoff.ms = 100
>>>         sasl.kerberos.kinit.cmd = /usr/bin/kinit
>>>         sasl.kerberos.service.name = null
>>>         sasl.kerberos.ticket.renew.jitter = 0.05
>>>         ssl.trustmanager.algorithm = PKIX
>>>         ssl.key.password = null
>>>         fetch.max.wait.ms = 500
>>>         sasl.kerberos.min.time.before.relogin = 60000
>>>         connections.max.idle.ms = 540000
>>>         session.timeout.ms = 30000
>>>         metrics.num.samples = 2
>>>         key.deserializer = class org.apache.kafka.common.serialization.ByteArrayDeserializer
>>>         ssl.protocol = TLS
>>>         ssl.provider = null
>>>         ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>>>         ssl.keystore.location = null
>>>         ssl.cipher.suites = null
>>>         security.protocol = PLAINTEXT
>>>         ssl.keymanager.algorithm = SunX509
>>>         metrics.sample.window.ms = 30000
>>>         auto.offset.reset = earliest
>>>
>>> 17/08/22 14:20:42 INFO StreamExecution: Starting [id =
>>> b4288c77-f554-4c98-bc63-5539db7d1869, runId =
>>> 3efa5ffc-230f-4819-a7d2-47ae0e6a8a43]. Use /tmp/kaf_tgt to store the
>>> query checkpoint.
>>> 17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
>>>         metric.reporters = []
>>>         metadata.max.age.ms = 300000
>>>         partition.assignment.strategy = [org.apache.kafka.clients.cons
>>> umer.RangeAssignor]
>>>         reconnect.backoff.ms = 50
>>>         sasl.kerberos.ticket.renew.window.factor = 0.8
>>>         max.partition.fetch.bytes = 1048576
>>>         bootstrap.servers = [localhost:9092]
>>>         ssl.keystore.type = JKS
>>>         enable.auto.commit = false
>>>         sasl.mechanism = GSSAPI
>>>         interceptor.classes = null
>>>         exclude.internal.topics = true
>>>         ssl.truststore.password = null
>>>         client.id =
>>>         ssl.endpoint.identification.algorithm = null
>>>         max.poll.records = 1
>>>         check.crcs = true
>>>         request.timeout.ms = 40000
>>>         heartbeat.interval.ms = 3000
>>>         auto.commit.interval.ms = 5000
>>>         receive.buffer.bytes = 65536
>>>         ssl.truststore.type = JKS
>>>         ssl.truststore.location = null
>>>         ssl.keystore.password = null
>>>         fetch.min.bytes = 1
>>>         send.buffer.bytes = 131072
>>>         value.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         group.id = spark-kafka-source-607b6303-bd
>>> a0-4e93-b6fd-d46d4a4d2e1f--19863691-driver-0
>>>         retry.backoff.ms = 100
>>>         sasl.kerberos.kinit.cmd = /usr/bin/kinit
>>>         sasl.kerberos.service.name = null
>>>         sasl.kerberos.ticket.renew.jitter = 0.05
>>>         ssl.trustmanager.algorithm = PKIX
>>>         ssl.key.password = null
>>>         fetch.max.wait.ms = 500
>>>         sasl.kerberos.min.time.before.relogin = 60000
>>>         connections.max.idle.ms = 540000
>>>         session.timeout.ms = 30000
>>>         metrics.num.samples = 2
>>>         key.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         ssl.protocol = TLS
>>>         ssl.provider = null
>>>         ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>>>         ssl.keystore.location = null
>>>         ssl.cipher.suites = null
>>>         security.protocol = PLAINTEXT
>>>         ssl.keymanager.algorithm = SunX509
>>>         metrics.sample.window.ms = 30000
>>>         auto.offset.reset = earliest
>>>
>>> 17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
>>>         metric.reporters = []
>>>         metadata.max.age.ms = 300000
>>>         partition.assignment.strategy = [org.apache.kafka.clients.cons
>>> umer.RangeAssignor]
>>>         reconnect.backoff.ms = 50
>>>         sasl.kerberos.ticket.renew.window.factor = 0.8
>>>         max.partition.fetch.bytes = 1048576
>>>         bootstrap.servers = [localhost:9092]
>>>         ssl.keystore.type = JKS
>>>         enable.auto.commit = false
>>>         sasl.mechanism = GSSAPI
>>>         interceptor.classes = null
>>>         exclude.internal.topics = true
>>>         ssl.truststore.password = null
>>>         client.id = consumer-2
>>>         ssl.endpoint.identification.algorithm = null
>>>         max.poll.records = 1
>>>         check.crcs = true
>>>         request.timeout.ms = 40000
>>>         heartbeat.interval.ms = 3000
>>>         auto.commit.interval.ms = 5000
>>>         receive.buffer.bytes = 65536
>>>         ssl.truststore.type = JKS
>>>         ssl.truststore.location = null
>>>         ssl.keystore.password = null
>>>         fetch.min.bytes = 1
>>>         send.buffer.bytes = 131072
>>>         value.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         group.id = spark-kafka-source-607b6303-bd
>>> a0-4e93-b6fd-d46d4a4d2e1f--19863691-driver-0
>>>         retry.backoff.ms = 100
>>>         sasl.kerberos.kinit.cmd = /usr/bin/kinit
>>>         sasl.kerberos.service.name = null
>>>         sasl.kerberos.ticket.renew.jitter = 0.05
>>>         ssl.trustmanager.algorithm = PKIX
>>>         ssl.key.password = null
>>>         fetch.max.wait.ms = 500
>>>         sasl.kerberos.min.time.before.relogin = 60000
>>>         connections.max.idle.ms = 540000
>>>         session.timeout.ms = 30000
>>>         metrics.num.samples = 2
>>>         key.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         ssl.protocol = TLS
>>>         ssl.provider = null
>>>         ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>>>         ssl.keystore.location = null
>>>         ssl.cipher.suites = null
>>>         security.protocol = PLAINTEXT
>>>         ssl.keymanager.algorithm = SunX509
>>>         metrics.sample.window.ms = 30000
>>>         auto.offset.reset = earliest
>>>
>>> 17/08/22 14:20:42 INFO AppInfoParser: Kafka version : 0.10.0.1
>>> 17/08/22 14:20:42 INFO AppInfoParser: Kafka commitId : a7a17cdec9eaa6c5
>>> 17/08/22 14:20:42 INFO AppInfoParser: Kafka version : 0.10.0.1
>>> 17/08/22 14:20:42 INFO AppInfoParser: Kafka commitId : a7a17cdec9eaa6c5
>>> 17/08/22 14:20:42 INFO StreamExecution: Starting new streaming query.
>>> 17/08/22 14:20:43 ERROR StreamExecution: Query [id =
>>> b4288c77-f554-4c98-bc63-5539db7d1869, runId =
>>> 3efa5ffc-230f-4819-a7d2-47ae0e6a8a43] terminated with error
>>> java.lang.AssertionError: assertion failed
>>>         at scala.Predef$.assert(Predef.scala:156)
>>>         at org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamP
>>> rogress(OffsetSeq.scala:38)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution.org
>>> $apache$spark$sql$execution$streaming$StreamExecution$$popul
>>> ateStartOffsets(StreamExecution.scala:429)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution$$an
>>> onfun$org$apache$spark$sql$execution$streaming$StreamExecuti
>>> on$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(
>>> StreamExecution.scala:297)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution$$an
>>> onfun$org$apache$spark$sql$execution$streaming$StreamExecuti
>>> on$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExe
>>> cution.scala:294)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution$$an
>>> onfun$org$apache$spark$sql$execution$streaming$StreamExecuti
>>> on$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExe
>>> cution.scala:294)
>>>         at org.apache.spark.sql.execution.streaming.ProgressReporter$cl
>>> ass.reportTimeTaken(ProgressReporter.scala:279)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution.rep
>>> ortTimeTaken(StreamExecution.scala:58)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution$$an
>>> onfun$org$apache$spark$sql$execution$streaming$StreamExecuti
>>> on$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
>>>         at org.apache.spark.sql.execution.streaming.ProcessingTimeExecu
>>> tor.execute(TriggerExecutor.scala:56)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution.org
>>> $apache$spark$sql$execution$streaming$StreamExecution$$runBa
>>> tches(StreamExecution.scala:290)
>>>         at org.apache.spark.sql.execution.streaming.StreamExecution$$an
>>> on$1.run(StreamExecution.scala:206)
>>> 17/08/22 14:20:43 INFO AbstractCoordinator: Discovered coordinator
>>> Sandbox.RHEL:9092 (id: 2147483647 rack: null) for
>>> group spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732
>>> 634835-driver-0.
>>> 17/08/22 14:20:43 INFO ConsumerCoordinator: Revoking previously assigned
>>> partitions [] for group spark-kafka-source-49941407-b4
>>> 15-4f5b-a584-5524a84e9066-1732634835-driver-0
>>> 17/08/22 14:20:43 INFO AbstractCoordinator: (Re-)joining group
>>> spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732
>>> 634835-driver-0
>>> 17/08/22 14:20:43 INFO AbstractCoordinator: Successfully joined group
>>> spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
>>> with generation 1
>>> 17/08/22 14:20:43 INFO ConsumerCoordinator: Setting newly assigned
>>> partitions [spark-stream-0] for group spark-kafka-source-49941407-b4
>>> 15-4f5b-a584-5524a84e9066-1732634835-driver-0
>>> 17/08/22 14:20:43 INFO KafkaSource: Initial offsets:
>>> {"spark-stream":{"0":15}}
>>> 17/08/22 14:20:44 INFO StreamExecution: Committed offsets for batch 0.
>>> Metadata OffsetSeqMetadata(0,1503391843742,Map(spark.sql.shuffle.partitions
>>> -> 200))
>>> 17/08/22 14:20:44 INFO KafkaSource: GetBatch called with start = None,
>>> end = {"spark-stream":{"0":15}}
>>> 17/08/22 14:20:44 INFO KafkaSource: Partitions added: Map()
>>> 17/08/22 14:20:44 INFO KafkaSource: GetBatch generating RDD of offset
>>> range: KafkaSourceRDDOffsetRange(spark-stream-0,15,15,None)
>>> -------------------------------------------
>>> Batch: 0
>>> -------------------------------------------
>>> 17/08/22 14:20:44 INFO CodeGenerator: Code generated in 26.485704 ms
>>> 17/08/22 14:20:44 INFO SparkContext: Starting job: start at
>>> NativeMethodAccessorImpl.java:0
>>> 17/08/22 14:20:44 INFO DAGScheduler: Got job 16 (start at
>>> NativeMethodAccessorImpl.java:0) with 1 output partitions
>>> 17/08/22 14:20:44 INFO DAGScheduler: Final stage: ResultStage 17 (start
>>> at NativeMethodAccessorImpl.java:0)
>>> 17/08/22 14:20:44 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:44 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:44 INFO DAGScheduler: Submitting ResultStage 17
>>> (MapPartitionsRDD[60] at start at NativeMethodAccessorImpl.java:0),
>>> which has no missing parents
>>> 17/08/22 14:20:44 INFO MemoryStore: Block broadcast_29 stored as values
>>> in memory (estimated size 8.9 KB, free 364.2 MB)
>>> 17/08/22 14:20:44 INFO MemoryStore: Block broadcast_29_piece0 stored as
>>> bytes in memory (estimated size 4.8 KB, free 364.2 MB)
>>> 17/08/22 14:20:44 INFO BlockManagerInfo: Added broadcast_29_piece0 in
>>> memory on 192.168.25.187:46647 (size: 4.8 KB, free: 366.3 MB)
>>> 17/08/22 14:20:44 INFO SparkContext: Created broadcast 29 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:44 INFO DAGScheduler: Submitting 1 missing tasks from
>>> ResultStage 17 (MapPartitionsRDD[60] at start at
>>> NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions
>>> Vector(0))
>>> 17/08/22 14:20:44 INFO TaskSchedulerImpl: Adding task set 17.0 with 1
>>> tasks
>>> 17/08/22 14:20:44 INFO TaskSetManager: Starting task 0.0 in stage 17.0
>>> (TID 33, localhost, executor driver, partition 0, PROCESS_LOCAL, 5007 bytes)
>>> 17/08/22 14:20:44 INFO Executor: Running task 0.0 in stage 17.0 (TID 33)
>>> 17/08/22 14:20:44 INFO ConsumerConfig: ConsumerConfig values:
>>>         metric.reporters = []
>>>         metadata.max.age.ms = 300000
>>>         partition.assignment.strategy = [org.apache.kafka.clients.cons
>>> umer.RangeAssignor]
>>>         reconnect.backoff.ms = 50
>>>         sasl.kerberos.ticket.renew.window.factor = 0.8
>>>         max.partition.fetch.bytes = 1048576
>>>         bootstrap.servers = [localhost:9092]
>>>         ssl.keystore.type = JKS
>>>         enable.auto.commit = false
>>>         sasl.mechanism = GSSAPI
>>>         interceptor.classes = null
>>>         exclude.internal.topics = true
>>>         ssl.truststore.password = null
>>>         client.id =
>>>         ssl.endpoint.identification.algorithm = null
>>>         max.poll.records = 2147483647
>>>         check.crcs = true
>>>         request.timeout.ms = 40000
>>>         heartbeat.interval.ms = 3000
>>>         auto.commit.interval.ms = 5000
>>>         receive.buffer.bytes = 65536
>>>         ssl.truststore.type = JKS
>>>         ssl.truststore.location = null
>>>         ssl.keystore.password = null
>>>         fetch.min.bytes = 1
>>>         send.buffer.bytes = 131072
>>>         value.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         group.id = spark-kafka-source-49941407-b4
>>> 15-4f5b-a584-5524a84e9066-1732634835-executor
>>>         retry.backoff.ms = 100
>>>         sasl.kerberos.kinit.cmd = /usr/bin/kinit
>>>         sasl.kerberos.service.name = null
>>>         sasl.kerberos.ticket.renew.jitter = 0.05
>>>         ssl.trustmanager.algorithm = PKIX
>>>         ssl.key.password = null
>>>         fetch.max.wait.ms = 500
>>>         sasl.kerberos.min.time.before.relogin = 60000
>>>         connections.max.idle.ms = 540000
>>>         session.timeout.ms = 30000
>>>         metrics.num.samples = 2
>>>         key.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         ssl.protocol = TLS
>>>         ssl.provider = null
>>>         ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>>>         ssl.keystore.location = null
>>>         ssl.cipher.suites = null
>>>         security.protocol = PLAINTEXT
>>>         ssl.keymanager.algorithm = SunX509
>>>         metrics.sample.window.ms = 30000
>>>         auto.offset.reset = none
>>>
>>> 17/08/22 14:20:44 INFO ConsumerConfig: ConsumerConfig values:
>>>         metric.reporters = []
>>>         metadata.max.age.ms = 300000
>>>         partition.assignment.strategy = [org.apache.kafka.clients.cons
>>> umer.RangeAssignor]
>>>         reconnect.backoff.ms = 50
>>>         sasl.kerberos.ticket.renew.window.factor = 0.8
>>>         max.partition.fetch.bytes = 1048576
>>>         bootstrap.servers = [localhost:9092]
>>>         ssl.keystore.type = JKS
>>>         enable.auto.commit = false
>>>         sasl.mechanism = GSSAPI
>>>         interceptor.classes = null
>>>         exclude.internal.topics = true
>>>         ssl.truststore.password = null
>>>         client.id = consumer-3
>>>         ssl.endpoint.identification.algorithm = null
>>>         max.poll.records = 2147483647
>>>         check.crcs = true
>>>         request.timeout.ms = 40000
>>>         heartbeat.interval.ms = 3000
>>>         auto.commit.interval.ms = 5000
>>>         receive.buffer.bytes = 65536
>>>         ssl.truststore.type = JKS
>>>         ssl.truststore.location = null
>>>         ssl.keystore.password = null
>>>         fetch.min.bytes = 1
>>>         send.buffer.bytes = 131072
>>>         value.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         group.id = spark-kafka-source-49941407-b4
>>> 15-4f5b-a584-5524a84e9066-1732634835-executor
>>>         retry.backoff.ms = 100
>>>         sasl.kerberos.kinit.cmd = /usr/bin/kinit
>>>         sasl.kerberos.service.name = null
>>>         sasl.kerberos.ticket.renew.jitter = 0.05
>>>         ssl.trustmanager.algorithm = PKIX
>>>         ssl.key.password = null
>>>         fetch.max.wait.ms = 500
>>>         sasl.kerberos.min.time.before.relogin = 60000
>>>         connections.max.idle.ms = 540000
>>>         session.timeout.ms = 30000
>>>         metrics.num.samples = 2
>>>         key.deserializer = class org.apache.kafka.common.serial
>>> ization.ByteArrayDeserializer
>>>         ssl.protocol = TLS
>>>         ssl.provider = null
>>>         ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
>>>         ssl.keystore.location = null
>>>         ssl.cipher.suites = null
>>>         security.protocol = PLAINTEXT
>>>         ssl.keymanager.algorithm = SunX509
>>>         metrics.sample.window.ms = 30000
>>>         auto.offset.reset = none
>>>
>>> 17/08/22 14:20:44 INFO AppInfoParser: Kafka version : 0.10.0.1
>>> 17/08/22 14:20:44 INFO AppInfoParser: Kafka commitId : a7a17cdec9eaa6c5
>>> 17/08/22 14:20:44 INFO KafkaSourceRDD: Beginning offset 15 is the same
>>> as ending offset skipping spark-stream 0
>>> 17/08/22 14:20:44 INFO CodeGenerator: Code generated in 27.911741 ms
>>> 17/08/22 14:20:44 INFO Executor: Finished task 0.0 in stage 17.0 (TID
>>> 33). 1117 bytes result sent to driver
>>> 17/08/22 14:20:44 INFO TaskSetManager: Finished task 0.0 in stage 17.0
>>> (TID 33) in 69 ms on localhost (executor driver) (1/1)
>>> 17/08/22 14:20:44 INFO TaskSchedulerImpl: Removed TaskSet 17.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:44 INFO DAGScheduler: ResultStage 17 (start at
>>> NativeMethodAccessorImpl.java:0) finished in 0.071 s
>>> 17/08/22 14:20:44 INFO DAGScheduler: Job 16 finished: start at
>>> NativeMethodAccessorImpl.java:0, took 0.087629 s
>>> 17/08/22 14:20:44 INFO SparkContext: Starting job: start at
>>> NativeMethodAccessorImpl.java:0
>>> 17/08/22 14:20:44 INFO DAGScheduler: Got job 17 (start at
>>> NativeMethodAccessorImpl.java:0) with 1 output partitions
>>> 17/08/22 14:20:44 INFO DAGScheduler: Final stage: ResultStage 18 (start
>>> at NativeMethodAccessorImpl.java:0)
>>> 17/08/22 14:20:44 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:44 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:44 INFO DAGScheduler: Submitting ResultStage 18
>>> (MapPartitionsRDD[64] at start at NativeMethodAccessorImpl.java:0),
>>> which has no missing parents
>>> 17/08/22 14:20:44 INFO MemoryStore: Block broadcast_30 stored as values
>>> in memory (estimated size 8.1 KB, free 364.2 MB)
>>> 17/08/22 14:20:44 INFO MemoryStore: Block broadcast_30_piece0 stored as
>>> bytes in memory (estimated size 4.4 KB, free 364.2 MB)
>>> 17/08/22 14:20:44 INFO BlockManagerInfo: Added broadcast_30_piece0 in
>>> memory on 192.168.25.187:46647 (size: 4.4 KB, free: 366.3 MB)
>>> 17/08/22 14:20:44 INFO SparkContext: Created broadcast 30 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:44 INFO DAGScheduler: Submitting 1 missing tasks from
>>> ResultStage 18 (MapPartitionsRDD[64] at start at
>>> NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions
>>> Vector(0))
>>> 17/08/22 14:20:44 INFO TaskSchedulerImpl: Adding task set 18.0 with 1
>>> tasks
>>> 17/08/22 14:20:44 INFO TaskSetManager: Starting task 0.0 in stage 18.0
>>> (TID 34, localhost, executor driver, partition 0, PROCESS_LOCAL, 4835 bytes)
>>> 17/08/22 14:20:44 INFO Executor: Running task 0.0 in stage 18.0 (TID 34)
>>> 17/08/22 14:20:44 INFO CodeGenerator: Code generated in 25.878504 ms
>>> 17/08/22 14:20:44 INFO Executor: Finished task 0.0 in stage 18.0 (TID
>>> 34). 999 bytes result sent to driver
>>> 17/08/22 14:20:44 INFO TaskSetManager: Finished task 0.0 in stage 18.0
>>> (TID 34) in 47 ms on localhost (executor driver) (1/1)
>>> 17/08/22 14:20:44 INFO TaskSchedulerImpl: Removed TaskSet 18.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:44 INFO DAGScheduler: ResultStage 18 (start at
>>> NativeMethodAccessorImpl.java:0) finished in 0.047 s
>>> 17/08/22 14:20:44 INFO DAGScheduler: Job 17 finished: start at
>>> NativeMethodAccessorImpl.java:0, took 0.057313 s
>>> 17/08/22 14:20:44 INFO SparkContext: Starting job: start at
>>> NativeMethodAccessorImpl.java:0
>>> 17/08/22 14:20:44 INFO DAGScheduler: Got job 18 (start at
>>> NativeMethodAccessorImpl.java:0) with 1 output partitions
>>> 17/08/22 14:20:44 INFO DAGScheduler: Final stage: ResultStage 19 (start
>>> at NativeMethodAccessorImpl.java:0)
>>> 17/08/22 14:20:44 INFO DAGScheduler: Parents of final stage: List()
>>> 17/08/22 14:20:44 INFO DAGScheduler: Missing parents: List()
>>> 17/08/22 14:20:44 INFO DAGScheduler: Submitting ResultStage 19
>>> (MapPartitionsRDD[64] at start at NativeMethodAccessorImpl.java:0),
>>> which has no missing parents
>>> 17/08/22 14:20:44 INFO MemoryStore: Block broadcast_31 stored as values
>>> in memory (estimated size 8.1 KB, free 364.2 MB)
>>> 17/08/22 14:20:44 INFO MemoryStore: Block broadcast_31_piece0 stored as
>>> bytes in memory (estimated size 4.4 KB, free 364.2 MB)
>>> 17/08/22 14:20:44 INFO BlockManagerInfo: Added broadcast_31_piece0 in
>>> memory on 192.168.25.187:46647 (size: 4.4 KB, free: 366.3 MB)
>>> 17/08/22 14:20:44 INFO SparkContext: Created broadcast 31 from broadcast
>>> at DAGScheduler.scala:1006
>>> 17/08/22 14:20:44 INFO DAGScheduler: Submitting 1 missing tasks from
>>> ResultStage 19 (MapPartitionsRDD[64] at start at
>>> NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions
>>> Vector(1))
>>> 17/08/22 14:20:44 INFO TaskSchedulerImpl: Adding task set 19.0 with 1
>>> tasks
>>> 17/08/22 14:20:44 INFO TaskSetManager: Starting task 0.0 in stage 19.0
>>> (TID 35, localhost, executor driver, partition 1, PROCESS_LOCAL, 4835 bytes)
>>> 17/08/22 14:20:44 INFO Executor: Running task 0.0 in stage 19.0 (TID 35)
>>> 17/08/22 14:20:44 INFO Executor: Finished task 0.0 in stage 19.0 (TID
>>> 35). 999 bytes result sent to driver
>>> 17/08/22 14:20:44 INFO TaskSetManager: Finished task 0.0 in stage 19.0
>>> (TID 35) in 15 ms on localhost (executor driver) (1/1)
>>> 17/08/22 14:20:44 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose
>>> tasks have all completed, from pool
>>> 17/08/22 14:20:44 INFO DAGScheduler: ResultStage 19 (start at
>>> NativeMethodAccessorImpl.java:0) finished in 0.015 s
>>> 17/08/22 14:20:44 INFO DAGScheduler: Job 18 finished: start at
>>> NativeMethodAccessorImpl.java:0, took 0.024716 s
>>> +---+----+
>>> | id|text|
>>> +---+----+
>>> +---+----+
>>>
>>> 17/08/22 14:20:45 INFO StreamExecution: Streaming query made progress: {
>>>   "id" : "db779c29-4790-4b60-b035-f5701f6cc8dd",
>>>   "runId" : "1996e5c6-aac3-45b2-871e-c2ba22c2da92",
>>>   "name" : null,
>>>   "timestamp" : "2017-08-22T08:50:42.983Z",
>>>   "numInputRows" : 0,
>>>   "processedRowsPerSecond" : 0.0,
>>>   "durationMs" : {
>>>     "addBatch" : 462,
>>>     "getBatch" : 85,
>>>     "getOffset" : 631,
>>>     "queryPlanning" : 201,
>>>     "triggerExecution" : 1891,
>>>     "walCommit" : 301
>>>   },
>>>   "stateOperators" : [ ],
>>>   "sources" : [ {
>>>     "description" : "KafkaSource[Subscribe[spark-stream]]",
>>>     "startOffset" : null,
>>>     "endOffset" : {
>>>       "spark-stream" : {
>>>         "0" : 15
>>>       }
>>>     },
>>>     "numInputRows" : 0,
>>>     "processedRowsPerSecond" : 0.0
>>>   } ],
>>>   "sink" : {
>>>     "description" : "org.apache.spark.sql.executio
>>> n.streaming.ConsoleSink@68e8a192"
>>>   }
>>> }
>>> 17/08/22 14:20:45 INFO StreamExecution: Streaming query made progress: {
>>>   "id" : "db779c29-4790-4b60-b035-f5701f6cc8dd",
>>>   "runId" : "1996e5c6-aac3-45b2-871e-c2ba22c2da92",
>>>   "name" : null,
>>>   "timestamp" : "2017-08-22T08:50:45.199Z",
>>>   "numInputRows" : 0,
>>>   "inputRowsPerSecond" : 0.0,
>>>   "processedRowsPerSecond" : 0.0,
>>>   "durationMs" : {
>>>     "getOffset" : 3,
>>>     "triggerExecution" : 3
>>>   },
>>>   "stateOperators" : [ ],
>>>   "sources" : [ {
>>>     "description" : "KafkaSource[Subscribe[spark-stream]]",
>>>     "startOffset" : {
>>>       "spark-stream" : {
>>>         "0" : 15
>>>       }
>>>     },
>>>     "endOffset" : {
>>>       "spark-stream" : {
>>>         "0" : 15
>>>       }
>>>     },
>>>     "numInputRows" : 0,
>>>     "inputRowsPerSecond" : 0.0,
>>>     "processedRowsPerSecond" : 0.0
>>>   } ],
>>>   "sink" : {
>>>     "description" : "org.apache.spark.sql.executio
>>> n.streaming.ConsoleSink@68e8a192"
>>>   }
>>> }
>>> Traceback (most recent call last):
>>>   File "/home/jatin/projects/app/kafka_ml.py", line 67, in <module>
>>>     kafka_write.awaitTermination()
>>>   File "/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/python
>>> /lib/pyspark.zip/pyspark/sql/streaming.py", line 106, in
>>> awaitTermination
>>>   File "/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/python
>>> /lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
>>>   File "/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/python
>>> /lib/pyspark.zip/pyspark/sql/utils.py", line 75, in deco
>>> pyspark.sql.utils.StreamingQueryException: u'assertion failed\n===
>>> Streaming Query ===\nIdentifier: [id = b4288c77-f554-4c98-bc63-5539db7d1869,
>>> runId = 3efa5ffc-230f-4819-a7d2-47ae0e6a8a43]\nCurrent Committed
>>> Offsets: {}\nCurrent Available Offsets: {}\n\nCurrent State: ACTIVE\nThread
>>> State: RUNNABLE\n\nLogical Plan:\nProject [cast(id#174 as string) AS
>>> key#326, concat(cast(text#175 as string), :, cast(prediction#224 as
>>> string)) AS value#327]\n+- Union\n   :- Project [id#174, text#175,
>>> prediction#224]\n   :  +- Project [id#174, text#175, words#179,
>>> relevant_words#184, rawFeatures#190, features#197, rawPrediction#205,
>>> probability#214, UDF(rawPrediction#205) AS prediction#224]\n   :     +-
>>> Project [id#174, text#175, words#179, relevant_words#184, rawFeatures#190,
>>> features#197, rawPrediction#205, UDF(rawPrediction#205) AS
>>> probability#214]\n   :        +- Project [id#174, text#175, words#179,
>>> relevant_words#184, rawFeatures#190, features#197, UDF(features#197) AS
>>> rawPrediction#205]\n   :           +- Project [id#174, text#175, words#179,
>>> relevant_words#184, rawFeatures#190, UDF(rawFeatures#190) AS
>>> features#197]\n   :              +- Project [id#174, text#175, words#179,
>>> relevant_words#184, UDF(relevant_words#184) AS rawFeatures#190]\n   :
>>>           +- Project [id#174, text#175, words#179, UDF(words#179) AS
>>> relevant_words#184]\n   :                    +- Project [id#174, text#175,
>>> UDF(text#175) AS words#179]\n   :                       +- Project [1 AS
>>> id#174, cast(value#160 as string) AS text#175]\n   :
>>>    +- StreamingExecutionRelation KafkaSource[Subscribe[spark-stream]],
>>> [key#159, value#160, topic#161, partition#162, offset#163L, timestamp#164,
>>> timestampType#165]\n   +- Project [id#174, text#175, prediction#290]\n
>>>  +- Project [id#174, text#175, words#245, relevant_words#250,
>>> rawFeatures#256, features#263, rawPrediction#271, probability#280,
>>> UDF(rawPrediction#271) AS prediction#290]\n         +- Project [id#174,
>>> text#175, words#245, relevant_words#250, rawFeatures#256, features#263,
>>> rawPrediction#271, UDF(rawPrediction#271) AS probability#280]\n
>>>  +- Project [id#174, text#175, words#245, relevant_words#250,
>>> rawFeatures#256, features#263, UDF(features#263) AS rawPrediction#271]\n
>>>             +- Project [id#174, text#175, words#245, relevant_words#250,
>>> rawFeatures#256, UDF(rawFeatures#256) AS features#263]\n
>>>  +- Project [id#174, text#175, words#245, relevant_words#250,
>>> UDF(relevant_words#250) AS rawFeatures#256]\n                     +-
>>> Project [id#174, text#175, words#245, UDF(words#245) AS
>>> relevant_words#250]\n                        +- Project [id#174, text#175,
>>> UDF(text#175) AS words#245]\n                           +- Project [1 AS
>>> id#174, cast(value#160 as string) AS text#175]\n
>>>    +- StreamingExecutionRelation KafkaSource[Subscribe[spark-stream]],
>>> [key#159, value#160, topic#161, partition#162, offset#163L, timestamp#164,
>>> timestampType#165]\n'
>>> 17/08/22 14:20:45 INFO SparkContext: Invoking stop() from shutdown hook
>>> 17/08/22 14:20:45 INFO SparkUI: Stopped Spark web UI at
>>> http://192.168.25.187:4040
>>> 17/08/22 14:20:45 INFO MapOutputTrackerMasterEndpoint:
>>> MapOutputTrackerMasterEndpoint stopped!
>>> 17/08/22 14:20:45 INFO MemoryStore: MemoryStore cleared
>>> 17/08/22 14:20:45 INFO BlockManager: BlockManager stopped
>>> 17/08/22 14:20:45 INFO BlockManagerMaster: BlockManagerMaster stopped
>>> 17/08/22 14:20:45 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
>>> OutputCommitCoordinator stopped!
>>> 17/08/22 14:20:45 INFO SparkContext: Successfully stopped SparkContext
>>> 17/08/22 14:20:45 INFO ShutdownHookManager: Shutdown hook called
>>> 17/08/22 14:20:45 INFO ShutdownHookManager: Deleting directory
>>> /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/pyspark-146b
>>> 573e-66f7-41af-bf89-dc2b5a426c75
>>> 17/08/22 14:20:45 INFO ShutdownHookManager: Deleting directory
>>> /tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7
>>> 17/08/22 14:20:45 INFO ShutdownHookManager: Deleting directory
>>> /tmp/temporary-5d7c55ec-4704-48ce-bf49-bb900f85a284
>>> jatin@Sandbox.RHEL:/home/jatin/projects/app#vi kafka_ml.py
>>> jatin@Sandbox.RHEL:/home/jatin/projects/app#
>>> jatin@Sandbox.RHEL:/home/jatin/projects/app#
>>>
>>>
>>
>
