spark-user mailing list archives

From upkar_kohli <upkar.ko...@gmail.com>
Subject Structured streaming (Kafka) error with Union
Date Tue, 22 Aug 2017 09:12:35 GMT
Hi,

I am doing a POC on ML using structured streaming.

The code below runs fine if I run the streaming query using
selected_lr.selectExpr (logistic regression only), but fails when I run
selected_all.selectExpr (the union of the naive Bayes and logistic
regression pipelines).

Could you please help me figure out what I am doing wrong?

Regards,
Upkar

**************************************************************************************************
Code Starts
----------------------------------------------------------------------------------------------

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression, NaiveBayes
from pyspark.ml.feature import HashingTF, Tokenizer, StopWordsRemover, CountVectorizer, IDF
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, col, lit

if __name__ == "__main__":
    spark = SparkSession\
        .builder\
        .appName("KafkaSentiment")\
        .getOrCreate()

    # Prepare training documents from a list of (id, text, label) tuples.
    training = spark.createDataFrame([
        (0, "a b c d e spark", 1.0),
        (1, "b d", 0.0),
        (2, "spark f g h", 1.0),
        (3, "hadoop mapreduce", 0.0)
    ], ["id", "text", "label"])

    # Configure two ML pipelines; each consists of five stages:
    # tokenizer, remover, hashingTF, idf, and a classifier (lr or nb).
    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    remover = StopWordsRemover(inputCol=tokenizer.getOutputCol(), outputCol="relevant_words")
    hashingTF = HashingTF(inputCol=remover.getOutputCol(), outputCol="rawFeatures")
    idf = IDF(inputCol="rawFeatures", outputCol="features")
    lr = LogisticRegression(maxIter=10, regParam=0.001)
    nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
    pipeline_lr = Pipeline(stages=[tokenizer, remover, hashingTF, idf, lr])
    pipeline_nb = Pipeline(stages=[tokenizer, remover, hashingTF, idf, nb])

    # Fit the pipeline to training documents.
    model_lr = pipeline_lr.fit(training)
    model_nb = pipeline_nb.fit(training)

    # Read the test documents from Kafka as a streaming DataFrame.
    testing_data = spark.readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "localhost:9092") \
        .option("subscribe", "spark-stream") \
        .load()
    test_ml = testing_data.selectExpr("1 as id", "CAST(value AS STRING) as text")

    # Make predictions on test documents and print columns of interest.
    prediction_lr = model_lr.transform(test_ml)
    prediction_nb = model_nb.transform(test_ml)
    selected_lr = prediction_lr.select("id", "text", "prediction")
    selected_nb = prediction_nb.select("id", "text", "prediction")
    print("printing schema")
    selected_lr.printSchema()
    selected_nb.printSchema()
    test_ml.printSchema()

    query = test_ml.writeStream.outputMode("append").format("console").start()

    selected_all = selected_lr.union(selected_nb)
    selected_all.printSchema()

    # Write key/value pairs back to Kafka; a checkpoint location is required.
    kafka_write = selected_all.selectExpr(
            "CAST(id AS STRING) as key",
            "CONCAT(CAST(text AS STRING), ':', CAST(prediction AS STRING)) as value") \
        .writeStream \
        .format("kafka") \
        .option("checkpointLocation", "/tmp/kaf_tgt") \
        .option("kafka.bootstrap.servers", "localhost:9092") \
        .option("topic", "topic_tgt") \
        .start()

    kafka_write.awaitTermination()

    query.awaitTermination()

Code Ends
----------------------------------------------------------------------------------------------
***********************************************************************************************
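In case it helps pin the problem down: the one variant I could think of reads the Kafka source once per pipeline, so the union is not a self-union of a single streaming DataFrame. This is purely a sketch (it reuses the broker, topic, and model names from the script above; whether it sidesteps the error is an assumption on my part, not something I have confirmed):

```python
def read_stream(spark):
    # Each call returns an independent streaming DataFrame
    # over the same Kafka topic used in the script above.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "spark-stream")
           .load())
    return raw.selectExpr("1 as id", "CAST(value AS STRING) as text")

# Separate source per pipeline, then union the two prediction streams.
test_lr = read_stream(spark)
test_nb = read_stream(spark)

prediction_lr = model_lr.transform(test_lr)
prediction_nb = model_nb.transform(test_nb)

selected_all = (prediction_lr.select("id", "text", "prediction")
                .union(prediction_nb.select("id", "text", "prediction")))
```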
jatin@Sandbox.RHEL:/home/jatin/projects/app# spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0 kafka_ml.py
Ivy Default Cache set to: /home/jatin/.ivy2/cache
The jars for the packages stored in: /home/jatin/.ivy2/jars
:: loading settings :: url =
jar:file:/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-sql-kafka-0-10_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found org.apache.spark#spark-sql-kafka-0-10_2.11;2.2.0 in central
        found org.apache.kafka#kafka-clients;0.10.0.1 in central
        found net.jpountz.lz4#lz4;1.3.0 in central
        found org.xerial.snappy#snappy-java;1.1.2.6 in central
        found org.slf4j#slf4j-api;1.7.16 in central
        found org.apache.spark#spark-tags_2.11;2.2.0 in central
        found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 9475ms :: artifacts dl 23ms
        :: modules in use:
        net.jpountz.lz4#lz4;1.3.0 from central in [default]
        org.apache.kafka#kafka-clients;0.10.0.1 from central in [default]
        org.apache.spark#spark-sql-kafka-0-10_2.11;2.2.0 from central in [default]
        org.apache.spark#spark-tags_2.11;2.2.0 from central in [default]
        org.slf4j#slf4j-api;1.7.16 from central in [default]
        org.spark-project.spark#unused;1.0.0 from central in [default]
        org.xerial.snappy#snappy-java;1.1.2.6 from central in [default]

        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   7   |   2   |   2   |   0   ||   7   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 7 already retrieved (0kB/15ms)
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
17/08/22 14:20:15 INFO SparkContext: Running Spark version 2.2.0
17/08/22 14:20:15 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
17/08/22 14:20:16 INFO SparkContext: Submitted application: KafkaSentiment
17/08/22 14:20:16 INFO SecurityManager: Changing view acls to: jatin
17/08/22 14:20:16 INFO SecurityManager: Changing modify acls to: jatin
17/08/22 14:20:16 INFO SecurityManager: Changing view acls groups to:
17/08/22 14:20:16 INFO SecurityManager: Changing modify acls groups to:
17/08/22 14:20:16 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: Set(jatin);
groups with view permissions: Set(); users  with modify permissions:
Set(jatin); groups with modify permissions: Set()
17/08/22 14:20:16 INFO Utils: Successfully started service 'sparkDriver' on
port 41554.
17/08/22 14:20:16 INFO SparkEnv: Registering MapOutputTracker
17/08/22 14:20:17 INFO SparkEnv: Registering BlockManagerMaster
17/08/22 14:20:17 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
17/08/22 14:20:17 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
17/08/22 14:20:17 INFO DiskBlockManager: Created local directory at
/tmp/blockmgr-41217e8a-3b1f-498c-a749-8c51564e0ebe
17/08/22 14:20:17 INFO MemoryStore: MemoryStore started with capacity 366.3
MB
17/08/22 14:20:17 INFO SparkEnv: Registering OutputCommitCoordinator
17/08/22 14:20:17 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
17/08/22 14:20:17 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://192.168.25.187:4040
17/08/22 14:20:17 INFO SparkContext: Added JAR
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
at spark://
192.168.25.187:41554/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
with timestamp 1503391817919
17/08/22 14:20:17 INFO SparkContext: Added JAR
file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar at
spark://
192.168.25.187:41554/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar with
timestamp 1503391817924
17/08/22 14:20:17 INFO SparkContext: Added JAR
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar at
spark://192.168.25.187:41554/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar
with timestamp 1503391817924
17/08/22 14:20:17 INFO SparkContext: Added JAR
file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at
spark://192.168.25.187:41554/jars/org.spark-project.spark_unused-1.0.0.jar
with timestamp 1503391817924
17/08/22 14:20:17 INFO SparkContext: Added JAR
file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar at spark://
192.168.25.187:41554/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp
1503391817924
17/08/22 14:20:17 INFO SparkContext: Added JAR
file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar at
spark://192.168.25.187:41554/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar
with timestamp 1503391817924
17/08/22 14:20:17 INFO SparkContext: Added JAR
file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar at spark://
192.168.25.187:41554/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp
1503391817925
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/projects/app/kafka_ml.py at
file:/home/jatin/projects/app/kafka_ml.py with timestamp 1503391818468
17/08/22 14:20:18 INFO Utils: Copying /home/jatin/projects/app/kafka_ml.py
to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/kafka_ml.py
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
at
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
with timestamp 1503391818514
17/08/22 14:20:18 INFO Utils: Copying
/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar at
file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
with timestamp 1503391818538
17/08/22 14:20:18 INFO Utils: Copying
/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar at
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar with
timestamp 1503391818567
17/08/22 14:20:18 INFO Utils: Copying
/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at
file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with
timestamp 1503391818573
17/08/22 14:20:18 INFO Utils: Copying
/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar at
file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp
1503391818581
17/08/22 14:20:18 INFO Utils: Copying
/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar at
file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with
timestamp 1503391818602
17/08/22 14:20:18 INFO Utils: Copying
/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar
17/08/22 14:20:18 INFO SparkContext: Added file
file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar at
file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp
1503391818631
17/08/22 14:20:18 INFO Utils: Copying
/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar
17/08/22 14:20:18 INFO Executor: Starting executor ID driver on host
localhost
17/08/22 14:20:18 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 46647.
17/08/22 14:20:18 INFO NettyBlockTransferService: Server created on
192.168.25.187:46647
17/08/22 14:20:18 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
17/08/22 14:20:18 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, 192.168.25.187, 46647, None)
17/08/22 14:20:18 INFO BlockManagerMasterEndpoint: Registering block
manager 192.168.25.187:46647 with 366.3 MB RAM, BlockManagerId(driver,
192.168.25.187, 46647, None)
17/08/22 14:20:18 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, 192.168.25.187, 46647, None)
17/08/22 14:20:18 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver, 192.168.25.187, 46647, None)
17/08/22 14:20:19 INFO SharedState: Setting hive.metastore.warehouse.dir
('null') to the value of spark.sql.warehouse.dir
('file:/home/jatin/projects/app/spark-warehouse/').
17/08/22 14:20:19 INFO SharedState: Warehouse path is
'file:/home/jatin/projects/app/spark-warehouse/'.
17/08/22 14:20:20 INFO StateStoreCoordinatorRef: Registered
StateStoreCoordinator endpoint
17/08/22 14:20:26 INFO CodeGenerator: Code generated in 504.736602 ms
17/08/22 14:20:27 INFO SparkContext: Starting job: treeAggregate at
IDF.scala:54
17/08/22 14:20:27 INFO DAGScheduler: Got job 0 (treeAggregate at
IDF.scala:54) with 2 output partitions
17/08/22 14:20:27 INFO DAGScheduler: Final stage: ResultStage 0
(treeAggregate at IDF.scala:54)
17/08/22 14:20:27 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:27 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:27 INFO DAGScheduler: Submitting ResultStage 0
(MapPartitionsRDD[10] at treeAggregate at IDF.scala:54), which has no
missing parents
17/08/22 14:20:27 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 26.0 KB, free 366.3 MB)
17/08/22 14:20:27 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 11.5 KB, free 366.3 MB)
17/08/22 14:20:27 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on 192.168.25.187:46647 (size: 11.5 KB, free: 366.3 MB)
17/08/22 14:20:27 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:27 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 0 (MapPartitionsRDD[10] at treeAggregate at IDF.scala:54)
(first 15 tasks are for partitions Vector(0, 1))
17/08/22 14:20:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
17/08/22 14:20:27 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:27 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
1, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:28 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/08/22 14:20:28 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar
with timestamp 1503391818538
17/08/22 14:20:28 INFO Utils:
/home/jatin/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar has been
previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar with
timestamp 1503391818573
17/08/22 14:20:28 INFO Utils:
/home/jatin/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar has been
previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
with timestamp 1503391818514
17/08/22 14:20:28 INFO Utils:
/home/jatin/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/projects/app/kafka_ml.py with timestamp 1503391818468
17/08/22 14:20:28 INFO Utils: /home/jatin/projects/app/kafka_ml.py has been
previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/kafka_ml.py
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp
1503391818631
17/08/22 14:20:28 INFO Utils:
/home/jatin/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar has been previously
copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp
1503391818581
17/08/22 14:20:28 INFO Utils:
/home/jatin/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar has been previously
copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with
timestamp 1503391818602
17/08/22 14:20:28 INFO Utils:
/home/jatin/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar has been
previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar
17/08/22 14:20:28 INFO Executor: Fetching
file:/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar with
timestamp 1503391818567
17/08/22 14:20:28 INFO Utils:
/home/jatin/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar has been
previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar
17/08/22 14:20:28 INFO Executor: Fetching spark://
192.168.25.187:41554/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp
1503391817925
17/08/22 14:20:28 INFO TransportClientFactory: Successfully created
connection to /192.168.25.187:41554 after 95 ms (0 ms spent in bootstraps)
17/08/22 14:20:28 INFO Utils: Fetching spark://
192.168.25.187:41554/jars/org.slf4j_slf4j-api-1.7.16.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp9222131133573862634.tmp
17/08/22 14:20:28 INFO Utils:
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp9222131133573862634.tmp
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar
17/08/22 14:20:28 INFO Executor: Adding
file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.slf4j_slf4j-api-1.7.16.jar
to class loader
17/08/22 14:20:28 INFO Executor: Fetching spark://
192.168.25.187:41554/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
with timestamp 1503391817919
17/08/22 14:20:28 INFO Utils: Fetching spark://
192.168.25.187:41554/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp3539601253503251859.tmp
17/08/22 14:20:28 INFO Utils:
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp3539601253503251859.tmp
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
17/08/22 14:20:28 INFO Executor: Adding
file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-sql-kafka-0-10_2.11-2.2.0.jar
to class loader
17/08/22 14:20:28 INFO Executor: Fetching spark://
192.168.25.187:41554/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar with
timestamp 1503391817924
17/08/22 14:20:28 INFO Utils: Fetching spark://
192.168.25.187:41554/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp6110236546923979105.tmp
17/08/22 14:20:28 INFO Utils:
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp6110236546923979105.tmp
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar
17/08/22 14:20:28 INFO Executor: Adding
file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.kafka_kafka-clients-0.10.0.1.jar
to class loader
17/08/22 14:20:28 INFO Executor: Fetching spark://
192.168.25.187:41554/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp
1503391817924
17/08/22 14:20:28 INFO Utils: Fetching spark://
192.168.25.187:41554/jars/net.jpountz.lz4_lz4-1.3.0.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp7615793935390489572.tmp
17/08/22 14:20:28 INFO Utils:
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp7615793935390489572.tmp
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar
17/08/22 14:20:28 INFO Executor: Adding
file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/net.jpountz.lz4_lz4-1.3.0.jar
to class loader
17/08/22 14:20:28 INFO Executor: Fetching spark://
192.168.25.187:41554/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with
timestamp 1503391817924
17/08/22 14:20:28 INFO Utils: Fetching spark://
192.168.25.187:41554/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp4037226777938816116.tmp
17/08/22 14:20:28 INFO Utils:
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp4037226777938816116.tmp
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar
17/08/22 14:20:28 INFO Executor: Adding
file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.xerial.snappy_snappy-java-1.1.2.6.jar
to class loader
17/08/22 14:20:28 INFO Executor: Fetching spark://
192.168.25.187:41554/jars/org.spark-project.spark_unused-1.0.0.jar with
timestamp 1503391817924
17/08/22 14:20:28 INFO Utils: Fetching spark://
192.168.25.187:41554/jars/org.spark-project.spark_unused-1.0.0.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp5870103019955438186.tmp
17/08/22 14:20:28 INFO Utils:
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp5870103019955438186.tmp
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar
17/08/22 14:20:28 INFO Executor: Adding
file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.spark-project.spark_unused-1.0.0.jar
to class loader
17/08/22 14:20:28 INFO Executor: Fetching spark://
192.168.25.187:41554/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar with
timestamp 1503391817924
17/08/22 14:20:28 INFO Utils: Fetching spark://
192.168.25.187:41554/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp344232108189163563.tmp
17/08/22 14:20:28 INFO Utils:
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/fetchFileTemp344232108189163563.tmp
has been previously copied to
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar
17/08/22 14:20:28 INFO Executor: Adding
file:/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/userFiles-33d4eaeb-d577-4250-a223-d7ea20c82f1b/org.apache.spark_spark-tags_2.11-2.2.0.jar
to class loader
17/08/22 14:20:29 INFO CodeGenerator: Code generated in 32.736524 ms
17/08/22 14:20:29 INFO CodeGenerator: Code generated in 25.33048 ms
17/08/22 14:20:31 INFO PythonRunner: Times: total = 436, boot = 401, init =
33, finish = 2
17/08/22 14:20:31 INFO PythonRunner: Times: total = 439, boot = 406, init =
32, finish = 1
17/08/22 14:20:31 INFO MemoryStore: Block taskresult_1 stored as bytes in
memory (estimated size 2.0 MB, free 364.3 MB)
17/08/22 14:20:31 INFO BlockManagerInfo: Added taskresult_1 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.3 MB)
17/08/22 14:20:31 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1).
2109385 bytes result sent via BlockManager)
17/08/22 14:20:31 INFO MemoryStore: Block taskresult_0 stored as bytes in
memory (estimated size 2.0 MB, free 362.2 MB)
17/08/22 14:20:31 INFO BlockManagerInfo: Added taskresult_0 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.3 MB)
17/08/22 14:20:31 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0).
2109385 bytes result sent via BlockManager)
17/08/22 14:20:31 INFO TransportClientFactory: Successfully created
connection to /192.168.25.187:46647 after 2 ms (0 ms spent in bootstraps)
17/08/22 14:20:31 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 3689 ms on localhost (executor driver) (1/2)
17/08/22 14:20:31 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID
1) in 3682 ms on localhost (executor driver) (2/2)
17/08/22 14:20:31 INFO BlockManagerInfo: Removed taskresult_0 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.3 MB)
17/08/22 14:20:31 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
17/08/22 14:20:31 INFO BlockManagerInfo: Removed taskresult_1 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.3 MB)
17/08/22 14:20:31 INFO DAGScheduler: ResultStage 0 (treeAggregate at
IDF.scala:54) finished in 3.776 s
17/08/22 14:20:31 INFO DAGScheduler: Job 0 finished: treeAggregate at
IDF.scala:54, took 4.418658 s
17/08/22 14:20:32 INFO CodeGenerator: Code generated in 173.848512 ms
17/08/22 14:20:32 INFO CodeGenerator: Code generated in 84.504745 ms
17/08/22 14:20:32 INFO Instrumentation:
LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1:
training: numPartitions=2 storageLevel=StorageLevel(disk, memory,
deserialized, 1 replicas)
17/08/22 14:20:32 INFO Instrumentation:
LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1:
{"elasticNetParam":0.0,"fitIntercept":true,"standardization":true,"threshold":0.5,"regParam":0.001,"tol":1.0E-6,"maxIter":10}
17/08/22 14:20:32 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:517
17/08/22 14:20:32 INFO DAGScheduler: Got job 1 (treeAggregate at
LogisticRegression.scala:517) with 2 output partitions
17/08/22 14:20:32 INFO DAGScheduler: Final stage: ResultStage 1
(treeAggregate at LogisticRegression.scala:517)
17/08/22 14:20:32 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:32 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:32 INFO DAGScheduler: Submitting ResultStage 1
(MapPartitionsRDD[20] at treeAggregate at LogisticRegression.scala:517),
which has no missing parents
17/08/22 14:20:32 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 2.0 MB, free 364.2 MB)
17/08/22 14:20:32 INFO MemoryStore: Block broadcast_1_piece0 stored as
bytes in memory (estimated size 24.3 KB, free 364.2 MB)
17/08/22 14:20:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
on 192.168.25.187:46647 (size: 24.3 KB, free: 366.3 MB)
17/08/22 14:20:33 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:33 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 1 (MapPartitionsRDD[20] at treeAggregate at
LogisticRegression.scala:517) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:33 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
17/08/22 14:20:33 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
2, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:33 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
3, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:33 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
17/08/22 14:20:33 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
17/08/22 14:20:33 INFO CodeGenerator: Code generated in 42.221867 ms
17/08/22 14:20:33 INFO PythonRunner: Times: total = 53, boot = -3759, init
= 3812, finish = 0
17/08/22 14:20:33 INFO MemoryStore: Block rdd_19_1 stored as values in
memory (estimated size 272.0 B, free 364.2 MB)
17/08/22 14:20:33 INFO PythonRunner: Times: total = 47, boot = -3782, init
= 3829, finish = 0
17/08/22 14:20:33 INFO MemoryStore: Block rdd_19_0 stored as values in
memory (estimated size 288.0 B, free 364.2 MB)
17/08/22 14:20:33 INFO BlockManagerInfo: Added rdd_19_1 in memory on
192.168.25.187:46647 (size: 272.0 B, free: 366.3 MB)
17/08/22 14:20:33 INFO BlockManagerInfo: Added rdd_19_0 in memory on
192.168.25.187:46647 (size: 288.0 B, free: 366.3 MB)
17/08/22 14:20:33 INFO ContextCleaner: Cleaned accumulator 1
17/08/22 14:20:33 INFO ContextCleaner: Cleaned accumulator 2
17/08/22 14:20:33 INFO BlockManagerInfo: Removed broadcast_0_piece0 on
192.168.25.187:46647 in memory (size: 11.5 KB, free: 366.3 MB)
17/08/22 14:20:33 INFO MemoryStore: Block taskresult_2 stored as bytes in
memory (estimated size 16.1 MB, free 348.2 MB)
17/08/22 14:20:33 INFO BlockManagerInfo: Added taskresult_2 in memory on
192.168.25.187:46647 (size: 16.1 MB, free: 350.2 MB)
17/08/22 14:20:33 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2).
16862200 bytes result sent via BlockManager)
17/08/22 14:20:33 INFO MemoryStore: Block taskresult_3 stored as bytes in
memory (estimated size 16.1 MB, free 332.1 MB)
17/08/22 14:20:33 INFO BlockManagerInfo: Added taskresult_3 in memory on
192.168.25.187:46647 (size: 16.1 MB, free: 334.1 MB)
17/08/22 14:20:33 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3).
16862200 bytes result sent via BlockManager)
17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_2 on
192.168.25.187:46647 in memory (size: 16.1 MB, free: 350.2 MB)
17/08/22 14:20:34 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
2) in 1056 ms on localhost (executor driver) (1/2)
17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_3 on
192.168.25.187:46647 in memory (size: 16.1 MB, free: 366.3 MB)
17/08/22 14:20:34 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID
3) in 1094 ms on localhost (executor driver) (2/2)
17/08/22 14:20:34 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
have all completed, from pool
17/08/22 14:20:34 INFO DAGScheduler: ResultStage 1 (treeAggregate at
LogisticRegression.scala:517) finished in 1.100 s
17/08/22 14:20:34 INFO DAGScheduler: Job 1 finished: treeAggregate at
LogisticRegression.scala:517, took 1.217516 s
17/08/22 14:20:34 INFO Instrumentation:
LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1:
{"numClasses":2}
17/08/22 14:20:34 INFO Instrumentation:
LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1:
{"numFeatures":262144}
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 2.0 MB, free 362.2 MB)
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_2_piece0 stored as
bytes in memory (estimated size 10.2 KB, free 362.2 MB)
17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory
on 192.168.25.187:46647 (size: 10.2 KB, free: 366.3 MB)
17/08/22 14:20:34 INFO SparkContext: Created broadcast 2 from broadcast at
LogisticRegression.scala:600
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_3 stored as values in
memory (estimated size 2.0 MB, free 360.2 MB)
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_3_piece0 stored as
bytes in memory (estimated size 10.1 KB, free 360.2 MB)
17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory
on 192.168.25.187:46647 (size: 10.1 KB, free: 366.3 MB)
17/08/22 14:20:34 INFO SparkContext: Created broadcast 3 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:34 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:34 INFO DAGScheduler: Got job 2 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:34 INFO DAGScheduler: Final stage: ResultStage 2
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:34 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:34 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:34 INFO DAGScheduler: Submitting ResultStage 2
(MapPartitionsRDD[21] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_4 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_4_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 358.2 MB)
17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory
on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:34 INFO SparkContext: Created broadcast 4 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:34 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 2 (MapPartitionsRDD[21] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:34 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
17/08/22 14:20:34 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
4, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:34 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID
5, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:34 INFO Executor: Running task 0.0 in stage 2.0 (TID 4)
17/08/22 14:20:34 INFO Executor: Running task 1.0 in stage 2.0 (TID 5)
17/08/22 14:20:34 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:34 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:34 INFO MemoryStore: Block taskresult_5 stored as bytes in
memory (estimated size 2.0 MB, free 356.2 MB)
17/08/22 14:20:34 INFO BlockManagerInfo: Added taskresult_5 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:34 INFO MemoryStore: Block taskresult_4 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:34 INFO Executor: Finished task 1.0 in stage 2.0 (TID 5).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:34 INFO BlockManagerInfo: Added taskresult_4 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:34 INFO Executor: Finished task 0.0 in stage 2.0 (TID 4).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_5 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:34 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID
5) in 124 ms on localhost (executor driver) (1/2)
17/08/22 14:20:34 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID
4) in 147 ms on localhost (executor driver) (2/2)
17/08/22 14:20:34 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
17/08/22 14:20:34 INFO BlockManagerInfo: Removed taskresult_4 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:34 INFO DAGScheduler: ResultStage 2 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.151 s
17/08/22 14:20:34 INFO DAGScheduler: Job 2 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.210300 s
17/08/22 14:20:34 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
17/08/22 14:20:34 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS
17/08/22 14:20:34 INFO TorrentBroadcast: Destroying Broadcast(3) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:34 INFO BlockManagerInfo: Removed broadcast_3_piece0 on
192.168.25.187:46647 in memory (size: 10.1 KB, free: 366.2 MB)
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_5 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_5_piece0 stored as
bytes in memory (estimated size 10.2 KB, free 358.2 MB)
17/08/22 14:20:34 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory
on 192.168.25.187:46647 (size: 10.2 KB, free: 366.2 MB)
17/08/22 14:20:34 INFO SparkContext: Created broadcast 5 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:34 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:34 INFO DAGScheduler: Got job 3 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:34 INFO DAGScheduler: Final stage: ResultStage 3
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:34 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:34 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:34 INFO DAGScheduler: Submitting ResultStage 3
(MapPartitionsRDD[22] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_6 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:34 INFO MemoryStore: Block broadcast_6_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 356.1 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory
on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:35 INFO SparkContext: Created broadcast 6 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:35 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 3 (MapPartitionsRDD[22] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:35 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
17/08/22 14:20:35 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
6, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:35 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID
7, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:35 INFO Executor: Running task 0.0 in stage 3.0 (TID 6)
17/08/22 14:20:35 INFO Executor: Running task 1.0 in stage 3.0 (TID 7)
17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:35 INFO MemoryStore: Block taskresult_6 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_6 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:35 INFO MemoryStore: Block taskresult_7 stored as bytes in
memory (estimated size 2.0 MB, free 352.1 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_7 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:35 INFO Executor: Finished task 0.0 in stage 3.0 (TID 6).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:35 INFO Executor: Finished task 1.0 in stage 3.0 (TID 7).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_7 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:35 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID
7) in 177 ms on localhost (executor driver) (1/2)
17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_4_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 364.2 MB)
17/08/22 14:20:35 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
6) in 223 ms on localhost (executor driver) (2/2)
17/08/22 14:20:35 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool
17/08/22 14:20:35 INFO DAGScheduler: ResultStage 3 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.225 s
17/08/22 14:20:35 INFO DAGScheduler: Job 3 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.255439 s
17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_6 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:35 INFO TorrentBroadcast: Destroying Broadcast(5) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_5_piece0 on
192.168.25.187:46647 in memory (size: 10.2 KB, free: 366.2 MB)
17/08/22 14:20:35 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:35 INFO LBFGS: Val and Grad Norm: 0.317022 (rel: 0.543)
0.358320
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_7_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 358.2 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory
on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:35 INFO SparkContext: Created broadcast 7 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:35 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:35 INFO DAGScheduler: Got job 4 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:35 INFO DAGScheduler: Final stage: ResultStage 4
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:35 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:35 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:35 INFO DAGScheduler: Submitting ResultStage 4
(MapPartitionsRDD[23] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_8 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_8_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 356.1 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory
on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:35 INFO SparkContext: Created broadcast 8 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:35 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 4 (MapPartitionsRDD[23] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:35 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
17/08/22 14:20:35 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID
8, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:35 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID
9, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:35 INFO Executor: Running task 0.0 in stage 4.0 (TID 8)
17/08/22 14:20:35 INFO Executor: Running task 1.0 in stage 4.0 (TID 9)
17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:35 INFO MemoryStore: Block taskresult_9 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:35 INFO MemoryStore: Block taskresult_8 stored as bytes in
memory (estimated size 2.0 MB, free 352.1 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_9 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:35 INFO Executor: Finished task 1.0 in stage 4.0 (TID 9).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_9 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:35 INFO TaskSetManager: Finished task 1.0 in stage 4.0 (TID
9) in 103 ms on localhost (executor driver) (1/2)
17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_8 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:35 INFO Executor: Finished task 0.0 in stage 4.0 (TID 8).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:35 INFO BlockManagerInfo: Removed taskresult_8 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:35 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID
8) in 136 ms on localhost (executor driver) (2/2)
17/08/22 14:20:35 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks
have all completed, from pool
17/08/22 14:20:35 INFO DAGScheduler: ResultStage 4 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.140 s
17/08/22 14:20:35 INFO DAGScheduler: Job 4 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.182699 s
17/08/22 14:20:35 INFO TorrentBroadcast: Destroying Broadcast(7) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_7_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:35 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:35 INFO LBFGS: Val and Grad Norm: 0.145089 (rel: 0.542)
0.164387
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_9 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_9_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 356.1 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory
on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:35 INFO SparkContext: Created broadcast 9 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:35 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:35 INFO DAGScheduler: Got job 5 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:35 INFO DAGScheduler: Final stage: ResultStage 5
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:35 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:35 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:35 INFO DAGScheduler: Submitting ResultStage 5
(MapPartitionsRDD[24] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_10 stored as values in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:35 INFO MemoryStore: Block broadcast_10_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 354.1 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added broadcast_10_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:35 INFO SparkContext: Created broadcast 10 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:35 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 5 (MapPartitionsRDD[24] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:35 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
17/08/22 14:20:35 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID
10, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:35 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID
11, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:35 INFO Executor: Running task 0.0 in stage 5.0 (TID 10)
17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:35 INFO Executor: Running task 1.0 in stage 5.0 (TID 11)
17/08/22 14:20:35 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:35 INFO MemoryStore: Block taskresult_10 stored as bytes in
memory (estimated size 2.0 MB, free 352.0 MB)
17/08/22 14:20:35 INFO MemoryStore: Block taskresult_11 stored as bytes in
memory (estimated size 2.0 MB, free 350.0 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_10 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:35 INFO BlockManagerInfo: Added taskresult_11 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:35 INFO Executor: Finished task 1.0 in stage 5.0 (TID 11).
2110611 bytes result sent via BlockManager)
17/08/22 14:20:35 INFO BlockManagerInfo: Removed broadcast_6_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 362.2 MB)
17/08/22 14:20:35 INFO Executor: Finished task 0.0 in stage 5.0 (TID 10).
2110611 bytes result sent via BlockManager)
17/08/22 14:20:36 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID
11) in 197 ms on localhost (executor driver) (1/2)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_11 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:36 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID
10) in 217 ms on localhost (executor driver) (2/2)
17/08/22 14:20:36 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks
have all completed, from pool
17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_10 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:36 INFO DAGScheduler: ResultStage 5 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.236 s
17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_8_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:36 INFO DAGScheduler: Job 5 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.282317 s
17/08/22 14:20:36 INFO TorrentBroadcast: Destroying Broadcast(9) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:36 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:36 INFO LBFGS: Val and Grad Norm: 0.0635321 (rel: 0.562)
0.0740437
17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_9_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_11 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_11_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 358.2 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_11_piece0 in
memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:36 INFO SparkContext: Created broadcast 11 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:36 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:36 INFO DAGScheduler: Got job 6 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:36 INFO DAGScheduler: Final stage: ResultStage 6
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:36 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:36 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:36 INFO DAGScheduler: Submitting ResultStage 6
(MapPartitionsRDD[25] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_12 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_12_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 356.1 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_12_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:36 INFO SparkContext: Created broadcast 12 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:36 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 6 (MapPartitionsRDD[25] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:36 INFO TaskSchedulerImpl: Adding task set 6.0 with 2 tasks
17/08/22 14:20:36 INFO TaskSetManager: Starting task 0.0 in stage 6.0 (TID
12, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:36 INFO TaskSetManager: Starting task 1.0 in stage 6.0 (TID
13, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:36 INFO Executor: Running task 0.0 in stage 6.0 (TID 12)
17/08/22 14:20:36 INFO Executor: Running task 1.0 in stage 6.0 (TID 13)
17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:36 INFO MemoryStore: Block taskresult_13 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_13 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:36 INFO MemoryStore: Block taskresult_12 stored as bytes in
memory (estimated size 2.0 MB, free 352.1 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_12 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:36 INFO Executor: Finished task 0.0 in stage 6.0 (TID 12).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:36 INFO TaskSetManager: Finished task 0.0 in stage 6.0 (TID
12) in 79 ms on localhost (executor driver) (1/2)
17/08/22 14:20:36 INFO Executor: Finished task 1.0 in stage 6.0 (TID 13).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_12 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:36 INFO TaskSetManager: Finished task 1.0 in stage 6.0 (TID
13) in 102 ms on localhost (executor driver) (2/2)
17/08/22 14:20:36 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks
have all completed, from pool
17/08/22 14:20:36 INFO DAGScheduler: ResultStage 6 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.106 s
17/08/22 14:20:36 INFO DAGScheduler: Job 6 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.143539 s
17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_13 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:36 INFO TorrentBroadcast: Destroying Broadcast(11) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_11_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:36 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:36 INFO LBFGS: Val and Grad Norm: 0.0334526 (rel: 0.473)
0.0350590
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_13 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_13_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 356.1 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_13_piece0 in
memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:36 INFO SparkContext: Created broadcast 13 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:36 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:36 INFO DAGScheduler: Got job 7 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:36 INFO DAGScheduler: Final stage: ResultStage 7
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:36 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:36 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:36 INFO DAGScheduler: Submitting ResultStage 7
(MapPartitionsRDD[26] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_14 stored as values in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_14_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 354.1 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_14_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:36 INFO SparkContext: Created broadcast 14 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:36 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 7 (MapPartitionsRDD[26] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:36 INFO TaskSchedulerImpl: Adding task set 7.0 with 2 tasks
17/08/22 14:20:36 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID
14, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:36 INFO TaskSetManager: Starting task 1.0 in stage 7.0 (TID
15, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:36 INFO Executor: Running task 0.0 in stage 7.0 (TID 14)
17/08/22 14:20:36 INFO Executor: Running task 1.0 in stage 7.0 (TID 15)
17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:36 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:36 INFO MemoryStore: Block taskresult_15 stored as bytes in
memory (estimated size 2.0 MB, free 352.0 MB)
17/08/22 14:20:36 INFO MemoryStore: Block taskresult_14 stored as bytes in
memory (estimated size 2.0 MB, free 350.0 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_15 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:36 INFO Executor: Finished task 1.0 in stage 7.0 (TID 15).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:36 INFO BlockManagerInfo: Added taskresult_14 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_10_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 362.2 MB)
17/08/22 14:20:36 INFO Executor: Finished task 0.0 in stage 7.0 (TID 14).
2110611 bytes result sent via BlockManager)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_15 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:36 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID
15) in 387 ms on localhost (executor driver) (1/2)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
192.168.25.187:46647 in memory (size: 24.3 KB, free: 364.2 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed taskresult_14 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_12_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.3 MB)
17/08/22 14:20:36 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID
14) in 416 ms on localhost (executor driver) (2/2)
17/08/22 14:20:36 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks
have all completed, from pool
17/08/22 14:20:36 INFO DAGScheduler: ResultStage 7 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.418 s
17/08/22 14:20:36 INFO DAGScheduler: Job 7 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.459555 s
17/08/22 14:20:36 INFO TorrentBroadcast: Destroying Broadcast(13) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:36 INFO BlockManagerInfo: Removed broadcast_13_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.3 MB)
17/08/22 14:20:36 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:36 INFO LBFGS: Val and Grad Norm: 0.0200470 (rel: 0.401)
0.0164153
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_15 stored as values in
memory (estimated size 2.0 MB, free 360.2 MB)
17/08/22 14:20:36 INFO MemoryStore: Block broadcast_15_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 360.2 MB)
17/08/22 14:20:36 INFO BlockManagerInfo: Added broadcast_15_piece0 in
memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.3 MB)
17/08/22 14:20:36 INFO SparkContext: Created broadcast 15 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:37 INFO DAGScheduler: Got job 8 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 8
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 8
(MapPartitionsRDD[27] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_16 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_16_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 358.2 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_16_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO SparkContext: Created broadcast 16 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:37 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 8 (MapPartitionsRDD[27] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:37 INFO TaskSchedulerImpl: Adding task set 8.0 with 2 tasks
17/08/22 14:20:37 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID
16, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:37 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID
17, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:37 INFO Executor: Running task 0.0 in stage 8.0 (TID 16)
17/08/22 14:20:37 INFO Executor: Running task 1.0 in stage 8.0 (TID 17)
17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:37 INFO MemoryStore: Block taskresult_16 stored as bytes in
memory (estimated size 2.0 MB, free 356.2 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_16 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:37 INFO Executor: Finished task 0.0 in stage 8.0 (TID 16).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:37 INFO MemoryStore: Block taskresult_17 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_17 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_16 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:37 INFO Executor: Finished task 1.0 in stage 8.0 (TID 17).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:37 INFO TaskSetManager: Finished task 0.0 in stage 8.0 (TID
16) in 67 ms on localhost (executor driver) (1/2)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_17 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:37 INFO TaskSetManager: Finished task 1.0 in stage 8.0 (TID
17) in 110 ms on localhost (executor driver) (2/2)
17/08/22 14:20:37 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks
have all completed, from pool
17/08/22 14:20:37 INFO DAGScheduler: ResultStage 8 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.118 s
17/08/22 14:20:37 INFO DAGScheduler: Job 8 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.149447 s
17/08/22 14:20:37 INFO TorrentBroadcast: Destroying Broadcast(15) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_15_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:37 INFO LBFGS: Val and Grad Norm: 0.0146788 (rel: 0.268)
0.00726759
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_17 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_17_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 358.2 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_17_piece0 in
memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO SparkContext: Created broadcast 17 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:37 INFO DAGScheduler: Got job 9 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 9
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 9
(MapPartitionsRDD[28] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_18 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_18_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 356.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_18_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO SparkContext: Created broadcast 18 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:37 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 9 (MapPartitionsRDD[28] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:37 INFO TaskSchedulerImpl: Adding task set 9.0 with 2 tasks
17/08/22 14:20:37 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID
18, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:37 INFO TaskSetManager: Starting task 1.0 in stage 9.0 (TID
19, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:37 INFO Executor: Running task 0.0 in stage 9.0 (TID 18)
17/08/22 14:20:37 INFO Executor: Running task 1.0 in stage 9.0 (TID 19)
17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:37 INFO MemoryStore: Block taskresult_19 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_19 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_16_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 364.2 MB)
17/08/22 14:20:37 INFO Executor: Finished task 1.0 in stage 9.0 (TID 19).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:37 INFO MemoryStore: Block taskresult_18 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_18 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_19 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:37 INFO TaskSetManager: Finished task 1.0 in stage 9.0 (TID
19) in 197 ms on localhost (executor driver) (1/2)
17/08/22 14:20:37 INFO Executor: Finished task 0.0 in stage 9.0 (TID 18).
2110611 bytes result sent via BlockManager)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_18 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:37 INFO TaskSetManager: Finished task 0.0 in stage 9.0 (TID
18) in 233 ms on localhost (executor driver) (2/2)
17/08/22 14:20:37 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks
have all completed, from pool
17/08/22 14:20:37 INFO DAGScheduler: ResultStage 9 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.236 s
17/08/22 14:20:37 INFO DAGScheduler: Job 9 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.274679 s
17/08/22 14:20:37 INFO TorrentBroadcast: Destroying Broadcast(17) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_17_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:37 INFO LBFGS: Val and Grad Norm: 0.0127666 (rel: 0.130)
0.00298415
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_19 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_19_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 358.2 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_19_piece0 in
memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO SparkContext: Created broadcast 19 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:37 INFO DAGScheduler: Got job 10 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 10
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 10
(MapPartitionsRDD[29] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_20 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_20_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 356.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_20_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO SparkContext: Created broadcast 20 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:37 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 10 (MapPartitionsRDD[29] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:37 INFO TaskSchedulerImpl: Adding task set 10.0 with 2 tasks
17/08/22 14:20:37 INFO TaskSetManager: Starting task 0.0 in stage 10.0 (TID
20, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:37 INFO TaskSetManager: Starting task 1.0 in stage 10.0 (TID
21, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:37 INFO Executor: Running task 0.0 in stage 10.0 (TID 20)
17/08/22 14:20:37 INFO Executor: Running task 1.0 in stage 10.0 (TID 21)
17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:37 INFO MemoryStore: Block taskresult_21 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_21 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:37 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:37 INFO MemoryStore: Block taskresult_20 stored as bytes in
memory (estimated size 2.0 MB, free 352.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added taskresult_20 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:37 INFO Executor: Finished task 1.0 in stage 10.0 (TID 21).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:37 INFO Executor: Finished task 0.0 in stage 10.0 (TID 20).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_21 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:37 INFO TaskSetManager: Finished task 1.0 in stage 10.0 (TID
21) in 95 ms on localhost (executor driver) (1/2)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed taskresult_20 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:37 INFO TaskSetManager: Finished task 0.0 in stage 10.0 (TID
20) in 115 ms on localhost (executor driver) (2/2)
17/08/22 14:20:37 INFO TaskSchedulerImpl: Removed TaskSet 10.0, whose tasks
have all completed, from pool
17/08/22 14:20:37 INFO DAGScheduler: ResultStage 10 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.118 s
17/08/22 14:20:37 INFO DAGScheduler: Job 10 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.143837 s
17/08/22 14:20:37 INFO TorrentBroadcast: Destroying Broadcast(19) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:37 INFO BlockManagerInfo: Removed broadcast_19_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:37 INFO LBFGS: Val and Grad Norm: 0.0122267 (rel: 0.0423)
0.00119244
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_21 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_21_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 356.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_21_piece0 in
memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:37 INFO SparkContext: Created broadcast 21 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:37 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:37 INFO DAGScheduler: Got job 11 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:37 INFO DAGScheduler: Final stage: ResultStage 11
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:37 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:37 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:37 INFO DAGScheduler: Submitting ResultStage 11
(MapPartitionsRDD[30] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_22 stored as values in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:37 INFO MemoryStore: Block broadcast_22_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 354.1 MB)
17/08/22 14:20:37 INFO BlockManagerInfo: Added broadcast_22_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:38 INFO SparkContext: Created broadcast 22 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:38 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 11 (MapPartitionsRDD[30] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:38 INFO TaskSchedulerImpl: Adding task set 11.0 with 2 tasks
17/08/22 14:20:38 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID
22, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:38 INFO TaskSetManager: Starting task 1.0 in stage 11.0 (TID
23, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:38 INFO Executor: Running task 0.0 in stage 11.0 (TID 22)
17/08/22 14:20:38 INFO Executor: Running task 1.0 in stage 11.0 (TID 23)
17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:38 INFO MemoryStore: Block taskresult_23 stored as bytes in
memory (estimated size 2.0 MB, free 352.0 MB)
17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_23 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:38 INFO Executor: Finished task 1.0 in stage 11.0 (TID 23).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:38 INFO TaskSetManager: Finished task 1.0 in stage 11.0 (TID
23) in 85 ms on localhost (executor driver) (1/2)
17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_18_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 364.2 MB)
17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_23 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_20_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:38 INFO MemoryStore: Block taskresult_22 stored as bytes in
memory (estimated size 2.0 MB, free 356.2 MB)
17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_22 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:38 INFO Executor: Finished task 0.0 in stage 11.0 (TID 22).
2110611 bytes result sent via BlockManager)
17/08/22 14:20:38 INFO TaskSetManager: Finished task 0.0 in stage 11.0 (TID
22) in 118 ms on localhost (executor driver) (2/2)
17/08/22 14:20:38 INFO TaskSchedulerImpl: Removed TaskSet 11.0, whose tasks
have all completed, from pool
17/08/22 14:20:38 INFO DAGScheduler: ResultStage 11 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.122 s
17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_22 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:38 INFO DAGScheduler: Job 11 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.143423 s
17/08/22 14:20:38 INFO TorrentBroadcast: Destroying Broadcast(21) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_21_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:38 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:38 INFO LBFGS: Val and Grad Norm: 0.0120434 (rel: 0.0150)
0.000857606
17/08/22 14:20:38 INFO MemoryStore: Block broadcast_23 stored as values in
memory (estimated size 2.0 MB, free 358.2 MB)
17/08/22 14:20:38 INFO MemoryStore: Block broadcast_23_piece0 stored as
bytes in memory (estimated size 10.3 KB, free 358.2 MB)
17/08/22 14:20:38 INFO BlockManagerInfo: Added broadcast_23_piece0 in
memory on 192.168.25.187:46647 (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:38 INFO SparkContext: Created broadcast 23 from broadcast at
LogisticRegression.scala:1879
17/08/22 14:20:38 INFO SparkContext: Starting job: treeAggregate at
LogisticRegression.scala:1892
17/08/22 14:20:38 INFO DAGScheduler: Got job 12 (treeAggregate at
LogisticRegression.scala:1892) with 2 output partitions
17/08/22 14:20:38 INFO DAGScheduler: Final stage: ResultStage 12
(treeAggregate at LogisticRegression.scala:1892)
17/08/22 14:20:38 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:38 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:38 INFO DAGScheduler: Submitting ResultStage 12
(MapPartitionsRDD[31] at treeAggregate at LogisticRegression.scala:1892),
which has no missing parents
17/08/22 14:20:38 INFO MemoryStore: Block broadcast_24 stored as values in
memory (estimated size 2.0 MB, free 356.1 MB)
17/08/22 14:20:38 INFO MemoryStore: Block broadcast_24_piece0 stored as
bytes in memory (estimated size 24.6 KB, free 356.1 MB)
17/08/22 14:20:38 INFO BlockManagerInfo: Added broadcast_24_piece0 in
memory on 192.168.25.187:46647 (size: 24.6 KB, free: 366.2 MB)
17/08/22 14:20:38 INFO SparkContext: Created broadcast 24 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:38 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 12 (MapPartitionsRDD[31] at treeAggregate at
LogisticRegression.scala:1892) (first 15 tasks are for partitions Vector(0,
1))
17/08/22 14:20:38 INFO TaskSchedulerImpl: Adding task set 12.0 with 2 tasks
17/08/22 14:20:38 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID
24, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:38 INFO TaskSetManager: Starting task 1.0 in stage 12.0 (TID
25, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:38 INFO Executor: Running task 0.0 in stage 12.0 (TID 24)
17/08/22 14:20:38 INFO Executor: Running task 1.0 in stage 12.0 (TID 25)
17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_0 locally
17/08/22 14:20:38 INFO MemoryStore: Block taskresult_24 stored as bytes in
memory (estimated size 2.0 MB, free 354.1 MB)
17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_24 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:38 INFO Executor: Finished task 0.0 in stage 12.0 (TID 24).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:38 INFO BlockManager: Found block rdd_19_1 locally
17/08/22 14:20:38 INFO MemoryStore: Block taskresult_25 stored as bytes in
memory (estimated size 2.0 MB, free 352.1 MB)
17/08/22 14:20:38 INFO TaskSetManager: Finished task 0.0 in stage 12.0 (TID
24) in 57 ms on localhost (executor driver) (1/2)
17/08/22 14:20:38 INFO BlockManagerInfo: Added taskresult_25 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:38 INFO Executor: Finished task 1.0 in stage 12.0 (TID 25).
2110568 bytes result sent via BlockManager)
17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_24 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.2 MB)
17/08/22 14:20:38 INFO TaskSetManager: Finished task 1.0 in stage 12.0 (TID
25) in 94 ms on localhost (executor driver) (2/2)
17/08/22 14:20:38 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks
have all completed, from pool
17/08/22 14:20:38 INFO BlockManagerInfo: Removed taskresult_25 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.2 MB)
17/08/22 14:20:38 INFO DAGScheduler: ResultStage 12 (treeAggregate at
LogisticRegression.scala:1892) finished in 0.098 s
17/08/22 14:20:38 INFO DAGScheduler: Job 12 finished: treeAggregate at
LogisticRegression.scala:1892, took 0.125965 s
17/08/22 14:20:38 INFO TorrentBroadcast: Destroying Broadcast(23) (from
destroy at LogisticRegression.scala:1933)
17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_23_piece0 on
192.168.25.187:46647 in memory (size: 10.3 KB, free: 366.2 MB)
17/08/22 14:20:38 INFO LBFGS: Step Size: 1.000
17/08/22 14:20:38 INFO LBFGS: Val and Grad Norm: 0.0118476 (rel: 0.0163)
0.00102213
17/08/22 14:20:38 INFO LBFGS: Converged because max iterations reached
17/08/22 14:20:38 INFO TorrentBroadcast: Destroying Broadcast(2) (from
destroy at LogisticRegression.scala:796)
17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
192.168.25.187:46647 in memory (size: 10.2 KB, free: 366.2 MB)
17/08/22 14:20:38 INFO MapPartitionsRDD: Removing RDD 19 from persistence
list
17/08/22 14:20:38 INFO BlockManager: Removing RDD 19
17/08/22 14:20:38 INFO CodeGenerator: Code generated in 69.50928 ms
17/08/22 14:20:38 INFO BlockManagerInfo: Removed broadcast_24_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.3 MB)
17/08/22 14:20:39 INFO BlockManagerInfo: Removed broadcast_22_piece0 on
192.168.25.187:46647 in memory (size: 24.6 KB, free: 366.3 MB)
17/08/22 14:20:39 INFO Instrumentation:
LogisticRegression-LogisticRegression_4647a03eff9e40a729dc-1694037276-1:
training finished
17/08/22 14:20:39 INFO SparkContext: Starting job: treeAggregate at
IDF.scala:54
17/08/22 14:20:39 INFO DAGScheduler: Got job 13 (treeAggregate at
IDF.scala:54) with 2 output partitions
17/08/22 14:20:39 INFO DAGScheduler: Final stage: ResultStage 13
(treeAggregate at IDF.scala:54)
17/08/22 14:20:39 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:39 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:39 INFO DAGScheduler: Submitting ResultStage 13
(MapPartitionsRDD[42] at treeAggregate at IDF.scala:54), which has no
missing parents
17/08/22 14:20:39 INFO MemoryStore: Block broadcast_25 stored as values in
memory (estimated size 26.0 KB, free 364.2 MB)
17/08/22 14:20:39 INFO MemoryStore: Block broadcast_25_piece0 stored as
bytes in memory (estimated size 11.6 KB, free 364.2 MB)
17/08/22 14:20:39 INFO BlockManagerInfo: Added broadcast_25_piece0 in
memory on 192.168.25.187:46647 (size: 11.6 KB, free: 366.3 MB)
17/08/22 14:20:39 INFO SparkContext: Created broadcast 25 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:39 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 13 (MapPartitionsRDD[42] at treeAggregate at IDF.scala:54)
(first 15 tasks are for partitions Vector(0, 1))
17/08/22 14:20:39 INFO TaskSchedulerImpl: Adding task set 13.0 with 2 tasks
17/08/22 14:20:39 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID
26, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:39 INFO TaskSetManager: Starting task 1.0 in stage 13.0 (TID
27, localhost, executor driver, partition 1, PROCESS_LOCAL, 4892 bytes)
17/08/22 14:20:39 INFO Executor: Running task 0.0 in stage 13.0 (TID 26)
17/08/22 14:20:39 INFO Executor: Running task 1.0 in stage 13.0 (TID 27)
17/08/22 14:20:39 INFO PythonRunner: Times: total = 52, boot = -6281, init
= 6333, finish = 0
17/08/22 14:20:39 INFO PythonRunner: Times: total = 58, boot = -6294, init
= 6351, finish = 1
17/08/22 14:20:39 INFO MemoryStore: Block taskresult_26 stored as bytes in
memory (estimated size 2.0 MB, free 362.2 MB)
17/08/22 14:20:39 INFO BlockManagerInfo: Added taskresult_26 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 364.3 MB)
17/08/22 14:20:39 INFO MemoryStore: Block taskresult_27 stored as bytes in
memory (estimated size 2.0 MB, free 360.2 MB)
17/08/22 14:20:39 INFO BlockManagerInfo: Added taskresult_27 in memory on
192.168.25.187:46647 (size: 2.0 MB, free: 362.2 MB)
17/08/22 14:20:39 INFO Executor: Finished task 1.0 in stage 13.0 (TID 27).
2109342 bytes result sent via BlockManager)
17/08/22 14:20:39 INFO Executor: Finished task 0.0 in stage 13.0 (TID 26).
2109342 bytes result sent via BlockManager)
17/08/22 14:20:39 INFO TaskSetManager: Finished task 0.0 in stage 13.0 (TID
26) in 156 ms on localhost (executor driver) (1/2)
17/08/22 14:20:39 INFO BlockManagerInfo: Removed taskresult_26 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 364.3 MB)
17/08/22 14:20:39 INFO BlockManagerInfo: Removed taskresult_27 on
192.168.25.187:46647 in memory (size: 2.0 MB, free: 366.3 MB)
17/08/22 14:20:39 INFO TaskSetManager: Finished task 1.0 in stage 13.0 (TID
27) in 179 ms on localhost (executor driver) (2/2)
17/08/22 14:20:39 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks
have all completed, from pool
17/08/22 14:20:39 INFO DAGScheduler: ResultStage 13 (treeAggregate at
IDF.scala:54) finished in 0.185 s
17/08/22 14:20:39 INFO DAGScheduler: Job 13 finished: treeAggregate at
IDF.scala:54, took 0.204303 s
17/08/22 14:20:39 INFO Instrumentation:
NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2: training:
numPartitions=2 storageLevel=StorageLevel(1 replicas)
17/08/22 14:20:39 INFO Instrumentation:
NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2:
{"smoothing":1.0,"featuresCol":"features","modelType":"multinomial","labelCol":"label","predictionCol":"prediction","rawPredictionCol":"rawPrediction","probabilityCol":"probability"}
17/08/22 14:20:39 INFO CodeGenerator: Code generated in 49.860121 ms
17/08/22 14:20:40 INFO SparkContext: Starting job: head at
NaiveBayes.scala:154
17/08/22 14:20:40 INFO DAGScheduler: Got job 14 (head at
NaiveBayes.scala:154) with 1 output partitions
17/08/22 14:20:40 INFO DAGScheduler: Final stage: ResultStage 14 (head at
NaiveBayes.scala:154)
17/08/22 14:20:40 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:40 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:40 INFO DAGScheduler: Submitting ResultStage 14
(MapPartitionsRDD[49] at head at NaiveBayes.scala:154), which has no
missing parents
17/08/22 14:20:40 INFO MemoryStore: Block broadcast_26 stored as values in
memory (estimated size 2.0 MB, free 362.2 MB)
17/08/22 14:20:40 INFO MemoryStore: Block broadcast_26_piece0 stored as
bytes in memory (estimated size 22.0 KB, free 362.2 MB)
17/08/22 14:20:40 INFO BlockManagerInfo: Added broadcast_26_piece0 in
memory on 192.168.25.187:46647 (size: 22.0 KB, free: 366.2 MB)
17/08/22 14:20:40 INFO SparkContext: Created broadcast 26 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:40 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 14 (MapPartitionsRDD[49] at head at NaiveBayes.scala:154)
(first 15 tasks are for partitions Vector(0))
17/08/22 14:20:40 INFO TaskSchedulerImpl: Adding task set 14.0 with 1 tasks
17/08/22 14:20:40 INFO TaskSetManager: Starting task 0.0 in stage 14.0 (TID
28, localhost, executor driver, partition 0, PROCESS_LOCAL, 4883 bytes)
17/08/22 14:20:40 INFO Executor: Running task 0.0 in stage 14.0 (TID 28)
17/08/22 14:20:40 INFO PythonRunner: Times: total = 42, boot = -541, init =
583, finish = 0
17/08/22 14:20:40 INFO Executor: Finished task 0.0 in stage 14.0 (TID 28).
1713 bytes result sent to driver
17/08/22 14:20:40 INFO TaskSetManager: Finished task 0.0 in stage 14.0 (TID
28) in 78 ms on localhost (executor driver) (1/1)
17/08/22 14:20:40 INFO TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks
have all completed, from pool
17/08/22 14:20:40 INFO DAGScheduler: ResultStage 14 (head at
NaiveBayes.scala:154) finished in 0.081 s
17/08/22 14:20:40 INFO DAGScheduler: Job 14 finished: head at
NaiveBayes.scala:154, took 0.129266 s
17/08/22 14:20:40 INFO Instrumentation:
NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2:
{"numFeatures":262144}
17/08/22 14:20:40 INFO SparkContext: Starting job: collect at
NaiveBayes.scala:174
17/08/22 14:20:40 INFO DAGScheduler: Registering RDD 54 (map at
NaiveBayes.scala:162)
17/08/22 14:20:40 INFO DAGScheduler: Got job 15 (collect at
NaiveBayes.scala:174) with 2 output partitions
17/08/22 14:20:40 INFO DAGScheduler: Final stage: ResultStage 16 (collect
at NaiveBayes.scala:174)
17/08/22 14:20:40 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 15)
17/08/22 14:20:40 INFO DAGScheduler: Missing parents: List(ShuffleMapStage
15)
17/08/22 14:20:40 INFO DAGScheduler: Submitting ShuffleMapStage 15
(MapPartitionsRDD[54] at map at NaiveBayes.scala:162), which has no missing
parents
17/08/22 14:20:40 INFO BlockManagerInfo: Removed broadcast_26_piece0 on
192.168.25.187:46647 in memory (size: 22.0 KB, free: 366.3 MB)
17/08/22 14:20:40 INFO MemoryStore: Block broadcast_27 stored as values in
memory (estimated size 6.0 MB, free 358.2 MB)
17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 373
17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 374
17/08/22 14:20:40 INFO BlockManagerInfo: Removed broadcast_25_piece0 on
192.168.25.187:46647 in memory (size: 11.6 KB, free: 366.3 MB)
17/08/22 14:20:40 INFO MemoryStore: Block broadcast_27_piece0 stored as
bytes in memory (estimated size 45.0 KB, free 358.2 MB)
17/08/22 14:20:40 INFO BlockManagerInfo: Added broadcast_27_piece0 in
memory on 192.168.25.187:46647 (size: 45.0 KB, free: 366.2 MB)
17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 346
17/08/22 14:20:40 INFO ContextCleaner: Cleaned accumulator 345
17/08/22 14:20:40 INFO SparkContext: Created broadcast 27 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:40 INFO DAGScheduler: Submitting 2 missing tasks from
ShuffleMapStage 15 (MapPartitionsRDD[54] at map at NaiveBayes.scala:162)
(first 15 tasks are for partitions Vector(0, 1))
17/08/22 14:20:40 INFO TaskSchedulerImpl: Adding task set 15.0 with 2 tasks
17/08/22 14:20:40 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID
29, localhost, executor driver, partition 0, PROCESS_LOCAL, 4872 bytes)
17/08/22 14:20:40 INFO TaskSetManager: Starting task 1.0 in stage 15.0 (TID
30, localhost, executor driver, partition 1, PROCESS_LOCAL, 4881 bytes)
17/08/22 14:20:40 INFO Executor: Running task 1.0 in stage 15.0 (TID 30)
17/08/22 14:20:40 INFO Executor: Running task 0.0 in stage 15.0 (TID 29)
17/08/22 14:20:40 INFO PythonRunner: Times: total = 55, boot = -340, init =
394, finish = 1
17/08/22 14:20:40 INFO PythonRunner: Times: total = 47, boot = -962, init =
1009, finish = 0
17/08/22 14:20:40 INFO Executor: Finished task 1.0 in stage 15.0 (TID 30).
1867 bytes result sent to driver
17/08/22 14:20:40 INFO Executor: Finished task 0.0 in stage 15.0 (TID 29).
1867 bytes result sent to driver
17/08/22 14:20:40 INFO TaskSetManager: Finished task 1.0 in stage 15.0 (TID
30) in 220 ms on localhost (executor driver) (1/2)
17/08/22 14:20:40 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID
29) in 224 ms on localhost (executor driver) (2/2)
17/08/22 14:20:40 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks
have all completed, from pool
17/08/22 14:20:40 INFO DAGScheduler: ShuffleMapStage 15 (map at
NaiveBayes.scala:162) finished in 0.227 s
17/08/22 14:20:40 INFO DAGScheduler: looking for newly runnable stages
17/08/22 14:20:40 INFO DAGScheduler: running: Set()
17/08/22 14:20:40 INFO DAGScheduler: waiting: Set(ResultStage 16)
17/08/22 14:20:40 INFO DAGScheduler: failed: Set()
17/08/22 14:20:40 INFO DAGScheduler: Submitting ResultStage 16
(ShuffledRDD[55] at aggregateByKey at NaiveBayes.scala:163), which has no
missing parents
17/08/22 14:20:40 INFO MemoryStore: Block broadcast_28 stored as values in
memory (estimated size 6.0 MB, free 352.1 MB)
17/08/22 14:20:40 INFO MemoryStore: Block broadcast_28_piece0 stored as
bytes in memory (estimated size 45.3 KB, free 352.1 MB)
17/08/22 14:20:40 INFO BlockManagerInfo: Added broadcast_28_piece0 in
memory on 192.168.25.187:46647 (size: 45.3 KB, free: 366.2 MB)
17/08/22 14:20:40 INFO SparkContext: Created broadcast 28 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:40 INFO DAGScheduler: Submitting 2 missing tasks from
ResultStage 16 (ShuffledRDD[55] at aggregateByKey at NaiveBayes.scala:163)
(first 15 tasks are for partitions Vector(0, 1))
17/08/22 14:20:40 INFO TaskSchedulerImpl: Adding task set 16.0 with 2 tasks
17/08/22 14:20:40 INFO TaskSetManager: Starting task 1.0 in stage 16.0 (TID
31, localhost, executor driver, partition 1, PROCESS_LOCAL, 4621 bytes)
17/08/22 14:20:40 INFO TaskSetManager: Starting task 0.0 in stage 16.0 (TID
32, localhost, executor driver, partition 0, ANY, 4621 bytes)
17/08/22 14:20:40 INFO Executor: Running task 0.0 in stage 16.0 (TID 32)
17/08/22 14:20:40 INFO Executor: Running task 1.0 in stage 16.0 (TID 31)
17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty
blocks out of 2 blocks
17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Started 0 remote
fetches in 13 ms
17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty
blocks out of 2 blocks
17/08/22 14:20:40 INFO ShuffleBlockFetcherIterator: Started 0 remote
fetches in 19 ms
17/08/22 14:20:40 INFO Executor: Finished task 1.0 in stage 16.0 (TID 31).
1888 bytes result sent to driver
17/08/22 14:20:40 INFO TaskSetManager: Finished task 1.0 in stage 16.0 (TID
31) in 99 ms on localhost (executor driver) (1/2)
17/08/22 14:20:40 INFO MemoryStore: Block taskresult_32 stored as bytes in
memory (estimated size 4.0 MB, free 348.1 MB)
17/08/22 14:20:40 INFO BlockManagerInfo: Added taskresult_32 in memory on
192.168.25.187:46647 (size: 4.0 MB, free: 362.2 MB)
17/08/22 14:20:40 INFO Executor: Finished task 0.0 in stage 16.0 (TID 32).
4217031 bytes result sent via BlockManager)
17/08/22 14:20:40 INFO TaskSetManager: Finished task 0.0 in stage 16.0 (TID
32) in 217 ms on localhost (executor driver) (2/2)
17/08/22 14:20:40 INFO TaskSchedulerImpl: Removed TaskSet 16.0, whose tasks
have all completed, from pool
17/08/22 14:20:40 INFO DAGScheduler: ResultStage 16 (collect at
NaiveBayes.scala:174) finished in 0.225 s
17/08/22 14:20:40 INFO DAGScheduler: Job 15 finished: collect at
NaiveBayes.scala:174, took 0.589796 s
17/08/22 14:20:40 INFO BlockManagerInfo: Removed taskresult_32 on
192.168.25.187:46647 in memory (size: 4.0 MB, free: 366.2 MB)
17/08/22 14:20:40 INFO Instrumentation:
NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2: {"numClasses":2}
17/08/22 14:20:41 INFO Instrumentation:
NaiveBayes-NaiveBayes_4836ba16aa3f054bc95c-1042799920-2: training finished
17/08/22 14:20:41 INFO SparkSqlParser: Parsing command: 1 as id
17/08/22 14:20:41 INFO SparkSqlParser: Parsing command: CAST(value AS
STRING) as text
17/08/22 14:20:42 INFO BlockManagerInfo: Removed broadcast_27_piece0 on
192.168.25.187:46647 in memory (size: 45.0 KB, free: 366.2 MB)
17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 372
17/08/22 14:20:42 INFO BlockManagerInfo: Removed broadcast_28_piece0 on
192.168.25.187:46647 in memory (size: 45.3 KB, free: 366.3 MB)
17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 371
17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 400
17/08/22 14:20:42 INFO ContextCleaner: Cleaned accumulator 399
17/08/22 14:20:42 INFO ContextCleaner: Cleaned shuffle 0
printing schema
root
 |-- id: integer (nullable = false)
 |-- text: string (nullable = true)
 |-- prediction: double (nullable = true)

root
 |-- id: integer (nullable = false)
 |-- text: string (nullable = true)
 |-- prediction: double (nullable = true)

root
 |-- id: integer (nullable = false)
 |-- text: string (nullable = true)

17/08/22 14:20:42 INFO StreamExecution: Starting [id =
db779c29-4790-4b60-b035-f5701f6cc8dd, runId =
1996e5c6-aac3-45b2-871e-c2ba22c2da92]. Use
/tmp/temporary-5d7c55ec-4704-48ce-bf49-bb900f85a284 to store the query
checkpoint.
root
 |-- id: integer (nullable = false)
 |-- text: string (nullable = true)
 |-- prediction: double (nullable = true)

17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [localhost:9092]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id =
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 1
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id =
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = earliest

17/08/22 14:20:42 INFO SparkSqlParser: Parsing command: CAST(id AS STRING)
as key
17/08/22 14:20:42 INFO SparkSqlParser: Parsing command: CONCAT(CAST(text AS
STRING),':', CAST(prediction AS STRING)) as value
17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [localhost:9092]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id = consumer-1
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 1
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id =
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = earliest

17/08/22 14:20:42 INFO StreamExecution: Starting [id =
b4288c77-f554-4c98-bc63-5539db7d1869, runId =
3efa5ffc-230f-4819-a7d2-47ae0e6a8a43]. Use /tmp/kaf_tgt to store the query
checkpoint.
17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [localhost:9092]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id =
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 1
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id =
spark-kafka-source-607b6303-bda0-4e93-b6fd-d46d4a4d2e1f--19863691-driver-0
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = earliest

17/08/22 14:20:42 INFO ConsumerConfig: ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [localhost:9092]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id = consumer-2
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 1
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id =
spark-kafka-source-607b6303-bda0-4e93-b6fd-d46d4a4d2e1f--19863691-driver-0
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = earliest

17/08/22 14:20:42 INFO AppInfoParser: Kafka version : 0.10.0.1
17/08/22 14:20:42 INFO AppInfoParser: Kafka commitId : a7a17cdec9eaa6c5
17/08/22 14:20:42 INFO AppInfoParser: Kafka version : 0.10.0.1
17/08/22 14:20:42 INFO AppInfoParser: Kafka commitId : a7a17cdec9eaa6c5
17/08/22 14:20:42 INFO StreamExecution: Starting new streaming query.
17/08/22 14:20:43 ERROR StreamExecution: Query [id =
b4288c77-f554-4c98-bc63-5539db7d1869, runId =
3efa5ffc-230f-4819-a7d2-47ae0e6a8a43] terminated with error
java.lang.AssertionError: assertion failed
        at scala.Predef$.assert(Predef.scala:156)
        at
org.apache.spark.sql.execution.streaming.OffsetSeq.toStreamProgress(OffsetSeq.scala:38)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org
$apache$spark$sql$execution$streaming$StreamExecution$$populateStartOffsets(StreamExecution.scala:429)
        at
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:297)
        at
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
        at
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
        at
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
        at
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
        at
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
        at
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
        at org.apache.spark.sql.execution.streaming.StreamExecution.org
$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
        at
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
17/08/22 14:20:43 INFO AbstractCoordinator: Discovered coordinator
Sandbox.RHEL:9092 (id: 2147483647 rack: null) for group
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0.
17/08/22 14:20:43 INFO ConsumerCoordinator: Revoking previously assigned
partitions [] for group
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
17/08/22 14:20:43 INFO AbstractCoordinator: (Re-)joining group
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
17/08/22 14:20:43 INFO AbstractCoordinator: Successfully joined group
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
with generation 1
17/08/22 14:20:43 INFO ConsumerCoordinator: Setting newly assigned
partitions [spark-stream-0] for group
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-driver-0
17/08/22 14:20:43 INFO KafkaSource: Initial offsets:
{"spark-stream":{"0":15}}
17/08/22 14:20:44 INFO StreamExecution: Committed offsets for batch 0.
Metadata OffsetSeqMetadata(0,1503391843742,Map(spark.sql.shuffle.partitions
-> 200))
17/08/22 14:20:44 INFO KafkaSource: GetBatch called with start = None, end
= {"spark-stream":{"0":15}}
17/08/22 14:20:44 INFO KafkaSource: Partitions added: Map()
17/08/22 14:20:44 INFO KafkaSource: GetBatch generating RDD of offset
range: KafkaSourceRDDOffsetRange(spark-stream-0,15,15,None)
-------------------------------------------
Batch: 0
-------------------------------------------
17/08/22 14:20:44 INFO CodeGenerator: Code generated in 26.485704 ms
17/08/22 14:20:44 INFO SparkContext: Starting job: start at
NativeMethodAccessorImpl.java:0
17/08/22 14:20:44 INFO DAGScheduler: Got job 16 (start at
NativeMethodAccessorImpl.java:0) with 1 output partitions
17/08/22 14:20:44 INFO DAGScheduler: Final stage: ResultStage 17 (start at
NativeMethodAccessorImpl.java:0)
17/08/22 14:20:44 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:44 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:44 INFO DAGScheduler: Submitting ResultStage 17
(MapPartitionsRDD[60] at start at NativeMethodAccessorImpl.java:0), which
has no missing parents
17/08/22 14:20:44 INFO MemoryStore: Block broadcast_29 stored as values in
memory (estimated size 8.9 KB, free 364.2 MB)
17/08/22 14:20:44 INFO MemoryStore: Block broadcast_29_piece0 stored as
bytes in memory (estimated size 4.8 KB, free 364.2 MB)
17/08/22 14:20:44 INFO BlockManagerInfo: Added broadcast_29_piece0 in
memory on 192.168.25.187:46647 (size: 4.8 KB, free: 366.3 MB)
17/08/22 14:20:44 INFO SparkContext: Created broadcast 29 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:44 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 17 (MapPartitionsRDD[60] at start at
NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions
Vector(0))
17/08/22 14:20:44 INFO TaskSchedulerImpl: Adding task set 17.0 with 1 tasks
17/08/22 14:20:44 INFO TaskSetManager: Starting task 0.0 in stage 17.0 (TID
33, localhost, executor driver, partition 0, PROCESS_LOCAL, 5007 bytes)
17/08/22 14:20:44 INFO Executor: Running task 0.0 in stage 17.0 (TID 33)
17/08/22 14:20:44 INFO ConsumerConfig: ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [localhost:9092]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id =
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 2147483647
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id =
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-executor
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = none

17/08/22 14:20:44 INFO ConsumerConfig: ConsumerConfig values:
        metric.reporters = []
        metadata.max.age.ms = 300000
        partition.assignment.strategy =
[org.apache.kafka.clients.consumer.RangeAssignor]
        reconnect.backoff.ms = 50
        sasl.kerberos.ticket.renew.window.factor = 0.8
        max.partition.fetch.bytes = 1048576
        bootstrap.servers = [localhost:9092]
        ssl.keystore.type = JKS
        enable.auto.commit = false
        sasl.mechanism = GSSAPI
        interceptor.classes = null
        exclude.internal.topics = true
        ssl.truststore.password = null
        client.id = consumer-3
        ssl.endpoint.identification.algorithm = null
        max.poll.records = 2147483647
        check.crcs = true
        request.timeout.ms = 40000
        heartbeat.interval.ms = 3000
        auto.commit.interval.ms = 5000
        receive.buffer.bytes = 65536
        ssl.truststore.type = JKS
        ssl.truststore.location = null
        ssl.keystore.password = null
        fetch.min.bytes = 1
        send.buffer.bytes = 131072
        value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        group.id =
spark-kafka-source-49941407-b415-4f5b-a584-5524a84e9066-1732634835-executor
        retry.backoff.ms = 100
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        ssl.trustmanager.algorithm = PKIX
        ssl.key.password = null
        fetch.max.wait.ms = 500
        sasl.kerberos.min.time.before.relogin = 60000
        connections.max.idle.ms = 540000
        session.timeout.ms = 30000
        metrics.num.samples = 2
        key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
        ssl.protocol = TLS
        ssl.provider = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.keystore.location = null
        ssl.cipher.suites = null
        security.protocol = PLAINTEXT
        ssl.keymanager.algorithm = SunX509
        metrics.sample.window.ms = 30000
        auto.offset.reset = none

17/08/22 14:20:44 INFO AppInfoParser: Kafka version : 0.10.0.1
17/08/22 14:20:44 INFO AppInfoParser: Kafka commitId : a7a17cdec9eaa6c5
17/08/22 14:20:44 INFO KafkaSourceRDD: Beginning offset 15 is the same as
ending offset skipping spark-stream 0
17/08/22 14:20:44 INFO CodeGenerator: Code generated in 27.911741 ms
17/08/22 14:20:44 INFO Executor: Finished task 0.0 in stage 17.0 (TID 33).
1117 bytes result sent to driver
17/08/22 14:20:44 INFO TaskSetManager: Finished task 0.0 in stage 17.0 (TID
33) in 69 ms on localhost (executor driver) (1/1)
17/08/22 14:20:44 INFO TaskSchedulerImpl: Removed TaskSet 17.0, whose tasks
have all completed, from pool
17/08/22 14:20:44 INFO DAGScheduler: ResultStage 17 (start at
NativeMethodAccessorImpl.java:0) finished in 0.071 s
17/08/22 14:20:44 INFO DAGScheduler: Job 16 finished: start at
NativeMethodAccessorImpl.java:0, took 0.087629 s
17/08/22 14:20:44 INFO SparkContext: Starting job: start at
NativeMethodAccessorImpl.java:0
17/08/22 14:20:44 INFO DAGScheduler: Got job 17 (start at
NativeMethodAccessorImpl.java:0) with 1 output partitions
17/08/22 14:20:44 INFO DAGScheduler: Final stage: ResultStage 18 (start at
NativeMethodAccessorImpl.java:0)
17/08/22 14:20:44 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:44 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:44 INFO DAGScheduler: Submitting ResultStage 18
(MapPartitionsRDD[64] at start at NativeMethodAccessorImpl.java:0), which
has no missing parents
17/08/22 14:20:44 INFO MemoryStore: Block broadcast_30 stored as values in
memory (estimated size 8.1 KB, free 364.2 MB)
17/08/22 14:20:44 INFO MemoryStore: Block broadcast_30_piece0 stored as
bytes in memory (estimated size 4.4 KB, free 364.2 MB)
17/08/22 14:20:44 INFO BlockManagerInfo: Added broadcast_30_piece0 in
memory on 192.168.25.187:46647 (size: 4.4 KB, free: 366.3 MB)
17/08/22 14:20:44 INFO SparkContext: Created broadcast 30 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:44 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 18 (MapPartitionsRDD[64] at start at
NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions
Vector(0))
17/08/22 14:20:44 INFO TaskSchedulerImpl: Adding task set 18.0 with 1 tasks
17/08/22 14:20:44 INFO TaskSetManager: Starting task 0.0 in stage 18.0 (TID
34, localhost, executor driver, partition 0, PROCESS_LOCAL, 4835 bytes)
17/08/22 14:20:44 INFO Executor: Running task 0.0 in stage 18.0 (TID 34)
17/08/22 14:20:44 INFO CodeGenerator: Code generated in 25.878504 ms
17/08/22 14:20:44 INFO Executor: Finished task 0.0 in stage 18.0 (TID 34).
999 bytes result sent to driver
17/08/22 14:20:44 INFO TaskSetManager: Finished task 0.0 in stage 18.0 (TID
34) in 47 ms on localhost (executor driver) (1/1)
17/08/22 14:20:44 INFO TaskSchedulerImpl: Removed TaskSet 18.0, whose tasks
have all completed, from pool
17/08/22 14:20:44 INFO DAGScheduler: ResultStage 18 (start at
NativeMethodAccessorImpl.java:0) finished in 0.047 s
17/08/22 14:20:44 INFO DAGScheduler: Job 17 finished: start at
NativeMethodAccessorImpl.java:0, took 0.057313 s
17/08/22 14:20:44 INFO SparkContext: Starting job: start at
NativeMethodAccessorImpl.java:0
17/08/22 14:20:44 INFO DAGScheduler: Got job 18 (start at
NativeMethodAccessorImpl.java:0) with 1 output partitions
17/08/22 14:20:44 INFO DAGScheduler: Final stage: ResultStage 19 (start at
NativeMethodAccessorImpl.java:0)
17/08/22 14:20:44 INFO DAGScheduler: Parents of final stage: List()
17/08/22 14:20:44 INFO DAGScheduler: Missing parents: List()
17/08/22 14:20:44 INFO DAGScheduler: Submitting ResultStage 19
(MapPartitionsRDD[64] at start at NativeMethodAccessorImpl.java:0), which
has no missing parents
17/08/22 14:20:44 INFO MemoryStore: Block broadcast_31 stored as values in
memory (estimated size 8.1 KB, free 364.2 MB)
17/08/22 14:20:44 INFO MemoryStore: Block broadcast_31_piece0 stored as
bytes in memory (estimated size 4.4 KB, free 364.2 MB)
17/08/22 14:20:44 INFO BlockManagerInfo: Added broadcast_31_piece0 in
memory on 192.168.25.187:46647 (size: 4.4 KB, free: 366.3 MB)
17/08/22 14:20:44 INFO SparkContext: Created broadcast 31 from broadcast at
DAGScheduler.scala:1006
17/08/22 14:20:44 INFO DAGScheduler: Submitting 1 missing tasks from
ResultStage 19 (MapPartitionsRDD[64] at start at
NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions
Vector(1))
17/08/22 14:20:44 INFO TaskSchedulerImpl: Adding task set 19.0 with 1 tasks
17/08/22 14:20:44 INFO TaskSetManager: Starting task 0.0 in stage 19.0 (TID
35, localhost, executor driver, partition 1, PROCESS_LOCAL, 4835 bytes)
17/08/22 14:20:44 INFO Executor: Running task 0.0 in stage 19.0 (TID 35)
17/08/22 14:20:44 INFO Executor: Finished task 0.0 in stage 19.0 (TID 35).
999 bytes result sent to driver
17/08/22 14:20:44 INFO TaskSetManager: Finished task 0.0 in stage 19.0 (TID
35) in 15 ms on localhost (executor driver) (1/1)
17/08/22 14:20:44 INFO TaskSchedulerImpl: Removed TaskSet 19.0, whose tasks
have all completed, from pool
17/08/22 14:20:44 INFO DAGScheduler: ResultStage 19 (start at
NativeMethodAccessorImpl.java:0) finished in 0.015 s
17/08/22 14:20:44 INFO DAGScheduler: Job 18 finished: start at
NativeMethodAccessorImpl.java:0, took 0.024716 s
+---+----+
| id|text|
+---+----+
+---+----+

17/08/22 14:20:45 INFO StreamExecution: Streaming query made progress: {
  "id" : "db779c29-4790-4b60-b035-f5701f6cc8dd",
  "runId" : "1996e5c6-aac3-45b2-871e-c2ba22c2da92",
  "name" : null,
  "timestamp" : "2017-08-22T08:50:42.983Z",
  "numInputRows" : 0,
  "processedRowsPerSecond" : 0.0,
  "durationMs" : {
    "addBatch" : 462,
    "getBatch" : 85,
    "getOffset" : 631,
    "queryPlanning" : 201,
    "triggerExecution" : 1891,
    "walCommit" : 301
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[spark-stream]]",
    "startOffset" : null,
    "endOffset" : {
      "spark-stream" : {
        "0" : 15
      }
    },
    "numInputRows" : 0,
    "processedRowsPerSecond" : 0.0
  } ],
  "sink" : {
    "description" :
"org.apache.spark.sql.execution.streaming.ConsoleSink@68e8a192"
  }
}
17/08/22 14:20:45 INFO StreamExecution: Streaming query made progress: {
  "id" : "db779c29-4790-4b60-b035-f5701f6cc8dd",
  "runId" : "1996e5c6-aac3-45b2-871e-c2ba22c2da92",
  "name" : null,
  "timestamp" : "2017-08-22T08:50:45.199Z",
  "numInputRows" : 0,
  "inputRowsPerSecond" : 0.0,
  "processedRowsPerSecond" : 0.0,
  "durationMs" : {
    "getOffset" : 3,
    "triggerExecution" : 3
  },
  "stateOperators" : [ ],
  "sources" : [ {
    "description" : "KafkaSource[Subscribe[spark-stream]]",
    "startOffset" : {
      "spark-stream" : {
        "0" : 15
      }
    },
    "endOffset" : {
      "spark-stream" : {
        "0" : 15
      }
    },
    "numInputRows" : 0,
    "inputRowsPerSecond" : 0.0,
    "processedRowsPerSecond" : 0.0
  } ],
  "sink" : {
    "description" :
"org.apache.spark.sql.execution.streaming.ConsoleSink@68e8a192"
  }
}
Traceback (most recent call last):
  File "/home/jatin/projects/app/kafka_ml.py", line 67, in <module>
    kafka_write.awaitTermination()
  File
"/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/streaming.py",
line 106, in awaitTermination
  File
"/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py",
line 1133, in __call__
  File
"/home/jatin/projects/spark/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py",
line 75, in deco
pyspark.sql.utils.StreamingQueryException: u'assertion failed\n===
Streaming Query ===\nIdentifier: [id =
b4288c77-f554-4c98-bc63-5539db7d1869, runId =
3efa5ffc-230f-4819-a7d2-47ae0e6a8a43]\nCurrent Committed Offsets:
{}\nCurrent Available Offsets: {}\n\nCurrent State: ACTIVE\nThread State:
RUNNABLE\n\nLogical Plan:\nProject [cast(id#174 as string) AS key#326,
concat(cast(text#175 as string), :, cast(prediction#224 as string)) AS
value#327]\n+- Union\n   :- Project [id#174, text#175, prediction#224]\n
:  +- Project [id#174, text#175, words#179, relevant_words#184,
rawFeatures#190, features#197, rawPrediction#205, probability#214,
UDF(rawPrediction#205) AS prediction#224]\n   :     +- Project [id#174,
text#175, words#179, relevant_words#184, rawFeatures#190, features#197,
rawPrediction#205, UDF(rawPrediction#205) AS probability#214]\n   :
 +- Project [id#174, text#175, words#179, relevant_words#184,
rawFeatures#190, features#197, UDF(features#197) AS rawPrediction#205]\n
:           +- Project [id#174, text#175, words#179, relevant_words#184,
rawFeatures#190, UDF(rawFeatures#190) AS features#197]\n   :
 +- Project [id#174, text#175, words#179, relevant_words#184,
UDF(relevant_words#184) AS rawFeatures#190]\n   :                 +-
Project [id#174, text#175, words#179, UDF(words#179) AS
relevant_words#184]\n   :                    +- Project [id#174, text#175,
UDF(text#175) AS words#179]\n   :                       +- Project [1 AS
id#174, cast(value#160 as string) AS text#175]\n   :
   +- StreamingExecutionRelation KafkaSource[Subscribe[spark-stream]],
[key#159, value#160, topic#161, partition#162, offset#163L, timestamp#164,
timestampType#165]\n   +- Project [id#174, text#175, prediction#290]\n
 +- Project [id#174, text#175, words#245, relevant_words#250,
rawFeatures#256, features#263, rawPrediction#271, probability#280,
UDF(rawPrediction#271) AS prediction#290]\n         +- Project [id#174,
text#175, words#245, relevant_words#250, rawFeatures#256, features#263,
rawPrediction#271, UDF(rawPrediction#271) AS probability#280]\n
 +- Project [id#174, text#175, words#245, relevant_words#250,
rawFeatures#256, features#263, UDF(features#263) AS rawPrediction#271]\n
            +- Project [id#174, text#175, words#245, relevant_words#250,
rawFeatures#256, UDF(rawFeatures#256) AS features#263]\n
 +- Project [id#174, text#175, words#245, relevant_words#250,
UDF(relevant_words#250) AS rawFeatures#256]\n                     +-
Project [id#174, text#175, words#245, UDF(words#245) AS
relevant_words#250]\n                        +- Project [id#174, text#175,
UDF(text#175) AS words#245]\n                           +- Project [1 AS
id#174, cast(value#160 as string) AS text#175]\n
   +- StreamingExecutionRelation KafkaSource[Subscribe[spark-stream]],
[key#159, value#160, topic#161, partition#162, offset#163L, timestamp#164,
timestampType#165]\n'
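[Editor's note] The `assertion failed` comes from `OffsetSeq.toStreamProgress` (OffsetSeq.scala:38), which in Spark 2.2 requires that the number of offsets recovered from the checkpoint equal the number of sources in the restarted query's plan. One plausible reading of the log: the logical plan above contains *two* `StreamingExecutionRelation` Kafka sources (one per branch of the union), while the reused checkpoint at `/tmp/kaf_tgt` — presumably written by an earlier single-pipeline (`selected_lr`) run — holds offsets for only *one* source. The sketch below is an illustrative pure-Python mirror of that size check, not Spark's actual (Scala) code, and the one-source checkpoint content is an assumption:

```python
def to_stream_progress(checkpoint_offsets, sources):
    """Illustrative mirror of the size check Spark 2.2 performs in
    OffsetSeq.toStreamProgress: the number of offsets recovered from
    the checkpoint must equal the number of sources in the plan."""
    assert len(sources) == len(checkpoint_offsets), "assertion failed"
    return dict(zip(sources, checkpoint_offsets))

# Assumed checkpoint state from an earlier lr-only run: one source.
checkpoint_offsets = ['{"spark-stream":{"0":15}}']

# The union query's plan has two Kafka source relations, so the
# sizes no longer match and the assertion fires on restart.
union_sources = ["KafkaSource[Subscribe[spark-stream]] (lr branch)",
                 "KafkaSource[Subscribe[spark-stream]] (nb branch)"]

try:
    to_stream_progress(checkpoint_offsets, union_sources)
except AssertionError as e:
    print(e)  # assertion failed
```

If this diagnosis is right, pointing `checkpointLocation` at a fresh directory (or deleting `/tmp/kaf_tgt`) should let the union query start, since the offset log would then be written for two sources from the beginning.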
17/08/22 14:20:45 INFO SparkContext: Invoking stop() from shutdown hook
17/08/22 14:20:45 INFO SparkUI: Stopped Spark web UI at
http://192.168.25.187:4040
17/08/22 14:20:45 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
17/08/22 14:20:45 INFO MemoryStore: MemoryStore cleared
17/08/22 14:20:45 INFO BlockManager: BlockManager stopped
17/08/22 14:20:45 INFO BlockManagerMaster: BlockManagerMaster stopped
17/08/22 14:20:45 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
17/08/22 14:20:45 INFO SparkContext: Successfully stopped SparkContext
17/08/22 14:20:45 INFO ShutdownHookManager: Shutdown hook called
17/08/22 14:20:45 INFO ShutdownHookManager: Deleting directory
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7/pyspark-146b573e-66f7-41af-bf89-dc2b5a426c75
17/08/22 14:20:45 INFO ShutdownHookManager: Deleting directory
/tmp/spark-9474eda2-128d-4b00-8f99-dc02012a8bf7
17/08/22 14:20:45 INFO ShutdownHookManager: Deleting directory
/tmp/temporary-5d7c55ec-4704-48ce-bf49-bb900f85a284