spark-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: spark sql and cassandra. spark generate 769 tasks to read 3 lines from cassandra table
Date Wed, 17 Jun 2015 08:37:03 GMT
Hi, can somebody suggest a way to reduce the number of tasks?

2015-06-15 18:26 GMT+02:00 Serega Sheypak <serega.sheypak@gmail.com>:

> Hi, I'm running Spark SQL against a Cassandra table. I have 3 C* nodes, each
> of which runs a Spark worker.
> The problem is that Spark runs 769 tasks to read 3 rows: select bar from
> foo.
> I've tried these properties:
>
> # try to avoid 769 tasks per dummy select foo from bar query
> spark.cassandra.input.split.size_in_mb=32mb
> spark.cassandra.input.fetch.size_in_rows=1000
> spark.cassandra.input.split.size=10000
>
> but they don't help.
>
> Here are the mean metrics for the job:
> input1= 8388608.0 TB
> input2 = -320 B
> input3 = -400 B
>
> I'm confused by the input sizes; there are only 3 rows in the C* table.
> Definitely, I don't have 8388608.0 TB of data :)
>
>
>
>
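
A quick sanity check on the two odd numbers quoted above may help anyone reading this thread. It is only arithmetic under stated assumptions, not a diagnosis. The first assumption is that the sizes are displayed in binary units (1 TB = 2**40 bytes). In that case 8388608.0 TB is exactly 2**63 bytes, which is what a signed 64-bit maximum looks like when formatted to one decimal place, so the figure reads more like a sentinel or overflow than a real size. The second assumption is that each of the 3 nodes uses 256 vnodes. That gives 768 token ranges, or 769 if the range that wraps around the ring is split in two, which would match the task count. Neither assumption is confirmed by the message itself.

```python
# Sanity-check arithmetic for the numbers quoted in the message above.

# Assumption: sizes are shown in binary units (1 TB = 2**40 bytes).
TB = 2 ** 40
LONG_MAX = 2 ** 63 - 1  # largest signed 64-bit integer

reported_tb = 8388608.0
reported_bytes = reported_tb * TB

# 8388608 is exactly 2**23, so the reported size is exactly 2**63 bytes.
print(reported_tb == 2 ** 23)
print(reported_bytes == 2 ** 63)

# A signed 64-bit maximum, formatted to one decimal place in TB.
print(f"{LONG_MAX / TB:.1f} TB")

# Assumption: 256 vnodes per node (hypothetical; check num_tokens on the cluster).
nodes = 3
vnodes_per_node = 256
token_ranges = nodes * vnodes_per_node

# Assumption: the wrap-around range is split in two, adding one more range.
print(token_ranges + 1)
```

If the vnode assumption holds, the split-size properties tried above would have little to shrink, since each token range already holds almost no data.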
