spark-user mailing list archives

From Michael Artz <michaelea...@gmail.com>
Subject Re: Read parquet files as buckets
Date Wed, 01 Nov 2017 12:41:14 GMT
Hi,
   What about the DAG? Can you send that as well, from the resulting
"write" call?

On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון <oren.shamun@gmail.com> wrote:

> The version is 2.2.0.
> The code for the write is:
> sortedApiRequestLogsDataSet.write
>       .bucketBy(numberOfBuckets, "userId")
>       .mode(SaveMode.Overwrite)
>       .format("parquet")
>       .option("path", outputPath + "/")
>       .option("compression", "snappy")
>       .saveAsTable("sorted_api_logs")
>
> And the code for the read:
> val df = sparkSession.read.parquet(path).toDF()
>
> The read code runs on a different cluster than the write.
>
> On Tue, Oct 31, 2017 at 7:02 PM Michael Artz <michaeleartz@gmail.com>
> wrote:
>
>> What version of Spark?  Do you have a code sample?  A screenshot of the
>> DAG or the printout from .explain?
>>
>> On Tue, Oct 31, 2017 at 11:01 AM, אורן שמון <oren.shamun@gmail.com>
>> wrote:
>>
>>> Hi all,
>>> I have Parquet files as the result of a job; the job saved them in
>>> bucketed mode by userId. How can I read the files in bucketed mode in
>>> another job? I tried to read them, but it didn't bucket the data (same
>>> user in the same partition).
>>>
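
The bucketing spec recorded by bucketBy + saveAsTable lives in the table
catalog (metastore), not in the Parquet files themselves, so
sparkSession.read.parquet(path) returns a plain, unbucketed DataFrame. A
minimal sketch of reading the table back through the catalog instead,
assuming both clusters can point at the same external Hive metastore (the
metastore wiring itself is not shown here):

import org.apache.spark.sql.SparkSession

// enableHiveSupport so the catalog is backed by the shared metastore,
// where saveAsTable recorded the bucketBy(numberOfBuckets, "userId") spec
val spark = SparkSession.builder()
  .appName("read-bucketed-logs")
  .enableHiveSupport()
  .getOrCreate()

// spark.table picks up the bucketing metadata from the catalog;
// spark.read.parquet(path) would see only the files and ignore it
val df = spark.table("sorted_api_logs")

// with the bucketing spec attached, operations keyed on userId
// (joins, groupBy) can avoid a shuffle on this input
df.groupBy("userId").count().explain()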
