spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Can spark handle this scenario?
Date Sat, 17 Feb 2018 04:58:45 GMT
** Correction to point 2 below: I meant that you do NOT need DataFrames here.
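
A minimal sketch of the RDD approach, under a few assumptions: the Yahoo call is rewritten as a plain function that returns an ordinary Seq[Tick] rather than a Dataset[Tick], the fetch body is a hypothetical placeholder (no real API is called), and the symbol list, object name, and local master are illustrative only. The likely cause of the encoder error is that the function passed to map returns a Dataset, so Spark would have to build a Dataset[Dataset[Tick]], for which no encoder exists; returning plain case-class values from the function avoids that.

```scala
import org.apache.spark.sql.SparkSession

case class Symbol(symbol: String, sector: String)
case class Tick(symbol: String, sector: String, open: Double, close: Double)

object PullTicks {
  // Hypothetical stand-in for the real Yahoo call. It returns a plain Scala
  // collection, so it can run inside a task on an executor.
  def pullSymbolFromYahoo(symbol: String, sector: String): Seq[Tick] =
    Seq(Tick(symbol, sector, 0.0, 0.0)) // placeholder values, not real data

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pull-ticks")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    // Illustrative symbols; in practice this would be the full S&P 500 list.
    val symbols = Seq(Symbol("AAPL", "Technology"), Symbol("XOM", "Energy"))

    // One partition per symbol so the calls can run in parallel.
    val ticks = spark.sparkContext
      .parallelize(symbols, numSlices = symbols.size)
      .flatMap(s => pullSymbolFromYahoo(s.symbol, s.sector))

    ticks.collect().foreach(println)
    spark.stop()
  }
}
```

If a Dataset is wanted afterwards, importing spark.implicits._ and calling toDS() on the resulting RDD should work, since Tick is a case class.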

On Sat, Feb 17, 2018 at 3:58 PM, ayan guha <guha.ayan@gmail.com> wrote:

> Hi
>
> Couple of suggestions:
>
> 1. Do not use Dataset; use DataFrame in this scenario. There is no benefit
> from Dataset features here. With a DataFrame, you can write an arbitrary UDF
> that does what you want.
> 2. In fact you do need dataframes here. You would be better off with an RDD
> here. Just create an RDD of symbols and use map to do the processing.
>
> On Sat, Feb 17, 2018 at 12:40 PM, Irving Duran <irving.duran@gmail.com>
> wrote:
>
>> Do you only want to use Scala? Because otherwise, I think with pyspark
>> and pandas read table you should be able to accomplish what you want to
>> accomplish.
>>
>> Thank you,
>>
>> Irving Duran
>>
>> On 02/16/2018 06:10 PM, Lian Jiang wrote:
>>
>> Hi,
>>
>> I have a use case:
>>
>> I want to download S&P 500 stock data from the Yahoo API in parallel using
>> Spark. I have all the stock symbols as a Dataset. I then used the code
>> below to call the Yahoo API for each symbol:
>>
>>
>> case class Symbol(symbol: String, sector: String)
>>
>> case class Tick(symbol: String, sector: String, open: Double, close: Double)
>>
>> // symbolDs is Dataset[Symbol], pullSymbolFromYahoo returns Dataset[Tick]
>>
>>     symbolDs.map { k =>
>>       pullSymbolFromYahoo(k.symbol, k.sector)
>>     }
>>
>> This statement does not compile:
>>
>>
>> Unable to find encoder for type stored in a Dataset.  Primitive types
>> (Int, String, etc) and Product types (case classes) are supported by
>> importing spark.implicits._  Support for serializing other types will be
>> added in future releases.
>>
>>
>> My questions are:
>>
>>
>> 1. As you can see, this scenario is not traditional Dataset handling such
>> as count or SQL queries. Instead, it is more like a UDF that applies an
>> arbitrary operation to each record. Is Spark good at handling such a scenario?
>>
>>
>> 2. Regarding the compilation error, any fix? I did not find a
>> satisfactory solution online.
>>
>>
>> Thanks for your help!
>>
>
>
> --
> Best Regards,
> Ayan Guha
>



-- 
Best Regards,
Ayan Guha
