spark-user mailing list archives

From Ajay Chander <itsche...@gmail.com>
Subject Re: Facing issue with floor function in spark SQL query
Date Fri, 04 Mar 2016 12:35:14 GMT
Hi Ashok,

Try using HiveContext instead of SQLContext. I suspect SQLContext does not
have that functionality. Let me know if it works.

Thanks,
Ajay
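
If it helps, a minimal sketch of what swapping in HiveContext could look like in the Spark 1.x Java API (the app name is illustrative; the file path and query are from the thread below):

```java
// Sketch: build a HiveContext instead of a plain SQLContext so the HiveQL
// parser can resolve built-in functions such as floor. Assumes Spark 1.x
// (spark-hive) on the classpath.
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class FloorWithHiveContext {
    public static void main(String[] args) {
        SparkContext sc = new SparkContext(new SparkConf().setAppName("floor-demo"));
        HiveContext hiveContext = new HiveContext(sc);
        DataFrame logdf = hiveContext.read().json("my-json.gz");
        logdf.registerTempTable("logs");
        DataFrame bucketLogs = hiveContext.sql(
            "SELECT `user.timestamp` AS rawTimeStamp, `user.requestId` AS requestId, "
            + "floor(`user.timestamp`/72000) AS timeBucket FROM logs");
        bucketLogs.show();
    }
}
```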

On Friday, March 4, 2016, ashokkumar rajendran <
ashokkumar.rajendran@gmail.com> wrote:

> Hi Ayan,
>
> Thanks for the response. I am using a SQL query (not the DataFrame API).
> Could you please explain how I should import this SQL function for it?
> Simply importing the class in my driver code does not help here.
>
> Many of the functions I need are already in sql.functions, so I do not
> want to rewrite them.
>
> Regards
> Ashok
>
> On Fri, Mar 4, 2016 at 3:52 PM, ayan guha <guha.ayan@gmail.com> wrote:
>
>> Most likely you are missing import of  org.apache.spark.sql.functions.
>>
>> In any case, you can write your own function for floor and use it as UDF.
>>
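
Ayan's fallback can be sketched as plain arithmetic plus a registration call: the helper below is ordinary Java (only the commented registration line, shown roughly in the Spark 1.x `udf().register` style, needs Spark; the names are illustrative):

```java
// Sketch of a hand-rolled floor-style bucketing function that could be
// registered as a Spark SQL UDF, per the suggestion above.
public class FloorUdfSketch {
    // Integer floor of x / divisor, e.g. a millisecond timestamp
    // bucketed into fixed-width windows.
    public static long floorDiv(long x, long divisor) {
        return Math.floorDiv(x, divisor);
    }

    public static void main(String[] args) {
        // With a Spark context available, registration would look roughly like:
        // sqlContext.udf().register("myFloor",
        //     (UDF1<Long, Long>) x -> floorDiv(x, 72000L), DataTypes.LongType);
        System.out.println(floorDiv(1457094914000L, 72000L));
    }
}
```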
>> On Fri, Mar 4, 2016 at 7:34 PM, ashokkumar rajendran <
>> ashokkumar.rajendran@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I load a JSON file that has a timestamp (as a long, in milliseconds) and
>>> several other attributes. I would like to group records into 5-minute
>>> buckets and store each bucket as a separate file.
>>>
>>> I am facing a couple of problems here:
>>> 1. Using the floor function in the select clause (to bucket by 5
>>> minutes) gives me the error "java.util.NoSuchElementException: key not
>>> found: floor". How do I use the floor function in a select clause? I see
>>> that a floor method is available in org.apache.spark.sql.functions, but
>>> I am not sure why it is not working here.
>>> 2. Can I use the same in a group by clause?
>>> 3. How do I store each group as a separate file after grouping?
>>>
>>>         String logPath = "my-json.gz";
>>>         DataFrame logdf = sqlContext.read().json(logPath);
>>>         logdf.registerTempTable("logs");
>>>         DataFrame bucketLogs = sqlContext.sql("Select `user.timestamp`
>>> as rawTimeStamp, `user.requestId` as requestId,
>>> floor(`user.timestamp`/72000) as timeBucket FROM logs");
>>>         bucketLogs.toJSON().saveAsTextFile("target_file");
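
One side note on the arithmetic in the query above: since the thread says the timestamps are in milliseconds, a 5-minute bucket is 5 * 60 * 1000 = 300000 ms, not 72000. A quick pure-Java check (the example timestamp is made up):

```java
// Check 5-minute bucketing of millisecond timestamps.
// Two timestamps 200 s apart that fall in the same window should
// produce the same bucket number.
public class BucketCheck {
    static final long FIVE_MIN_MS = 5L * 60L * 1000L; // 300000 ms

    static long bucket(long tsMillis) {
        return Math.floorDiv(tsMillis, FIVE_MIN_MS);
    }

    public static void main(String[] args) {
        long t1 = 1457094914000L;    // an arbitrary example timestamp
        long t2 = t1 + 200_000L;     // 200 seconds later
        System.out.println(bucket(t1));
        System.out.println(bucket(t1) == bucket(t2));
    }
}
```

For question 3, one option (assuming Spark 1.4 or later) is the DataFrameWriter API: something like `bucketLogs.write().partitionBy("timeBucket").json("target_dir")` writes one subdirectory per bucket value instead of one flat file.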
>>>
>>> Regards
>>> Ashok
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>
