spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rishikesh Gawade <rishikeshg1...@gmail.com>
Subject Re: Hive external table not working in sparkSQL when subdirectories are present
Date Wed, 07 Aug 2019 19:14:41 GMT
Hi,
I did not explicitly create a Hive Context. I have been using the
spark.sqlContext that gets created upon launching the spark-shell.
Isn't this sqlContext same as the hiveContext?
Thanks,
Rishikesh

On Wed, Aug 7, 2019 at 12:43 PM Jörn Franke <jornfranke@gmail.com> wrote:

> Do you use the HiveContext in Spark? Do you configure the same options
> there? Can you share some code?
>
> Am 07.08.2019 um 08:50 schrieb Rishikesh Gawade <rishikeshg1996@gmail.com
> >:
>
> Hi.
> I am using Spark 2.3.2 and Hive 3.1.0.
> Even if i use parquet files the result would be same, because after all
> sparkSQL isn't able to descend into the subdirectories over which the table
> is created. Could there be any other way?
> Thanks,
> Rishikesh
>
> On Tue, Aug 6, 2019, 1:03 PM Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> which versions of Spark and Hive are you using.
>>
>> what will happen if you use parquet tables instead?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 6 Aug 2019 at 07:58, Rishikesh Gawade <rishikeshg1996@gmail.com>
>> wrote:
>>
>>> Hi.
>>> I have built a Hive external table on top of a directory 'A' which has
>>> data stored in ORC format. This directory has several subdirectories inside
>>> it, each of which contains the actual ORC files.
>>> These subdirectories are actually created by spark jobs which ingest
>>> data from other sources and write it into this directory.
>>> I tried creating a table and setting the table properties of the same as
>>> *hive.mapred.supports.subdirectories=TRUE* and
>>> *mapred.input.dir.recursive**=TRUE*.
>>> As a result of this, when i fire the simplest query of *select count(*)
>>> from ExtTable* via the Hive CLI, it successfully gives me the expected
>>> count of records in the table.
>>> However, when i fire the same query via sparkSQL, i get count = 0.
>>>
>>> I think the sparkSQL isn't able to descend into the subdirectories for
>>> getting the data while hive is able to do so.
>>> Are there any configurations needed to be set on the spark side so that
>>> this works as it does via hive cli?
>>> I am using Spark on YARN.
>>>
>>> Thanks,
>>> Rishikesh
>>>
>>> Tags: subdirectories, subdirectory, recursive, recursion, hive external
>>> table, orc, sparksql, yarn
>>>
>>

Mime
View raw message