spark-user mailing list archives

From Shmuel Blitz <shmuel.bl...@similarweb.com>
Subject Re: Spark Dataframe and HIVE
Date Sun, 11 Feb 2018 07:36:23 GMT
Please run the following command, and paste the result:
SHOW CREATE TABLE <<TABLE-NAME>>

On Sun, Feb 11, 2018 at 7:56 AM, ☼ R Nair (रविशंकर नायर) <
ravishankar.nair@gmail.com> wrote:

> No, No luck.
>
> Thanks
>
> On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma <deepakmca05@gmail.com>
> wrote:
>
>> In hive cli:
>> msck repair table <table_name>;
>>
>> Thanks
>> Deepak
>>
>> On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" <
>> ravishankar.nair@gmail.com> wrote:
>>
>>> No, can you please explain the command? Let me try now.
>>>
>>> Best,
>>>
>>> On Sun, Feb 11, 2018 at 12:40 AM, Deepak Sharma <deepakmca05@gmail.com>
>>> wrote:
>>>
>>>> I am not sure about the exact issue, but I see you are partitioning
>>>> while writing from Spark.
>>>> Did you try msck repair on the table before reading it in Hive?
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>> On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" <
>>>> ravishankar.nair@gmail.com> wrote:
>>>>
>>>>> All,
>>>>>
>>>>> Thanks for the inputs. Again, I am not successful. I think we need
>>>>> to resolve this, as this is a very common requirement.
>>>>>
>>>>> Please go through my complete code:
>>>>>
>>>>> STEP 1:  Started Spark shell as spark-shell --master yarn
>>>>>
>>>>> STEP 2: The following code is given as input to the Spark shell
>>>>>
>>>>> import org.apache.spark.sql.Row
>>>>> import org.apache.spark.sql.SparkSession
>>>>> val warehouseLocation = "/user/hive/warehouse"
>>>>>
>>>>> val spark = SparkSession.builder().appName("Spark Hive
>>>>> Example").config("spark.sql.warehouse.dir",
>>>>> warehouseLocation).enableHiveSupport().getOrCreate()
>>>>>
>>>>> import org.apache.spark.sql._
>>>>> val passion_df = spark.read.
>>>>> format("jdbc").
>>>>> option("url", "jdbc:mysql://localhost:3307/policies").
>>>>> option("driver" ,"com.mysql.jdbc.Driver").
>>>>> option("user", "root").
>>>>> option("password", "root").
>>>>> option("dbtable", "insurancedetails").
>>>>> option("partitionColumn", "policyid").
>>>>> option("lowerBound", "1").
>>>>> option("upperBound", "100000").
>>>>> option("numPartitions", "4").
>>>>> load()
>>>>> //Made sure that passion_df is created, as passion_df.show(5) shows me
>>>>> correct data.
>>>>> passion_df.write.saveAsTable("default.mine") //Default parquet
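As an aside, the four partitioning options in the read above control how Spark splits the JDBC scan into parallel range queries on policyid. A rough pure-Scala sketch of the range arithmetic, for illustration only (an approximation, not Spark's exact implementation):

```scala
// Approximate sketch of how lowerBound/upperBound/numPartitions split the
// JDBC read on partitionColumn into parallel WHERE-clause ranges.
val lower = 1L
val upper = 100000L
val numPartitions = 4
val stride = (upper - lower) / numPartitions  // 24999 here

val predicates = (0 until numPartitions).map { i =>
  val lo = lower + i * stride
  if (i == numPartitions - 1) s"policyid >= $lo" // last partition is open-ended
  else s"policyid >= $lo AND policyid < ${lo + stride}"
}
predicates.foreach(println)
```

Note that the bounds do not filter rows: policyid values outside [1, 100000] still land in the first or last partition's query.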
>>>>>
>>>>> STEP 3: Went to HIVE. Started HIVE prompt.
>>>>>
>>>>> hive> show tables;
>>>>> OK
>>>>> callcentervoicelogs
>>>>> mine
>>>>> Time taken: 0.035 seconds, Fetched: 2 row(s)
>>>>> //As you can see HIVE is showing the table "mine" in default schema.
>>>>>
>>>>> STEP 4: HERE IS THE PROBLEM.
>>>>>
>>>>> hive> select * from mine;
>>>>> OK
>>>>> Time taken: 0.354 seconds
>>>>> hive>
>>>>> //Where is the data ???
>>>>>
>>>>> STEP 5:
>>>>>
>>>>> See the below command on HIVE
>>>>>
>>>>> describe formatted mine;
>>>>> OK
>>>>> # col_name             data_type           comment
>>>>>
>>>>> policyid             int
>>>>> statecode           string
>>>>> socialid             string
>>>>> county               string
>>>>> eq_site_limit       decimal(10,2)
>>>>> hu_site_limit       decimal(10,2)
>>>>> fl_site_limit       decimal(10,2)
>>>>> fr_site_limit       decimal(10,2)
>>>>> tiv_2014             decimal(10,2)
>>>>> tiv_2015             decimal(10,2)
>>>>> eq_site_deductible   int
>>>>> hu_site_deductible   int
>>>>> fl_site_deductible   int
>>>>> fr_site_deductible   int
>>>>> latitude             decimal(6,6)
>>>>> longitude           decimal(6,6)
>>>>> line                 string
>>>>> construction         string
>>>>> point_granularity   int
>>>>>
>>>>> # Detailed Table Information
>>>>> Database:           default
>>>>> Owner:               ravishankarnair
>>>>> CreateTime:         Sun Feb 11 00:26:40 EST 2018
>>>>> LastAccessTime:     UNKNOWN
>>>>> Protect Mode:       None
>>>>> Retention:           0
>>>>> Location:           file:/Users/ravishankarnair/spark-warehouse/mine
>>>>> Table Type:         MANAGED_TABLE
>>>>> Table Parameters:
>>>>> spark.sql.sources.provider parquet
>>>>> spark.sql.sources.schema.numParts 1
>>>>> spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\
>>>>> ":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullable\":
>>>>> true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"na
>>>>> me\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"m
>>>>> etadata\":{\"name\":\"statecode\",\"scale\":0}},{\"name\":\"
>>>>> Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\
>>>>> ":{\"name\":\"Socialid\",\"scale\":0}},{\"name\":\"county\",
>>>>> \"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\"
>>>>> :\"county\",\"scale\":0}},{\"name\":\"eq_site_limit\",\"type
>>>>> \":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\
>>>>> ":\"eq_site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\
>>>>> ",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":
>>>>> {\"name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_
>>>>> site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
>>>>> true,\"metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},
>>>>> {\"name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"
>>>>> nullable\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"
>>>>> scale\":2}},{\"name\":\"tiv_2014\",\"type\":\"decimal(10,
>>>>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_
>>>>> 2014\",\"scale\":2}},{\"name\":\"tiv_2015\",\"type\":\"
>>>>> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"t
>>>>> iv_2015\",\"scale\":2}},{\"name\":\"eq_site_deductible\",\"
>>>>> type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":
>>>>> \"eq_site_deductible\",\"scale\":0}},{\"name\":\"hu_
>>>>> site_deductible\",\"type\":\"integer\",\"nullable\":true,\"
>>>>> metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\
>>>>> "name\":\"fl_site_deductible\",\"type\":\"integer\",\"
>>>>> nullable\":true,\"metadata\":{\"name\":\"fl_site_deductible\
>>>>> ",\"scale\":0}},{\"name\":\"fr_site_deductible\",\"type\":
>>>>> \"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_
>>>>> site_deductible\",\"scale\":0}},{\"name\":\"latitude\",\"
>>>>> type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
>>>>> \"name\":\"latitude\",\"scale\":6}},{\"name\":\"longitude\",
>>>>> \"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
>>>>> \"name\":\"longitude\",\"scale\":6}},{\"name\":\"line\"
>>>>> ,\"type\":\"string\",\"nullable\":true,\"metadata\":{
>>>>> \"name\":\"line\",\"scale\":0}},{\"name\":\"construction\",\
>>>>> "type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":
>>>>> \"construction\",\"scale\":0}},{\"name\":\"point_granularity
>>>>> \",\"type\":\"integer\",\"nullable\":true,\"metadata\":{
>>>>> \"name\":\"point_granularity\",\"scale\":0}}]}
>>>>> transient_lastDdlTime 1518326800
>>>>>
>>>>> # Storage Information
>>>>> SerDe Library:       org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
>>>>> InputFormat:         org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
>>>>> OutputFormat:       org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
>>>>> Compressed:         No
>>>>> Num Buckets:         -1
>>>>> Bucket Columns:     []
>>>>> Sort Columns:       []
>>>>> Storage Desc Params:
>>>>> path                 hdfs://localhost:8020/user/hive/warehouse/mine
>>>>> serialization.format 1
>>>>> Time taken: 0.077 seconds, Fetched: 48 row(s)
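One detail worth noting in the output above: Hive reports the table Location as file:/Users/ravishankarnair/spark-warehouse/mine (the local filesystem), while the path storage parameter points at hdfs://localhost:8020/user/hive/warehouse/mine. Back in the same spark-shell session, a quick sketch to see where Spark thinks the table lives and whether it can still read it:

```scala
// In the spark-shell session from STEP 2: check Spark's view of the table.
spark.sql("DESCRIBE EXTENDED default.mine").show(100, false)

// If this prints a non-zero count, Spark can read rows that Hive cannot see,
// which points at a metadata (location/format) mismatch rather than lost data.
println(spark.table("default.mine").count())
```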
>>>>>
>>>>> Now I seek your advice and support. What is the issue? Am I doing
>>>>> something wrong, or is it a bug? I am using Spark 2.2.1, Hive 1.2.1,
>>>>> Hadoop 2.7.3. All classpaths and configuration are set properly.
>>>>>
>>>>> Best,
>>>>>
>>>>> Ravion
>>>>>
>>>>> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian <
>>>>> nicholas.hakobian@rallyhealth.com> wrote:
>>>>>
>>>>>> It's possible that the format of your table is not compatible with
>>>>>> your version of Hive, so Spark saved it in a way such that only
>>>>>> Spark can read it. When this happens, it prints out a very visible
>>>>>> warning letting you know this has happened.
>>>>>>
>>>>>> We've seen it most frequently when trying to save a parquet file
>>>>>> with a column in date format into a Hive table. In older versions
>>>>>> of Hive, its parquet reader/writer did not support Date formats
>>>>>> (among a couple of others).
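For what it's worth, one workaround for that Date incompatibility is to cast such columns before saving. A hypothetical sketch; the column name below is invented for illustration:

```scala
// Hypothetical sketch: cast a DATE column to string so older Hive parquet
// readers can handle it. "policy_date" is a made-up column name.
import org.apache.spark.sql.functions.col

val hiveSafe = passion_df.withColumn("policy_date", col("policy_date").cast("string"))
hiveSafe.write.saveAsTable("default.mine_safe")
```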
>>>>>>
>>>>>> Nicholas Szandor Hakobian, Ph.D.
>>>>>> Staff Data Scientist
>>>>>> Rally Health
>>>>>> nicholas.hakobian@rallyhealth.com
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi <
>>>>>> prakashcjoshi5@gmail.com> wrote:
>>>>>>
>>>>>>> Ravi,
>>>>>>>
>>>>>>> Can you send the result of
>>>>>>> Show create table your_table_name
>>>>>>>
>>>>>>> Thanks
>>>>>>> Prakash
>>>>>>>
>>>>>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)" <
>>>>>>> ravishankar.nair@gmail.com> wrote:
>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> It has been three days continuously that I am on this issue, and
>>>>>>>> I am not getting any clue.
>>>>>>>>
>>>>>>>> Environment: Spark 2.2.x, all configurations are correct.
>>>>>>>> hive-site.xml is in spark's conf.
>>>>>>>>
>>>>>>>> 1) I created a data frame DF1 by reading a CSV file.
>>>>>>>>
>>>>>>>> 2) Did manipulations on DF1. The resulting frame is passion_df.
>>>>>>>>
>>>>>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>>>>>
>>>>>>>> 4) The metastore shows the Hive table; when I do "show tables" in
>>>>>>>> HIVE, I can see the table name.
>>>>>>>>
>>>>>>>> 5) I can't select it in HIVE, though I can select it from Spark as
>>>>>>>> spark.sql("select * from sampledb.passion")
>>>>>>>>
>>>>>>>> What's going on here? Please help. Why am I not seeing the data
>>>>>>>> from the HIVE prompt?
>>>>>>>> The "describe formatted" command on the table in HIVE shows the
>>>>>>>> data is in the default warehouse location (/user/hive/warehouse),
>>>>>>>> since I set it.
>>>>>>>>
>>>>>>>> I am not getting any definite answer anywhere. Many suggestions and
>>>>>>>> answers are given on Stack Overflow et al. Nothing really works.
>>>>>>>>
>>>>>>>> So asking experts here for some light on this, thanks
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Ravion
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>>


-- 
Shmuel Blitz
Big Data Developer
Email: shmuel.blitz@similarweb.com
www.similarweb.com
