spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ☼ R Nair (रविशंकर नायर) <ravishankar.n...@gmail.com>
Subject Re: Spark Dataframe and HIVE
Date Sun, 11 Feb 2018 05:56:13 GMT
No, No luck.

Thanks

On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma <deepakmca05@gmail.com>
wrote:

> In hive cli:
> msck repair table 《table_name》;
>
> Thanks
> Deepak
>
> On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" <
> ravishankar.nair@gmail.com> wrote:
>
>> NO, can you pease explain the command ? Let me try now.
>>
>> Best,
>>
>> On Sun, Feb 11, 2018 at 12:40 AM, Deepak Sharma <deepakmca05@gmail.com>
>> wrote:
>>
>>> I am not sure about the exact issue bjt i see you are partioning while
>>> writing from spark.
>>> Did you tried msck repair on the table before reading it in hive ?
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" <
>>> ravishankar.nair@gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> Thanks for the inputs. Again I am not successful. I think, we need to
>>>> resolve this, as this is a very common requirement.
>>>>
>>>> Please go through my complete code:
>>>>
>>>> STEP 1:  Started Spark shell as spark-shell --master yarn
>>>>
>>>> STEP 2: Flowing code is being given as inout to shark shell
>>>>
>>>> import org.apache.spark.sql.Row
>>>> import org.apache.spark.sql.SparkSession
>>>> val warehouseLocation ="/user/hive/warehouse"
>>>>
>>>> val spark = SparkSession.builder().appName("Spark Hive
>>>> Example").config("spark.sql.warehouse.dir",
>>>> warehouseLocation).enableHiveSupport().getOrCreate()
>>>>
>>>> import org.apache.spark.sql._
>>>> var passion_df = spark.read.
>>>> format("jdbc").
>>>> option("url", "jdbc:mysql://localhost:3307/policies").
>>>> option("driver" ,"com.mysql.jdbc.Driver").
>>>> option("user", "root").
>>>> option("password", "root").
>>>> option("dbtable", "insurancedetails").
>>>> option("partitionColumn", "policyid").
>>>> option("lowerBound", "1").
>>>> option("upperBound", "100000").
>>>> option("numPartitions", "4").
>>>> load()
>>>> //Made sure that passion_df is created, as passion_df.show(5) shows me
>>>> correct data.
>>>> passion_df.write.saveAsTable("default.mine") //Default parquet
>>>>
>>>> STEP 3: Went to HIVE. Started HIVE prompt.
>>>>
>>>> hive> show tables;
>>>> OK
>>>> callcentervoicelogs
>>>> mine
>>>> Time taken: 0.035 seconds, Fetched: 2 row(s)
>>>> //As you can see HIVE is showing the table "mine" in default schema.
>>>>
>>>> STEP 4: HERE IS THE PROBLEM.
>>>>
>>>> hive> select * from mine;
>>>> OK
>>>> Time taken: 0.354 seconds
>>>> hive>
>>>> //Where is the data ???
>>>>
>>>> STEP 5:
>>>>
>>>> See the below command on HIVE
>>>>
>>>> describe formatted mine;
>>>> OK
>>>> # col_name             data_type           comment
>>>>
>>>> policyid             int
>>>> statecode           string
>>>> socialid             string
>>>> county               string
>>>> eq_site_limit       decimal(10,2)
>>>> hu_site_limit       decimal(10,2)
>>>> fl_site_limit       decimal(10,2)
>>>> fr_site_limit       decimal(10,2)
>>>> tiv_2014             decimal(10,2)
>>>> tiv_2015             decimal(10,2)
>>>> eq_site_deductible   int
>>>> hu_site_deductible   int
>>>> fl_site_deductible   int
>>>> fr_site_deductible   int
>>>> latitude             decimal(6,6)
>>>> longitude           decimal(6,6)
>>>> line                 string
>>>> construction         string
>>>> point_granularity   int
>>>>
>>>> # Detailed Table Information
>>>> Database:           default
>>>> Owner:               ravishankarnair
>>>> CreateTime:         Sun Feb 11 00:26:40 EST 2018
>>>> LastAccessTime:     UNKNOWN
>>>> Protect Mode:       None
>>>> Retention:           0
>>>> Location:           file:/Users/ravishankarnair/spark-warehouse/mine
>>>> Table Type:         MANAGED_TABLE
>>>> Table Parameters:
>>>> spark.sql.sources.provider parquet
>>>> spark.sql.sources.schema.numParts 1
>>>> spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\
>>>> ":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullable\":
>>>> true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"na
>>>> me\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"m
>>>> etadata\":{\"name\":\"statecode\",\"scale\":0}},{\"name\":\"
>>>> Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\
>>>> ":{\"name\":\"Socialid\",\"scale\":0}},{\"name\":\"county\",
>>>> \"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\"
>>>> :\"county\",\"scale\":0}},{\"name\":\"eq_site_limit\",\"
>>>> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{
>>>> \"name\":\"eq_site_limit\",\"scale\":2}},{\"name\":\"hu_site
>>>> _limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
>>>> metadata\":{\"name\":\"hu_site_limit\",\"scale\":2}},{\"
>>>> name\":\"fl_site_limit\",\"type\":\"decimal(10,2)\",\"nullab
>>>> le\":true,\"metadata\":{\"name\":\"fl_site_limit\",\"sca
>>>> le\":2}},{\"name\":\"fr_site_limit\",\"type\":\"decimal(10,
>>>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"fr_
>>>> site_limit\",\"scale\":2}},{\"name\":\"tiv_2014\",\"type\":\
>>>> "decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
>>>> tiv_2014\",\"scale\":2}},{\"name\":\"tiv_2015\",\"type\":\
>>>> "decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
>>>> tiv_2015\",\"scale\":2}},{\"name\":\"eq_site_deductible\",
>>>> \"type\":\"integer\",\"nullable\":true,\"metadata\":{
>>>> \"name\":\"eq_site_deductible\",\"scale\":0}},{\"name\":\"
>>>> hu_site_deductible\",\"type\":\"integer\",\"nullable\":true,
>>>> \"metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}
>>>> },{\"name\":\"fl_site_deductible\",\"type\":\"
>>>> integer\",\"nullable\":true,\"metadata\":{\"name\":\"fl_
>>>> site_deductible\",\"scale\":0}},{\"name\":\"fr_site_
>>>> deductible\",\"type\":\"integer\",\"nullable\":true,\"metada
>>>> ta\":{\"name\":\"fr_site_deductible\",\"scale\":0}},{\"
>>>> name\":\"latitude\",\"type\":\"decimal(6,6)\",\"nullable\":t
>>>> rue,\"metadata\":{\"name\":\"latitude\",\"scale\":6}},{\"nam
>>>> e\":\"longitude\",\"type\":\"decimal(6,6)\",\"nullable\":tru
>>>> e,\"metadata\":{\"name\":\"longitude\",\"scale\":6}},{\"name
>>>> \":\"line\",\"type\":\"string\",\"nullable\":true,\"metadata
>>>> \":{\"name\":\"line\",\"scale\":0}},{\"name\":\"construction
>>>> \",\"type\":\"string\",\"nullable\":true,\"metadata\":{
>>>> \"name\":\"construction\",\"scale\":0}},{\"name\":\"point_gr
>>>> anularity\",\"type\":\"integer\",\"nullable\":true,\"metadat
>>>> a\":{\"name\":\"point_granularity\",\"scale\":0}}]}
>>>> transient_lastDdlTime 1518326800
>>>>
>>>> # Storage Information
>>>> SerDe Library:       org.apache.hadoop.hive.ql.io.p
>>>> arquet.serde.ParquetHiveSerDe
>>>> InputFormat:         org.apache.hadoop.hive.ql.io.p
>>>> arquet.MapredParquetInputFormat
>>>> OutputFormat:       org.apache.hadoop.hive.ql.io.p
>>>> arquet.MapredParquetOutputFormat
>>>> Compressed:         No
>>>> Num Buckets:         -1
>>>> Bucket Columns:     []
>>>> Sort Columns:       []
>>>> Storage Desc Params:
>>>> path                 hdfs://localhost:8020/user/hive/warehouse/mine
>>>> serialization.format 1
>>>> Time taken: 0.077 seconds, Fetched: 48 row(s)
>>>>
>>>> Now, I see your advise and support. Whats the issue? Am I doing wrong,
>>>> it it a bug ? I am using Spark 2.2.1, HIVE 1.2.1, HADOOP 2.7.3. All class
>>>> path, configuration are set properly.
>>>>
>>>> Best,
>>>>
>>>> Ravion
>>>>
>>>> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian <
>>>> nicholas.hakobian@rallyhealth.com> wrote:
>>>>
>>>>> Its possible that the format of your table is not compatible with your
>>>>> version of hive, so Spark saved it in a way such that only Spark can
read
>>>>> it. When this happens it prints out a very visible warning letting you
know
>>>>> this has happened.
>>>>>
>>>>> We've seen it most frequently when trying to save a parquet file with
>>>>> a column in date format into a Hive table. In older versions of hive,
its
>>>>> parquet reader/writer did not support Date formats (among a couple others).
>>>>>
>>>>> Nicholas Szandor Hakobian, Ph.D.
>>>>> Staff Data Scientist
>>>>> Rally Health
>>>>> nicholas.hakobian@rallyhealth.com
>>>>>
>>>>>
>>>>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi <
>>>>> prakashcjoshi5@gmail.com> wrote:
>>>>>
>>>>>> Ravi,
>>>>>>
>>>>>> Can you send the result of
>>>>>> Show create table your_table_name
>>>>>>
>>>>>> Thanks
>>>>>> Prakash
>>>>>>
>>>>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)"
<
>>>>>> ravishankar.nair@gmail.com> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> It has been three days continuously I am on this issue. Not getting
>>>>>>> any clue.
>>>>>>>
>>>>>>> Environment: Spark 2.2.x, all configurations are correct.
>>>>>>> hive-site.xml is in spark's conf.
>>>>>>>
>>>>>>> 1) Step 1: I created a data frame DF1 reading a csv file.
>>>>>>>
>>>>>>> 2) Did  manipulations on DF1. Resulting frame is passion_df.
>>>>>>>
>>>>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>>>>
>>>>>>> 4) The metastore shows the hive table., when I do "show tables"
in
>>>>>>> HIVE, I can see table name
>>>>>>>
>>>>>>> 5) I can't select in HIVE, though I can select from SPARK as
>>>>>>> spark.sql("select * from sampledb.passion")
>>>>>>>
>>>>>>> Whats going on here? Please help. Why I am not seeing data from
HIVE
>>>>>>> prompt?
>>>>>>> The "describe formatted " command on the table in HIVE shows
he data
>>>>>>> is is in default warehouse location ( /user/hive/warehouse) since
I set it.
>>>>>>>
>>>>>>> I am not getting any definite answer anywhere. Many suggestions
and
>>>>>>> answers given in Stackoverflow et al.Nothing really works.
>>>>>>>
>>>>>>> So asking experts here for some light on this, thanks
>>>>>>>
>>>>>>> Best,
>>>>>>> Ravion
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>
>>
>> --
>>
>>


--

Mime
View raw message