spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Deepak Sharma <deepakmc...@gmail.com>
Subject Re: Spark Dataframe and HIVE
Date Sun, 11 Feb 2018 05:40:59 GMT
I am not sure about the exact issue bjt i see you are partioning while
writing from spark.
Did you tried msck repair on the table before reading it in hive ?

Thanks
Deepak

On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" <ravishankar.nair@gmail.com>
wrote:

> All,
>
> Thanks for the inputs. Again I am not successful. I think, we need to
> resolve this, as this is a very common requirement.
>
> Please go through my complete code:
>
> STEP 1:  Started Spark shell as spark-shell --master yarn
>
> STEP 2: Flowing code is being given as inout to shark shell
>
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.SparkSession
> val warehouseLocation ="/user/hive/warehouse"
>
> val spark = SparkSession.builder().appName("Spark Hive
> Example").config("spark.sql.warehouse.dir", warehouseLocation).
> enableHiveSupport().getOrCreate()
>
> import org.apache.spark.sql._
> var passion_df = spark.read.
> format("jdbc").
> option("url", "jdbc:mysql://localhost:3307/policies").
> option("driver" ,"com.mysql.jdbc.Driver").
> option("user", "root").
> option("password", "root").
> option("dbtable", "insurancedetails").
> option("partitionColumn", "policyid").
> option("lowerBound", "1").
> option("upperBound", "100000").
> option("numPartitions", "4").
> load()
> //Made sure that passion_df is created, as passion_df.show(5) shows me
> correct data.
> passion_df.write.saveAsTable("default.mine") //Default parquet
>
> STEP 3: Went to HIVE. Started HIVE prompt.
>
> hive> show tables;
> OK
> callcentervoicelogs
> mine
> Time taken: 0.035 seconds, Fetched: 2 row(s)
> //As you can see HIVE is showing the table "mine" in default schema.
>
> STEP 4: HERE IS THE PROBLEM.
>
> hive> select * from mine;
> OK
> Time taken: 0.354 seconds
> hive>
> //Where is the data ???
>
> STEP 5:
>
> See the below command on HIVE
>
> describe formatted mine;
> OK
> # col_name             data_type           comment
>
> policyid             int
> statecode           string
> socialid             string
> county               string
> eq_site_limit       decimal(10,2)
> hu_site_limit       decimal(10,2)
> fl_site_limit       decimal(10,2)
> fr_site_limit       decimal(10,2)
> tiv_2014             decimal(10,2)
> tiv_2015             decimal(10,2)
> eq_site_deductible   int
> hu_site_deductible   int
> fl_site_deductible   int
> fr_site_deductible   int
> latitude             decimal(6,6)
> longitude           decimal(6,6)
> line                 string
> construction         string
> point_granularity   int
>
> # Detailed Table Information
> Database:           default
> Owner:               ravishankarnair
> CreateTime:         Sun Feb 11 00:26:40 EST 2018
> LastAccessTime:     UNKNOWN
> Protect Mode:       None
> Retention:           0
> Location:           file:/Users/ravishankarnair/spark-warehouse/mine
> Table Type:         MANAGED_TABLE
> Table Parameters:
> spark.sql.sources.provider parquet
> spark.sql.sources.schema.numParts 1
> spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\
> ":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullable\
> ":true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"
> name\":\"statecode\",\"type\":\"string\",\"nullable\":true,\
> "metadata\":{\"name\":\"statecode\",\"scale\":0}},{\"
> name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\"
> name\":\"county\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\"
> eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
> {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"
> nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"
> scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\"
> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
> fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\"
> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{
> \"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_
> 2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
> metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\"
> :\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\"
> name\":\"eq_site_deductible\",\"type\":\"integer\",\"
> nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\
> ",\"scale\":0}},{\"name\":\"hu_site_deductible\",\"type\":
> \"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_
> site_deductible\",\"scale\":0}},{\"name\":\"fl_site_
> deductible\",\"type\":\"integer\",\"nullable\":true,\"
> metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}
> },{\"name\":\"fr_site_deductible\",\"type\":\"
> integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_
> site_deductible\",\"scale\":0}},{\"name\":\"latitude\",\"
> type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"latitude\",\"scale\":6}},{\"name\":\"longitude\",
> \"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"longitude\",\"scale\":6}},{\"name\":\"line\"
> ,\"type\":\"string\",\"nullable\":true,\"metadata\":{
> \"name\":\"line\",\"scale\":0}},{\"name\":\"construction\",\
> "type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":
> \"construction\",\"scale\":0}},{\"name\":\"point_granularity\",\"type\":\"
> integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_
> granularity\",\"scale\":0}}]}
> transient_lastDdlTime 1518326800
>
> # Storage Information
> SerDe Library:       org.apache.hadoop.hive.ql.io.
> parquet.serde.ParquetHiveSerDe
> InputFormat:         org.apache.hadoop.hive.ql.io.parquet.
> MapredParquetInputFormat
> OutputFormat:       org.apache.hadoop.hive.ql.io.parquet.
> MapredParquetOutputFormat
> Compressed:         No
> Num Buckets:         -1
> Bucket Columns:     []
> Sort Columns:       []
> Storage Desc Params:
> path                 hdfs://localhost:8020/user/hive/warehouse/mine
> serialization.format 1
> Time taken: 0.077 seconds, Fetched: 48 row(s)
>
> Now, I see your advise and support. Whats the issue? Am I doing wrong, it
> it a bug ? I am using Spark 2.2.1, HIVE 1.2.1, HADOOP 2.7.3. All class
> path, configuration are set properly.
>
> Best,
>
> Ravion
>
> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian <nicholas.hakobian@
> rallyhealth.com> wrote:
>
>> Its possible that the format of your table is not compatible with your
>> version of hive, so Spark saved it in a way such that only Spark can read
>> it. When this happens it prints out a very visible warning letting you know
>> this has happened.
>>
>> We've seen it most frequently when trying to save a parquet file with a
>> column in date format into a Hive table. In older versions of hive, its
>> parquet reader/writer did not support Date formats (among a couple others).
>>
>> Nicholas Szandor Hakobian, Ph.D.
>> Staff Data Scientist
>> Rally Health
>> nicholas.hakobian@rallyhealth.com
>>
>>
>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi <prakashcjoshi5@gmail.com>
>> wrote:
>>
>>> Ravi,
>>>
>>> Can you send the result of
>>> Show create table your_table_name
>>>
>>> Thanks
>>> Prakash
>>>
>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)" <
>>> ravishankar.nair@gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> It has been three days continuously I am on this issue. Not getting any
>>>> clue.
>>>>
>>>> Environment: Spark 2.2.x, all configurations are correct. hive-site.xml
>>>> is in spark's conf.
>>>>
>>>> 1) Step 1: I created a data frame DF1 reading a csv file.
>>>>
>>>> 2) Did  manipulations on DF1. Resulting frame is passion_df.
>>>>
>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>
>>>> 4) The metastore shows the hive table., when I do "show tables" in
>>>> HIVE, I can see table name
>>>>
>>>> 5) I can't select in HIVE, though I can select from SPARK as
>>>> spark.sql("select * from sampledb.passion")
>>>>
>>>> Whats going on here? Please help. Why I am not seeing data from HIVE
>>>> prompt?
>>>> The "describe formatted " command on the table in HIVE shows he data is
>>>> is in default warehouse location ( /user/hive/warehouse) since I set it.
>>>>
>>>> I am not getting any definite answer anywhere. Many suggestions and
>>>> answers given in Stackoverflow et al.Nothing really works.
>>>>
>>>> So asking experts here for some light on this, thanks
>>>>
>>>> Best,
>>>> Ravion
>>>>
>>>>
>>>>
>>
>
>
> --
>
>

Mime
View raw message