spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Deepak Sharma <deepakmc...@gmail.com>
Subject Re: Spark Dataframe and HIVE
Date Sun, 11 Feb 2018 08:08:03 GMT
Try this in hive:
alter table mine set locations "hdfs://localhost:8020/
user/hive/warehouse/mine";

Thanks
Deepak

On Sun, Feb 11, 2018 at 1:24 PM, ☼ R Nair (रविशंकर नायर) <
ravishankar.nair@gmail.com> wrote:

> Hi,
> Here you go:
>
> hive> show create table mine;
> OK
> CREATE TABLE `mine`(
>   `policyid` int,
>   `statecode` string,
>   `socialid` string,
>   `county` string,
>   `eq_site_limit` decimal(10,2),
>   `hu_site_limit` decimal(10,2),
>   `fl_site_limit` decimal(10,2),
>   `fr_site_limit` decimal(10,2),
>   `tiv_2014` decimal(10,2),
>   `tiv_2015` decimal(10,2),
>   `eq_site_deductible` int,
>   `hu_site_deductible` int,
>   `fl_site_deductible` int,
>   `fr_site_deductible` int,
>   `latitude` decimal(6,6),
>   `longitude` decimal(6,6),
>   `line` string,
>   `construction` string,
>   `point_granularity` int)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   'file:/Users/ravishankarnair/spark-warehouse/mine'
> TBLPROPERTIES (
>   'spark.sql.sources.provider'='parquet',
>   'spark.sql.sources.schema.numParts'='1',
>   'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",
> \"fields\":[{\"name\":\"policyid\",\"type\":\"integer\
> ",\"nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"
> name\":\"statecode\",\"type\":\"string\",\"nullable\":true,\
> "metadata\":{\"name\":\"statecode\",\"scale\":0}},{\"
> name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\"
> name\":\"county\",\"type\":\"string\",\"nullable\":true,\"
> metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\"
> eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
> {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"
> nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"
> scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\"
> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"
> fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\"
> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{
> \"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_
> 2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
> metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\"
> :\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":
> true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\"
> name\":\"eq_site_deductible\",\"type\":\"integer\",\"
> nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\
> ",\"scale\":0}},{\"name\":\"hu_site_deductible\",\"type\":
> \"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_
> site_deductible\",\"scale\":0}},{\"name\":\"fl_site_
> deductible\",\"type\":\"integer\",\"nullable\":true,\"
> metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}
> },{\"name\":\"fr_site_deductible\",\"type\":\"
> integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_
> site_deductible\",\"scale\":0}},{\"name\":\"latitude\",\"
> type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"latitude\",\"scale\":6}},{\"name\":\"longitude\",
> \"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{
> \"name\":\"longitude\",\"scale\":6}},{\"name\":\"line\"
> ,\"type\":\"string\",\"nullable\":true,\"metadata\":{
> \"name\":\"line\",\"scale\":0}},{\"name\":\"construction\",\
> "type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":
> \"construction\",\"scale\":0}},{\"name\":\"point_granularity\",\"type\":\"
> integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_
> granularity\",\"scale\":0}}]}',
>   'transient_lastDdlTime'='1518335598')
> Time taken: 0.13 seconds, Fetched: 35 row(s)
>
> On Sun, Feb 11, 2018 at 2:36 AM, Shmuel Blitz <shmuel.blitz@similarweb.com
> > wrote:
>
>> Please run the following command, and paste the result:
>> SHOW CREATE TABLE <<TABLE-NAME>>
>>
>> On Sun, Feb 11, 2018 at 7:56 AM, ☼ R Nair (रविशंकर नायर)
<
>> ravishankar.nair@gmail.com> wrote:
>>
>>> No, No luck.
>>>
>>> Thanks
>>>
>>> On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma <deepakmca05@gmail.com>
>>> wrote:
>>>
>>>> In hive cli:
>>>> msck repair table 《table_name》;
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>> On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)"
<
>>>> ravishankar.nair@gmail.com> wrote:
>>>>
>>>>> NO, can you pease explain the command ? Let me try now.
>>>>>
>>>>> Best,
>>>>>
>>>>> On Sun, Feb 11, 2018 at 12:40 AM, Deepak Sharma <deepakmca05@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> I am not sure about the exact issue bjt i see you are partioning
>>>>>> while writing from spark.
>>>>>> Did you tried msck repair on the table before reading it in hive
?
>>>>>>
>>>>>> Thanks
>>>>>> Deepak
>>>>>>
>>>>>> On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)"
<
>>>>>> ravishankar.nair@gmail.com> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> Thanks for the inputs. Again I am not successful. I think, we
need
>>>>>>> to resolve this, as this is a very common requirement.
>>>>>>>
>>>>>>> Please go through my complete code:
>>>>>>>
>>>>>>> STEP 1:  Started Spark shell as spark-shell --master yarn
>>>>>>>
>>>>>>> STEP 2: Flowing code is being given as inout to shark shell
>>>>>>>
>>>>>>> import org.apache.spark.sql.Row
>>>>>>> import org.apache.spark.sql.SparkSession
>>>>>>> val warehouseLocation ="/user/hive/warehouse"
>>>>>>>
>>>>>>> val spark = SparkSession.builder().appName("Spark Hive
>>>>>>> Example").config("spark.sql.warehouse.dir",
>>>>>>> warehouseLocation).enableHiveSupport().getOrCreate()
>>>>>>>
>>>>>>> import org.apache.spark.sql._
>>>>>>> var passion_df = spark.read.
>>>>>>> format("jdbc").
>>>>>>> option("url", "jdbc:mysql://localhost:3307/policies").
>>>>>>> option("driver" ,"com.mysql.jdbc.Driver").
>>>>>>> option("user", "root").
>>>>>>> option("password", "root").
>>>>>>> option("dbtable", "insurancedetails").
>>>>>>> option("partitionColumn", "policyid").
>>>>>>> option("lowerBound", "1").
>>>>>>> option("upperBound", "100000").
>>>>>>> option("numPartitions", "4").
>>>>>>> load()
>>>>>>> //Made sure that passion_df is created, as passion_df.show(5)
shows
>>>>>>> me correct data.
>>>>>>> passion_df.write.saveAsTable("default.mine") //Default parquet
>>>>>>>
>>>>>>> STEP 3: Went to HIVE. Started HIVE prompt.
>>>>>>>
>>>>>>> hive> show tables;
>>>>>>> OK
>>>>>>> callcentervoicelogs
>>>>>>> mine
>>>>>>> Time taken: 0.035 seconds, Fetched: 2 row(s)
>>>>>>> //As you can see HIVE is showing the table "mine" in default
schema.
>>>>>>>
>>>>>>> STEP 4: HERE IS THE PROBLEM.
>>>>>>>
>>>>>>> hive> select * from mine;
>>>>>>> OK
>>>>>>> Time taken: 0.354 seconds
>>>>>>> hive>
>>>>>>> //Where is the data ???
>>>>>>>
>>>>>>> STEP 5:
>>>>>>>
>>>>>>> See the below command on HIVE
>>>>>>>
>>>>>>> describe formatted mine;
>>>>>>> OK
>>>>>>> # col_name             data_type           comment
>>>>>>>
>>>>>>> policyid             int
>>>>>>> statecode           string
>>>>>>> socialid             string
>>>>>>> county               string
>>>>>>> eq_site_limit       decimal(10,2)
>>>>>>> hu_site_limit       decimal(10,2)
>>>>>>> fl_site_limit       decimal(10,2)
>>>>>>> fr_site_limit       decimal(10,2)
>>>>>>> tiv_2014             decimal(10,2)
>>>>>>> tiv_2015             decimal(10,2)
>>>>>>> eq_site_deductible   int
>>>>>>> hu_site_deductible   int
>>>>>>> fl_site_deductible   int
>>>>>>> fr_site_deductible   int
>>>>>>> latitude             decimal(6,6)
>>>>>>> longitude           decimal(6,6)
>>>>>>> line                 string
>>>>>>> construction         string
>>>>>>> point_granularity   int
>>>>>>>
>>>>>>> # Detailed Table Information
>>>>>>> Database:           default
>>>>>>> Owner:               ravishankarnair
>>>>>>> CreateTime:         Sun Feb 11 00:26:40 EST 2018
>>>>>>> LastAccessTime:     UNKNOWN
>>>>>>> Protect Mode:       None
>>>>>>> Retention:           0
>>>>>>> Location:           file:/Users/ravishankarnair/spark-warehouse/mine
>>>>>>>
>>>>>>> Table Type:         MANAGED_TABLE
>>>>>>> Table Parameters:
>>>>>>> spark.sql.sources.provider parquet
>>>>>>> spark.sql.sources.schema.numParts 1
>>>>>>> spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\
>>>>>>> ":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullable\":
>>>>>>> true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"na
>>>>>>> me\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"m
>>>>>>> etadata\":{\"name\":\"statecode\",\"scale\":0}},{\"name\":\"
>>>>>>> Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\
>>>>>>> ":{\"name\":\"Socialid\",\"scale\":0}},{\"name\":\"county\",
>>>>>>> \"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\"
>>>>>>> :\"county\",\"scale\":0}},{\"name\":\"eq_site_limit\",\"type
>>>>>>> \":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\
>>>>>>> ":\"eq_site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\
>>>>>>> ",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":
>>>>>>> {\"name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_sit
>>>>>>> e_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"met
>>>>>>> adata\":{\"name\":\"fl_site_limit\",\"scale\":2}},{\"name\
>>>>>>> ":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":
>>>>>>> true,\"metadata\":{\"name\":\"fr_site_limit\",\"scale\":2}},
>>>>>>> {\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\"
>>>>>>> nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"scale\
>>>>>>> ":2}},{\"name\":\"tiv_2015\",\"type\":\"decimal(10,2)\",\"
>>>>>>> nullable\":true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\
>>>>>>> ":2}},{\"name\":\"eq_site_deductible\",\"type\":\"
>>>>>>> integer\",\"nullable\":true,\"metadata\":{\"name\":\"eq_
>>>>>>> site_deductible\",\"scale\":0}},{\"name\":\"hu_site_deductib
>>>>>>> le\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{
>>>>>>> \"name\":\"hu_site_deductible\",\"scale\":0}},{\"name\":\"
>>>>>>> fl_site_deductible\",\"type\":\"integer\",\"nullable\":true,
>>>>>>> \"metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}
>>>>>>> },{\"name\":\"fr_site_deductible\",\"type\":\"intege
>>>>>>> r\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site_
>>>>>>> deductible\",\"scale\":0}},{\"name\":\"latitude\",\"type\":\
>>>>>>> "decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"l
>>>>>>> atitude\",\"scale\":6}},{\"name\":\"longitude\",\"type\":\"
>>>>>>> decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"lo
>>>>>>> ngitude\",\"scale\":6}},{\"name\":\"line\",\"type\":\"string
>>>>>>> \",\"nullable\":true,\"metadata\":{\"name\":\"line\",\"
>>>>>>> scale\":0}},{\"name\":\"construction\",\"type\":\"string\",\
>>>>>>> "nullable\":true,\"metadata\":{\"name\":\"construction\",\"
>>>>>>> scale\":0}},{\"name\":\"point_granularity\",\"type\":\"
>>>>>>> integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_
>>>>>>> granularity\",\"scale\":0}}]}
>>>>>>> transient_lastDdlTime 1518326800
>>>>>>>
>>>>>>> # Storage Information
>>>>>>> SerDe Library:       org.apache.hadoop.hive.ql.io.p
>>>>>>> arquet.serde.ParquetHiveSerDe
>>>>>>> InputFormat:         org.apache.hadoop.hive.ql.io.p
>>>>>>> arquet.MapredParquetInputFormat
>>>>>>> OutputFormat:       org.apache.hadoop.hive.ql.io.p
>>>>>>> arquet.MapredParquetOutputFormat
>>>>>>> Compressed:         No
>>>>>>> Num Buckets:         -1
>>>>>>> Bucket Columns:     []
>>>>>>> Sort Columns:       []
>>>>>>> Storage Desc Params:
>>>>>>> path                 hdfs://localhost:8020/user/hive/warehouse/mine
>>>>>>> serialization.format 1
>>>>>>> Time taken: 0.077 seconds, Fetched: 48 row(s)
>>>>>>>
>>>>>>> Now, I see your advise and support. Whats the issue? Am I doing
>>>>>>> wrong, it it a bug ? I am using Spark 2.2.1, HIVE 1.2.1, HADOOP
2.7.3. All
>>>>>>> class path, configuration are set properly.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Ravion
>>>>>>>
>>>>>>> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian <
>>>>>>> nicholas.hakobian@rallyhealth.com> wrote:
>>>>>>>
>>>>>>>> Its possible that the format of your table is not compatible
with
>>>>>>>> your version of hive, so Spark saved it in a way such that
only Spark can
>>>>>>>> read it. When this happens it prints out a very visible warning
letting you
>>>>>>>> know this has happened.
>>>>>>>>
>>>>>>>> We've seen it most frequently when trying to save a parquet
file
>>>>>>>> with a column in date format into a Hive table. In older
versions of hive,
>>>>>>>> its parquet reader/writer did not support Date formats (among
a couple
>>>>>>>> others).
>>>>>>>>
>>>>>>>> Nicholas Szandor Hakobian, Ph.D.
>>>>>>>> Staff Data Scientist
>>>>>>>> Rally Health
>>>>>>>> nicholas.hakobian@rallyhealth.com
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi <
>>>>>>>> prakashcjoshi5@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ravi,
>>>>>>>>>
>>>>>>>>> Can you send the result of
>>>>>>>>> Show create table your_table_name
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Prakash
>>>>>>>>>
>>>>>>>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर
नायर)" <
>>>>>>>>> ravishankar.nair@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> All,
>>>>>>>>>>
>>>>>>>>>> It has been three days continuously I am on this
issue. Not
>>>>>>>>>> getting any clue.
>>>>>>>>>>
>>>>>>>>>> Environment: Spark 2.2.x, all configurations are
correct.
>>>>>>>>>> hive-site.xml is in spark's conf.
>>>>>>>>>>
>>>>>>>>>> 1) Step 1: I created a data frame DF1 reading a csv
file.
>>>>>>>>>>
>>>>>>>>>> 2) Did  manipulations on DF1. Resulting frame is
passion_df.
>>>>>>>>>>
>>>>>>>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>>>>>>>
>>>>>>>>>> 4) The metastore shows the hive table., when I do
"show tables"
>>>>>>>>>> in HIVE, I can see table name
>>>>>>>>>>
>>>>>>>>>> 5) I can't select in HIVE, though I can select from
SPARK as
>>>>>>>>>> spark.sql("select * from sampledb.passion")
>>>>>>>>>>
>>>>>>>>>> Whats going on here? Please help. Why I am not seeing
data from
>>>>>>>>>> HIVE prompt?
>>>>>>>>>> The "describe formatted " command on the table in
HIVE shows he
>>>>>>>>>> data is is in default warehouse location ( /user/hive/warehouse)
since I
>>>>>>>>>> set it.
>>>>>>>>>>
>>>>>>>>>> I am not getting any definite answer anywhere. Many
suggestions
>>>>>>>>>> and answers given in Stackoverflow et al.Nothing
really works.
>>>>>>>>>>
>>>>>>>>>> So asking experts here for some light on this, thanks
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Ravion
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>
>>>
>>> --
>>>
>>>
>>
>>
>> --
>> Shmuel Blitz
>> Big Data Developer
>> Email: shmuel.blitz@similarweb.com
>> www.similarweb.com
>> <https://www.facebook.com/SimilarWeb/>
>> <https://www.linkedin.com/company/429838/>
>> <https://twitter.com/similarweb>
>>
>
>
>
> --
>
>


-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net

Mime
View raw message