spark-user mailing list archives

From ☼ R Nair (रविशंकर नायर) <ravishankar.n...@gmail.com>
Subject Re: Spark Dataframe and HIVE
Date Sun, 11 Feb 2018 07:54:30 GMT
Hi,
Here you go:

hive> show create table mine;
OK
CREATE TABLE `mine`(
  `policyid` int,
  `statecode` string,
  `socialid` string,
  `county` string,
  `eq_site_limit` decimal(10,2),
  `hu_site_limit` decimal(10,2),
  `fl_site_limit` decimal(10,2),
  `fr_site_limit` decimal(10,2),
  `tiv_2014` decimal(10,2),
  `tiv_2015` decimal(10,2),
  `eq_site_deductible` int,
  `hu_site_deductible` int,
  `fl_site_deductible` int,
  `fr_site_deductible` int,
  `latitude` decimal(6,6),
  `longitude` decimal(6,6),
  `line` string,
  `construction` string,
  `point_granularity` int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='hdfs://localhost:8020/user/hive/warehouse/mine')
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'file:/Users/ravishankarnair/spark-warehouse/mine'
TBLPROPERTIES (
  'spark.sql.sources.provider'='parquet',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"name\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"statecode\",\"scale\":0}},{\"name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\"name\":\"county\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\"eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\":\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\"name\":\"eq_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\",\"scale\":0}},{\"name\":\"hu_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\"name\":\"fl_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}},{\"name\":\"fr_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site_deductible\",\"scale\":0}},{\"name\":\"latitude\",\"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"latitude\",\"scale\":6}},{\"name\":\"longitude\",\"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"longitude\",\"scale\":6}},{\"name\":\"line\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"line\",\"scale\":0}},{\"name\":\"construction\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"construction\",\"scale\":0}},{\"name\":\"point_granularity\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_granularity\",\"scale\":0}}]}',
  'transient_lastDdlTime'='1518335598')
Time taken: 0.13 seconds, Fetched: 35 row(s)
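
[Editor's note: one thing stands out in this DDL — LOCATION points at a local path (file:/Users/ravishankarnair/spark-warehouse/mine) while the 'path' serde property points at HDFS (hdfs://localhost:8020/user/hive/warehouse/mine). If the Parquet files actually live at the HDFS path, a possible fix — a sketch only, not verified against this cluster — is to repoint the Hive table from the hive CLI:]

```sql
-- Sketch only: repoint the Hive table at the directory the 'path' serde
-- property already names. Verify the files exist there first, e.g. with
-- `hdfs dfs -ls /user/hive/warehouse/mine`.
ALTER TABLE mine SET LOCATION 'hdfs://localhost:8020/user/hive/warehouse/mine';
```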

On Sun, Feb 11, 2018 at 2:36 AM, Shmuel Blitz <shmuel.blitz@similarweb.com>
wrote:

> Please run the following command, and paste the result:
> SHOW CREATE TABLE <<TABLE-NAME>>
>
> On Sun, Feb 11, 2018 at 7:56 AM, ☼ R Nair (रविशंकर नायर) <
> ravishankar.nair@gmail.com> wrote:
>
>> No, no luck.
>>
>> Thanks
>>
>> On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma <deepakmca05@gmail.com>
>> wrote:
>>
>>> In hive cli:
>>> msck repair table 《table_name》;
>>>
>>> Thanks
>>> Deepak
>>>
>>> On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" <
>>> ravishankar.nair@gmail.com> wrote:
>>>
>>>> No, can you please explain the command? Let me try now.
>>>>
>>>> Best,
>>>>
>>>> On Sun, Feb 11, 2018 at 12:40 AM, Deepak Sharma <deepakmca05@gmail.com>
>>>> wrote:
>>>>
>>>>> I am not sure about the exact issue, but I see you are partitioning
>>>>> while writing from Spark.
>>>>> Did you try msck repair on the table before reading it in Hive?
>>>>>
>>>>> Thanks
>>>>> Deepak
>>>>>
>>>>> On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" <
>>>>> ravishankar.nair@gmail.com> wrote:
>>>>>
>>>>>> All,
>>>>>>
>>>>>> Thanks for the inputs. Again I am not successful. I think we need to
>>>>>> resolve this, as this is a very common requirement.
>>>>>>
>>>>>> Please go through my complete code:
>>>>>>
>>>>>> STEP 1:  Started Spark shell as spark-shell --master yarn
>>>>>>
>>>>>> STEP 2: The following code is given as input to the Spark shell
>>>>>>
>>>>>> import org.apache.spark.sql.Row
>>>>>> import org.apache.spark.sql.SparkSession
>>>>>> val warehouseLocation ="/user/hive/warehouse"
>>>>>>
>>>>>> val spark = SparkSession.builder().appName("Spark Hive
>>>>>> Example").config("spark.sql.warehouse.dir",
>>>>>> warehouseLocation).enableHiveSupport().getOrCreate()
>>>>>>
>>>>>> import org.apache.spark.sql._
>>>>>> var passion_df = spark.read.
>>>>>> format("jdbc").
>>>>>> option("url", "jdbc:mysql://localhost:3307/policies").
>>>>>> option("driver" ,"com.mysql.jdbc.Driver").
>>>>>> option("user", "root").
>>>>>> option("password", "root").
>>>>>> option("dbtable", "insurancedetails").
>>>>>> option("partitionColumn", "policyid").
>>>>>> option("lowerBound", "1").
>>>>>> option("upperBound", "100000").
>>>>>> option("numPartitions", "4").
>>>>>> load()
>>>>>> //Made sure that passion_df is created, as passion_df.show(5) shows
>>>>>> me correct data.
>>>>>> passion_df.write.saveAsTable("default.mine") //Default parquet
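
[Editor's note: a possible variant of the write above — a sketch under assumptions, not a verified fix — is to give saveAsTable an explicit HDFS path, so the metastore LOCATION and the actual files cannot drift apart. The path below is an assumption matching the warehouse dir used elsewhere in this thread:]

```scala
// Sketch only: write the Parquet files to an explicit HDFS directory and
// register an external table over that same directory, so Hive and Spark
// agree on the location. Requires a SparkSession with enableHiveSupport().
passion_df.write.
  mode("overwrite").
  format("parquet").
  option("path", "hdfs://localhost:8020/user/hive/warehouse/mine").
  saveAsTable("default.mine")
```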
>>>>>>
>>>>>> STEP 3: Went to HIVE. Started HIVE prompt.
>>>>>>
>>>>>> hive> show tables;
>>>>>> OK
>>>>>> callcentervoicelogs
>>>>>> mine
>>>>>> Time taken: 0.035 seconds, Fetched: 2 row(s)
>>>>>> //As you can see HIVE is showing the table "mine" in default schema.
>>>>>>
>>>>>> STEP 4: HERE IS THE PROBLEM.
>>>>>>
>>>>>> hive> select * from mine;
>>>>>> OK
>>>>>> Time taken: 0.354 seconds
>>>>>> hive>
>>>>>> //Where is the data ???
>>>>>>
>>>>>> STEP 5:
>>>>>>
>>>>>> See the below command on HIVE
>>>>>>
>>>>>> describe formatted mine;
>>>>>> OK
>>>>>> # col_name             data_type           comment
>>>>>>
>>>>>> policyid             int
>>>>>> statecode           string
>>>>>> socialid             string
>>>>>> county               string
>>>>>> eq_site_limit       decimal(10,2)
>>>>>> hu_site_limit       decimal(10,2)
>>>>>> fl_site_limit       decimal(10,2)
>>>>>> fr_site_limit       decimal(10,2)
>>>>>> tiv_2014             decimal(10,2)
>>>>>> tiv_2015             decimal(10,2)
>>>>>> eq_site_deductible   int
>>>>>> hu_site_deductible   int
>>>>>> fl_site_deductible   int
>>>>>> fr_site_deductible   int
>>>>>> latitude             decimal(6,6)
>>>>>> longitude           decimal(6,6)
>>>>>> line                 string
>>>>>> construction         string
>>>>>> point_granularity   int
>>>>>>
>>>>>> # Detailed Table Information
>>>>>> Database:           default
>>>>>> Owner:               ravishankarnair
>>>>>> CreateTime:         Sun Feb 11 00:26:40 EST 2018
>>>>>> LastAccessTime:     UNKNOWN
>>>>>> Protect Mode:       None
>>>>>> Retention:           0
>>>>>> Location:           file:/Users/ravishankarnair/spark-warehouse/mine
>>>>>>
>>>>>> Table Type:         MANAGED_TABLE
>>>>>> Table Parameters:
>>>>>> spark.sql.sources.provider parquet
>>>>>> spark.sql.sources.schema.numParts 1
>>>>>> spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\
>>>>>> ":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullable\":
>>>>>> true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"na
>>>>>> me\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"m
>>>>>> etadata\":{\"name\":\"statecode\",\"scale\":0}},{\"name\":\"
>>>>>> Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\
>>>>>> ":{\"name\":\"Socialid\",\"scale\":0}},{\"name\":\"county\",
>>>>>> \"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\"
>>>>>> :\"county\",\"scale\":0}},{\"name\":\"eq_site_limit\",\"type
>>>>>> \":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\
>>>>>> ":\"eq_site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\
>>>>>> ",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":
>>>>>> {\"name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_sit
>>>>>> e_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"
>>>>>> metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},{\"
>>>>>> name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullab
>>>>>> le\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"sca
>>>>>> le\":2}},{\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\
>>>>>> "nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"
>>>>>> scale\":2}},{\"name\":\"tiv_2015\",\"type\":\"decimal(10,
>>>>>> 2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_
>>>>>> 2015\",\"scale\":2}},{\"name\":\"eq_site_deductible\",\"type
>>>>>> \":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"
>>>>>> eq_site_deductible\",\"scale\":0}},{\"name\":\"hu_site_
>>>>>> deductible\",\"type\":\"integer\",\"nullable\":true,\"metada
>>>>>> ta\":{\"name\":\"hu_site_deductible\",\"scale\":0}},{\"
>>>>>> name\":\"fl_site_deductible\",\"type\":\"integer\",\"nullabl
>>>>>> e\":true,\"metadata\":{\"name\":\"fl_site_deductible\",\"
>>>>>> scale\":0}},{\"name\":\"fr_site_deductible\",\"type\":\"
>>>>>> integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site
>>>>>> _deductible\",\"scale\":0}},{\"name\":\"latitude\",\"type\":
>>>>>> \"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"
>>>>>> latitude\",\"scale\":6}},{\"name\":\"longitude\",\"type\":
>>>>>> \"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"
>>>>>> longitude\",\"scale\":6}},{\"name\":\"line\",\"type\":\"
>>>>>> string\",\"nullable\":true,\"metadata\":{\"name\":\"line\",
>>>>>> \"scale\":0}},{\"name\":\"construction\",\"type\":\"
>>>>>> string\",\"nullable\":true,\"metadata\":{\"name\":\"
>>>>>> construction\",\"scale\":0}},{\"name\":\"point_granularity\"
>>>>>> ,\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"
>>>>>> name\":\"point_granularity\",\"scale\":0}}]}
>>>>>> transient_lastDdlTime 1518326800
>>>>>>
>>>>>> # Storage Information
>>>>>> SerDe Library:       org.apache.hadoop.hive.ql.io.p
>>>>>> arquet.serde.ParquetHiveSerDe
>>>>>> InputFormat:         org.apache.hadoop.hive.ql.io.p
>>>>>> arquet.MapredParquetInputFormat
>>>>>> OutputFormat:       org.apache.hadoop.hive.ql.io.p
>>>>>> arquet.MapredParquetOutputFormat
>>>>>> Compressed:         No
>>>>>> Num Buckets:         -1
>>>>>> Bucket Columns:     []
>>>>>> Sort Columns:       []
>>>>>> Storage Desc Params:
>>>>>> path                 hdfs://localhost:8020/user/hive/warehouse/mine
>>>>>> serialization.format 1
>>>>>> Time taken: 0.077 seconds, Fetched: 48 row(s)
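
[Editor's note: worth noting in the output above — Location is file:/Users/ravishankarnair/spark-warehouse/mine (a local path, Spark's ./spark-warehouse default), while the path storage parameter is hdfs://localhost:8020/user/hive/warehouse/mine. That mismatch usually means the spark.sql.warehouse.dir / hive-site.xml settings were not in effect when the table was created. A quick diagnostic sketch from the Spark shell:]

```scala
// Sketch: check which warehouse directory the running session actually uses.
// A file:/ value here suggests the Hive configuration was not applied.
println(spark.conf.get("spark.sql.warehouse.dir"))
// The Location row here should match what the hive CLI reports:
spark.sql("DESCRIBE FORMATTED default.mine").show(100, false)
```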
>>>>>>
>>>>>> Now, I seek your advice and support. What's the issue? Am I doing
>>>>>> something wrong, or is it a bug? I am using Spark 2.2.1, HIVE 1.2.1,
>>>>>> HADOOP 2.7.3. All class paths and configuration are set properly.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Ravion
>>>>>>
>>>>>> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian <
>>>>>> nicholas.hakobian@rallyhealth.com> wrote:
>>>>>>
>>>>>>> It's possible that the format of your table is not compatible with
>>>>>>> your version of hive, so Spark saved it in a way such that only Spark
>>>>>>> can read it. When this happens it prints out a very visible warning
>>>>>>> letting you know this has happened.
>>>>>>>
>>>>>>> We've seen it most frequently when trying to save a parquet file
>>>>>>> with a column in date format into a Hive table. In older versions of
>>>>>>> hive, its parquet reader/writer did not support Date formats (among a
>>>>>>> couple of others).
>>>>>>>
>>>>>>> Nicholas Szandor Hakobian, Ph.D.
>>>>>>> Staff Data Scientist
>>>>>>> Rally Health
>>>>>>> nicholas.hakobian@rallyhealth.com
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi <
>>>>>>> prakashcjoshi5@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ravi,
>>>>>>>>
>>>>>>>> Can you send the result of
>>>>>>>> Show create table your_table_name
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Prakash
>>>>>>>>
>>>>>>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)" <
>>>>>>>> ravishankar.nair@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> All,
>>>>>>>>>
>>>>>>>>> I have been on this issue for three days continuously, and I am
>>>>>>>>> not getting any clue.
>>>>>>>>>
>>>>>>>>> Environment: Spark 2.2.x, all configurations are correct.
>>>>>>>>> hive-site.xml is in spark's conf.
>>>>>>>>>
>>>>>>>>> 1) Step 1: I created a data frame DF1 reading a csv file.
>>>>>>>>>
>>>>>>>>> 2) Did  manipulations on DF1. Resulting frame is passion_df.
>>>>>>>>>
>>>>>>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>>>>>>
>>>>>>>>> 4) The metastore shows the hive table; when I do "show tables" in
>>>>>>>>> HIVE, I can see the table name.
>>>>>>>>>
>>>>>>>>> 5) I can't select in HIVE, though I can select from SPARK as
>>>>>>>>> spark.sql("select * from sampledb.passion")
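
[Editor's note: for step 3 above, one hedged alternative — a sketch, assuming the aim is a table that Hive 1.2.1 can definitely read — is to let HiveQL create the table instead of the datasource writer:]

```scala
// Sketch only: register the DataFrame as a temp view, then create a
// genuine Hive ORC table from it, so the table definition is Hive-native.
// Requires a SparkSession built with enableHiveSupport().
passion_df.createOrReplaceTempView("passion_tmp")
spark.sql("CREATE TABLE sampledb.passion STORED AS ORC AS SELECT * FROM passion_tmp")
```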
>>>>>>>>>
>>>>>>>>> What's going on here? Please help. Why am I not seeing data from
>>>>>>>>> the HIVE prompt?
>>>>>>>>> The "describe formatted" command on the table in HIVE shows the
>>>>>>>>> data is in the default warehouse location (/user/hive/warehouse),
>>>>>>>>> since I set it.
>>>>>>>>>
>>>>>>>>> I am not getting any definite answer anywhere. Many suggestions
>>>>>>>>> and answers are given on Stack Overflow et al. Nothing really works.
>>>>>>>>>
>>>>>>>>> So I am asking the experts here for some light on this, thanks.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Ravion
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>>
>>
>>
>> --
>>
>>
>
>
> --
> Shmuel Blitz
> Big Data Developer
> Email: shmuel.blitz@similarweb.com
> www.similarweb.com
> <https://www.facebook.com/SimilarWeb/>
> <https://www.linkedin.com/company/429838/>
> <https://twitter.com/similarweb>
>



--
