spark-dev mailing list archives

From Andrew Lee <alee...@hotmail.com>
Subject RE: Working Formula for Hive 0.13?
Date Mon, 25 Aug 2014 21:29:12 GMT
From my perspective, there are a few benefits to Hive 0.13.1+. The following are the
4 major reasons I can see why people have been asking to upgrade to Hive 0.13.1 recently.
1. Performance improvements, bug fixes, and patches. (Usual case)
2. Native support for the Parquet format; no need to provide custom JARs and a SerDe as with
Hive 0.12. (Depends; driven by data format and queries)
3. Support for the Tez engine, which gives a performance improvement in several use cases.
(Performance improvement)
4. Security has improved a lot in Hive 0.13.1. (Security concerns, ACLs, etc.)
These are the major benefits I see in upgrading from Hive 0.12.0 to Hive 0.13.1+.
There may be others out there that I'm not aware of, but I do see it coming.
My 2 cents.
> From: michael@databricks.com
> Date: Mon, 25 Aug 2014 13:08:42 -0700
> Subject: Re: Working Formula for Hive 0.13?
> To: wangfei1@huawei.com
> CC: dev@spark.apache.org
> 
> Thanks for working on this!  It's unclear at the moment exactly how we are
> going to handle this, since the end goal is to be compatible with as many
> versions of Hive as possible.  That said, I think it would be great to open
> a PR in this case.  Even if we don't merge it, that's a good way to get it
> on people's radar and have a discussion about the changes that are required.
> 
> 
> On Sun, Aug 24, 2014 at 7:11 PM, scwf <wangfei1@huawei.com> wrote:
> 
> >   I have been working on a branch that updates the Hive version to hive-0.13
> > (from org.apache.hive): https://github.com/scwf/spark/tree/hive-0.13
> > I am wondering whether it's OK to open a PR now, because the hive-0.13 version
> > is not compatible with hive-0.12, and here I used org.apache.hive.
> >
> >
> >
> > On 2014/7/29 8:22, Michael Armbrust wrote:
> >
> >> A few things:
> >>   - When we upgrade to Hive 0.13.0, Patrick will likely republish the
> >> hive-exec jar just as we did for 0.12.0
> >>   - Since we have to tie into some pretty low-level APIs, it is
> >> unsurprising that the code doesn't just compile out of the box against 0.13.0
> >>   - ScalaReflection is for determining the schema from Scala classes, not
> >> for reflection-based bridge code.  Either way, it's unclear to me whether there
> >> is any reason to use reflection to support multiple versions, instead of just
> >> upgrading to Hive 0.13.0
> >>
> >> One question I have is: what is the goal of upgrading to Hive 0.13.0?  Is
> >> it purely because you are having problems connecting to newer metastores?
> >> Are there some features you are hoping for?  This will help me prioritize
> >> this effort.
> >>
> >> Michael
> >>
> >>
> >> On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>
> >>> I was looking for a class where reflection-related code should reside.
> >>>
> >>> I found this but don't think it is the proper class for bridging
> >>> differences between hive 0.12 and 0.13.1:
> >>>
> >>> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
> >>>
> >>> Cheers
> >>>
> >>>
> >>> On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>
> >>>> After manually copying hive 0.13.1 jars to local maven repo, I got the
> >>>> following errors when building spark-hive_2.10 module :
> >>>>
> >>>> [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182:
> >>>> type mismatch;
> >>>>   found   : String
> >>>>   required: Array[String]
> >>>> [ERROR]       val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
> >>>> [ERROR]     ^
> >>>> [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60:
> >>>> value getAllPartitionsForPruner is not a member of org.apache.hadoop.hive.ql.metadata.Hive
> >>>> [ERROR]         client.getAllPartitionsForPruner(table).toSeq
> >>>> [ERROR]                ^
> >>>> [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267:
> >>>> overloaded method constructor TableDesc with alternatives:
> >>>>    (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2: Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
> >>>> <and>
> >>>>    ()org.apache.hadoop.hive.ql.plan.TableDesc
> >>>>   cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer], Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in value tableDesc)(in value tableDesc)], java.util.Properties)
> >>>> [ERROR]   val tableDesc = new TableDesc(
> >>>> [ERROR]                   ^
> >>>> [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing with a stub.
> >>>> [WARNING] Class org.antlr.runtime.Token not found - continuing with a stub.
> >>>> [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a stub.
> >>>> [ERROR]
> >>>>       while compiling: /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
> >>>>          during phase: typer
> >>>>       library version: version 2.10.4
> >>>>      compiler version: version 2.10.4
> >>>>
> >>>> The above shows incompatible changes between 0.12 and 0.13.1
> >>>> e.g. the first error corresponds to the following method
> >>>> in CommandProcessorFactory :
> >>>>    public static CommandProcessor get(String[] cmd, HiveConf conf)
> >>>>
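To make the first error concrete, here is a minimal sketch of how a caller might adapt to the new signature. HiveConf, CommandProcessor and CommandProcessorFactory below are hypothetical stand-ins so the snippet compiles without Hive on the classpath; only the get(String[] cmd, HiveConf conf) shape comes from the message above, and processorFor is an illustrative helper, not Spark code.

```scala
// Hypothetical stand-ins for the Hive classes (assumption: the real ones are
// org.apache.hadoop.hive.conf.HiveConf and the classes in
// org.apache.hadoop.hive.ql.processors). They exist only to keep this runnable.
final class HiveConf
final case class CommandProcessor(command: Seq[String])

// Mirrors the 0.13.1 signature quoted above: get(String[] cmd, HiveConf conf).
object CommandProcessorFactory {
  def get(cmd: Array[String], conf: HiveConf): CommandProcessor =
    CommandProcessor(cmd.toSeq)
}

object SignatureChangeSketch {
  // Illustrative helper: look up a processor for a raw command line.
  def processorFor(cmd: String, hiveconf: HiveConf): CommandProcessor = {
    val tokens = cmd.trim.split("\\s+")
    // The call that fails to compile is
    //   CommandProcessorFactory.get(tokens(0), hiveconf)
    // because a String is passed where Array[String] is required.
    // Wrapping the first token in an array satisfies the new signature.
    CommandProcessorFactory.get(Array(tokens(0)), hiveconf)
  }

  def main(args: Array[String]): Unit = {
    val proc = processorFor("set x=1", new HiveConf)
    println(proc.command.mkString(" "))
  }
}
```

This only addresses the type mismatch in HiveContext.scala; the missing getAllPartitionsForPruner method and the changed TableDesc constructor would each need their own adaptation.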
> >>>> Cheers
> >>>>
> >>>>
> >>>> On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez <snunez@hortonworks.com>
> >>>> wrote:
> >>>>
> >>>>> So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding
> >>>>> the hive-exec jar to the spark-project repo? It doesn't look like there's
> >>>>> a release date schedule for 0.14.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 7/28/14, 10:50, "Cheng Lian" <lian.cs.zju@gmail.com> wrote:
> >>>>>
> >>>>>> Exactly, forgot to mention Hulu team also made changes to cope with those
> >>>>>> incompatibility issues, but they said that's relatively easy once the
> >>>>>> re-packaging work is done.
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell <pwendell@gmail.com> wrote:
> >>>>>>
> >>>>>>> I've heard from Cloudera that there were hive internal changes between
> >>>>>>> 0.12 and 0.13 that required code re-writing. Over time it might be
> >>>>>>> possible for us to integrate with hive using APIs that are more
> >>>>>>> stable (this is the domain of Michael/Cheng/Yin more than me!). It
> >>>>>>> would be interesting to see what the Hulu folks did.
> >>>>>>>
> >>>>>>> - Patrick
> >>>>>>>
> >>>>>>> On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian <lian.cs.zju@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL
> >>>>>>>> against Hive 0.13 (or 0.13.1?) successfully. Basically they also
> >>>>>>>> re-packaged Hive 0.13, as the Spark team did. The slides of the talk
> >>>>>>>> haven't been released yet though.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Owen helped me find this:
> >>>>>>>>> https://issues.apache.org/jira/browse/HIVE-7423
> >>>>>>>>>
> >>>>>>>>> I guess this means that for Hive 0.14, Spark should be able to directly
> >>>>>>>>> pull in hive-exec-core.jar
> >>>>>>>>>
> >>>>>>>>> Cheers
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell <pwendell@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> It would be great if the hive team can fix that issue. If not, we'll
> >>>>>>>>>> have to continue forking our own version of Hive to change the way it
> >>>>>>>>>> publishes artifacts.
> >>>>>>>>>>
> >>>>>>>>>> - Patrick
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still
> >>>>>>>>>>> uber jar.
> >>>>>>>>>>>
> >>>>>>>>>>> Right now I am facing the following error building against Hive 0.13.1 :
> >>>>>>>>>>>
> >>>>>>>>>>> [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
> >>>>>>>>>>> resolve dependencies for project
> >>>>>>>>>>> org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
> >>>>>>>>>>> artifacts could not be resolved:
> >>>>>>>>>>> org.spark-project.hive:hive-metastore:jar:0.13.1,
> >>>>>>>>>>> org.spark-project.hive:hive-exec:jar:0.13.1,
> >>>>>>>>>>> org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
> >>>>>>>>>>> org.spark-project.hive:hive-metastore:jar:0.13.1 in
> >>>>>>>>>>> http://repo.maven.apache.org/maven2 was cached in the local repository,
> >>>>>>>>>>> resolution will not be reattempted until the update interval of maven-repo
> >>>>>>>>>>> has elapsed or updates are forced -> [Help 1]
> >>>>>>>>>>>
> >>>>>>>>>>> Some hint would be appreciated.
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen <sowen@cloudera.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Yes, it is published. As of previous versions, at least, hive-exec
> >>>>>>>>>>>> included all of its dependencies *in its artifact*, making it unusable
> >>>>>>>>>>>> as-is because it contained copies of dependencies that clash with
> >>>>>>>>>>>> versions present in other artifacts, and can't be managed with Maven
> >>>>>>>>>>>> mechanisms.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am not sure why hive-exec was not published normally, with just its
> >>>>>>>>>>>> own classes. That's why it was copied, into an artifact with just
> >>>>>>>>>>>> hive-exec code.
> >>>>>>>>>>>>
> >>>>>>>>>>>> You could do the same thing for hive-exec 0.13.1.
> >>>>>>>>>>>> Or maybe someone knows that it's published more 'normally' now.
> >>>>>>>>>>>> I don't think hive-metastore is related to this question?
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am no expert on the Hive artifacts, just remembering what the issue
> >>>>>>>>>>>> was initially in case it helps you get to a similar solution.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> hive-exec (as of 0.13.1) is published here:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Should a JIRA be opened so that dependency on hive-metastore can be
> >>>>>>>>>>>>> replaced by dependency on hive-exec ?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen <sowen@cloudera.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> The reason for org.spark-project.hive is that Spark relies on
> >>>>>>>>>>>>>> hive-exec, but the Hive project does not publish this artifact by
> >>>>>>>>>>>>>> itself, only with all its dependencies as an uber jar. Maybe that's
> >>>>>>>>>>>>>> been improved. If so, you need to point at the new hive-exec and
> >>>>>>>>>>>>>> perhaps sort out its dependencies manually in your build.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I found 0.13.1 artifacts in maven:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> However, Spark uses groupId of org.spark-project.hive, not
> >>>>>>>>>>>>>>> org.apache.hive
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Can someone tell me how it is supposed to work ?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez <snunez@hortonworks.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I saw a note earlier, perhaps on the user list, that at least one
> >>>>>>>>>>>>>>>> person is using Hive 0.13. Anyone got a working build configuration
> >>>>>>>>>>>>>>>> for this version of Hive?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>>>> - Steve
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> CONFIDENTIALITY NOTICE
> >>>>>>>>>>>>>>>> NOTICE: This message is intended for the use of the individual or entity to
> >>>>>>>>>>>>>>>> which it is addressed and may contain information that is confidential,
> >>>>>>>>>>>>>>>> privileged and exempt from disclosure under applicable law. If the reader
> >>>>>>>>>>>>>>>> of this message is not the intended recipient, you are hereby notified that
> >>>>>>>>>>>>>>>> any printing, copying, dissemination, distribution, disclosure or
> >>>>>>>>>>>>>>>> forwarding of this communication is strictly prohibited. If you have
> >>>>>>>>>>>>>>>> received this communication in error, please contact the sender immediately
> >>>>>>>>>>>>>>>> and delete it from your system. Thank You.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >
 		 	   		  