spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: Pig on Spark
Date Mon, 10 Mar 2014 18:47:56 GMT
Hi Sameer,
Did you make any progress on this? My team is also trying it out and would
love to know some details on your progress.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <sstilak@live.com> wrote:

> Hi Aniket,
> Many thanks! I will check this out.
>
> ------------------------------
> Date: Thu, 6 Mar 2014 13:46:50 -0800
> Subject: Re: Pig on Spark
> From: aniket486@gmail.com
> To: user@spark.apache.org; tgraves_cs@yahoo.com
>
>
> There is some work to make this work on yarn at
> https://github.com/aniket486/pig. (So, compile pig with ant
> -Dhadoopversion=23)
>
> You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
> find out what sort of env variables you need (sorry, I haven't been able to
> clean this up; it is still in progress). There are a few known issues with
> this, and I will work on fixing them soon.
>
> Known issues-
> 1. Limit does not work (spork-fix)
> 2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
> 3. Algebraic UDFs don't work (spork-fix in-progress)
> 4. Group by rework (to avoid OOMs)
> 5. UDF Classloader issue (requires SPARK-1053, then you can put
> pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)
>
> ~Aniket
>
>
>
>
> On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves_cs@yahoo.com> wrote:
>
> I had asked a similar question on the dev mailing list a while back (Jan
> 22nd).
>
> See the archives:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser ->
> look for spork.
>
> Basically Matei said:
>
> Yup, that was it, though I believe people at Twitter picked it up again
> recently. I’d suggest asking Dmitriy if you know him. I’ve seen interest in
> this from several other groups, and if there’s enough of it, maybe we can
> start another open source repo to track it. The work in that repo you
> pointed to was done over one week, and already had most of Pig’s operators
> working. (I helped out with this prototype over Twitter’s hack week.) That
> work also calls the Scala API directly, because it was done before we had a
> Java API; it should be easier with the Java one.
>
>
> Tom
>
>
>
>   On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <sstilak@live.com>
> wrote:
>   Hi everyone,
>
> We are using Pig to build our data pipeline. I came across Spork -- Pig
> on Spark at: https://github.com/dvryaboy/pig and am not sure if it is still
> active.
>
> Can someone please let me know the status of Spork or any other effort
> that will let us run Pig on Spark? We can significantly benefit by using
> Spark, but we would like to keep using the existing Pig scripts.
>
>
>
>
>
> --
> "...:::Aniket:::... Quetzalco@tl"
>
