spark-user mailing list archives

From Aniket Mokashi <>
Subject Re: Pig on Spark
Date Thu, 06 Mar 2014 21:46:50 GMT
There is some work to make this work on yarn at (So, compile pig with ant

You can look at to
find out what sort of env variables you need (sorry, I haven't been able to
clean this up yet; it is in progress). There are a few known issues with this; I will
work on fixing them soon.

Known issues:
1. Limit does not work (spork-fix)
2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
3. Algebraic UDFs don't work (spork-fix in-progress)
4. Group by needs rework (to avoid OOMs)
5. UDF classloader issue (requires SPARK-1053, then you can put
pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)
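
For the last item, a rough sketch of assembling the jar list might look like the following. This is only an illustration under assumptions: the helper name build_spark_jars and the UDF jar path are hypothetical, and how Spork itself consumes SPARK_JARS is not shown in this message. The only Spark detail relied on is that the standard `spark.jars` setting takes a comma-separated list of jars.

```python
def build_spark_jars(pig_jar, udf_jars):
    """Return a comma-separated jar list: the Pig jar first, then UDF jars.

    Duplicate entries are dropped while keeping the original order.
    """
    ordered = []
    for jar in [pig_jar, *udf_jars]:
        if jar not in ordered:
            ordered.append(jar)
    return ",".join(ordered)


# Hypothetical paths, for illustration only.
jars = build_spark_jars("pig-withouthadoop.jar", ["udfs/my-udfs.jar"])
print(jars)

# Assumption: with PySpark, this string could be handed to the standard
# `spark.jars` setting, e.g. SparkConf().set("spark.jars", jars).
```

Putting the Pig jar first is just a convention here so it is easy to spot in logs; nothing in this thread says the order matters.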


On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <> wrote:

> I had asked a similar question on the dev mailing list a while back (Jan
> 22nd).
> See the archives and look for spork.
> Basically Matei said:
> Yup, that was it, though I believe people at Twitter picked it up again recently. I'd
> suggest asking Dmitriy if you know him. I've seen interest in this from several other groups,
> and if there's enough of it, maybe we can start another open source repo to track it. The
> work in that repo you pointed to was done over one week, and already had most of Pig's operators
> working. (I helped out with this prototype over Twitter's hack week.) That work also uses
> the Scala API directly, because it was done before we had a Java API; it should be easier
> with the Java one.
> Tom
>   On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <>
> wrote:
>   Hi everyone,
> We are using Pig to build our data pipeline. I came across Spork -- Pig
> on Spark at: and am not sure if it is still
> active.
> Can someone please let me know the status of Spork or any other effort
> that will let us run Pig on Spark? We can significantly benefit by using
> Spark, but we would like to keep using the existing Pig scripts.

"...:::Aniket:::... Quetzalco@tl"
