spark-user mailing list archives

From Aniket Mokashi <aniket...@gmail.com>
Subject Re: Pig on Spark
Date Fri, 14 Mar 2014 20:37:32 GMT
We will post fixes from our side at https://github.com/twitter/pig.

Top on our list are:
1. Make it work with Pig trunk (the execution engine interface) on Spark 0.8 or 0.9.
2. Support for algebraic UDFs (this mitigates the group-by OOM problems).
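To illustrate why algebraic UDFs help with group-by OOMs, here is a minimal conceptual sketch in Python. It is not Pig's or Spark's actual API, and all function names are hypothetical. It assumes an AVG-style aggregate split into initial, intermediate and final steps (loosely modeled on the three stages of Pig's Algebraic interface) and contrasts it with a non-algebraic path that materializes the whole group as one bag.

```python
from functools import reduce
from typing import Iterable, List, Tuple

# Hypothetical three-stage aggregate (AVG), loosely modeled on the
# Initial / Intermediate / Final split of Pig's Algebraic interface.
Partial = Tuple[float, int]  # (running sum, running count)

def initial(value: float) -> Partial:
    # Runs once per input record, before any shuffle.
    return (value, 1)

def intermed(a: Partial, b: Partial) -> Partial:
    # Merges two partial results. It is associative, so it can run
    # map-side (as a combiner) and again after the shuffle.
    return (a[0] + b[0], a[1] + b[1])

def final(p: Partial) -> float:
    # Turns the fully merged partial into the answer.
    return p[0] / p[1]

def avg_algebraic(partitions: Iterable[Iterable[float]]) -> float:
    # Each partition folds its records into one small (sum, count) pair,
    # so no single place ever holds the whole group in memory.
    partials: List[Partial] = [
        reduce(intermed, (initial(v) for v in part), (0.0, 0))
        for part in partitions
    ]
    return final(reduce(intermed, partials))

def avg_holistic(partitions: Iterable[Iterable[float]]) -> float:
    # Non-algebraic path: build the full bag for the group first,
    # which is what can run out of memory for very large groups.
    bag = [v for part in partitions for v in part]
    return sum(bag) / len(bag)
```

For example, avg_algebraic([[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]) and avg_holistic on the same input both give 3.5, but only the second one ever holds all six values at once. In Spark terms, this is roughly the difference between combining values map-side (as reduceByKey does) and collecting every value for a key first (as groupByKey does).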

We would definitely love more contributions on this.

Thanks,
Aniket


On Fri, Mar 14, 2014 at 12:29 PM, Mayur Rustagi <mayur.rustagi@gmail.com> wrote:

> Damn, I am off to NY for Structure Conf. Would it be possible to meet
> any time after 28th March?
> I am really interested in making it stable and production quality.
>
> Regards
> Mayur Rustagi
> Ph: +1 (760) 203 3257
> http://www.sigmoidanalytics.com
> @mayur_rustagi <https://twitter.com/mayur_rustagi>
>
>
>
> On Fri, Mar 14, 2014 at 11:53 AM, Julien Le Dem <julien@twitter.com> wrote:
>
>> Hi Mayur,
>> Are you going to the Pig meetup this afternoon?
>> http://www.meetup.com/PigUser/events/160604192/
>> Aniket and I will be there.
>> We would be happy to chat about Pig on Spark.
>>
>>
>>
>> On Tue, Mar 11, 2014 at 8:56 AM, Mayur Rustagi <mayur.rustagi@gmail.com> wrote:
>>
>>> Hi Lin,
>>> We are working on getting Pig on Spark functional with 0.8.0. Have you
>>> got it working on any Spark version?
>>> Also, what functionality works on it?
>>> Regards
>>> Mayur
>>>
>>>
>>>
>>>
>>> On Mon, Mar 10, 2014 at 11:00 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>>
>>>> Hi Sameer,
>>>>
>>>> Lin (cc'ed) could also give you some updates about Pig on Spark
>>>> development on her side.
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>> On Mon, Mar 10, 2014 at 12:52 PM, Sameer Tilak <sstilak@live.com> wrote:
>>>> > Hi Mayur,
>>>> > We are planning to upgrade our distribution from MR1 to MR2 (YARN), and
>>>> > the goal is to get Spork set up next month. I will keep you posted. Could
>>>> > you please keep me informed about your progress as well?
>>>> >
>>>> > ________________________________
>>>> > From: mayur.rustagi@gmail.com
>>>> > Date: Mon, 10 Mar 2014 11:47:56 -0700
>>>> >
>>>> > Subject: Re: Pig on Spark
>>>> > To: user@spark.apache.org
>>>> >
>>>> >
>>>> > Hi Sameer,
>>>> > Did you make any progress on this? My team is also trying it out and
>>>> > would love to know some details of your progress.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <sstilak@live.com> wrote:
>>>> >
>>>> > Hi Aniket,
>>>> > Many thanks! I will check this out.
>>>> >
>>>> > ________________________________
>>>> > Date: Thu, 6 Mar 2014 13:46:50 -0800
>>>> > Subject: Re: Pig on Spark
>>>> > From: aniket486@gmail.com
>>>> > To: user@spark.apache.org; tgraves_cs@yahoo.com
>>>> >
>>>> >
>>>> > There is some work to make this run on YARN at
>>>> > https://github.com/aniket486/pig (so compile Pig with
>>>> > ant -Dhadoopversion=23).
>>>> >
>>>> > You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to
>>>> > find out what sort of environment variables you need (sorry, I haven't
>>>> > been able to clean this up; it is in progress). There are a few known
>>>> > issues with this; I will work on fixing them soon.
>>>> >
>>>> > Known issues:
>>>> > 1. Limit does not work (Spork fix).
>>>> > 2. Foreach requires turning off schema-tuple-backend (should be a Pig JIRA).
>>>> > 3. Algebraic UDFs don't work (Spork fix in progress).
>>>> > 4. Group by needs rework (to avoid OOMs).
>>>> > 5. UDF classloader issue (requires SPARK-1053; then you can put
>>>> > pig-withouthadoop.jar as SPARK_JARS in SparkContext along with the UDF
>>>> > jars).
>>>> >
>>>> > ~Aniket
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves_cs@yahoo.com> wrote:
>>>> >
>>>> > I had asked a similar question on the dev mailing list a while back
>>>> > (Jan 22nd).
>>>> >
>>>> > See the archives:
>>>> >
>>>> > http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser
>>>> > and look for "spork".
>>>> >
>>>> > Basically Matei said:
>>>> >
>>>> > Yup, that was it, though I believe people at Twitter picked it up again
>>>> > recently. I'd suggest asking Dmitriy if you know him. I've seen interest
>>>> > in this from several other groups, and if there's enough of it, maybe we
>>>> > can start another open source repo to track it. The work in that repo you
>>>> > pointed to was done over one week, and already had most of Pig's
>>>> > operators working. (I helped out with this prototype over Twitter's hack
>>>> > week.) That work also calls the Scala API directly, because it was done
>>>> > before we had a Java API; it should be easier with the Java one.
>>>> >
>>>> >
>>>> > Tom
>>>> >
>>>> >
>>>> >
>>>> > On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <sstilak@live.com> wrote:
>>>> > Hi everyone,
>>>> >
>>>> > We are using Pig to build our data pipeline. I came across Spork (Pig on
>>>> > Spark) at https://github.com/dvryaboy/pig but am not sure whether it is
>>>> > still active.
>>>> >
>>>> > Can someone please let me know the status of Spork or any other effort
>>>> > that will let us run Pig on Spark? We could benefit significantly from
>>>> > using Spark, but we would like to keep using our existing Pig scripts.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > "...:::Aniket:::... Quetzalco@tl"
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>


-- 
"...:::Aniket:::... Quetzalco@tl"
