spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Saying hello and helping out
Date Wed, 31 Jul 2013 03:56:44 GMT
Cool! The way I'd start is perhaps by adding a new Python example job. For example, a good
one to implement would be PageRank -- you can look at these slides for a Scala version of
it: http://ampcamp.berkeley.edu/wp-content/uploads/2012/06/matei-zaharia-part-2-amp-camp-2012-standalone-programs.pdf.
Another possibility is linear regression. But feel free to also come up with your own.

There are also a number of open Python issues relating to missing API features,
but these require a more thorough understanding of how PySpark works and possibly some hacking
around in pickled data: https://spark-project.atlassian.net/browse/SPARK-791?jql=component%20%3D%20PySpark%20AND%20status%20%3D%20Open
The easiest one to start with is probably SPARK-838.

Matei

On Jul 30, 2013, at 6:44 AM, Michael Joyce <joyce@apache.org> wrote:

> Hey Matei,
> 
> I would love to help on the Python API. I'll start taking a look at that.
> Unfortunately I don't have access to a Windows computer, so I can't be of
> much use there. I would also be more than happy to work on the JVM stuff as
> well. If you have a list of stuff to do there (or it wouldn't take too long to
> compile one), I would gladly take a look.
> 
> Thanks for all the help!
> 
> 
> -- Joyce
> 
> 
> On Mon, Jul 29, 2013 at 4:17 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> 
>> Hey Michael,
>> 
>> Depending on your background, there are quite a few things to do.
>> 
>> One general area that we might use more help for, if you have experience
>> there, is the Python API. Part of it can be just to add more examples in
>> Python, e.g., to show how one can use NumPy or SciPy with it. Another thing
>> that would be super useful if you also have access to Windows is this:
>> https://spark-project.atlassian.net/browse/SPARK-649. We want to make
>> Spark very broadly accessible for science work and it sounds like your
>> background at JPL is good for that.
>> 
>> Alternatively, if you prefer to work on the Java VM, there are a bunch of
>> internal things to do there too -- I can give an overview of what I'd
>> consider easy to jump into there.
>> 
>> Matei
>> 
>> On Jul 29, 2013, at 1:03 PM, Michael Joyce <joyce@apache.org> wrote:
>> 
>>> Hey Matei,
>>> 
>>> Truth be told I haven't had much of a chance to look through JIRA and the
>>> code base to pick a specific part to work on. Is there anything in
>>> particular that needs some work? I'm more than happy to throw some effort
>>> at a specific problem if something needs attention. Otherwise I can just
>>> poke around and try to find a nice niche in which to work so I can help
>>> out.
>>> 
>>> Thanks much!
>>> 
>>> -- Joyce
>>> 
>>> 
>>> On Mon, Jul 29, 2013 at 10:55 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
>>> 
>>>> Hey Michael,
>>>> 
>>>> Glad to hear you're interested in helping. Are there specific things you'd
>>>> like to work on? Certainly we will need help with various Apache packaging,
>>>> etc., so it's good to have more people with experience at Apache.
>>>> 
>>>> Matei
>>>> 
>>>> On Jul 29, 2013, at 8:36 AM, Michael Joyce <joyce@apache.org> wrote:
>>>> 
>>>>> Hi all!
>>>>> 
>>>>> My name is Michael Joyce. I work at JPL and have heard some great things
>>>>> about Spark from Chris Mattmann. I figured I would stop by, say hello, and
>>>>> hopefully throw some helpful contributions at the project.
>>>>> 
>>>>> Look forward to helping out!
>>>>> 
>>>>> -- Joyce
>>>> 
>>>> 
>> 
>> 

