storm-user mailing list archives

From Andrew Montalenti <and...@parsely.com>
Subject Re: Storm with Python
Date Fri, 30 May 2014 21:14:51 GMT
For one thing, a recently accepted Storm pull request has made this
serialization pluggable and someone has already implemented a protobuf
variety. We plan to investigate alternative serialization options for
multilang once we get the other tooling out of the way.

For another, it is true the overhead for serialization is non-trivial, but
the overhead also tends to be a constant factor applied to data size, and
machines are cheap while programming time is expensive. Storm and Python's
data analysis and data integration libraries are a pretty powerful combo
worth the performance penalty.
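
To get a feel for the "constant factor applied to data size" point, here is a minimal sketch that times a JSON round trip in Python for messages of growing size. The message layout is only loosely modeled on a multilang tuple message, the helper name json_round_trip_usec is hypothetical, and the numbers cover only the Python side (not the Java side or the pipe between the two processes), so treat the results as a rough lower bound.

```python
import json
import timeit


def json_round_trip_usec(message, repeats=2000):
    """Return the mean time in microseconds for one dumps + loads round trip."""
    def round_trip():
        return json.loads(json.dumps(message))

    total_seconds = timeit.timeit(round_trip, number=repeats)
    return total_seconds / repeats * 1e6


def make_message(num_fields):
    # Assumed shape, loosely modeled on a multilang tuple message.
    return {
        "id": "-6955786537413359385",
        "comp": "1",
        "stream": "default",
        "task": 9,
        "tuple": ["some field value"] * num_fields,
    }


if __name__ == "__main__":
    for num_fields in (10, 100, 1000):
        usec = json_round_trip_usec(make_message(num_fields))
        print("%5d fields: %.1f usec per round trip" % (num_fields, usec))
```

If the cost grows roughly in proportion to the number of fields, that supports treating it as a per-byte tax rather than a fixed per-tuple cost.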
On May 30, 2014 1:42 PM, "Larry Palmer" <larry.a.palmer@gmail.com> wrote:

> We experimented with Storm/Python about six months ago, but found the
> JSON serialization/deserialization overhead was quite high, on the order of
> several hundred µs per tuple every time it transitioned from Java to
> Python or vice versa, limiting total throughput on a 12-core server to
> around 25k tuples/second. We considered switching to a different
> serializer but ended up just doing everything in Java instead.
>
> Is that still the case, or perhaps has the speed been improved?
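
As a back-of-the-envelope check on the figures above, the sketch below converts the reported throughput into a per-tuple budget of core time. It assumes all 12 cores were fully busy and evenly loaded, which the message does not state; the function name is hypothetical.

```python
def core_usec_per_tuple(cores, tuples_per_second):
    """Core time available per tuple, in microseconds, if every core is saturated."""
    return cores * 1000000.0 / tuples_per_second


if __name__ == "__main__":
    # 12 cores and about 25k tuples/second, as reported above.
    print(core_usec_per_tuple(12, 25000))  # 480.0
```

Under that assumption each tuple consumed roughly 480 µs of core time in total, which is consistent with a few Java/Python transitions at several hundred µs each being the dominant cost.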
>
>
> On Thu, May 29, 2014 at 10:06 PM, Andrew Montalenti <andrew@parsely.com>
> wrote:
>
>> We are building a new Storm and Python interop option that is called
>> streamparse:
>>
>> https://github.com/Parsely/streamparse
>>
>> It includes a heavily rewritten Storm interop library and a command line
>> tool, sparse, for managing local and remote Storm clusters. The idea is to
>> make Storm projects as easy to build and manage in Python as RQ or Celery
>> projects.
>>
>> It currently has support for running local clusters in a single command,
>> managing virtualenvs on remote worker machines, submitting topologies,
>> listing/killing topologies, and tailing remote log files. The multilang
>> layer also has better support for logging and exception/error handling.
>> Multiple topologies can be built from a single codebase and multiple remote
>> Storm clusters can be supported via a simple JSON configuration file.
>>
>> We are already using it for production topologies atop Storm 0.9.1 and
>> Storm 0.8. We welcome contributions and if you join our mailing list, feel
>> free to make requests. We continue to develop it actively and in an open
>> manner.
>>
>> -Andrew Montalenti
>> CTO, Parse.ly
>> On May 29, 2014 6:35 PM, "Ashu Goel" <Ashu@shopkick.com> wrote:
>>
>>> (the reason being that we are still running Python 2.6 but Petrel is
>>> only compatible with 2.7)
>>> On May 29, 2014, at 2:48 PM, Ashu Goel <Ashu@shopkick.com> wrote:
>>>
>>> Awesome! I'm looking more into using the storm.thrift to define a
>>> non-JVM DSL... does anyone have any working examples of this? Python
>>> preferred but any example will do. The wiki is a bit confusing...
>>> On May 28, 2014, at 1:54 PM, FRANCISCO JESUS GOMEZ RODRIGUEZ <
>>> franciscojesus.gomezrodriguez@telefonica.com> wrote:
>>>
>>> Ashu, take a look at this project: http://github.com/AirSage/Petrel
>>>
>>> Write, submit, debug and monitor in Python.
>>>
>>> @ffranz
>>> On 28/05/2014 22:49, Ashu Goel <Ashu@shopkick.com> wrote:
>>>  Any examples where the entire infra is written in Python (including
>>> topology)? Or is that not possible?
>>>  On May 28, 2014, at 1:33 PM, Dilpreet Singh <dilpreet023@gmail.com>
>>> wrote:
>>>
>>>
>>> https://github.com/apache/incubator-storm/tree/master/examples/storm-starter
>>>
>>> The WordCountTopology contains an example Python bolt.
>>>
>>>  Regards,
>>> Dilpreet
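
For readers who have not opened storm-starter, a Python bolt in the style of that example looks roughly like the sketch below. It assumes the multilang helper module storm.py that ships with storm-starter's resources is importable and exposes BasicBolt, emit, and a tuple object with a values attribute; the fallback branch and the split_sentence helper are additions here so the splitting logic can be run without a Storm installation.

```python
try:
    # Multilang helper module from storm-starter's resources (assumed importable).
    import storm
    BasicBolt = storm.BasicBolt
except (ImportError, AttributeError):
    # No Storm helper available: fall back so the pure logic can still be run.
    storm = None
    BasicBolt = object


def split_sentence(sentence):
    """Split a sentence into words on whitespace."""
    return sentence.split()


class SplitSentenceBolt(BasicBolt):
    def process(self, tup):
        # The first field of the incoming tuple is expected to be the sentence.
        for word in split_sentence(tup.values[0]):
            storm.emit([word])


if __name__ == "__main__" and storm is not None:
    # Under Storm, run() speaks the multilang protocol over stdin/stdout.
    SplitSentenceBolt().run()
```

The topology itself is still defined on the JVM side in that example; only the bolt body runs in Python as a subprocess.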
>>>
>>>
>>> On Thu, May 29, 2014 at 1:59 AM, Ashu Goel <Ashu@shopkick.com> wrote:
>>>
>>>> Does anyone have a good example program or instructions for using Python
>>>> with Storm? I can't seem to find anything concrete online.
>>>>
>>>> Thanks,
>>>> Ashu Goel
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> The information contained in this transmission is privileged and
>>> confidential information intended only for the use of the individual or
>>> entity named above. If the reader of this message is not the intended
>>> recipient, you are hereby notified that any dissemination, distribution or
>>> copying of this communication is strictly prohibited. If you have received
>>> this transmission in error, do not read it. Please immediately reply to the
>>> sender that you have received this communication in error and then delete
>>> it.
>>>
>>>
>>>
>>>
>
