storm-user mailing list archives

From Andrew Montalenti <and...@parsely.com>
Subject Re: Storm with Python
Date Sat, 31 May 2014 00:24:41 GMT
We decided to start with topology definitions in Clojure because a) that
ensures that the topologies can support 100% of Storm's Clojure DSL
out-of-the-box and b) that allows easy mixing of Python, Java, Clojure, and
even other multi-lang bolts. For example, we plan on producing example
topologies that use Python bolts for processing, but use the built-in JVM
Kafka spout as a performant data integration option.

Above and beyond a nice bundling of the Clojure DSL provided with Storm,
streamparse provides:

i) *sparse*, a command-line tool that can quickstart a Storm project; run local
Storm clusters (using LocalCluster under the hood) in a single command;
manage virtualenvs on remote worker machines; submit topologies to remote
Nimbus nodes via SSH tunneling; list and kill topologies using the same;
tail remote log files across a cluster of Storm worker machines. So, in
short -- it provides a *whole lot* and it's still in very early releases.
It provides the rapid development cycle that Pythonistas are used to, even when
working with a technology like Storm. It makes Storm feel as lightweight as
something like RQ or Celery.

ii) An actively-developed *roadmap*. Planned future features include remote
debugging of crashed Storm bolts using pdb-over-a-socket; visualization
tools for Storm topologies; and higher-level Bolt patterns, such as a
BatchingBolt we have been using at Parse.ly that allows for simple time-
and data-based batching patterns with proper ack/fail semantics abstracted.

iii) The Python *streamparse* package, which is an improved multi-lang
layer that has better support for logging and exception/error handling. It
was rewritten from scratch but modeled on Storm's bundled storm.py to
implement the basics of IPC. In the future, it might also support
serializations beyond plain JSON.
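
To make the IPC point in iii) concrete, here is a minimal sketch of the framing that Storm's multi-lang protocol uses: JSON messages over stdin/stdout, each terminated by a line containing only "end". This is not streamparse's actual code; the function names read_message and send_message are made up for illustration, and an in-memory buffer stands in for the real pipes.

```python
import io
import json


def read_message(stream):
    """Collect lines until a line that is exactly "end", then parse them as JSON."""
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)
    raise EOFError("stream closed before an 'end' line was seen")


def send_message(msg, stream):
    """Write one JSON message followed by the "end" delimiter line."""
    stream.write(json.dumps(msg))
    stream.write("\nend\n")
    stream.flush()


# Round trip through an in-memory buffer standing in for stdin/stdout.
buf = io.StringIO()
send_message({"command": "emit", "tuple": ["hello", 1]}, buf)
buf.seek(0)
assert read_message(buf) == {"command": "emit", "tuple": ["hello", 1]}
```

Every tuple crossing the JVM/Python boundary pays for one json.dumps and one json.loads like this, which is why alternative serializations are interesting.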

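As a rough illustration of the batching pattern mentioned in ii), the sketch below buffers tuples and flushes them when the batch is full or too old, then acks or fails the whole batch together. It is an assumption about how such a pattern could look, not the BatchingBolt we use; the Batcher class and its parameters (process_batch, ack, fail, max_size, max_age, clock) are hypothetical names.

```python
import time


class Batcher:
    """Hypothetical helper: buffer tuples, flush on size or age, ack/fail as a group."""

    def __init__(self, process_batch, ack, fail,
                 max_size=100, max_age=5.0, clock=time.monotonic):
        self.process_batch = process_batch  # callable taking a list of tuples
        self.ack = ack                      # called once per tuple on success
        self.fail = fail                    # called once per tuple on error
        self.max_size = max_size
        self.max_age = max_age              # seconds
        self.clock = clock
        self.pending = []
        self.started = None

    def add(self, tup):
        if not self.pending:
            # Remember when the oldest tuple of this batch arrived.
            self.started = self.clock()
        self.pending.append(tup)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the batch is full or its oldest tuple is too old."""
        if not self.pending:
            return
        full = len(self.pending) >= self.max_size
        stale = self.clock() - self.started >= self.max_age
        if full or stale:
            self.flush()

    def flush(self):
        batch, self.pending = self.pending, []
        try:
            self.process_batch(batch)
        except Exception:
            for tup in batch:
                self.fail(tup)
        else:
            for tup in batch:
                self.ack(tup)
```

In a real bolt something would have to call maybe_flush() periodically (a timer, for instance), since add() alone only checks the age when a new tuple arrives.
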
I'm personally of the opinion that writing DSLs for Storm topologies in
languages other than Clojure is a bit of a rabbit hole. Doable -- but may
not be worth it, and the pros of topologies defined in Clojure outweigh
the cons. But I can understand a widespread distaste for Clojure and even
"multi-language" projects as a concept.

We'd gladly accept contributions for a (simple) Python DSL that maps down
to Clojure's DSL. I think trying to build a "native" DSL using the Thrift
structures ends up with a lot of complexity for little gain. If you're
interested in that sort of thing, some other projects have been floating
around that attempt it -- but if you want my opinion, you're in for a world
of pain :)


On Fri, May 30, 2014 at 7:59 PM, Ashu Goel <Ashu@shopkick.com> wrote:

> Andrew,
>
> From what I understand streamparse still requires that the topologies be
> in Clojure... not entirely sure how this is different from what Storm already
> provides. I was looking more for a DSL that we could use w/ Python 2.6 and
> be 100% Python, but it looks like that is not available.
>
> -Ashu
>
> On May 30, 2014, at 2:14 PM, Andrew Montalenti <andrew@parsely.com> wrote:
>
> For one thing, a recently accepted Storm pull request has made this
> serialization pluggable and someone has already implemented a protobuf
> variety. We plan to investigate alternative serialization options for
> multilang once we get the other tooling out of the way.
>
> For another, it is true the overhead for serialization is nontrivial, but
> the overhead also tends to be a constant factor applied to data size, and
> machines are cheap while programming time is expensive. Storm and Python's
> data analysis and data integration libraries are a pretty powerful combo
> worth the performance penalty.
> On May 30, 2014 1:42 PM, "Larry Palmer" <larry.a.palmer@gmail.com> wrote:
>
>> We had experimented with Storm/Python 6 months ago or so, but found the
>> JSON serialization/deserialization overhead was quite high, on the order of
>> several hundred usec per tuple every time it transitioned from Java to
>> Python or vice versa, limiting total throughput on a 12-core server to
>> around 25k tuples/second. Considered trying to switch to a different
>> serializer but ended up just doing everything in Java instead.
>>
>> Is that still the case, or perhaps has the speed been improved?
>>
>>
>> On Thu, May 29, 2014 at 10:06 PM, Andrew Montalenti <andrew@parsely.com>
>> wrote:
>>
>>> We are building a new Storm and Python interop option that is called
>>> streamparse:
>>>
>>> https://github.com/Parsely/streamparse
>>>
>>> It includes a heavily rewritten Storm interop library and a command line
>>> tool, sparse, for managing local and remote Storm clusters. The idea is to
>>> make Storm projects as easy to build and manage in Python as RQ or Celery
>>> projects.
>>>
>>> It currently has support for running local clusters in a single command,
>>> managing virtualenvs on remote worker machines, submitting topologies,
>>> listing/killing topologies, and tailing remote log files. The multilang
>>> layer also has better support for logging and exception/error handling.
>>> Multiple topologies can be built from a single codebase and multiple remote
>>> Storm clusters can be supported via a simple JSON configuration file.
>>>
>>> We are already using it for production topologies atop Storm 0.9.1 and
>>> Storm 0.8. We welcome contributions and if you join our mailing list, feel
>>> free to make requests. We continue to develop it actively and in an open
>>> manner.
>>>
>>> -Andrew Montalenti
>>> CTO, Parse.ly
>>> On May 29, 2014 6:35 PM, "Ashu Goel" <Ashu@shopkick.com> wrote:
>>>
>>>> (The reason being that we are still running Python 2.6, but Petrel is
>>>> only compatible with 2.7)
>>>> On May 29, 2014, at 2:48 PM, Ashu Goel <Ashu@shopkick.com> wrote:
>>>>
>>>> Awesome! I'm looking more into using the storm.thrift to define a
>>>> non-JVM DSL... does anyone have any working examples of this? Python
>>>> preferred, but any example will do. The wiki is a bit confusing...
>>>> On May 28, 2014, at 1:54 PM, FRANCISCO JESUS GOMEZ RODRIGUEZ <
>>>> franciscojesus.gomezrodriguez@telefonica.com> wrote:
>>>>
>>>>  Ashu, take a look at this project: http://github.com/AirSage/Petrel
>>>>
>>>> Write, submit, debug and monitor in python.
>>>>
>>>> @ffranz
>>>> On 28/05/2014 22:49, Ashu Goel <Ashu@shopkick.com> wrote:
>>>>  Any examples where the entire infra is written in Python (including
>>>> topology)? Or is that not possible?
>>>>  On May 28, 2014, at 1:33 PM, Dilpreet Singh <dilpreet023@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> https://github.com/apache/incubator-storm/tree/master/examples/storm-starter
>>>>
>>>>  The WordCountTopology contains an example python bolt.
>>>>
>>>>  Regards,
>>>> Dilpreet
>>>>
>>>>
>>>> On Thu, May 29, 2014 at 1:59 AM, Ashu Goel <Ashu@shopkick.com> wrote:
>>>>
>>>>> Does anyone have a good example program or instructions for using Python
>>>>> with Storm? I can't seem to find anything concrete online.
>>>>>
>>>>> Thanks,
>>>>> Ashu Goel
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>>
>>>> The information contained in this transmission is privileged and
>>>> confidential information intended only for the use of the individual or
>>>> entity named above. If the reader of this message is not the intended
>>>> recipient, you are hereby notified that any dissemination, distribution or
>>>> copying of this communication is strictly prohibited. If you have received
>>>> this transmission in error, do not read it. Please immediately reply to the
>>>> sender that you have received this communication in error and then delete
>>>> it.
>>>>
>>>>
>>>>
>>>>
>>
>
