storm-user mailing list archives

From Junguk Cho <jman...@gmail.com>
Subject Re: Basic questions about Strom
Date Sat, 11 Jun 2016 01:57:42 GMT
Hi, Jungtaek.

Thank you for your reply.

I have a few follow-up questions.

1. In the example (WordCountTopology), the WordCount class uses
String word = tuple.getString(0); to get the word.
So I don't understand the exact roles of "word" and "count". Are they used
internally for a Map-like structure?
To be clear, does each bolt exchange data in the format "word" : <data>?
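
To make the question concrete, here is a minimal sketch of the name-to-position lookup described in the quoted reply below (a value can be read by field name or by index). It is plain Java, not Storm's own classes: FieldsSketch and its methods are hypothetical names chosen for illustration, and the sketch says nothing about what Storm actually puts on the wire.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a declared schema plus one tuple; not Storm's classes. */
public class FieldsSketch {
    // Maps each declared field name to its position in the value list.
    private final Map<String, Integer> indexByName = new HashMap<>();
    private final List<Object> values;

    public FieldsSketch(List<String> fieldNames, List<Object> values) {
        if (fieldNames.size() != values.size()) {
            throw new IllegalArgumentException("schema and values differ in length");
        }
        for (int i = 0; i < fieldNames.size(); i++) {
            indexByName.put(fieldNames.get(i), i);
        }
        this.values = new ArrayList<>(values);
    }

    /** Positional access, analogous in spirit to tuple.getString(0). */
    public Object getValue(int index) {
        return values.get(index);
    }

    /** Named access: resolve the name to a position, then read that position. */
    public Object getValueByField(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        FieldsSketch tuple = new FieldsSketch(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("storm", 3));
        System.out.println(tuple.getValue(0));               // storm
        System.out.println(tuple.getValueByField("word"));   // storm
        System.out.println(tuple.getValueByField("count"));  // 3
    }
}
```

In this sketch the names live in a lookup table next to the values, so reading by name and reading by index return the same object.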

About the default and non-default streams: does every tuple carry a stream id
when it is sent?


3. To be clear, if we set it to "false", Storm does not use serialization for
inter-process and inter-node communication?
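
As a rough illustration of why an "always try to serialize" debug switch can surface problems early, the sketch below pushes a value through a byte array, which is what any cross-process hop has to do, and reports whether it survives. This is an assumption-laden stand-in: it uses java.io serialization rather than Storm's own serialization (which is Kryo-based), and SerializeCheckSketch, roundTrip and canSerialize are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Hypothetical sketch using java.io serialization as a stand-in; not Storm code. */
public class SerializeCheckSketch {

    /** Copies a value through a byte array, as a cross-process transfer would need to. */
    public static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    /** Returns true if the value survives the round trip, false if it is not serializable. */
    public static boolean canSerialize(Object value) {
        try {
            roundTrip(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A String is serializable; a bare Object is not.
        System.out.println(canSerialize("storm"));       // true
        System.out.println(canSerialize(new Object()));  // false
    }
}
```

Handing the same object to another thread by reference would never hit the failing case, which is why a value that works within one process can still break once it has to cross a process boundary.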

Thanks in advance.
- Junguk




2016-06-10 18:00 GMT-04:00 Jungtaek Lim <kabhwan@gmail.com>:

> Hi Junguk,
>
> 1. In declareOutputFields, you declare the schema of this component's
> output stream. The first value of the tuple is matched to "word", and the
> second value is matched to "count". You can access a value by field name
> or by index.
>
> Btw, declare() declares the default stream, and there are other methods that
> declare named (non-default) streams.
>
> 2. When you rebalance a topology, you're encouraged to specify a wait time,
> too.
> The topology is deactivated immediately, so the Spout will not call
> nextTuple(); only the Bolts keep running to handle in-flight tuples during
> the wait time.
> If in-flight tuples are still left after that, they will not be acked. So if
> the Spout's data source is RabbitMQ in ack mode, Kafka, or similar, the Spout
> will read them from the data source again.
>
> 3. Right. To catch serialization issues early, there's the option
> "topology.testing.always.try.serialize" for debugging purposes. Note that it
> affects performance, so it should be disabled ("false" by default) in
> production environments.
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Sat, Jun 11, 2016 at 3:27 AM, Junguk Cho <jmanbal@gmail.com> wrote:
>
>> Hi, I have some basic questions.
>>
>> 1. About Tuple.
>> We declare the tuple's fields in declareOutputFields.
>> For example, declarer.declare(new Fields("word", "count"));
>>
>> Are "word" and "count" forwarded to the next node along with the actual data?
>> What are the roles of "word" and "count" here internally?
>>
>>
>> 2. About rebalancing (
>> http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html
>> )
>>
>> Storm has a rebalancing capability.
>> What happens to in-flight tuples while Storm rebalances a topology?
>> Are they dropped and replayed?
>>
>> 3. Serialization.
>> As far as I know, in Storm serialization does not happen for inter-thread
>> communication. For inter-process and inter-node communication,
>> serialization is required.
>> Is that right?
>>
>> Thanks,
>> Junguk
>>
>>
