samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yan Fang" <yanfang...@gmail.com>
Subject Re: Review Request 35445: SAMZA-693: Very basic HDFS Producer service for Samza
Date Thu, 30 Jul 2015 18:59:19 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35445/#review93614
-----------------------------------------------------------



docs/learn/documentation/versioned/hdfs/producer.md (line 33)
<https://reviews.apache.org/r/35445/#comment148011>

    com.etsy -> org.apache.



docs/learn/documentation/versioned/hdfs/producer.md (line 65)
<https://reviews.apache.org/r/35445/#comment148012>

    you also need to add a link in the document/index.html to make it "reachable".



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
(line 38)
<https://reviews.apache.org/r/35445/#comment148015>

    can we use the "camel case with an initial lower case character " for the varaible? Just
to obey the same coding guide http://samza.apache.org/contribute/coding-guide.html
    
    Thank you.



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala (line 38)
<https://reviews.apache.org/r/35445/#comment148016>

    not necessary



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
(line 40)
<https://reviews.apache.org/r/35445/#comment148018>

    My overall concern here is that, if there are more than one tasks are running, is it possible
that all the tasks are writing to one file at the same time?



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
(line 55)
<https://reviews.apache.org/r/35445/#comment148017>

    this will very possibly never be executed, because line 53 already makes currentDateTime
== dateTime.



samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
(line 37)
<https://reviews.apache.org/r/35445/#comment148024>

    I would prefer the param idea because 1) Samza is already using this fashion 2) less code
especially when there are more SequenceFileHdfsWriter come out (LongWritable, etc)
    
    "like the casting of the outgoing message to something not-Writable like Array[Byte] or
String might require a third param and it might start to get awkward"
    
    -- We can always cast the outgoing msg to Array[Byte] using the serde defined for this
msg. So as long as the Wriable accepts Array[Byte], this should be fine.
    
    "Also there are some Writable types that would not allow us to determine message size
for batching purposes the way "
    
    -- I think we can either give it a default size (this can be configurable) when there
is not getLength method or use a subclass. Either way will be fine.



samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
(line 106)
<https://reviews.apache.org/r/35445/#comment148019>

    do we need to bring up the MiniCluster in every test? Is it be better if we just bring
them up in the beforeClass and then shutdown it in afterClass?


- Yan Fang


On July 28, 2015, 5:25 a.m., Eli Reisman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35445/
> -----------------------------------------------------------
> 
> (Updated July 28, 2015, 5:25 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> SAMZA-693: Very basic HDFS Producer service for Samza
> 
> 
> Diffs
> -----
> 
>   build.gradle 0852adc 
>   docs/learn/documentation/versioned/hdfs/producer.md PRE-CREATION 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsConfig.scala PRE-CREATION

>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemAdmin.scala PRE-CREATION

>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemFactory.scala PRE-CREATION

>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducer.scala PRE-CREATION

>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/HdfsSystemProducerMetrics.scala
PRE-CREATION 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/BinarySequenceFileHdfsWriter.scala
PRE-CREATION 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/Bucketer.scala PRE-CREATION

>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/HdfsWriter.scala PRE-CREATION

>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/JobNameDateTimeBucketer.scala
PRE-CREATION 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/SequenceFileHdfsWriter.scala
PRE-CREATION 
>   samza-hdfs/src/main/scala/org/apache/samza/system/hdfs/writer/TextSequenceFileHdfsWriter.scala
PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job-text.properties PRE-CREATION

>   samza-hdfs/src/test/resources/samza-hdfs-test-batch-job.properties PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-job-text.properties PRE-CREATION 
>   samza-hdfs/src/test/resources/samza-hdfs-test-job.properties PRE-CREATION 
>   samza-hdfs/src/test/scala/org/apache/samza/system/hdfs/TestHdfsSystemProducerTestSuite.scala
PRE-CREATION 
>   settings.gradle 19bff97 
> 
> Diff: https://reviews.apache.org/r/35445/diff/
> 
> 
> Testing
> -------
> 
> Updated: See JIRA SAMZA-693 for details, this latest update (693-4) addresses post-review
issues and adds more pluggable design, several default writer implementations, and more (and
more thorough) unit tests.
> 
> Passes 'gradle clean test'.
> 
> 
> Thanks,
> 
> Eli Reisman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message