beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [beam] chadrik commented on issue #11038: [BEAM-7746] More typing fixes
Date Wed, 18 Mar 2020 03:03:11 GMT
chadrik commented on issue #11038: [BEAM-7746] More typing fixes
URL: https://github.com/apache/beam/pull/11038#issuecomment-600400560
 
 
   Great comments and questions. I’m in the middle of rolling out our COVID
   plan at work so it may take me a bit to get you proper answers but I’ll
   start chipping away at it as soon as I can.
   
   
   
   On Tue, Mar 17, 2020 at 7:30 PM Udi Meiri <notifications@github.com> wrote:
   
   > *@udim* commented on this pull request.
   >
   > This review is taking me forever. I'm at 18 out of 52 files reviewed, but
   > releasing 16 comments I've written so far.
   > ------------------------------
   >
   > In sdks/python/apache_beam/runners/worker/operations.py
   > <https://github.com/apache/beam/pull/11038#discussion_r393281202>:
   >
   > > @@ -569,7 +580,7 @@ def __init__(self,
   >      self.tagged_receivers = None  # type: Optional[_TaggedReceivers]
   >      # A mapping of timer tags to the input "PCollections" they come in on.
   >      self.timer_inputs = timer_inputs or {}
   > -    self.input_info = None  # type: Optional[Tuple[str, str, coders.WindowedValueCoder,
MutableMapping[str, str]]]
   >
   > So the MutableMapping hint here was a mistake?
   > ------------------------------
   >
   > In sdks/python/apache_beam/metrics/metric.py
   > <https://github.com/apache/beam/pull/11038#discussion_r393289270>:
   >
   > > @@ -56,7 +55,7 @@ class Metrics(object):
   >    @staticmethod
   >    def get_namespace(namespace):
   >      # type: (Union[Type, str]) -> str
   > -    if inspect.isclass(namespace):
   > +    if isinstance(namespace, type):
   >
   > Hopefully no one uses old-style classes any more (types.ClassType).
   > ------------------------------
   >
   > In sdks/python/mypy.ini
   > <https://github.com/apache/beam/pull/11038#discussion_r393304392>:
   >
   > >  color_output = true
   > -# uncomment this to see how close we are to being complete
   > +# required setting for dmypy:
   > +follow_imports = error
   >
   > Does this mean having to supply all imported modules on the mypy command
   > line?
   > ------------------------------
   >
   > In sdks/python/apache_beam/io/iobase.py
   > <https://github.com/apache/beam/pull/11038#discussion_r393383746>:
   >
   > > +
   > +class Position(Protocol):
   > +  def __lt__(self, other):
   > +    pass
   > +
   > +  def __le__(self, other):
   > +    pass
   > +
   > +  def __gt__(self, other):
   > +    pass
   > +
   > +  def __ge__(self, other):
   > +    pass
   > +
   > +
   > +PositionT = TypeVar('PositionT', bound='Position')
   >
   > I don't understand the usage of PositionT. Isn't Position already a type?
   > It seems that you could replace all uses of PositionT with Position and
   > it'd work the same.
   > ------------------------------
   >
   > In sdks/python/apache_beam/io/iobase.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394017739>:
   >
   > > @@ -95,8 +115,11 @@
   >  #
   >  # Type for start and stop positions are specific to the bounded source and must
   >  # be consistent throughout.
   > -SourceBundle = namedtuple(
   > -    'SourceBundle', 'weight source start_position stop_position')
   > +SourceBundle = NamedTuple(
   > +    'SourceBundle',
   > +    [('weight', Optional[float]), ('source', 'BoundedSource'),
   >
   > It seems that weight is non-Optional.
   > ------------------------------
   >
   > In sdks/python/apache_beam/io/range_trackers.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394021216>:
   >
   > > @@ -42,7 +48,7 @@
   >  _LOGGER = logging.getLogger(__name__)
   >
   >
   > -class OffsetRangeTracker(iobase.RangeTracker):
   > +class OffsetRangeTracker(iobase.RangeTracker[int]):
   >    """A 'RangeTracker' for non-negative positions of type 'long'."""
   >
   > s/long/int/
   > ------------------------------
   >
   > In sdks/python/apache_beam/io/range_trackers.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394026348>:
   >
   > >      self._start_position = start_position
   >      self._stop_position = stop_position
   >      self._lock = threading.Lock()
   > -    self._last_claim = self.UNSTARTED
   > +    # the return on investment for properly typing this is low. cast it.
   >
   > Did you mean that the ROI is high, not low?
   > ------------------------------
   >
   > In sdks/python/apache_beam/io/range_trackers.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394027007>:
   >
   > >    def fraction_to_position(self, fraction, start, end):
   > +    # type: (float, Optional[iobase.PositionT], Optional[iobase.PositionT]) ->
Optional[iobase.PositionT]
   >
   > The return value seems to be non-Optional.
   > ------------------------------
   >
   > In sdks/python/apache_beam/io/iobase.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394029843>:
   >
   > > @@ -244,7 +272,7 @@ def is_bounded(self):
   >      return True
   >
   >
   > -class RangeTracker(object):
   > +class RangeTracker(Generic[PositionT]):
   >
   > I'm not sure. Positions might be opaque byte strings, where splitting
   > happens externally. This might be important for cross-language transforms.
   >
   > @lukecwik <https://github.com/lukecwik>, @robertwb
   > <https://github.com/robertwb>, @chamikaramj
   > <https://github.com/chamikaramj> , do you have opinion on RangeTracker
   > position types? Should they all support the Position protocol (defined
   > above)?
   > ------------------------------
   >
   > In sdks/python/apache_beam/io/restriction_trackers.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394034514>:
   >
   > > @@ -52,6 +55,7 @@ def __hash__(self):
   >      return hash((type(self), self.start, self.stop))
   >
   >    def split(self, desired_num_offsets_per_split, min_num_offsets_per_split=1):
   > +    # type: (...) -> Iterator[OffsetRange]
   >
   > Input looks like (int, int). Any reason to leave it empty?
   > ------------------------------
   >
   > In sdks/python/apache_beam/ml/gcp/naturallanguageml.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394034914>:
   >
   > > @@ -74,7 +75,9 @@ def __init__(
   >    def to_dict(document):
   >      # type: (Document) -> Mapping[str, Optional[str]]
   >      if document.from_gcs:
   > -      dict_repr = {'gcs_content_uri': document.content}
   > +      dict_repr = {
   > +          'gcs_content_uri': document.content
   > +      }  # type: Dict[str, Optional[str]]
   >
   > document.content is not Optional
   > ------------------------------
   >
   > In sdks/python/apache_beam/coders/coders.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394038483>:
   >
   > > @@ -387,8 +387,11 @@ def register_structured_urn(urn, cls):
   >      """Register a coder that's completely defined by its urn and its
   >      component(s), if any, which are passed to construct the instance.
   >      """
   > -    cls.to_runner_api_parameter = (
   > -        lambda self, unused_context: (urn, None, self._get_component_coders()))
   > +    setattr(
   >
   > Could you explain (in a comment perhaps) why using setattr here is
   > necessary?
   > ------------------------------
   >
   > In sdks/python/apache_beam/utils/sentinel.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394056361>:
   >
   > > +class Sentinel(enum.Enum):
   > +  """
   > +  A type-safe sentinel class
   > +  """
   > +
   > +  sentinel = object()
   >
   > SG.
   > Sentinel is the type.
   > SPLIT_POINTS_UNKNOWN is the unique value.
   > Inheriting from Enum (vs calling Enum()) simplifies pickling (not sure
   > necessary, but doesn't hurt).
   > ------------------------------
   >
   > In sdks/python/apache_beam/runners/worker/sdk_worker.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394057261>:
   >
   > > @@ -171,7 +172,7 @@ def get_responses():
   >      self._worker_thread_pool.shutdown()
   >      # get_responses may be blocked on responses.get(), but we need to return
   >      # control to its caller.
   > -    self._responses.put(no_more_work)
   > +    self._responses.put(no_more_work)  # type: ignore[arg-type]
   >
   > Sounds like a good idea. There are ~20 places in the code that assign
   > object() as value.
   > One of them is even called READER_THREAD_IS_DONE_SENTINEL. :)
   > ------------------------------
   >
   > In sdks/python/apache_beam/runners/worker/sdk_worker.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394060396>:
   >
   > > @@ -353,6 +359,8 @@ def release(self, instruction_id):
   >      self.cached_bundle_processors[descriptor_id].append(processor)
   >
   >    def shutdown(self):
   > +    # type: (...) -> None
   >
   > This can be () -> None right?
   > Same for the next 2 hints below.
   > ------------------------------
   >
   > In sdks/python/apache_beam/transforms/core.py
   > <https://github.com/apache/beam/pull/11038#discussion_r394072405>:
   >
   > > @@ -1300,12 +1300,13 @@ def to_runner_api_parameter(self, context):
   >            common_urns.requirements.REQUIRES_STATEFUL_PROCESSING.urn)
   >      from apache_beam.runners.common import DoFnSignature
   >      sig = DoFnSignature(self.fn)
   > -    is_splittable = sig.is_splittable_dofn()
   >
   > Not sure if checking get_restriction_coder() return type instead of
   > is_splittable_dofn() is future proof.
   >
   > Also, I don't understand the change, from a mypy correctness perspective.
   >
   > —
   > You are receiving this because you authored the thread.
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/beam/pull/11038#pullrequestreview-375545905>,
   > or unsubscribe
   > <https://github.com/notifications/unsubscribe-auth/AAAPOEY43E2LYCWV3TS7KD3RIAW23ANCNFSM4LAZDSSQ>
   > .
   >
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message