spark-dev mailing list archives

From Kimahriman <>
Subject sortWithinPartitions in Structured Streaming
Date Wed, 08 Apr 2020 12:11:16 GMT
Currently, all sorting is disallowed in structured streaming queries (other
than on an aggregated result in Complete output mode). Not allowing global
sorting makes sense, as you can't sort an infinite list, but could non-global
sorting (i.e. sortWithinPartitions) be allowed? I'm running into this with an
external source I'm using, but I'm not sure whether this would be useful for
file sources as well. As a workaround, I have to use foreachBatch so that I
can do a sortWithinPartitions on each micro-batch.
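
For reference, the foreachBatch workaround mentioned above might look roughly
like the following PySpark sketch. The column name `key`, the output path, and
the built-in `rate` source standing in for the external source are all
assumptions made for illustration, not details from the actual job.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed for the type hints; PySpark must be installed to run a query.
    from pyspark.sql import DataFrame, SparkSession

SORT_COLUMN = "key"          # hypothetical column to sort on
OUTPUT_PATH = "/tmp/sorted"  # hypothetical sink location


def write_sorted_batch(batch_df: "DataFrame", batch_id: int) -> None:
    """Sort each partition of one micro-batch locally, then append it to the sink."""
    # Inside foreachBatch the micro-batch is a regular (non-streaming) DataFrame,
    # so sortWithinPartitions is accepted here.
    (batch_df
        .sortWithinPartitions(SORT_COLUMN)
        .write
        .mode("append")
        .parquet(OUTPUT_PATH))


def start_query(spark: "SparkSession"):
    """Attach the batch function to a streaming query (rate source as a stand-in)."""
    stream = (spark.readStream
        .format("rate")  # emits `timestamp` and `value` columns
        .load()
        .withColumnRenamed("value", SORT_COLUMN))
    return stream.writeStream.foreachBatch(write_sorted_batch).start()
```

The sort itself is a single call; the cost of the workaround is that the sink
has to be written by hand inside the batch function instead of through the
normal streaming writer.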

Two main questions:
- Does a local sort cause issues with any of the exactly-once guarantees that
streaming queries provide? I can't say I know or understand how these
semantics work. Or are there other issues this would cause that I can't
think of?

- Is the change as simple as changing the unsupported operations check to
only look for global sorts instead of all sorts?
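
To make the second question concrete, here is a toy model of such a check. It
is not Spark's actual code: it assumes only that a sort node carries a flag
distinguishing global sorts from per-partition ones (sortWithinPartitions
produces the non-global kind). The class and function names are hypothetical,
and the Complete-mode exception is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sort:
    """Toy stand-in for a logical sort node in a streaming plan."""
    columns: List[str]
    is_global: bool  # False models a per-partition sort (sortWithinPartitions)


def is_allowed_today(op: Sort) -> bool:
    """Models the current behaviour: every sort on a stream is rejected."""
    return False


def is_allowed_if_relaxed(op: Sort) -> bool:
    """Models the proposed behaviour: only global sorts are rejected."""
    return not op.is_global


local_sort = Sort(["key"], is_global=False)
global_sort = Sort(["key"], is_global=True)

print(is_allowed_today(local_sort))        # False
print(is_allowed_if_relaxed(local_sort))   # True
print(is_allowed_if_relaxed(global_sort))  # False
```

Whether relaxing the check this way is sufficient is exactly the open
question; the sketch only shows the shape of the change being asked about.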

The only other discussion on this topic I found is here, which suggested the
local sort might be something to consider allowing in structured streaming.
