FYI: Filed https://issues.apache.org/jira/browse/SPARK-24466 and provided the patch https://github.com/apache/spark/pull/21497

On Tue, Jun 5, 2018 at 11:30 AM, Jungtaek Lim <kabhwan@gmail.com> wrote:
Yeah, that's why I initiated this thread: the socket source is expected to be used in the examples in the official documentation and for quick experiments, where we typically just use netcat.

I'll file an issue and provide the fix.

On Tue, Jun 5, 2018 at 1:48 AM, Joseph Torres <joseph.torres@databricks.com> wrote:
I tend to agree that this is a bug. It's kind of silly that nc behaves this way, but a socket connector that doesn't work with netcat will surely seem broken to users. It wouldn't be a huge change to defer opening the socket until a read is actually required.
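The deferred-connection idea can be sketched roughly like this (a hypothetical Python illustration of the pattern, not Spark's actual source code; the class and method names are made up for the example):

```python
import socket

class LazySocketReader:
    """Sketch of deferring the connection: nothing is opened in the
    constructor, so schema handling doesn't consume nc's single
    connection; the socket opens only on the first actual read."""

    def __init__(self, host, port):
        self.host, self.port = host, port
        self._sock = None                # no connection made yet

    def _ensure_connected(self):
        if self._sock is None:           # connect lazily, exactly once
            self._sock = socket.create_connection((self.host, self.port))
        return self._sock

    def read_line(self):
        # The first read triggers the (single) connection.
        return self._ensure_connected().makefile().readline()

reader = LazySocketReader("localhost", 9999)
print(reader._sock is None)  # True: constructing the reader opened nothing
```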

On Sun, Jun 3, 2018 at 9:55 PM, Jungtaek Lim <kabhwan@gmail.com> wrote:
Hi devs,

Not sure I'll hear back soon since Spark Summit is just around the corner, but I wanted to post this and wait.

While playing with Spark 2.4.0-SNAPSHOT, I found that the nc command exits before reading the actual data, so the query also exits with an error.

The reason is that Spark launches a temporary reader to read the schema, closes it, and then re-opens a reader for the actual query. A reliable socket server can handle this without any issue, but nc normally can't handle multiple connections and simply exits when the temporary reader's connection is closed.
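The failure mode can be reproduced without Spark at all. A minimal sketch, assuming an nc-like server that accepts exactly one connection: the first connection (standing in for the temporary schema reader) succeeds, and the second (the actual query) is refused because the server has already exited.

```python
import socket
import threading

# nc-like server: accepts exactly one connection, then exits,
# just as plain `nc -l <port>` does.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))          # OS-assigned free port
srv.listen(1)
port = srv.getsockname()[1]

def nc_like_server():
    conn, _ = srv.accept()          # serve the first connection only
    conn.close()
    srv.close()                     # server is gone after that

t = threading.Thread(target=nc_like_server)
t.start()

# 1) The temporary schema reader connects, then disconnects.
socket.create_connection(("127.0.0.1", port)).close()
t.join()

# 2) The real query tries to connect, but the server has exited.
try:
    socket.create_connection(("127.0.0.1", port), timeout=1)
    reconnected = True
except OSError:
    reconnected = False

print(reconnected)  # False: this is the failure users hit with nc
```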

I would like to file an issue and contribute a fix if we agree this is a bug (otherwise we would need to replace the nc utility with another one, maybe our own implementation?), but I'm not sure we want to apply a workaround for a specific source.

I'd like to hear opinions before giving it a shot.

Thanks,
Jungtaek Lim (HeartSaVioR)