spark-issues mailing list archives

From "bluejoe (JIRA)" <>
Subject [jira] [Created] (SPARK-22936) providing HttpStreamSource and HttpStreamSink
Date Tue, 02 Jan 2018 06:00:00 GMT
bluejoe created SPARK-22936:

             Summary: providing HttpStreamSource and HttpStreamSink
                 Key: SPARK-22936
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 2.1.0
            Reporter: bluejoe

Hi, in my project I built spark-http-stream, which is now available on
I am wondering whether it would be useful to others and whether it could be integrated into Spark.

spark-http-stream transfers Spark structured streams over the HTTP protocol. Unlike TCP streams,
Kafka streams and HDFS file streams, HTTP streams often flow across distributed big data centers
on the Web. This feature is very helpful for building global data processing pipelines across
different data centers (scientific research institutes, for example) that own separate data

The following code shows how to load messages from an HttpStreamSource:

{{val lines = spark.readStream.format(classOf[HttpStreamSourceProvider].getName)
	.option("httpServletUrl", "http://localhost:8080/xxxx")
	.option("topic", "topic-1")
	.option("includesTimestamp", "true")
	.load()}}
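
Once a reader like the one above has been loaded into a DataFrame, it should be consumable like any other streaming DataFrame. The sketch below illustrates that step only. It is not part of spark-http-stream: because the HTTP source classes are not on a stock Spark classpath, it substitutes Spark's built-in "rate" source (available from Spark 2.2, so later than the 2.1.0 listed above) as a stand-in, and the names ConsumeStreamSketch and standInStream are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ConsumeStreamSketch {
  // Stand-in for the HTTP source (assumption): Spark's built-in "rate" source
  // emits rows with `timestamp` and `value` columns at a fixed rate.
  def standInStream(spark: SparkSession): DataFrame =
    spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("stream-consumer-sketch")
      .getOrCreate()

    val lines = standInStream(spark)

    // Print each micro-batch to stdout using the built-in console sink.
    val query = lines.writeStream
      .format("console")
      .outputMode("append")
      .start()

    // Let the query run for about ten seconds, then shut down.
    query.awaitTermination(10000L)
    query.stop()
    spark.stop()
  }
}
```

With the HTTP source on the classpath, the body of standInStream would presumably be replaced by the readStream call shown earlier; the writeStream part would stay the same.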

This message was sent by Atlassian JIRA
