From: Barnaby
To: user@spark.incubator.apache.org
Date: Wed, 23 Jul 2014 18:43:31 -0700 (PDT)
Message-ID: <1406166211271-10557.post@n3.nabble.com>
Subject: streaming sequence files?

If I save each RDD of a DStream as a sequence file, for example:

    import java.text.SimpleDateFormat
    import java.util.Calendar

    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.foreachRDD { rdd =>
      // One output path per batch, suffixed with the current time
      val stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(Calendar.getInstance.getTime)
      rdd.saveAsSequenceFile("tachyon://localhost:19998/files/WordCounts-" + stamp)
    }

how can I use these results in another Spark app, given that there is no StreamingContext.sequenceFileStream()?

Or, more generally: what is the best way to save RDDs of objects to files in one streaming app so that another app can stream those files in? Basically, I want to reuse partially reduced RDDs for further processing so that the same work doesn't have to be done more than once.

-- 
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/streaming-sequence-files-tp10557.html
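
One direction that may be worth trying: although there is no sequenceFileStream(), StreamingContext does have a generic fileStream[K, V, F] method that takes a new-API Hadoop InputFormat, so SequenceFileInputFormat can be plugged in. The following is only a minimal sketch under several assumptions, not a tested solution. The object name SequenceFileStreamSketch, the helper toPair, the inputDir value, and the 10-second batch interval are all made up for illustration. It also assumes the (String, Int) pairs were written by saveAsSequenceFile as Text/IntWritable, and that finished files become visible directly inside the monitored directory; whether the per-batch output directories from the snippet above are picked up as-is likely depends on the Spark version.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair DStream operations on Spark 1.x

object SequenceFileStreamSketch {
  // Assumption: the directory in which the first app's finished files appear.
  val inputDir = "tachyon://localhost:19998/files"

  // Copy the values out of the Writables right away, since Hadoop record
  // readers may reuse the same Writable instances across records.
  def toPair(k: Text, v: IntWritable): (String, Int) = (k.toString, v.get)

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("SequenceFileStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // hypothetical batch interval

    // Watch inputDir for new files and read them as (Text, IntWritable) records.
    val partial =
      ssc.fileStream[Text, IntWritable, SequenceFileInputFormat[Text, IntWritable]](inputDir)

    // Continue reducing the partially reduced counts within each batch.
    val counts = partial.map { case (k, v) => toPair(k, v) }.reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the second app only needs to read one saved batch rather than stream them, SparkContext.sequenceFile[String, Int](path) on a single output path is the simpler route.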