flink-user mailing list archives

From Mihail Vieru <vi...@informatik.hu-berlin.de>
Subject Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE
Date Thu, 02 Jul 2015 19:17:56 GMT
I've implemented the alternating-two-files solution and everything works now.

Thanks a lot! You saved my day :)

Cheers,
Mihail
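
For readers finding this thread in the archives: the two workarounds discussed below come down to simple path bookkeeping. A minimal sketch in plain Java (the class and path names are hypothetical illustrations, not code from the actual APSPNaiveJob):

```java
// Hypothetical helper illustrating the two workarounds from this thread;
// not part of the actual APSPNaiveJob.
public class IterationPaths {
    private final String pathA;
    private final String pathB;

    public IterationPaths(String pathA, String pathB) {
        this.pathA = pathA;
        this.pathB = pathB;
    }

    // Alternating-files scheme: read from one path, write to the other,
    // and swap roles every iteration, so WriteMode.OVERWRITE never
    // deletes the file the DataSource is still reading from.
    public String inputFor(int iteration) {
        return iteration % 2 == 0 ? pathA : pathB;
    }

    public String outputFor(int iteration) {
        return iteration % 2 == 0 ? pathB : pathA;
    }

    // Counter-prefix scheme: a fresh output path per iteration, which
    // keeps every intermediate result instead of only the latest one.
    public static String counterPrefixed(String basePath, int iteration) {
        return basePath + "-" + iteration;
    }
}
```

Each iteration then reads from inputFor(i) and writes with WriteMode.OVERWRITE to outputFor(i); since the two never point at the same file, the sink's delete no longer races with the source's read.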

On 02.07.2015 12:37, Maximilian Michels wrote:
> The problem is that your input and output path are the same. Because 
> Flink executes in a pipelined fashion, all the operators come up at 
> once. When you set WriteMode.OVERWRITE on the sink, it deletes the 
> path before writing anything. That means that by the time your 
> DataSource reads the input, there is nothing left to read. Thus you 
> get an empty DataSet, which you write back to HDFS. Any further 
> iterations then just write nothing.
>
> You can circumvent this problem by prefixing each output file with a 
> counter that you increment in your loop. Alternatively, if you only 
> want to keep the latest output, you can use two files and alternate 
> which one serves as input and which as output.
>
> Let me know if you have any further questions.
>
> Kind regards,
> Max
>
> On Thu, Jul 2, 2015 at 10:20 AM, Maximilian Michels <mxm@apache.org 
> <mailto:mxm@apache.org>> wrote:
>
>     Hi Mihail,
>
>     Thanks for the code. I'm trying to reproduce the problem now.
>
>     On Wed, Jul 1, 2015 at 8:30 PM, Mihail Vieru
>     <vieru@informatik.hu-berlin.de
>     <mailto:vieru@informatik.hu-berlin.de>> wrote:
>
>         Hi Max,
>
>         thank you for your reply. I wanted to rule out all other
>         factors before writing back. I've attached my code and
>         sample input data.
>
>         I run the APSPNaiveJob using the following arguments:
>
>         0 100 hdfs://path/to/vertices-test-100
>         hdfs://path/to/edges-test-100 hdfs://path/to/tempgraph 10 0.5
>         hdfs://path/to/output-apsp 9
>
>         I was wrong: I originally thought that the first writeAsCsv
>         call (line 50) wasn't working. In fact, it works; without
>         WriteMode.OVERWRITE it simply throws an exception when the
>         file already exists.
>
>         The problem lies with the second call (line 74), which tries
>         to write to the same path on HDFS.
>
>         This issue is blocking me, because I need to persist the
>         vertices dataset between iterations.
>
>         Cheers,
>         Mihail
>
>         P.S.: I'm using the latest 0.10-SNAPSHOT and HDFS 1.2.1.
>
>
>
>         On 30.06.2015 16:51, Maximilian Michels wrote:
>>         Hi Mihail,
>>
>>         Thank you for your question. Do you have a short example that
>>         reproduces the problem? It is hard to find the cause without
>>         an error message or some example code.
>>
>>         I wonder how your loop works without WriteMode.OVERWRITE
>>         because it should throw an exception in this case. Or do you
>>         change the file names on every write?
>>
>>         Cheers,
>>         Max
>>
>>         On Tue, Jun 30, 2015 at 3:47 PM, Mihail Vieru
>>         <vieru@informatik.hu-berlin.de
>>         <mailto:vieru@informatik.hu-berlin.de>> wrote:
>>
>>             I think my problem is related to a loop in my job.
>>
>>             Before the loop, the writeAsCsv method works fine, even
>>             in overwrite mode.
>>
>>             In the loop, on the first iteration, it writes an empty
>>             folder containing empty files to HDFS, even though the
>>             DataSet it is supposed to write contains elements.
>>
>>             Needless to say, this doesn't occur in a local execution
>>             environment, when writing to the local file system.
>>
>>
>>             I would appreciate any input on this.
>>
>>             Best,
>>             Mihail
>>
>>
>>
>>             On 30.06.2015 12:10, Mihail Vieru wrote:
>>>             Hi Till,
>>>
>>>             thank you for your reply.
>>>
>>>             I have the following code snippet:
>>>
>>>             intermediateGraph.getVertices().writeAsCsv(tempGraphOutputPath,
>>>             "\n", ";", WriteMode.OVERWRITE);
>>>
>>>             When I remove the WriteMode parameter, it works. So I
>>>             can conclude that the DataSet contains data elements.
>>>
>>>             Cheers,
>>>             Mihail
>>>
>>>
>>>             On 30.06.2015 12:06, Till Rohrmann wrote:
>>>>
>>>>             Hi Mihail,
>>>>
>>>>             have you checked that the DataSet you want to write
>>>>             to HDFS actually contains data elements? You can try
>>>>             calling collect, which retrieves the data to your
>>>>             client, to see what's in there.
>>>>
>>>>             Cheers,
>>>>             Till
>>>>
>>>>
>>>>             On Tue, Jun 30, 2015 at 12:01 PM, Mihail Vieru
>>>>             <vieru@informatik.hu-berlin.de
>>>>             <mailto:vieru@informatik.hu-berlin.de>> wrote:
>>>>
>>>>                 Hi,
>>>>
>>>>                 the writeAsCsv method is not writing anything to
>>>>                 HDFS (version 1.2.1) when the WriteMode is set to
>>>>                 OVERWRITE. A file is created, but it's empty, and
>>>>                 there is no trace of errors in the Flink or
>>>>                 Hadoop logs on any node in the cluster.
>>>>
>>>>                 What could cause this issue? I really need this
>>>>                 feature.
>>>>
>>>>                 Best,
>>>>                 Mihail
>>>>
>>>>
>>>
>>
>>
>
>
>

