flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1520) Read edges and vertices from CSV files
Date Mon, 13 Jul 2015 08:57:04 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624390#comment-14624390 ]

ASF GitHub Bot commented on FLINK-1520:
---------------------------------------

Github user andralungu commented on the pull request:

    https://github.com/apache/flink/pull/847#issuecomment-120856171
  
    Hi,
    
    I just had a closer look at this PR and it made me seriously question the utility of a
    `Graph.fromCSV` method. Why? First of all, because it is more limited than the regular
    `env.readCsvFile()`: it does not allow POJOs, and supporting them would be tedious. We
    would need methods for 2 to n fields, depending on the number of attributes present in
    the POJO.
    
    Second, because, and I am speaking strictly as a user here, I would rather write:
    private static DataSet<Edge<Long, Double>> getEdgesDataSet(ExecutionEnvironment env) {

        if (fileOutput) {
            return env.readCsvFile(edgeInputPath)
                    .ignoreComments("#")
                    .fieldDelimiter("\t")
                    .lineDelimiter("\n")
                    .types(Long.class, Long.class, Double.class)
                    .map(new Tuple3ToEdgeMap<Long, Double>());
        } else {
            return CommunityDetectionData.getDefaultEdgeDataSet(env);
        }
    }
    
    than...
    
    private static Graph<Long, Long, Double> getGraph(ExecutionEnvironment env) {
        Graph<Long, Long, Double> graph;
        if (!fileOutput) {
            DataSet<Edge<Long, Double>> edges = CommunityDetectionData.getDefaultEdgeDataSet(env);
            graph = Graph.fromDataSet(edges,
                    new MapFunction<Long, Long>() {

                        public Long map(Long label) {
                            return label;
                        }
                    }, env);
        } else {
            graph = Graph.fromCsvReader(edgeInputPath, new MapFunction<Long, Long>() {
                public Long map(Long label) {
                    return label;
                }
            }, env).ignoreCommentsEdges("#")
                    .fieldDelimiterEdges("\t")
                    .lineDelimiterEdges("\n")
                    .typesEdges(Long.class, Double.class)
                    .typesVertices(Long.class, Long.class);
        }
        return graph;
    }
    
    Maybe it's just a preference thing... but I believe it's at least worth a discussion.
    On the other hand, the utility of such a method should have been questioned from its
    early Jira days, so I guess that's my mistake.
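
As a rough illustration of what the comment-prefix and delimiter settings in the snippets above amount to per line, here is a plain-Java sketch. It is not Flink or Gelly code: `EdgeCsvSketch`, `SimpleEdge`, and `parseEdges` are hypothetical names introduced only for this example, and skipping empty lines is an assumption of the sketch rather than documented reader behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeCsvSketch {

    /** Hypothetical stand-in for an edge with Long ids and a Double value. */
    static final class SimpleEdge {
        final long source;
        final long target;
        final double value;

        SimpleEdge(long source, long target, double value) {
            this.source = source;
            this.target = target;
            this.value = value;
        }
    }

    /**
     * Parses edge lines of the form "source<TAB>target<TAB>value",
     * separated by "\n", ignoring lines that start with "#".
     */
    static List<SimpleEdge> parseEdges(String content) {
        List<SimpleEdge> edges = new ArrayList<>();
        for (String line : content.split("\n")) {
            // Assumption of this sketch: empty lines are skipped as well.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t");
            edges.add(new SimpleEdge(
                    Long.parseLong(fields[0]),
                    Long.parseLong(fields[1]),
                    Double.parseDouble(fields[2])));
        }
        return edges;
    }
}
```

Both variants in the comment configure these same three things (comment prefix, field delimiter, line delimiter); the difference under discussion is only where the configuration lives and how the result is turned into edges or a graph.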
    
    I would like to hear your thoughts on this. 
    Thanks!


> Read edges and vertices from CSV files
> --------------------------------------
>
>                 Key: FLINK-1520
>                 URL: https://issues.apache.org/jira/browse/FLINK-1520
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>            Reporter: Vasia Kalavri
>            Assignee: Shivani Ghatge
>            Priority: Minor
>              Labels: easyfix, newbie
>
> Add methods to create Vertex and Edge Datasets directly from CSV file inputs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
