flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-933) Add an input format to read primitive types directly (not through tuples)
Date Thu, 26 Jun 2014 12:43:26 GMT

    [ https://issues.apache.org/jira/browse/FLINK-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044619#comment-14044619 ]

ASF GitHub Bot commented on FLINK-933:
--------------------------------------

Github user qmlmoon commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/47#discussion_r14237611
  
    --- Diff: stratosphere-java/src/main/java/eu/stratosphere/api/java/io/PrimitiveInputFormat.java ---
    @@ -0,0 +1,71 @@
    +/***********************************************************************************************************************
    + *
    + * Copyright (C) 2010-2013 by the Stratosphere project (http://stratosphere.eu)
    + *
    + * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    + * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    + * specific language governing permissions and limitations under the License.
    + *
    + **********************************************************************************************************************/
    +package eu.stratosphere.api.java.io;
    +
    +import eu.stratosphere.api.common.io.DelimitedInputFormat;
    +import eu.stratosphere.core.fs.Path;
    +import eu.stratosphere.types.parser.FieldParser;
    +import eu.stratosphere.util.InstantiationUtil;
    +
    +/**
    + * An input format that reads single field primitive data from a given file. The difference between this and
    + * {@link eu.stratosphere.api.java.io.CsvInputFormat} is that it won't go through {@link eu.stratosphere.api.java.tuple.Tuple1}.
    + */
    +public class PrimitiveInputFormat<OT> extends DelimitedInputFormat<OT> {
    +
    +	private Class<OT> primitiveClass;
    +
    +	private static final byte CARRIAGE_RETURN = (byte) '\r';
    +
    +	private static final byte NEW_LINE = (byte) '\n';
    +
    +
    +	public PrimitiveInputFormat(Path filePath, Class<OT> primitiveClass) {
    +		super(filePath);
    +		Class<? extends FieldParser<OT>> parserType = FieldParser.getParserForType(primitiveClass);
    +		if (parserType == null) {
    +			throw new IllegalArgumentException("The type '" + primitiveClass.getName() + "' is not supported for the primitive input format.");
    +		}
    +		this.primitiveClass = primitiveClass;
    +	}
    +
    +	public PrimitiveInputFormat(Path filePath, char delimiter, Class<OT> primitiveClass) {
    +		super(filePath);
    +		Class<? extends FieldParser<OT>> parserType = FieldParser.getParserForType(primitiveClass);
    +		if (parserType == null) {
    +			throw new IllegalArgumentException("The type '" + primitiveClass.getName() + "' is not supported for the primitive input format.");
    +		}
    +		this.primitiveClass = primitiveClass;
    +		this.setDelimiter(delimiter);
    +	}
    +
    +
    +	@Override
    +	public OT readRecord(OT reuse, byte[] bytes, int offset, int numBytes) {
    +		//Check if \n is used as delimiter and the end of this line is a \r, then remove \r from the line
    +		if (this.getDelimiter() != null && this.getDelimiter().length == 1
    +			&& this.getDelimiter()[0] == NEW_LINE && offset+numBytes >= 1
    +			&& bytes[offset+numBytes-1] == CARRIAGE_RETURN){
    +			numBytes -= 1;
    +		}
    +
    +		Class<? extends FieldParser<OT>> parserType = FieldParser.getParserForType(this.primitiveClass);
    +		@SuppressWarnings("unchecked")
    +		FieldParser<OT> p = (FieldParser<OT>) InstantiationUtil.instantiate(parserType, FieldParser.class);
    +		p.parseField(bytes, offset, numBytes + offset, (char) this.getDelimiter()[0], reuse);
    --- End diff --
    
    Thanks, good point! I think it should be in the open method.
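
For illustration, here is a minimal sketch of doing per-split setup once in open() instead of on every readRecord() call. It assumes the remark refers to the parser being instantiated per record in the diff above. CachedParserFormat, parserFactory and parsersCreated are hypothetical names invented for this example; it does not use the Stratosphere FieldParser or DelimitedInputFormat APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for an input format; not the Stratosphere API. */
class CachedParserFormat<T> {

    // Assumption: creating a parser is comparatively expensive (e.g. reflective instantiation).
    private final Supplier<Function<String, T>> parserFactory;

    // Created once in open() and reused for every record.
    private Function<String, T> parser;

    // Counts parser creations so the effect of caching can be observed.
    int parsersCreated = 0;

    CachedParserFormat(Supplier<Function<String, T>> parserFactory) {
        this.parserFactory = parserFactory;
    }

    /** Called once before any records are read. */
    void open() {
        parser = parserFactory.get();
        parsersCreated++;
    }

    /** Parses one record from bytes[offset, offset + numBytes) with the cached parser. */
    T readRecord(byte[] bytes, int offset, int numBytes) {
        if (parser == null) {
            throw new IllegalStateException("open() must be called before readRecord()");
        }
        String field = new String(bytes, offset, numBytes, StandardCharsets.US_ASCII);
        return parser.apply(field);
    }
}
```

With this shape, reading many records after a single open() call creates exactly one parser, whereas instantiating inside readRecord() would create one per record.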


> Add an input format to read primitive types directly (not through tuples)
> -------------------------------------------------------------------------
>
>                 Key: FLINK-933
>                 URL: https://issues.apache.org/jira/browse/FLINK-933
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Stephan Ewen
>            Assignee: Mingliang Qi
>            Priority: Minor
>              Labels: easyfix, features, starter
>
> Right now, reading primitive types goes either through custom formats (work intensive), or through CSV inputs. The latter return tuples.
> To read a sequence of primitives, you need to go through Tuple1, which is clumsy.
> I would suggest adding an input format to read primitive types line-wise (or otherwise delimited), and also adding a method to the environment for that.
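
As a rough illustration of what reading primitives "line-wise (or otherwise delimited)" could look like without a Tuple1 wrapper, the following self-contained sketch splits a byte buffer on a single-byte delimiter and parses each record directly into an Integer. PrimitiveLines and readInts are hypothetical names, not part of Stratosphere or Flink. Skipping empty records is an assumption of this sketch, not behaviour taken from the pull request.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: parses delimited integers straight into a list, no tuple wrapper. */
class PrimitiveLines {

    static List<Integer> readInts(byte[] bytes, byte delimiter) {
        List<Integer> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= bytes.length; i++) {
            boolean atEnd = (i == bytes.length);
            if (atEnd || bytes[i] == delimiter) {
                int len = i - start;
                // With '\n' as the delimiter, drop a trailing '\r' (Windows line endings).
                // The guard is on the record length itself, so an empty record is never indexed.
                if (delimiter == '\n' && len >= 1 && bytes[start + len - 1] == '\r') {
                    len--;
                }
                // Assumption of this sketch: empty records (e.g. after a trailing delimiter) are skipped.
                if (len > 0) {
                    String field = new String(bytes, start, len, StandardCharsets.US_ASCII);
                    out.add(Integer.parseInt(field));
                }
                start = i + 1;
            }
        }
        return out;
    }
}
```

Calling readInts on the bytes of "1\r\n22\r\n333" with '\n' as the delimiter yields the integers 1, 22 and 333 directly, rather than three single-field tuples.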



--
This message was sent by Atlassian JIRA
(v6.2#6252)
