crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-97) Add helpers for parsing PCollection<String> instances
Date Sun, 02 Dec 2012 23:59:58 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills updated CRUNCH-97:
-----------------------------

    Attachment: CRUNCH-97v4.patch

[~mafr] Here is my interpretation of the Tokenizer idea: I turned the ScannerFactory into a
TokenizerFactory, where my Tokenizer is just a wrapper around a Scanner that knows whether it
should bypass certain fields when it is called. This ends up being less typing for the default
case (no need to specify indices for the extractors) while still supporting your use case.
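
To illustrate the idea, here is a minimal sketch of a Scanner wrapper that bypasses selected fields. It is not the code from CRUNCH-97v4.patch: the class name SkippingTokenizer, its constructor, and its method names are all hypothetical, and the only real API used is java.util.Scanner.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical sketch only; these names are assumptions, not the Crunch API.
final class SkippingTokenizer {
  private final Scanner scanner;
  private final Set<Integer> skip; // zero-based indices of fields to bypass
  private int index = 0;           // index of the next field in the line

  // The delimiter is passed to Scanner.useDelimiter, so it is treated as a regex.
  SkippingTokenizer(String line, String delimiter, Integer... skipIndices) {
    this.scanner = new Scanner(line).useDelimiter(delimiter);
    this.skip = new HashSet<>(Arrays.asList(skipIndices));
  }

  // Consume and discard any fields that were marked to be bypassed.
  private void advance() {
    while (skip.contains(index) && scanner.hasNext()) {
      scanner.next();
      index++;
    }
  }

  boolean hasNext() {
    advance();
    return scanner.hasNext();
  }

  String nextString() {
    advance();
    index++;
    return scanner.next();
  }

  int nextInt() {
    advance();
    index++;
    return scanner.nextInt();
  }
}
```

With no skip indices, callers simply read fields in order, which is the low-typing default case. Passing an index, as in new SkippingTokenizer("alice,ignored,42", ",", 1), makes the wrapper step over that field, so nextString() followed by nextInt() reads the first and third fields.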
                
> Add helpers for parsing PCollection<String> instances
> -----------------------------------------------------
>
>                 Key: CRUNCH-97
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-97
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.5.0
>
>         Attachments: CRUNCH-97.patch, CRUNCH-97-take2.patch, CRUNCH-97-Tokenizer-v1.patch, CRUNCH-97v3.patch, CRUNCH-97v4.patch
>
>
> We should make it a bit easier to parse delimited text files into specific data types
> (e.g., ints, floats, etc.) or combinations of types, such as pairs of strings and ints or
> a Tuple3 of booleans.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
