tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1580) ISA-Tab parsers
Date Sat, 28 Mar 2015 21:51:52 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385541#comment-14385541 ]

Chris A. Mattmann commented on TIKA-1580:

Committed in r1669839.

Thank you, [~gostep], you did amazing work on this!

[chipotle:~/tmp/tika] mattmann% svn commit -m "Fix for TIKA-1580: Support IsaTab MIME identification
and parsing. Thanks to Giuseppe Totaro for all the great work!"
Sending        CHANGES.txt
Sending        tika-bundle/pom.xml
Sending        tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Sending        tika-parsers/pom.xml
Adding         tika-parsers/src/main/java/org/apache/tika/parser/isatab
Adding         tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISATabUtils.java
Adding         tika-parsers/src/main/java/org/apache/tika/parser/isatab/ISArchiveParser.java
Sending        tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser
Adding         tika-parsers/src/test/java/org/apache/tika/parser/isatab
Adding         tika-parsers/src/test/java/org/apache/tika/parser/isatab/ISArchiveParserTest.java
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_bii-s-2_metabolite profiling_NMR spectroscopy.txt
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_metabolome.txt
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_microarray.txt
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_proteome.txt
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/a_transcriptome.txt
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/i_investigation.txt
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-1.txt
Adding         tika-parsers/src/test/resources/test-documents/testISATab_BII-I-1/s_BII-S-2.txt
Transmitting file data ................
Committed revision 1669839.
[chipotle:~/tmp/tika] mattmann% 
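The test archive added in this commit (testISATab_BII-I-1) follows the ISA-Tab naming convention, in which the file-name prefix marks the file type: i_ for the Investigation file, s_ for Study files and a_ for Assay files. As a rough illustration of how an ISArchive could be split into its three file types before each file is handed to the matching parser, here is a minimal Java sketch. IsaTabFileTypes and its methods are hypothetical names for this example; they are not part of the committed ISATabUtils or ISArchiveParser code.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Tika API): groups the files of an ISArchive by ISA-Tab file type,
// relying only on the i_/s_/a_ file-name prefix convention.
final class IsaTabFileTypes {

    enum Kind { INVESTIGATION, STUDY, ASSAY, OTHER }

    private IsaTabFileTypes() {}

    // Maps one file name to its ISA-Tab type based on its prefix.
    static Kind classify(String fileName) {
        if (fileName.startsWith("i_")) return Kind.INVESTIGATION;
        if (fileName.startsWith("s_")) return Kind.STUDY;
        if (fileName.startsWith("a_")) return Kind.ASSAY;
        return Kind.OTHER;
    }

    // Groups file names by type; every type gets a (possibly empty) list and input order is kept.
    static Map<Kind, List<String>> group(List<String> fileNames) {
        Map<Kind, List<String>> groups = new EnumMap<>(Kind.class);
        for (Kind kind : Kind.values()) {
            groups.put(kind, new ArrayList<>());
        }
        for (String name : fileNames) {
            groups.get(classify(name)).add(name);
        }
        return groups;
    }
}
```

Matching on prefixes alone is a simplification: as far as the specification goes, the Investigation file itself names its Study and Assay files, so a fuller ISArchive parser would more likely start from the single i_ file and follow those references.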

> ISA-Tab parsers
> ---------------
>                 Key: TIKA-1580
>                 URL: https://issues.apache.org/jira/browse/TIKA-1580
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Giuseppe Totaro
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>              Labels: new-parser
>             Fix For: 1.8
>         Attachments: TIKA-1580.Mattmann.Totaro.032515.patch.txt, TIKA-1580.patch, TIKA-1580.v02.patch,
> We are going to add parsers for ISA-Tab data formats.
> ISA-Tab files are related to [ISA Tools|http://www.isa-tools.org/], which help to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies.
> The ISA tools are built upon the _Investigation_, _Study_, and _Assay_ tabular formats. Therefore, the ISA-Tab data format includes three types of file: Investigation file ({{i_xxxx.txt}}), Study file ({{s_xxxx.txt}}), and Assay file ({{a_xxxx.txt}}). These files are organized as a [top-down hierarchy|http://www.isa-tools.org/format/specification/]: an Investigation file includes one or more Study files, and each Study file includes one or more Assay files.
> Essentially, the Investigation file contains high-level information about the related studies, so it provides only metadata about the ISA-Tab files.
> More details on file format specification are [available online|http://isatab.sourceforge.net/docs/ISA-TAB_release-candidate-1_v1.0_24nov08.pdf].
> The attached patch provides a preliminary version of the ISA-Tab parsers (three parsers, one for each ISA-Tab file type):
> * {{ISATabInvestigationParser.java}}: parses Investigation files. It extracts only metadata.
> * {{ISATabStudyParser.java}}: parses Study files.
> * {{ISATabAssayParser.java}}: parses Assay files.
> The most important improvements still to be made are:
> * Combine these three parsers in order to parse an ISArchive
> * Provide a better mapping of both study and assay data to XHTML. Currently, {{ISATabStudyParser}} and {{ISATabAssayParser}} provide a naive mapping function relying on [Apache Commons CSV|https://commons.apache.org/proper/commons-csv/].
> Thanks for supporting me on this work [~chrismattmann]. 
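The issue notes that the Investigation file carries only metadata. Assuming the layout in the linked specification (upper-case section headers such as INVESTIGATION or STUDY on a line of their own, followed by rows whose first tab-separated field is a label and whose remaining fields are values, often double-quoted), a metadata-only reader can be sketched in Java as follows. InvestigationMetadataSketch is a hypothetical name, and this is not the committed parser code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not Tika API): collects label -> values pairs from an Investigation file,
// ASSUMING tab-separated rows and upper-case section headers on lines of their own.
final class InvestigationMetadataSketch {

    private InvestigationMetadataSketch() {}

    static Map<String, List<String>> read(String text) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (String line : text.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t", -1);
            String label = unquote(fields[0]);
            // Section header such as "INVESTIGATION" or "STUDY": a lone upper-case field, skipped.
            if (fields.length == 1 && label.equals(label.toUpperCase(Locale.ROOT))) {
                continue;
            }
            // Repeated labels (e.g. one "Study Identifier" per STUDY section) accumulate values in order.
            List<String> values = metadata.computeIfAbsent(label, k -> new ArrayList<>());
            for (int i = 1; i < fields.length; i++) {
                String value = unquote(fields[i]);
                if (!value.isEmpty()) {
                    values.add(value);
                }
            }
        }
        return metadata;
    }

    // Trims whitespace and strips one pair of surrounding double quotes, if present.
    private static String unquote(String field) {
        String s = field.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }
}
```

Because labels that repeat once per study keep accumulating under the same key, an investigation with two STUDY sections yields both study identifiers under a single "Study Identifier" entry, in file order.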

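Study and Assay files, by contrast, are tab-delimited tables, assumed here to have one header row of column names followed by one row per record. The naive mapping mentioned in the issue can be pictured as turning each row into an XHTML table row. The Java sketch below does that with plain String.split instead of Apache Commons CSV, and builds a string rather than emitting SAX events as a real Tika parser would, so treat it as an illustrative stand-in; TabTableToXhtmlSketch is a hypothetical name.

```java
// Hypothetical sketch (not Tika API): maps a tab-delimited Study/Assay table to an XHTML table,
// treating the first non-blank line as the header row. Uses String.split, NOT Commons CSV.
final class TabTableToXhtmlSketch {

    private TabTableToXhtmlSketch() {}

    static String toXhtmlTable(String tsv) {
        StringBuilder out = new StringBuilder("<table>");
        boolean header = true;
        for (String line : tsv.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String cellTag = header ? "th" : "td";
            out.append("<tr>");
            // Limit -1 keeps trailing empty cells so every row keeps its column count.
            for (String field : line.split("\t", -1)) {
                out.append('<').append(cellTag).append('>')
                   .append(escape(unquote(field)))
                   .append("</").append(cellTag).append('>');
            }
            out.append("</tr>");
            header = false;
        }
        return out.append("</table>").toString();
    }

    // Trims whitespace and strips one pair of surrounding double quotes, if present.
    private static String unquote(String field) {
        String s = field.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    // Escapes the characters that are significant in XHTML text content.
    private static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Quoted fields that themselves contain tabs or line breaks are not handled here; covering such cases is the reason to rely on a CSV library such as Commons CSV configured for tab-delimited input.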
This message was sent by Atlassian JIRA
