tika-dev mailing list archives

From "Ann Burgess" <anniebry...@gmail.com>
Subject Re: Review Request 22246: New parser for Matlab .mat files
Date Thu, 05 Jun 2014 23:23:55 GMT


> On June 4, 2014, 11:25 p.m., Matthias Krueger wrote:
> > The Matlab MIME types used seem to be application/x-matlab-data or application/matlab-mat.
> > 
> > Would it make sense to add them to the mime XML for detection?
> > 
> > <mime-type type="application/x-matlab-data">
> >   <comment>MATLAB data file</comment>
> >   <alias type="application/matlab-mat"/>
> >   <magic priority="50">
> >     <match value="MATLAB" type="string" offset="0"/>
> >   </magic>
> >   <glob pattern="*.mat"/>
> > </mime-type>
> > 
> >
> 
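As a rough illustration of what the magic rule proposed above would check (the literal string MATLAB at offset 0), here is a minimal Java sketch. The class and method names are hypothetical and are not part of Tika; actual detection would come from the mime XML entry rather than hand-written code like this.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Tika: mimics the check expressed by
// <match value="MATLAB" type="string" offset="0"/>.
class MatMagicSketch {
    private static final byte[] MAGIC =
            "MATLAB".getBytes(Charset.forName("US-ASCII"));

    // Returns true when the first bytes of the file equal "MATLAB".
    static boolean looksLikeMat(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "MATLAB 5.0 MAT-file".getBytes(Charset.forName("US-ASCII"));
        System.out.println(looksLikeMat(sample));
    }
}
```

The glob pattern *.mat would still be needed as a fallback for files whose header does not start with this text.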
> Chris Mattmann wrote:
>     +1 this makes a ton of sense to add IMO.
> 
> Nick Burch wrote:
>     There's some odd whitespace going on - we normally use 4 spaces and no tabs.
>     
>     When outputting the variables, it would probably make sense to put each one into either a paragraph or a list, so that we get helpful output in HTML mode as well as text mode.
>     
>     With that in place, it would then be possible to have a unit test that checks the HTML output as well as the current text one.
>     
>     Also on testing, I think at least some of the tests have an implementation of assertContains, which generally gives a more helpful failure message than assertTrue(s.contains(...)) does. Might be worth looking into that?
> 
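To make the assertContains point above concrete, here is a minimal standalone sketch of such a helper. It is an assumption about the general shape, not a copy of the implementation in the existing tests; the class name is hypothetical.

```java
// Hypothetical standalone helper illustrating the idea; the existing tests
// may implement assertContains differently.
class AssertSketch {

    // Fails with a message that shows both the expected fragment and the
    // text that was actually searched.
    static void assertContains(String needle, String haystack) {
        if (haystack == null || !haystack.contains(needle)) {
            throw new AssertionError(
                    "'" + needle + "' not found in:\n" + haystack);
        }
    }

    public static void main(String[] args) {
        // Passes silently because the fragment is present.
        assertContains("double", "var1: 1x1 double");
    }
}
```

With assertTrue(s.contains(...)) a failure only reports that the condition was false, whereas a helper like this one includes the parser output in the message.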
> Ann Burgess wrote:
>     Great input - thank you! I will integrate both and upload the diff.
> 
> Matthias Krueger wrote:
>     This is on the right track. Some quick additional comments:
>     * I tested with the files in https://github.com/scipy/scipy/tree/master/scipy/io/matlab/tests/data. JMatIO only supports MATLAB 5 files. This could be added as a quick comment or javadoc.
>     * I think Tika is based on JDK 1.6, so I don't see a reason for the test to special-case JDK 1.5 and simply return as passing there.

+1 Matthias. 


- Ann


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22246/#review44773
-----------------------------------------------------------


On June 4, 2014, 10:23 p.m., Ann Burgess wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22246/
> -----------------------------------------------------------
> 
> (Updated June 4, 2014, 10:23 p.m.)
> 
> 
> Review request for tika and Chris Mattmann.
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> This is a new parser for Matlab .mat files. The parser uses JMatIO, a Java library for MAT-file I/O that is available through Maven Central. The text output from this parser provides the names and dimensions of variables, both inside and outside of data structures, but does NOT provide the actual data values within each .mat file.
> 
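The output described above (names and dimensions, no data values) could be rendered one variable per paragraph, as suggested in the review. The sketch below is self-contained and deliberately avoids the JMatIO and Tika APIs: the class name, the map of names to dimensions, and the "name: 3x4" format are all assumptions made for illustration, not the code under review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not the code under review and not the JMatIO or Tika API.
class VariableOutputSketch {

    // Joins dimensions as "3x4"; this display format is an assumption.
    static String formatDims(int[] dims) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < dims.length; i++) {
            if (i > 0) {
                sb.append('x');
            }
            sb.append(dims[i]);
        }
        return sb.toString();
    }

    // Emits one <p> element per variable with its name and dimensions only.
    // Names are assumed to be plain identifiers, so no escaping is done here.
    static String toParagraphs(Map<String, int[]> variables) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, int[]> e : variables.entrySet()) {
            sb.append("<p>").append(e.getKey()).append(": ")
              .append(formatDims(e.getValue())).append("</p>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, int[]> vars = new LinkedHashMap<String, int[]>();
        vars.put("temperature", new int[] {3, 4});
        vars.put("label", new int[] {1, 1});
        System.out.print(toParagraphs(vars));
    }
}
```

In the real parser the paragraphs would presumably be written through the content handler rather than built as a string; the string form here just keeps the example runnable on its own.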
> 
> Diffs
> -----
> 
> 
> Diff: https://reviews.apache.org/r/22246/diff/
> 
> 
> Testing
> -------
> 
> Successfully ran a basic unit test that checks both --text and --metadata parser output.
> 
> 
> File Attachments
> ----------------
> 
> Parser File
>   https://reviews.apache.org/media/uploaded/files/2014/06/04/cb39636d-ec53-4fbc-b348-6a4db8907f6b__MatParser.java
> Unit Test
>   https://reviews.apache.org/media/uploaded/files/2014/06/04/bbff8c6b-caa1-4830-b441-532c28c3c78e__MatParserTest.java
> 
> 
> Thanks,
> 
> Ann Burgess
> 
>

