tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2309) New Detector and Parser classes for Time Stamped Data Envelope file format
Date Tue, 04 Apr 2017 15:11:41 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15955231#comment-15955231 ]

ASF GitHub Bot commented on TIKA-2309:
--------------------------------------

Shinobi75 commented on issue #161: fix for TIKA-2309 contributed by Shinobi@75
URL: https://github.com/apache/tika/pull/161#issuecomment-291531208
 
 
   @tballison, OK, you're right. TSD is actually a cryptographic wrapper format for any other
type of data file. I've tried to create a private method inside the TSDParser class to extract
the metadata of the file embedded in the TSD:
   
       private void parseTSDContent(InputStream stream, ContentHandler handler,
                                    Metadata metadata, ParseContext context) {

           EmbeddedDocumentExtractor embeddedDocumentExtractor =
                   new ParsingEmbeddedDocumentExtractor(context);

           if (embeddedDocumentExtractor.shouldParseEmbedded(metadata)) {
               // Unwrap the payload carried inside the timestamped-data envelope.
               try (InputStream is = TikaInputStream.get(
                       new CMSTimeStampedData(stream).getContent())) {
                   embeddedDocumentExtractor.parseEmbedded(is, handler, metadata, true);
               } catch (Exception ex) {
                   // Pass the exception itself so the stack trace is logged.
                   LOG.error("Error in TSDParser.parseTSDContent", ex);
               }
           }
       }
   
   but after the parseEmbedded call, the metadata map contains the same data as before the
call. Do you intend for the EmbeddedDocumentExtractor to be called inside the TSDParser class,
or do you mean for it to be called for test purposes inside the TSDParserTest class?
   
   You can find the updated code in the TIKA-2309 branch.
   
   Thank you for your patience
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> New Detector and Parser classes for Time Stamped Data Envelope file format
> --------------------------------------------------------------------------
>
>                 Key: TIKA-2309
>                 URL: https://issues.apache.org/jira/browse/TIKA-2309
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, parser
>    Affects Versions: 1.13, 1.14
>            Reporter: Fabio
>            Priority: Minor
>         Attachments: MANIFEST.XML.TSD
>
>
> Hello,
> I'm Fabio Evangelista from Rome. I work for an Italian public administration company
and I use Apache Tika in my Java applications to detect and parse a broad range of file
formats. While doing so, and after following the good guide on the Tika project page, I have
successfully written new Detector and Parser classes for a particular cryptographic timestamp
format with these characteristics:
> Format name:               Time Stamped Data Envelope
> Mime Type:                   application/timestamped-data
> File extension:              .tsd
> TSD file hex magic code at the start of the file:   30 80 06 0B 2A 86 48 86 F7
> I've successfully integrated and tested those new classes with my applications in the Tika
1.13 tika-core.jar and tika-parsers.jar. What should I do to submit my new classes to you?
Should I push them to a particular git branch, or is there a particular process to follow
to submit my classes?
> Thank you for your patience and best regards.
> Fabio.
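
For illustration only, the magic-code check described in the issue could be sketched with plain JDK classes as below. This is not the Detector from the pull request and does not use any Tika API; the class name TsdMagicSketch and both method names are hypothetical. It only assumes the nine magic bytes quoted above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks the TSD magic prefix quoted in the issue.
final class TsdMagicSketch {

    // Magic bytes from the issue text: 30 80 06 0B 2A 86 48 86 F7
    private static final byte[] MAGIC = {
        0x30, (byte) 0x80, 0x06, 0x0B, 0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7
    };

    private TsdMagicSketch() {
    }

    /** Returns true if the given bytes start with the TSD magic prefix. */
    static boolean looksLikeTsd(byte[] head) {
        if (head == null || head.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    /** Peeks at the start of a mark-supporting stream and resets it afterwards. */
    static boolean looksLikeTsd(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MAGIC.length);
        try {
            byte[] head = new byte[MAGIC.length];
            int read = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == head.length && looksLikeTsd(head);
        } finally {
            in.reset();
        }
    }
}
```

The stream variant resets the stream after peeking, so a parser that runs afterwards still sees the file from its first byte. A prefix match only suggests a TSD file; it does not validate the envelope.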



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
