tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2648) mime detection based on resource name detects resources as "text/x-php" instead of "text/html"
Date Wed, 23 May 2018 16:20:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487569#comment-16487569 ]

ASF GitHub Bot commented on TIKA-2648:
--------------------------------------

GerardBouchar opened a new pull request #236: TIKA-2648 : detect interpreted server-side scripting languages
URL: https://github.com/apache/tika/pull/236
 
 
   Mime detection based on the resource name used to detect
   the MIME type of "http://example.com/test.php" as "text/x-php",
   whereas, given such a URL, the file extension tells us nothing
   about the MIME type that the server will actually return.
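The PR's actual diff is not reproduced in this message. Below is only a rough standalone sketch of the idea it describes: when the resource name is an http(s) URL ending in a server-side script extension, prefer the Content-Type hint over the name-based guess. The class `ServerScriptAwareHint`, its methods and the extension list are hypothetical illustrations, not Tika's API or the code in PR #236.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

public class ServerScriptAwareHint {
    // Hypothetical list of interpreted server-side script extensions.
    // In a URL, these say how the page was generated on the server,
    // not which media type the server sends back.
    private static final Set<String> SERVER_SIDE_EXTENSIONS =
            Set.of("php", "asp", "aspx", "jsp", "cgi");

    /** True if the name is an http(s) URL whose path ends in a server-side script extension. */
    static boolean isServerSideScriptUrl(String resourceName) {
        URI uri;
        try {
            uri = URI.create(resourceName);
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI, so treat it as a plain file name
        }
        String scheme = uri.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            return false; // local names like "home.php" really are PHP source
        }
        String path = uri.getPath(); // excludes any ?query or #fragment
        if (path == null) {
            return false;
        }
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot < path.lastIndexOf('/')) {
            return false; // no extension in the last path segment
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SERVER_SIDE_EXTENSIONS.contains(ext);
    }

    /** Trusts the Content-Type hint over a server-side script extension in a URL. */
    static String chooseType(String resourceName, String contentTypeHint, String nameBasedGuess) {
        if (contentTypeHint != null && isServerSideScriptUrl(resourceName)) {
            return contentTypeHint;
        }
        return nameBasedGuess;
    }

    public static void main(String[] args) {
        String url = "https://www.facebook.com/home.php";
        System.out.println(chooseType(url, "text/html", "text/x-php")); // prints text/html
    }
}
```

The check is restricted to http(s) URLs on purpose. A local file named `home.php` is most likely PHP source, so the name-based guess of "text/x-php" still makes sense there.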



> mime detection based on resource name detects resources as "text/x-php" instead of "text/html"

> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2648
>                 URL: https://issues.apache.org/jira/browse/TIKA-2648
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> When using Tika to detect a MIME type given only a URL containing ".php" and a Content-Type
> hint of "text/html", it guesses "text/x-php", whereas one would expect "text/html".
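> {code}
> import org.apache.tika.config.TikaConfig;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.mime.MediaType;
>
> public class Tika2648Repro {
>     public static void main(String[] args) throws Exception {
>         TikaConfig tika = new TikaConfig();
>         Metadata metadata = new Metadata();
>         String url = "https://www.facebook.com/home.php";
>         // Only a resource name and a Content-Type hint are given; no stream (null)
>         metadata.set(Metadata.RESOURCE_NAME_KEY, url);
>         metadata.set(Metadata.CONTENT_TYPE, "text/html");
>         MediaType type = tika.getDetector().detect(null, metadata);
>         System.out.println(url + " is of type " + type.toString());
>         // Prints https://www.facebook.com/home.php is of type text/x-php
>     }
> }
> {code}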



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
