tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2720) A parser to output universal sentence encodings to text
Date Sun, 02 Sep 2018 22:38:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601688#comment-16601688 ]

ASF GitHub Bot commented on TIKA-2720:

ThejanW commented on issue #248: Fix for TIKA-2720 [WIP]
URL: https://github.com/apache/tika/pull/248#issuecomment-417964712
   **The set of test sentences is as follows (consider the "About..." part as the topic of the particular sentence group):**
   About age
   > How old are you?
   > What is your age?
   > How old did you turn?
   > When is your birthday?
   About smart phones
   > The Samsung Galaxy S10 has the potential to be the most exciting phone of 2019
   > Android beats iOS in smartphone loyalty, study finds
   > IPhone X includes a 5.8-inch edge-to-edge display which covers the entire front of the phone.
   > Apple became the world’s first trillion-dollar public company
   About weather
   > With roads covered with slippery snow and ice, can challenge even the most experienced
   > Heavy rain slammed the mid-Atlantic United States on Monday, delaying flights, forming
   > News showed, violent floodwaters surging down main Streets
   > Recently a lot of hurricanes have hit the US
   > Multiple lines of scientific evidence show that the climate system is warming
   About health
   > An ounce of prevention is worth a pound of cure
   > Green tea contains bioactive compounds that improve health
   > Yoga has been shown to help people reduce anxiety
   > Is paleo better than keto?
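
The comment does not say how these groups are scored, so what follows is only a sketch under an assumption. A common check is to embed every sentence and compare two averages: the cosine similarity of pairs within one topic, and of pairs drawn from different topics. A useful sentence encoder should score the within-topic pairs higher. `encode` is a hypothetical hook standing in for whatever returns the parser's vectors. The 2-d toy vectors in the usage lines are invented for illustration and are not model output.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def within_vs_between(
    groups: Dict[str, List[str]], encode: Callable[[str], Vector]
) -> Tuple[float, float]:
    """Return (mean within-topic similarity, mean cross-topic similarity).

    `encode` is a hypothetical callable mapping a sentence to its embedding.
    """
    # Embed each sentence once and remember which topic it belongs to.
    labelled = [(topic, encode(s)) for topic, sents in groups.items() for s in sents]
    within: List[float] = []
    between: List[float] = []
    # Score every unordered pair of sentences exactly once.
    for (t1, v1), (t2, v2) in itertools.combinations(labelled, 2):
        (within if t1 == t2 else between).append(cosine(v1, v2))
    if not within or not between:
        raise ValueError("need at least two topics and one topic with two sentences")
    return sum(within) / len(within), sum(between) / len(between)


# Usage with invented 2-d stand-in vectors (real ones would have 512 dimensions).
toy = {
    "How old are you?": [1.0, 0.0],
    "What is your age?": [0.9, 0.1],
    "Is paleo better than keto?": [0.0, 1.0],
    "Yoga has been shown to help people reduce anxiety": [0.1, 0.9],
}
groups = {
    "age": ["How old are you?", "What is your age?"],
    "health": ["Is paleo better than keto?",
               "Yoga has been shown to help people reduce anxiety"],
}
w, b = within_vs_between(groups, toy.__getitem__)
print(f"within={w:.3f} between={b:.3f}")
```

Comparing averages keeps the check to one pair of numbers per test set. A per-pair similarity matrix would show which individual sentences, such as the deliberately indirect "An ounce of prevention is worth a pound of cure", land outside their topic.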

This is an automated message from the Apache Git Service.

> A parser to output universal sentence encodings to text
> -------------------------------------------------------
>                 Key: TIKA-2720
>                 URL: https://issues.apache.org/jira/browse/TIKA-2720
>             Project: Tika
>          Issue Type: New Feature
>          Components: tika-dl
>            Reporter: Thejan Wijesinghe
>            Priority: Major
>             Fix For: 2.0
> This parser encodes text into high-dimensional vectors that can be used for text classification,
> semantic similarity, clustering and other natural language tasks. The model is trained and
> optimized for greater-than-word length text, such as sentences, phrases or short paragraphs.
> It is trained on a variety of data sources and a variety of tasks with the aim of dynamically
> accommodating a wide variety of natural language understanding tasks. The input is variable-length
> English text and the output is a 512-dimensional vector.
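
The description lists text classification among the uses of the 512-dimensional output. This is a hedged illustration only and not the parser's API. It is a nearest-centroid classifier over such vectors: average the unit-normalised embeddings of each label's example sentences, then assign a new sentence to the label whose centroid is most cosine-similar. The names `centroids`, `classify` and `one_hot` are hypothetical. The one-hot toy vectors stand in for real encoder output.

```python
import math
from typing import Dict, List, Sequence

EMBEDDING_DIM = 512  # output size stated in the issue description


def _unit(vec: Sequence[float]) -> List[float]:
    """Check the dimension and scale to unit length, so a dot product equals cosine similarity."""
    if len(vec) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vec)}")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    return [x / norm for x in vec]


def centroids(examples: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the unit-length embeddings of each label's example sentences."""
    result: Dict[str, List[float]] = {}
    for label, vecs in examples.items():
        units = [_unit(v) for v in vecs]
        # zip(*units) walks the vectors column by column, one dimension at a time.
        result[label] = [sum(col) / len(units) for col in zip(*units)]
    return result


def classify(query: Sequence[float], cents: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most cosine-similar to the query."""
    q = _unit(query)

    def score(label: str) -> float:
        c = _unit(cents[label])
        return sum(a * b for a, b in zip(q, c))

    return max(cents, key=score)


def one_hot(i: int) -> List[float]:
    """Hypothetical toy embedding: a 512-d vector with a single 1.0 at index i."""
    v = [0.0] * EMBEDDING_DIM
    v[i] = 1.0
    return v


cents = centroids({"weather": [one_hot(0), one_hot(2)], "health": [one_hot(1)]})
label = classify(one_hot(0), cents)
print(label)
```

Normalising before averaging stops one long or high-magnitude example from dominating its label's centroid. With real encoder output the labelled examples would simply be the vectors the parser emits for each sentence.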

This message was sent by Atlassian JIRA
