tika-dev mailing list archives

From "Thejan Wijesinghe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2720) A parser to output universal sentence encodings to text
Date Sun, 02 Sep 2018 20:52:00 GMT
Thejan Wijesinghe created TIKA-2720:

             Summary: A parser to output universal sentence encodings to text
                 Key: TIKA-2720
                 URL: https://issues.apache.org/jira/browse/TIKA-2720
             Project: Tika
          Issue Type: New Feature
          Components: tika-dl
            Reporter: Thejan Wijesinghe
             Fix For: 2.0

This parser encodes text into high-dimensional vectors that can be used for text classification,
semantic similarity, clustering and other natural language tasks. The model is trained and
optimized for text longer than a single word, such as sentences, phrases or short paragraphs.
It is trained on a variety of data sources and tasks, with the aim of dynamically
accommodating a wide range of natural language understanding tasks. The input is variable-length
English text and the output is a 512-dimensional vector.
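For semantic similarity, encodings like these are commonly compared with cosine similarity. The Java sketch below is hypothetical. It assumes each encoding is written out as one line of whitespace-separated numbers, because the issue does not specify the parser's output format. The class `SentenceVectors` and its methods are illustrative names, not part of Tika's API.

```java
import java.util.Arrays;

/**
 * Hypothetical helper for comparing sentence encodings emitted as text.
 * ASSUMPTION: each encoding is one line of whitespace-separated numbers
 * (512 of them for the model described in TIKA-2720). The real output
 * format of the proposed parser is not defined in the issue.
 */
public class SentenceVectors {

    /** Parses one line of whitespace-separated numbers into a vector. */
    public static double[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0]; // blank line: no components
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .mapToDouble(Double::parseDouble)
                .toArray();
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1]. */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors for illustration; real encodings have 512 entries.
        double[] a = parse("0.1 0.2 0.3");
        double[] b = parse("0.2 0.4 0.6"); // same direction as a
        double[] c = parse("-0.3 0.0 0.1"); // roughly orthogonal to a
        System.out.println("sim(a, b) = " + cosine(a, b));
        System.out.println("sim(a, c) = " + cosine(a, c));
    }
}
```

Cosine similarity compares only the direction of two vectors and ignores their length. That makes it a common default for sentence encodings, though other distance measures could be substituted.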

This message was sent by Atlassian JIRA
