tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2906) Modularize tika-eval's language stats from the application
Date Tue, 13 Aug 2019 17:43:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906448#comment-16906448 ]

Hudson commented on TIKA-2906:
------------------------------

SUCCESS: Integrated in Jenkins build tika-2.x-windows #449 (See [https://builds.apache.org/job/tika-2.x-windows/449/])
TIKA-2906 -- prep update common tokens files with termfreqs (tallison: rev a495473afb2e7ec9ff0a6d76809dc40401fe3b27)
* (edit) tika-eval/src/main/resources/common_tokens/lit
* (edit) tika-eval/src/main/resources/common_tokens/uig
* (edit) tika-eval/src/main/resources/common_tokens/bre
* (edit) tika-eval/src/main/resources/common_tokens/deu
* (edit) tika-eval/src/main/resources/common_tokens/nob
* (edit) tika-eval/src/main/resources/common_tokens/swe
* (edit) tika-eval/src/main/resources/common_tokens/mlg
* (edit) tika-eval/src/main/resources/common_tokens/ben
* (edit) tika-eval/src/main/resources/common_tokens/hrv
* (edit) tika-eval/src/main/resources/common_tokens/heb
* (edit) tika-eval/src/main/resources/common_tokens/pol
* (edit) tika-eval/src/main/resources/common_tokens/bos
* (edit) tika-eval/src/main/resources/common_tokens/swa
* (edit) tika-eval/src/main/resources/common_tokens/dan
* (edit) tika-eval/src/main/resources/common_tokens/ces
* (edit) tika-eval/src/main/resources/common_tokens/fao
* (edit) tika-eval/src/main/resources/common_tokens/mlt
* (edit) tika-eval/src/main/resources/common_tokens/mal
* (edit) tika-eval/src/main/resources/common_tokens/slk
* (edit) tika-eval/src/main/resources/common_tokens/srp
* (edit) tika-eval/src/main/resources/common_tokens/hun
* (edit) tika-eval/src/main/resources/common_tokens/guj
* (edit) tika-eval/src/main/resources/common_tokens/ltz
* (edit) tika-eval/src/main/resources/common_tokens/ckb
* (edit) tika-eval/src/main/resources/common_tokens/kin
* (edit) tika-eval/src/main/resources/common_tokens/eng
* (edit) tika-eval/src/main/resources/common_tokens/kir
* (edit) tika-eval/src/main/resources/common_tokens/pes
* (edit) tika-eval/src/main/resources/common_tokens/bul
* (edit) tika-eval/src/main/resources/common_tokens/lug
* (edit) tika-eval/src/main/resources/common_tokens/oci
* (edit) tika-eval/src/main/resources/common_tokens/min
* (edit) tika-eval/src/main/resources/common_tokens/lim
* (edit) tika-eval/src/main/resources/common_tokens/snd
* (edit) tika-eval/src/main/resources/common_tokens/nds
* (edit) tika-eval/src/main/resources/common_tokens/sun
* (edit) tika-eval/src/main/resources/common_tokens/msa
* (edit) tika-eval/src/main/resources/common_tokens/est
* (edit) tika-eval/src/main/resources/common_tokens/lvs
* (edit) tika-eval/src/main/resources/common_tokens/mkd
* (edit) tika-eval/src/main/resources/common_tokens/ori
* (edit) tika-eval/src/main/resources/common_tokens/kaz
* (edit) tika-eval/src/main/resources/common_tokens/pan
* (edit) tika-eval/src/main/resources/common_tokens/cym
* (edit) tika-eval/src/main/resources/common_tokens/sin
* (edit) tika-eval/src/main/resources/common_tokens/tam
* (edit) tika-eval/src/main/resources/common_tokens/bak
* (edit) tika-eval/src/main/resources/common_tokens/kor
* (edit) tika-eval/src/main/resources/common_tokens/epo
* (edit) tika-eval/src/main/resources/common_tokens/jpn
* (edit) tika-eval/src/main/resources/common_tokens/isl
* (edit) tika-eval/src/main/resources/common_tokens/fry
* (edit) tika-eval/src/main/resources/common_tokens/tel
* (edit) tika-eval/src/main/resources/common_tokens/tgl
* (edit) tika-eval/src/main/resources/common_tokens/urd
* (edit) tika-eval/src/main/resources/common_tokens/san
* (edit) tika-eval/src/main/resources/common_tokens/ast
* (edit) tika-eval/src/main/resources/common_tokens/rus
* (edit) tika-eval/src/main/resources/common_tokens/div
* (edit) tika-eval/src/main/resources/common_tokens/uzb
* (edit) tika-eval/src/main/resources/common_tokens/hat
* (edit) tika-eval/src/main/resources/common_tokens/cmn
* (edit) tika-eval/src/main/resources/common_tokens/war
* (edit) tika-eval/src/main/resources/common_tokens/bel
* (edit) tika-eval/src/main/resources/common_tokens/fra
* (edit) tika-eval/src/main/resources/common_tokens/ind
* (edit) tika-eval/src/main/resources/common_tokens/ukr
* (edit) tika-eval/src/main/resources/common_tokens/yid
* (edit) tika-eval/src/main/resources/common_tokens/vie
* (edit) tika-eval/src/main/resources/common_tokens/kur
* (edit) tika-eval/src/main/resources/common_tokens/som
* (edit) tika-eval/src/main/resources/common_tokens/ara
* (edit) tika-eval/src/main/resources/common_tokens/nno
* (edit) tika-eval/src/main/resources/common_tokens/mar
* (edit) tika-eval/src/main/resources/common_tokens/lav
* (edit) tika-eval/src/main/resources/common_tokens/pus
* (edit) tika-eval/src/main/resources/common_tokens/afr
* (edit) tika-eval/src/main/resources/common_tokens/nld
* (edit) tika-eval/src/main/resources/common_tokens/spa
* (edit) tika-eval/src/main/resources/common_tokens/aze
* (edit) tika-eval/src/main/resources/common_tokens/fin
* (edit) tika-eval/src/main/resources/common_tokens/tur
* (edit) tika-eval/src/main/resources/common_tokens/lat
* (edit) tika-eval/src/main/resources/common_tokens/ell
* (edit) tika-eval/src/main/resources/common_tokens/nep
* (edit) tika-eval/src/main/resources/common_tokens/tat
* (edit) tika-eval/src/main/resources/common_tokens/nan
* (edit) tika-eval/src/main/resources/common_tokens/vol
* (edit) tika-eval/src/main/resources/common_tokens/gsw
* (edit) tika-eval/src/main/resources/common_tokens/tuk
* (edit) tika-eval/src/main/resources/common_tokens/eus
* (edit) tika-eval/src/main/resources/common_tokens/azj
* (edit) tika-eval/src/main/resources/common_tokens/kat
* (edit) tika-eval/src/main/resources/common_tokens/ceb
* (edit) tika-eval/src/main/resources/common_tokens/fas
* (edit) tika-eval/src/main/resources/common_tokens/che
* (edit) tika-eval/src/main/resources/common_tokens/mon
* (edit) tika-eval/src/main/resources/common_tokens/plt
* (edit) tika-eval/src/main/resources/common_tokens/hin
* (edit) tika-eval/src/main/resources/common_tokens/glg
* (edit) tika-eval/src/main/resources/common_tokens/jav
* (edit) tika-eval/src/main/resources/common_tokens/mri
* (edit) tika-eval/src/main/resources/common_tokens/sqi
* (edit) tika-eval/src/main/resources/common_tokens/zul
* (edit) tika-eval/src/main/resources/common_tokens/hye
* (edit) tika-eval/src/main/resources/common_tokens/tgk
* (edit) tika-eval/src/main/resources/common_tokens/slv
* (edit) tika-eval/src/main/resources/common_tokens/kan
* (edit) tika-eval/src/main/resources/common_tokens/ban
* (edit) tika-eval/src/main/resources/common_tokens/cat
* (edit) tika-eval/src/main/resources/common_tokens/mhr
* (edit) tika-eval/src/main/resources/common_tokens/gle
* (edit) tika-eval/src/main/resources/common_tokens/por
* (edit) tika-eval/src/main/resources/common_tokens/asm
* (edit) tika-eval/src/main/resources/common_tokens/ita
* (edit) tika-eval/src/main/resources/common_tokens/tha
* (edit) tika-eval/src/main/resources/common_tokens/xho
* (edit) tika-eval/src/main/resources/common_tokens/ekk
* (edit) tika-eval/src/main/resources/common_tokens/amh
* (edit) tika-eval/src/main/resources/common_tokens/pnb
* (edit) tika-eval/src/main/resources/common_tokens/ron


> Modularize tika-eval's language stats from the application
> ----------------------------------------------------------
>
>                 Key: TIKA-2906
>                 URL: https://issues.apache.org/jira/browse/TIKA-2906
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.23
>
>
> Tika-eval's language stats are tightly coupled to the application and the initial workflow
> of running against a directory of extracts and reporting info to an H2 db.
> It would be helpful for large-scale data processing pipelines to modularize some of tika-eval's
> stats so that they can be applied to, e.g. a full Solr/ES cluster.  We won't build the actual
> connectors to Solr/ES/other on this ticket, but we will make it easier for integrators to
> build their own.
> This is slated for 1.23/2.0...not 1.22.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
