tika-dev mailing list archives

From Fossies Administrator <Jens.Schleuse...@fossies.org>
Subject Re: Codespell report for Tika 1.23
Date Thu, 26 Dec 2019 20:19:32 GMT
On Wed, 25 Dec 2019, Tilman Hausherr wrote:

> Hello Jens,
> Thank you again, I have corrected all I wanted to, and created one issue for 
> a false positive
> https://github.com/codespell-project/codespell/issues/1399
> Tilman

Yes, that is a false positive, but I assume the issue isn't easy to solve:
"codespell" claims to be "designed primarily for checking misspelled words
in source code", but its context recognition currently seems to leave room
for improvement.

So it's more my error while manually pre-checking for false positives.
I now also ignore "endianess" and "instanciate", and the current result
(with the very good rating grade "A") can be found here:

  https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
  https://fossies.org/linux/test/pdfbox-trunk-a6bc826.191225.zip/codespell.html

Regards

Jens

> Am 15.12.2019 um 16:33 schrieb Fossies Administrator:
>>  Hi Tilman,
>>
>>>  Thank you! I've now corrected all typos except those related to variable
>>>  / method names (want to keep API stability), "Cloneable
>>>  <https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
>>>  (that is in java itself LOL) and a few that are in resource files (these
>>>  are text extractions, i.e. the typos are in the original PDF, e.g.
>>>  PDFBOX-3044-010197-p5-ligatures.pdf).
>>
>>  Oops, I overlooked that file, and "Cloneable" is now also ignored.
>>
>>>  Yes, I would like to have a report for the trunk too, although I don't
>>>  expect many new typos.
>>
>>  A new "false positive" word, "hIST", is now ignored, but for better
>>  comparability I have left all others unchanged.
>>
>>  Here are the main URLs for trunk, checked out today (Sunday) at 14:59 CET.
>>
>>   https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
>>   https://fossies.org/linux/test/pdfbox-trunk.191215_1459.zip/codespell.html 
>>
>>  Looks much better!
>>
>>  Regards
>>
>>  Jens
>>
>>>  Am 11.12.2019 um 21:50 schrieb Fossies Administrator:
>>>>   Hi Tilman,
>>>>
>>>>>   Am 10.12.2019 um 16:51 schrieb Fossies Administrator:
>>>>>>    Although such reports are normally only generated on request
>>>>>
>>>>>   Hello, can we also get this for Apache PDFBox? I've corrected typos
>>>>>   when I hit them, but I can't look everywhere.
>>>>>
>>>>>   https://github.com/apache/pdfbox/
>>>>>
>>>>>   or
>>>>>
>>>>>   https://svn.apache.org/repos/asf/pdfbox/
>>>>>
>>>>>   PDFBox is used by the Tika project, and some people are common to
>>>>>   both projects.
>>>>
>>>>   Although Fossies now also has the possibility to create such reports in
>>>>   a special test folder that isn't integrated into the standard Fossies
>>>>   services and hopefully isn't accessible to search engines, that package
>>>>   is now included in the main Fossies folder "/linux/misc":
>>>>
>>>>    https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/
>>>>
>>>>   The corresponding codespell URLs are
>>>>
>>>>    https://fossies.org/linux/misc/pdfbox/codespell.html
>>>>
>>>>   currently redirecting to
>>>>
>>>>   https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html
>>>>
>>>>   and
>>>>
>>>>    https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_conf.html
>>>>
>>>>    https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_fps.html
>>>>
>>>>   If a codespell check would also be meaningful for, e.g., the "trunk"
>>>>   version, let me know and I can do that in the mentioned "/linux/test"
>>>>   folder.
>>>>
>>>>   Regards
>>>>
>>>>   Jens