tika-dev mailing list archives

From "Ahmad Sawalhah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2183) Can't Read file if its name is Arabic
Date Wed, 23 Nov 2016 15:24:58 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15690396#comment-15690396 ]

Ahmad Sawalhah commented on TIKA-2183:
--------------------------------------

Traceback (most recent call last):
  File "D:/NisreenThalgi/Project/frmmain.py", line 80, in startProcessing
    rdFile=ReadFile(self.fname)
  File "D:\NisreenThalgi\Project\ReadFile_2.py", line 33, in __init__
    self.ReadCorpusFile(filename)
  File "D:\NisreenThalgi\Project\ReadFile_2.py", line 37, in ReadCorpusFile
    parsed = parser.from_file( filename)
  File "C:\Python34\lib\site-packages\tika\parser.py", line 25, in from_file
    jsonOutput = parse1('all', filename, serverEndpoint)
  File "C:\Python34\lib\site-packages\tika\tika.py", line 217, in parse1
    verbose, tikaServerJar)
  File "C:\Python34\lib\site-packages\tika\tika.py", line 338, in callServer
    resp = verbFn(serviceUrl, encodedData, headers=headers)
  File "C:\Python34\lib\site-packages\requests\api.py", line 123, in put
    return request('put', url, data=data, **kwargs)
  File "C:\Python34\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python34\lib\site-packages\requests\sessions.py", line 475, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python34\lib\site-packages\requests\sessions.py", line 596, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python34\lib\site-packages\requests\adapters.py", line 423, in send
    timeout=timeout
  File "C:\Python34\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 595, in urlopen
    chunked=chunked)
  File "C:\Python34\lib\site-packages\requests\packages\urllib3\connectionpool.py", line 363, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Python34\lib\http\client.py", line 1137, in request
    self._send_request(method, url, body, headers)
  File "C:\Python34\lib\http\client.py", line 1177, in _send_request
    self.putheader(hdr, value)
  File "C:\Python34\lib\http\client.py", line 1109, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 21-27: ordinal not in range(256)
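
The last frames of the traceback show http.client encoding a header value as latin-1, which cannot represent Arabic characters. The sketch below reproduces that failure using only the standard library, then shows one possible workaround: percent-encoding the name so the header value is pure ASCII. The header shape "attachment; filename=<name>" is an assumption, inferred only from the fact that the error starts at position 21, which is the length of that prefix. Whether the Tika server accepts the "filename*=UTF-8''" form has not been checked here.

```python
from urllib.parse import quote, unquote

# Example file name taken from the issue description.
name = "احمد.docx"

# ASSUMPTION: the client sends the name in a header value shaped like this.
# The prefix is 21 characters long, matching "position 21" in the error.
raw_value = "attachment; filename=" + name

# http.client.putheader encodes str header values as latin-1,
# which fails for any character above U+00FF.
try:
    raw_value.encode("latin-1")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("reproduced:", exc)

# Possible workaround (a sketch, not the library's fix): percent-encode the
# UTF-8 bytes of the name so the whole header value is ASCII.
safe_value = "attachment; filename*=UTF-8''" + quote(name, safe="")
safe_value.encode("latin-1")  # no longer raises
print(safe_value)

# A receiver that understands percent-encoding can recover the original name.
recovered = unquote(safe_value.split("''", 1)[1])
print(recovered == name)
```

Copying the file to a temporary ASCII-only name before parsing would avoid the non-ASCII header in the same way, at the cost of losing the original name in the request.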




> Can't Read file if its name is Arabic
> -------------------------------------
>
>                 Key: TIKA-2183
>                 URL: https://issues.apache.org/jira/browse/TIKA-2183
>             Project: Tika
>          Issue Type: Bug
>          Components: general, languageidentifier
>    Affects Versions: 1.14
>            Reporter: Ahmad Sawalhah
>
> If I have an Arabic file name such as ( احمد.docx ), it gives me this error:
>  File "C:\Python34\lib\http\client.py", line 1109, in putheader
>     values[i] = one_value.encode('latin-1')
> UnicodeEncodeError: 'latin-1' codec can't encode characters in position 21-27: ordinal not in range(256)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
