lucene-dev mailing list archives

From "Ryan Ernst (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6231) smokeTestRelease.py should retry failed downloads
Date Sat, 14 Feb 2015 17:28:11 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321575#comment-14321575 ]

Ryan Ernst commented on LUCENE-6231:
------------------------------------

[~steve_rowe] Regarding the patch, why not use exponential backoff? That lets you start with a
shorter wait, but still spread the desired retries over a larger number of seconds.
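
For illustration only, here is a minimal sketch of what exponential backoff could look like around a download call. It is not the code from the attached patches: the names download_with_backoff and backoff_delays, the default of 5 attempts, the 1-second initial wait, and the doubling factor are all assumptions chosen for the example. Only urllib.request.urlopen and the RuntimeError message format are taken from the trace quoted below.

```python
import time
import urllib.error
import urllib.request


def backoff_delays(initial=1.0, factor=2.0, attempts=5):
    """Yield the wait in seconds before each retry: initial, initial*factor, ..."""
    delay = initial
    # attempts - 1 waits: there is nothing to wait for after the last attempt.
    for _ in range(attempts - 1):
        yield delay
        delay *= factor


def download_with_backoff(url, attempts=5, initial=1.0, factor=2.0,
                          opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying failed attempts with exponentially growing waits.

    opener and sleep are injectable (hypothetical parameters) so the retry
    logic can be exercised without touching the network.
    """
    delays = list(backoff_delays(initial, factor, attempts))
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except OSError as e:
            # URLError, TimeoutError and ConnectionResetError are all
            # OSError subclasses in Python 3.3+.
            if attempt == attempts - 1:
                raise RuntimeError('failed to download url "%s"' % url) from e
            sleep(delays[attempt])
```

With these assumed defaults the waits would be 1, 2, 4 and 8 seconds, so five attempts span about 15 seconds of waiting while the first retry still happens quickly.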

Now with this particular issue, I think I see merits to both sides.  I have seen these download
issues in the past (seems to be partly flakiness from the p.a.o servers), and they are annoying.
Thankfully the apache servers that real releases come from are not so flaky.  But regardless
of the size of whichever file has trouble, the more and larger files there are, the higher
the likelihood that a download issue occurs. I think that is worth addressing rather than just
masking.

So I am in favor of this patch, but I also think taking a step back and addressing LUCENE-6247
is important.  Users will not have these retries when downloading through a browser, and the
larger the total download, the higher the chance something goes wrong.
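
The size argument can be made concrete with a rough sketch. It assumes every download attempt fails independently with the same probability, which is a simplification, and the 0.001 rate is an invented number rather than a measurement of any Apache server. The function names are hypothetical.

```python
def file_failure_rate(per_attempt_failure_rate, retries=0):
    """Chance a single file still fails after the given number of retries."""
    return per_attempt_failure_rate ** (retries + 1)


def chance_of_any_failure(per_attempt_failure_rate, file_count, retries=0):
    """Chance that at least one of file_count downloads ultimately fails."""
    per_file = file_failure_rate(per_attempt_failure_rate, retries)
    return 1.0 - (1.0 - per_file) ** file_count


if __name__ == "__main__":
    p = 0.001  # assumed per-attempt failure rate, not a measured value
    for n in (100, 1000, 5000):
        no_retry = chance_of_any_failure(p, n)
        one_retry = chance_of_any_failure(p, n, retries=1)
        print("%5d files: no retry %.4f, one retry %.6f" % (n, no_retry, one_retry))
```

Under these assumptions the no-retry chance of at least one failure grows quickly with the number of files, while a single retry shrinks it sharply. That matches both points above: retries help the smoke tester, but a browser user gets no retries and is exposed to the larger number.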

> smokeTestRelease.py should retry failed downloads
> -------------------------------------------------
>
>                 Key: LUCENE-6231
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6231
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>             Fix For: 5.0, Trunk, 5.1
>
>         Attachments: LUCENE-6231-part-2.patch, LUCENE-6231-part-3.patch, LUCENE-6231-part-3.patch, LUCENE-6231.patch
>
>
> In the 5.0 RC2 vote thread, [~anshumg] mentioned that 6 attempts at running the smoke
> tester against the people.apache.org RC URL all failed because of download failures.
> I had the same problem - my first two attempts also failed because of failed downloads
> - here's the trace from one of them:
> {noformat}
> Traceback (most recent call last):
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 1248, in do_open
>     h.request(req.get_method(), req.selector, req.data, headers)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 1061, in request
>     self._send_request(method, url, body, headers)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 1099, in _send_request
>     self.endheaders(body)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 1057, in endheaders
>     self._send_output(message_body)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 902, in _send_output
>     self.send(msg)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 840, in send
>     self.connect()
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/http/client.py", line 818, in connect
>     self.timeout, self.source_address)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socket.py", line 435, in create_connection
>     raise err
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/socket.py", line 426, in create_connection
>     sock.connect(sa)
> TimeoutError: [Errno 60] Operation timed out
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "dev-tools/scripts/smokeTestRelease.py", line 117, in download
>     fIn = urllib.request.urlopen(urlString)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 156, in urlopen
>     return opener.open(url, data, timeout)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 469, in open
>     response = self._open(req, data)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 487, in _open
>     '_open', req)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 447, in _call_chain
>     result = func(*args)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 1268, in http_open
>     return self.do_open(http.client.HTTPConnection, req)
>   File "/Users/sarowe/homebrew/Cellar/python3/3.3.2/Frameworks/Python.framework/Versions/3.3/lib/python3.3/urllib/request.py", line 1251, in do_open
>     raise URLError(err)
> urllib.error.URLError: <urlopen error [Errno 60] Operation timed out>
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>   File "dev-tools/scripts/smokeTestRelease.py", line 1523, in <module>
>     main()
>   File "dev-tools/scripts/smokeTestRelease.py", line 1468, in main
>     smokeTest(c.java, c.url, c.revision, c.version, c.tmp_dir, c.is_signed, ' '.join(c.test_args))
>   File "dev-tools/scripts/smokeTestRelease.py", line 1517, in smokeTest
>     checkMaven(baseURL, tmpDir, svnRevision, version, isSigned)
>   File "dev-tools/scripts/smokeTestRelease.py", line 1012, in checkMaven
>     crawl(artifacts[project], artifactsURL, targetDir)
>   File "dev-tools/scripts/smokeTestRelease.py", line 1280, in crawl
>     crawl(downloadedFiles, subURL, path, exclusions)
>   File "dev-tools/scripts/smokeTestRelease.py", line 1280, in crawl
>     crawl(downloadedFiles, subURL, path, exclusions)
>   File "dev-tools/scripts/smokeTestRelease.py", line 1283, in crawl
>     download(text, subURL, targetDir, quiet=True)
>   File "dev-tools/scripts/smokeTestRelease.py", line 139, in download
>     raise RuntimeError('failed to download url "%s"' % urlString) from e
> RuntimeError: failed to download url "http://people.apache.org/~anshum/staging_area/lucene-solr-5.0.0-RC2-rev1658469//lucene/maven/org/apache/lucene/lucene-analyzers-uima/5.0.0/lucene-analyzers-uima-5.0.0.jar.asc"
> {noformat}
> I did a recursive download of the RC2 folder on people.apache.org using wget, and there
> were three download failures, which wget auto-retried, and succeeded in each case on the second
> attempt - two of these were timeouts and the third was a reset connection:
> {noformat}
> HTTP request sent, awaiting response... No data received.
> Retrying.
> [...]
> HTTP request sent, awaiting response... Read error (Connection reset by peer) in headers.
> Retrying.
> {noformat}
> I think we should just automatically retry all failed downloads once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

