nifi-users mailing list archives

From James Srinivasan
Subject Generating Remote URL for InvokeHTTP
Date Fri, 16 Nov 2018 16:10:38 GMT
Hi all,

I'm observing some slightly unusual behaviour with my flow and wanted
to run a possible explanation past the list. I'm using NiFi to scrape
a website that serves its data as a nested directory hierarchy,

e.g. GET http://server/2018/16/11/ returns a web page full of links to
today's data.

I'm using a combination of InvokeHTTP (to traverse the hierarchy) and
GetHTMLElement (to extract file and directory links), starting at the
root i.e. http://server/, then walking the years, months, days etc.

I'm generating the Remote URLs as


where invokehttp.request.url is the URL previously fetched for the day
listing in the hierarchy, and HTMLElement is the link to the file
extracted by GetHTMLElement.

Finally, I've routed "retry" and "failure" back to the InvokeHTTP
processor since my network is quite flaky.

Mostly everything works, but sometimes I manage to generate URLs which
look a bit like this:


i.e. the filename part of the URL is duplicated

My hypothesis is that this occurs when there is a network issue: the
flowfile is routed to retry, and the InvokeHTTP processor then
re-evaluates the expression for the Remote URL. Because
invokehttp.request.url will have been updated by the failed request,
the filename gets appended a second time.
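
To make the suspected mechanism concrete, here is a toy Python model of it. This is only a sketch, not NiFi code: it assumes the Remote URL expression simply appends HTMLElement to invokehttp.request.url, and that a failed attempt still overwrites invokehttp.request.url. The file name data.csv and both function names are made up for illustration.

```python
def remote_url(attrs):
    # Hypothetical stand-in for the Remote URL expression:
    # previously fetched URL + link extracted by GetHTMLElement.
    return attrs["invokehttp.request.url"] + attrs["HTMLElement"]


def invoke_http(attrs, succeeds):
    # Toy model of one attempt: evaluate the expression, then write the
    # requested URL back to the flowfile even on failure (assumed behaviour).
    url = remote_url(attrs)
    updated = dict(attrs)
    updated["invokehttp.request.url"] = url
    relationship = "response" if succeeds else "retry"
    return relationship, updated


flowfile = {
    "invokehttp.request.url": "http://server/2018/16/11/",
    "HTMLElement": "data.csv",  # hypothetical file name
}

# First attempt fails and is routed back to the same processor.
rel, flowfile = invoke_http(flowfile, succeeds=False)
# The retry re-evaluates the expression against the updated attribute.
rel, flowfile = invoke_http(flowfile, succeeds=True)
print(flowfile["invokehttp.request.url"])
```

Under those assumptions, each failed attempt appends the filename once more, which would match the duplicated URLs.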

Does this sound feasible? My proposed fix for my flow is to hold the
full URL in a single attribute, set by an UpdateAttribute processor
placed before InvokeHTTP, so that any retries don't munge the URL.
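
The proposed fix can be sketched in the same toy Python style. Again this is a model under assumptions rather than NiFi code: the attribute name target.url, the file name data.csv and the function names are hypothetical, and a failed attempt is assumed to overwrite invokehttp.request.url.

```python
def update_attribute(attrs):
    # Runs once, outside the retry loop: freeze the full URL in its own
    # attribute (target.url is a made-up name).
    updated = dict(attrs)
    updated["target.url"] = attrs["invokehttp.request.url"] + attrs["HTMLElement"]
    return updated


def invoke_http(attrs, succeeds):
    # The Remote URL is now just the frozen attribute, so re-evaluating it
    # on a retry gives the same value every time.
    url = attrs["target.url"]
    updated = dict(attrs)
    updated["invokehttp.request.url"] = url  # assumed to be rewritten on each attempt
    relationship = "response" if succeeds else "retry"
    return relationship, updated


flowfile = update_attribute({
    "invokehttp.request.url": "http://server/2018/16/11/",
    "HTMLElement": "data.csv",  # hypothetical file name
})

requested = []
for ok in (False, False, True):  # two failures, then success
    rel, flowfile = invoke_http(flowfile, succeeds=ok)
    requested.append(flowfile["invokehttp.request.url"])
print(requested)
```

Because the retry loop never feeds its own output back into the expression, every attempt requests the same URL.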

Many thanks, hope this makes sense.

