Are you able to list the bucket with the AWS CLI (aws s3 ls)?  It can be helpful to compare performance between NiFi and the AWS CLI, especially if you can run both from the same machine, with the same permissions, and with bucket and prefix settings that are as similar as you can manage.

In the screenshot above, the bucket is shown as "part-d-prescription-drug/unstructured", which looks unusual to me.  Is the bucket "part-d-prescription-drug" and the prefix "unstructured/"?
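
To make the bucket/prefix question concrete, here is a minimal Python sketch. It assumes the value from the screenshot is really a bucket name followed by a key prefix (S3 bucket names cannot contain "/"), and it assumes the prefix should end in "/" as suggested above. The helper names are hypothetical and are not part of NiFi or the AWS CLI; the second helper only builds the argument list for an "aws s3 ls" call that you could run by hand for comparison.

```python
def split_bucket_and_prefix(value):
    """Split 'bucket/some/prefix' into (bucket, prefix).

    Assumption: everything before the first '/' is the bucket name and
    the rest is a key prefix, normalized to end with '/'.
    """
    bucket, _, prefix = value.strip().strip("/").partition("/")
    if prefix:
        prefix += "/"
    return bucket, prefix


def build_cli_listing_command(bucket, prefix=""):
    """Return the argv list for an equivalent 'aws s3 ls' invocation."""
    return ["aws", "s3", "ls", "s3://{}/{}".format(bucket, prefix)]


if __name__ == "__main__":
    bucket, prefix = split_bucket_and_prefix("part-d-prescription-drug/unstructured")
    # Values to try in the ListS3 Bucket and Prefix properties.
    print("Bucket:", bucket)
    print("Prefix:", prefix)
    # Command to run manually from the NiFi host for comparison.
    print(" ".join(build_cli_listing_command(bucket, prefix)))
```

If the CLI lists the objects with these values but ListS3 still fails, that would point toward the processor configuration rather than permissions or networking.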

Thanks,

James

On Tue, Dec 12, 2017 at 7:34 AM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:

Joe,

 

No, I don’t have anything in between AWS and NiFi.

NiFi is installed on one of the EC2 instances in AWS, in the N. Virginia region.

The S3 bucket is also in the N. Virginia region.

 

From: Joe Witt [mailto:joe.witt@gmail.com]
Sent: Monday, December 11, 2017 1:28 PM
To: users@nifi.apache.org
Subject: Re: ListS3 Processor Error

 

The XML response appears to be truncated for some reason, as the following stack trace implies. Do you have any devices/software/systems/proxies in between your NiFi and the Amazon service?  Are you able to manually issue the request and get the response you expect?

 

2017-12-11 18:01:02,875 ERROR [Timer-Driven Process Thread-6] org.apache.nifi.processors.aws.s3.ListS3 ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: {}

com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

            at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:156)

            at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:298)

            at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:70)

            at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:59)

            at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)

            at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)

            at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1444)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1151)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:964)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:676)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:650)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:633)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:601)

            at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:583)

            at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:447)

            at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4137)

            at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4079)

            at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:819)

            at org.apache.nifi.processors.aws.s3.ListS3$S3ObjectBucketLister.listVersions(ListS3.java:314)

            at org.apache.nifi.processors.aws.s3.ListS3.onTrigger(ListS3.java:208)

            at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)

            at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1119)

            at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)

            at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)

            at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)

            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

            at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

            at java.lang.Thread.run(Thread.java:748)

Caused by: org.xml.sax.SAXParseException: Premature end of file.

            at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)

            at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)

            at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)

            at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)

            at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)

            at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1014)

            at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)

            at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)

            at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)

            at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)

            at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)

            at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)

            at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)

            at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:142)

            ... 32 common frames omitted
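
As a rough illustration of the "Premature end of file" failure mode, the following Python sketch feeds a complete, an empty, and a cut-off document to an XML parser. This uses Python's standard-library parser, not the Java SAX parser in the AWS SDK, so the error messages will differ. The sample document is a simplified, hypothetical stand-in for a real ListBucketResult response (which also carries a namespace and more fields). It is only meant to show that an empty or truncated body fails at parse time rather than returning a partial listing.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for an S3 list response body.
COMPLETE = b"<ListBucketResult><Name>example-bucket</Name></ListBucketResult>"


def try_parse(body):
    """Return (root_tag, None) on success or (None, error_text) on failure."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        return None, str(exc)
    return root.tag, None


if __name__ == "__main__":
    cases = [
        ("complete", COMPLETE),
        ("empty", b""),
        ("cut off", COMPLETE[:20]),
    ]
    for label, body in cases:
        tag, error = try_parse(body)
        print(label, "->", tag if error is None else "parse error: " + error)
```

The failing frame in the trace above is in the parser's prolog handling (PrologDriver), which may indicate that the body ended before any root element was read, similar to the empty case here.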

 

 

On Mon, Dec 11, 2017 at 1:07 PM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:

I have attached my nifi-app.log. Could you please let me know what went wrong?

 

From: Joe Witt [mailto:joe.witt@gmail.com]
Sent: Friday, December 08, 2017 4:04 PM
To: users@nifi.apache.org
Subject: Re: ListS3 Processor Error

 

 

On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:

Joe,

Could you please let me know how to turn on the debug logging?

 

From: Joe Witt [mailto:joe.witt@gmail.com]
Sent: Friday, December 08, 2017 3:59 PM
To: users@nifi.apache.org
Subject: Re: ListS3 Processor Error

 

What version of NiFi?

 

It looks like either a classpath/classloader issue or the Amazon client library being unable to parse the response it is getting back...

 

The logs/nifi-app.log should have the full stack trace.  If not, you can turn on debug logging for that processor and perhaps then it will.

 

Thanks

 

On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:

I am trying to get a PDF file from S3 and load it into Elasticsearch. The ListS3 processor is giving me this error. Could someone please let me know where I am going wrong?

 

20:52:25 UTC  ERROR  37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:25 UTC  WARNING  37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure

20:52:26 UTC  ERROR  37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler; rolling back session: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:26 UTC  ERROR  37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler

20:52:26 UTC  WARNING  37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec due to processing failure
