nifi-users mailing list archives

From Aruna Sankaralingam <Aruna.Sankaralin...@Cormac-Corp.com>
Subject RE: ListS3 Processor Error
Date Tue, 12 Dec 2017 15:32:27 GMT
Thank you so much, Matt. You have answered my question.

Thanks
Aruna

From: Matt Burgess [mailto:mattyb149@apache.org]
Sent: Monday, December 11, 2017 7:26 PM
To: users@nifi.apache.org
Subject: Re: ListS3 Processor Error

Aruna,

The index and type in Elasticsearch are both ways of partitioning data: they help users organize
it, and they also help with indexing and searching. An index is always required; a type is
not. Imagine you are trying to store a bunch of tweets from a Twitter feed (or firehose)
into Elasticsearch. You could call the index "twitter" and type "tweet" for each tweet that
you store in the twitter index. Now say you want to also put Twitter user information into
that index. You can reuse "twitter" as the index but then specify "user" as the type. Now
you can search the entire index across both tweets and user data, or you can narrow the
search by type, for example to only the documents of type "user".

In the REST API, the index and type appear in the request path, as in GET /twitter/tweet/1
or GET /twitter/user/2, or something like that. The Elasticsearch processors use the index
and type information to determine the right call to make to Elasticsearch.
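
As a rough illustration of that path layout, here is a small Python sketch. The helper
names document_path and search_path are made up for this example and are not part of NiFi
or of any Elasticsearch client; the typed search form assumes an Elasticsearch version
that still supports types, as the versions discussed in this thread do.

```python
from typing import Optional, Union


def document_path(index: str, doc_type: str, doc_id: Union[int, str]) -> str:
    """Build the REST path for one document: /<index>/<type>/<id>.

    Hypothetical helper for illustration only.
    """
    if not index:
        raise ValueError("an index is always required")
    if not doc_type:
        raise ValueError("this sketch expects a type for single-document paths")
    return "/{}/{}/{}".format(index, doc_type, doc_id)


def search_path(index: str, doc_type: Optional[str] = None) -> str:
    """Build a search path for the whole index, or for one type within it.

    Hypothetical helper for illustration only.
    """
    if not index:
        raise ValueError("an index is always required")
    if doc_type:
        # Narrow the search to documents of a single type.
        return "/{}/{}/_search".format(index, doc_type)
    # Search every document in the index, regardless of type.
    return "/{}/_search".format(index)


if __name__ == "__main__":
    print("GET", document_path("twitter", "tweet", 1))
    print("GET", document_path("twitter", "user", 2))
    print("GET", search_path("twitter"))
    print("GET", search_path("twitter", "user"))
```

The first two calls reproduce the example paths above; the last two show searching the
whole "twitter" index versus only the documents of type "user".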

You can certainly choose "pdf" as the type if you like, although depending on the sort of
queries you'll be running, you may want to pick an index that incorporates any kind of data
you'll be keeping together, and a type that is more domain-specific (such as "customer" if
it is a PDF full of customer data).  Please let me know if that answers your question; I can
provide more information if need be.

Regards,
Matt


On Mon, Dec 11, 2017 at 4:18 PM, Joe Witt <joe.witt@gmail.com> wrote:
For that we'll need someone familiar with that processor/Elastic to chime in :)

Thanks

On Mon, Dec 11, 2017 at 4:16 PM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:
Oops, I overlooked the question you asked about the version. My apologies. I am using NiFi v1.4.

I moved the pdf file to another folder in the same S3 bucket and NiFi was able to pick it up.

Initially it was in
S3 > part-d-prescription-drug/unstructured
I moved to
S3 > Nifi-Pecos-files

I still don’t know what was wrong with the old location. But for now, I am using the one
that works.

I am trying to put this pdf file in Elasticsearch.

I am not sure what I should give for “Index” and “Type”. Should the type be “PDF”?

Thanks
Aruna

From: Joe Witt [mailto:joe.witt@gmail.com]
Sent: Monday, December 11, 2017 3:32 PM
To: users@nifi.apache.org
Subject: Re: ListS3 Processor Error

Aruna,

We'll need to know more about your config/env to help I think.  I am not aware of any normal
usage situation that should result in truncated responses.  It is possible it is a coding
bug we can resolve but I think we'll need more details.  Did you see the questions in my last
reply?

Thanks

On Mon, Dec 11, 2017 at 2:50 PM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:
Could someone please let me know what is wrong with the configuration that is causing it to fail?

From: Aruna Sankaralingam [mailto:Aruna.Sankaralingam@Cormac-Corp.com]
Sent: Monday, December 11, 2017 1:07 PM
To: users@nifi.apache.org
Subject: RE: ListS3 Processor Error

I have attached my nifi-app.log. Could you please let me know what went wrong?

From: Joe Witt [mailto:joe.witt@gmail.com]
Sent: Friday, December 08, 2017 4:04 PM
To: users@nifi.apache.org
Subject: Re: ListS3 Processor Error

Here is an example I found for another processor

  https://mail-archives.apache.org/mod_mbox/nifi-dev/201509.mbox/%3CCAFddr26AEVqnoQ=mWr7DSNDFVrr9NuYy9GCcXg=4FYyCQAbbuw@mail.gmail.com%3E

Thanks

On Fri, Dec 8, 2017 at 4:02 PM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:
Joe,
Could you please let me know how to turn on the debug logging?

From: Joe Witt [mailto:joe.witt@gmail.com]
Sent: Friday, December 08, 2017 3:59 PM
To: users@nifi.apache.org
Subject: Re: ListS3 Processor Error

What version of NiFi?

Looks like either a classpath/classloader issue OR the amazon client library cannot parse
the response it is getting back...

The logs/nifi-app.log should have the full stack trace.  If not you can turn on debug logging
for that processor and perhaps then it will.

Thanks

On Fri, Dec 8, 2017 at 3:56 PM, Aruna Sankaralingam <Aruna.Sankaralingam@cormac-corp.com> wrote:
I am trying to get a pdf file from S3 and load it into Elasticsearch. The ListS3 processor is
giving me this error. Could someone please let me know where I am going wrong?

20:52:25 UTC
ERROR
37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document
with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
20:52:25 UTC
WARNING
37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec
due to processing failure
20:52:26 UTC
ERROR
37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
failed to process due to com.amazonaws.SdkClientException: Failed to parse XML document with
handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler;
rolling back session: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
20:52:26 UTC
ERROR
37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3]
failed to process session due to com.amazonaws.SdkClientException: Failed to parse XML document
with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler:
Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler
20:52:26 UTC
WARNING
37d7226e-0160-1000-6049-d4c489cd32f3
ListS3[id=37d7226e-0160-1000-6049-d4c489cd32f3] Processor Administratively Yielded for 1 sec
due to processing failure
