nifi-users mailing list archives

From Bryan Bende <bbe...@gmail.com>
Subject Re: GetHDFS from Azure Blob
Date Tue, 28 Mar 2017 19:55:24 GMT
Austin,

I believe the default FS is only used when you write to a path that
doesn't specify the filesystem. Meaning, if you set the directory of
PutHDFS to /data then it will use the default FS, but if you specify
wasb://user@wasb2/data then it will go to /data in a different
filesystem.
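
As a sketch (the container and account names below are placeholders), the
default FS in core-site.xml only covers scheme-less paths:

    <property>
      <name>fs.defaultFS</name>
      <value>wasb://CONTAINER@ACCOUNT.blob.core.windows.net/</value>
    </property>

With that in place, a PutHDFS directory of /data resolves against the
default FS, while a fully-qualified directory like
wasb://other-container@other-account.blob.core.windows.net/data should go
to the other filesystem instead.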

The problem here is that I don't see a way to specify different keys
for each WASB filesystem in the core-site.xml.
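
That said, since the key property name appears to include the account host,
something like the following might let you list one key per storage account
in the same core-site.xml, though I haven't verified it (account names are
placeholders):

    <property>
      <name>fs.azure.account.key.account1.blob.core.windows.net</name>
      <value>KEY_FOR_ACCOUNT_1</value>
    </property>
    <property>
      <name>fs.azure.account.key.account2.blob.core.windows.net</name>
      <value>KEY_FOR_ACCOUNT_2</value>
    </property>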

Admittedly I have never tried to set up something like this with many
different filesystems.

-Bryan


On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <aheyne@ccri.com> wrote:
> Hi Andre,
>
> Yes, I'm aware of that configuration property; it's what I have been using
> to set the core-site.xml and hdfs-site.xml. For testing this I didn't modify
> the core-site located in the HADOOP_CONF_DIR but rather copied and modified
> it and then pointed the processor to the copy. The problem with this is that
> we'll end up with a large number of core-site.xml copies that will all have
> to be maintained separately. Ideally we'd be able to specify the defaultFS
> in the processor config or have the processor behave like the hdfs command
> line tools. The command line tools don't require the defaultFS to be set to
> a wasb url in order to use wasb urls.
>
> The key idea here is long term maintainability and using Ambari to maintain
> the configuration. If we need to change any other setting in the
> core-site.xml we'd have to change it in a bunch of different files manually.
>
> Thanks,
> Austin
>
>
> On 03/28/2017 03:34 PM, Andre wrote:
>
> Austin,
>
> Perhaps that wasn't explicit, but the settings don't need to be system wide;
> instead, the defaultFS may be changed just for a particular processor, while
> the others may use the default configuration.
>
> The *HDFS processor documentation mentions it allows you to set particular
> hadoop configurations:
>
> " A file or comma separated list of files which contains the Hadoop file
> system configuration. Without this, Hadoop will search the classpath for a
> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
> configuration"
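>
> For example (the paths here are illustrative), each *HDFS processor could
> point its 'Hadoop Configuration Resources' property at its own copy:
>
>     /etc/nifi/conf/wasb2/core-site.xml,/etc/hadoop/conf/hdfs-site.xml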
>
> Have you tried using this field to point to a file as described by Bryan?
>
> Cheers
>
> On 29 Mar 2017 05:21, "Austin Heyne" <aheyne@ccri.com> wrote:
>
> Thanks Bryan,
>
> Working with the configuration you sent what I needed to change was to set
> the fs.defaultFS to the wasb url that we're working from. Unfortunately this
> is a less than ideal solution since we'll be pulling files from multiple
> wasb urls and ingesting them into an Accumulo datastore. I'm pretty
> certain changing the defaultFS would mess with our local HDFS/Accumulo
> install. In addition we're trying to maintain all of this configuration with
> Ambari, which from what I can tell only supports one core-site configuration
> file.
>
> Is the only solution here to maintain multiple core-site.xml files or is
> there another way we configure this?
>
> Thanks,
>
> Austin
>
>
>
> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>
>> Austin,
>>
>> Can you provide the full error message and stacktrace for the
>> IllegalArgumentException from nifi-app.log?
>>
>> When you start the processor it creates a FileSystem instance based on
>> the config files provided to the processor, which in turn causes all
>> of the corresponding classes to load.
>>
>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>> then I have successfully done the following...
>>
>> In core-site.xml:
>>
>> <configuration>
>>
>>      <property>
>>        <name>fs.defaultFS</name>
>>        <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>        <value>YOUR_KEY</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.AbstractFileSystem.wasb.impl</name>
>>        <value>org.apache.hadoop.fs.azure.Wasb</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.wasb.impl</name>
>>        <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>      </property>
>>
>>      <property>
>>        <name>fs.azure.skip.metrics</name>
>>        <value>true</value>
>>      </property>
>>
>> </configuration>
>>
>> In Additional Resources property of an HDFS processor, point to a
>> directory with:
>>
>> azure-storage-2.0.0.jar
>> commons-codec-1.6.jar
>> commons-lang3-3.3.2.jar
>> commons-logging-1.1.1.jar
>> guava-11.0.2.jar
>> hadoop-azure-2.7.3.jar
>> httpclient-4.2.5.jar
>> httpcore-4.2.4.jar
>> jackson-core-2.2.3.jar
>> jsr305-1.3.9.jar
>> slf4j-api-1.7.5.jar
>>
>>
>> Thanks,
>>
>> Bryan
>>
>>
>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <aheyne@ccri.com> wrote:
>>>
>>> Hi all,
>>>
>>> Thanks for all the help you've given me so far. Today I'm trying to pull
>>> files from an Azure blob store. I've done some reading on this and from
>>> previous tickets [1] and guides [2] it seems the recommended approach is
>>> to place the jars required for the HDFS Azure protocol in 'Additional
>>> Classpath Resources' and the hadoop core-site and hdfs-site configs into
>>> the 'Hadoop Configuration Resources'. I have my local HDFS properly
>>> configured to access wasb urls. I'm able to ls, copy to and from, etc.
>>> without problem. Using the same HDFS config files, and trying both all
>>> the jars in my hadoop-client/lib directory (hdp) and the jars
>>> recommended in [1], I'm still seeing the
>>> "java.lang.IllegalArgumentException: Wrong FS: " error in my NiFi logs
>>> and am unable to pull files from Azure blob storage.
>>>
>>> Interestingly, it seems the processor is spinning up way too fast; the
>>> errors appear in the log as soon as I start the processor. I'm not sure
>>> how it could be loading all of those jars that quickly.
>>>
>>> Does anyone have any experience with this or recommendations to try?
>>>
>>> Thanks,
>>> Austin
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>> [2]
>>>
>>> https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>
>>>
>
>
>
