nifi-users mailing list archives

From Austin Heyne <ahe...@ccri.com>
Subject Re: GetHDFS from Azure Blob
Date Tue, 28 Mar 2017 20:14:01 GMT
Bryan,

So I initially didn't think much of it (assumed it was a typo, etc.), but 
you've said that the access url for wasb that you've been using is 
wasb://YOUR_USER@YOUR_HOST/. However, this has never worked for us, and 
I'm wondering if we have a different configuration somewhere. What we 
have to use is 
wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>, 
which seems to be in line with the Azure blob storage GUI and is what is 
outlined here [1][2]. Is there some other way this connector is being 
set up? It would make much more sense using your access pattern, as then 
each container wouldn't need to have its own core-site.xml.
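
(Skimming the hadoop-azure docs [1], it looks like a single core-site.xml 
could carry one key property per storage account, which would avoid the 
copies. Account names below are made up; treat this as an untested sketch:)

      <property>
        <name>fs.azure.account.key.accountone.blob.core.windows.net</name>
        <value>KEY_FOR_ACCOUNT_ONE</value>
      </property>

      <property>
        <name>fs.azure.account.key.accounttwo.blob.core.windows.net</name>
        <value>KEY_FOR_ACCOUNT_TWO</value>
      </property>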

Thanks,
Austin

[1] 
https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Accessing_wasb_URLs
[2] 
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-use-blob-storage



On 03/28/2017 03:55 PM, Bryan Bende wrote:
> Austin,
>
> I believe the default FS is only used when you write to a path that
> doesn't specify the filesystem. Meaning, if you set the directory of
> PutHDFS to /data then it will use the default FS, but if you specify
> wasb://user@wasb2/data then it will go to /data in a different
> filesystem.
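>
> For example (names are made up), the two directory settings would behave as:
>
>     Directory = /data
>         resolves against fs.defaultFS
>     Directory = wasb://somecontainer@someaccount.blob.core.windows.net/data
>         always goes to that wasb filesystem, whatever the default FS is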
>
> The problem here is that I don't see a way to specify different keys
> for each WASB filesystem in the core-site.xml.
>
> Admittedly, I have never tried to set up something like this with many
> different filesystems.
>
> -Bryan
>
>
> On Tue, Mar 28, 2017 at 3:50 PM, Austin Heyne <aheyne@ccri.com> wrote:
>> Hi Andre,
>>
>> Yes, I'm aware of that configuration property; it's what I have been using
>> to set the core-site.xml and hdfs-site.xml. For testing this I didn't modify
>> the core-site located in the HADOOP_CONF_DIR but rather copied and modified
>> it and then pointed the processor to the copy. The problem with this is that
>> we'll end up with a large number of core-site.xml copies that will all have
>> to be maintained separately. Ideally we'd be able to specify the defaultFS
>> in the processor config or have the processor behave like the hdfs command
>> line tools. The command line tools don't require the defaultFS to be set to
>> a wasb url in order to use wasb urls.
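>>
>> For instance, with only the account key configured (no wasb defaultFS), a
>> listing like this works from the shell (names are made up):
>>
>>     hadoop fs -ls wasb://somecontainer@someaccount.blob.core.windows.net/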
>>
>> The key idea here is long term maintainability and using Ambari to maintain
>> the configuration. If we need to change any other setting in the
>> core-site.xml we'd have to change it in a bunch of different files manually.
>>
>> Thanks,
>> Austin
>>
>>
>> On 03/28/2017 03:34 PM, Andre wrote:
>>
>> Austin,
>>
>> Perhaps that wasn't explicit, but the settings don't need to be system wide;
>> instead the defaultFS may be changed just for a particular processor, while
>> the others may use other configurations.
>>
>> The *HDFS processor documentation mentions it allows you to set particular
>> hadoop configurations:
>>
>> " A file or comma separated list of files which contains the Hadoop file
>> system configuration. Without this, Hadoop will search the classpath for a
>> 'core-site.xml' and 'hdfs-site.xml' file or will revert to a default
>> configuration"
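>>
>> For example, the field takes a comma-separated list of paths (illustrative
>> only) such as:
>>
>>     /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml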
>>
>> Have you tried using this field to point to a file as described by Bryan?
>>
>> Cheers
>>
>> On 29 Mar 2017 05:21, "Austin Heyne" <aheyne@ccri.com> wrote:
>>
>> Thanks Bryan,
>>
>> Working with the configuration you sent, what I needed to change was to set
>> the fs.defaultFS to the wasb url that we're working from. Unfortunately this
>> is a less than ideal solution, since we'll be pulling files from multiple
>> wasb urls and ingesting them into an Accumulo datastore. I'm pretty certain
>> changing the defaultFS would mess with our local HDFS/Accumulo install. In
>> addition, we're trying to maintain all of this configuration with Ambari,
>> which from what I can tell only supports one core-site configuration file.
>>
>> Is the only solution here to maintain multiple core-site.xml files, or is
>> there another way to configure this?
>>
>> Thanks,
>>
>> Austin
>>
>>
>>
>> On 03/28/2017 01:41 PM, Bryan Bende wrote:
>>> Austin,
>>>
>>> Can you provide the full error message and stacktrace for the
>>> IllegalArgumentException from nifi-app.log?
>>>
>>> When you start the processor it creates a FileSystem instance based on
>>> the config files provided to the processor, which in turn causes all
>>> of the corresponding classes to load.
>>>
>>> I'm not that familiar with Azure, but if "Azure blob store" is WASB,
>>> then I have successfully done the following...
>>>
>>> In core-site.xml:
>>>
>>> <configuration>
>>>
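>>>       <!-- filesystem used when a path doesn't specify a scheme -->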
>>>       <property>
>>>         <name>fs.defaultFS</name>
>>>         <value>wasb://YOUR_USER@YOUR_HOST/</value>
>>>       </property>
>>>
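>>>       <!-- access key for the storage account ("nifi" here is the account name) -->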
>>>       <property>
>>>         <name>fs.azure.account.key.nifi.blob.core.windows.net</name>
>>>         <value>YOUR_KEY</value>
>>>       </property>
>>>
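>>>       <!-- binds the wasb scheme for the AbstractFileSystem (FileContext) API -->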
>>>       <property>
>>>         <name>fs.AbstractFileSystem.wasb.impl</name>
>>>         <value>org.apache.hadoop.fs.azure.Wasb</value>
>>>       </property>
>>>
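>>>       <!-- FileSystem implementation backing wasb:// URLs -->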
>>>       <property>
>>>         <name>fs.wasb.impl</name>
>>>         <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
>>>       </property>
>>>
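>>>       <!-- skip registering Azure filesystem metrics -->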
>>>       <property>
>>>         <name>fs.azure.skip.metrics</name>
>>>         <value>true</value>
>>>       </property>
>>>
>>> </configuration>
>>>
>>> In the Additional Classpath Resources property of an HDFS processor,
>>> point to a directory with:
>>>
>>> azure-storage-2.0.0.jar
>>> commons-codec-1.6.jar
>>> commons-lang3-3.3.2.jar
>>> commons-logging-1.1.1.jar
>>> guava-11.0.2.jar
>>> hadoop-azure-2.7.3.jar
>>> httpclient-4.2.5.jar
>>> httpcore-4.2.4.jar
>>> jackson-core-2.2.3.jar
>>> jsr305-1.3.9.jar
>>> slf4j-api-1.7.5.jar
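>>>
>>> (These versions line up with the hadoop-azure 2.7.3 dependencies; a
>>> different Hadoop version would likely need different jar versions.)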
>>>
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>>
>>> On Tue, Mar 28, 2017 at 1:15 PM, Austin Heyne <aheyne@ccri.com> wrote:
>>>> Hi all,
>>>>
>>>> Thanks for all the help you've given me so far. Today I'm trying to pull
>>>> files from an Azure blob store. I've done some reading on this, and from
>>>> previous tickets [1] and guides [2] it seems the recommended approach is
>>>> to place the jars required to use the HDFS Azure protocol in 'Additional
>>>> Classpath Resources' and the hadoop core-site and hdfs-site configs into
>>>> the 'Hadoop Configuration Resources'. I have my local HDFS properly
>>>> configured to access wasb urls. I'm able to ls, copy to and from, etc.
>>>> without problem. Using the same HDFS config files, and trying both all
>>>> the jars in my hadoop-client/lib directory (hdp) and the jars
>>>> recommended in [1], I'm still seeing the
>>>> "java.lang.IllegalArgumentException: Wrong FS: " error in my NiFi logs
>>>> and am unable to pull files from Azure blob storage.
>>>>
>>>> Interestingly, it seems the processor is spinning up way too fast; the
>>>> errors appear in the log as soon as I start the processor. I'm not sure
>>>> how it could be loading all of those jars that quickly.
>>>>
>>>> Does anyone have any experience with this or recommendations to try?
>>>>
>>>> Thanks,
>>>> Austin
>>>>
>>>> [1] https://issues.apache.org/jira/browse/NIFI-1922
>>>> [2] https://community.hortonworks.com/articles/71916/connecting-to-azure-data-lake-from-a-nifi-dataflow.html
>>>>
>>>>
>>
>>

