spark-user mailing list archives

From Dino Fancellu <d...@felstar.com>
Subject Re: Local Spark talking to remote HDFS?
Date Tue, 25 Aug 2015 11:49:28 GMT
Tried adding 50010, 50020 and 50090. Still no difference.

I can't imagine I'm the only person on the planet wanting to do this.

Anyway, thanks for trying to help.

Dino.

On 25 August 2015 at 08:22, Roberto Congiu <roberto.congiu@gmail.com> wrote:
> Port 8020 is not the only port you need tunnelled for HDFS to work. If you
> only list the contents of a directory, port 8020 is enough. For instance,
> using something like
>
> // listing is a metadata call, answered by the NameNode on port 8020
> val p = new org.apache.hadoop.fs.Path("hdfs://localhost:8020/")
> val fs = p.getFileSystem(sc.hadoopConfiguration)
> fs.listStatus(p)
>
> you should see the file list.
> But when you actually read a file, the client has to fetch its blocks, so
> it has to connect to the DataNode.
> The error 'could not obtain block' means it can't get that block from the
> DataNode.
> Refer to
> http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.1/bk_reference/content/reference_chap2_1.html
> to see the complete list of ports that also need to be tunnelled.
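A minimal sketch for checking this from the Windows side, assuming a hadoop-client jar matching the cluster's Hadoop version is on the classpath; the object and method names are hypothetical. It asks the NameNode which DataNodes hold each block of a file and prints their host:port transfer addresses. That is a metadata call, so it should work with only 8020 forwarded. The addresses printed are the ones the client will then try to connect to when reading, so if they are not reachable from the host, forwarding the DataNode ports on localhost alone would presumably not be enough.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: reports where the NameNode says a file's blocks live.
object ShowBlockLocations {
  // One line per block: offset, length and the DataNode "host:port"
  // transfer addresses reported by the NameNode for that block.
  def blockAddresses(uri: String, conf: Configuration = new Configuration()): Seq[String] = {
    val path = new Path(uri)
    val fs = path.getFileSystem(conf)   // metadata calls go to the NameNode
    val status = fs.getFileStatus(path)
    fs.getFileBlockLocations(status, 0L, status.getLen).toSeq.map { loc =>
      s"offset=${loc.getOffset} length=${loc.getLength} datanodes=${loc.getNames.mkString(",")}"
    }
  }

  def main(args: Array[String]): Unit = {
    // Default is just the path used in this thread; pass another URI as args(0).
    val uri = if (args.nonEmpty) args(0) else "hdfs://localhost:8020/tmp/people.txt"
    blockAddresses(uri).foreach(println)
  }
}
```

Run it with the file's URI as the only argument. The same call also works against a local file:// path, which is a quick way to check the code without the VM.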
>
>
>
> 2015-08-24 13:10 GMT-07:00 Dino Fancellu <dino@felstar.com>:
>>
>> Changing the IP to the guest's IP address just never connects.
>>
>> The VM has port tunnelling, and it passes all the main ports, 8020
>> included, through to the host.
>>
>> You can tell that it was talking to the guest VM before, simply
>> because it reported "file not found" when I gave a wrong file name.
>>
>> Error is:
>>
>> Exception in thread "main" org.apache.spark.SparkException: Job
>> aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most
>> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost):
>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
>> BP-452094660-10.0.2.15-1437494483194:blk_1073742905_2098
>> file=/tmp/people.txt
>>
>> but I have no idea what it means by that. It certainly can find the
>> file and knows it exists.
>>
>>
>>
>> On 24 August 2015 at 20:43, Roberto Congiu <roberto.congiu@gmail.com> wrote:
>> > When you launch your HDP guest VM, it most likely gets launched with NAT
>> > and an address on a private network (192.168.x.x), so on your Windows
>> > host you should use that address (you can find it using ifconfig on the
>> > guest OS).
>> > I usually add an entry to my /etc/hosts for VMs that I use often. If you
>> > use Vagrant, there's also a Vagrant plugin that can do that
>> > automatically.
>> > Also, I am not sure how the default HDP VM is set up, that is, whether it
>> > binds HDFS only to 127.0.0.1 or to all addresses. You can check that with
>> > netstat -a.
>> >
>> > R.
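A minimal connectivity check that complements running netstat -a on the guest: it tries a plain TCP connect from the host to each port. The port list is an assumption based on Hadoop 2.x defaults (NameNode RPC 8020, NameNode web UI 50070, DataNode data transfer 50010, DataNode IPC 50020, DataNode web UI 50075); the HDP sandbox may be configured differently. PortCheck is a hypothetical name.

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper: checks which TCP ports accept connections from this machine.
object PortCheck {
  // True if a TCP connection to host:port succeeds within timeoutMs.
  // Success only means something is listening (or forwarded) there,
  // not that it actually speaks the HDFS protocol.
  def isReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // covers refused connections, timeouts and unresolvable host names
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val host = if (args.nonEmpty) args(0) else "localhost"
    // Assumed Hadoop 2.x defaults; adjust to the VM's actual configuration.
    val ports = Seq(8020, 50070, 50010, 50020, 50075)
    ports.foreach { p =>
      val state = if (isReachable(host, p)) "open" else "unreachable"
      println(s"$host:$p -> $state")
    }
  }
}
```

Running it once against localhost and once against the guest's own address shows which ports the tunnelling actually forwards and which ones are only bound inside the VM.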
>> >
>> > 2015-08-24 11:46 GMT-07:00 Dino Fancellu <dino@felstar.com>:
>> >>
>> >> I have a file in HDFS inside my HortonWorks HDP 2.3_1 VirtualBox VM.
>> >>
>> >> If I go into the guest spark-shell and refer to the file like this, it
>> >> works fine:
>> >>
>> >>   val words=sc.textFile("hdfs:///tmp/people.txt")
>> >>   words.count
>> >>
>> >> However, if I try to access it from a local Spark app on my Windows
>> >> host, it doesn't work:
>> >>
>> >>   import org.apache.spark.{SparkConf, SparkContext}
>> >>
>> >>   val conf = new SparkConf().setMaster("local").setAppName("My App")
>> >>   val sc = new SparkContext(conf)
>> >>
>> >>   val words=sc.textFile("hdfs://localhost:8020/tmp/people.txt")
>> >>   words.count
>> >>
>> >> Emits
>> >>
>> >>
>> >>
>> >> Port 8020 is open, and if I choose the wrong file name, it will tell me
>> >>
>> >>
>> >>
>> >> My pom has:
>> >>
>> >>   <dependency>
>> >>     <groupId>org.apache.spark</groupId>
>> >>     <artifactId>spark-core_2.11</artifactId>
>> >>     <version>1.4.1</version>
>> >>     <scope>provided</scope>
>> >>   </dependency>
>> >>
>> >> Am I doing something wrong?
>> >>
>> >> Thanks.
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://apache-spark-user-list.1001560.n3.nabble.com/Local-Spark-talking-to-remote-HDFS-tp24425.html
>> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> For additional commands, e-mail: user-help@spark.apache.org
>> >>
>> >
>
>
