apache spark - Reading a file in HDFS from PySpark
I'm trying to read a file in HDFS. Here's a listing of my Hadoop file structure:
    hduser@gvm:/usr/local/spark/bin$ hadoop fs -ls -R /
    drwxr-xr-x   - hduser supergroup          0 2016-03-06 17:28 /inputfiles
    drwxr-xr-x   - hduser supergroup          0 2016-03-06 17:31 /inputfiles/countofmontecristo
    -rw-r--r--   1 hduser supergroup    2685300 2016-03-06 17:31 /inputfiles/countofmontecristo/booktext.txt
Here's my PySpark code:
    from pyspark import SparkContext, SparkConf

    conf = SparkConf().setAppName("myfirstapp").setMaster("local")
    sc = SparkContext(conf=conf)
    textfile = sc.textFile("hdfs://inputfiles/countofmontecristo/booktext.txt")
    textfile.first()
The error is:

    Py4JJavaError: An error occurred while calling o64.partitions.
    : java.lang.IllegalArgumentException: java.net.UnknownHostException: inputfiles
Is this because I'm setting up my SparkContext incorrectly? I'm running this in an Ubuntu 14.04 virtual machine through VirtualBox.

I'm not sure what I'm doing wrong here.
If no HDFS configuration is provided, you have to access HDFS files via their full URI, including the NameNode host. (Use localhost as namenodehost if HDFS is running in your local environment.) Your URI, hdfs://inputfiles/..., makes Spark treat inputfiles as the NameNode hostname, hence the UnknownHostException. The full URI looks like this:
    hdfs://namenodehost/inputfiles/countofmontecristo/booktext.txt