apache spark - Reading a file in HDFS from PySpark


I'm trying to read a file in HDFS. Here's a listing of my Hadoop file structure:

hduser@gvm:/usr/local/spark/bin$ hadoop fs -ls -R /
drwxr-xr-x   - hduser supergroup          0 2016-03-06 17:28 /inputfiles
drwxr-xr-x   - hduser supergroup          0 2016-03-06 17:31 /inputfiles/countofmontecristo
-rw-r--r--   1 hduser supergroup    2685300 2016-03-06 17:31 /inputfiles/countofmontecristo/booktext.txt

Here's my PySpark code:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("myFirstApp").setMaster("local")
sc = SparkContext(conf=conf)

textFile = sc.textFile("hdfs://inputfiles/countofmontecristo/booktext.txt")
textFile.first()

The error is:

Py4JJavaError: An error occurred while calling o64.partitions.
: java.lang.IllegalArgumentException: java.net.UnknownHostException: inputfiles

Is this because I'm setting up the SparkContext incorrectly? I'm running Ubuntu 14.04 in a virtual machine through VirtualBox.

I'm not sure what I'm doing wrong here.

You have to access HDFS files via the full path if no configuration is provided (the namenode host is localhost if HDFS is running in your local environment). The UnknownHostException occurs because Spark treats the first component after hdfs:// as the namenode hostname, so in your code it tries to resolve "inputfiles" as a host. The full path looks like:

hdfs://namenodehost/inputfiles/countofmontecristo/booktext.txt 
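
For example, here is a minimal sketch of the corrected read. It assumes the namenode listens on localhost:9000; the actual host and port come from fs.defaultFS in core-site.xml, so substitute your own values:

from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("myFirstApp").setMaster("local")
sc = SparkContext(conf=conf)

# Full URI form: hdfs://<namenode-host>:<port>/<absolute-path>
# "localhost:9000" is an assumption here; check fs.defaultFS in core-site.xml.
textFile = sc.textFile("hdfs://localhost:9000/inputfiles/countofmontecristo/booktext.txt")
print(textFile.first())

Alternatively, if fs.defaultFS is already configured, you can omit the host entirely with a triple slash, e.g. sc.textFile("hdfs:///inputfiles/countofmontecristo/booktext.txt"), and the default namenode from the Hadoop configuration is used.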
