apache spark - Reading a file in HDFS from PySpark
I'm trying to read a file in HDFS. Here's a listing of my Hadoop file structure:
    hduser@gvm:/usr/local/spark/bin$ hadoop fs -ls -R /
    drwxr-xr-x   - hduser supergroup          0 2016-03-06 17:28 /inputfiles
    drwxr-xr-x   - hduser supergroup          0 2016-03-06 17:31 /inputfiles/countofmontecristo
    -rw-r--r--   1 hduser supergroup    2685300 2016-03-06 17:31 /inputfiles/countofmontecristo/booktext.txt
Here's my PySpark code:
    from pyspark import SparkContext, SparkConf

    conf = SparkConf().setAppName("myfirstapp").setMaster("local")
    sc = SparkContext(conf=conf)
    textfile = sc.textFile("hdfs://inputfiles/countofmontecristo/booktext.txt")
    textfile.first()
The error is:

    Py4JJavaError: An error occurred while calling o64.partitions.
    : java.lang.IllegalArgumentException: java.net.UnknownHostException: inputfiles
Is this because I'm setting up my SparkContext incorrectly? I'm running this in an Ubuntu 14.04 virtual machine through VirtualBox.

I'm not sure what I'm doing wrong here.
If no HDFS configuration is provided, you have to access HDFS files via their full URI, including the NameNode host. (Use localhost as namenodehost if HDFS is running in your local environment.) Your URI, hdfs://inputfiles/..., makes Spark treat inputfiles as the NameNode hostname, hence the UnknownHostException. The full URI looks like this:
    hdfs://namenodehost/inputfiles/countofmontecristo/booktext.txt