Environment variables: spark.executorEnv doesn't seem to take any effect
According to the Spark docs, there is a way to pass environment variables to spawned executors:
spark.executorEnv.[EnvironmentVariableName] — Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.
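For example, such a variable can also be set programmatically through SparkConf (a minimal sketch; MY_VAR and its value are placeholders, not from the original post):

    from pyspark import SparkConf, SparkContext

    # Ask Spark to add MY_VAR to the environment of every executor process
    conf = (SparkConf()
            .setAppName("executor-env-example")
            .set("spark.executorEnv.MY_VAR", "some_value"))
    sc = SparkContext(conf=conf)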
I'm trying to direct my PySpark app to use a specific Python executable (an Anaconda environment with numpy etc.), which is usually done by altering the PYSPARK_PYTHON variable in spark-env.sh (see the sketch below). Although that way works, shipping a new config to all cluster nodes every time I want to switch virtualenvs looks like huge overkill.
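For reference, the spark-env.sh approach boils down to putting a line like this on every node (the Anaconda path is the one used later in this post):

    # conf/spark-env.sh, replicated to each cluster node
    export PYSPARK_PYTHON=/usr/share/anaconda/bin/python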
That's why I tried to pass PYSPARK_PYTHON in the following way:
    uu@e1:~$ PYSPARK_DRIVER_PYTHON=ipython pyspark \
        --conf spark.executorEnv.PYSPARK_PYTHON="/usr/share/anaconda/bin/python" \
        --master spark://e1.local:7077

But it doesn't seem to work:
    In [1]: sc._conf.getAll()
    Out[1]:
    [(u'spark.executorEnv.PYSPARK_PYTHON', u'/usr/share/anaconda/bin/python'),
     (u'spark.rdd.compress', u'True'),
     (u'spark.serializer.objectStreamReset', u'100'),
     (u'spark.master', u'spark://e1.local:7077'),
     (u'spark.submit.deployMode', u'client'),
     (u'spark.app.name', u'PySparkShell')]

    In [2]: def dummy(x):
       ...:     import sys
       ...:     return sys.executable
       ...:

    In [3]: sc.parallelize(xrange(100), 50).map(dummy).take(10)
    Out[3]:
    ['/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7',
     '/usr/bin/python2.7']

My spark-env.sh does not have PYSPARK_PYTHON configured, so the default Python gets called. Additional info: it's a Spark 1.6.0 cluster in standalone mode.
Am I missing something important here?
Taking a quick peek at https://github.com/apache/spark/blob/master/bin/pyspark, I think doing an export should do it:
    export PYSPARK_PYTHON="/usr/share/anaconda/bin/python"
To see if it applies to the executors, run:
    PYSPARK_DRIVER_PYTHON=ipython pyspark
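Putting it together, one session (using the path and master URL from the question) might look like:

    # on the machine you launch the shell from
    export PYSPARK_PYTHON="/usr/share/anaconda/bin/python"
    PYSPARK_DRIVER_PYTHON=ipython pyspark --master spark://e1.local:7077

Then, inside the shell, re-running the check from the question should report the Anaconda interpreter, assuming that path exists on every worker node:

    sc.parallelize(xrange(100), 50).map(dummy).take(10)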