Question

[Solved] environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON

I recently installed pyspark and the installation completed correctly. But when I run the following simple program in Python, I get an error.

>>> from pyspark import SparkContext
>>> sc = SparkContext()
>>> data = range(1, 1000)
>>> rdd = sc.parallelize(data)
>>> rdd.collect()

Running the last line produces an error; the key lines seem to be:

[Stage 0:>                                                          (0 + 0) / 4]18/01/15 14:36:32 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

I have the following variables in .bashrc

export SPARK_HOME=/opt/spark
export PYTHONPATH=$SPARK_HOME/python3

I am using Python 3.
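For context, the failing check in PySpark's worker.py compares the worker's major.minor Python version against the one the driver reported. A minimal sketch of that comparison (versions_match is a hypothetical helper, not a PySpark API):

```python
import sys

def versions_match(driver_version, worker_version_info):
    """Mimic the worker.py check: compare major.minor version strings."""
    worker_version = "%d.%d" % worker_version_info[:2]
    return worker_version == driver_version

# A Python 2.7 worker against a 3.5 driver fails the check,
# which is exactly the exception shown above.
print(versions_match("3.5", (2, 7)))  # False
print(versions_match("3.5", (3, 5)))  # True
```

This is why the fix in every answer below is the same: make PYSPARK_PYTHON (the workers) and PYSPARK_DRIVER_PYTHON (the driver) point at the same interpreter.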

Solution #1:

By the way, if you use PyCharm, you can add PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to the environment variables of your run/debug configurations.

Respondent: buxizhizhoum

Solution #2:

You should set the following environment variables in $SPARK_HOME/conf/spark-env.sh:

export PYSPARK_PYTHON=/usr/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/bin/python

If spark-env.sh doesn’t exist, you can rename spark-env.sh.template to spark-env.sh.

Respondent: Alex

Solution #3:

I got the same issue, and I set both variables in .bash_profile:

export PYSPARK_PYTHON=/usr/local/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3

But my problem was still there.

Then I found out that the problem was that my default Python version was 2.7, by typing python --version.

So I solved the problem by following this page:
How to set Python’s default version to 3.x on OS X?
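To confirm which version the interpreter itself reports (rather than relying on python --version on the command line), a quick sketch:

```python
import sys

# major.minor of the interpreter running this script; this is the
# "driver" version that PySpark compares against the workers'.
version = "%d.%d" % sys.version_info[:2]
print(version)
```

If this prints 2.7 while your workers run Python 3, you will see exactly the exception from the question.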

Respondent: Ruxi Zhang

Solution #4:

Just run the code below at the very beginning of your code. I am using Python 3.7. You might need to run locate python3.7 to get your Python path.

import os
os.environ["PYSPARK_PYTHON"] = "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7"
Respondent: James Chang

Solution #5:

This may also happen if you’re working inside a virtual environment. In that case it may be harder to retrieve the correct path to the Python executable (and anyway, I think it’s not a good idea to hardcode the path if you want to share the code with others).

If you run the following lines at the beginning of your script/notebook (at least before you create the SparkSession/SparkContext) the problem is solved:

import os
import sys

os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

The os.environ mapping lets you set environment variables for the current process; sys.executable is a string containing the absolute path of the binary of the Python interpreter that is running.
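As a quick sanity check (a generic Python sketch, not PySpark-specific), you can confirm that sys.executable really is an absolute path to an existing file before handing it to Spark:

```python
import os
import sys

# sys.executable is the absolute path of the interpreter running this code.
print(os.path.isabs(sys.executable))   # True
print(os.path.isfile(sys.executable))  # True
```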

Respondent: Davide Frison

Solution #6:

I’m using Jupyter Notebook to study PySpark, and that’s what worked for me.
Find where python3 is installed doing in a terminal:

which python3

Here it points to /usr/bin/python3.
Now, at the beginning of the notebook (or .py script), do:

import os

# Set spark environments
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3'

Restart your notebook session and it should work!

Respondent: igorkf

Solution #7:

Apache-Spark 2.3.4 on Archlinux

I’ve just installed apache-spark-2.3.4 from the Apache Spark website. I’m using the Archlinux distribution, which is simple and lightweight. I installed it and put the apache-spark directory at /opt/apache-spark/; now it’s time to export our environment variables. Remember, I’m using Archlinux, so keep in mind to use your own paths, your $JAVA_HOME for example.

Importing environment variables

echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk/jre' >> /home/user/.bashrc
echo 'export SPARK_HOME=/opt/apache-spark'  >> /home/user/.bashrc
echo 'export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH'  >> /home/user/.bashrc
echo 'export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH'  >> /home/user/.bashrc
source ~/.bashrc

Testing

emanuel@host ~ $ echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk/jre' >> /home/emanuel/.bashrc
emanuel@host ~ $ echo 'export SPARK_HOME=/opt/apache-spark'  >> /home/emanuel/.bashrc
emanuel@host ~ $ echo 'export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH'  >> /home/emanuel/.bashrc
emanuel@host ~ $ echo 'export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH'  >> /home/emanuel/.bashrc
emanuel@host ~ $ source .bashrc 
emanuel@host ~ $ python
Python 3.7.3 (default, Jun 24 2019, 04:54:02) 
[GCC 9.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> 

Everything works fine once you have correctly exported the environment variables for SparkContext.
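To verify from Python that the Spark directories actually made it onto PYTHONPATH after sourcing .bashrc, a small sketch (spark_on_pythonpath is a hypothetical helper; the default spark_home matches the install location above):

```python
def spark_on_pythonpath(pythonpath, spark_home="/opt/apache-spark"):
    """Check whether any PYTHONPATH entry lives under SPARK_HOME."""
    return any(p.startswith(spark_home) for p in pythonpath.split(":") if p)

# The two PYTHONPATH exports above produce entries like these:
example = ("/opt/apache-spark/python/:"
           "/opt/apache-spark/python/lib/py4j-0.10.7-src.zip")
print(spark_on_pythonpath(example))  # True

# In practice you would pass os.environ.get("PYTHONPATH", "").
```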

Using Apache-Spark on Archlinux via DockerImage

For my use purposes I’ve created a Docker image with python, jupyter-notebook and apache-spark-2.3.4

running the image

docker run -ti -p 8888:8888 emanuelfontelles/spark-jupyter

just go to your browser and type

http://localhost:8888/tree

You will be prompted with an authentication page. Go back to the terminal, copy the token, and voilà, you will have an Archlinux container running an Apache Spark distribution.

Respondent: Emanuel Fontelles

Solution #8:

If you are using PyCharm, go to Run -> Edit Configurations and click on Environment variables to add them (basically, PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON should point to the same version of Python). This solution worked for me; thanks to the above posts.

Respondent: RaHuL VeNuGoPaL

Solution #9:

To make it easier for others: instead of having to set a specific path like /usr/bin/python3, you can do this:

I put these lines in my ~/.zshrc:

export PYSPARK_PYTHON=python3.8
export PYSPARK_DRIVER_PYTHON=python3.8

When I type python3.8 in my terminal, Python 3.8 starts. I think that is because I installed pipenv.
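Setting PYSPARK_PYTHON to a bare command name like python3.8 works because Spark resolves it through PATH, the same lookup the shell performs. You can see what a name resolves to with shutil.which (a generic sketch; python3 is assumed to be on PATH here):

```python
import shutil

# shutil.which performs the same PATH lookup the shell uses for a command.
resolved = shutil.which("python3")
print(resolved)  # e.g. /usr/bin/python3, or None if python3 isn't on PATH
```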

Another good website to reference for getting your SPARK_HOME is https://towardsdatascience.com/how-to-use-pyspark-on-your-computer-9c7180075617
(for permission-denied issues, use sudo mv)

Respondent: S.Doe_Dude

Solution #10:

I tried two methods for this question; adding the environment variables as below is the one that worked.

add environment variables

PYSPARK_PYTHON=/usr/local/bin/python3.7;PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.7;PYTHONUNBUFFERED=1

Respondent: Eric Cheng

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
