Setting up a Virtual Environment for PySpark (or Any Other Clustered Environment)

In a clustered environment, we run into a lot of issues with the Python version available on the nodes. When shipping our product, we had to run extensive sanity tests before deployment to make sure the application would behave as expected. But we can't cover every scenario, so there is a high chance of hitting an issue.

So we came up with a better idea: ship our own Python, with everything preinstalled, in a single package. You may already be familiar with virtual environments or Anaconda, but believe me, after reading this you will learn something new.

Before we proceed, it's necessary to understand the basic structure of a Python virtual environment:

├── bin
│ ├── activate
│ ├── activate.csh
│ ├── activate.fish
│ ├── activate_this.py
│ ├── easy_install
│ ├── easy_install-3.6
│ ├── pip
│ ├── pip3
│ ├── pip3.6
│ ├── python
│ ├── python-config
│ ├── python3 -> python
│ ├── python3.6 -> python
│ └── wheel
├── include
│ └── python3.6m -> /usr/include/python3.6m
└── lib
  └── python3.6
    ├── site-packages
    └── lib-dynload -> /usr/lib/python3.6/lib-dynload [Dynamic Library]
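The entries marked `->` in the tree are symbolic links, and some of them point outside the environment. The sketch below is one way to list those external links before packaging. It assumes a Linux build machine, and `find_external_symlinks` is a hypothetical helper name rather than part of virtualenv:

```python
import os
from typing import List, Tuple


def find_external_symlinks(venv_root: str) -> List[Tuple[str, str]]:
    """Return (link, resolved_target) pairs for every symlink under venv_root
    whose target resolves to a path outside venv_root."""
    root = os.path.realpath(venv_root)
    external = []
    # followlinks=False: symlinked directories are reported, not descended into
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # Internal links such as bin/python3 -> python share the root prefix
            if os.path.commonpath([root, target]) != root:
                external.append((os.path.relpath(path, root), target))
    return sorted(external)
```

Calling `find_external_symlinks("env")` on a freshly created environment should report entries such as `lib/python3.6/lib-dynload` and `include/python3.6m`. Internal links like `bin/python3 -> python` are not reported.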

Environment Variables:

PYSPARK_PYTHON: Points to the Python executable: bin/python

LD_LIBRARY_PATH: Points to the dynamic library path: lib/python3.6/lib-dynload [all .so* files]

PYTHONPATH: Points to the packages installed in the virtual environment as well as the dynamic library path: lib/python3.6/site-packages<CPS>lib/python3.6/lib-dynload [all .py files]

PYTHONHOME: Points to the Python library path (in this setup, lib/python3.6/site-packages)
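The same mapping can be written as a small Python function. This is a minimal sketch that assumes the lib/python3.6 layout from the tree; `build_env` and its `python_dir` parameter are hypothetical names. On a Linux node the `<CPS>` separator becomes `:`, which is what `os.pathsep` gives here. It follows this article in pointing PYTHONHOME at site-packages. Python conventionally expects PYTHONHOME to be the installation prefix, the directory that contains lib/python3.6, so adjust that entry if your layout differs.

```python
import os
from typing import Dict


def build_env(venv_root: str, python_dir: str = "lib/python3.6") -> Dict[str, str]:
    """Map the four variables onto paths inside venv_root.

    python_dir is an assumption; the Spark configs in this article use
    lib64/python3.6 instead.
    """
    lib = os.path.join(venv_root, python_dir)
    site_packages = os.path.join(lib, "site-packages")
    lib_dynload = os.path.join(lib, "lib-dynload")
    return {
        # Interpreter that PySpark launches for its Python workers
        "PYSPARK_PYTHON": os.path.join(venv_root, "bin", "python"),
        # Where the dynamic loader looks for the copied .so files
        "LD_LIBRARY_PATH": lib_dynload,
        # Module search path: installed packages plus lib-dynload
        "PYTHONPATH": os.pathsep.join([site_packages, lib_dynload]),
        # Following the article: site-packages rather than the usual prefix
        "PYTHONHOME": site_packages,
    }
```

To try it locally, you could pass the result to a child process, for example `subprocess.run([env["PYSPARK_PYTHON"], "-c", "import numpy"], env={**os.environ, **env})`.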

Steps to build Virtual environment:

  1. Install the desired version of Python on the build machine.
  2. Create the virtual environment:
    virtualenv -p /usr/local/bin/python3 env
  3. Activate the virtual environment:
    source env/bin/activate
  4. Install your requirements, for example:
    pip install numpy
  5. Now here is the trick: lib-dynload -> /usr/lib/python3.6/lib-dynload in the tree above is a symbolic link pointing to a path on the local machine. If you just zip the virtual environment folder, these dependencies will be missing on the cluster.
  6. So you need to replace that symlink with a real directory and copy all the .so* files from /usr/lib/python3.6/lib-dynload, /usr/lib64/*.so.*, etc… into lib/python3.6/lib-dynload. [Be careful with /usr/lib64/*.so.*: it contains OS-specific libraries that may fail on a different OS version, so try to avoid .so files from this folder.]
  7. Copy all the standard-library .py files (e.g. from /usr/lib/python3.6, etc…) to lib/python3.6/site-packages.
  8. Prepare the zip. Run this from the root directory of the virtual environment (in our case env/):
    zip -rq ../venv.zip *
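Steps 5, 6 and 8 can be scripted. The sketch below covers only the lib-dynload symlink from step 5. It does not handle the extra /usr/lib64 libraries or the .py copying in step 7. `materialize_symlinked_dir` and `zip_venv` are hypothetical helper names. `zip_venv` is only a rough stand-in for `zip -rq ../venv.zip *`: unlike the shell glob, it also picks up dotfiles.

```python
import os
import shutil
import zipfile


def materialize_symlinked_dir(link_path: str) -> None:
    """Replace a directory symlink (e.g. lib/python3.6/lib-dynload) with a real
    copy of the directory it points to, so the files travel with the venv."""
    if not os.path.islink(link_path):
        return  # already a real directory, nothing to do
    target = os.path.realpath(link_path)
    os.unlink(link_path)  # removes only the link; the system directory is untouched
    shutil.copytree(target, link_path, symlinks=False)  # copy the actual file contents


def zip_venv(venv_root: str, zip_path: str) -> None:
    """Archive venv_root with paths relative to its root.

    Symlinked files are stored as regular files. Symlinked directories are not
    descended into, so call materialize_symlinked_dir first. zip_path must be
    outside venv_root.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(venv_root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, arcname=os.path.relpath(full, venv_root))
```

A typical call sequence would be `materialize_symlinked_dir("env/lib/python3.6/lib-dynload")` and then `zip_venv("env", "venv.zip")`.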

Environment variable setup

For the driver (in YARN cluster mode): spark.yarn.appMasterEnv.[Environment variable]

For the executors: spark.executorEnv.[Environment variable]

PYSPARK_PYTHON

  1. spark.yarn.appMasterEnv.PYSPARK_PYTHON = venv/bin/python
  2. spark.executorEnv.PYSPARK_PYTHON = venv/bin/python

PYTHONHOME

  1. spark.yarn.appMasterEnv.PYTHONHOME = venv/lib64/python3.6/site-packages
  2. spark.executorEnv.PYTHONHOME = venv/lib64/python3.6/site-packages

LD_LIBRARY_PATH

  1. spark.yarn.appMasterEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
  2. spark.executorEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
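Because every variable has to be set twice, once for the application master and once for the executors, it can help to generate the settings. This is a minimal sketch using the spark.yarn.appMasterEnv / spark.executorEnv convention described above. The article does not say how the zip reaches the nodes. The `--archives venv.zip#venv` argument is therefore an assumption: on YARN, it unpacks the archive into each container's working directory under the alias `venv`, which matches the relative `venv/...` paths. `spark_env_conf` and `spark_submit_args` are hypothetical helper names.

```python
from typing import Dict, List

# Values from the lists above, relative to the container's working directory
VENV_ENV = {
    "PYSPARK_PYTHON": "venv/bin/python",
    "PYTHONHOME": "venv/lib64/python3.6/site-packages",
    "LD_LIBRARY_PATH": "venv/lib64/python3.6/lib-dynload",
}


def spark_env_conf(env: Dict[str, str]) -> Dict[str, str]:
    """Set each variable for both the YARN application master and the executors."""
    conf = {}
    for name, value in env.items():
        conf[f"spark.yarn.appMasterEnv.{name}"] = value
        conf[f"spark.executorEnv.{name}"] = value
    return conf


def spark_submit_args(env: Dict[str, str], archive: str = "venv.zip#venv") -> List[str]:
    """Build spark-submit arguments (assumes the venv is shipped via --archives)."""
    args = ["--master", "yarn", "--archives", archive]
    for key, value in sorted(spark_env_conf(env).items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, `" ".join(["spark-submit"] + spark_submit_args(VENV_ENV) + ["app.py"])` produces a command line, where `app.py` stands in for your own job.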

PYTHONPATH

This needs to be included in YARN-ENV-ENTRIES; it is not picked up from the Spark configs.

PYTHONPATH = {{PWD}}/__venv__.zip<CPS>{{PWD}}/__py4j-0.10.7-src__.zip<CPS>venv/lib64/python3.6/site-packages<CPS>venv/lib64/python3.6/lib-dynload<CPS>
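`{{PWD}}` and `<CPS>` are YARN's cross-platform placeholders for the container's working directory and the classpath separator. The NodeManager substitutes them when it launches the container. The toy functions below only illustrate the result on a Linux node, where the separator is `:`. They are hypothetical helpers, not a YARN API, and the container path in the usage note is made up.

```python
from typing import List


def expand_yarn_placeholders(value: str, pwd: str, separator: str = ":") -> str:
    """Mimic the substitution: {{PWD}} -> working dir, <CPS> -> path separator."""
    return value.replace("{{PWD}}", pwd).replace("<CPS>", separator)


def pythonpath_entries(value: str, pwd: str) -> List[str]:
    """Split the expanded value, dropping the empty entry a trailing <CPS> leaves."""
    return [p for p in expand_yarn_placeholders(value, pwd).split(":") if p]
```

With a working directory of `/data/container_01`, the value `{{PWD}}/__venv__.zip<CPS>venv/lib64/python3.6/site-packages<CPS>` expands to `/data/container_01/__venv__.zip:venv/lib64/python3.6/site-packages:`.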

To run Python from the environment:
cd venv
export PYTHONPATH=lib64/python3.6/site-packages:lib64/python3.6/lib-dynload/
export LD_LIBRARY_PATH=lib64/python3.6/lib-dynload
source bin/activate
python
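Since the goal is to avoid surprises on the nodes, a quick check inside the shipped interpreter can confirm which Python is running and whether key modules load. This is a minimal sketch; `sanity_check` is a hypothetical helper. An extension module whose .so dependencies the loader cannot find raises ImportError, so the same check covers missing lib-dynload files.

```python
import importlib
import sys
from typing import Dict, Iterable


def sanity_check(modules: Iterable[str]) -> Dict[str, object]:
    """Report the running interpreter and whether each module imports."""
    imports = {}
    for name in modules:
        try:
            importlib.import_module(name)
            imports[name] = "ok"
        except ImportError as exc:  # also raised when an extension's .so fails to load
            imports[name] = f"failed: {exc}"
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "imports": imports,
    }
```

Locally, run it after `source bin/activate`. On the cluster, one option (untested here) is to run it on the executors with something like `spark.sparkContext.parallelize(range(2), 2).map(lambda _: sanity_check(["numpy"])).collect()`.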
