Install
Woodwork is available for Python 3.8 - 3.11. It can be installed from PyPI, conda-forge, or from source.
To install Woodwork, run one of the following commands.
With pip (PyPI):
$ python -m pip install woodwork
With conda (conda-forge):
$ conda install -c conda-forge woodwork
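After installing, a quick sanity check can confirm that the interpreter falls in the supported range and that the package is visible to it. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and not part of Woodwork.

```python
import sys
from importlib import metadata

# Supported range taken from the text above: Python 3.8 through 3.11.
MIN_PYTHON = (3, 8)
MAX_PYTHON = (3, 11)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is within the documented range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("woodwork version:", installed_version("woodwork"))
```

If the second line prints `None`, the package was probably installed into a different environment than the one running the script.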
Add-ons
Woodwork allows users to install add-ons individually or all at once:
Hint
Be sure to install Scala and Spark
With pip (PyPI):
All add-ons:
$ python -m pip install "woodwork[complete]"
Dask:
$ python -m pip install "woodwork[dask]"
Spark:
$ python -m pip install "woodwork[spark]"
Update Checker:
$ python -m pip install "woodwork[updater]"
With conda (conda-forge):
All add-ons:
$ conda install -c conda-forge dask pyspark alteryx-open-src-update-checker
Dask:
$ conda install -c conda-forge dask
Spark:
$ conda install -c conda-forge pyspark
Update Checker:
$ conda install -c conda-forge alteryx-open-src-update-checker
The add-ons provide the following:
Dask: Use Woodwork with Dask DataFrames
Spark: Use Woodwork with Spark DataFrames
Update Checker: Receive automatic notifications of new Woodwork releases
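To see which add-ons are usable in the current environment, one option is to ask Python whether the corresponding modules can be located. This is a sketch only: the helper `available_add_ons` is hypothetical, and the import names in `ADD_ON_MODULES` are assumptions derived from the package names above (in particular, the update checker is assumed to import as `alteryx_open_src_update_checker`).

```python
from importlib import util

# Extra name -> module name expected to be importable once the add-on is installed.
# The module names are assumptions; distribution and import names can differ.
ADD_ON_MODULES = {
    "dask": "dask",
    "spark": "pyspark",
    "updater": "alteryx_open_src_update_checker",
}


def available_add_ons(modules=ADD_ON_MODULES):
    """Map each add-on to True if its module can be found, without importing it."""
    return {
        extra: util.find_spec(module_name) is not None
        for extra, module_name in modules.items()
    }


if __name__ == "__main__":
    for extra, present in available_add_ons().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```

Using `find_spec` rather than a real import keeps the check fast and avoids starting heavy libraries such as Spark just to test for their presence.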
Source
To install Woodwork from source, clone the repository from GitHub and install the dependencies.
Hint
Be sure to install Scala and Spark if you want to run all unit tests
git clone https://github.com/alteryx/woodwork.git
cd woodwork
python -m pip install .
Scala and Spark
macOS (Intel):
$ brew tap AdoptOpenJDK/openjdk
$ brew install --cask adoptopenjdk11
$ brew install scala apache-spark
$ echo 'export JAVA_HOME=$(/usr/libexec/java_home)' >> ~/.zshrc
$ echo 'export PATH="/usr/local/opt/openjdk@11/bin:$PATH"' >> ~/.zshrc
macOS (Apple silicon):
$ brew install openjdk@11 scala apache-spark pandoc
$ echo 'export PATH="/opt/homebrew/opt/openjdk@11/bin:$PATH"' >> ~/.zshrc
$ echo 'export CPPFLAGS="-I/opt/homebrew/opt/openjdk@11/include:$CPPFLAGS"' >> ~/.zprofile
$ sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk
Ubuntu:
$ sudo apt install openjdk-11-jre openjdk-11-jdk scala pandoc -y
$ echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
$ echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.profile
$ echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.profile
Amazon Linux:
$ sudo amazon-linux-extras install java-openjdk11 scala -y
$ amazon-linux-extras enable java-openjdk11
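Once the shell profile has been reloaded, it can help to confirm that Java is on the `PATH` and that the environment variables exported above are visible to Python. The sketch below uses only the standard library; the function name `spark_environment` is hypothetical, and which variables matter depends on the platform steps that were followed.

```python
import os
import shutil


def spark_environment(env=None):
    """Report whether java is discoverable and which Spark-related variables are set."""
    env = os.environ if env is None else env
    return {
        # shutil.which searches the PATH of the running process for an executable.
        "java_on_path": shutil.which("java") is not None,
        "JAVA_HOME": env.get("JAVA_HOME"),
        "SPARK_HOME": env.get("SPARK_HOME"),
        "PYSPARK_PYTHON": env.get("PYSPARK_PYTHON"),
    }


if __name__ == "__main__":
    for key, value in spark_environment().items():
        print(f"{key}: {value}")
```

A value of `None` for a variable means it is not set in the current process, which usually indicates that the profile file has not been sourced in this shell yet.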
Docker
It is also possible to run Woodwork inside a Docker container.
You can do so by installing it as a package inside a container (following the normal install guide) or by creating a new image with Woodwork pre-installed, using the following commands in your Dockerfile:
FROM --platform=linux/x86_64 python:3.8-slim-buster
RUN apt update && apt -y update
RUN apt install -y build-essential
RUN pip3 install --upgrade --quiet pip
RUN pip3 install woodwork
Optional Python Dependencies
Woodwork has several other Python dependencies that are used only for specific methods. Attempting to use one of these methods without the necessary library installed results in an ImportError with instructions on how to install the missing dependency.
Dependency | Min Version | Notes |
---|---|---|
boto3 | 1.10.45 | Required to read/write to URLs and S3 |
smart_open | 5.0.0 | Required to read/write to URLs and S3 |
pyarrow | 4.0.1 | Required to serialize to parquet |
dask[distributed] | 2021.10.0 | Required to use with Dask DataFrames |
pyspark | 3.2.0 | Required to use with Spark DataFrames |
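The minimum versions in the table can be checked against the current environment before calling a method that needs them. The following is a rough sketch, not Woodwork's own mechanism: the helpers `release_tuple`, `meets_minimum`, and `check_optional_dependencies` are hypothetical, the version comparison only looks at the leading numeric components, and the distribution names are assumed to match the names in the table.

```python
import re
from importlib import metadata

# Minimum versions copied from the table above.
MIN_VERSIONS = {
    "boto3": "1.10.45",
    "smart_open": "5.0.0",
    "pyarrow": "4.0.1",
    "dask": "2021.10.0",
    "pyspark": "3.2.0",
}


def release_tuple(version):
    """Parse the leading numeric part of a version, ignoring suffixes such as rc1."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare two versions after padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_optional_dependencies(minimums=MIN_VERSIONS):
    """Return a status string per dependency: ok, too old, or not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    for name, status in check_optional_dependencies().items():
        print(f"{name}: {status}")
```

For strict comparisons that account for pre-releases and other version suffixes, a dedicated version-parsing library would be a safer choice than this simplified tuple comparison.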
Development
To make contributions to the codebase, please follow the guidelines here.