Install¶
Woodwork is available for Python 3.7, 3.8, and 3.9. It can be installed from PyPI, conda-forge, or from source.
To install Woodwork from PyPI, run:
$ python -m pip install woodwork
To install from conda-forge, run:
$ conda install -c conda-forge woodwork
Add-ons¶
Woodwork add-ons can be installed individually or all at once:
Hint
Be sure to install Scala and Spark if you want to use Koalas
PyPI:
$ python -m pip install "woodwork[complete]"
$ python -m pip install "woodwork[dask]"
$ python -m pip install "woodwork[koalas]"
$ python -m pip install "woodwork[update_checker]"
conda-forge:
$ conda install -c conda-forge dask koalas pyspark alteryx-open-src-update-checker
$ conda install -c conda-forge dask
$ conda install -c conda-forge koalas pyspark
$ conda install -c conda-forge alteryx-open-src-update-checker
Dask: Use Woodwork with Dask DataFrames
Koalas: Use Woodwork with Koalas DataFrames
Update Checker: Receive automatic notifications of new Woodwork releases
Source¶
To install Woodwork from source, clone the repository from GitHub and install the dependencies:
Hint
Be sure to install Scala and Spark if you want to run all unit tests
$ git clone https://github.com/alteryx/woodwork.git
$ cd woodwork
$ python -m pip install .
Scala and Spark¶
macOS (Homebrew):
$ brew tap AdoptOpenJDK/openjdk
$ brew install --cask adoptopenjdk11
$ brew install scala apache-spark
$ echo 'export JAVA_HOME=$(/usr/libexec/java_home)' >> ~/.zshrc
$ echo 'export PATH="/usr/local/opt/openjdk@11/bin:$PATH"' >> ~/.zshrc
Ubuntu:
$ sudo apt install openjdk-11-jre openjdk-11-jdk scala -y
$ echo "export SPARK_HOME=/opt/spark" >> ~/.profile
$ echo "export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin" >> ~/.profile
$ echo "export PYSPARK_PYTHON=/usr/bin/python3" >> ~/.profile
Amazon Linux:
$ sudo amazon-linux-extras install java-openjdk11 scala -y
$ amazon-linux-extras enable java-openjdk11
Optional Python Dependencies¶
Woodwork has several other Python dependencies that are used only for specific methods. Attempting to use one of these methods without the necessary library installed raises an ImportError with instructions on how to install the missing dependency.
| Dependency | Min Version | Notes |
|---|---|---|
| boto3 | 1.10.45 | Required to read/write to URLs and S3 |
| smart_open | 5.0.0 | Required to read/write to URLs and S3 |
| pyarrow | 4.0.1 | Required to serialize to parquet |
| dask[distributed] | 2021.10.0 | Required to use with Dask DataFrames |
| koalas | 1.8.0 | Required to use with Koalas DataFrames |
| pyspark | 3.0.0 | Required to use with Koalas DataFrames |
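If you want to check ahead of time whether an optional dependency is available, rather than waiting for the ImportError, you can probe for it with the standard library. This is a minimal sketch using `importlib.util.find_spec`; the `has_optional_dependency` helper is our own illustration, not part of Woodwork's API.

```python
import importlib.util

def has_optional_dependency(name):
    """Return True if the named library can be imported in this environment."""
    return importlib.util.find_spec(name) is not None

# Hypothetical guard before a parquet write: Woodwork itself raises an
# ImportError with install instructions if pyarrow is missing.
if has_optional_dependency("pyarrow"):
    print("pyarrow available: parquet serialization is supported")
else:
    print('pyarrow missing: run `python -m pip install pyarrow`')
```

The same check works for any of the dependencies in the table above (`boto3`, `smart_open`, `dask`, `koalas`, `pyspark`).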
Development¶
To make contributions to the codebase, please follow the guidelines here.