Setting up Anovos locally
💿 Software Prerequisites
The current Beta release of Anovos requires Spark, Python, and Java to be set up. We test for and officially support the following combinations:
- Spark 2.4.8, Python 3.7, and Java 8
- Spark 3.1.3, Python 3.9, and Java 11
- Spark 3.2.1, Python 3.9, and Java 11
There are many online tutorials that can be helpful in setting up Apache Spark.
💡 For the foreseeable future, _Anovos will support Spark 2.4.x, 3.1.x, and 3.2.x._ To see which precise combinations we're currently testing, see this workflow configuration.
Installing Anovos
Anovos can be installed and used in one of two ways:
- Cloning the GitHub repository and running via `spark-submit`.
- Installing through `pip` and importing it into your own Python scripts.
Clone the GitHub repository to use Anovos with `spark-submit`
Clone the Anovos repository to your local environment using the command:
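A minimal sketch, assuming the repository is hosted at `github.com/anovos/anovos`:

```bash
git clone https://github.com/anovos/anovos.git
```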
For production use, you'll always want to clone a specific version, e.g., to get just the code for version 0.1.0.
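One way to do this, assuming releases are tagged in the form `v0.1.0` (check the repository's tags for the exact naming):

```bash
# Fetch only the tagged release, without the full history
git clone --depth 1 --branch v0.1.0 https://github.com/anovos/anovos.git
```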
Afterwards, go to the newly created `anovos` directory and execute the following command to clean and build the latest modules:
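A sketch, assuming the repository's Makefile provides `clean` and `build` targets:

```bash
cd anovos
make clean build
```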
Next, install Anovos' dependencies.
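One way, assuming the dependencies are pinned in a top-level `requirements.txt`:

```bash
pip install -r requirements.txt
```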
Then, go to the `dist/` folder. There, you should:
- Update the input and output paths in `configs.yaml` and configure the data set. You might also want to adapt the threshold settings to your needs.
- Adapt the `main.py` sample script. It demonstrates how different functions from Anovos can be stitched together to create a workflow.
- If necessary, update `spark-submit.sh`. This is the shell script used to run the Spark application via `spark-submit`.
Once everything is configured, you can start your workflow run using the aforementioned script:
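A sketch of launching the run in the background while capturing its output (assuming `spark-submit.sh` is executable; the log file name is an arbitrary choice):

```bash
nohup ./spark-submit.sh > stdout.log 2>&1 &
```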
While the job is running, you can check the logs written to stdout using:
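For example, if you redirected the output to `stdout.log` as sketched above:

```bash
tail -f stdout.log
```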
Once the run completes, the script will attempt to automatically open the final report (`report_stats/ml_anovos_report.html`) in your web browser.
🐍 Install through pip to use Anovos within your Python applications
To install Anovos, simply run:
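Assuming the package is published on PyPI under the name `anovos`:

```bash
pip install anovos
```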
Then, you can import Anovos as a module into your Python applications using:
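Assuming the module name matches the PyPI package:

```python
import anovos
```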
To trigger Spark workloads from Python, you have to ensure that the necessary external packages are included in the `SparkSession`.
For this, you can either use the pre-configured `SparkSession` provided by Anovos:
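One likely form, assuming the pre-configured session is exposed as `spark` in the `anovos.shared.spark` module (verify the exact import path against the version you installed):

```python
# Pre-configured SparkSession shipped with Anovos
# (the module path below is an assumption, not confirmed by this document)
from anovos.shared.spark import spark

spark.sql("SELECT 1").show()  # sanity check that the session works
```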
If you need to use your own custom `SparkSession`, make sure to include the external packages that Anovos depends on.
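A sketch using Spark's standard `spark.jars.packages` mechanism; the Maven coordinates below are placeholders, not the actual Anovos dependency list, which this document does not spell out (take the real coordinates from the Anovos documentation for your version):

```python
from pyspark.sql import SparkSession

# Placeholder coordinates — substitute the packages Anovos actually requires
# for your Spark/Anovos version combination.
ANOVOS_PACKAGES = ",".join([
    "org.example:some-dependency_2.12:1.0.0",  # hypothetical
])

spark = (
    SparkSession.builder
    .appName("anovos-workflow")
    .config("spark.jars.packages", ANOVOS_PACKAGES)
    .getOrCreate()
)
```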