Setting up Anovos locally

💿 Software Prerequisites

The current Beta release of Anovos requires Spark, Python, and Java to be set up. We test for and officially support the following combinations:

The following tutorials can be helpful in setting up Apache Spark:

💡 For the foreseeable future, Anovos will support Spark 2.4.x, 3.1.x, and 3.2.x. To see which precise combinations we're currently testing, see this workflow configuration.

Installing Anovos

Anovos can be installed and used in one of two ways:

  • Cloning the GitHub repository and running via spark-submit.
  • Installing through pip and importing it into your own Python scripts.

Clone the GitHub repository to use Anovos with spark-submit

Clone the Anovos repository to your local environment using the command:

git clone https://github.com/anovos/anovos.git

For production use, you'll always want to clone a specific version, e.g.,

git clone -b v0.1.0 --depth 1 https://github.com/anovos/anovos.git
to get just the code for version 0.1.0.

Afterwards, go to the newly created anovos directory and execute the following command to clean and build the latest modules:

make clean build

Next, install Anovos' dependencies by running

pip install -r requirements.txt

and go to the dist/ folder. There, you should:

  • Update the input and output paths in configs.yaml and configure the data set. You might also want to adapt the threshold settings to your needs.

  • Adapt the sample script. It demonstrates how different functions from Anovos can be stitched together to create a workflow.

  • If necessary, update the shell script used to run the Spark application via spark-submit.

Once everything is configured, you can start your workflow run using the aforementioned script:

nohup ./ > run.txt &

While the job is running, you can check the logs written to stdout using

tail -f run.txt

Once the run completes, the script will attempt to automatically open the final report (report_stats/ml_anovos_report.html) in your web browser.
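If the report does not open automatically (for example, on a headless machine the browser step can fail), you can open it yourself. A minimal stdlib-only sketch, assuming you are still in the dist/ folder where the run was started:

```python
import webbrowser
from pathlib import Path

# Report path from the docs, relative to the dist/ folder.
report = Path("report_stats/ml_anovos_report.html")

if report.exists():
    # webbrowser needs a URI, and as_uri() requires an absolute path,
    # so resolve the relative path first.
    webbrowser.open(report.resolve().as_uri())
else:
    print(f"Report not found at {report} -- has the run finished?")
```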

🐍 Install through pip to use Anovos within your Python applications

To install Anovos, simply run

pip install anovos

Then, you can import Anovos as a module into your Python applications using

import anovos
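When juggling multiple virtual environments, it can help to confirm that the package is actually visible to the interpreter you plan to run. A stdlib-only sketch:

```python
import importlib.util

# Check whether `anovos` is importable from this interpreter.
# find_spec only locates the package; it does not execute it.
if importlib.util.find_spec("anovos") is None:
    print("anovos is not installed here -- run `pip install anovos` first")
else:
    print("anovos is available")
```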

To trigger Spark workloads from Python, you have to ensure that the necessary external packages are included in the SparkSession.
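With a hand-rolled session, extra packages are usually supplied as a comma-separated list of Maven coordinates via Spark's `spark.jars.packages` option. A sketch of the general pattern; the coordinate below is a placeholder for illustration, not one of Anovos' actual dependencies:

```python
def packages_conf(coordinates):
    """Join Maven coordinates into the comma-separated string
    Spark expects for spark.jars.packages."""
    return ",".join(coordinates)


def build_session(app_name, coordinates):
    """Create a SparkSession with extra Maven packages included.
    Requires pyspark; shown only to illustrate the wiring."""
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars.packages", packages_conf(coordinates))
        .getOrCreate()
    )


# Placeholder coordinate; replace with the dependencies Anovos needs.
extra = ["org.example:example-dependency_2.12:1.0.0"]
print(packages_conf(extra))
```

Note that `spark.jars.packages` only takes effect if it is set before the session is created.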

For this, you can either use the pre-configured SparkSession provided by Anovos:

from anovos.shared.spark import spark

If you need to use your own custom SparkSession, make sure to include the following dependencies: