Setting up Anovos locally
💿 Software Prerequisites
The current Beta release of Anovos requires Spark, Python, and Java to be set up. We test for and officially support the following combinations:
The following tutorials can be helpful in setting up Apache Spark:
💡 For the foreseeable future, _Anovos will support Spark 2.4.x, 3.1.x, and 3.2.x._ To see which precise combinations we're currently testing, see this workflow configuration.
Anovos can be installed and used in one of two ways:
- Cloning the GitHub repository and running Anovos via `spark-submit`.
- Installing Anovos through `pip` and importing it into your own Python scripts.
Clone the GitHub repository to use Anovos with `spark-submit`
Clone the Anovos repository to your local environment using the command:
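```bash
# clone the main Anovos repository from GitHub
git clone https://github.com/anovos/anovos.git
```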
For production use, you'll always want to clone a specific version, e.g.,
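```bash
# check out a specific release tag; the tag name below is only an example,
# so pick an actual release from the repository's tags
git clone --depth 1 --branch v1.0.0 https://github.com/anovos/anovos.git
```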
Afterwards, go to the newly created `anovos` directory and execute the following command to clean and build the latest modules:
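```bash
# the repository ships a Makefile; "clean build" is assumed here to be the
# target combination for a fresh build
make clean build
```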
Next, install Anovos' dependencies by running
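```bash
# install the Python dependencies listed in the repository
pip install -r requirements.txt
```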
and go to the `dist/` folder. There, you should:

- Update the input and output paths in `configs.yaml` and configure the data set. You might also want to adapt the threshold settings to your needs.
- Adapt the `main.py` sample script as needed. It demonstrates how different functions from Anovos can be stitched together to create a workflow.
- If necessary, update `spark-submit.sh`. This is the shell script used to run the Spark application via `spark-submit`.
Once everything is configured, you can start your workflow run using the aforementioned script:
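```bash
# run from the dist/ folder; the script wraps the spark-submit call
# (you may need to make it executable first: chmod +x spark-submit.sh)
./spark-submit.sh
```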
While the job is running, you can check the logs it writes to monitor its progress.
Once the run completes, the script will attempt to automatically open the final report (`report_stats/ml_anovos_report.html`) in your web browser.
🐍 Install through `pip` to use Anovos within your Python applications
To install Anovos, simply run
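```bash
# install the latest release from PyPI
pip install anovos
```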
Then, you can import Anovos as a module into your Python applications using
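```python
import anovos

# individual pieces of functionality live in submodules, e.g. (module path assumed):
# from anovos.data_ingest import data_ingest
```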
To trigger Spark workloads from Python, you have to ensure that the necessary external packages are included in the `SparkSession`.
For this, you can either use the pre-configured `SparkSession` provided by Anovos:
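```python
# import the ready-made SparkSession; the module path below reflects the
# layout of recent Anovos releases
from anovos.shared.spark import spark
```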
If you need to use your own custom `SparkSession`, make sure to include the following dependencies:
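As a minimal sketch, external packages can be attached when constructing the session via `spark.jars.packages`; the Maven coordinate below is a placeholder, not an actual Anovos requirement, so substitute the dependencies required for your Spark version:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("anovos_workload")
    # attach the external packages Anovos needs; replace the placeholder
    # coordinate with the actual dependencies for your Spark version
    .config("spark.jars.packages", "org.example:some-dependency:1.0.0")
    .getOrCreate()
)
```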