MLflow Integration
MLflow is a popular open-source platform for managing the machine learning lifecycle. It comprises four components:
- MLflow Tracking to record code, data, configuration, and results of ML experiments
- MLflow Projects to package data science code in a format that allows it to run reproducibly in different environments
- MLflow Models to deploy ML models in different environments
- MLflow Model Registry to store and manage ML models in a central repository
To learn more about MLflow and its capabilities, see the MLflow documentation.
Reporting Anovos data to MLflow Tracking
Anovos integrates with MLflow by reporting workflow metadata and results to MLflow Tracking.
To track your workflows with MLflow, add an mlflow block to your workflow configuration file:
mlflow:
  experiment: "Anovos"  # The name of the MLflow experiment associated with your workflow
  tracking_uri: "http://127.0.0.1:8889"  # The URL of the MLflow Tracking server
  track_output: True  # Store the workflow output (i.e., resulting dataset(s))
  track_reports: True  # Store the generated reports
  track_intermediates: False  # Store any intermediate data generated by your workflow
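To make the effect of the three track_* flags more concrete, here is a minimal Python sketch of how a configuration like this could be translated into MLflow Tracking calls. It is not the Anovos implementation: the helper names `artifact_dirs_to_log` and `log_run`, the directory arguments, and the default of False for missing flags are assumptions made for illustration. Only the `mlflow.set_tracking_uri`, `mlflow.set_experiment`, `mlflow.start_run`, and `mlflow.log_artifacts` calls belong to the MLflow API.

```python
from pathlib import Path

# Hypothetical dict mirroring the mlflow block of the workflow configuration.
mlflow_config = {
    "experiment": "Anovos",
    "tracking_uri": "http://127.0.0.1:8889",
    "track_output": True,
    "track_reports": True,
    "track_intermediates": False,
}


def artifact_dirs_to_log(config, output_dir, report_dir, intermediate_dir):
    """Return the directories whose track_* flag is enabled (hypothetical helper)."""
    candidates = [
        ("track_output", output_dir),
        ("track_reports", report_dir),
        ("track_intermediates", intermediate_dir),
    ]
    # Assumption: a missing flag is treated as False.
    return [Path(d) for flag, d in candidates if config.get(flag, False)]


def log_run(config, output_dir, report_dir, intermediate_dir):
    """Log the selected directories as artifacts of a single MLflow run."""
    import mlflow  # imported lazily so the helper above works without MLflow installed

    mlflow.set_tracking_uri(config["tracking_uri"])
    mlflow.set_experiment(config["experiment"])
    with mlflow.start_run():
        for directory in artifact_dirs_to_log(
            config, output_dir, report_dir, intermediate_dir
        ):
            # Each directory is uploaded under an artifact path named after it.
            mlflow.log_artifacts(str(directory), artifact_path=directory.name)
```

With the configuration shown, only the output and report directories would be selected. Calling `log_run` requires the mlflow package and a Tracking server reachable at the configured URL.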
Current Limitations
It is currently not possible to select which intermediate outputs are stored.
If track_intermediates is set to True, all intermediate outputs will be stored.
Using MLflow on Azure Databricks
If you are running Anovos workloads on Azure Databricks, you can use the integrated Managed MLflow to track your Anovos runs and artifacts.
To learn more about moving your Anovos workloads to Azure Databricks, see the 📖 Setting up Anovos on Azure Databricks guide.
To track an Anovos workflow with Managed MLflow, you first need to create a new MLflow experiment. This is possible either through the Databricks Machine Learning UI or the MLflow API. Please refer to the Azure Databricks documentation for detailed and up-to-date instructions.
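For the MLflow API route, a minimal Python sketch could look like the following. It assumes that the mlflow package is installed, that Databricks credentials are already configured in your environment, and that experiment locations follow the /Users/... workspace path pattern. The helpers `experiment_path` and `create_experiment` are hypothetical names introduced for this example. Treat the Azure Databricks documentation as the authoritative reference.

```python
def experiment_path(user: str, name: str) -> str:
    """Build a workspace path for an experiment (hypothetical helper).

    Assumption: the experiment lives in the user's workspace folder.
    """
    return f"/Users/{user}/{name}"


def create_experiment(path: str) -> str:
    """Create the experiment on Managed MLflow, or reuse it, and return its ID."""
    import mlflow  # lazy import; needs the mlflow package and Databricks credentials

    mlflow.set_tracking_uri("databricks")
    existing = mlflow.get_experiment_by_name(path)
    if existing is not None:
        # The experiment already exists, so reuse it instead of failing.
        return existing.experiment_id
    return mlflow.create_experiment(path)


path = experiment_path("your_user_name@your_domain.tld", "your_experiment_name")
# create_experiment(path)  # uncomment when running against a Databricks workspace
```

The lookup before creation keeps the function safe to call repeatedly, since `mlflow.create_experiment` raises an error when an experiment with the same name already exists.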
Once you have created an experiment for your workflow, use its "Location" as the value of experiment in the mlflow block of your Anovos workflow configuration.
The tracking_uri needs to be set to databricks.
🤓 Example:
mlflow:
  experiment: "/Users/your_user_name@your_domain.tld/your_experiment_name"
  tracking_uri: "databricks"
  track_output: True  # Store the workflow output (i.e., resulting dataset(s))
  track_reports: True  # Store the generated reports
  track_intermediates: False  # Store any intermediate data generated by your workflow
Roadmap
We're exploring integration of Anovos with MLflow Projects and MLflow Pipelines. Let us know which capabilities you'd like to see in future versions of Anovos!