MLflow Integration

MLflow is a popular open-source platform for managing the machine learning lifecycle. It comprises four components:

MLflow Tracking to record code, data, configuration, and results of ML experiments
MLflow Projects to package data science code in a format that allows it to run reproducibly in different environments
MLflow Models to deploy ML models in different environments
MLflow Model Registry to store and manage ML models in a central repository

To learn more about MLflow and its capabilities, see the MLflow documentation.

Reporting Anovos data to MLflow Tracking

Anovos integrates with MLflow by reporting workflow metadata and results to MLflow Tracking.

To track your workflows with MLflow, add an mlflow block to your workflow configuration file:

mlflow:
  experiment: "Anovos"                   # The name of the MLflow experiment associated with your workflow
  tracking_uri: "http://127.0.0.1:8889"  # The URL of the MLflow Tracking server
  track_output: True                     # Store the workflow output (i.e., resulting dataset(s))
  track_reports: True                    # Store the generated reports
  track_intermediates: False             # Store any intermediate data generated by your workflow
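
To make the three track_* switches concrete, here is a minimal Python sketch of how a block like the one above could be translated into MLflow Tracking calls. It is an illustration under stated assumptions, not Anovos's actual implementation: the directory names in DEFAULT_DIRS and the helpers dirs_to_log and log_workflow_run are hypothetical. Only mlflow.set_tracking_uri, mlflow.set_experiment, mlflow.start_run, and mlflow.log_artifacts are real MLflow functions.

```python
from typing import Dict, List

# Hypothetical mapping from each track_* flag to a local directory.
# These paths are placeholders, not locations that Anovos guarantees.
DEFAULT_DIRS = {
    "track_output": "output/final",
    "track_reports": "output/reports",
    "track_intermediates": "output/intermediate",
}


def dirs_to_log(mlflow_cfg: Dict[str, object],
                dirs: Dict[str, str] = DEFAULT_DIRS) -> List[str]:
    """Return the directories whose track_* flag is enabled in the mlflow block."""
    return [path for flag, path in dirs.items() if mlflow_cfg.get(flag, False)]


def log_workflow_run(mlflow_cfg: Dict[str, object]) -> None:
    """Log the selected directories as artifacts of a single MLflow run."""
    # Imported lazily so dirs_to_log can be used without MLflow installed.
    import mlflow

    mlflow.set_tracking_uri(str(mlflow_cfg["tracking_uri"]))
    mlflow.set_experiment(str(mlflow_cfg["experiment"]))
    with mlflow.start_run():
        for path in dirs_to_log(mlflow_cfg):
            # Each directory is uploaded under an artifact folder of the same name.
            mlflow.log_artifacts(path, artifact_path=path)


if __name__ == "__main__":
    cfg = {
        "experiment": "Anovos",
        "tracking_uri": "http://127.0.0.1:8889",
        "track_output": True,
        "track_reports": True,
        "track_intermediates": False,
    }
    print(dirs_to_log(cfg))
```

With the configuration shown, only the output and report directories are selected, because track_intermediates is False. Calling log_workflow_run additionally requires a reachable MLflow Tracking server at the configured tracking_uri.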

Current Limitations

It is currently not possible to select which intermediate outputs are stored. If track_intermediates is set to True, all intermediate outputs will be stored.

Using MLflow on Azure Databricks

If you are running Anovos workloads on Azure Databricks, you can use the integrated Managed MLflow to track your Anovos runs and artifacts.

To learn more about moving your Anovos workloads to Azure Databricks, see the 📖 Setting up Anovos on Azure Databricks guide.

To track an Anovos workflow with Managed MLflow, you first need to create a new MLflow experiment. This is possible either through the Databricks Machine Learning UI or the MLflow API. Please refer to the Azure Databricks documentation for detailed and up-to-date instructions.
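
As a rough illustration of the MLflow API route, the following Python sketch creates an experiment in a Databricks workspace. It assumes that Databricks credentials are already configured for the MLflow client (for example through the Databricks CLI or the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables) and that experiments live under /Users/<user>/ in the workspace. The helper names experiment_location and create_databricks_experiment are hypothetical. The Azure Databricks documentation remains the authoritative reference.

```python
def experiment_location(user: str, experiment_name: str) -> str:
    """Build a workspace path of the form /Users/<user>/<experiment_name>."""
    return f"/Users/{user}/{experiment_name}"


def create_databricks_experiment(location: str) -> str:
    """Create the experiment in Managed MLflow and return its experiment ID."""
    # Imported lazily so experiment_location works without MLflow installed.
    import mlflow

    # "databricks" tells MLflow to use the configured Databricks workspace.
    mlflow.set_tracking_uri("databricks")
    return mlflow.create_experiment(location)


if __name__ == "__main__":
    location = experiment_location(
        "your_user_name@your_domain.tld", "your_experiment_name"
    )
    print(location)
    # Uncomment once Databricks credentials are configured:
    # print(create_databricks_experiment(location))
```

The path printed by this sketch is what the Databricks UI displays as the experiment's "Location".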

Once you have created an experiment for your workflow, use its "Location" as the value of experiment in the mlflow block of the Anovos workflow configuration, and set tracking_uri to "databricks".

🤓 Example:

mlflow:
  experiment: "/Users/your_user_name@your_domain.tld/your_experiment_name"
  tracking_uri: "databricks"
  track_output: True                     # Store the workflow output (i.e., resulting dataset(s))
  track_reports: True                    # Store the generated reports
  track_intermediates: False             # Store any intermediate data generated by your workflow
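
Before submitting a workflow, it can help to sanity-check the mlflow block. The Python sketch below is a hypothetical helper, not part of Anovos. It treats experiment and tracking_uri as required, and it assumes, as in the example above, that a Databricks experiment location is an absolute workspace path starting with "/".

```python
from typing import Dict, List


def check_mlflow_block(block: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems found in an mlflow block."""
    problems = []
    # This sketch treats both keys as required.
    for key in ("experiment", "tracking_uri"):
        if not block.get(key):
            problems.append(f"missing required key: {key}")
    # Assumption: with Managed MLflow, experiment holds the workspace path.
    if block.get("tracking_uri") == "databricks":
        experiment = str(block.get("experiment", ""))
        if experiment and not experiment.startswith("/"):
            problems.append(
                "experiment should be the experiment's Location, "
                "e.g. /Users/<user>/<name>"
            )
    return problems


if __name__ == "__main__":
    block = {
        "experiment": "/Users/your_user_name@your_domain.tld/your_experiment_name",
        "tracking_uri": "databricks",
    }
    print(check_mlflow_block(block) or "mlflow block looks consistent")
```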

Roadmap

We're exploring integration of Anovos with MLflow Projects and MLflow Pipelines. Let us know which capabilities you'd like to see in future versions of Anovos!