Feature Store Integration

Feature stores are an essential building block of a modern MLOps setup. For an introduction to the concept and an overview of available options and vendors, see the Feature Store Comparison & Evaluation on the MLOps Community website.

Anovos provides integration with Feast, a widely used open source feature store, out of the box. Using the same abstractions, it is straightforward to integrate Anovos with other feature stores.

If there is a particular feature store integration you'd like to see supported by Anovos, let us know!

Using Anovos with Feast

The following guide describes how to use Anovos to push data to Feast. We assume that you are familiar with the fundamentals of both Anovos workflows and Feast.

For an introduction to Feast, see 📖 the Feast Quickstart guide.

Prerequisites

To use Anovos with Feast, you first need to install Feast:

pip install feast

Next, we'll instantiate a new Feast repository:

feast init anovos_repo

🤓 Note: You can also use an existing repository. In this case, Anovos will simply add a new file anovos.py containing the feature definitions, along with the output file, to the existing repository.

Adding the Feast export to your Anovos workflow

To export data to Feast at the end of a workflow run, you need to add the write_feast_features block to the configuration file. (To learn more about the configuration file in general and the available options, see 📖 the configuration file documentation.)

You can use the following template as a starting point:

write_feast_features:
  file_path: "../anovos_repo/"                     # the location of your Feast repository
  entity:
    name: "income"                                 # the Feast entity
    description: "this entity is a ...."           # the entity description used by Feast
    id_col: 'ifa'                                  # the primary key column to identify this entity by
  file_source:
    description: 'data source description'         # the data source description used by Feast
    owner: "me@business.com"                       # the data source owner registered in Feast
    timestamp_col: 'event_time'                    # the name of the logical timestamp at which the feature was observed
    create_timestamp_col: 'create_time_col'        # the name of the physical timestamp (wallclock time)
                                                   # of when the feature value was computed
  feature_view:
    name: 'income_view'                            # the name of the generated feature view
    owner: 'view@owner.com'                        # the view owner registered in Feast
    ttl_in_seconds: 36000000                       # the time to live in seconds for features in this view.
                                                   # Feast will use this value to look backwards when performing
                                                   # point-in-time joins
  service_name: 'income_feature_service'           # the name of the feature service generated by the workflow

Let's break this down!

The following block generates an entity definition in Feast. This block and all its child elements are mandatory. The name element specifies the entity name. The description element provides a human-readable description to be displayed in the Feast UI. The id_col element specifies the primary key column of the entity.

entity:
    name: "income"
    description: "this entity is a ...."
    id_col: 'ifa'

The subsequent block generates a file source definition in Feast. This block and all its children are mandatory.

The description element provides a human-readable description of the data source. The owner element specifies the owner of the file data source in the form of an email address. The timestamp_col and create_timestamp_col elements refer to the timestamp columns used when retrieving data: timestamp_col names the logical timestamp at which a feature value was observed, while create_timestamp_col names the physical (wallclock) timestamp at which it was computed.

file_source:
    description: 'data source description'
    owner: "me@business.com"
    timestamp_col: 'event_time'
    create_timestamp_col: 'create_time_col'

The next block generates a feature view definition in Feast. This block and all its children are mandatory. The ttl_in_seconds element defines the time to live for features in this view; Feast uses it to determine how far to look backwards when performing point-in-time joins.

feature_view:
    name: 'income_view'
    owner: 'view@owner.com'
    ttl_in_seconds: 36000000

The following element generates a feature service definition in Feast. This element is optional.

service_name: 'income_feature_service'

Setting repartition value to 1 in write_main

The current version of the Feast integration only supports adding a single output file to a Feast repository. You therefore need to set the repartition attribute to 1 in the write_main config:

write_main:
  #... set your output path config values etc here
  file_configs:
    repartition: 1

Exporting data to Feast

First, run your Anovos workflow with the configuration above.

Once the workflow has finished, switch into the anovos_repo repository folder, apply the changed feature definitions, and materialize the features:

cd anovos_repo
feast apply
feast materialize `date "+%Y-%m-%dT00:00:00"` `date "+%Y-%m-%dT%H:%M:%S"`

To verify that the features have been loaded correctly, you can check them using Feast's UI. Run

feast ui

and access the Feast UI at http://127.0.0.1:8888. The UI gives a real-time overview of the data sources, entities, feature views, etc. of the entire feature repository (i.e., across multiple .py files that contain feature definitions).

Retrieve feature data from Feast

The following script shows how to access historical data, e.g., for the purpose of training an ML model. For more information, see the Feast documentation on feature retrieval. Documentation on how to specify event_time and its use in point-in-time joins can be found here.

from datetime import datetime
import feast
import pandas as pd

repo_path = "./anovos_repo"

# ACCESS HISTORICAL FEATURES

# Either read directly from parquet file generated by the Anovos workflow or generated manually
income_entities = pd.DataFrame.from_dict(
    {
        "ifa": [
            "27a",
            "30a",
            "475a",
            "965a",
            "1678a",
            "1698a",
            "1807a",
            "1951a",
            "2041a",
            "2215a",
        ],
        "event_time": [datetime.now() for _ in range(10)],
    }
)

fs = feast.FeatureStore(repo_path=repo_path)

# Alternative 1: retrieve features via explicit specification
income_features_df = fs.get_historical_features(
    entity_df=income_entities,
    features=[
        "income_view:income",
        "income_view:latent_0",
        "income_view:latent_1",
        "income_view:latent_2",
        "income_view:latent_3",
    ],
).to_df()
print(income_features_df.head())

# Alternative 2: retrieve features using the feature service
feature_service = fs.get_feature_service("income_feature_service")
income_features_by_service_df = fs.get_historical_features(
    features=feature_service, entity_df=income_entities
).to_df()
print(income_features_by_service_df.head())

# Now, you can use the features to train your model

...

Integrating Anovos with other feature stores

We're exploring further support for Feast and the integration of Anovos with other feature stores. Let us know which capabilities you'd like to see in future versions of Anovos!