How to Use Feast Feature Store with Vultr Managed Database for Caching

Updated on June 22, 2024

Introduction

Feast is an open-source feature store that enables efficient management and serving of machine learning (ML) features for real-time applications. It provides a unified interface for storing, discovering, and accessing features, which are the individual measurable properties or characteristics of the data used for ML modeling. Feast follows a distributed architecture that consists of several components working together. These include the Feast Registry, Stream Processor, Batch Materialization Engine, and Stores.

Feast supports offline and online stores. While an offline store works with historical time-series feature values that are stored in data sources, Feast uses online stores to serve features at low latency. Feature values are loaded from data sources into the online store through materialization, which can be triggered through the materialize command.
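
As a rough illustration of what materialization does conceptually, the sketch below copies the newest value per entity, up to a cutoff time, from a list of historical rows into a dictionary standing in for the online store. It is not Feast's implementation: the `materialize` function, the sample rows, and the dictionary layout are all hypothetical.

```python
from datetime import datetime

# Hypothetical historical rows, as an offline store might hold them:
# (driver_id, event_timestamp, conv_rate)
historical_rows = [
    (1001, datetime(2021, 4, 12, 8, 0), 0.21),
    (1001, datetime(2021, 4, 12, 10, 0), 0.35),
    (1002, datetime(2021, 4, 12, 9, 0), 0.58),
]


def materialize(rows, end_time):
    """Keep only the newest value per entity at or before end_time."""
    online = {}
    for driver_id, ts, conv_rate in rows:
        if ts > end_time:
            continue  # values newer than the cutoff are not loaded
        current = online.get(driver_id)
        if current is None or ts > current["event_timestamp"]:
            online[driver_id] = {"event_timestamp": ts, "conv_rate": conv_rate}
    return online


# The plain dict stands in for the low-latency online store.
online_store = materialize(historical_rows, datetime(2021, 4, 12, 9, 30))
print(online_store[1001]["conv_rate"])  # 0.21
print(online_store[1002]["conv_rate"])  # 0.58
```

The online store ends up holding one row per entity, which is what makes single-key lookups at serving time fast.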

One of the supported online stores in Feast is Redis®, which is an open-source, in-memory data structure store. This article explains how to use a Vultr Managed Database for Caching as an online feature store for Feast.

Advantages of Redis® as an Online Feature Store

High latency can harm model performance and the overall user experience, so a feature store's ability to serve features at low latency is crucial to its success. Using Redis® as an online feature store offers several advantages:

  • Elimination of the need for disk I/O operations that can introduce delays.
  • Features can be retrieved and served quickly, resulting in faster response times.
  • Machine learning models can offer efficient and timely predictions.
  • Data is stored in memory rather than on disk, which saves server resources and improves overall processing times.

Prerequisites

To follow the instructions in this article, make sure you:

  • Deploy a Vultr Managed Database for Caching.

    When deployed, copy your Vultr Managed Database for Caching instance connection information, and take note of the host, password, and port to establish a connection to the database.

  • Deploy an Ubuntu 22.04 management server on Vultr.
  • Use SSH to access the server as a non-root sudo user.
  • Update the server packages.

Using Vultr Managed Database for Caching as an Online Feature Store for Feast

Install Dependencies

To connect to a Vultr Managed Database for Caching and use Feast, you need to install Python, the Redis® CLI, and the Feast SDK as described in this section.

  1. Install Python 3.10 on the server.

     $ sudo apt-get install python3.10
  2. Install the Pip3 Python package manager.

     $ sudo apt-get -y install python3-pip
  3. Install the Redis® CLI tool.

     $ sudo apt-get install redis
  4. Install the Feast SDK and CLI.

     $ pip install feast
  5. To use Redis® as the online store, install the redis dependency.

     $ pip install 'feast[redis]'
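
As an optional sanity check after the steps above, a short script can confirm that the packages are visible to your Python interpreter. This is a minimal sketch that assumes the packages expose the module names `feast` and `redis`; the `missing_modules` helper is hypothetical and uses only the standard library.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed module names for the Feast SDK and the Redis client library.
missing = missing_modules(["feast", "redis"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("feast and redis are importable")
```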

Create a Feature Repository

  1. Using Feast, bootstrap a new feature repository.

     $ feast init feast_vultr_redis 

    Output:

     Creating a new Feast repository in <full path to your directory>
  2. Switch to the newly added directory.

     $ cd feast_vultr_redis/feature_repo
  3. Using a text editor such as Nano, edit the feature_store.yaml file in the current directory.

     $ nano feature_store.yaml
  4. Add the following contents to the file. Replace VULTR_REDIS_HOST, VULTR_REDIS_PORT, and VULTR_REDIS_PASSWORD with your actual database details.

     project: feast_vultr_redis
     registry: data/registry.db
     provider: local
     online_store:
       type: redis
       connection_string: "VULTR_REDIS_HOST:VULTR_REDIS_PORT,ssl=true,password=VULTR_REDIS_PASSWORD"

    Save and close the file.
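
If you prefer to generate the connection_string value rather than type it by hand, the sketch below assembles it from its parts in the format shown above. The `build_connection_string` helper and the example host, port, and password are hypothetical. The comma check assumes options are comma-separated, as the format suggests, so a comma inside a value would be ambiguous.

```python
def build_connection_string(host: str, port: int, password: str, ssl: bool = True) -> str:
    """Assemble a 'host:port,ssl=...,password=...' string (hypothetical helper)."""
    if not host or "," in host:
        raise ValueError("host must be non-empty and contain no commas")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    if "," in password:
        # Assumption: options are comma-separated, so a comma would be ambiguous.
        raise ValueError("password must not contain a comma")
    ssl_flag = "true" if ssl else "false"
    return f"{host}:{port},ssl={ssl_flag},password={password}"


# Placeholder values only; substitute your own database details.
print(build_connection_string("example-host.example.com", 16752, "example-password"))
# example-host.example.com:16752,ssl=true,password=example-password
```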

Register Feature Definitions and Deploy a Feature Store

To register feature definitions, run the following command.

$ feast apply

The apply command scans Python files in the current directory (example_repo.py in this case) for feature view/entity definitions, registers the objects, and deploys infrastructure.

When successful, your output should look like the one below.

....
Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v1
Created feature service driver_activity_v3
Created feature service driver_activity_v2

Generate Training Data

  1. Create a new file generate_training_data.py.

     $ nano generate_training_data.py
  2. Add the following code to the file.

     from datetime import datetime
     import pandas as pd
    
     from feast import FeatureStore
    
     entity_df = pd.DataFrame.from_dict(
         {
             # entity's join key -> entity values
             "driver_id": [1001, 1002, 1003],
             # "event_timestamp" (reserved key) -> timestamps
             "event_timestamp": [
                 datetime(2021, 4, 12, 10, 59, 42),
                 datetime(2021, 4, 12, 8, 12, 10),
                 datetime(2021, 4, 12, 16, 40, 26),
             ],
             # (optional) label name -> label values. Feast does not process these
             "label_driver_reported_satisfaction": [1, 5, 3],
             # values we're using for an on-demand transformation
             "val_to_add": [1, 2, 3],
             "val_to_add_2": [10, 20, 30],
         }
     )
    
     store = FeatureStore(repo_path=".")
    
     training_df = store.get_historical_features(
         entity_df=entity_df,
         features=[
             "driver_hourly_stats:conv_rate",
             "driver_hourly_stats:acc_rate",
             "driver_hourly_stats:avg_daily_trips",
             "transformed_conv_rate:conv_rate_plus_val1",
             "transformed_conv_rate:conv_rate_plus_val2",
         ],
     ).to_df()
    
     print("----- Feature schema -----\n")
     print(training_df.info())
    
     print()
     print("----- Example features -----\n")
     print(training_df.head())

    Save and close the file.

  3. Generate training data.

     $ python3 generate_training_data.py
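
The event_timestamp column matters because historical retrieval looks up feature values as they were at each row's timestamp, not the newest values overall. The sketch below illustrates that idea with the standard library only. It is not Feast's implementation, and the `value_as_of` helper and the sample history are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one driver, sorted by timestamp:
# (event_timestamp, conv_rate)
history = [
    (datetime(2021, 4, 12, 7, 0), 0.10),
    (datetime(2021, 4, 12, 9, 0), 0.20),
    (datetime(2021, 4, 12, 11, 0), 0.30),
]


def value_as_of(history, event_timestamp):
    """Return the newest value recorded at or before event_timestamp."""
    timestamps = [ts for ts, _ in history]
    index = bisect_right(timestamps, event_timestamp)
    if index == 0:
        return None  # no value existed yet at that time
    return history[index - 1][1]


print(value_as_of(history, datetime(2021, 4, 12, 10, 59, 42)))  # 0.2
print(value_as_of(history, datetime(2021, 4, 12, 6, 0)))  # None
```

Looking up values this way keeps training data from including feature values that were recorded after the event being predicted.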

Load Batch Features to your Online Store

  1. Serialize the latest values of features to prepare for serving:

     $ CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S") && feast materialize-incremental $CURRENT_TIME

    When feature data is stored using Redis® as the online store, Feast uses it as a two-level map with the help of Redis® Hashes. The first level of the map contains the Feast project name and entity key. The entity key is composed of entity names and values. The second level key (in Redis® terminology, this is the "field" in a Redis® Hash) contains the feature table name and the feature name, and the Redis® Hash value contains the feature value.

  2. In a new terminal window, paste your Vultr Managed Database for Caching connection string to establish a connection to the database.

     $ redis-cli -u rediss://default:[DATABASE_PASSWORD]@[DATABASE_HOST]:[DATABASE_PORT]

    Replace DATABASE_PASSWORD, DATABASE_HOST, and DATABASE_PORT with your actual Vultr Managed Database for Caching values.

  3. When connected, your shell prompt changes to >. Run the following command to view all stored keys.

     > keys "*"

    Your output should look like the one below:

     1) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
     2) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xec\x03\x00\x00feast_vultr_redis"
     3) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xeb\x03\x00\x00feast_vultr_redis"
     4) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xe9\x03\x00\x00feast_vultr_redis"
     5) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xea\x03\x00\x00feast_vultr_redis"
  4. Check the Redis® data type:

     > type "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"

    Output:

     hash
  5. Verify the contents of the hash.

     > hgetall "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"

    Your output should look like the one below.

     1) "_ts:driver_hourly_stats"
     2) "\b\xd0\xa4\xb5\xa5\x06"
     3) "a`\xe3\xda"
     4) "5\xf20Q?"
     5) "\xfa^X\xad"
     6) "5\x83\x7f\xcb>"
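
The binary keys listed above can be read by hand. The sketch below decodes one of them under a layout inferred from the sample bytes, not taken from a Feast specification: a 4-byte little-endian type tag, the join key name, a second 4-byte tag, a 4-byte length, the little-endian value, and finally the project name. The `decode_sample_key` helper is hypothetical.

```python
import struct


def decode_sample_key(raw: bytes, join_key: str, project: str) -> int:
    """Extract the integer entity value from a key shaped like the samples above."""
    name = join_key.encode("utf-8")
    suffix = project.encode("utf-8")
    offset = 4  # skip the leading 4-byte tag
    if raw[offset:offset + len(name)] != name:
        raise ValueError("join key name not found where expected")
    offset += len(name)
    # Next come two little-endian uint32 fields: a type tag and the value length.
    _value_type, length = struct.unpack_from("<II", raw, offset)
    offset += 8
    value_bytes = raw[offset:offset + length]
    offset += length
    if raw[offset:] != suffix:
        raise ValueError("project name not found at the end of the key")
    return int.from_bytes(value_bytes, "little", signed=True)


sample = (
    b"\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00"
    b"\xed\x03\x00\x00feast_vultr_redis"
)
driver_id = decode_sample_key(sample, "driver_id", "feast_vultr_redis")
print(driver_id)  # 1005
```

Under this reading, the five keys in the listing correspond to driver IDs 1001 through 1005.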

Fetch Feature Vectors for Inference

At inference time, you can read the latest feature values for different drivers from the online feature store using get_online_features(). In this section, fetch feature vectors for inference as described below.

  1. Create a new fetch_feature_vectors.py file.

     $ nano fetch_feature_vectors.py
  2. Add the following code to the file.

     from pprint import pprint
     from feast import FeatureStore
    
     store = FeatureStore(repo_path=".")
    
     feature_vector = store.get_online_features(
         features=[
             "driver_hourly_stats:conv_rate",
             "driver_hourly_stats:acc_rate",
             "driver_hourly_stats:avg_daily_trips",
         ],
         entity_rows=[
             # {join_key: entity_value}
             {"driver_id": 1004},
             {"driver_id": 1005},
         ],
     ).to_dict()
    
     pprint(feature_vector)

    Save and close the file.

  3. Run the script to fetch the feature vectors.

     $ python3 fetch_feature_vectors.py

    Your output should look like the one below.

     {
         'acc_rate': [0.1056235060095787, 0.7656288146972656],
         'avg_daily_trips': [521, 45],
         'conv_rate': [0.24400927126407623, 0.48361605405807495],
         'driver_id': [1004, 1005]
     }
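
The dictionary above is column-oriented, with one list per feature. Many models expect one record per entity instead, so the sketch below shows one possible way to reshape it, using the sample output as input. The `rows` name is illustrative.

```python
# Sample output copied from the result shown above.
feature_vector = {
    "acc_rate": [0.1056235060095787, 0.7656288146972656],
    "avg_daily_trips": [521, 45],
    "conv_rate": [0.24400927126407623, 0.48361605405807495],
    "driver_id": [1004, 1005],
}

# Transpose the column lists into one dictionary per entity row.
rows = [
    dict(zip(feature_vector, values))
    for values in zip(*feature_vector.values())
]

for row in rows:
    print(row["driver_id"], row["avg_daily_trips"])
# 1004 521
# 1005 45
```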

Conclusion

In this article, you used Feast for feature retrieval with a Vultr Managed Database for Caching as the online store, and learned why Redis® is a good fit for low-latency feature serving. For more information about Feast, visit the official documentation.