How to Use Feast Feature Store with Vultr Managed Database for Caching
Introduction
Feast is an open-source feature store that enables efficient management and serving of machine learning (ML) features for real-time applications. It provides a unified interface for storing, discovering, and accessing features, which are the individual measurable properties or characteristics of the data used for ML modeling. Feast follows a distributed architecture that consists of several components working together. These include the Feast Registry, Stream Processor, Batch Materialization Engine, and Stores.
Feast supports offline and online stores. An offline store works with historical time-series feature values kept in data sources, while an online store serves features at low latency. Feature values are loaded from data sources into the online store through materialization, which you trigger with the feast materialize or feast materialize-incremental command.
One of the supported online stores in Feast is Redis®, which is an open-source, in-memory data structure store. This article explains how to use a Vultr Managed Database for Caching as an online feature store for Feast.
Advantages of Redis® as an Online Feature Store
High latency can harm model performance and the overall user experience, so serving features at low latency is crucial to the success of a feature store. Using Redis® as an online feature store offers several advantages:
- Data is stored in memory rather than on disk, which eliminates disk I/O delays and shortens overall processing times.
- Features can be retrieved and served quickly, resulting in faster response times.
- Machine learning models can deliver efficient and timely predictions.
Prerequisites
To follow the instructions in this article, make sure you:
- Deploy a Vultr Managed Database for Caching. When deployed, copy your Vultr Managed Database for Caching instance connection information, and note the host, password, and port you need to connect to the database.
- Deploy an Ubuntu 22.04 Management server on Vultr.
- Use SSH to access the server as a non-root sudo user.
- Update the server packages.
Using Vultr Managed Database for Caching as an Online Feature Store for Feast
Install Dependencies
To connect to a Vultr Managed Database for Caching and use Feast, set up Python and the Redis® CLI, then install the Feast SDK as described in this section.
Install Python 3.10 on the server.
$ sudo apt-get install python3.10
Install the pip3 Python package manager.
$ sudo apt-get -y install python3-pip
Install the Redis® CLI tool.
$ sudo apt-get install redis
Install the Feast SDK and CLI.
$ pip install feast
To use Redis® as the online store, install the redis dependency.
$ pip install 'feast[redis]'
Create a Feature Repository
Using Feast, bootstrap a new feature repository.
$ feast init feast_vultr_redis
Output:
Creating a new Feast repository in <full path to your directory>
Switch to the feature_repo directory of the new repository.
$ cd feast_vultr_redis/feature_repo
Using a text editor such as Nano, edit the feature_store.yaml file in this directory.
$ nano feature_store.yaml
Add the following contents to the file. Replace VULTR_REDIS_HOST, VULTR_REDIS_PORT, and VULTR_REDIS_PASSWORD with your actual database details.
project: feast_vultr_redis
registry: data/registry.db
provider: local
online_store:
  type: redis
  connection_string: "VULTR_REDIS_HOST:VULTR_REDIS_PORT,ssl=true,password=VULTR_REDIS_PASSWORD"
Save and close the file.
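If you prefer not to type credentials by hand, you can assemble them from environment variables. The following sketch builds the connection_string value used above and the rediss:// URL that redis-cli uses later in this article. The environment variable names reuse the YAML placeholders. They are an assumption for illustration only, and Feast does not read them itself.

```python
import os
from urllib.parse import quote


def feast_connection_string(host: str, port: str, password: str) -> str:
    """Return the online_store.connection_string value for feature_store.yaml.

    The format is "host:port" followed by comma-separated options, so a
    password that contains a comma would not parse correctly.
    """
    return f"{host}:{port},ssl=true,password={password}"


def redis_cli_url(host: str, port: str, password: str, user: str = "default") -> str:
    """Return a TLS (rediss://) URL for `redis-cli -u`.

    Reserved URI characters such as "@" or "/" in the credentials are
    percent-encoded so they do not break the URL.
    """
    return f"rediss://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


if __name__ == "__main__":
    # Hypothetical variable names; the defaults echo the YAML placeholders.
    host = os.environ.get("VULTR_REDIS_HOST", "VULTR_REDIS_HOST")
    port = os.environ.get("VULTR_REDIS_PORT", "VULTR_REDIS_PORT")
    password = os.environ.get("VULTR_REDIS_PASSWORD", "VULTR_REDIS_PASSWORD")
    print(feast_connection_string(host, port, password))
    print(redis_cli_url(host, port, password))
```

Percent-encoding matters only for the URL form. The Feast connection string is not a URI, so the sketch leaves the password as-is there.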
Register Feature Definitions and Deploy a Feature Store
To register feature definitions, run the following command.
$ feast apply
The apply command scans Python files in the current directory (example_repo.py in this case) for feature view and entity definitions, registers the objects, and deploys infrastructure.
When successful, your output should look like the one below.
....
Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v1
Created feature service driver_activity_v3
Created feature service driver_activity_v2
Generate Training Data
Create a new file named generate_training_data.py.
$ nano generate_training_data.py
Add the following code to the file.
from datetime import datetime

import pandas as pd

from feast import FeatureStore

# Feast joins each row with the latest feature values at or before its
# event_timestamp; timestamps outside the sample data's range yield NaN features.
entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
        ],
        # (optional) label name -> label values. Feast does not process these
        "label_driver_reported_satisfaction": [1, 5, 3],
        # values we're using for an on-demand transformation
        "val_to_add": [1, 2, 3],
        "val_to_add_2": [10, 20, 30],
    }
)

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("----- Feature schema -----\n")
training_df.info()  # info() prints the schema itself and returns None

print()
print("----- Example features -----\n")
print(training_df.head())
Save and close the file.
Generate training data.
$ python3 generate_training_data.py
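get_historical_features() performs a point-in-time join. For every row of entity_df, it looks up the feature values that were valid at that row's event_timestamp, so training data never includes values from the future. The following plain-Python sketch illustrates the idea for a single feature. The feature rows and the one-day TTL are hypothetical. Feast's real implementation runs in the offline store and handles many more cases, such as multiple feature views.

```python
from datetime import datetime, timedelta

# Hypothetical offline feature rows: (driver_id, event_timestamp, conv_rate).
FEATURE_ROWS = [
    (1001, datetime(2021, 4, 12, 8, 0), 0.3),
    (1001, datetime(2021, 4, 12, 10, 0), 0.5),
    (1001, datetime(2021, 4, 12, 11, 0), 0.9),  # after the entity timestamp: ignored
    (1002, datetime(2021, 4, 12, 7, 0), 0.2),
    (1003, datetime(2021, 4, 10, 9, 0), 0.7),   # older than the TTL: ignored
]


def point_in_time_join(entity_rows, feature_rows, ttl):
    """Attach to each (driver_id, event_timestamp) the latest conv_rate
    recorded at or before that timestamp and no older than ttl (else None)."""
    joined = []
    for driver_id, event_ts in entity_rows:
        candidates = [
            (feature_ts, value)
            for row_id, feature_ts, value in feature_rows
            if row_id == driver_id and event_ts - ttl <= feature_ts <= event_ts
        ]
        latest = max(candidates, key=lambda c: c[0], default=None)
        joined.append((driver_id, event_ts, latest[1] if latest else None))
    return joined


entity_rows = [
    (1001, datetime(2021, 4, 12, 10, 59, 42)),
    (1002, datetime(2021, 4, 12, 8, 12, 10)),
    (1003, datetime(2021, 4, 12, 16, 40, 26)),
]
for row in point_in_time_join(entity_rows, FEATURE_ROWS, ttl=timedelta(days=1)):
    print(row)
```

Driver 1001 receives the 10:00 value rather than the later 11:00 one. Driver 1003 receives None because its only feature row is older than the TTL.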
Load Batch Features to your Online Store
Load the latest feature values from the offline store into the online store to prepare them for serving:
$ CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S") && \
  feast materialize-incremental $CURRENT_TIME
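Conceptually, materialize-incremental copies the newest feature values for each entity from the offline store into the online store, starting where the previous materialization run ended. The following plain-Python sketch models that bookkeeping. Dictionaries stand in for Redis® and the Feast registry, and the row layout and function name are hypothetical. On a first run Feast chooses a start time itself, whereas this sketch simply uses datetime.min and treats each window as (start, end].

```python
from datetime import datetime

# Hypothetical offline rows: (driver_id, event_timestamp, feature values).
OFFLINE_ROWS = [
    (1001, datetime(2023, 7, 11, 10, 0), {"conv_rate": 0.5}),
    (1001, datetime(2023, 7, 11, 11, 0), {"conv_rate": 0.9}),
    (1002, datetime(2023, 7, 11, 9, 0), {"conv_rate": 0.2}),
    (1002, datetime(2023, 7, 11, 13, 0), {"conv_rate": 0.4}),
]


def materialize_incremental(offline_rows, online_store, registry, end):
    """Write the newest row per entity in (previous end, end] into online_store."""
    start = registry.get("last_materialized_end", datetime.min)
    for driver_id, event_ts, features in offline_rows:
        if not start < event_ts <= end:
            continue  # outside this run's window
        stored = online_store.get(driver_id)
        # Keep only the most recent value for each entity.
        if stored is None or event_ts > stored[0]:
            online_store[driver_id] = (event_ts, features)
    registry["last_materialized_end"] = end  # the next run starts here


online_store, registry = {}, {}
materialize_incremental(OFFLINE_ROWS, online_store, registry, datetime(2023, 7, 11, 12, 0))
materialize_incremental(OFFLINE_ROWS, online_store, registry, datetime(2023, 7, 11, 14, 0))
print(online_store)
```

The second call only processes rows newer than the first call's end time. This is why running the command repeatedly with the current time keeps the online store fresh without reloading all history.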
When Redis® is the online store, Feast stores feature data as a two-level map using Redis® Hashes. The first-level key (the Redis® key) combines the serialized entity key, which is composed of entity names and values, with the Feast project name. The second-level key (in Redis® terminology, the "field" in a Redis® Hash) is derived from the feature view name and the feature name, and the corresponding Hash value contains the serialized feature value. Because Feast hashes these field names, most fields appear as binary data in redis-cli. A separate _ts:<feature view name> field holds the event timestamp.
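The binary keys that keys "*" prints later in this section can be reproduced with Python's struct module. The following sketch assumes the layout visible in that output:

- a 4-byte type tag
- the join key name
- a 4-byte value type tag
- a 4-byte value length
- the 4-byte little-endian entity value
- the project name

The tag constants and helper names are inferred from the output, not taken from Feast's API. The layout may differ with other Feast entity key serialization versions.

```python
import struct

# Tag values observed in the keys below (assumed to be Feast value-type codes).
VALUE_TYPE_STRING = 2  # precedes the join key name
VALUE_TYPE_INT64 = 4   # precedes the entity value


def redis_key(project: str, join_key: str, entity_value: int) -> bytes:
    """Rebuild the Redis key for a single integer entity, per the observed layout."""
    value_bytes = struct.pack("<i", entity_value)  # 4-byte value, as in the output
    return b"".join([
        struct.pack("<I", VALUE_TYPE_STRING),
        join_key.encode("utf-8"),
        struct.pack("<I", VALUE_TYPE_INT64),
        struct.pack("<I", len(value_bytes)),
        value_bytes,
        project.encode("utf-8"),  # the project name comes last
    ])


def parse_redis_key(key: bytes, join_key: str) -> tuple[int, str]:
    """Split a key with the layout above into (entity value, project name)."""
    offset = 4 + len(join_key) + 4  # skip both type tags and the join key name
    (length,) = struct.unpack_from("<I", key, offset)
    offset += 4
    value = int.from_bytes(key[offset:offset + length], "little", signed=True)
    return value, key[offset + length:].decode("utf-8")


observed = b"\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
print(parse_redis_key(observed, "driver_id"))
print(redis_key("feast_vultr_redis", "driver_id", 1005) == observed)
```

Decoding the five keys this way gives driver IDs 1001 through 1005. This shows that materialization wrote one Hash per driver.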
In a new terminal window, use redis-cli to connect to your Vultr Managed Database for Caching with its connection string.
$ redis-cli -u rediss://default:[DATABASE_PASSWORD]@[DATABASE_HOST]:[DATABASE_PORT]
Replace DATABASE_PASSWORD, DATABASE_HOST, and DATABASE_PORT with your actual Vultr Managed Database for Caching values. When connected, your shell prompt changes to the redis-cli prompt ending in >.
Run the following command to view all stored keys.
> keys "*"
Your output should look like the one below:
1) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
2) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xec\x03\x00\x00feast_vultr_redis"
3) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xeb\x03\x00\x00feast_vultr_redis"
4) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xe9\x03\x00\x00feast_vultr_redis"
5) "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xea\x03\x00\x00feast_vultr_redis"
Check the Redis® data type:
> type "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
Output:
hash
Verify the contents of the hash.
> hgetall "\x02\x00\x00\x00driver_id\x04\x00\x00\x00\x04\x00\x00\x00\xed\x03\x00\x00feast_vultr_redis"
Your output should look like the one below.
1) "_ts:driver_hourly_stats"
2) "\b\xd0\xa4\xb5\xa5\x06"
3) "a`\xe3\xda"
4) "5\xf20Q?"
5) "\xfa^X\xad"
6) "5\x83\x7f\xcb>"
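The Hash values are protobuf-encoded. Their leading tag bytes suggest two message shapes:

- The _ts:driver_hourly_stats value appears to be a Timestamp message whose first field holds the seconds as a varint.
- The other values appear to be Feast Value messages holding a 4-byte float (tag byte 0x35 means field 6, wire type 5).

The following minimal wire-format reader decodes just those two shapes with the standard library. It is an inspection sketch, not a general parser; the protobuf package is the robust choice. Because the field names are hashed, the sketch cannot tell which feature each float belongs to.

```python
import struct
from datetime import datetime, timezone


def read_varint(data: bytes, offset: int) -> tuple[int, int]:
    """Decode a protobuf base-128 varint; return (value, next offset)."""
    result, shift = 0, 0
    while True:
        byte = data[offset]
        result |= (byte & 0x7F) << shift
        offset += 1
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, offset
        shift += 7


def decode_first_field(data: bytes) -> tuple:
    """Return (field number, value) for the first field of a protobuf message.

    Only wire type 0 (varint) and wire type 5 (32-bit, read as a float) are handled.
    """
    tag, offset = read_varint(data, 0)
    field_number, wire_type = tag >> 3, tag & 0x07
    if wire_type == 0:
        value, _ = read_varint(data, offset)
    elif wire_type == 5:
        (value,) = struct.unpack_from("<f", data, offset)
    else:
        raise ValueError(f"unsupported wire type {wire_type}")
    return field_number, value


# Byte strings copied from the hgetall output above.
_, seconds = decode_first_field(b"\b\xd0\xa4\xb5\xa5\x06")
print(datetime.fromtimestamp(seconds, tz=timezone.utc))
print(decode_first_field(b"5\xf20Q?"))
print(decode_first_field(b"5\x83\x7f\xcb>"))
```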
Fetch Feature Vectors for Inference
At inference time, you can read the latest feature values for different drivers from the online feature store using get_online_features(). In this section, fetch feature vectors for inference as described below.
Create a new fetch_feature_vectors.py file.
$ nano fetch_feature_vectors.py
Add the following code to the file.
from pprint import pprint

from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[
        # {join_key: entity_value}
        {"driver_id": 1004},
        {"driver_id": 1005},
    ],
).to_dict()

pprint(feature_vector)
Save and close the file.
To fetch the feature vectors, run the script.
$ python3 fetch_feature_vectors.py
Your output should look like the one below.
{'acc_rate': [0.1056235060095787, 0.7656288146972656],
 'avg_daily_trips': [521, 45],
 'conv_rate': [0.24400927126407623, 0.48361605405807495],
 'driver_id': [1004, 1005]}
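As the output shows, to_dict() returns a column-oriented dictionary: one list per feature, aligned by position with the entity rows you passed in. If your model code expects one record per entity, a small helper such as the following hypothetical to_rows() function can pivot that structure. The sample input copies the output above.

```python
from pprint import pprint


def to_rows(feature_vector: dict) -> list:
    """Pivot {column: [values...]} into [{column: value, ...}, ...]."""
    columns = list(feature_vector)
    return [dict(zip(columns, values)) for values in zip(*feature_vector.values())]


feature_vector = {
    "acc_rate": [0.1056235060095787, 0.7656288146972656],
    "avg_daily_trips": [521, 45],
    "conv_rate": [0.24400927126407623, 0.48361605405807495],
    "driver_id": [1004, 1005],
}
pprint(to_rows(feature_vector))
```

zip(*...) relies on every column having the same length, which holds here because each list has one entry per requested entity row.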
Conclusion
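In this article, you used a Vultr Managed Database for Caching as the Redis® online store for Feast. You generated training data, materialized features into Redis®, and retrieved them at low latency for inference. You also saw why Redis® is a good fit as an online feature store. For more information about Feast, visit the official documentation.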