Amazon FSx for NetApp ONTAP interoperability test in a Hopsworks 4.x Deployment

January 3, 2025
23 min read
Javier Cabrera
Software Engineer
Hopsworks

TL;DR

By following this tutorial you can evaluate how to get flexible, cost-effective, topology-aware NVMe provisioning without sacrificing performance by using Amazon FSx for NetApp ONTAP together with Hopsworks 4.x. The result is improved operational simplicity and more controlled storage management.

Introduction

This blog post describes the usage of Amazon FSx for NetApp ONTAP in a Hopsworks 4.x deployment on Amazon Elastic Kubernetes Service (Amazon EKS). We connect NVMe volumes on demand as Kubernetes Persistent Volume Claims (PVCs) for the RonDB service.

Amazon FSx for NetApp ONTAP provides a scalable, highly available, and secure data management platform that integrates well with Kubernetes environments, especially in EKS clusters. By leveraging Amazon FSx for NetApp ONTAP in our Hopsworks 4.x deployment on AWS EKS, we can take advantage of its robust and cost-effective features for persistent storage, such as automated data tiering, snapshotting, and high-performance NVMe volumes.

Key points:

  • Test the interoperability of Amazon FSx for NetApp ONTAP with Hopsworks. In other words, we go step by step through connecting an EKS cluster to an Amazon FSx for NetApp ONTAP file system and then installing Hopsworks.
  • Compare the performance of the regular EBS gp3 volumes used by AWS EKS against NVMe disks provided by Amazon FSx for NetApp ONTAP.

Results: You can get flexible, topology-aware NVMe provisioning without sacrificing performance. The benchmark numbers are essentially unchanged, so the advantage isn’t about speed; it’s about operational simplicity and more controlled storage management.

Requirements

  • kubectl CLI tool installed on your machine
  • helm CLI tool installed on your machine
  • eksctl and aws CLI tools installed on your machine (both are used in the steps below)
  • AWS account
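
A quick sanity check that the tools are available and that your AWS credentials are configured (all standard CLI commands):

kubectl version --client
helm version
eksctl version
aws --version
aws sts get-caller-identity   # confirms your AWS credentials are configured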

Setting Up the Cluster

The first step involves creating an EKS cluster with nodes that are prepared for NFS or iSCSI. Notice the preBootstrapCommands and the AmazonFSxFullAccess policy in the snippet below. Place the snippet into a `cluster_def.yaml` file.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eks-netapp-ontap
  region: eu-central-1
  version: "1.29" 

iam:
  withOIDC: true

managedNodeGroups:
  - name: ng-1
    amiFamily: AmazonLinux2
    instanceType: m6i.2xlarge
    minSize: 1
    maxSize: 9
    desiredCapacity: 9
    volumeSize: 256
    ssh:
      allow: true # will use ~/.ssh/id_rsa.pub as the default ssh key
    iam:
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonS3FullAccess
        # This is needed for FSx NetAPP ONTAP
        - arn:aws:iam::aws:policy/AmazonFSxFullAccess 
      
      withAddonPolicies:
        imageBuilder: true
      
    preBootstrapCommands:
      - "sudo yum install -y nfs-utils"        # For NFS support for ontap-nas
      - "sudo yum install -y iscsi-initiator-utils"  # For iSCSI support for ontap-san
      - "sudo systemctl enable iscsid"         # Enable iSCSI service on startup
      - "sudo systemctl start iscsid"          # Start iSCSI service
      - "sudo yum install nvme-cli -y" # For NVMe support for ontap-san
      - "sudo yum install linux-modules-extra-$(uname -r)"
      - "sudo modprobe nvme-tcp"  


addons:
  - name: aws-ebs-csi-driver
    wellKnownPolicies:      # add IAM and service account
      ebsCSIController: true


Then create the cluster by using the eksctl cli tool.

eksctl create cluster -f cluster_def.yaml
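
Once eksctl finishes, a quick check that all nodes joined the cluster:

kubectl get nodes
# All 9 m6i.2xlarge nodes (desiredCapacity: 9) should report STATUS "Ready"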

For the sake of benchmarking, we label the nodes as follows.

We set 3 machines out of 9 to provide NVMe volumes from Amazon FSx for NetApp ONTAP. 4 machines out of 9 are selected to run Hopsworks services. Finally, the remaining 2 machines are used to launch locust benchmarking workers. Notice that the NVMe-labeled machines are also used for Hopsworks services, so the Hopsworks deployment effectively runs on 7 m6i.2xlarge EKS nodes.

If you prefer to run such labeling automatically, please execute the following script.

# Get nodes in zone eu-central-1b and mark them as nvme
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    # use label topology.kubernetes.io/zone
    zone=$(kubectl get node $node -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}')
    # label is hw-group=nvme 
    if [ "$zone" == "eu-central-1b" ]; then
        kubectl label node $node hw-group=nvme && echo "labeled as nvme"
    else
        kubectl label node $node hw-group=hw && echo "labeled as hw"
    fi
done
# label two of the hw-group=hw nodes and label them as locust
COUNT=0
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    if [ "$COUNT" -eq 2 ]; then
        break
    fi
    hw_group=$(kubectl get node $node -o jsonpath='{.metadata.labels.hw-group}')
    if [ "$hw_group" == "hw" ]; then
        kubectl label node $node hw-group=locust --overwrite && echo "labeled as locust"
        COUNT=$((COUNT+1))
    fi
done
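
To verify the labels before moving on, you can list the nodes together with their hw-group label and availability zone:

kubectl get nodes -L hw-group -L topology.kubernetes.io/zone
# With the distribution described above you should see 3 nodes with hw-group=nvme
# (all in eu-central-1b), 2 with hw-group=locust and 4 with hw-group=hw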


Create the Amazon FSx for NetApp ONTAP File System

Find the FSx category in the AWS portal. Then create an Amazon FSx for NetApp ONTAP file system as the image below shows. Create the file system with 3G/s throughput.

Create Amazon FSx file systems
If the cluster region is eu-central-1, make sure the preferred subnet for the file system is in eu-central-1b.

When creating the file system, select the VPC of the previously created cluster; this effectively connects both networks and makes the file system reachable from the EKS cluster nodes.

After creating the file system (it usually takes about 20 minutes), make sure the route table of the file system is added to the EKS cluster. Select all private subnets for all availability zones, as the image below shows.

Manage route tables

Checkpoint: If everything is set correctly, check the EKS cluster networking. You should see two new entries in the route table, as in the image below: one for the FSx management server and one for its storage virtual machine.

Route tables

Set the credentials for the file system's storage virtual machine. You can do so from the file system's page in the FSx portal, as the image below shows.

Storage virtual machines

These credentials are used by the EKS nodes to interact with the file system. This happens automatically once the Trident Kubernetes operator is installed; we describe that installation in a later step.

Once you have set the credentials on the FSx storage virtual machine, create an AWS secret with the same credentials. Use your terminal to run the following command.

aws secretsmanager create-secret --name trident-secret \
  --description "Trident CSI credentials" \
  --secret-string "{\"username\":\"vsadmin\",\"password\":\"...\"}" \
  --region eu-central-1

Installing the Kubernetes Trident Operator

The Trident Operator from NetApp is a Kubernetes operator that automates the provisioning and management of persistent storage volumes using NetApp storage solutions. It enables seamless integration between Kubernetes workloads and NetApp's storage systems, such as the previously mentioned Amazon FSx for NetApp ONTAP.

To install the Trident operator, the first step is to create the roles and permissions it needs in AWS. Create an IAM policy for the Trident user. To do so, first save the policy template below as `policy.json`. Notice that there is a placeholder in the snippet: copy the ARN of the FSx VM credentials secret you created in the step above into the Resource field.

{
    "Statement": [
        {
            "Action": [
                "fsx:DescribeFileSystems",
                "fsx:DescribeVolumes",
                "fsx:CreateVolume",
                "fsx:RestoreVolumeFromSnapshot",
                "fsx:DescribeStorageVirtualMachines",
                "fsx:UntagResource",
                "fsx:UpdateVolume",
                "fsx:TagResource",
                "fsx:DeleteVolume"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": "secretsmanager:GetSecretValue",
            "Effect": "Allow",
            "Resource": ""
        }
    ],
    "Version": "2012-10-17"
}

Once you have created the policy file, run the following command.

aws iam create-policy --policy-name AmazonFSxNCSIDriverPolicy \
  --policy-document file://policy.json \
  --description "This policy grants access to Trident CSI to FSxN and Secret manager" \
  --region eu-central-1

The previous command outputs the policy ARN, which is needed to attach the policy to roles. Save it for the next step.
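
If you have already run the command and need to look the ARN up again, one option (a sketch using the standard AWS CLI) is:

POLICY_ARN=$(aws iam list-policies --scope Local \
  --query "Policies[?PolicyName=='AmazonFSxNCSIDriverPolicy'].Arn" --output text)
echo "$POLICY_ARN"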

The next step is to create the Kubernetes service account permissions for the Trident operator that will be installed. Run the following command with the ARN output by the previous step.

eksctl create iamserviceaccount --name trident-controller --namespace trident \
  --cluster eks-netapp-ontap --role-name trident-controller-fsx-role --role-only \
  --attach-policy-arn <POLICY ARN FROM THE PREVIOUS STEP> --approve --region eu-central-1

Install and configure Trident from the Helm chart. Using the AWS add-on is not recommended because of this issue https://github.com/NetApp/trident/issues/906 and because the add-on installs an old version of the Trident operator. Use the snippet below instead.

helm repo add netapp-trident https://netapp.github.io/trident-helm-chart
helm install trident netapp-trident/trident-operator --version 100.2406.1 \
  --create-namespace --namespace trident

Modify the TridentOrchestrator custom resource so Trident only runs on nodes in the same availability zone as the FSx file system. Concretely, we only want to use the nodes labeled as nvme in the very first step of this blog post. Set the fields controllerPluginNodeSelector, nodePluginNodeSelector, cloudIdentity and cloudProvider.

First, get the ARN of the FSx role we created previously.

aws iam list-roles | grep trident-controller-fsx-role

Then, edit the custom resource to ensure it contains the following fields.

# You can use Lens for a better editing experience
cloudIdentity: "eks.amazonaws.com/role-arn: <ARN OF trident-controller-fsx-role>"
cloudProvider: "AWS"
controllerPluginNodeSelector:
    topology.kubernetes.io/zone: eu-central-1b
nodePluginNodeSelector:
    topology.kubernetes.io/zone: eu-central-1b
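
If you prefer not to edit the resource interactively, the same fields can be applied with a merge patch; a minimal sketch, assuming the TridentOrchestrator created by the Helm chart is named trident and substituting the role ARN retrieved above:

kubectl patch tridentorchestrator trident --type merge -p '{
  "spec": {
    "cloudProvider": "AWS",
    "cloudIdentity": "eks.amazonaws.com/role-arn: <ARN OF trident-controller-fsx-role>",
    "controllerPluginNodeSelector": {"topology.kubernetes.io/zone": "eu-central-1b"},
    "nodePluginNodeSelector": {"topology.kubernetes.io/zone": "eu-central-1b"}
  }
}'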

Checkpoint: Check the Trident deployment by executing the following command. Since we limit the number of NetApp nodes, you should see only 3 trident-node pods.

kubectl get pods -n trident

# Example output:
trident-controller-5cb86cfcc7-84xqd   6/6     Running   0             54s
trident-node-linux-kj8vd              2/2     Running   0             54s
trident-node-linux-qmv9v              1/2     Running   1 (18s ago)   54s
trident-node-linux-zjwbw              1/2     Running   1 (20s ago)   54s
trident-operator-799447549c-cxdrd     1/1     Running   0             94s

Enable Trident Backends and Kubernetes Storage Classes

For Trident to provide NVMe volumes based on Kubernetes definitions, we need to install the backend custom resources. Create a file named `backend.yaml` with the snippet below. It creates storage backends that support ontap-san (NVMe) and ontap-nas (NFS); the protocol is specified in the storageDriverName field.

For this PoC we use topology-aware backends and storage classes, since the FSx file system is assumed to live in the same availability zone as the nodes.

apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-tbc-ontap-nas
spec:
  version: 1
  storageDriverName: ontap-nas # for NFS
  backendName: tbc-ontap-nas
  supportedTopologies:
  - topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1b
  svm: fsx
  aws:
    fsxFilesystemID: <GET THE FILESYSTEM ID FROM THE AWS CONSOLE>
  credentials:
    name: "<THE SECRET NAME ARN CREATED FOR ACCESSING THE FSX VM>"
    type: awsarn
---
# FOR NVME
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-tbc-ontap-san
spec:
  version: 1
  storageDriverName: ontap-san
  backendName: tbc-ontap-san
  supportedTopologies:
  - topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1b
  svm: fsx
  sanType: nvme
  useREST: true
  aws:
    fsxFilesystemID: <GET THE FILESYSTEM ID FROM THE AWS CONSOLE>
  credentials:
    name: "<THE SECRET NAME ARN CREATED FOR ACCESSING THE FSX VM>"
    type: awsarn

---
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-tbc-ontap-san-economy2
spec:
  version: 1
  storageDriverName: "ontap-san-economy"
  backendName: tbc-ontap-san-economy2
  svm: fsx
  supportedTopologies:
    - topology.kubernetes.io/region: eu-central-1
      topology.kubernetes.io/zone: eu-central-1b
  aws:
    fsxFilesystemID: <GET THE FILESYSTEM ID FROM THE AWS CONSOLE>
  credentials:
    name: "<THE SECRET NAME ARN CREATED FOR ACCESSING THE FSX VM>"
    type: awsarn


Notice there are two placeholders in the snippet above: <GET THE FILESYSTEM ID FROM THE AWS CONSOLE> and <THE SECRET NAME ARN CREATED FOR ACCESSING THE FSX VM>. The first one you can get from the AWS console portal. The second one is the ARN of the secret for the FSx virtual machine that we created previously.
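
The file system ID can also be fetched from the CLI instead of the console; for example (a sketch using the standard AWS CLI):

# List the ONTAP file system ID (fs-...) in the region
aws fsx describe-file-systems --region eu-central-1 \
  --query 'FileSystems[?FileSystemType==`ONTAP`].FileSystemId' --output text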

After completing the snippet above, run the following command.

kubectl apply -f backend.yaml
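
Checkpoint: a quick way to confirm that Trident has picked up the backend configurations and bound them successfully:

kubectl get tridentbackendconfig -A
kubectl get tridentbackends -n trident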

Create the Kubernetes Storage Classes 

The last step is to create the Kubernetes storage classes that use the Trident backends. Create a file named `storageclass.yaml` with the following snippet.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-gold
provisioner: csi.trident.netapp.io
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - eu-central-1b
  - key: topology.kubernetes.io/region
    values:
    - eu-central-1
parameters:
  backendType: "ontap-san"
  media: "ssd"
  provisioningType: "thin"
  snapshots: "true"
---
# FOR NVME
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: protection-gold6
provisioner: csi.trident.netapp.io
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - eu-central-1b
  - key: topology.kubernetes.io/region
    values:
    - eu-central-1
parameters:
  backendType: "ontap-san"
  fsType: "ext4"

Then add the storage classes to the Kubernetes cluster.

kubectl apply -f storageclass.yaml
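
Before installing Hopsworks, you can optionally verify that dynamic provisioning works end to end with a small test claim (the PVC name below is hypothetical and only used for this check):

kubectl get storageclass ontap-gold protection-gold6

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trident-test-pvc        # hypothetical name, only for verification
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: protection-gold6
EOF

# The claim stays Pending until a pod scheduled in eu-central-1b consumes it,
# because protection-gold6 uses volumeBindingMode: WaitForFirstConsumer.
kubectl get pvc trident-test-pvc
kubectl delete pvc trident-test-pvc   # clean up afterwards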

Benchmarking

After the EKS cluster and the Amazon FSx for NetApp ONTAP filesystem are deployed we can install and test Hopsworks.

The focus of this benchmark is RonDB. We have two setups. In the first, the volumes used by RonDB are the standard EBS volumes of EKS with 3G/s throughput. In the second, we reinstall the cluster and back the RonDB volumes with NetApp ONTAP NVMe disks.

Installing Hopsworks

To install Hopsworks, first create a values.yaml file that defines where the services will be placed according to the node labels we set previously. We have prepared the following values.yaml file. Notice that we use an S3 bucket named netapp-poc. Ensure such a bucket is created and empty in the AWS portal.

Moreover, the snippet below is used for both benchmarking scenarios. Notice that we commented out the storage classes used by RonDB. To test the NVMe volumes, we simply uncomment them (see the fragment shown right after the snippet), reinstall the cluster, and run the benchmarks again.

global:
  _hopsworks:
    storageClassName: null
    cloudProvider: "AWS"
    managedDockerRegistery:
      enabled: true
      domain: ".dkr.ecr.eu-central-1.amazonaws.com"
      namespace: "netapp_poc"
      credHelper:
        enabled: true
        secretName: &awsregcred "awsregcred"    

    managedObjectStorage:
      enabled: true
      s3:
        bucket: 
          name: &bucket "netapp-poc"
        region: &region "eu-central-1"
        endpoint: &awsendpoint "https://s3-accesspoint.eu-central-1.amazonaws.com"
        secret:
          name: &awscredentialsname "aws-credentials"
          acess_key_id: &awskeyid "access-key-id"
          secret_key_id: &awsaccesskey "secret-access-key"
    minio:
      enabled: false

hopsworks:
  # Not in the same machine as mysql
  nodeSelector:
    hw-group: hw
  variables:
    docker_operations_managed_docker_secrets: *awsregcred
    # We *need* to put awsregcred here because this is the list of
    # Secrets that are copied from hopsworks namespace to Projects namespace
    # during project creation.
    docker_operations_image_pull_secrets: "awsregcred"
  dockerRegistry:
    preset:
      usePullPush: false
      secrets:
        - *awsregcred

certs-operator:
  ca:
    httpTimeout: 120s
# Less consul workers
consul:
  consul:
    server:
      enabled: true
      replicas: 1

hopsfs:
  datanode:
    count: 3
    storage:
      # storageClassName: 
      size: 256Gi

    nodeSelector:
      hw-group: hw
  namenode:
    nodeSelector:
      hw-group: hw

  objectStorage:
    enabled: true
    provider: "S3"
    s3:
      bucket: 
        name: *bucket
      region: *region

# Taken from the large bench test
rondb:
  
  # Go to NVME type of machines
  nodeSelector:
    mgmd:
      hw-group: nvme
    ndbmtd:
      hw-group: nvme
    rdrs:
      hw-group: nvme

  clusterSize:
    activeDataReplicas: 1
    numNodeGroups: 1
    minNumMySQLServers: 1
    maxNumMySQLServers: 1
    minNumRdrs: 1
    maxNumRdrs: 1

  resources:
    limits:
      cpus:
        mgmds: 0.2
        ndbmtds: 6
        mysqlds: 6
        rdrs: 2
        benchs: 2
      memory:
        ndbmtdsMiB: 9000
        rdrsMiB: 700
        benchsMiB: 700
    requests:
      cpus:
        mgmds: 0.2
        mysqlds: 6
        rdrs: 2
        benchs: 1
      memory:
        rdrsMiB: 100
        benchsMiB: 100
      storage:
        # If commented will use default ebs storage class
        classes:
          # default: protection-gold6
          # diskColumns: protection-gold6 # nvme
        diskColumnGiB: 32
        redoLogGiB: 32
        undoLogsGiB: 32
        slackGiB: 2

  rondbConfig:
    InitialTablespaceSizeGiB: 10


airflow:
  enabled: false
      

hive:
  nodeSelector:
    hw-group: hw
  metastore:
    deployment:
      replicas: 2
    jvm_resources:
      xms: 7g
      xmx: 7g

olk:
  logstash:
    nodeSelector:
      hw-group: hw

  dashboard:
    nodeSelector:
      hw-group: hw
  
  opensearch:
    nodeSelector:
      hw-group: hw
    backup:
      # We enable it again in the upgrade stage of the CI
      enabled: false

Once the values.yaml file is created, run the following code to deploy Hopsworks.

helm repo add hopsworks https://nexus.hops.works/repository/hopsworks-helm
helm repo update
helm install hopsworks hopsworks/hopsworks --devel --namespace hopsworks \
  --timeout=600s --values values.yaml --create-namespace

The cluster installation takes approximately 30 minutes.
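
You can follow the rollout while it runs; for example:

kubectl get pods -n hopsworks --watch
# or check the overall release status
helm status hopsworks -n hopsworks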

Once Hopsworks is deployed, you need to interact with the Hopsworks platform to create proper API keys and projects for the locust benchmark to run.

First, enable the Hopsworks load balancers to get an accessible external IP for the Hopsworks portal. Execute the code below.

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm -n ingress-nginx install ingress-nginx ingress-nginx/ingress-nginx --create-namespace

You can get the external IP by executing the following command.

kubectl get svc -n ingress-nginx ingress-nginx-controller
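
To capture the external address into a shell variable (on AWS it is usually a load balancer hostname rather than a plain IP):

HOPSWORKS_ADDR=$(kubectl get svc -n ingress-nginx ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "https://$HOPSWORKS_ADDR"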

Once you have the IP, access the Hopsworks application. Use the default credentials: user admin@hopsworks.ai and password `admin`. In the Hopsworks application, create a project named `test` and an API key. Copy the API key for the next step.

Create a file named benchmark.yaml.

---
# configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: locust-tasks
  namespace: hopsworks
data:
  .api_key: |
    <PASTE THE API KEY CREATED ABOVE>
  requirements.txt: |
    nest-asyncio
    markupsafe==2.0.1
    locust==2.23.1
    git+https://github.com/logicalclocks/hopsworks-api@main#egg=hopsworks[python]&subdirectory=python
  hopsworks-config.json: |
    {
        "host": "hopsworks.hopsworks.svc.cluster.local",
        "port": 28181,
        "project": "test",
        "external": false,
        "rows": 1000,
        "schema_repetitions": 1,
        "recreate_feature_group": true,
        "batch_size": 100,   
        "tablespace": "ts_1"
    }
---
# Create a deployment for the head
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-bench-head
  namespace: hopsworks
spec:
  replicas: 1
  selector:
    matchLabels:
      app: locust-bench-head
  template:
    metadata:
      labels:
        app: locust-bench-head
    spec:
      nodeSelector:
        hw-group: locust
      securityContext:
        runAsUser: 0
      containers:
      - name: simple-server-sidecar
        image: "python:3.8"
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        workingDir: /results
        command:
        - "python"
        - "-m"
        - "http.server"
        - "8080"
        volumeMounts:
        - name: results
          mountPath: /results
          
      - name: locust-bench
        image: "docker.hops.works/hopsworks/locust-hsfs:master"
        imagePullPolicy: Always
        env:
          - name: HOPSWORKS_HOSTNAME_VERIFICATION
            value: "False"
          - name: HOPSWORKS_DOMAIN
            value: "hopsworks.hopsworks.svc.cluster.local"
          - name: HOPSWORKS_PORT
            value: "28181"
          - name: MYSQL_HOST
            value: "mysql.service.consul"
          - name: MYSQL_USER
            value: "hopsworksroot"
          - name: MYSQL_PASSWORD
            valueFrom:
              secretKeyRef:
                key: hopsworksroot
                name: mysql-users-secrets
        command:
        - "/bin/sh"
        - "-c"
        - |
          set -e
          #git clone -b hotfix_locust https://github.com/Jacarte/hopsworks-api
          git clone https://github.com/logicalclocks/hopsworks-api
          cd hopsworks-api
          cd locust_benchmark
          # pip install -r requirements.txt
          pip install -r /locust-tasks/requirements.txt
          
          cp /locust-tasks/hopsworks-config.json ./hopsworks_config.json
          cp /locust-tasks/.api_key ./.api_key
          python create_feature_group.py
          locust -f locustfile.py MySQLFeatureVectorLookup  --master --headless --expect-workers 10 -u 50 -r 30 -t 300 -s 30  --html=/results/result.html --csv /results/results.csv 

          echo "Done, tailing /dev/null"
          tail -f /dev/null

        ports:
        - containerPort: 5557

        volumeMounts:
        - name: locust-tasks
          mountPath: /locust-tasks
        - name: results
          mountPath: /results
      volumes:
      - name: locust-tasks
        configMap:
          name: locust-tasks
      - name: results
        emptyDir: {}
      restartPolicy: Always
# service
---
apiVersion: v1
kind: Service
metadata:
  name: locust-bench-head
  namespace: hopsworks
spec:
  selector:
    app: locust-bench-head
  ports:
    - protocol: TCP
      port: 5557
      targetPort: 5557
  type: ClusterIP
# deployment for workers
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: locust-bench-worker
  namespace: hopsworks
spec:
  replicas: 10
  selector:
    matchLabels:
      app: locust-bench-worker
  template:
    metadata:
      labels:
        app: locust-bench-worker
    spec:
      nodeSelector:
        hw-group: locust
      securityContext:
        runAsUser: 0
      containers:
      - name: locust-bench
        image: "docker.hops.works/hopsworks/locust-hsfs:master"
        imagePullPolicy: Always
        env:
          - name: HOPSWORKS_HOSTNAME_VERIFICATION
            value: "False"
          - name: HOPSWORKS_DOMAIN
            value: "hopsworks.hopsworks.svc.cluster.local"
          - name: HOPSWORKS_PORT
            value: "28181"
          - name: MYSQL_HOST
            value: "mysql.service.consul"
          - name: MYSQL_USER
            value: "hopsworksroot"
          - name: MYSQL_PASSWORD
            valueFrom:
              secretKeyRef:
                key: hopsworksroot
                name: mysql-users-secrets
        command:
        - "/bin/sh"
        - "-c"
        - |
          # git clone -b hotfix_locust https://github.com/Jacarte/hopsworks-api
          git clone https://github.com/logicalclocks/hopsworks-api
          cd hopsworks-api
          cd locust_benchmark
          # pip install -r requirements.txt
          pip install -r /locust-tasks/requirements.txt
          
          cp /locust-tasks/hopsworks-config.json ./hopsworks_config.json
          cp /locust-tasks/.api_key ./.api_key

          locust -f locustfile.py MySQLFeatureVectorLookup --worker --master-host locust-bench-head.hopsworks.svc.cluster.local
        
        volumeMounts:
        - name: locust-tasks
          mountPath: /locust-tasks
      volumes:
      - name: locust-tasks
        configMap:
          name: locust-tasks

Once you have created the file, execute the following command to deploy its definitions.

kubectl apply -f benchmark.yaml

The benchmark takes nearly 10 minutes to become ready and run. Once it finishes, check the locust head deployment; it has a sidecar from which you can collect the results.
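
One way to follow the run and pull the report from the sidecar, assuming the deployment and container names from benchmark.yaml above:

# Follow the benchmark from the head pod
kubectl logs -n hopsworks deployment/locust-bench-head -c locust-bench -f

# Fetch the HTML report from the sidecar's simple HTTP server
kubectl port-forward -n hopsworks deployment/locust-bench-head 8080:8080 &
curl -o result.html http://localhost:8080/result.html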

Benchmark Insights

The results indicate that the median response time, average response time, and requests/s are essentially identical in both scenarios. Therefore, Amazon FSx for NetApp ONTAP provides more reliable volume provisioning while maintaining the expected performance for a Hopsworks deployment.

Standard EBS Volumes Results

These are the locust benchmark results for regular EBS volumes provided by EKS, that is, with no storage class set in the values.yaml discussed previously.

Request statistics

Standard EBS Volumes Results - request statistics

Response time statistics

Standard EBS Volumes Results - response time statistics

NetApp ONTAP NVMe Volumes Results

These are the locust benchmark results for NetApp ONTAP NVMe volumes, that is, with the storage classes in the values.yaml discussed previously set to `protection-gold6`.

Request statistics

NetAPP ONTAP NVME volumes results - request statistics

Response time statistics

NetAPP ONTAP NVME volumes results - response time statistics

Conclusion

By following this tutorial, you can evaluate the interoperability between Hopsworks 4.x and Amazon FSx for NetApp ONTAP. Our results demonstrate successful integration with topology-aware NVMe provisioning, without compromising performance. The benchmark metrics remain unchanged in both scenarios (standard EBS gp3 disks for EKS and Amazon FSx for NetApp ONTAP NVMe disks), underscoring that the primary advantages lie in operational simplicity and more efficient storage management rather than raw speed. Consequently, Amazon FSx for NetApp ONTAP provides more reliable volume provisioning while maintaining the expected performance for a Hopsworks deployment.
