Module 4 - Cluster Operations

Overview

In this module you will be introduced to some standard operational procedures. You will learn how to run multiple GlusterFS Trusted Storage Pools on OpenShift and how to expand and maintain deployments.

Herein, we will use the terms pool (GlusterFS terminology) and cluster (heketi terminology) interchangeably.

This module requires that you have completed Module 2.

Running multiple storage pools

In the previous modules a single GlusterFS cluster was used to supply PersistentVolumes to applications. CNS allows for multiple clusters to run in a single OpenShift deployment, controlled by a central heketi API.

There are several use cases for this:

  1. Provide data isolation between clusters of different tenants

  2. Provide multiple performance tiers of CNS, i.e. HDD-based vs. SSD-based

  3. Run OpenShift across large geographical distances with one CNS cluster per region, where latency would otherwise prohibit synchronous data replication in a stretched setup

Note

The procedure to add an additional CNS cluster to an existing setup is not yet supported by openshift-ansible.

Because we cannot use openshift-ansible for this today, we need to run a couple of steps manually that would otherwise be automated.

To deploy a second CNS cluster, aka GlusterFS pool, follow these steps:

⇨ Log in as operator to namespace app-storage

oc login -u operator -n app-storage

Your deployment has 6 OpenShift Application Nodes in total; node-1, node-2 and node-3 are currently set up to run CNS. We will now set up a second CNS cluster using node-4, node-5 and node-6.

First we need to make sure the firewall on those systems is updated. Without openshift-ansible automating the CNS deployment, the ports necessary for running GlusterFS are not yet open.

⇨ First, create a file called configure-firewall.yml and copy&paste the following contents:

configure-firewall.yml:

---

- hosts:
    - node-4.lab
    - node-5.lab
    - node-6.lab

  tasks:

    - name: insert iptables rules required for GlusterFS
      blockinfile:
        dest: /etc/sysconfig/iptables
        block: |
          -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24007 -j ACCEPT
          -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 24008 -j ACCEPT
          -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2222 -j ACCEPT
          -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m multiport --dports 49152:49664 -j ACCEPT
        insertbefore: "^COMMIT"

    - name: reload iptables
      systemd:
        name: iptables
        state: reloaded

...

⇨ Run this small Ansible playbook to apply and reload the firewall configuration on all 3 nodes conveniently:

ansible-playbook configure-firewall.yml

The playbook should complete successfully:

PLAY [node-4.lab,node-5.lab,node-6.lab] ******************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************
Sunday 24 September 2017  14:02:50 +0000 (0:00:00.056)       0:00:00.056 ******
ok: [node-5.lab]
ok: [node-6.lab]
ok: [node-4.lab]

TASK [insert iptables rules required for GlusterFS] ******************************************************************************
Sunday 24 September 2017  14:02:51 +0000 (0:00:00.859)       0:00:00.916 ******
changed: [node-4.lab]
changed: [node-5.lab]
changed: [node-6.lab]

TASK [reload iptables] ***********************************************************************************************************
Sunday 24 September 2017  14:02:51 +0000 (0:00:00.268)       0:00:01.184 ******
changed: [node-6.lab]
changed: [node-5.lab]
changed: [node-4.lab]

PLAY RECAP ***********************************************************************************************************************
node-4.lab                 : ok=3    changed=2    unreachable=0    failed=0
node-5.lab                 : ok=3    changed=2    unreachable=0    failed=0
node-6.lab                 : ok=3    changed=2    unreachable=0    failed=0

Sunday 24 September 2017  14:02:52 +0000 (0:00:00.334)       0:00:01.519 ******
===============================================================================
Gathering Facts --------------------------------------------------------- 0.86s
reload iptables --------------------------------------------------------- 0.34s
insert iptables rules required for GlusterFS ---------------------------- 0.27s

⇨ Next, we need to apply additional labels to the remaining 3 OpenShift Nodes:

oc label node/node-4.lab glusterfs=storage-host
oc label node/node-5.lab glusterfs=storage-host
oc label node/node-6.lab glusterfs=storage-host

This label controls GlusterFS pod placement and availability. It matches a node selector in the DaemonSet definition, which looks for hosts carrying this particular label.
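As a sketch of the mechanism, the relevant part of such a DaemonSet looks like the fragment below. Field values here are illustrative, not copied from this deployment; inspect the real object with oc get daemonset -n app-storage -o yaml.

```yaml
# Abbreviated, illustrative DaemonSet fragment: the node selector is what
# ties pod placement to the glusterfs=storage-host label applied above.
kind: DaemonSet
spec:
  template:
    spec:
      nodeSelector:
        glusterfs: storage-host   # schedule a pod only on nodes carrying this label
```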

⇨ Wait for all pods to show 1/1 in the READY column:

oc get pods -o wide -n app-storage

You can also watch the additional GlusterFS pods deploy in the OpenShift UI: while logged in as operator in project app-storage, select Applications from the left menu and then Pods:

CNS Deployment

Note

It may take up to 3 minutes for the GlusterFS pods to transition into READY state.

⇨ When done, on the CLI display all GlusterFS pods alongside the name of the container host they are running on:

oc get pods -o wide -n app-storage -l glusterfs=storage-pod

You will see that app nodes node-4, node-5 and node-6 now also run GlusterFS pods, although they are uninitialized and not yet usable by CNS.

For a manual bulk import of new nodes like this, a JSON topology file is used, which includes the existing cluster as well as the new, second cluster with a separate set of nodes.

⇨ Create a new file named 2-clusters-topology.json with the content below (use copy&paste):

2-clusters-topology.json:

{
    "clusters": [
        {
            "nodes": [
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-1.lab"
                            ],
                            "storage": [
                                "10.0.2.201"
                            ]
                        },
                        "zone": 1
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-2.lab"
                            ],
                            "storage": [
                                "10.0.3.202"
                            ]
                        },
                        "zone": 2
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-3.lab"
                            ],
                            "storage": [
                                "10.0.4.203"
                            ]
                        },
                        "zone": 3
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                }
            ]
        },
        {
            "nodes": [
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-4.lab"
                            ],
                            "storage": [
                                "10.0.2.204"
                            ]
                        },
                        "zone": 1
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-5.lab"
                            ],
                            "storage": [
                                "10.0.3.205"
                            ]
                        },
                        "zone": 2
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-6.lab"
                            ],
                            "storage": [
                                "10.0.4.206"
                            ]
                        },
                        "zone": 3
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                }
            ]
        }
    ]
}

The file contains the same content as the dynamically generated JSON structure openshift-ansible used, but with a second cluster specification appended.
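Before loading the file you can sanity-check its structure with jq (assuming jq is installed); for this file it should report 2 clusters with 3 nodes each. Shown here against a canned skeleton so the expected output is unambiguous; run the same filter against 2-clusters-topology.json.

```shell
# Count clusters, then nodes per cluster, in a topology file.
# Against the real file: jq '(.clusters | length), (.clusters[].nodes | length)' 2-clusters-topology.json
jq '(.clusters | length), (.clusters[].nodes | length)' <<'EOF'
{"clusters":[{"nodes":[{},{},{}]},{"nodes":[{},{},{}]}]}
EOF
# prints: 2, then 3 and 3 (one number per line)
```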

When this topology is loaded into heketi, it will recognize the existing cluster (leaving it unchanged) and start creating the new one, using the same bootstrapping process that initialized the first cluster.
That is: the glusterd processes running in the pods will form a new 3-node cluster and the supplied block storage device /dev/xvdc will be formatted.

⇨ Prepare the heketi CLI tool as previously done in Module 2.

HEKETI_POD=$(oc get pods -l glusterfs=heketi-storage-pod -n app-storage -o jsonpath="{.items[0].metadata.name}")
export HEKETI_CLI_SERVER=http://$(oc get route/heketi-storage -o jsonpath='{.spec.host}')
export HEKETI_CLI_USER=admin
export HEKETI_CLI_KEY=$(oc get pod/$HEKETI_POD -o jsonpath='{.spec.containers[0].env[?(@.name=="HEKETI_ADMIN_KEY")].value}')

⇨ Verify there is currently only a single cluster known to heketi

heketi-cli cluster list

Example output:

Clusters:
fb67f97166c58f161b85201e1fd9b8ed

Your ID will be different since it’s auto-generated.

⇨ Save the ID of the first cluster into an environment variable with this shell command (using the versatile jq JSON parser):

FIRST_CNS_CLUSTER=$(heketi-cli cluster list --json | jq -r '.clusters[0]')

Important

Do not skip the step above. The value in the environment variable FIRST_CNS_CLUSTER is required later in this module.

⇨ Load the new topology with the heketi client

heketi-cli topology load --json=2-clusters-topology.json

You should see output similar to the following:

Found node node-1.lab on cluster fb67f97166c58f161b85201e1fd9b8ed
Found device /dev/xvdc
Found node node-2.lab on cluster fb67f97166c58f161b85201e1fd9b8ed
Found device /dev/xvdc
Found node node-3.lab on cluster fb67f97166c58f161b85201e1fd9b8ed
Found device /dev/xvdc
Creating cluster ... ID: 46b205a4298c625c4bca2206b7a82dd3
Creating node node-4.lab ... ID: 604d2eb15a5ca510ff3fc5ecf912d3c0
Adding device /dev/xvdc ... OK
Creating node node-5.lab ... ID: 538b860406870288af23af0fbc2cd27f
Adding device /dev/xvdc ... OK
Creating node node-6.lab ... ID: 7736bd0cb6a84540860303a6479cacb2
Adding device /dev/xvdc ... OK

As the output above indicates, a new cluster has been created.

⇨ List all clusters:

heketi-cli cluster list

You should see a second cluster in the list:

Clusters:
46b205a4298c625c4bca2206b7a82dd3
fb67f97166c58f161b85201e1fd9b8ed

The second cluster, in this example with the ID 46b205a4298c625c4bca2206b7a82dd3, is an entirely independent GlusterFS deployment. The exact value will be different in your environment.

heketi can now direct storage provisioning requests to a specific cluster whenever that cluster's UUID is supplied with the request.

⇨ Save the UUID of the second CNS cluster in an environment variable as follows for easy copy&paste later:

SECOND_CNS_CLUSTER=$(heketi-cli cluster list --json | jq -r ".clusters[] | select(contains(\"$FIRST_CNS_CLUSTER\") | not)")
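If jq is available in your shell, you can convince yourself what the select(contains(...) | not) filter does by running it against canned JSON; the IDs below are the example values from this module.

```shell
# Demonstrate the filter on static input: pick every cluster ID that is
# NOT the first cluster's ID. IDs are the examples used in this module.
FIRST_CNS_CLUSTER=fb67f97166c58f161b85201e1fd9b8ed
echo '{"clusters":["46b205a4298c625c4bca2206b7a82dd3","fb67f97166c58f161b85201e1fd9b8ed"]}' \
  | jq -r ".clusters[] | select(contains(\"$FIRST_CNS_CLUSTER\") | not)"
# prints: 46b205a4298c625c4bca2206b7a82dd3
```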

Now we have two independent GlusterFS clusters managed by the same heketi instance:

                 Nodes                    Cluster UUID
First Cluster    node-1, node-2, node-3   fb67f97166c58f161b85201e1fd9b8ed
Second Cluster   node-4, node-5, node-6   46b205a4298c625c4bca2206b7a82dd3

⇨ Query the updated topology:

heketi-cli topology info

Abbreviated output:

Cluster Id: 46b205a4298c625c4bca2206b7a82dd3

    Volumes:

    Nodes:

      Node Id: 538b860406870288af23af0fbc2cd27f
      State: online
      Cluster Id: 46b205a4298c625c4bca2206b7a82dd3
      Zone: 2
      Management Hostname: node-5.lab
      Storage Hostname: 10.0.3.205
      Devices:
        Id:e481d022cea9bfb11e8a86c0dd8d3499   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499
            Bricks:

      Node Id: 604d2eb15a5ca510ff3fc5ecf912d3c0
      State: online
      Cluster Id: 46b205a4298c625c4bca2206b7a82dd3
      Zone: 1
      Management Hostname: node-4.lab
      Storage Hostname: 10.0.2.204
      Devices:
        Id:09a25a114c53d7669235b368efd2f8d1   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499
            Bricks:

      Node Id: 7736bd0cb6a84540860303a6479cacb2
      State: online
      Cluster Id: 46b205a4298c625c4bca2206b7a82dd3
      Zone: 3
      Management Hostname: node-6.lab
      Storage Hostname: 10.0.4.206
      Devices:
        Id:cccadb2b54dccd99f698d2ae137a22ff   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499
            Bricks:

Cluster Id: fb67f97166c58f161b85201e1fd9b8ed

[...output omitted for brevity...]

heketi formed a new, independent 3-node GlusterFS cluster on those nodes.

⇨ Check running GlusterFS pods

oc get pods -o wide -l glusterfs=storage-pod

From the output you can spot the pod names running on the new cluster’s nodes:

NAME              READY     STATUS    RESTARTS   AGE       IP           NODE
glusterfs-1nvtj   1/1       Running   0          23m       10.0.4.206   node-6.lab
glusterfs-5gvw8   1/1       Running   0          24m       10.0.2.204   node-4.lab
glusterfs-5rc2g   1/1       Running   0          4h        10.0.2.201   node-1.lab
glusterfs-b4wg1   1/1       Running   0          24m       10.0.3.205   node-5.lab
glusterfs-jbvdk   1/1       Running   0          4h        10.0.3.202   node-2.lab
glusterfs-rchtr   1/1       Running   0          4h        10.0.4.203   node-3.lab

Note

Again, note that the pod names are dynamically generated and will be different. Look at the FQDN of your hosts to determine one of the new cluster's pods.

⇨ Let’s run the gluster peer status command in the GlusterFS pod running on the node node-6.lab:

POD_NUMBER_SIX=$(oc get pods -o jsonpath='{.items[?(@.status.hostIP=="10.0.4.206")].metadata.name}')
oc rsh $POD_NUMBER_SIX gluster peer status

As expected, this node has only 2 peers, evidence that it's running in its own GlusterFS pool, separate from the first cluster deployed in Module 2.

Number of Peers: 2

Hostname: node-5.lab
Uuid: 0db9b5d0-7fa8-4d2f-8b9e-6664faf34606
State: Peer in Cluster (Connected)
Other names:
10.0.3.205

Hostname: node-4.lab
Uuid: 695b661d-2a55-4f94-b22e-40a9db79c01a
State: Peer in Cluster (Connected)

Before you can use the second cluster, two tasks have to be accomplished so that both clusters can be addressed distinctly:

  1. The StorageClass for the first cluster has to be updated to point to the first cluster's UUID,

  2. A second StorageClass for the second cluster has to be created, pointing to the same heketi API

Why do we need to update the first StorageClass?

When no cluster UUID is specified, heketi serves volume creation requests from any cluster currently registered to it; that would now be either of the two.
To request a volume from a specific cluster you have to supply that cluster's UUID to heketi. This is done via a parameter in the StorageClass. The first StorageClass has no UUID specified so far because openshift-ansible does not set one.

Unfortunately you cannot oc patch a StorageClass's parameters in OpenShift, so we have to delete it and re-create it. Don't worry: existing PVCs will remain untouched.

To simplify our work, instead of typing JSON/YAML by hand, we will export the current StorageClass definition as JSON and manipulate it using jq, with a query that puts the additional clusterid parameter in the right place.

⇨ To do that, run the following command via copy&paste:

oc get storageclass/glusterfs-storage -o json \
| jq ".parameters=(.parameters + {\"clusterid\": \"$FIRST_CNS_CLUSTER\"})" > glusterfs-storage-fast.json

This will result in a file named glusterfs-storage-fast.json looking like the following:

glusterfs-storage-fast.json:

{
  "apiVersion": "storage.k8s.io/v1",
  "kind": "StorageClass",
  "metadata": {
    "creationTimestamp": "2017-09-24T12:45:24Z",
    "name": "glusterfs-storage",
    "resourceVersion": "2697",
    "selfLink": "/apis/storage.k8s.io/v1/storageclasses/glusterfs-storage",
    "uid": "3c107010-a126-11e7-b0a5-025fcde0880f"
  },
  "parameters": {
    "resturl": "http://heketi-storage-app-storage.cloudapps.34.252.58.209.nip.io",
    "restuser": "admin",
    "secretName": "heketi-storage-admin-secret",
    "secretNamespace": "app-storage",
    "clusterid": "fb67f97166c58f161b85201e1fd9b8ed"
  },
  "provisioner": "kubernetes.io/glusterfs"
}

Note the additional clusterid parameter. It's the first cluster's UUID as known by heketi. The exact values will be different in your environment. The rest of the definition remains the same.
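The jq expression used above is plain object merging; assuming jq is installed, you can see it in isolation on a minimal canned input (the UUID is the example value from this module):

```shell
# jq's "+" merges two objects; keys from the right-hand object win on conflict.
echo '{"parameters":{"restuser":"admin"}}' \
  | jq -c '.parameters=(.parameters + {"clusterid": "fb67f97166c58f161b85201e1fd9b8ed"})'
# prints: {"parameters":{"restuser":"admin","clusterid":"fb67f97166c58f161b85201e1fd9b8ed"}}
```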

⇨ Delete the existing StorageClass definition in OpenShift

oc delete storageclass/glusterfs-storage

⇨ Add the StorageClass again:

oc create -f glusterfs-storage-fast.json

Step 1 complete. The existing StorageClass is “updated”. PVCs using the StorageClass glusterfs-storage will now be served specifically by the first CNS cluster, and only the first cluster.

To relieve you from manually editing JSON files, we will again use some jq magic to generate the correct JSON structure for our second StorageClass, this time using the second CNS cluster's ID and a different name, glusterfs-storage-slow.

⇨ Run the following command:

oc get storageclass/glusterfs-storage -o json \
| jq ".parameters=(.parameters + {\"clusterid\": \"$SECOND_CNS_CLUSTER\"})" \
| jq '.metadata.name = "glusterfs-storage-slow"' > glusterfs-storage-slow.json

This creates a file called glusterfs-storage-slow.json, looking similar to the below:

glusterfs-storage-slow.json:

{
  "apiVersion": "storage.k8s.io/v1",
  "kind": "StorageClass",
  "metadata": {
    "creationTimestamp": "2017-09-24T15:12:34Z",
    "name": "glusterfs-storage-slow",
    "resourceVersion": "12722",
    "selfLink": "/apis/storage.k8s.io/v1/storageclasses/glusterfs-storage",
    "uid": "cb16946d-a13a-11e7-b0a5-025fcde0880f"
  },
  "parameters": {
    "clusterid": "46b205a4298c625c4bca2206b7a82dd3",
    "resturl": "http://heketi-storage-app-storage.cloudapps.34.252.58.209.nip.io",
    "restuser": "admin",
    "secretName": "heketi-storage-admin-secret",
    "secretNamespace": "app-storage"
  },
  "provisioner": "kubernetes.io/glusterfs"
}

Again note the clusterid in the parameters section referencing the second cluster's UUID, as well as the updated name in the metadata section.

⇨ Add the new StorageClass:

oc create -f glusterfs-storage-slow.json

This creates the StorageClass named glusterfs-storage-slow, and because we copied the settings from the first one it is now also marked as the system-wide default (yes, OpenShift allows more than one default).

⇨ Use the oc patch command to fix this:

oc patch storageclass glusterfs-storage-slow \
-p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'

⇨ Display all StorageClass objects to verify:

oc get storageclass

NAME                          TYPE
glusterfs-storage (default)   kubernetes.io/glusterfs
glusterfs-storage-slow        kubernetes.io/glusterfs

That’s it. You now have 2 StorageClass definitions, one for each CNS cluster, managed by the same heketi instance.

Let’s verify both StorageClasses are working as expected:

⇨ Create the following two files containing PVCs issued against either of both GlusterFS pools via their respective StorageClass:

cns-pvc-fast.yml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-fast-container-storage
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: glusterfs-storage

cns-pvc-slow.yml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-slow-container-storage
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 7Gi
  storageClassName: glusterfs-storage-slow

⇨ Create both PVCs:

oc create -f cns-pvc-fast.yml
oc create -f cns-pvc-slow.yml

Check their provisioning state after a few seconds:

oc get pvc

They should both be in bound state after a couple of seconds:

NAME                        STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS             AGE
my-fast-container-storage   Bound     pvc-bfbf3f72-a13d-11e7-b0a5-025fcde0880f   5Gi        RWX           glusterfs-storage        6s
my-slow-container-storage   Bound     pvc-c045c082-a13d-11e7-b0a5-025fcde0880f   7Gi        RWX           glusterfs-storage-slow   6s

⇨ If you check the GlusterFS pod on node-6.lab again, running as part of the second cluster…

oc rsh $POD_NUMBER_SIX gluster vol list

…you will see that a new volume has been created:

vol_755b4434cf9062104123e0d9919dd800

The other volume has been created on the first cluster.

⇨ If you were to check GlusterFS on node-1.lab

POD_NUMBER_ONE=$(oc get pods -o jsonpath='{.items[?(@.status.hostIP=="10.0.2.201")].metadata.name}')
oc rsh $POD_NUMBER_ONE gluster vol list

…you will also see a new volume has been created, alongside the volumes from the previous exercises and the heketidbstorage volume:

heketidbstorage
[...output omitted... ]
vol_8eb957320215fe8801748b239d524808

If you now compare the PV objects that have been created:

⇨ … the first PV using StorageClass glusterfs-storage:

FAST_PV=$(oc get pvc/my-fast-container-storage -o jsonpath="{.spec.volumeName}")
oc get pv/$FAST_PV -o jsonpath="{.spec.glusterfs.path}"

⇨ … and the second PV using StorageClass glusterfs-storage-slow:

SLOW_PV=$(oc get pvc/my-slow-container-storage -o jsonpath="{.spec.volumeName}")
oc get pv/$SLOW_PV -o jsonpath="{.spec.glusterfs.path}"

… you will notice that they match the volumes found in the CNS clusters from within their pods respectively.

This is how you use multiple, parallel GlusterFS pools/clusters on a single OpenShift cluster with a single heketi instance. Whereas the first pool is created with openshift-ansible, subsequent pools/clusters are created with the heketi-cli client.

Clean up the PVCs and the second StorageClass in preparation for the next section.

⇨ Delete both PVCs (and therefore their volumes)

oc delete pvc/my-fast-container-storage
oc delete pvc/my-slow-container-storage

⇨ Delete the second StorageClass

oc delete storageclass/glusterfs-storage-slow

Deleting a CNS cluster

Since we want to re-use node-4, node-5 and node-6 in the next section, we need to delete the GlusterFS pool running on top of them first.

This is a process that involves multiple steps of manipulating the heketi topology with the heketi-cli client.

⇨ Make sure the client is still properly configured via environment variables:

echo $HEKETI_CLI_SERVER
echo $HEKETI_CLI_USER
echo $HEKETI_CLI_KEY

⇨ We also require the environment variables storing the UUIDs of both CNS clusters:

echo $FIRST_CNS_CLUSTER
echo $SECOND_CNS_CLUSTER

⇨ First display the entire system topology as it is known to heketi:

heketi-cli topology info

You will get detailed information about both clusters.
The portions of interest for the second cluster, which we are about to delete, are shown below:

Cluster Id: 46b205a4298c625c4bca2206b7a82dd3

    Volumes:

    Nodes:

        Node Id: 538b860406870288af23af0fbc2cd27f
        State: online
        Cluster Id: 46b205a4298c625c4bca2206b7a82dd3
        Zone: 2
        Management Hostname: node-5.lab
        Storage Hostname: 10.0.3.205
        Devices:
            Id:e481d022cea9bfb11e8a86c0dd8d3499   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499
                Bricks:

        Node Id: 604d2eb15a5ca510ff3fc5ecf912d3c0
        State: online
        Cluster Id: 46b205a4298c625c4bca2206b7a82dd3
        Zone: 1
        Management Hostname: node-4.lab
        Storage Hostname: 10.0.2.204
        Devices:
            Id:09a25a114c53d7669235b368efd2f8d1   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499
                Bricks:

        Node Id: 7736bd0cb6a84540860303a6479cacb2
        State: online
        Cluster Id: 46b205a4298c625c4bca2206b7a82dd3
        Zone: 3
        Management Hostname: node-6.lab
        Storage Hostname: 10.0.4.206
        Devices:
            Id:cccadb2b54dccd99f698d2ae137a22ff   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499
            Bricks:

The hierarchical dependencies in this topology are: Clusters > Nodes > Devices.
Assuming no volumes are present, these objects need to be deleted in reverse order: devices first, then nodes, then the cluster.
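Put together, the whole teardown is three steps in reverse hierarchy order. The following dry-run version echoes the commands with placeholder IDs instead of executing them, so you can inspect the shape of the procedure before running the real steps below (with the real $NODES and $DEVICES values built in this section):

```shell
# Dry run of the teardown order with placeholder IDs.
# Remove the leading "echo" on each heketi-cli call to execute for real.
DEVICES="device-id-1 device-id-2 device-id-3"
NODES="node-id-1 node-id-2 node-id-3"
CLUSTER_ID="cluster-id"

for device in $DEVICES; do echo heketi-cli device delete $device; done
for node in $NODES;     do echo heketi-cli node delete $node;     done
echo heketi-cli cluster delete $CLUSTER_ID
```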

To make navigating this process easier and avoid juggling anonymous UUID values by hand, we will use some simple scripting.

⇨ This is how you get all node IDs of the second cluster:

heketi-cli cluster info $SECOND_CNS_CLUSTER --json | jq -r '.nodes[]'

For example:

538b860406870288af23af0fbc2cd27f
604d2eb15a5ca510ff3fc5ecf912d3c0
7736bd0cb6a84540860303a6479cacb2

⇨ Let’s put this in a variable so we can iterate over it:

NODES=$(heketi-cli cluster info $SECOND_CNS_CLUSTER --json | jq -r '.nodes[]')

⇨ This is how you get information about a node, for example the first one in the list:

heketi-cli node info $(echo $NODES | cut -d' ' -f1)

Example output:

Node Id: 538b860406870288af23af0fbc2cd27f
State: online
Cluster Id: 46b205a4298c625c4bca2206b7a82dd3
Zone: 2
Management Hostname: node-5.lab
Storage Hostname: 10.0.3.205
Devices:
Id:e481d022cea9bfb11e8a86c0dd8d3499   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499

⇨ Let’s iterate over the NODES list and extract all device IDs:

for node in ${NODES} ; do heketi-cli node info $node --json | jq -r '.devices[].id' ; done

… for example:

e481d022cea9bfb11e8a86c0dd8d3499
09a25a114c53d7669235b368efd2f8d1
cccadb2b54dccd99f698d2ae137a22ff

⇨ Let’s put this in a variable too, so we can easily iterate over it:

DEVICES=$(for node in ${NODES} ; do heketi-cli node info $node --json | jq -r '.devices[].id' ; done)

⇨ Let’s loop over the DEVICES list and delete each device by its ID in heketi:

for device in $DEVICES ; do heketi-cli device delete $device ; done

Example output:

Device e481d022cea9bfb11e8a86c0dd8d3499 deleted
Device 09a25a114c53d7669235b368efd2f8d1 deleted
Device cccadb2b54dccd99f698d2ae137a22ff deleted

⇨ Since the nodes no longer have any devices, we can delete them as well (you can’t delete a node with a device still attached):

for node in ${NODES} ; do heketi-cli node delete $node ; done

Example output:

Node 538b860406870288af23af0fbc2cd27f deleted
Node 604d2eb15a5ca510ff3fc5ecf912d3c0 deleted
Node 7736bd0cb6a84540860303a6479cacb2 deleted

⇨ Finally, without any nodes in the second cluster, you can also delete it (it won’t work if there are nodes left):

heketi-cli cluster delete $SECOND_CNS_CLUSTER

⇨ Confirm the cluster is gone:

heketi-cli cluster list

⇨ Verify that the topology known to heketi now contains only a single cluster:

heketi-cli topology info

This deleted all heketi database entries about the cluster. However, the GlusterFS pods are still running, since they are controlled directly by OpenShift via the DaemonSet.

They can be stopped by removing the labels OpenShift uses to determine GlusterFS pod placement for CNS.

⇨ Remove the labels from the last 3 OpenShift nodes like so:

oc label node/node-4.lab glusterfs-
oc label node/node-5.lab glusterfs-
oc label node/node-6.lab glusterfs-

Although the output of these commands says “labeled”, the glusterfs label is actually removed (the trailing minus sign requests removal).

⇨ Verify that all GlusterFS pods running on node-4, node-5 and node-6 are indeed terminated:

oc get pods -o wide -n app-storage -l glusterfs=storage-pod

Note

It can take up to 2 minutes for the pods to terminate.

You should be back down to 3 GlusterFS pods, e.g.

NAME              READY     STATUS    RESTARTS   AGE       IP           NODE
glusterfs-5rc2g   1/1       Running   0          5h        10.0.2.201   node-1.lab
glusterfs-jbvdk   1/1       Running   0          5h        10.0.3.202   node-2.lab
glusterfs-rchtr   1/1       Running   0          5h        10.0.4.203   node-3.lab

Expanding a GlusterFS pool

Instead of creating additional GlusterFS pools in CNS on OpenShift, it is also possible to expand existing pools. This is useful to increase the capacity, performance and resiliency of the storage system.

This works similarly to creating additional pools: a bulk import via the topology file, only this time with nodes added to the existing cluster structure in the JSON.

Since manipulating JSON by hand is error-prone, create a new file called expanded-cluster.json with the contents below:

expanded-cluster.json:

{
    "clusters": [
        {
            "nodes": [
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-1.lab"
                            ],
                            "storage": [
                                "10.0.2.201"
                            ]
                        },
                        "zone": 1
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-2.lab"
                            ],
                            "storage": [
                                "10.0.3.202"
                            ]
                        },
                        "zone": 2
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-3.lab"
                            ],
                            "storage": [
                                "10.0.4.203"
                            ]
                        },
                        "zone": 3
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-4.lab"
                            ],
                            "storage": [
                                "10.0.2.204"
                            ]
                        },
                        "zone": 1
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-5.lab"
                            ],
                            "storage": [
                                "10.0.3.205"
                            ]
                        },
                        "zone": 2
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                },
                {
                    "node": {
                        "hostnames": {
                            "manage": [
                                "node-6.lab"
                            ],
                            "storage": [
                                "10.0.4.206"
                            ]
                        },
                        "zone": 3
                    },
                    "devices": [
                        "/dev/xvdc"
                    ]
                }
            ]
        }
    ]
}

The difference between this file and 2-clusters-topology.json is that here all 6 nodes belong to a single cluster, instead of 2 clusters with 3 nodes each.

⇨ Again, apply the expected labels to the remaining 3 OpenShift Nodes:

oc label node/node-4.lab glusterfs=storage-host
oc label node/node-5.lab glusterfs=storage-host
oc label node/node-6.lab glusterfs=storage-host

⇨ Wait for all pods to show 1/1 in the READY column:

oc get pods -o wide -n app-storage -l glusterfs=storage-pod

Note

It may take up to 3 minutes for the GlusterFS pods to transition into READY state.
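Instead of re-running oc get pods by hand, newer oc clients (3.11 and later) can block until the pods are ready. This is a sketch; the oc stub at the top only echoes the command so it can be dry-run anywhere (remove it against a real cluster):

```shell
# Stub for dry-running only -- delete this line on a real cluster.
oc() { echo "oc $*"; }

# Block for up to 5 minutes until every pod matching the label is Ready.
oc wait --for=condition=Ready pod -l glusterfs=storage-pod \
    -n app-storage --timeout=300s
```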

This confirms all GlusterFS pods are ready to receive remote commands:

NAME              READY     STATUS    RESTARTS   AGE       IP           NODE
glusterfs-0lr75   1/1       Running   0          4m        10.0.4.106   node-6.lab
glusterfs-1dxz3   1/1       Running   0          4m        10.0.3.105   node-5.lab
glusterfs-5rc2g   1/1       Running   0          5h        10.0.2.101   node-1.lab
glusterfs-8nrn0   1/1       Running   0          4m        10.0.2.104   node-4.lab
glusterfs-jbvdk   1/1       Running   0          5h        10.0.3.102   node-2.lab
glusterfs-rchtr   1/1       Running   0          5h        10.0.4.103   node-3.lab

⇨ Ensure the environment variables for operating heketi-cli are still in place:

echo $HEKETI_CLI_SERVER
echo $HEKETI_CLI_USER
echo $HEKETI_CLI_KEY
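If any of these come back empty, re-export them before continuing. A small helper can make the check explicit (a plain bash sketch; require_vars is a name made up for this example):

```shell
# Report which of the named variables are unset or empty.
# Returns non-zero if anything is missing.
require_vars() {
    local rc=0 v
    for v in "$@"; do
        if [ -z "${!v}" ]; then    # bash indirect expansion
            echo "missing: $v" >&2
            rc=1
        fi
    done
    return $rc
}

require_vars HEKETI_CLI_SERVER HEKETI_CLI_USER HEKETI_CLI_KEY \
    || echo "re-export the heketi-cli environment before continuing"
```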

⇨ Now load the new topology:

heketi-cli topology load --json=expanded-cluster.json

The output indicates that the existing cluster was expanded rather than a new one created:

Found node node-1.lab on cluster fb67f97166c58f161b85201e1fd9b8ed
    Found device /dev/xvdc
Found node node-2.lab on cluster fb67f97166c58f161b85201e1fd9b8ed
    Found device /dev/xvdc
Found node node-3.lab on cluster fb67f97166c58f161b85201e1fd9b8ed
    Found device /dev/xvdc
Creating node node-4.lab ... ID: 544158e53934a3d351b874b7d915e8d4
    Adding device /dev/xvdc ... OK
Creating node node-5.lab ... ID: 645b6edd4044cb1dd828f728d1c3eb81
    Adding device /dev/xvdc ... OK
Creating node node-6.lab ... ID: 3f39ebf3c8c82531a7ba447135742776
    Adding device /dev/xvdc ... OK

⇨ Verify that the new peers are now part of the first CNS cluster:

POD_NUMBER_ONE=$(oc get pods -o jsonpath='{.items[?(@.status.hostIP=="10.0.2.201")].metadata.name}')
oc rsh $POD_NUMBER_ONE gluster peer status

You should now have a GlusterFS pool consisting of 6 nodes:

Number of Peers: 5

Hostname: 10.0.3.202
Uuid: c6a6d571-fd9b-4bd8-aade-e480ec2f8eed
State: Peer in Cluster (Connected)

Hostname: 10.0.4.203
Uuid: 46044d06-a928-49c6-8427-a7ab37268fed
State: Peer in Cluster (Connected)

Hostname: 10.0.2.204
Uuid: 62abb8b9-7a68-4658-ac84-8098a1460703
State: Peer in Cluster (Connected)

Hostname: 10.0.3.205
Uuid: 5b44b6ea-6fb5-4ea9-a6f7-328179dc6dda
State: Peer in Cluster (Connected)

Hostname: 10.0.4.206
Uuid: ed39ecf7-1f5c-4934-a89d-ee1dda9a8f98
State: Peer in Cluster (Connected)

With this you have expanded the existing pool. New PVCs will start to use capacity from the additional nodes.

Important

In this lab, with this expansion, you now have a GlusterFS pool with mixed media types (differing in both size and speed). It is recommended to use the same media type throughout a pool.
If you would like to offer multiple media types for CNS in OpenShift, use separate pools and separate StorageClass objects as described in the previous section.


Adding a device to a node#

Instead of adding entirely new nodes you can also add new storage devices for CNS to use on existing nodes.

It is again possible to do this by loading an updated topology file. As an alternative to bulk-loading via JSON, you can also do this directly with the heketi-cli utility; the same applies to the previous sections in this module.
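For illustration, here is how one of the new nodes could have been registered directly, without a topology file. The cluster ID and addresses mirror the lab's values; the heketi-cli stub at the top only echoes the commands so the sketch can be dry-run anywhere (remove it in a real session):

```shell
# Stub for dry-running only -- delete this line to talk to a real heketi server.
heketi-cli() { echo "heketi-cli $*"; }

CLUSTER_ID=fb67f97166c58f161b85201e1fd9b8ed    # from 'heketi-cli cluster list'

# Register the node in the existing cluster ...
heketi-cli node add --zone=1 --cluster=$CLUSTER_ID \
    --management-host-name=node-4.lab --storage-host-name=10.0.2.204

# ... then hand its block device over to heketi. The node ID comes from
# the output of 'node add'; the value below is a placeholder.
NODE_ID="id-printed-by-node-add"
heketi-cli device add --node="$NODE_ID" --name=/dev/xvdc
```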

For this purpose node-6.lab has an additional, so far unused block device /dev/xvdd.

⇨ To use the heketi-cli make sure the environment variables are still set:

echo $HEKETI_CLI_SERVER
echo $HEKETI_CLI_USER
echo $HEKETI_CLI_KEY

⇨ Determine the UUID heketi uses to identify node-6.lab in its database and save it in an environment variable:

NODE_ID_SIX=$(heketi-cli topology info --json | jq -r ".clusters[] | select(.id==\"$FIRST_CNS_CLUSTER\") | .nodes[] | select(.hostnames.manage[0] == \"node-6.lab\") | .id")

⇨ Query the node’s available devices:

heketi-cli node info $NODE_ID_SIX

The node has one device available:

Node Id: 3f39ebf3c8c82531a7ba447135742776
State: online
Cluster Id: eb909a08c8e8fd0bf80499fbbb8a8545
Zone: 3
Management Hostname: node-6.lab
Storage Hostname: 10.0.4.206
Devices:
Id:cc594d7f5ce59ab2a991c70572a0852f   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499

⇨ Add the device /dev/xvdd to the node, using the node ID saved earlier:

heketi-cli device add --node=$NODE_ID_SIX --name=/dev/xvdd

The device is registered in heketi’s database.

Device added successfully

⇨ Query the node’s available devices again and you’ll see a second device.

heketi-cli node info $NODE_ID_SIX

That node now has 2 devices. The new device will be used for subsequent PVCs served by this cluster.

Node Id: 3f39ebf3c8c82531a7ba447135742776
State: online
Cluster Id: eb909a08c8e8fd0bf80499fbbb8a8545
Zone: 3
Management Hostname: node-6.lab
Storage Hostname: 10.0.4.206
Devices:
Id:62cbae7a3f6faac38a551a614419cca3   Name:/dev/xvdd           State:online    Size (GiB):509     Used (GiB):0       Free (GiB):509
Id:cc594d7f5ce59ab2a991c70572a0852f   Name:/dev/xvdc           State:online    Size (GiB):499     Used (GiB):0       Free (GiB):499

Replacing a failed device#

One of heketi’s advantages is the automation of otherwise tedious manual tasks, like replacing a faulty brick in GlusterFS to repair degraded volumes.
We will simulate this use case now.

⇨ Make sure you are operator in OpenShift and using the project my-test-project

oc login -u operator -n my-test-project

⇨ Create the file cns-large-pvc.yml with content below:

cns-large-pvc.yml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: my-large-container-store
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 200Gi
  storageClassName: glusterfs-storage

⇨ Create this request for a large volume:

oc create -f cns-large-pvc.yml

The requested capacity in this PVC is larger than any single device on nodes node-1.lab, node-2.lab and node-3.lab, so the volume will be created on the other 3 nodes, which have larger devices (500 GiB).

We are now going to determine a PVC's physical backing device in CNS. This is done with the following relationships between the various entities of GlusterFS, heketi and OpenShift in mind:

PVC -> PV -> heketi volume -> GlusterFS volume -> GlusterFS brick -> Physical Device

⇨ First, get the PV

oc describe pvc/my-large-container-store

Note the PV's name:

    Name:       my-large-container-store
    Namespace:  my-test-project
    StorageClass:   glusterfs-storage
    Status:     Bound
    Volume:     pvc-078a1698-4f5b-11e7-ac96-1221f6b873f8
    Labels:     <none>
    Capacity:   200Gi
    Access Modes:   RWX
    No events.

⇨ Get the GlusterFS volume name of this PV (use your PV's name here), e.g.

oc describe pv/pvc-078a1698-4f5b-11e7-ac96-1221f6b873f8

The GlusterFS volume name is shown as the Path of the volume source:

    Name:       pvc-078a1698-4f5b-11e7-ac96-1221f6b873f8
    Labels:     <none>
    StorageClass:   glusterfs-storage
    Status:     Bound
    Claim:      my-test-project/my-large-container-store
    Reclaim Policy: Delete
    Access Modes:   RWX
    Capacity:   200Gi
    Message:
    Source:
        Type:       Glusterfs (a Glusterfs mount on the host that shares a pod's lifetime)
        EndpointsName:  glusterfs-dynamic-my-large-container-store
        Path:       vol_3ff9946ddafaabe9745f184e4235d4e1
        ReadOnly:       false
    No events.

Let’s programmatically determine and save the relevant information so you don’t have to type all this by hand.
We need the PV name, the respective GlusterFS volume name, the name of the GlusterFS pod on node-6.lab, and that node’s heketi ID and IP address in environment variables:

LARGE_PV=$(oc get pvc/my-large-container-store -o jsonpath="{.spec.volumeName}")
LARGE_GLUSTER_VOLUME=$(oc get pv/$LARGE_PV -o jsonpath="{.spec.glusterfs.path}")
POD_NUMBER_SIX=$(oc get pods -n app-storage -o jsonpath='{.items[?(@.status.hostIP=="10.0.4.206")].metadata.name}')
NODE_ID_SIX=$(heketi-cli topology info --json | jq -r ".clusters[] | select(.id==\"$FIRST_CNS_CLUSTER\") | .nodes[] | select(.hostnames.manage[0] == \"node-6.lab\") | .id")
NODE_IP_SIX=$(oc get pod/$POD_NUMBER_SIX -n app-storage -o jsonpath="{.status.hostIP}")

echo "LARGE_PV             = $LARGE_PV"
echo "LARGE_GLUSTER_VOLUME = $LARGE_GLUSTER_VOLUME"
echo "POD_NUMBER_SIX       = $POD_NUMBER_SIX"
echo "NODE_ID_SIX          = $NODE_ID_SIX"
echo "NODE_IP_SIX          = $NODE_IP_SIX"

⇨ Change to the CNS namespace

oc project app-storage

⇨ Log on to one of the GlusterFS pods

oc rsh $POD_NUMBER_SIX gluster vol info $LARGE_GLUSTER_VOLUME

The output indicates this volume is indeed backed by, among others, node-6.lab (see Brick3):

Volume Name: vol_3ff9946ddafaabe9745f184e4235d4e1
Type: Replicate
Volume ID: 774ae26f-bd3f-4c06-990b-57012cc5974b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.3.205:/var/lib/heketi/mounts/vg_e1b93823a2906c6758aeec13930a0919/brick_b3d5867d2f86ac93fce6967128643f85/brick
Brick2: 10.0.2.204:/var/lib/heketi/mounts/vg_3c3489a5779c1c840a82a26e0117a415/brick_6323bd816f17c8347b3a68e432501e96/brick
Brick3: 10.0.4.206:/var/lib/heketi/mounts/vg_62cbae7a3f6faac38a551a614419cca3/brick_a6c92b6a07983e9b8386871f5b82497f/brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

⇨ Save the brick directory served by node-6.lab in an environment variable:

BRICK_DIR=$(echo -n $(oc rsh $POD_NUMBER_SIX gluster vol info $LARGE_GLUSTER_VOLUME | grep $NODE_IP_SIX) | cut -d ':' -f 3 | tr -d $'\r' )
echo $BRICK_DIR
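To see what that pipeline does, here is the same extraction applied to a hardcoded sample line (the brick path is shortened for illustration): grep picks the line containing the node's IP, cut takes the third colon-separated field (the path), and tr strips a trailing carriage return that oc rsh may emit:

```shell
# What 'gluster vol info ... | grep $NODE_IP_SIX' would return, shortened:
SAMPLE='Brick3: 10.0.4.206:/var/lib/heketi/mounts/vg_62cb/brick_a6c9/brick'

# Field 1 is 'Brick3', field 2 the IP, field 3 the brick path.
BRICK_DIR=$(echo -n "$SAMPLE" | cut -d ':' -f 3 | tr -d $'\r')
echo "$BRICK_DIR"    # -> /var/lib/heketi/mounts/vg_62cb/brick_a6c9/brick
```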

⇨ Using the full path of the brick you can cross-check with heketi’s topology which device it is based on:

heketi-cli topology info | grep -B2 $BRICK_DIR

Among other data grep will show the physical backing device of this brick’s mount path:

Id:62cbae7a3f6faac38a551a614419cca3   Name:/dev/xvdd           State:online    Size (GiB):509     Used (GiB):201     Free (GiB):308
            Bricks:
                Id:a6c92b6a07983e9b8386871f5b82497f   Size (GiB):200     Path: /var/lib/heketi/mounts/vg_62cbae7a3f6faac38a551a614419cca3/brick_a6c92b6a07983e9b8386871f5b82497f/brick

In this case it’s /dev/xvdd of node-6.lab.

Note

The device might be different for you. This is subject to heketi’s dynamic scheduling.

We will now proceed to disable and delete this device. For that we have to find and use its ID in heketi.

⇨ Save the heketi device ID of the brick on node-6.lab in an environment variable, again leveraging jq to parse the JSON output of heketi-cli topology info:

FAILED_DEVICE_ID=$(heketi-cli topology info --json | jq ".clusters[] | select(.id==\"$FIRST_CNS_CLUSTER\") | .nodes[] | select(.hostnames.manage[0] == \"node-6.lab\") | .devices " | jq -r ".[] | select (.bricks[0].path ==\"$BRICK_DIR\") | .id")

⇨ Check the device ID that you have selected:

echo $FAILED_DEVICE_ID

Let’s assume this device on node-6.lab has failed and needs to be replaced.

In such a case you’ll take the device’s ID and go through the following steps:

⇨ First, disable the device in heketi

heketi-cli device disable $FAILED_DEVICE_ID

This will take the device offline and exclude it from future volume creation requests.

⇨ Now remove the device in heketi

heketi-cli device remove $FAILED_DEVICE_ID

You will notice this command takes a while.
That’s because it triggers a brick replacement in GlusterFS: the command blocks while heketi, in the background, creates a new brick for each brick on the device being removed and transparently replaces the old bricks with the new ones. During this time the data remains accessible.

The new bricks, if possible, will automatically be created in zones different from the remaining bricks to maintain equal balancing and cross-zone availability.

⇨ Finally, you are now able to delete the device in heketi entirely

heketi-cli device delete $FAILED_DEVICE_ID

⇨ Check again the volumes topology directly from GlusterFS

oc rsh $POD_NUMBER_SIX gluster vol info $LARGE_GLUSTER_VOLUME

You will notice that the brick on node-6.lab now has a different mount path, because it is backed by a new device.

⇨ Use the following to programmatically determine the new device heketi used to replace the one you just deleted:

NEW_BRICK_DIR=$(echo -n $(oc rsh $POD_NUMBER_SIX gluster vol info $LARGE_GLUSTER_VOLUME | grep $NODE_IP_SIX) | cut -d ':' -f 3 | tr -d $'\r' )
NEW_DEVICE=$(heketi-cli topology info --json | jq ".clusters[] | select(.id==\"$FIRST_CNS_CLUSTER\") | .nodes[] | select(.hostnames.manage[0] == \"node-6.lab\") | .devices " | jq -r ".[] | select (.bricks[0].path ==\"$NEW_BRICK_DIR\") | .name")

echo $NEW_DEVICE

If you cross-check the new brick's mount path with the heketi topology again you will see it's indeed coming from a different device: the remaining device on node-6.lab, in this case /dev/xvdc.

Tip

Removing an entire node while maintaining volume health is possible in heketi as well. Simply disable, remove and delete all devices of the node in question as discussed above. Then the node itself can be deleted from heketi with heketi-cli node delete <node-uuid>
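Sketched as a dry-run (the device and node IDs are hypothetical, and the heketi-cli stub only echoes each command instead of contacting a real server):

```shell
# Stub for dry-running only -- delete this line to talk to a real heketi server.
heketi-cli() { echo "heketi-cli $*"; }

NODE_ID=3f39ebf3c8c82531a7ba447135742776    # heketi node to retire
DEVICE_IDS="62cbae7a3f6faac38a551a614419cca3 cc594d7f5ce59ab2a991c70572a0852f"    # from 'heketi-cli node info'

# Drain and delete every device on the node, then the node itself.
for DEV in $DEVICE_IDS; do
    heketi-cli device disable $DEV
    heketi-cli device remove $DEV
    heketi-cli device delete $DEV
done
heketi-cli node delete $NODE_ID
```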