Note: The recipes in this document require a Kubernetes (k8s) provider for the examples. This document will use IBM Cloud Private (ICP) as the k8s platform. Other providers will be slightly different, but the principles and general methodology should be the same for any k8s provider. Recipes may be provided for various Linux flavors; again, the principles are the same, though specific details may vary.


Table of Contents


Introduction

Kubernetes can consume storage solutions deployed either as a part of a cluster (internal storage) or storage provided by an external service (external storage).

Deploying a workload storage solution as a part of a cluster (internal storage) will limit access to the storage to workloads running inside the cluster. This will be more secure than deploying an external instance of the storage provider, but also limits who can consume the storage unless steps are taken to expose the storage outside the cluster.

The use case will determine which you choose to deploy.

If your objective is to have a large storage solution which can be consumed by multiple kubernetes clusters or non-kubernetes workloads, then your solution should include an instance of the storage provider external to the k8s cluster, with each k8s cluster integrated with that storage provider via storage classes.

If your objective is to have a secure storage solution which is only supporting a single k8s cluster, then you will want to deploy a storage provider inside the cluster and point your storage classes to the internal instance.

Another alternative is to have a single k8s cluster which implements an internal storage provider but exposes the API service outside the cluster. This could then be used as a storage provider for other k8s clusters.

It should be noted that the recipes in this document will describe implementing block and filesystem storage for workloads only (not platform storage) and, other than S3 storage providers, will not consider object storage.


Things to Consider Prior to Choosing a Storage Solution

As with all cloud resources, high availability is of primary concern. When creating a storage provider either internally or externally, the instance should be architected such that an entire node can fail and the storage provider can survive the failure.

k8s is designed to be highly elastic and fungible. Nodes and processes can fail and k8s will just redeploy them elsewhere assuming adequate available resources.

Consider a situation where an internal GlusterFS instance has been created in a smaller k8s environment and three storage nodes are provisioned. If utilization of the overall storage environment is high and the implementation of the storage provider does not have enough resources to spread the load well, data could be lost and/or the storage provider could go down altogether.

Just like master nodes, storage nodes should run with anti-affinity rules such that no two storage nodes are running on the same physical infrastructure. This way the loss of a single physical node does not take out 2/3 of your available capacity.
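For storage providers deployed inside the cluster, this intent can be expressed with Kubernetes pod anti-affinity so that no two storage pods land on the same host. A minimal sketch, where the app: storage-node label and its placement in the pod template are illustrative assumptions rather than settings from any particular chart:

  # Illustrative pod anti-affinity snippet for an in-cluster storage pod template.
  # The label (app: storage-node) is an assumption; match it to the labels your
  # storage provider's chart actually applies to its pods.
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - storage-node
          topologyKey: kubernetes.io/hostname

With requiredDuringSchedulingIgnoredDuringExecution, the scheduler will refuse to place a second storage pod on a node that already runs one, which mirrors the anti-affinity goal described above.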

As a rule, a larger number of storage nodes spread across a larger number of physical nodes will be a better solution with a greater tolerance for failure.

One should also avoid running multiple Ceph OSDs or Gluster bricks on the same node. It does no good to have 9 GlusterFS bricks in your environment if they are running on only three nodes and one of the nodes fails. In this case, you have still lost a third of your capacity with a single failure.

Different storage technologies can handle various amounts of loss (see product documentation for specific details for your chosen technology). A solution architect should be aware of those details and ensure whatever solution they choose to implement can handle the loss of at least one node.

The number of nodes you choose to tolerate may be more than one based on business need, but any solution should take advantage of the traits of the k8s platform and ensure the storage infrastructure is properly architected.


Recipes

Recipe 1: Integrating with an External NFS server

IMPORTANT: While NFS is probably the easiest storage provider to configure for k8s and is useful in a development environment, IBM does not recommend using NFS for workload storage in production, for a number of reasons.

  1. The standard NFS storage provider for k8s is not dynamic: NFS PVs must be created manually (or by script), and manual processes are prone to error and can delay end users if no PV exists with the needed attributes. This requires an operator to intervene manually in the DevOps process, which runs against basic DevOps principles.
  2. NFS is not secure.
  3. NFS is slow.

If you need an NFS storage provider for dev purposes or it is required for some other reason, the following will help you make sure it is effective in a kubernetes environment.

Creating the NFS Server

Instructions for installing an NFS server differ slightly based on the operating system and instructions for doing so are ubiquitous on the internet. It is a fairly trivial exercise and will not be reproduced here. There are, however, a few recommendations for how to configure it once it is installed.

Recommendations for the NFS server:

  1. Use LVM device mapping so that additional storage space can be added if needed (see the LVM sketch after this list).
  2. Do not give global mount permissions for NFS paths. This could allow different users on different clusters to mount the same path and overwrite each other. Each k8s cluster should have its own mount path.
  3. If sharing the NFS server with other non-k8s services, each additional consumer should have its own mount permissions and mount path, with no global mount permissions. If a user can mount an NFS path via k8s and the PV's reclaim policy is set to Recycle, then when the workload is removed k8s will run the equivalent of rm -rf on the mount path, permanently deleting anything on that path. Exercise extreme caution to prevent this possibility.
  4. Make sure the export attributes on the mount path include "sync" to prevent sync errors when consuming ReadWriteMany (RWX) PVs.
  admin@nfs-server:~$ cat /etc/exports
  /storage/cluster1 1.2.3.4(rw,no_subtree_check,sync,insecure,no_root_squash)
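For recommendation 1, the following is a minimal LVM sketch, assuming an unused raw disk at /dev/sdb and /storage as the export root (both illustrative; adjust to your host):

  # Assumes /dev/sdb is an unused raw disk on the NFS server.
  pvcreate /dev/sdb
  vgcreate nfs_vg /dev/sdb
  lvcreate -n nfs_lv -l 100%FREE nfs_vg
  mkfs.ext4 /dev/nfs_vg/nfs_lv
  mkdir -p /storage
  mount /dev/nfs_vg/nfs_lv /storage
  # Later, after adding another disk (e.g. /dev/sdc) to grow the export:
  #   vgextend nfs_vg /dev/sdc && lvextend -r -l +100%FREE /dev/nfs_vg/nfs_lv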

Consuming NFS storage

Create a new .yaml file with contents similar to the following:

# myNfsVolume1.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: myNfsVolume1
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /storage/cluster1/myNfsVolume1
    server: 1.2.3.4

This PV will provide a total of 5GB of storage space, can be mounted read-write by a single node at a time (ReadWriteOnce), and when the claim consuming it is deleted, its contents will be wiped (Recycle) and the PV will be placed back in the pool.

Install your new PV using kubectl:

  kubectl create -f myNfsVolume1.yaml
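To consume this PV from a workload, create a PersistentVolumeClaim whose size and access mode match. A minimal sketch, where the claim name is illustrative and storageClassName is left empty so the claim binds to a statically created PV rather than invoking a default dynamic provisioner:

# myNfsClaim1.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myNfsClaim1
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Install the claim with kubectl create -f myNfsClaim1.yaml and reference it from your deployment via a persistentVolumeClaim volume.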

Recipe 2: Deploying an External Ceph Instance and Integrating with ICP

Ceph is short for "cephalopod", a class of mollusks of which the octopus is a member. The octopus is used as the logo for Ceph, and the name was chosen because of the parallel-processing nature of both the octopus and the software.

There are two ways of integrating ICP with Ceph. The open source Rook project provides support for running the entire Ceph environment within a k8s cluster, utilizing local (hostPath) storage as raw disks for the OSD nodes. That method installs all of Ceph as workloads running within the cluster and is the subject of a separate recipe; this recipe will focus on installing Ceph external to your k8s cluster.

For a more robust solution we will deploy a Ceph implementation outside of ICP and use that as the provider for a storage class which will allow any application to consume Ceph rbd dynamic storage.

Doing so requires an external Ceph infrastructure. This document will walk through installing Ceph and integrating it with ICP for dynamic storage provisioning.

Ceph Architecture

We will use a distributed storage architecture. We will have three management nodes and three storage/compute nodes. Each storage/compute node has one disk with the operating system (/dev/sda) and two available raw disks for Ceph to consume (/dev/sdb, and /dev/sdc).

Storage Architecture

Each node is connected to the network via two Mellanox ConnectX-4 cards configured for bonded 802.3ad link aggregation for 200Gb/s combined throughput. This provides for a hyperconverged and highly available architecture for both storage and data network traffic. OSD nodes do not host management functions and vice versa.

The network architecture connecting these nodes is similar to that depicted in this diagram taken from the Cumulus documentation.

Network Architecture

All management nodes are redundant for high availability, and management nodes should not run on the same physical nodes as storage nodes. To implement this solution you should therefore use at least two servers (or VMs): one for hosting management nodes and one for hosting storage nodes.

In a highly available environment, it is recommended to run three management nodes and at least three storage nodes so that it can handle the loss of any single node.

In this document we will use Ubuntu 18.04 as our host operating system.

The hosts are named node1 - node6, respectively, and node1 will be our admin node. node2 and node3 will be additional management nodes for HA and node4 - node6 will be compute/storage nodes.

We will first install Ceph and test creating and mounting a block device. Then we will integrate with ICP.

Prepare for Ceph installation

Download and install ceph-deploy

IMPORTANT: Run the following commands as the root user

  1. Add the release key
  wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
  2. Add the Ceph packages to your repository
  echo deb https://download.ceph.com/debian-{ceph-stable-release}/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list

Replace {ceph-stable-release} with the release you would like to install, e.g. mimic, which results in debian-mimic.
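For example, assuming the mimic release on Ubuntu 18.04 (bionic), the resulting command would be:

  echo deb https://download.ceph.com/debian-mimic/ bionic main | sudo tee /etc/apt/sources.list.d/ceph.list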

  3. Install ntp on all nodes
  apt-get install -y ntp

If there is a local ntp server on your network, update /etc/ntp.conf with your local pool server and restart ntp.
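For example, assuming a local NTP server named ntp.example.local (an illustrative name), the relevant /etc/ntp.conf entry would look like this:

  # /etc/ntp.conf (excerpt) -- comment out the default pool entries and add your local server
  server ntp.example.local iburst

Then restart the service with systemctl restart ntp.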

  4. Install python on all nodes
  apt-get install -y python
  5. Update and install
  apt-get update
  apt-get install -y ceph-deploy
  6. Create a ceph-deploy user on all nodes.
  useradd -m -s /bin/bash -c "ceph deploy user" ceph-deploy
  echo "ceph-deploy:Passw0rd!" | sudo -S chpasswd
  7. Add ceph-deploy user to passwordless sudo on all nodes
  echo 'ceph-deploy   ALL=(root) NOPASSWD:ALL' |sudo EDITOR='tee -a' visudo

IMPORTANT: Run the following commands as the ceph-deploy user

  8. Make it easier to run Ceph commands on other nodes

Login as your ceph-deploy user and create the following file at ~/.ssh/config. You may need to create both the /home/ceph-deploy/.ssh directory and the config file.

  Host node1
    Hostname node1
    User ceph-deploy
  Host node2
    Hostname node2
    User ceph-deploy
  Host node3
    Hostname node3
    User ceph-deploy
  Host node4
    Hostname node4
    User ceph-deploy
  Host node5
    Hostname node5
    User ceph-deploy
  Host node6
    Hostname node6
    User ceph-deploy
  9. Enable passwordless SSH for the ceph-deploy user from the admin node to all other nodes. Execute these commands as the ceph-deploy user. Accept all defaults.
  ssh-keygen -t rsa -P ''

This will create ssh public and private keys in ~/.ssh . Copy the keys to all other nodes:

  ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node1
  ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node2
  ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node3
  ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node4
  ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node5
  ssh-copy-id -i ~/.ssh/id_rsa ceph-deploy@node6

You will be prompted for the ceph-deploy user's password; answer with the password you set when you created the user. When this is complete you should be able to execute ssh ceph-deploy@node1 (and likewise for the other nodes) as the ceph-deploy user on the admin host and reach the remote host without providing a password.

IMPORTANT: Make sure you copy the ID back to the local node (node1) as well so the process can ssh back to itself.

Deploy Ceph

Execute the following commands as the ceph-deploy user on the admin node.

  1. Create the cluster

From the ceph-deploy user's home directory:

  mkdir mycluster
  cd mycluster
  ceph-deploy new node1
  2. Install Ceph on all nodes
  ceph-deploy install node1 node2 node3 node4 node5 node6
  3. Deploy the initial monitor and gather the keys
  ceph-deploy mon create-initial
  4. Copy the admin config files to all nodes
  ceph-deploy admin node1 node2 node3 node4 node5 node6
  5. Deploy a manager node
  ceph-deploy mgr create node1
  6. Deploy storage nodes

The --data parameter should be the device name of an unused raw device installed in the host, and the final parameter is the hostname. Execute this command once for every raw device on every storage host in the environment.

  ceph-deploy osd create --data /dev/sdb node4
  ceph-deploy osd create --data /dev/sdc node4

  ceph-deploy osd create --data /dev/sdb node5
  ceph-deploy osd create --data /dev/sdc node5

  ceph-deploy osd create --data /dev/sdb node6
  ceph-deploy osd create --data /dev/sdc node6
  7. Install a metadata server
  ceph-deploy mds create node1
  8. Deploy the object gateway (S3/Swift) (optional)
  ceph-deploy rgw create node1
  9. Add additional monitor nodes for HA (optional)

On the admin node, edit the /home/ceph-deploy/mycluster/ceph.conf file and update the mon_initial_members, mon_host, and public_network values to reflect the additional nodes. The resulting file should look something like this:

  [global]
  fsid = 264349d2-8eb0-4fb3-9992-bbef4c2759cc
  mon_initial_members = node1,node2,node3
  mon_host = 10.10.2.1,10.10.2.2,10.10.2.3
  public_network = 10.10.0.0/16
  auth_cluster_required = cephx
  auth_service_required = cephx
  auth_client_required = cephx

Then deploy the new nodes:

  ceph-deploy --overwrite-conf mon add node2
  ceph-deploy --overwrite-conf mon add node3
  10. Add additional mgr nodes for resiliency
  ceph-deploy --overwrite-conf mgr add node2
  ceph-deploy --overwrite-conf mgr add node3
  11. Check the status of your cluster
  sudo ceph -s

The result should look something like this:

  cluster:
    id:     2fdde238-b426-4042-8cf3-6fc9a151cb9b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum node1,node2,node3
    mgr: node1(active), standbys: node2, node3
    osd: 6 osds: 6 up, 6 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 1280 pgs
    objects: 221  objects, 1.2 KiB
    usage:   54 GiB used, 11 TiB / 11 TiB avail
    pgs:     1280 active+clean

You should see HEALTH_OK. If not, look for your error message in the troubleshooting section below.

If you did not install the rados gateway (rgw) then you will not yet see any pools defined.

The likelihood is that your health message will say something like:

  health: HEALTH_WARN
     too few PGs per OSD (3 < min 30)

If you do not see this error, you can skip this section until you do see it (and you will).

A PG is a "placement group" and governs how data is stored in your environment. A full discussion of how this works is beyond the scope of this document, but resolving the warning can be done without knowing all of the details.

For more information on this number see:

  • http://docs.ceph.com/docs/giant/rados/operations/placement-groups/
  • https://stackoverflow.com/questions/39589696/ceph-too-many-pgs-per-osd-all-you-need-to-know

There are two numbers that are important to modify to resolve this issue: pg_num, the number of placement groups in a pool, and pgp_num, the number of placement groups actually used when calculating data placement. Any time you increase pg_num you should also increase pgp_num.

The documentation recommends using PG numbers that are powers of 2 (2, 4, 8, 16, 32, 64, 128, ...). The simple solution to this issue is to start with a smaller number, apply it, and see what the status says. If it is still too small, continue to apply ever larger powers of 2 until the warning goes away.

To change the number of PGs and PGPs, use the following commands against every pool in your environment.

To see the pools in your environment use the command:

  sudo ceph osd lspools

Which should result in a list that looks something like this:

  1 .rgw.root
  2 default.rgw.control
  3 default.rgw.meta
  4 default.rgw.log

For each pool in the list execute:

  sudo ceph osd pool set [pool name] pg_num <number>
  sudo ceph osd pool set [pool name] pgp_num <number>

Example:

  sudo ceph osd pool set .rgw.root pg_num 32
  sudo ceph osd pool set .rgw.root pgp_num 32

Then check your status and see if you need to raise it further. Continue increasing the number at the end of that command by powers of 2 until the warning goes away.
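If you have several pools, the same change can be scripted. A minimal sketch, assuming the two-column lspools output shown above and a target of 32 PGs per pool:

  # Set pg_num and pgp_num to 32 for every pool; adjust the target as needed.
  for pool in $(sudo ceph osd lspools | awk '{print $2}'); do
    sudo ceph osd pool set "$pool" pg_num 32
    sudo ceph osd pool set "$pool" pgp_num 32
  done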

Once you have a healthy cluster you can start using your new storage.

The following command will show you all of your storage devices and their status.

Note: OSD = Object Storage Daemon

  sudo ceph osd tree

The result should look something like this:

  -1       21.83214 root default
  -3        7.27738     host node4
   0   ssd   3.63869         osd.0      up  1.00000 1.00000
   1   ssd   3.63869         osd.1      up  1.00000 1.00000
  -5        7.27738     host node5
   2   ssd   3.63869         osd.2      up  1.00000 1.00000
   3   ssd   3.63869         osd.3      up  1.00000 1.00000
  -7        7.27738     host node6
   4   ssd   3.63869         osd.4      up  1.00000 1.00000
   5   ssd   3.63869         osd.5      up  1.00000 1.00000

Test your newly installed Ceph instance

Create and mount a block device

Block devices are the most commonly used types of storage provisioned by Ceph users. Creating and using them is relatively easy once your environment is up and running.

Block devices are known as RBD devices (RADOS Block Device). When you create a new block device and attach it to your filesystem it will show up as /dev/rbd0, /dev/rbd1, etc.

Before you can create a block device you need to create a new pool in which they can be stored.

  sudo ceph osd pool create rbd 128 128

NOTE: The two numbers at the end of this command are the PG and PGP for this pool. As a start, you should use the same values you used to get the health warning error to go away. These values may need to be changed based on the size of your environment and number of pools as per the above discussion.

Once your pool has been created you can then create a new image in that pool. An image is block storage on which you can create a filesystem and is analogous to a virtual disk.

  sudo rbd create myimage --size 10240 --image-feature layering

This command will create a new 10GB disk named "myimage" suitable for mounting on your filesystem. The --size parameter is in MB.

To view the images in your pool use sudo rbd ls

Now, ssh to the machine on which you want to mount this image.

Before the storage can be mounted you must install the Ceph client on the target machine.

  sudo apt-get install -y ceph-common ceph-fuse

Create a mount point and map the image:

  sudo mkdir /mnt/myimage
  sudo rbd map myimage --name client.admin

Here, myimage is the name of the image you created previously; everything else should be exactly as shown.

The result of this command is a new device named /dev/rbd0.

Next, put a filesystem on your new block device:

  sudo mkfs.ext4 -m0 /dev/rbd0

... and mount your new filesystem at your created mount point:

  sudo mount /dev/rbd0 /mnt/myimage

Now, if you do an ls on your newly mounted filesystem you should see a lost+found directory indicating the root of a partition.

Remove your test configuration

  1. Unmount your test filesystem
  sudo umount /mnt/myimage
  2. Unmap the rbd image from your Ceph instance
  sudo rbd unmap myimage --name client.admin
  3. Remove the pool
  sudo ceph osd pool delete rbd rbd --yes-i-really-really-mean-it

Integrating ICP with Ceph

  1. Create an rbd pool for use with ICP
  sudo ceph osd pool create icp 1024 1024
  2. Create a new Ceph user for use with ICP
  sudo ceph auth get-or-create client.icp mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=icp' -o ceph.client.icp.keyring
  3. To deploy images as this user, you will need a keyring file for your worker nodes.
  sudo ceph auth get client.icp > ./ceph.client.icp.keyring
  4. Copy this keyring to /etc/ceph
  sudo cp ./ceph.client.icp.keyring /etc/ceph/
  5. Retrieve the Ceph admin key as base64
  sudo ceph auth get-key client.admin |base64

This should return something like: QVFDSGhYZGIrcmc0SUJBQXd0Yy9pRXIxT1E1ZE5sMmdzRHhlZVE9PQ==

  6. Retrieve the Ceph ICP key as base64
  sudo ceph auth get-key client.icp |base64

This should return something like: QVFERUlYNWJKbzlYR1JBQTRMVnU1N1YvWDhYbXAxc2tseDB6QkE9PQ==

  7. Before you can use Ceph with ICP you must have the ceph-common package installed on each worker node
  apt-get install -y ceph-common
  8. You will now need to copy the ceph.client.admin.keyring and ceph.client.icp.keyring files to the /etc/ceph directory on each worker node
  scp root@cephnode:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
  scp root@cephnode:/etc/ceph/ceph.client.icp.keyring /etc/ceph/

Note: If you do not have remote root login enabled (and you shouldn't) then you will need to copy these files to a temporary location which is accessible to a non-root user and then, as root, copy them to your worker nodes and move them to the /etc/ceph directory. Then delete them from the intermediate location.

  9. Create a new file named ceph-secret.yaml with the following contents:
  apiVersion: v1
  kind: Secret
  metadata:
    name: ceph-secret
    namespace: kube-system
  data:
    key: QVFBOFF2SlZheUJQRVJBQWgvS2cwT1laQUhPQno3akZwekxxdGc9PQ==
  type: kubernetes.io/rbd
Where data.key is the base64-encoded admin key retrieved above.

  10. Create the secret in ICP

Use the ICP UI to configure your kubectl client and create the 'ceph-secret' secret with the following command:

  kubectl create -f ./ceph-secret.yaml
  11. Create a new file named ceph-user-secret.yaml with the following contents:
  apiVersion: v1
  kind: Secret
  metadata:
    name: ceph-user-secret
    namespace: default
  data:
    key: QVFCbEV4OVpmaGJtQ0JBQW55d2Z0NHZtcS96cE42SW1JVUQvekE9PQ==
  type: kubernetes.io/rbd

Where data.key is the base64-encoded key retrieved from Ceph for the client.icp user.

  12. Create the user secret in ICP

Use the ICP UI to configure your kubectl client and create the 'ceph-user-secret' secret in the default namespace with the following command:

  kubectl create -f ./ceph-user-secret.yaml

Important Note: Because this secret was created in the 'default' namespace (as noted in metadata.namespace above), the storage class can only be used to provision storage in the default namespace.

To use Ceph dynamic provisioning in other namespaces you must create the same user secret in every namespace where you want to deploy Ceph dynamic storage.

Because the storage class specifically references "ceph-user-secret" the secret should always have this name no matter what namespace is used.
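For example, to enable Ceph dynamic provisioning in a namespace called mynamespace (an illustrative name), create the same secret there:

  # ceph-user-secret-mynamespace.yaml -- same key as above, different namespace
  apiVersion: v1
  kind: Secret
  metadata:
    name: ceph-user-secret
    namespace: mynamespace
  data:
    key: QVFCbEV4OVpmaGJtQ0JBQW55d2Z0NHZtcS96cE42SW1JVUQvekE9PQ==
  type: kubernetes.io/rbd

Then create it with kubectl create -f ./ceph-user-secret-mynamespace.yaml.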

  13. Create the Ceph RBD Dynamic Storage Class

Create a file named 'ceph-sc.yaml' with the following contents:

  apiVersion: storage.k8s.io/v1beta1
  kind: StorageClass
  metadata:
    name: ceph
    annotations:
      storageclass.beta.kubernetes.io/is-default-class: "true"
  provisioner: kubernetes.io/rbd
  parameters:
    monitors: 10.10.0.1:6789,10.10.0.2:6789,10.10.0.3:6789
    adminId: admin
    adminSecretName: ceph-secret
    adminSecretNamespace: kube-system
    pool: icp
    userId: icp
    userSecretName: ceph-user-secret
    fsType: ext4
    imageFormat: "2"

Where parameters.monitors are the IP addresses and ports of all Ceph monitor nodes, comma separated.

Remove metadata.annotations.storageclass.* if this should not be the default storage class.
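If the annotation is removed so that 'ceph' is not the default class, PVCs must request the class explicitly by adding a storageClassName field to their spec, for example:

  # Add to the PVC spec when 'ceph' is not the default storage class
  spec:
    storageClassName: ceph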

IMPORTANT: As noted above, a secret named ceph-user-secret must exist in each namespace that will use Ceph dynamic storage provisioning. Each of these secrets should be identical and contain the key for the userId user (client.icp).

  14. Test your new storage class by creating a PVC that provisions a PV from the icp pool.

Create a file named ceph-pvc.yaml with the following contents:

  kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: ceph-claim
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi

Create the PVC with the following command:

  kubectl create -f ./ceph-pvc.yaml

Check the status of your new PVC:

  kubectl get persistentvolumes
  NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                       STORAGECLASS               REASON    AGE
  helm-repo-pv                               5Gi        RWO            Delete           Bound     kube-system/helm-repo-pvc                   helm-repo-storage                    6d
  image-manager-10.10.10.1                   20Gi       RWO            Retain           Bound     kube-system/image-manager-image-manager-0   image-manager-storage                6d
  logging-datanode-10.10.10.3                20Gi       RWO            Retain           Bound     kube-system/data-logging-elk-data-0         logging-storage-datanode             6d
  mongodb-10.10.10.1                         20Gi       RWO            Retain           Bound     kube-system/mongodbdir-icp-mongodb-0        mongodb-storage                      6d
  pvc-3b037115-a686-11e8-9387-5254006a2ffe   2Gi        RWO            Delete           Bound     default/ceph-claim                          ceph                                 1m

Look for a PV with a storage class of "ceph"

or

In the ICP UI, navigate to Platform -> Storage and look for a PV of type "RBD".

List the RBD images that Ceph created for your PVs:

On your ceph admin or monitor node execute:

  sudo rbd list

You should see something like:

  $ sudo rbd list
  kubernetes-dynamic-pvc-7d5c8c11-a687-11e8-9291-5254006a2ffe

Remove your test PVC with the following command:

  kubectl delete -f ./ceph-pvc.yaml
  15. To use your storage class with a deployment you must install the Ceph client on all schedulable nodes. Execute the following on all ICP worker nodes:
  apt-get install -y ceph-common
  16. Copy /etc/ceph/ceph.conf, /etc/ceph/ceph.client.icp.keyring, and /etc/ceph/ceph.client.admin.keyring from your Ceph admin node to each worker node.

From each worker node as root execute:

  scp root@ceph-admin:/etc/ceph/ceph.conf /etc/ceph/
  scp root@ceph-admin:/etc/ceph/*.keyring /etc/ceph/

or

Copy these files to the boot node and use scp to push them to all worker nodes so that you do not have to log in to each worker node individually (assuming you have configured passwordless ssh from your boot node to all of your worker nodes).
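A minimal sketch of that approach, run from the boot node after the files have been copied there, with worker1 through worker3 as illustrative host names:

  # Push the Ceph config and keyrings from the boot node to every worker node.
  # Assumes the files were first copied from the Ceph admin node to /etc/ceph on the boot node.
  for host in worker1 worker2 worker3; do
    scp /etc/ceph/ceph.conf /etc/ceph/*.keyring root@${host}:/etc/ceph/
  done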

  17. Depending on the version of Ceph and Kubernetes you are using, you may get an error when attempting to deploy a pod using Ceph dynamic storage. The event will look something like this:
  MountVolume.WaitForAttach failed for volume "pvc-f00434db-a8d6-11e8-9387-5254006a2ffe" : rbd: map failed exit status 110, rbd output: rbd: sysfs write failed In some cases useful info is found in syslog - try "dmesg | tail" or so. rbd: map failed: (110) Connection timed out

Further investigation into the syslog file on the worker node should show an entry something like this:

  Aug 25 13:32:50 worker1 kernel: [745415.055916] libceph: mon0 10.10.2.1:6789 feature set mismatch, my 106b84a842a42 < server's 40106b84a842a42, missing 400000000000000

... along with a bunch of other error messages

This error message indicates a missing feature flag in the Ceph client. The missing feature is CRUSH_TUNABLES5.

To resolve this issue execute the following command on your Ceph admin or monitor node:

  sudo ceph osd crush tunables hammer

Your pod should now finish provisioning.


Recipe 3: Deploying an Internal Ceph/Rook Instance


Recipe 4: Deploying an Internal GlusterFS Instance


Recipe 5: Deploying an External GlusterFS Instance


Recipe 6: Integrating with an existing VMware vCenter 6.5 Instance

It is important to note that using VMware storage for dynamic storage provisioning uses the VMware API and this is not vSAN. vSAN is a completely separate technology which takes storage local to multiple hypervisors and makes it available as a VMware datastore.

To use VMware for dynamic storage provisioning you need an existing VMware datastore and access to the API server with the needed credentials for creating storage volumes. That is what this tutorial will focus on.

All of the prerequisites noted here must be complete prior to installing ICP.

Configure vSphere for use by ICP

First, to use dynamic storage provisioning on VMware you must create a user with the proper VMware permissions that can be used to interact with VMware.

The IBM knowledgebase for this topic can be found at https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0.3/manage_cluster/add_vsphere.html.

Prerequisites for creating the vSphere storage can be found at https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0.3/installing/cloud_provider_vsphere.html#prereq

It should be noted that, as of this writing, only the ReadWriteOnce access mode is supported.

The following are important restrictions on using VMware for your dynamic storage.

  • All IBM® Cloud Private cluster nodes must be under one vSphere VM folder.
  • All IBM Cloud Private master nodes must be able to access vCenter.
  • The node host name must be the same as the VM name.
  • Node host names must comply with the regex: [a-z]?(\.[a-z0-9]?)*, and must also comply with the following restrictions:

    • They must not begin with numbers.
    • They must not use capital letters.
    • They must not have any special characters except . and -.
    • They must contain at least three characters but no more than 63 characters.
  • The disk UUID on the node VMs must be enabled: the disk.EnableUUID value must be set to True (see the sketch after this list).
  • The user that is specified in the vSphere cloud configuration must have privileges to interact with vCenter.
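The disk.EnableUUID setting mentioned above can be changed per VM in the vSphere client (VM Options -> Advanced -> Edit Configuration) or scripted with the govc CLI from the govmomi project. A minimal sketch, where the VM path is an illustrative assumption and govc is assumed to be installed and configured with your vCenter credentials:

  # Power the VM off before changing the setting, then power it back on.
  govc vm.change -vm /MyDatacenter/vm/my-icp-2103/worker1 -e disk.enableUUID=TRUE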

You will need to create a vCenter user to use for dynamic storage provisioning. We will use "icpadmin" as our user. We also need to create a few roles and assign them to this user so that it has the correct authority. The following information is correct for VMware 6.5 and may not be applicable to other versions.

Role 1: manage-k8s-node-vms

  Privileges:
    * Resource: Assign virtual machine to resource pool
    * Virtual Machine -> Configuration: Add existing disk
    * Virtual Machine -> Configuration: Add new disk
    * Virtual Machine -> Configuration: Add or remove device
    * Virtual Machine -> Configuration: Remove disk
    * Virtual Machine -> Inventory: Create from existing
    * Virtual Machine -> Inventory: Create new
    * Virtual Machine -> Inventory: Remove

Role 2: manage-k8s-volumes

  Privileges:
    * Datastore -> Allocate space
    * Datastore -> Browse datastore
    * Datastore -> Configure datastore
    * Datastore -> Low level file operations
    * Datastore -> Remove file
    * Datastore -> Update virtual machine files
    * Datastore -> Update virtual machine metadata

Role 3: k8s-system-read-and-spbm-profile-view

  Privileges:
    * Storage views: Configure service
    * Storage views: view

Role 4: ReadOnly

  Privileges: none

Alternatively, you can just create a single role called "icpadmin" which has all of these privileges.

Next, assign the following roles to this user on all the needed VMware objects:

  * Datacenter (only): k8s-system-read-and-spbm-profile-view
  * Cluster (propagate): manage-k8s-node-vms
  * VM Folder (propagate): manage-k8s-node-vms
  * Target Datastore (only): manage-k8s-volumes
  * Datacenter, Datastore Cluster, and Datastore storage folder: ReadOnly

If you have created one role with all needed privileges, just assign that role to that user for all of the entities noted above: All pertinent Datacenters, clusters, hosts, resource pools, datastores, and folders.

Configure ICP for vSphere storage

On your ICP boot node, open the config.yaml file and add the following (spacing is important, use spaces and not tabs):

  kubelet_nodename: hostname
  cloud_provider: vsphere
  vsphere_conf:
     user: "<vCenter username for vSphere Cloud Provider>"
     password: "<password for vCenter user>"
     server: <vCenter server IP or FQDN>
     port: [vCenter Server Port; default: 443]
     insecure_flag: [set to 1 if vCenter uses a self-signed certificate]
     datacenter: <datacenter name on which Node VMs are deployed>
     datastore: <default datastore to be used for provisioning volumes>
     working_dir: <vCenter VM folder path in which node VMs are located>

Example:

  kubelet_nodename: hostname
  cloud_provider: vsphere
  vsphere_conf:
     user: icpadmin
     password: Passw0rd!
     server: 1.2.3.4
     port: 443
     insecure_flag: 1
     datacenter: CSPLAB
     datastore: ExternalDemo
     working_dir: my-icp-2103

Deploy your ICP instance as per normal.

Once your instance has been deployed you must create a storage class to consume the storage.

First, create a .yaml file (vsphere.yaml) with the following contents:

  kind: StorageClass
  apiVersion: storage.k8s.io/v1
  metadata:
    name: vsphere
    annotations:
      storageclass.kubernetes.io/is-default-class: "true"
  provisioner: kubernetes.io/vsphere-volume
  parameters:
    diskformat: thin
    datastore: MyDatastore

The diskformat can be thin, zeroedthick, or eagerzeroedthick. The datastore should be changed to specify the name of the datastore where you want your volumes created.

Configure kubectl to point to your ICP instance.

Deploy your new storage class

  kubectl create -f vsphere.yaml

Now, when installing helm charts you can specify dynamic storage using the 'vsphere' storage class and it will dynamically provision your PV to the specified datastore.
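For example, a PVC that requests this class explicitly (the claim name and size are illustrative):

  kind: PersistentVolumeClaim
  apiVersion: v1
  metadata:
    name: vsphere-claim
  spec:
    storageClassName: vsphere
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 2Gi

The resulting volume should appear as a VMDK in the datastore named in the storage class.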


Recipe 7: Integrating with an existing S3 Storage Provider