Note: this GMP previously stated that the Consul components would be officially removed in v1.4.0 of the Cloud Foundry Genesis Kit. v1.4.0 was released on May 2nd, 2019, without this removal. This GMP has been updated to target the v1.5.0 release instead.
Overview
With BOSH DNS now a mature product and an integral component of many BOSH deployments, we have begun replacing Consul service discovery with BOSH DNS. The Cloud Foundry Genesis Kit v1.3 is the first step of a two-step migration. v1.3 colocates BOSH DNS and disables Consul DNS, but does not remove Consul entirely. The result is a seamless upgrade from v1.2.X that already uses BOSH DNS for service discovery.
We plan to fully remove Consul from the Cloud Foundry Genesis Kit in
v1.5, which will include the deletion of all consul_etcd virtual
machines. This is the second step of the migration; splitting it into
two steps enables a zero-downtime upgrade path from Consul DNS to BOSH DNS.
However, if an operator wishes to remove all Consul-related services
from their Cloud Foundry deployment without waiting for v1.5, they may
do so with a temporary feature flag implemented just for v1.3 called
migrate-1.3-without-consul. This GMP explains the steps necessary to
use this feature.
Impact
The impact of this feature is minimal to the runtime and stability of your Cloud Foundry deployment. The decision to either apply or not apply this feature to your deployments will not impact future upgradability of this environment. No extra steps are necessary to maintain your Cloud Foundry environment after this GMP is completed.
There is no downtime incurred by applying this feature.
The Process
Determining Eligibility
Operators can remove all Consul services from their Cloud Foundry
deployment only if it was deployed with v1.3 of the Cloud Foundry
Genesis Kit. To determine the kit version of a deployed Cloud Foundry
environment, run the following command within your cf-deployments
folder:
genesis info [environment name]
Where [environment name] is the name of the Cloud Foundry
environment you wish to check.
The result will look like this:
CF Deployment for Environment 'genesis-lab'
Last deployed about 30 minutes ago (08:32PM on Dec 19, 2018 UTC)
by David
to BOSH genesis-lab
based on kit cf/1.3
using Genesis v2.6.12
with manifest .genesis/manifests/buffalo-lab.yml (redacted)
The "based on kit" line must read cf/1.3. If it does not, please
upgrade your Cloud Foundry installation with the v1.3 Cloud Foundry
Genesis Kit before starting.
If your "based on kit" line reads cf/1.3, and you wish to remove
Consul entirely, please continue reading this guide.
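The eligibility check above can be scripted. The following is a minimal shell sketch, not part of Genesis itself: `kit_is_cf_1_3` is a hypothetical helper that reads `genesis info` output on standard input and succeeds only when the "based on kit" line reports cf/1.3. Accepting a patch suffix such as cf/1.3.1 is an assumption, as is the output being plain text in the format shown above.

```shell
# Hypothetical helper (not a Genesis command): reads `genesis info` output
# on stdin and exits 0 only if the "based on kit" line reports cf/1.3.
# Assumption: a patch suffix such as cf/1.3.1 also counts as v1.3.
kit_is_cf_1_3() {
  grep -Eq 'based on kit[[:space:]]+cf/1\.3(\.[0-9]+)?[[:space:]]*$'
}
```

One possible use is `genesis info [environment name] | kit_is_cf_1_3 && echo "eligible"`, which prints a message only when the check passes.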
Removing Consul
After ensuring the environment you wish to remove Consul from is on
v1.3, edit your Genesis Environment Manifest and append
migrate-1.3-without-consul to the feature list array. The end result
will be something similar to:
kit:
  name: cf
  version: 1.3
  features:
  - local-blobstore
  - local-ha-db
  - migrate-1.3-without-consul
params:
  env: genesis-lab
  [...]
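Before deploying, it can be useful to confirm that the feature really is listed in the environment file. The sketch below is one way to do that; `has_consul_removal_feature` is a hypothetical helper, and it performs a plain-text match rather than parsing YAML, so it cannot confirm the entry sits under kit.features.

```shell
# Hypothetical helper: exits 0 if the given environment file lists
# migrate-1.3-without-consul as a YAML list item on its own line.
# Plain-text check only; the YAML structure is not validated.
has_consul_removal_feature() {
  grep -Eq '^[[:space:]]*-[[:space:]]+migrate-1\.3-without-consul[[:space:]]*$' "$1"
}
```

For example, assuming the environment file is named genesis-lab.yml, `has_consul_removal_feature genesis-lab.yml || echo "feature not listed yet"` reports a missing entry.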
Once the feature has been added to your Genesis Environment Manifest,
apply the changes with a genesis deploy:
genesis deploy [environment name]
This removes any consul_agents colocated on VMs and deletes all
consul_etcd VMs currently deployed. There will be no downtime during
this deployment.
Deployments With Multiple Cell Instance Groups
If your deployment has multiple instance groups for cells (possibly for isolation segments), then you will need to add additional DNS aliases, or else the Auctioneer will not be able to locate Rep instances on the additional instance groups. As a result, applications will not be deployed to the additional cell instance groups.
To map the correct DNS aliases to the additional cell instance groups, add the following to your environment file:
addons:
- name: bosh-dns-aliases
  jobs:
  - name: bosh-dns-aliases
    release: bosh-dns-aliases
    properties:
      aliases:
      - domain: '_.cell.service.cf.internal'
        targets:
        - (( append ))
        - query: '_'
          instance_group: extra-cell-instance-group-1-name
          deployment: (( grab name ))
          network: (( grab params.cf_runtime_network ))
          domain: bosh
        - query: '_'
          instance_group: extra-cell-instance-group-2-name
          deployment: (( grab name ))
          network: (( grab params.cf_runtime_network ))
          domain: bosh
Add a block under the targets array for each additional instance
group that you have. Change the instance_group value to the name
of each additional cell instance group. To be clear, you do not
need to add one for the instance group named cell.
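With many additional cell instance groups, writing each target block by hand is error-prone. As a rough sketch, the hypothetical helper `print_cell_alias_targets` below prints one target block per instance group name passed to it, using the same fields as the snippet above. The eight-space indentation is an assumption about how the targets array is nested in your environment file; adjust it to match.

```shell
# Hypothetical helper: prints one bosh-dns-aliases target block for each
# instance group name given as an argument. The "cell" group itself does
# not need an entry, so do not pass it.
print_cell_alias_targets() {
  for group in "$@"; do
    printf "        - query: '_'\n"
    printf "          instance_group: %s\n" "$group"
    printf "          deployment: (( grab name ))\n"
    printf "          network: (( grab params.cf_runtime_network ))\n"
    printf "          domain: bosh\n"
  done
}
```

Running `print_cell_alias_targets extra-cell-instance-group-1-name extra-cell-instance-group-2-name` emits two blocks that can be pasted beneath the (( append )) line.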
Verification & Closing Notes (Final step)
Once the deployment is complete, you can verify that Consul is no longer used in your environment by running:
bosh -e [environment name] -d [deployment name] vms
Where [environment name] is the name of your BOSH director alias,
and [deployment name] is the name of your Cloud Foundry environment.
If you do not see any consul_etcd VMs listed, the feature was
successfully applied.
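The same check can be automated by scanning the command output instead of reading it by eye. This is a minimal sketch: `no_consul_vms` is a hypothetical helper that reads `bosh vms` output on standard input, and it assumes the consul_etcd instance group name appears verbatim in that output.

```shell
# Hypothetical helper: reads `bosh vms` output on stdin and exits 0 only
# if no line mentions a consul_etcd instance.
no_consul_vms() {
  ! grep -q 'consul_etcd'
}
```

For example, `bosh -e [environment name] -d [deployment name] vms | no_consul_vms && echo "Consul removed"` prints a confirmation only when no consul_etcd VMs remain.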
Keep the migrate-1.3-without-consul feature in your environment's
feature list until the v1.5 update is released. Cloud Foundry Genesis
Kit v1.5 will remind you to remove the feature, as it will no longer
be needed.
Help & Support
If you have concerns about the impact of this migration process, or need assistance running through it, please don't hesitate to find us in #help on Slack.