HOT Session: Quality Gates

Services HOT Session: Quality Gates

Welcome to the second APAC Services Hands-on Training session. This session will focus on quality gates.

Recap of Yesterday
What Next For Your Customer?

Please ensure:

You can still log in to your DT Managed environment
You can SSH into your instance
You have a customer tag that tags the process groups and services for each customer
You have consistent traffic to your customer services in Dynatrace
Your Dynatrace API token has the following permissions:

api token permissions

Today we will:

Compare the Dynatrace SLO capability with Keptn
Explain and Install Keptn
Define a Service Level Indicator using Dynatrace as the metric provider
Define a Service Level Objective
Create a quality gate with Keptn for a customer in their environment
Receive quality evaluations for our customer environment
Interact with Keptn via the CLI and API
Use third party tooling (JIRA) to more easily consume our quality reports

You will be working in groups but this is not a race or competition!
I will be asking for group progress reports, not personal progress reports
The training content will be available online to your group
The live session will not be recorded because we may share screens. I will record the session separately

Before Lunch

There will be a brief introduction section then we'll have a 10 minute break
When we return, I will open the breakout rooms and assign you to groups
In your group, you will work through the content at your own pace. COMMUNICATE AND HELP EACH OTHER!

Lunch

5 minutes before lunch I will recall everyone to the main room
As a group I'll ask you to tell me which slide number you're on so that I can judge progress
Lunch will be at the halfway mark (2hrs after the start of the session). Lunch will be 30 minutes

After Lunch

After lunch we will all meet in the main room. Teams that are further along can volunteer to assist teams who require assistance
Everyone will return to their working groups & continue working
If your group finishes please message me on Zoom / Slack / in person. I will join your room and we will discuss next steps

Throughout the Day

I will be moving between the rooms to assist teams
If your team is significantly ahead of others, I may ask you to assist other teams

A quality gate is a definition of quality that a piece of software should meet. If it does not meet the quality criteria, the software should not proceed and should be returned to the developer for a fix.

A quality gate should be a concrete, non-negotiable contract of quality for a given service.

A quality gate can combine many metrics that go to build up your quality signature. Your definition of "quality" is not restricted to performance metrics.

Think: Apart from performance metrics, which other metrics could denote "quality"?

Key Terms
- Service Level Indicator (SLI)
- Service Level Objective (SLO)
- Service Level Agreement (SLA)
- Error Budget

Think: In Product vs. Open Source

- Production: In Product (SLO & Error Budget tracking)

- CI/CD: Keptn (DT Supported version soon?)

dt-vs-keptn

Dynatrace SLO screen

dynatrace-slo-screen

Today we will be using Keptn to build quality gates but...

⚠️ A Quality Gate does not require Keptn, but Keptn makes defining one easier

✔️ Tip: Discuss the concept and advantages of quality gates with customers. Don't focus on Keptn

✔️ Tip: Quality gates can (and should) encompass metrics from any tools (via Dynatrace?)

✔️ Tip: Quality gates are NOT only performance / availability based

Think of Keptn as an intelligent middleware that receives events from your environment and passes those events to "services" which then react to those events.

Think Events

It helps to think in terms of conceptual events and not specific tooling. Consider the process of running and evaluating a quality gate:

Assuming my service is deployed and I have traffic running against the service, I need to:

Define the metrics I care about
Define what the thresholds should be
Define the tool(s) I'll use to provide those metrics
Have a way to receive an event (which tells the tool to start a quality evaluation)
Have some mechanism that knows how to pull the metrics from particular tooling
Pull the metrics from the tool(s)
Judge the retrieved metrics against my definition of quality
Output a quality decision which defines what state my test was in (pass, warning or fail)
React to that decision in some way
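
The steps above can be sketched in a few lines of Python. This is a tool-agnostic illustration and not how Keptn itself is implemented: the fetch_metric provider function, the threshold numbers and the react messages are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical thresholds: a value at or below "pass" passes,
# at or below "warning" warns, anything above fails.
THRESHOLDS: Dict[str, Dict[str, float]] = {
    "response_time_p95": {"pass": 200.0, "warning": 500.0},
}


def judge(value: float, limits: Dict[str, float]) -> str:
    """Compare one metric value against its thresholds."""
    if value <= limits["pass"]:
        return "pass"
    if value <= limits["warning"]:
        return "warning"
    return "fail"


def run_quality_gate(fetch_metric: Callable[[str], float]) -> str:
    """Pull every metric, judge it, and return the worst individual result."""
    order = ["pass", "warning", "fail"]
    results = [judge(fetch_metric(name), limits) for name, limits in THRESHOLDS.items()]
    return max(results, key=order.index)


def react(decision: str) -> str:
    """React to the decision; here we only build a message."""
    return "promote build" if decision == "pass" else f"hold build ({decision})"


if __name__ == "__main__":
    # A stand-in metric provider that always reports 350.
    decision = run_quality_gate(lambda name: 350.0)
    print(decision, "->", react(decision))
```

Notice that the metric provider is passed in as a function. The gate logic does not care which tool supplies the numbers, which is the same separation Keptn makes between evaluation and SLI providers.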

The bridge of a ship is the control room. Keptn's bridge is your control room to oversee everything happening inside Keptn.

Access your bridge by going to http://keptn.VMIP.nip.io/bridge

Bridge username: keptn
Bridge password: dynatrace

keptns bridge

Take a 10 minute break and we will get hands-on when we return.

Hints for practical: All URLs, usernames and passwords are stored in ~/installOutput.txt

Tell Keptn which tool it should use to retrieve Service Level Indicators (SLIs).

Install the dynatrace-sli-service. When you set DT_TENANT below, leave out the https:// prefix and any trailing slashes.

Set some environment variables:

export DT_API_TOKEN=***
export DT_TENANT=dtmanaged.dynatrace.training/e/***

Check that you've set both of these correctly:

echo $DT_API_TOKEN
echo $DT_TENANT
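
If you are unsure whether your DT_TENANT value is in the right shape, the clean-up rule can be sketched as a small Python helper. The function name normalize_tenant and the environment ID abc123 are made up for illustration.

```python
def normalize_tenant(raw: str) -> str:
    """Strip the URL scheme and any trailing slashes from a tenant URL."""
    tenant = raw.strip()
    for scheme in ("https://", "http://"):
        if tenant.startswith(scheme):
            tenant = tenant[len(scheme):]
            break
    return tenant.rstrip("/")


if __name__ == "__main__":
    # "abc123" is a placeholder environment ID, not a real one.
    print(normalize_tenant("https://dtmanaged.dynatrace.training/e/abc123/"))
```

The printed value is the form that DT_TENANT should have before you create the secret.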

Now create the secret. Keptn will use these details to authenticate with Dynatrace.

kubectl -n keptn create secret generic dynatrace --from-literal="DT_API_TOKEN=$DT_API_TOKEN" --from-literal="DT_TENANT=$DT_TENANT"

Install the service:

kubectl apply -n keptn -f https://raw.githubusercontent.com/keptn-contrib/dynatrace-sli-service/0.7.1/deploy/service.yaml

Verify that the pod is running in the keptn namespace. Look for the dynatrace-sli-service pod:

kubectl get pods -n keptn

NAME                                     READY   STATUS    RESTARTS   AGE
...                                      ...     ...       ...        ...
dynatrace-sli-service-595564cb65-xpx2j   2/2     Running   0          18s

We need to model our customer system inside Keptn. Keptn has 3 levels of configuration:

Project

The top level grouping. In our case, it makes sense to create one project per customer.

Stage

This corresponds to our logical stages. Our customers have two stages: staging and production

Service

Typically this models the microservice. Our customers have one service in each environment: The web service.

Create a new file called shipyard.yaml. A shipyard file is the way that Keptn models the stages inside your project.

In our case, we want one stage: staging

stages:
  - name: "staging"

Now create a Keptn project for Customer A and use the shipyard file you defined in the previous step:

keptn create project customer-a --shipyard=shipyard.yaml

$ keptn create project customer-a --shipyard=shipyard.yaml
...
Starting to create project
ID of Keptn context: ...
Project customer-a created
Stage staging created
Project successfully created

customer-a-project

Now we create our staging-web service for customer-a:

keptn create service staging-web --project=customer-a

$ keptn create service staging-web --project=customer-a
Starting to create service
ID of Keptn context: ...
Creating new Keptn service staging-web in stage staging

customer-a-project

Tell Keptn to use the dynatrace-sli-service to receive metrics from Dynatrace for the customer-a project:

keptn configure monitoring dynatrace --project=customer-a

Our quality gate will evaluate a single SLI:

The 95th percentile response time for the relevant service (e.g. customer-a in staging)

First make sure you understand how this metric is pulled out of Dynatrace.

Use the Dynatrace metrics API v2 to pull the 95th percentile figure for the web service in staging for customer-a.

Navigate to Settings > Integration > Dynatrace API > Environment API v2
Use the Authorize button with your Dynatrace API token
Use the Metrics set of endpoints
Open the GET /metrics/query dropdown

Set the metricSelector to:

builtin:service.response.time:percentile(95)

Set the entitySelector to:

type(SERVICE),tag(customer:customer-a),tag([KUBERNETES]stage:staging)

{
  "totalCount": 1,
  "nextPageKey": null,
  "result": [{
      "metricId": "builtin:service.response.time:percentile(95)",
      "data": [{
          "dimensions": [ "SERVICE-..."],
          "timestamps": [
            ...
            1606888800000
          ],
          "values": [
            ...
            837.3563350144964,
            921.5641993975936
          ]
        }]
    }]
}

Validate

Navigate to your customer-a service in staging in Dynatrace and notice that the SERVICE-* ID matches the dimension in the REST API call. This proves that you've pulled the metrics for the correct service.
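
The same query can be built and read from a script. The sketch below only constructs the URL and parses a response body, so it runs without network access. The helper names, the abc123 environment ID and the SERVICE-EXAMPLE entity ID are placeholders. A real call would also need an Authorization header of the form "Api-Token YOURTOKEN".

```python
import json
from urllib.parse import urlencode


def build_query_url(tenant: str, metric_selector: str, entity_selector: str) -> str:
    """Build the GET /metrics/query URL for the Metrics API v2."""
    params = urlencode({
        "metricSelector": metric_selector,
        "entitySelector": entity_selector,
    })
    return f"https://{tenant}/api/v2/metrics/query?{params}"


def latest_values(response_body: str) -> dict:
    """Map each entity ID (first dimension) to its most recent non-null value."""
    latest = {}
    for result in json.loads(response_body)["result"]:
        for series in result["data"]:
            values = [v for v in series["values"] if v is not None]
            if values:
                latest[series["dimensions"][0]] = values[-1]
    return latest


# A trimmed response in the shape shown above; the entity ID is a placeholder.
SAMPLE_BODY = json.dumps({
    "totalCount": 1,
    "nextPageKey": None,
    "result": [{
        "metricId": "builtin:service.response.time:percentile(95)",
        "data": [{
            "dimensions": ["SERVICE-EXAMPLE"],
            "values": [837.3563350144964, 921.5641993975936],
        }],
    }],
})

if __name__ == "__main__":
    print(build_query_url(
        "dtmanaged.dynatrace.training/e/abc123",
        "builtin:service.response.time:percentile(95)",
        "type(SERVICE),tag(customer:customer-a),tag([KUBERNETES]stage:staging)",
    ))
    print(latest_values(SAMPLE_BODY))
```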

Store this metric as code so that we can tell Keptn to use it.

Here we can use some special variables:

$PROJECT refers to the project name (in this case customer-a)
$STAGE refers to the Keptn stage (in this case staging)

Create a new file called sli.yaml. Do not modify the content below:

---
spec_version: '1.0'
indicators:
  response_time_p95: "builtin:service.response.time:percentile(95)?scope=type(SERVICE),tag(customer:$PROJECT),tag([KUBERNETES]stage:$STAGE)"
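
To see what the placeholders turn into, here is a minimal Python sketch of the substitution. It only illustrates the idea using the standard library's string.Template. It is not the dynatrace-sli-service's actual implementation, and the function name expand_placeholders is made up.

```python
from string import Template

# The query string from sli.yaml, split over two lines for readability.
SLI_QUERY = (
    "builtin:service.response.time:percentile(95)"
    "?scope=type(SERVICE),tag(customer:$PROJECT),tag([KUBERNETES]stage:$STAGE)"
)


def expand_placeholders(query: str, project: str, stage: str) -> str:
    """Replace $PROJECT and $STAGE with concrete values."""
    return Template(query).substitute(PROJECT=project, STAGE=stage)


if __name__ == "__main__":
    print(expand_placeholders(SLI_QUERY, project="customer-a", stage="staging"))
```

With project customer-a and stage staging, the scope matches the entitySelector you used in the API explorer. This is why the same sli.yaml can be reused for other customers without modification.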

Add this SLI file to the relevant Keptn project and stage.

The --resource parameter points to the sli.yaml file you created above.
The --resourceUri parameter must be set to dynatrace/sli.yaml (Keptn is hardcoded to look for this value).

Add File to Keptn

keptn add-resource --project=customer-a --stage=staging --service=staging-web --resource=sli.yaml --resourceUri=dynatrace/sli.yaml

$ keptn add-resource --project=customer-a --stage=staging --service=staging-web --resource=sli.yaml --resourceUri=dynatrace/sli.yaml
Adding resource sli.yaml to service staging-web in stage staging in project customer-a
Resource has been uploaded.

So far, we've told Keptn:

Where to look for metrics (the dynatrace-sli-service)
Which credentials to use to connect to the metrics provider (stored in the k8s secret)
What our SLI definition is (95th percentile response time)

We haven't told Keptn:

What good looks like for our SLI

Create & Upload SLO file

Create a new file called slo.yaml with the following content:

spec_version: '1.0'
comparison:
  compare_with: "single_result"
  include_result_with_score: "pass"
  aggregate_function: avg
objectives:
- sli: response_time_p95
  pass:
  - criteria:
    - "<=+10%"
    - "<200"
  warning:
  - criteria:
    - "<=500"
total_score:
  pass: "90%"
  warning: "50%"
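
To make the criteria strings concrete, here is a rough Python sketch of how they could be read. It reflects my understanding of the SLO format rather than Keptn's actual code: all criteria in one list must hold, a percentage is relative to the previous result, and (an assumption of this sketch) a relative check passes when there is no previous result to compare against. The function names meets and grade are made up.

```python
import operator
import re
from typing import Optional

# operator, optional sign, number, optional percent sign: "<=+10%" or "<200"
CRITERION = re.compile(r"^(<=|>=|<|>|=)([+-]?)(\d+(?:\.\d+)?)(%?)$")
COMPARE = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
           ">": operator.gt, "=": operator.eq}


def meets(criterion: str, value: float, baseline: Optional[float] = None) -> bool:
    """Check one criterion string against a value and an optional previous value."""
    match = CRITERION.match(criterion)
    if match is None:
        raise ValueError(f"unsupported criterion: {criterion}")
    op, sign, number, percent = match.groups()
    target = float(number)
    if percent:
        if baseline is None:
            # Assumption: with nothing to compare against, a relative check passes.
            return True
        change = -target if sign == "-" else target
        target = baseline * (1 + change / 100)
    return COMPARE[op](value, target)


def grade(value: float, baseline: Optional[float] = None) -> str:
    """Apply the pass and warning criteria from slo.yaml to one SLI value."""
    if all(meets(c, value, baseline) for c in ("<=+10%", "<200")):
        return "pass"
    if meets("<=500", value, baseline):
        return "warning"
    return "fail"


if __name__ == "__main__":
    print(grade(180.0, baseline=170.0))
    print(grade(150.0, baseline=100.0))
    print(grade(900.0, baseline=170.0))
```

The second call shows why the relative criterion matters: 150 is under the absolute limit of 200 but is more than 10% slower than the previous result of 100, so it does not reach pass.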

Add File to Keptn

Add the SLO file to Keptn:

keptn add-resource --project=customer-a --stage=staging --service=staging-web --resource=slo.yaml --resourceUri=slo.yaml

You'll see a success message:

$ keptn add-resource --project=customer-a --stage=staging --service=staging-web --resource=slo.yaml --resourceUri=slo.yaml
Adding resource slo.yaml to service staging-web in stage staging in project customer-a
Resource has been uploaded.

Trigger an evaluation using the keptn command line:

keptn send event start-evaluation --project=customer-a --stage=staging --service=staging-web --timeframe=2m

Refresh Keptn's bridge and notice that the evaluation is successful:

first keptn evaluation

Time to push version 2 of our code to Customer A in staging.

kubectl set image -n customer-a deployment/staging-web front-end=adamgardnerdt/perform-demo-app:v2 --record

Refresh the customer-a staging URL and you should see a green banner.

customer a staging v2

Notice that the page takes longer to load. There is a delay on this page. This delay will cause our quality gate to fail.

Wait for a few minutes for Dynatrace to receive new data before progressing to the next step.

Request a new quality evaluation from Keptn. This time, it should fail because the page is taking too long to load.

keptn send event start-evaluation --project=customer-a --stage=staging --service=staging-web --timeframe=2m

failed build

So far we've relied on the keptn CLI to run evaluations. That's not usually the way things are done. More likely you will want to integrate Keptn into your shell scripts or build pipelines.

For this, we have a few options but first we'll look at the API.

Retrieve the Keptn API key:

kubectl get secret keptn-api-token -n keptn -o jsonpath='{.data.keptn-api-token}' | base64 --decode

For convenience, the demo system saves it for you in ~/installOutput.txt:

cat ~/installOutput.txt

Navigate to the Keptn API page:

http://keptn.VMIP.nip.io/api

Authenticate with your token and experiment with the GET endpoints.

Use the evaluation endpoint to request a new Keptn evaluation. This is the equivalent of this CLI command:

keptn send event start-evaluation --project=customer-a --stage=staging --service=staging-web --timeframe=2m

Your details will be:

project = customer-a
stage = staging
service = staging-web
timeframe = 2m

The minimum payload body is:

{
  "timeframe": "2m"
}

If you have an API utility such as Postman you can also try a POST request to:

http://keptn.VMIP.nip.io/api/v1/project/PROJECTNAME/stage/STAGENAME/service/SERVICENAME/evaluation

Header values:
x-token: YOURKEPTNAPIKEY
Content-Type: application/json
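
If you don't have Postman, the same request can be prepared from a script. The sketch below builds the request with Python's standard library using the URL pattern, headers and body described above. It deliberately stops short of sending it, so you can inspect it first. The function name is made up, and VMIP and YOURKEPTNAPIKEY remain placeholders you must replace.

```python
import json
from urllib.request import Request


def build_evaluation_request(base_url: str, api_token: str, project: str,
                             stage: str, service: str, timeframe: str) -> Request:
    """Prepare (but do not send) the POST request that starts an evaluation."""
    url = f"{base_url}/api/v1/project/{project}/stage/{stage}/service/{service}/evaluation"
    body = json.dumps({"timeframe": timeframe}).encode("utf-8")
    headers = {"x-token": api_token, "Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_evaluation_request(
        "http://keptn.VMIP.nip.io", "YOURKEPTNAPIKEY",
        "customer-a", "staging", "staging-web", "2m",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it: urllib.request.urlopen(request) and read the JSON response.
```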

Record the `keptnContext`

However you choose to call the Keptn API, you will receive a 200 OK response and a payload which contains a value called keptnContext:

{
  "keptnContext": "ee4fb3ac-8a7b-48d2-bc35-a784fb1d4b43",
  "token": "***"
}

Keptn will run the evaluation asynchronously. It may take some time to complete the evaluation so Keptn provides an ID by which you can retrieve your evaluation at a later time. keptnContext is that ID.
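
Because the evaluation is asynchronous, a script usually has to poll for the result. Here is a minimal polling sketch. The fetch_result function is a hypothetical stand-in for whatever call you use to look up the evaluation by its keptnContext; it should return None until the result is available.

```python
import time
from typing import Callable, Optional


def wait_for_result(
    fetch_result: Callable[[str], Optional[dict]],
    keptn_context: str,
    attempts: int = 10,
    delay_seconds: float = 5.0,
) -> Optional[dict]:
    """Call fetch_result until it returns something or we run out of attempts."""
    for attempt in range(attempts):
        result = fetch_result(keptn_context)
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None


if __name__ == "__main__":
    # A fake lookup that only "finishes" on the third call.
    answers = iter([None, None, {"result": "pass"}])
    print(wait_for_result(lambda ctx: next(answers), "example-context", delay_seconds=0))
```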

Retrieve Evaluation

Using the Select a definition dropdown, go to mongodb-datastore and use the GET /event endpoint with your Keptn context to pull all events that share that context.

Notice that you receive multiple events. In fact, using only the Keptn context ID, you get the full Purepath of events which corresponds to what you see in the bridge.

Every event in the chain shares the same Keptn Context. Use the context to grab a complete history of that "chain of events":

bridge events

So far we have:

Installed the Keptn tool
Configured Keptn to pull metrics from an SLI provider (Dynatrace)
Defined our project structure in Keptn (project, service and stage)
Configured the metrics we care about (SLIs)
Defined what "quality" means for those metrics (SLO)
Used Keptn to execute evaluations and received our result (pass, warning or fail)
Interacted with Keptn through the Command Line Interface (CLI) and the API.

Keptn Core

As you know, Keptn is event based. For the purposes of this session, you can consider Keptn's core to be responsible for receiving and placing events onto a topic in a publisher / subscriber type model.

These events can then be used by Keptn's services.

All possible events are listed here

Keptn Services

Keptn services are additional pieces of functionality (think of them like apps) that listen for one (or more) events, react to those events and (optionally) emit events.

Anyone can create new Keptn services.

Basic Workflow Example

keptn-services-basic

Complex Workflow Example

keptn-services-complex

For example, when we ask Keptn to start an evaluation, we send the following event: sh.keptn.event.start-evaluation

Keptn services that are configured to listen for that event can then react.

Keptn's service architecture makes it completely flexible in terms of what happens and when.
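
The publisher / subscriber idea can be sketched with a tiny in-memory event bus. This is a conceptual model only, not Keptn's implementation: the EventBus class and the example.evaluation-finished event name are made up, while sh.keptn.event.start-evaluation is the event mentioned above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Routes each published event to every handler subscribed to its type."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver the payload and return how many handlers were notified."""
        handlers = list(self._subscribers[event_type])
        for handler in handlers:
            handler(payload)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    log: List[str] = []

    # One "service" reacts to the start event and emits a follow-up event.
    def evaluator(event: dict) -> None:
        log.append(f"evaluating {event['service']}")
        bus.publish("example.evaluation-finished", {"service": event["service"]})

    # Another "service" only cares about the follow-up event.
    def notifier(event: dict) -> None:
        log.append(f"notifying about {event['service']}")

    bus.subscribe("sh.keptn.event.start-evaluation", evaluator)
    bus.subscribe("example.evaluation-finished", notifier)
    bus.publish("sh.keptn.event.start-evaluation", {"service": "staging-web"})
    print(log)
```

Neither handler knows about the other. Adding a new reaction means subscribing one more handler, which is the flexibility the service architecture gives you.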

Out of the Box Services

You have already been using Keptn services - some are installed for you by default. Look again at an evaluation in Keptn's bridge. Notice that there are a number of services already mentioned:

ootb keptn services

Take a look at what's already installed with:

kubectl get deployments -n keptn

Here are mine:

NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
bridge                     1/1     1            1           47h
dynatrace-sli-service      1/1     1            1           45h
eventbroker-go             1/1     1            1           47h
api-service                1/1     1            1           47h
api-gateway-nginx          1/1     1            1           47h
mongodb                    1/1     1            1           47h
lighthouse-service         1/1     1            1           47h
shipyard-service           1/1     1            1           47h
mongodb-datastore          1/1     1            1           47h
remediation-service        1/1     1            1           47h
configuration-service      1/1     1            1           47h

We have now interacted with Keptn via the CLI and the API. But a more realistic scenario would be:

A developer requests an evaluation using some tool (shell script, pipeline run etc.)
Keptn evaluates the code and generates an evaluation result
The developer wishes to be notified and consume the results of this evaluation in a tool of her / his choosing

Discuss: How could we achieve this?

Our developer has decided that they want JIRA tickets for each evaluation. We've looked and found a Keptn service which does just that: the JIRA Service.

💡 You will need a free trial JIRA account to proceed. Sign up here

Log in to JIRA and set up your account. You can skip all the optional questions and invites.
Choose an ID. This will form part of your URL: https://YOURID.atlassian.net
When asked to create a project, make a note of your project key
Create a Kanban type project
Generate a JIRA API key (generate one here)

By now you should know:

Your JIRA username (the email address you signed up with)
Your JIRA URL (no trailing slash!): https://SOMETHING.atlassian.net
Your project key. If you've forgotten it, it's in the URL too.
Your Keptn base URL (no trailing slash): http://keptn.VMIP.nip.io
Your Keptn bridge URL (no trailing slash): http://keptn.VMIP.nip.io/bridge

Follow the instructions on the JIRA Service readme.

Ask Keptn for an evaluation, either via the API or CLI.

keptn send event start-evaluation --project=customer-a --stage=staging --service=staging-web --timeframe=2m

You can check the progress of the evaluation with:

keptn get event evaluation-done --keptn-context ***

JIRA Results

When your evaluation is completed, refresh the JIRA board and you should see a new ticket in your backlog.

jira ticket

You've successfully created a quality gate as code.

Do you have any thoughts, ideas, comments or questions?

Here are some ideas to extend your research:

Onboard customers B & C to Keptn
Extend the quality gate to use additional / different metrics
Use different metrics per customer
Receive Dynatrace Events for Quality Gate results

Useful Links:
