Observability and OpenTelemetry – CH1 Introduction 1 – 5

  Observability

1: Introduction

2: What is Observability?

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.

In other words: understanding what is going on inside a system based on what you can see from the outside. (Kind of a fancy word for “monitoring”.)

Telemetry data (aka signals) comes in three types:

  • Logs
  • Traces
  • Metrics
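To make the three signal types concrete, here is a minimal sketch of the kind of information each one typically carries. The class and field names are hypothetical simplifications chosen for illustration; they are not OpenTelemetry's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical, simplified shapes for the three signal types.
# These are NOT OpenTelemetry's actual data-model classes.

@dataclass
class LogRecord:
    timestamp_ms: int
    severity: str             # e.g. "INFO", "ERROR"
    body: str                 # the human-readable message


@dataclass
class MetricPoint:
    name: str                 # e.g. "http.requests"
    value: float
    timestamp_ms: int
    attributes: dict = field(default_factory=dict)


@dataclass
class Span:
    trace_id: str             # shared by every span in the same request
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


log = LogRecord(1000, "ERROR", "db connection refused")
metric = MetricPoint("http.requests", 42, 1000, {"route": "/orders"})
span = Span("t1", "s1", None, "GET /orders", 1000, 1516)
print(span.duration_ms)  # 516
```

Roughly: a log describes something that happened at one moment, a metric is a numeric measurement tracked over time, and a span records one timed operation plus its place (via trace_id and parent_id) within a larger request.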

What is OpenTelemetry?

OpenTelemetry is an open source project that enables observability by collecting telemetry data from your applications.

It also lets you export that telemetry data so you can analyze it later.
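The collect-then-export idea can be sketched without any libraries. The ToyTracer class below is hypothetical and only mimics the shape of the workflow (time an operation, keep the result, hand it off later). In the real OpenTelemetry Python SDK you would instead obtain a tracer from the opentelemetry.trace module, open spans with start_as_current_span, and configure a span exporter.

```python
import json
import time
from contextlib import contextmanager


class ToyTracer:
    """Hypothetical tracer: records timed operations, then exports them.

    Not the OpenTelemetry API; it only mirrors the collect-then-export idea.
    """

    def __init__(self):
        self.finished = []  # spans collected so far, waiting to be exported

    @contextmanager
    def span(self, name, **attributes):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the span even if the wrapped code raised an exception.
            self.finished.append({
                "name": name,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
                "attributes": attributes,
            })

    def export(self):
        """Serialize collected spans (here, to JSON) and clear the buffer."""
        payload = json.dumps(self.finished)
        self.finished = []
        return payload


tracer = ToyTracer()
with tracer.span("load-user", user_id=7):
    time.sleep(0.01)  # stand-in for real work
exported = json.loads(tracer.export())
print(exported[0]["name"])  # load-user
```

Separating collection from export is the key design point: the instrumented code only records spans, and where they end up is decided elsewhere.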

 

3: What is OpenTelemetry?

OpenTelemetry is:

  • Open source
  • A CNCF (Cloud Native Computing Foundation) project
  • The second most active CNCF project
    • Kubernetes is #1
    • Other CNCF projects include Prometheus
  • Vendor agnostic

OpenTelemetry consists of:

  • A specification
  • SDK (Software Development Kit) implementations
    • As many as 12 languages supported
  • Components for collecting and exporting data
    • The OpenTelemetry Collector
    • Exporters that send data to a backend, such as a database
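The Collector's role and the "vendor agnostic" point can be illustrated with a toy pipeline: spans are received, batched, and then fanned out to any number of exporters that share one interface. All class names below are hypothetical, and this is a sketch of the receive, batch, export shape only, not the Collector's actual configuration or code.

```python
from typing import List, Protocol


class Exporter(Protocol):
    """Hypothetical exporter interface: anything with an export(spans) method."""

    def export(self, spans: List[dict]) -> None: ...


class ConsoleExporter:
    def export(self, spans: List[dict]) -> None:
        for s in spans:
            print(f"[console] {s['name']}")


class InMemoryDatabase:
    """Stand-in for a storage backend such as a tracing database."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def export(self, spans: List[dict]) -> None:
        self.rows.extend(spans)


class ToyCollector:
    """Receives spans, batches them, and fans each batch out to every exporter."""

    def __init__(self, exporters: List[Exporter], batch_size: int = 2) -> None:
        self.exporters = exporters
        self.batch_size = batch_size
        self.buffer: List[dict] = []

    def receive(self, span: dict) -> None:
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        batch, self.buffer = self.buffer, []
        for exporter in self.exporters:
            exporter.export(batch)


db = InMemoryDatabase()
collector = ToyCollector([ConsoleExporter(), db], batch_size=2)
for name in ["users-service", "orders-service", "stock-service"]:
    collector.receive({"name": name})
collector.flush()  # push out the final, partial batch
print(len(db.rows))  # 3
```

Because applications only talk to the collector, switching to a different backend means swapping an exporter, not re-instrumenting the application code.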

4: Cloud-Native Applications

Cloud-native applications are built from a variety of distributed components, each of which requires its own monitoring approach:

  • Lambda functions
  • Containerized functions
  • etc.

Requirement: how fast can we

  • Find the incident
  • Understand the incident
  • Fix the incident

Example:

  1. App 1 cannot talk to the database
  2. Communication between App 2 and App 1 causes a failure in the DB
  3. Communication between App 3 and App 1 causes a failure in the DB

Diagnosing these incidents with only logs and metrics would be difficult, because neither shows which app's request led to a given failure.
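To see why the example above is hard with logs alone, consider App 1's error line: by itself it does not say whether the request came from App 2 or App 3. The sketch below assumes every log line is stamped with the trace ID of the request it belongs to (which is what propagated trace context provides) and uses that ID to reconstruct the failing path. The log format and data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical log lines from three apps. Each carries the trace_id of the
# request it belongs to (assumes trace context was propagated between apps).
logs = [
    {"ts": 1, "app": "app-2", "trace_id": "a1", "msg": "calling app-1"},
    {"ts": 2, "app": "app-3", "trace_id": "b2", "msg": "calling app-1"},
    {"ts": 3, "app": "app-1", "trace_id": "a1", "msg": "query ok"},
    {"ts": 4, "app": "app-1", "trace_id": "b2", "msg": "ERROR: db write failed"},
    {"ts": 5, "app": "app-1", "trace_id": "c3", "msg": "query ok"},
]


def group_by_trace(records):
    """Bucket log lines by trace_id, keeping each bucket in time order."""
    traces = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["ts"]):
        traces[rec["trace_id"]].append(rec)
    return traces


def failing_paths(records):
    """Return the app-to-app path of every trace that contains an error."""
    paths = {}
    for trace_id, recs in group_by_trace(records).items():
        if any("ERROR" in r["msg"] for r in recs):
            paths[trace_id] = [r["app"] for r in recs]
    return paths


print(failing_paths(logs))  # {'b2': ['app-3', 'app-1']}
```

Without the shared trace_id, the only clue would be timestamps, which become unreliable as soon as requests from several apps interleave.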

Signals:

  • Logs: the application's story
  • Metrics: overall system health
  • Traces: the context, i.e. the “distributed” path a request takes between apps
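Traces can follow a request across apps because the trace context travels with each call. OpenTelemetry's default propagator uses the W3C Trace Context traceparent HTTP header for this. Below is a minimal sketch of building and parsing a version-00 header in plain Python; it is not the OpenTelemetry propagator API, it ignores the companion tracestate header, and it rejects any version other than 00.

```python
import re
import secrets

# W3C Trace Context "traceparent" header, version 00:
#   00-<trace-id: 32 hex>-<parent-id: 16 hex>-<trace-flags: 2 hex>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a traceparent value to attach to an outgoing request."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"             # bit 0 means "sampled"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return (trace_id, parent_span_id, sampled), or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are explicitly invalid in the spec.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, bool(int(flags, 16) & 0x01)


# Example IDs taken from the W3C Trace Context specification.
header = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
print(header)                     # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
print(parse_traceparent(header))  # ('4bf92f3577b34da6a3ce929d0e0e4736', '00f067aa0ba902b7', True)
```

The receiving app creates its own spans under the same trace ID, using the incoming parent ID as their parent, and sends a fresh traceparent with its own span ID further downstream. That chain is what lets a tracing backend stitch every app's spans into one trace.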

 

5: What is a distributed trace?

Breakdown of an example trace view:

  • Left side: preview
    • Parent/child relationships
    • Order in which calls were made to other services
      • Which operation happened
      • Why it happened (context)
  • Right side: performance (time spent)
    • Breakdown
      • The overall process took 516ms
      • users-service took 67ms to respond
      • Once users-service completed,
        • orders-service was called, which in turn called stock-service
    • You can see
      • How long each operation took
      • Whether services ran in parallel or in sequence
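Both halves of the trace view can be approximated from raw span data: parent IDs give the call tree (left side), and start/end timestamps give durations and whether sibling calls overlapped (right side). This is a minimal sketch with hypothetical field names; the root span name "frontend" and most timings are invented, and only the 516 ms total and the 67 ms for users-service match the breakdown above.

```python
from collections import defaultdict

# Illustrative span data (times in ms, hypothetical field names).
# "frontend" is a made-up root; only the 516 ms total and the 67 ms for
# users-service come from the breakdown. Other timings are invented.
spans = [
    {"id": "root",   "parent": None,     "name": "frontend",       "start": 0,   "end": 516},
    {"id": "users",  "parent": "root",   "name": "users-service",  "start": 10,  "end": 77},
    {"id": "orders", "parent": "root",   "name": "orders-service", "start": 80,  "end": 480},
    {"id": "stock",  "parent": "orders", "name": "stock-service",  "start": 120, "end": 300},
]


def children_by_parent(spans):
    """Map each parent id to its child spans, sorted by start time (call order)."""
    kids = defaultdict(list)
    for s in spans:
        kids[s["parent"]].append(s)
    for group in kids.values():
        group.sort(key=lambda s: s["start"])
    return kids


def render_tree(spans):
    """Left-side view: indented call tree with each span's duration."""
    kids = children_by_parent(spans)
    lines = []

    def walk(span, depth):
        duration = span["end"] - span["start"]
        lines.append(f"{'  ' * depth}{span['name']} ({duration} ms)")
        for child in kids[span["id"]]:
            walk(child, depth + 1)

    for root in kids[None]:
        walk(root, 0)
    return lines


def ran_in_parallel(a, b):
    """Right-side view: two spans overlapped in time if their intervals intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


print("\n".join(render_tree(spans)))
users, orders = spans[1], spans[2]
print("parallel" if ran_in_parallel(users, orders) else "sequential")  # sequential
```

Interval overlap is a simplification: it shows that two calls were in flight at the same time, but not why. The parent/child links are what explain the ordering.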

 

 
