
Monitoring & Alerting

Overview

The Monitoring & Alerting feature provides a unified dashboard where you can observe the health and performance of all your cloud resources in one place. Instead of navigating to each resource individually, you get an at-a-glance view of CPU usage, memory consumption, and availability status for every resource in your account.

Key capabilities:

  • Unified dashboard -- Monitor instances, managed databases, load balancers, and VPN gateways from a single page.
  • Real-time metrics -- CPU and memory usage with color-coded indicators, auto-refreshing every 15 seconds.
  • Expandable detail views -- Click any resource row to reveal detailed performance charts.
  • Alert rules -- Define thresholds for any metric and get notified when they are breached.
  • Multi-channel notifications -- Send alerts to Slack, Discord, Telegram, or custom webhooks.

Monitoring Dashboard

Accessing the Dashboard

Navigate to Monitoring in the sidebar. The dashboard opens showing all your resources with their current status and metrics.

Resource Type Tabs

Use the tabs at the top of the page to filter by resource type:

Tab               Resources shown
All               Every resource across all types
Instances         Virtual machine instances
Databases         Managed database clusters
Load Balancers    Load balancer services
VPN Gateways      VPN gateway services

Time Range Selector

Choose the time range for metric charts using the selector in the top-right corner. Available ranges include 1 hour, 6 hours, 24 hours, 7 days, and 30 days. The selected range applies to the charts displayed when you expand a resource row.

Understanding the Indicators

Each resource row shows CPU and memory usage with color-coded progress bars:

Color     Range         Meaning
Green     Below 60%     Normal usage
Yellow    60% -- 80%    Elevated usage
Red       Above 80%     High usage, may need attention
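
As a rough illustration of the color bands above, the following sketch maps a usage percentage to a color. The function name usage_color is hypothetical, and treating exactly 60% and exactly 80% as yellow is an assumption drawn from reading the ranges literally.

```python
def usage_color(percent: float) -> str:
    """Map a CPU or memory usage percentage to a dashboard color band."""
    if not 0 <= percent <= 100:
        raise ValueError(f"usage must be between 0 and 100, got {percent}")
    if percent < 60:
        return "green"   # below 60%: normal usage
    if percent <= 80:
        return "yellow"  # 60% to 80%: elevated usage (boundaries assumed inclusive)
    return "red"         # above 80%: high usage, may need attention


for sample in (12.5, 60, 80, 80.1):
    print(sample, usage_color(sample))
```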

Status Detection

  • Online -- The resource is running and actively reporting metrics.
  • Offline -- No metric data has been received. The resource may be stopped, unreachable, or not yet provisioned.

Expanding Rows for Detail

Click on any resource row to expand it and view detailed performance charts for that resource. Charts display historical CPU and memory usage over the selected time range.

Auto-Refresh

The dashboard automatically refreshes metrics every 15 seconds. You do not need to manually reload the page to see updated values.


Notification Channels

Before creating alert rules, set up at least one notification channel to receive alerts.

Supported Channel Types

Type        Description
Slack       Posts alerts to a Slack channel via an incoming webhook URL
Discord     Posts alerts to a Discord channel via a webhook URL
Telegram    Sends alerts to a Telegram chat via Bot API
Webhook     Sends a JSON payload to any HTTP endpoint

Creating a Notification Channel

  1. Navigate to Monitoring and click the Notification Channels tab
  2. Click Create Channel
  3. Select the channel Type
  4. Enter a Name to identify the channel
  5. Fill in the configuration for your chosen type:

Slack:

  • Webhook URL -- Create an incoming webhook in your Slack workspace settings and paste the URL here.

Discord:

  • Webhook URL -- In your Discord server, go to channel settings > Integrations > Webhooks, create a webhook, and paste the URL.

Telegram:

  • Bot Token -- Create a bot via @BotFather and paste the token.
  • Chat ID -- The numeric ID of the chat, group, or channel where alerts should be sent.
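
If you want to confirm a bot token and chat ID before entering them, one option is to call the Telegram Bot API sendMessage method yourself. The sketch below uses only the Python standard library. The helper names are hypothetical, and this is an independent check, not a description of how the platform delivers alerts.

```python
import json
import urllib.parse
import urllib.request


def build_send_message_request(bot_token: str, chat_id: int, text: str) -> urllib.request.Request:
    """Build a Telegram Bot API sendMessage request without sending it."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_test_message(bot_token: str, chat_id: int, text: str = "Monitoring test") -> bool:
    """Send the message and report whether Telegram answered with ok=true.

    An invalid token or chat ID makes Telegram return an HTTP error status,
    which urllib raises as urllib.error.HTTPError.
    """
    request = build_send_message_request(bot_token, chat_id, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        reply = json.load(response)
    return bool(reply.get("ok"))
```

Calling send_test_message with your real token and numeric chat ID should make the message appear in the target chat. If it does not, the same values are unlikely to work in the channel configuration.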

Webhook:

  • URL -- The HTTP endpoint that will receive POST requests with a JSON payload containing alert details.
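
For local testing, a minimal receiving endpoint might look like the sketch below. It accepts a POST request with a JSON body and answers with a 2xx status. The payload fields are not documented on this page, so the handler only logs whatever arrives. The names handle_body, AlertWebhookHandler, and make_server, and the port 8080, are assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_body(raw: bytes) -> int:
    """Return the HTTP status to send back for a received request body."""
    try:
        payload = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return 400
    # The alert schema is not specified here, so just log the parsed payload.
    print("alert received:", payload)
    return 200


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accept POSTed JSON and reply with the status chosen by handle_body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = handle_body(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Create a server bound to localhost; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), AlertWebhookHandler)
```

Running make_server().serve_forever() starts the listener. A production endpoint would also need to be reachable from the monitoring service and served over HTTPS.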

Testing a Channel

After creating a channel, click the Test button to send a test notification. Verify that the message arrives at the expected destination before using the channel in alert rules.

Multiple Channels

You can create as many notification channels as needed. Alert rules can be configured to notify one or more channels simultaneously.


Alert Rules

Alert rules let you define conditions that trigger notifications when a metric crosses a threshold.

Creating an Alert Rule

  1. Navigate to Monitoring and click the Alert Rules tab
  2. Click Create Rule
  3. Configure the rule:
    • Name -- A descriptive name for the rule
    • Resource Type -- Select the type of resource to monitor (Instance, Database, Load Balancer, VPN Gateway)
    • Resource -- Choose a specific resource, or select All to apply the rule to every resource of that type
    • Metric -- The metric to monitor (see table below)
    • Operator -- Greater than or less than
    • Threshold -- The value that triggers the alert
    • Duration -- How long the condition must persist before the alert fires (e.g., 5 minutes)
    • Reminder Interval -- How often to resend notifications while the alert remains active (e.g., every 30 minutes)
    • Notification Channels -- Select one or more channels to notify
  4. Click Create
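
To make the relationship between the form fields concrete, here is a hypothetical in-memory model of a rule. It is not an API of the product. The class name, field names, and operator strings are assumptions chosen to mirror the form above.

```python
from dataclasses import dataclass
from typing import List, Optional

RESOURCE_TYPES = {"Instance", "Database", "Load Balancer", "VPN Gateway"}
OPERATORS = {"greater_than", "less_than"}


@dataclass
class AlertRule:
    """Illustrative model of the rule form; field names are assumptions."""
    name: str
    resource_type: str
    metric: str
    operator: str
    threshold: float
    duration_minutes: int
    reminder_interval_minutes: int
    channels: List[str]
    resource: Optional[str] = None  # None stands for the "All" option

    def __post_init__(self) -> None:
        if self.resource_type not in RESOURCE_TYPES:
            raise ValueError(f"unknown resource type: {self.resource_type}")
        if self.operator not in OPERATORS:
            raise ValueError(f"unknown operator: {self.operator}")
        if not self.channels:
            raise ValueError("select at least one notification channel")

    def is_breached(self, value: float) -> bool:
        """Compare one metric value against the threshold."""
        if self.operator == "greater_than":
            return value > self.threshold
        return value < self.threshold


rule = AlertRule(
    name="High CPU on instances",
    resource_type="Instance",
    metric="CPU Usage (%)",
    operator="greater_than",
    threshold=80.0,
    duration_minutes=5,
    reminder_interval_minutes=30,
    channels=["ops-slack"],
)
print(rule.is_breached(91.2))
```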

Available Metrics by Resource Type

Resource Type    Available Metrics
Instance         CPU Usage (%), Memory Usage (%), Status (up/down)
Database         CPU Usage (%), Memory Usage (%), Status (up/down)
Load Balancer    CPU Usage (%), Memory Usage (%), Status (up/down)
VPN Gateway      CPU Usage (%), Memory Usage (%), Status (up/down)

Status (Up/Down) Alerts

Status alerts detect when a resource stops reporting metrics. If no data is received for the configured duration, the resource is considered down and the alert fires. When metrics resume, the alert automatically resolves.
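
The down check described above can be sketched as a comparison between the time of the last received metric and the configured duration. The function name is_down is hypothetical, and firing exactly at the duration boundary is an assumption.

```python
from datetime import datetime, timedelta


def is_down(last_metric_at: datetime, now: datetime, duration: timedelta) -> bool:
    """Treat a resource as down once no metric has arrived for the full duration."""
    return now - last_metric_at >= duration


now = datetime(2024, 1, 1, 12, 10)
last_seen = datetime(2024, 1, 1, 12, 3)
print(is_down(last_seen, now, timedelta(minutes=5)))
```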

Duration

The duration setting prevents false alarms from brief spikes. The condition must be continuously true for the entire duration before the alert transitions to Firing state. For example, with a 5-minute duration, a CPU spike that lasts only 2 minutes will not trigger an alert.
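
The following sketch shows one way such a check could work for a "greater than" rule: the breach timer restarts whenever a sample falls back within the threshold. The function name should_fire and the sample layout are assumptions. The actual evaluator may sample and compare differently.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def should_fire(
    samples: Iterable[Tuple[datetime, float]],
    threshold: float,
    duration: timedelta,
) -> bool:
    """Return True if the value stayed above threshold for the whole duration.

    samples must be (timestamp, value) pairs in chronological order.
    """
    breach_started: Optional[datetime] = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp  # first sample of a new breach
            if timestamp - breach_started >= duration:
                return True
        else:
            breach_started = None  # condition interrupted, restart the timer
    return False


start = datetime(2024, 1, 1, 12, 0)
spike = [50, 95, 95, 95, 50, 50, 50]
samples = [(start + timedelta(minutes=i), v) for i, v in enumerate(spike)]
print(should_fire(samples, threshold=80, duration=timedelta(minutes=5)))
```

With one sample per minute, the spike above spans only 2 minutes between its first and last high sample, so a 5-minute duration does not fire.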

Reminder Interval

While an alert is in Firing state, notifications are resent at the configured reminder interval. This ensures that active issues are not forgotten. Acknowledging the alert stops the reminders.
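
As a small sketch of this behavior, a reminder is due only while the alert is still Firing and the interval has elapsed since the last notification. The function name reminder_due and the state strings are assumptions for illustration.

```python
from datetime import datetime, timedelta


def reminder_due(state: str, last_notified_at: datetime, now: datetime, interval: timedelta) -> bool:
    """Decide whether another notification should be sent."""
    if state != "Firing":
        return False  # acknowledged or resolved alerts send no reminders
    return now - last_notified_at >= interval


sent = datetime(2024, 1, 1, 12, 0)
print(reminder_due("Firing", sent, sent + timedelta(minutes=30), timedelta(minutes=30)))
```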


Alert States

Each alert rule tracks its current state:

State           Meaning
OK              The metric is within normal range. No action needed.
Firing          The threshold has been breached for the configured duration. Notifications have been sent.
Acknowledged    A user has acknowledged the alert. Reminder notifications are stopped, but the system continues to monitor the metric.
Resolved        The metric has returned to normal range. This transition happens automatically -- no user action is required.

Acknowledging an Alert

When an alert is in Firing state, click the Acknowledge button to indicate you are aware of the issue. This stops reminder notifications from being sent. The alert will automatically transition to Resolved once the metric returns to normal.

Acknowledging an alert does not silence it permanently -- if the metric recovers and then breaches the threshold again, a new alert cycle begins.
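
The state transitions described in this section can be summarized as a small state machine. The sketch below is an interpretation of the text, not the product's implementation. The names AlertState and next_state are hypothetical, and leaving a Resolved alert in Resolved until the next breach is an assumption, since this page does not say whether it returns to OK.

```python
from enum import Enum


class AlertState(Enum):
    OK = "OK"
    FIRING = "Firing"
    ACKNOWLEDGED = "Acknowledged"
    RESOLVED = "Resolved"


def next_state(state: AlertState, breached: bool, acknowledged: bool = False) -> AlertState:
    """Compute the next state.

    breached means the threshold has been exceeded for the full duration.
    acknowledged means a user clicked Acknowledge since the last evaluation.
    """
    if state in (AlertState.OK, AlertState.RESOLVED):
        # A breach after recovery starts a new alert cycle.
        return AlertState.FIRING if breached else state
    if not breached:
        return AlertState.RESOLVED  # automatic, from Firing or Acknowledged
    if state is AlertState.FIRING and acknowledged:
        return AlertState.ACKNOWLEDGED
    return state


state = AlertState.OK
for breached, acked in [(True, False), (True, True), (False, False), (True, False)]:
    state = next_state(state, breached, acked)
    print(state.value)
```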


Alert History

The Alert History tab shows a log of all alert events across your account.

Each entry includes:

  • Timestamp -- When the event occurred
  • Alert Rule -- The rule that triggered the event
  • Resource -- The affected resource
  • Action -- The state transition (e.g., OK to Firing, Firing to Acknowledged, Firing to Resolved)
  • Details -- The metric value at the time of the event

Filtering

Use the filters at the top of the history table to narrow results:

  • Resource Type -- Show events for a specific resource type only
  • Action -- Show only specific state transitions (e.g., only Firing events)

Retention

Alert history is retained for 90 days. Older entries are automatically removed.


Troubleshooting

"No data" shown for a resource

The resource is not reporting metrics. Verify that:

  • The instance or service is currently running (not stopped or suspended).
  • The hypervisor hosting the resource is online and reporting metrics.
  • Sufficient time has passed since the resource was created for metrics to be collected (allow up to 2 minutes).

Notifications not arriving

  • Use the Test button on the notification channel to confirm delivery.
  • For Slack/Discord: verify the webhook URL is correct and the webhook has not been deleted from the workspace/server.
  • For Telegram: confirm the bot token is valid and the bot has been added to the target chat. Ensure the chat ID is correct (use a numeric ID, not a username).
  • For Webhooks: check that the endpoint is reachable, returns a 2xx status code, and accepts POST requests with a JSON body.

Alert stuck in "Firing" state

  • If the underlying issue has been fixed but the alert remains in Firing state, verify that the resource is actively reporting metrics. The alert evaluator needs fresh data to detect recovery.
  • Click Acknowledge to stop reminder notifications while you investigate.
  • If the metric has genuinely returned to normal, the alert will resolve automatically on the next evaluation cycle (within a few minutes).

Alert fires too frequently

  • Increase the Duration setting to require the condition to persist longer before firing. This filters out brief spikes.
  • Adjust the Threshold to a less sensitive value.
  • Increase the Reminder Interval to reduce the frequency of repeated notifications.