Monitoring & Alerting
Overview
The Monitoring & Alerting feature provides a unified dashboard where you can observe the health and performance of all your cloud resources in one place. Instead of navigating to each resource individually, you get an at-a-glance view of CPU usage, memory consumption, and availability status for every resource in your account.
Key capabilities:
- Unified dashboard -- Monitor instances, managed databases, load balancers, and VPN gateways from a single page.
- Real-time metrics -- CPU and memory usage with color-coded indicators, auto-refreshing every 15 seconds.
- Expandable detail views -- Click any resource row to reveal detailed performance charts.
- Alert rules -- Define thresholds for any metric and get notified when they are breached.
- Multi-channel notifications -- Send alerts to Slack, Discord, Telegram, or custom webhooks.
Monitoring Dashboard
Accessing the Dashboard
Navigate to Monitoring in the sidebar. The dashboard opens showing all your resources with their current status and metrics.
Resource Type Tabs
Use the tabs at the top of the page to filter by resource type:
| Tab | Resources shown |
|---|---|
| All | Every resource across all types |
| Instances | Virtual machine instances |
| Databases | Managed database clusters |
| Load Balancers | Load balancer services |
| VPN Gateways | VPN gateway services |
Time Range Selector
Choose the time range for metric charts using the selector in the top-right corner. Available ranges include 1 hour, 6 hours, 24 hours, 7 days, and 30 days. This affects the charts displayed when you expand a resource row.
Understanding the Indicators
Each resource row shows CPU and memory usage with color-coded progress bars:
| Color | Range | Meaning |
|---|---|---|
| Green | Below 60% | Normal usage |
| Yellow | 60% -- 80% | Elevated usage |
| Red | Above 80% | High usage, may need attention |
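The thresholds above can be sketched as a small classifier. This is an illustration of the color bands from the table, not the dashboard's actual code:

```python
def usage_color(percent: float) -> str:
    """Map a usage percentage to the dashboard's indicator color."""
    if percent > 80:
        return "red"     # high usage, may need attention
    if percent >= 60:
        return "yellow"  # elevated usage
    return "green"       # normal usage
```

Note that the boundary values 60% and 80% both fall in the yellow band, matching the ranges in the table.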
Status Detection
- Online -- The resource is running and actively reporting metrics.
- Offline -- No recent metric data has been received. The resource may be stopped, unreachable, or not yet provisioned.
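Conceptually, status detection is a staleness check on the most recent metric sample. A minimal sketch, assuming a 60-second staleness window (the actual cutoff the dashboard uses is not documented here):

```python
from datetime import datetime, timedelta, timezone

# Assumed staleness window; the real cutoff is an implementation detail.
STALE_AFTER = timedelta(seconds=60)

def resource_status(last_metric_at, now=None):
    """Classify a resource as online/offline from its most recent metric timestamp."""
    if last_metric_at is None:
        return "offline"  # never reported: stopped, unreachable, or not yet provisioned
    now = now or datetime.now(timezone.utc)
    return "online" if now - last_metric_at <= STALE_AFTER else "offline"
```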
Expanding Rows for Detail
Click on any resource row to expand it and view detailed performance charts for that resource. Charts display historical CPU and memory usage over the selected time range.
Auto-Refresh
The dashboard automatically refreshes metrics every 15 seconds. You do not need to manually reload the page to see updated values.
Notification Channels
Before creating alert rules, set up at least one notification channel to receive alerts.
Supported Channel Types
| Type | Description |
|---|---|
| Slack | Posts alerts to a Slack channel via an incoming webhook URL |
| Discord | Posts alerts to a Discord channel via a webhook URL |
| Telegram | Sends alerts to a Telegram chat via Bot API |
| Webhook | Sends a JSON payload to any HTTP endpoint |
Creating a Notification Channel
- Navigate to Monitoring and click the Notification Channels tab
- Click Create Channel
- Select the channel Type
- Enter a Name to identify the channel
- Fill in the configuration for your chosen type:
Slack:
- Webhook URL -- Create an incoming webhook in your Slack workspace settings and paste the URL here.
Discord:
- Webhook URL -- In your Discord server, go to channel settings > Integrations > Webhooks, create a webhook, and paste the URL.
Telegram:
- Bot Token -- Create a bot via @BotFather and paste the token.
- Chat ID -- The numeric ID of the chat, group, or channel where alerts should be sent.
Webhook:
- URL -- The HTTP endpoint that will receive POST requests with a JSON payload containing alert details.
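A receiving endpoint only needs to parse the JSON body of the POST and act on it. The sketch below summarizes an incoming alert; the field names (`state`, `rule`, `metric`, `value`, `resource`) are assumptions for illustration, since the platform's actual payload schema is not documented here:

```python
import json

def summarize_alert(body: bytes) -> str:
    """Turn a webhook POST body into a one-line summary.

    Field names here are illustrative assumptions, not the platform's
    documented schema; adapt them to the payload you actually receive.
    """
    alert = json.loads(body)
    return (f"[{alert.get('state', 'unknown')}] {alert.get('rule', '?')}: "
            f"{alert.get('metric', '?')} = {alert.get('value', '?')} "
            f"on {alert.get('resource', '?')}")
```

Use the Test button (described below) and log the raw body once to confirm the real field names before relying on them.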
Testing a Channel
After creating a channel, click the Test button to send a test notification. Verify that the message arrives at the expected destination before using the channel in alert rules.
Multiple Channels
You can create as many notification channels as needed. Alert rules can be configured to notify one or more channels simultaneously.
Alert Rules
Alert rules let you define conditions that trigger notifications when a metric crosses a threshold.
Creating an Alert Rule
- Navigate to Monitoring and click the Alert Rules tab
- Click Create Rule
- Configure the rule:
- Name -- A descriptive name for the rule
- Resource Type -- Select the type of resource to monitor (Instance, Database, Load Balancer, VPN Gateway)
- Resource -- Choose a specific resource, or select All to apply the rule to every resource of that type
- Metric -- The metric to monitor (see table below)
- Operator -- Whether to alert when the metric is greater than or less than the threshold
- Threshold -- The value that triggers the alert
- Duration -- How long the condition must persist before the alert fires (e.g., 5 minutes)
- Reminder Interval -- How often to resend notifications while the alert remains active (e.g., every 30 minutes)
- Notification Channels -- Select one or more channels to notify
- Click Create
Available Metrics by Resource Type
| Resource Type | Available Metrics |
|---|---|
| Instance | CPU Usage (%), Memory Usage (%), Status (up/down) |
| Database | CPU Usage (%), Memory Usage (%), Status (up/down) |
| Load Balancer | CPU Usage (%), Memory Usage (%), Status (up/down) |
| VPN Gateway | CPU Usage (%), Memory Usage (%), Status (up/down) |
Status (Up/Down) Alerts
Status alerts detect when a resource stops reporting metrics. If no data is received for the configured duration, the resource is considered down and the alert fires. When metrics resume, the alert automatically resolves.
Duration
The duration setting prevents false alarms from brief spikes. The condition must be continuously true for the entire duration before the alert transitions to Firing state. For example, with a 5-minute duration, a CPU spike that lasts only 2 minutes will not trigger an alert.
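The duration check can be pictured as requiring every sample in the trailing window to breach the threshold. A minimal sketch, assuming samples arrive at a fixed interval (the platform's actual evaluation cadence is not documented here):

```python
from datetime import timedelta

def breached_for_duration(samples, threshold, duration, interval):
    """True only if every sample in the trailing `duration` window exceeds `threshold`.

    samples: metric values, oldest first, collected once per `interval`.
    """
    needed = int(duration / interval)
    if len(samples) < needed:
        return False  # not enough history yet to confirm a sustained breach
    return all(v > threshold for v in samples[-needed:])
```

With a 5-minute duration and 1-minute samples, a 2-minute CPU spike leaves older in-range samples inside the window, so the rule does not fire.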
Reminder Interval
While an alert is in Firing state, notifications are resent at the configured reminder interval. This ensures that active issues are not forgotten. Acknowledging the alert stops the reminders.
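The reminder schedule is effectively "fire time plus whole multiples of the interval." A sketch of that arithmetic, assuming reminders are aligned to the moment the alert fired:

```python
from datetime import datetime, timedelta

def next_reminder(fired_at, reminder_interval, now):
    """Time of the next reminder notification while the alert stays in Firing."""
    intervals_elapsed = (now - fired_at) // reminder_interval
    return fired_at + (intervals_elapsed + 1) * reminder_interval
```

For example, with a 30-minute interval on an alert that fired at 12:00, a check at 12:45 schedules the next reminder for 13:00.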
Alert States
Each alert rule tracks its current state:
| State | Meaning |
|---|---|
| OK | The metric is within normal range. No action needed. |
| Firing | The threshold has been breached for the configured duration. Notifications have been sent. |
| Acknowledged | A user has acknowledged the alert. Reminder notifications are stopped, but the system continues to monitor the metric. |
| Resolved | The metric has returned to normal range. This transition happens automatically -- no user action is required. |
Acknowledging an Alert
When an alert is in Firing state, click the Acknowledge button to indicate you are aware of the issue. This stops reminder notifications from being sent. The alert will automatically transition to Resolved once the metric returns to normal.
Acknowledging an alert does not silence it permanently -- if the metric recovers and then breaches the threshold again, a new alert cycle begins.
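The lifecycle above can be summarized as a small state machine. This is a sketch of the documented transitions, not the platform's implementation; `breached` here means "the condition has held for the configured duration":

```python
class AlertState:
    """Sketch of the alert lifecycle: OK -> Firing -> Acknowledged/Resolved."""

    def __init__(self):
        self.state = "OK"

    def evaluate(self, breached: bool) -> str:
        """Run one evaluation cycle with the current breach status."""
        if self.state in ("OK", "Resolved") and breached:
            self.state = "Firing"       # a recovered metric that breaches again starts a new cycle
        elif self.state in ("Firing", "Acknowledged") and not breached:
            self.state = "Resolved"     # automatic; no user action required
        return self.state

    def acknowledge(self) -> None:
        """User marks the issue as known; reminders stop, monitoring continues."""
        if self.state == "Firing":
            self.state = "Acknowledged"

    @property
    def reminders_active(self) -> bool:
        return self.state == "Firing"
```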
Alert History
The Alert History tab shows a log of all alert events across your account.
Each entry includes:
- Timestamp -- When the event occurred
- Alert Rule -- The rule that triggered the event
- Resource -- The affected resource
- Action -- The state transition (e.g., OK to Firing, Firing to Acknowledged, Firing to Resolved)
- Details -- The metric value at the time of the event
Filtering
Use the filters at the top of the history table to narrow results:
- Resource Type -- Show events for a specific resource type only
- Action -- Show only specific state transitions (e.g., only Firing events)
Retention
Alert history is retained for 90 days. Older entries are automatically removed.
Troubleshooting
"No data" shown for a resource
The resource is not reporting metrics. Verify that:
- The instance or service is currently running (not stopped or suspended).
- The hypervisor hosting the resource is online and reporting metrics.
- Sufficient time has passed since the resource was created for metrics to be collected (allow up to 2 minutes).
Notifications not arriving
- Use the Test button on the notification channel to confirm delivery.
- For Slack/Discord: verify the webhook URL is correct and the webhook has not been deleted from the workspace/server.
- For Telegram: confirm the bot token is valid and the bot has been added to the target chat. Ensure the chat ID is correct (use a numeric ID, not a username).
- For Webhooks: check that the endpoint is reachable, returns a 2xx status code, and accepts POST requests with a JSON body.
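To verify a Telegram token and chat ID outside the platform, you can call the public Bot API's `sendMessage` method directly. The sketch below only builds the request; send it with any HTTP client and check for `"ok": true` in the JSON response (this uses api.telegram.org and is independent of the monitoring platform):

```python
from urllib.parse import urlencode

def telegram_send_request(bot_token: str, chat_id: str, text: str):
    """Build a Telegram Bot API sendMessage request to verify a token/chat pair.

    POST the returned body to the returned URL; a JSON response containing
    "ok": true confirms the bot token and chat ID are valid together.
    """
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urlencode({"chat_id": chat_id, "text": text}).encode()
    return url, body
```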
Alert stuck in "Firing" state
- If the underlying issue has been resolved but the alert remains firing, verify that the resource is actively reporting metrics. The alert evaluator needs fresh data to detect recovery.
- Click Acknowledge to stop reminder notifications while you investigate.
- If the metric has genuinely returned to normal, the alert will resolve automatically on the next evaluation cycle (within a few minutes).
Alert fires too frequently
- Increase the Duration setting to require the condition to persist longer before firing. This filters out brief spikes.
- Adjust the Threshold to a less sensitive value.
- Increase the Reminder Interval to reduce the frequency of repeated notifications.