SSH Check Usage
Use SSH to check the status of devices that are monitoring blind spots, such as SAN switches.
The SSH Check integration provides the same functionality as described in the improvement tasks.
Considerations for Configuration
The Agent host must be able to communicate over the network with the monitoring target device.
The check is configured on a Datadog Agent. Because it generates additional load beyond host monitoring, select an Agent on a host with lower service priority.
Note: Registering too many instances can cause overhead on the Agent. It is recommended to register a maximum of 4 instances per host.
SSH Check Integration Configuration
Linux
Configuration file path: /etc/datadog-agent/conf.d/ssh_check.d/conf.yaml
Example conf.yaml configuration
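A minimal sketch of the instance configuration, assuming two hypothetical SAN switches (192.168.10.11 and 192.168.10.12), a hypothetical SSH account named monitor, and a hypothetical key path; adjust hosts, accounts, and paths to your environment:

init_config:

instances:
  - host: 192.168.10.11                                    # hypothetical SAN switch address
    username: monitor                                      # hypothetical SSH account
    port: 22
    private_key_file: /etc/datadog-agent/ssh_keys/id_rsa   # must be readable by dd-agent / ddagentuser
    add_missing_keys: true                                  # accept host keys not yet in known_hosts
    tags:
      - device_type:san_switch
  - host: 192.168.10.12
    username: monitor
    port: 22
    private_key_file: /etc/datadog-agent/ssh_keys/id_rsa
    add_missing_keys: true
    tags:
      - device_type:san_switch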
The private_key_file path must be readable by the Datadog Agent account (dd-agent or ddagentuser). Copy the existing id_rsa file used for the SSH connection to the required path.
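For example, on Linux (a sketch; the destination directory /etc/datadog-agent/ssh_keys and the source path are hypothetical and should be adjusted to your environment):

sudo mkdir -p /etc/datadog-agent/ssh_keys
sudo cp /home/youruser/.ssh/id_rsa /etc/datadog-agent/ssh_keys/id_rsa   # existing key used for the SSH connection
sudo chown dd-agent:dd-agent /etc/datadog-agent/ssh_keys/id_rsa         # make it readable by the Agent account
sudo chmod 600 /etc/datadog-agent/ssh_keys/id_rsa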
Check-type integrations are executed by the Agent's check runners (check_runners, default: 4). Registering too many instances may cause overhead on the Agent.
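For reference, the number of check runners is defined in the Agent's main configuration file (shown here with its default value):

# /etc/datadog-agent/datadog.yaml
check_runners: 4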
Restart the Agent after configuration
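For example, on a systemd-based Linux host:

sudo systemctl restart datadog-agent
sudo datadog-agent status   # verify that ssh_check appears among the running checks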
Service Check Monitor Configuration
① Pick a Service Check
- Select a service check that can be monitored. In this case, choose ssh.can_connect.
② Pick monitor scope: Set scope based on tags
- Set monitoring scope
: The scope can be selected by tag from among all hosts reporting the same service check.
Multiple scope conditions are combined with AND logic.
To target all hosts, select ‘All Monitored Hosts’.
- Apply Exclusion Conditions
: The exclusion scope can also be selected by tag.
Multiple exclusion conditions are combined with OR logic.
③ Set alert conditions: Configure alert trigger conditions
▶ Set Alert Trigger Conditions
- When setting up SSH Check monitoring, select Check Alert.
a. Check Alert: Sets alert conditions for each individual check, based on the number of consecutive failures.
b. Cluster Alert: Set alert conditions based on the failure ratio of service checks within a cluster group.
▶ Set Group-by Conditions for Alert Triggering.
- Select the host.
▶ Set Alert Triggering and Resolution Conditions
- Configure conditions for Critical/OK states.
- Set consecutive failure counts for Critical / success counts for OK.
- Select 5 consecutive failures for Critical (the alert triggers after 5 consecutive failures); see the query sketch below.
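For reference, a rough sketch of the service check monitor query that corresponds to these selections; the exact query generated by the UI may differ, and the consecutive failure/success counts are stored as monitor thresholds rather than in the query itself:

"ssh.can_connect".over("*").by("host").last(5).count_by_status()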
▶ Do not notify / Notify settings
- Configure notifications when no data is collected.
- The default is ‘Do not notify’. When set to Notify, a No Data alert is triggered if no data is received within the set time.
▶ Auto Resolve Alert Setting
- If an alert state persists even after the issue is resolved, this function automatically resolves it after the set time.
- Default is ‘Never’, meaning auto-resolve is disabled. To enable auto-resolve, select the desired time.
④ Notify your team: Propagation Settings
▶ Alert Title
- This is the title of the message propagated when an alert occurs.
- Example: [SSH Check][Critical] {{host.name}} ssh is not responding.
▶ Alert Message
- This is the content of the message propagated when an alert occurs.
- Example
{{#is_alert}}
Occurrence Time (KST): {{local_time 'last_triggered_at' 'Asia/Seoul'}}
[Critical] {{host.name}} ssh has not responded 5 times in a row. Please check.
{{/is_alert}}

{{#is_alert_recovery}}
Occurrence Time (KST): {{local_time 'last_triggered_at' 'Asia/Seoul'}}
[Critical Resolved] {{host.name}} ssh response has returned to normal.
{{/is_alert_recovery}}
▶ Use Message Template Variables
- You can check the usage of templates and variables available in the alert title and message body.
▶ Notify your services and your team members settings
- Notification channels such as Opsgenie, Slack, Teams, webhook, and email are displayed.
Set the channel or target email address that should receive the alert.
▶ Content displayed settings (Message content settings)
- Configure whether to include automatically added content such as query/snapshot in the message.
▶ Include Triggering tags in notification title settings
- Displays the tag related to the target of the alert in the title when an alert occurs.
▶ Aggregation settings
- Since alerts are generated per SSH check target host, Multi Alert - Host should be selected.
▶ Renotification settings
- If an Alert (Warning) or No Data state persists, alerts are re-sent at the selected time interval.
▶ Tags settings
- Set tags on the monitor; they can be used when searching in Manage Monitors or when configuring a Downtime schedule.
▶ Priority settings
- Set the severity (importance) of alerts from P1 to P5.
⑤ Define permissions and audit notifications
▶ Restrict editing settings
- Set the editing permissions for alerts.
- If a role is selected, all users with that role will be able to edit.
▶ Test Notifications
- Clicking the button will send a test alert to the selected channel.
▶ Create
- Clicking the button will save the configured settings.