
Smart Incident Response with n8n, Prometheus and Lambda

Your server hits 85% CPU at 3 AM from a scheduled backup, and your phone buzzes. You check, see it's nothing urgent, and go back to bed annoyed.

What if your system could check the time, evaluate severity, decide whether to wake you up, and sometimes fix things automatically? That's what we're building.

The 3-Step Smart Response System

  1. Receive alerts from monitoring
  2. Analyze context - time, severity, business impact
  3. Route intelligently - SMS for emergencies, Slack for business hours

Building It with n8n (Free Tool)

Step 1: Smart Alert Processing

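// n8n Code node. Assumption: the webhook delivers a Prometheus Alertmanager
// payload, so the first alert sits at body.alerts[0]; adjust to your payload.
const alert = $input.first().json.body.alerts[0];

const hour = new Date().getUTCHours();
const isBusinessHours = hour >= 9 && hour < 17; // 9 AM to 5 PM UTC
const severity = alert.labels.severity;

return {
  // Only page someone for a critical alert outside business hours
  shouldWakeUp: severity === 'critical' && !isBusinessHours,
  route: isBusinessHours ? 'slack' : (severity === 'critical' ? 'sms' : 'log')
};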

Step 2: Smart Routing Rules

  • Critical + After Hours → SMS
  • Critical + Business Hours → Slack urgently
  • Warning → Slack (no urgency)
  • Info → Log for morning review
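
The four rules above can be sketched as one pure function. This is a minimal illustration, not the workflow's actual node: the channel labels ('sms', 'slack-urgent', 'slack', 'log') and the choice to log unknown severities are assumptions made for the example.

```javascript
// Sketch of the routing rules as a pure function.
// Channel labels are illustrative; map them to your own n8n branches.
function routeAlert(severity, date = new Date()) {
  const hour = date.getUTCHours();
  const isBusinessHours = hour >= 9 && hour < 17; // 9 AM to 5 PM UTC, as in Step 1

  switch (severity) {
    case 'critical':
      return isBusinessHours ? 'slack-urgent' : 'sms';
    case 'warning':
      return 'slack';
    default:
      // 'info' (and, by assumption, any unknown severity) waits for morning review
      return 'log';
  }
}

console.log(routeAlert('critical', new Date('2025-05-27T03:00:00Z'))); // sms
console.log(routeAlert('critical', new Date('2025-05-27T10:00:00Z'))); // slack-urgent
```

Passing the date in as a parameter keeps the function easy to test with fixed timestamps.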

Step 3: Auto-Resolution with Lambda, Driven by an AI Agent's Decision

AI-agent decision prompt:

Analyze the following Prometheus alert to determine if it should be auto-resolved by restarting the EC2 instance to handle issues like high CPU usage, especially when the team is unavailable. The context is:

- Alert Name: {{ $node["Code"].json["alertname"] }}
- Severity: {{ $node["Code"].json["severity"] }}
- Duration: {{ $node["Code"].json.durationMinutes }} minutes
- Business Hours: {{ $node["Code"].json["isBusinessHours"] }} (true if 9 AM–5 PM UTC, false otherwise)
- Description: {{ $node["Code"].json["description"] }}

Extract the CPU usage (X%) from the description, formatted as: "On <instance> at <alertname>: CPU usage is X%, Memory available is Y%, Swap usage is Z%, Disk I/O is A s, Network received is B MB/s, Latency is C s".

Decide to auto-resolve (restart the EC2 instance) if:
1. CPU usage > 80% AND outside business hours (isBusinessHours is false).
2. CPU usage > 90% AND duration < 5 minutes.
3. Severity is "critical" AND outside business hours (isBusinessHours is false).

Return only the following JSON object, with no additional text, explanations, or markdown:
{
  "shouldAutoResolve": boolean,
  "reason": "Explanation of the reason why this action should or should not be auto-resolved, referencing CPU usage, duration, severity, and business hours if relevant."
}

- If shouldAutoResolve is true, a Lambda function will be triggered to restart the EC2 instance.
- If shouldAutoResolve is false, no restart will occur.
- Keep the reason concise and clear, referencing the specific criteria met or not met.
- If CPU usage cannot be extracted, assume 0% and include it in the reason.
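
Because a language model's reply is not guaranteed to follow the prompt, it can help to validate the output before anything triggers a restart. The sketch below is one possible way to do that; the function names `extractCpuPercent` and `parseAgentDecision` are hypothetical and do not come from the sample repositories.

```javascript
// Pull "CPU usage is X%" out of the alert description. Falls back to 0 when
// the value is missing, mirroring the prompt's own instruction.
function extractCpuPercent(description) {
  const match = /CPU usage is\s+(\d+(?:\.\d+)?)\s*%/i.exec(description || '');
  return match ? Number(match[1]) : 0;
}

// Parse the agent's reply defensively: anything that is not exactly the
// expected { shouldAutoResolve, reason } shape is treated as "do not restart".
function parseAgentDecision(rawReply) {
  try {
    const parsed = JSON.parse(rawReply);
    if (typeof parsed.shouldAutoResolve === 'boolean' && typeof parsed.reason === 'string') {
      return { shouldAutoResolve: parsed.shouldAutoResolve, reason: parsed.reason };
    }
  } catch (err) {
    // Invalid JSON (or a null reply): fall through to the safe default
  }
  return { shouldAutoResolve: false, reason: 'Unparseable agent reply; no restart.' };
}
```

Defaulting to no restart means a malformed reply can only cause a missed auto-fix, never an unintended reboot.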

For common issues, let the system auto-fix:

  • CPU > 90% for 5+ minutes outside hours → Auto-restart
  • Memory leak patterns → Clear cache automatically
  • Disk full → Clean temp files
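
As a rough sketch, the three auto-fix rules above could be expressed as a lookup that returns an action name for a downstream Lambda to act on. The input fields (`cpuPercent`, `durationMinutes`, `isBusinessHours`, `memoryLeakSuspected`, `diskFull`), the action labels, and the order in which rules are checked are all assumptions for illustration.

```javascript
// Hypothetical input shape:
// { cpuPercent, durationMinutes, isBusinessHours, memoryLeakSuspected, diskFull }
function pickRemediation(ctx) {
  if (ctx.diskFull) return 'clean-temp-files';
  if (ctx.memoryLeakSuspected) return 'clear-cache';
  if (ctx.cpuPercent > 90 && ctx.durationMinutes >= 5 && !ctx.isBusinessHours) {
    return 'restart-instance';
  }
  return null; // nothing matched: leave the alert to a human
}

console.log(pickRemediation({ cpuPercent: 95, durationMinutes: 7, isBusinessHours: false }));
// restart-instance
```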

Your 15-Minute Setup

  1. Install n8n (one Docker command)
  2. Create webhook endpoint for alerts
  3. Add time-based routing logic
  4. Connect your monitoring
  5. Test with controlled alerts
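
For step 5, one way to produce a controlled alert is to build a payload shaped like an Alertmanager webhook notification and POST it to the n8n webhook. This is a sketch under assumptions: the alert name, instance label, and webhook URL are placeholders, and `fetch` requires Node 18 or later.

```javascript
// Build a minimal Alertmanager-style webhook payload for testing.
// 'HighCPUUsage' and 'test-instance' are placeholder values.
function buildTestAlert(severity, cpuPercent, startedMinutesAgo = 10) {
  const startsAt = new Date(Date.now() - startedMinutesAgo * 60000).toISOString();
  return {
    version: '4',
    status: 'firing',
    alerts: [{
      status: 'firing',
      labels: { alertname: 'HighCPUUsage', severity, instance: 'test-instance' },
      annotations: {
        description: `On test-instance at HighCPUUsage: CPU usage is ${cpuPercent}%`,
      },
      startsAt,
    }],
  };
}

// POST the payload to your webhook and return the HTTP status code.
async function sendTestAlert(webhookUrl, payload) {
  const res = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.status;
}

// Example (URL is a placeholder for your own n8n webhook path):
// sendTestAlert('http://localhost:5678/webhook-test/alerts', buildTestAlert('critical', 95));
```

Sending one critical and one warning alert, at times both inside and outside 9 AM to 5 PM UTC, exercises every routing branch.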

Total cost: $0

Sample code

https://github.com/Bubobot-Team/automation-workflow-monitoring

https://github.com/Bubobot-Team/monitoring-stack

Check our blog for detailed setup: https://bubobot.com/blog/automated-incident-response-workflows-with-n8n-and-monitoring-tools

Posted to Developers on May 27, 2025
    Feel free to share your expected workflows. We're continuing to improve these and will share more with the community!
