Smart Incident Response with n8n, Prometheus and Lambda

Your server hits 85% CPU at 3 AM from a scheduled backup, and your phone buzzes. You check, see it's nothing urgent, and go back to bed annoyed.

What if your system could check the time, evaluate severity, decide whether to wake you up, and sometimes fix things automatically? That's what we're building.

The 3-Step Smart Response System

  1. Receive alerts from monitoring
  2. Analyze context - time, severity, business impact
  3. Route intelligently - SMS for emergencies, Slack for business hours

Building It with n8n (Free Tool)

Step 1: Smart Alert Processing

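// Runs in an n8n Code node. `alert` is assumed to be one alert object
// taken from the Alertmanager webhook payload (for example body.alerts[0]).
const now = new Date();
const hour = now.getUTCHours();
const isBusinessHours = hour >= 9 && hour < 17; // 09:00-17:00 UTC
const severity = alert.labels.severity;

return {
  // Page someone only for critical alerts outside business hours.
  shouldWakeUp: severity === 'critical' && !isBusinessHours,
  route: isBusinessHours ? 'slack' : (severity === 'critical' ? 'sms' : 'log')
};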
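
The prompt in Step 3 reads alertname, severity, durationMinutes, isBusinessHours and description from the Code node, so the node has to emit those fields as well. Here is a minimal sketch of that normalization, assuming the standard Alertmanager webhook payload shape (labels, annotations and startsAt on each alert). The helper name normalizeAlert and the fallback values are assumptions for illustration, and the workflow in the sample repo may do this differently.

```javascript
// Hypothetical helper: flattens one Alertmanager alert into the fields
// the Step 3 prompt reads. `now` is injectable so the logic can be tested.
function normalizeAlert(alert, now = new Date()) {
  const labels = alert.labels || {};
  const annotations = alert.annotations || {};
  const hour = now.getUTCHours();

  // startsAt is an ISO timestamp; a missing or invalid value yields NaN.
  const elapsedMs = now.getTime() - new Date(alert.startsAt).getTime();

  return {
    alertname: labels.alertname || 'unknown',  // assumed fallback
    severity: labels.severity || 'info',       // assumed fallback
    description: annotations.description || '',
    isBusinessHours: hour >= 9 && hour < 17,   // 09:00-17:00 UTC
    durationMinutes: Number.isNaN(elapsedMs)
      ? 0
      : Math.max(0, Math.floor(elapsedMs / 60000)),
  };
}

// Example: an alert that started ten minutes before 03:00 UTC.
const normalized = normalizeAlert(
  {
    labels: { alertname: 'HighCPUUsage', severity: 'critical' },
    annotations: { description: 'CPU usage is 93%' },
    startsAt: '2025-05-27T02:50:00Z',
  },
  new Date('2025-05-27T03:00:00Z')
);
console.log(normalized);
```

Passing the clock in as a parameter keeps the business-hours check easy to exercise with fixed timestamps.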

Step 2: Smart Routing Rules

  • Critical + After Hours → SMS
  • Critical + Business Hours → Slack urgently
  • Warning → Slack (no urgency)
  • Info → Log for morning review
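
The four rules above can be written as one small function. This is a sketch rather than the workflow's actual code: the name chooseRoute, the returned object shape, and the choice to send unknown severities to the log are all assumptions.

```javascript
// Hypothetical helper implementing the four routing rules literally.
function chooseRoute(severity, isBusinessHours) {
  switch (severity) {
    case 'critical':
      // Critical alerts are always urgent; only the channel changes.
      return isBusinessHours
        ? { channel: 'slack', urgent: true }
        : { channel: 'sms', urgent: true };
    case 'warning':
      return { channel: 'slack', urgent: false };
    default:
      // 'info' and anything unrecognized waits for morning review (assumption).
      return { channel: 'log', urgent: false };
  }
}

console.log(chooseRoute('critical', false));
console.log(chooseRoute('warning', false));
```

Because it follows the rules literally, an after-hours warning goes to Slack here, while the ternary in Step 1 sends it to the log. Pick whichever behavior matches your team.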

Step 3: Auto-Resolution with Lambda, Driven by an AI-Agent Decision

AI-agent decision prompt:

Analyze the following Prometheus alert to determine if it should be auto-resolved by restarting the EC2 instance to handle issues like high CPU usage, especially when the team is unavailable. The context is:

- Alert Name: {{ $node["Code"].json["alertname"] }}
- Severity: {{ $node["Code"].json["severity"] }}
- Duration: {{ $node["Code"].json.durationMinutes }} minutes
- Business Hours: {{ $node["Code"].json["isBusinessHours"] }} (true if 9 AM–5 PM UTC, false otherwise)
- Description: {{ $node["Code"].json["description"] }}

Extract the CPU usage (X%) from the description, formatted as: "On <instance> at <alertname>: CPU usage is X%, Memory available is Y%, Swap usage is Z%, Disk I/O is A s, Network received is B MB/s, Latency is C s".

Decide to auto-resolve (restart the EC2 instance) if:
1. CPU usage > 80% AND outside business hours (isBusinessHours is false).
2. CPU usage > 90% AND duration >= 5 minutes.
3. Severity is "critical" AND outside business hours (isBusinessHours is false).

Return only the following JSON object, with no additional text, explanations, or markdown:
{
  "shouldAutoResolve": boolean,
  "reason": "Explanation of the reason why this action should or should not be auto-resolved, referencing CPU usage, duration, severity, and business hours if relevant."
}

- If shouldAutoResolve is true, a Lambda function will be triggered to restart the EC2 instance.
- If shouldAutoResolve is false, no restart will occur.
- Keep the reason concise and clear, referencing the specific criteria met or not met.
- If CPU usage cannot be extracted, assume 0% and include it in the reason.
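
Since a restart depends on the model's reply, it helps to validate that reply before acting on it. The sketch below parses the JSON object the prompt asks for and falls back to "do not restart" on anything unexpected. It also extracts the CPU figure in code, so the agent's reading can be cross-checked. Both function names are hypothetical, and the regular expression assumes the description format quoted in the prompt.

```javascript
// Hypothetical helper: parse the agent's reply, failing closed (no restart)
// whenever the reply is not the expected JSON object.
function parseAgentDecision(rawText) {
  const failClosed = (why) => ({ shouldAutoResolve: false, reason: why });
  if (typeof rawText !== 'string') return failClosed('Agent reply was not text.');

  // Models sometimes add markdown fences or stray text despite instructions,
  // so keep only the span from the first "{" to the last "}".
  const start = rawText.indexOf('{');
  const end = rawText.lastIndexOf('}');
  if (start === -1 || end <= start) return failClosed('No JSON object in agent reply.');

  let parsed;
  try {
    parsed = JSON.parse(rawText.slice(start, end + 1));
  } catch (err) {
    return failClosed('Agent reply was not valid JSON.');
  }
  if (typeof parsed.shouldAutoResolve !== 'boolean') {
    return failClosed('shouldAutoResolve was missing or not a boolean.');
  }
  return {
    shouldAutoResolve: parsed.shouldAutoResolve,
    reason: typeof parsed.reason === 'string' ? parsed.reason : '',
  };
}

// Hypothetical helper: read "CPU usage is X%" from the description.
// Returns 0 when no value is found, matching the prompt's fallback.
function extractCpuPercent(description) {
  const match = /CPU usage is\s+(\d+(?:\.\d+)?)\s*%/i.exec(String(description));
  return match ? Number(match[1]) : 0;
}

console.log(parseAgentDecision('{"shouldAutoResolve": true, "reason": "CPU 93% after hours"}'));
console.log(extractCpuPercent('On web-1 at HighCPUUsage: CPU usage is 93.5%, Memory available is 12%'));
```

Failing closed means a malformed reply results in a notification rather than an unintended reboot.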

For common issues, let the system auto-fix:

  • CPU > 90% for 5+ minutes outside hours → Auto-restart
  • Memory leak patterns → Clear cache automatically
  • Disk full → Clean temp files
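
For the auto-restart case, the Lambda side can stay very small. The following is a sketch of a Node.js handler that reboots one instance, assuming the AWS SDK for JavaScript v3 (@aws-sdk/client-ec2) is available in the runtime and the function's role allows ec2:RebootInstances. The event shape (instanceId, shouldAutoResolve) is an assumption for illustration, not something the n8n workflow is known to send.

```javascript
// Assumed event shape: { instanceId: "i-...", shouldAutoResolve: true }
// EC2 instance IDs are "i-" followed by 8 or 17 hexadecimal characters.
const INSTANCE_ID_PATTERN = /^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$/;

// Reject anything that is not an explicit, well-formed restart request.
function validateEvent(event) {
  if (!event || event.shouldAutoResolve !== true) {
    return { ok: false, error: 'shouldAutoResolve is not true; nothing to do.' };
  }
  if (typeof event.instanceId !== 'string' || !INSTANCE_ID_PATTERN.test(event.instanceId)) {
    return { ok: false, error: 'Missing or malformed instanceId.' };
  }
  return { ok: true, instanceId: event.instanceId };
}

async function handler(event) {
  const check = validateEvent(event);
  if (!check.ok) return { restarted: false, detail: check.error };

  // Loaded lazily so the validation logic can be exercised without the SDK.
  const { EC2Client, RebootInstancesCommand } = require('@aws-sdk/client-ec2');
  const client = new EC2Client({});
  await client.send(new RebootInstancesCommand({ InstanceIds: [check.instanceId] }));
  return { restarted: true, detail: `Reboot requested for ${check.instanceId}` };
}

// Export for Lambda when running as a CommonJS module.
if (typeof module !== 'undefined') module.exports = { handler, validateEvent };
```

Validating the instance ID before calling AWS keeps a malformed or unexpected payload from reaching the EC2 API at all.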

Your 15-Minute Setup

  1. Install n8n (one Docker command)
  2. Create webhook endpoint for alerts
  3. Add time-based routing logic
  4. Connect your monitoring
  5. Test with controlled alerts
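
For step 5, one way to produce a controlled alert is to post a synthetic Alertmanager-style payload straight to the webhook. The sketch below assumes Node.js 18 or later (for the built-in fetch). The alert name, instance name, and webhook path are placeholders, and the payload is trimmed to the fields this post uses rather than the full Alertmanager schema.

```javascript
// Hypothetical helper: builds a minimal Alertmanager-style webhook payload.
function buildTestAlert({
  severity = 'warning',
  cpuPercent = 85,
  startedMinutesAgo = 10,
  now = new Date(),
} = {}) {
  const startsAt = new Date(now.getTime() - startedMinutesAgo * 60000).toISOString();
  return {
    version: '4',
    status: 'firing',
    receiver: 'n8n',
    alerts: [
      {
        status: 'firing',
        labels: { alertname: 'HighCPUUsage', severity, instance: 'test-instance' },
        annotations: {
          description: `On test-instance at HighCPUUsage: CPU usage is ${cpuPercent}%`,
        },
        startsAt,
      },
    ],
  };
}

// Posts the payload to the n8n webhook and returns the HTTP status code.
async function sendTestAlert(webhookUrl, options) {
  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildTestAlert(options)),
  });
  return response.status;
}

// Example (the URL path is a placeholder for your own webhook):
// sendTestAlert('http://localhost:5678/webhook-test/alerts', { severity: 'critical', cpuPercent: 95 })
//   .then((status) => console.log('webhook responded with', status));
```

Sending a warning during business hours and a critical alert after hours covers the main routing branches before real alerts are connected.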

Total cost: $0

Sample code

https://github.com/Bubobot-Team/automation-workflow-monitoring

https://github.com/Bubobot-Team/monitoring-stack

Check our blog for detailed setup: https://bubobot.com/blog/automated-incident-response-workflows-with-n8n-and-monitoring-tools

Posted to Developers on May 27, 2025
    Feel free to share the workflows you'd like to see; we're improving this and will share more with the community!
