DEV Community

Luna Commsnet

Self-Hosted Monitoring Stack: Zabbix + Grafana for Home Infrastructure

Published: June 15, 2026 | CommsNet


You know that feeling when something breaks and you only find out because the website is down? That's not monitoring — that's embarrassment detection. Real monitoring tells you before things break. It shows you the memory leak that started three hours ago, the disk that's filling at 2% per day, the SSL certificate expiring in 12 days.

Enterprise monitoring platforms (Datadog, New Relic, Splunk) cost hundreds to thousands per month. For a homelab, that's absurd. But running blind is worse. The answer: self-hosted Zabbix for data collection and alerting, paired with Grafana for visualization. Together, they give you enterprise-grade observability at the cost of the electricity to run them.

In this article, I'll walk through deploying a complete Zabbix + Grafana monitoring stack on Proxmox, configuring agents across VLANs, building dashboards that actually tell you something, and setting up alerts that wake you up when they matter — not at 3 AM for a transient spike.


Why Zabbix + Grafana?

The Monitoring Landscape

| Solution | Cost | Data Ownership | Complexity | Alerting | Dashboards |
|---|---|---|---|---|---|
| Datadog | $15-23/host/mo | Cloud (theirs) | Low | Excellent | Excellent |
| Prometheus + Grafana | Free | Self-hosted | Medium | Good | Excellent |
| Zabbix + Grafana | Free | Self-hosted | Medium-High | Excellent | Excellent (with Grafana) |
| Netdata | Free | Self-hosted | Low | Basic | Good (built-in) |
| Uptime Kuma | Free | Self-hosted | Low | Basic | Basic |

Why Not Just Prometheus?

Prometheus is the darling of the cloud-native world, and for good reason. But for homelab monitoring, Zabbix has advantages:

  1. Agent-based collection works across VLANs — Prometheus scrapes its targets by pulling, so every monitored host needs an inbound firewall rule from the Prometheus server. Zabbix agents running active checks open the connection themselves to the server on TCP 10051, so one rule toward the server covers a whole VLAN.
  2. Auto-discovery — Zabbix can discover hosts, interfaces, and services automatically. With Prometheus, you either list targets in prometheus.yml by hand or set up one of its service-discovery mechanisms.
  3. Built-in templates — Zabbix has 400+ out-of-the-box templates for everything from Linux to pfSense to Proxmox to SNMP devices. Prometheus requires exporters for everything.
  4. Trigger logic — Zabbix triggers support expressions like "average of last 5 minutes > threshold AND last value > threshold". Prometheus alerting rules are powerful but harder to compose.
  5. Grafana integration — Zabbix data in Grafana gives you the best of both: Zabbix collection + Grafana visualization.
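
To make the trigger logic in point 4 concrete, here is a minimal Python sketch of the "average of last 5 minutes > threshold AND last value > threshold" idea. It is not Zabbix code: the function name, the one-sample-per-minute assumption, and the sample values are all made up for illustration.

```python
from statistics import mean


def should_fire(samples, threshold, window=5):
    """Return True when both the average of the last `window` samples
    and the most recent sample exceed `threshold`."""
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough data yet, stay quiet
    return mean(recent) > threshold and recent[-1] > threshold


# One sample per minute, CPU utilisation in percent (hypothetical values)
spike = [20, 22, 95, 21, 23]      # a single transient spike
sustained = [85, 88, 91, 87, 90]  # load that stays high

print("spike fires:", should_fire(spike, 80))
print("sustained fires:", should_fire(sustained, 80))
```

The spike never fires because its five-minute average stays low, while the sustained load fires because both conditions hold. That is the behaviour you want from an alert that should not wake you for a blip.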

Where Grafana Fits

Zabbix has its own dashboards, but they look like 2005. Grafana is the visualization layer:

  • Beautiful, customizable dashboards
  • Unified view across multiple data sources
  • Annotation layers (deploy events, maintenance windows)
  • Alerting with deduplication and routing
  • Mobile-responsive (check your homelab from your phone)

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    Monitoring Architecture                       │
│                                                                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────┐  │
│  │ Proxmox  │   │ pfSense  │   │ Docker   │   │ IoT Devices  │  │
│  │ Agent    │   │ Agent    │   │ Agent    │   │ SNMP         │  │
│  │ (VLAN20) │   │ (VLAN10) │   │ (VLAN20) │   │ (VLAN30)     │  │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘   └──────┬───────┘  │
│       │              │              │                 │          │
│       └──────────────┴──────┬───────┴─────────────────┘          │
│                             │                                     │
│                    ┌────────▼────────┐                            │
│                    │  Zabbix Server  │                            │
│                    │  (VLAN 20)      │                            │
│                    │  - Collection   │                            │
│                    │  - Alerting     │                            │
│                    │  - Triggers     │                            │
│                    └────────┬────────┘                            │
│                             │                                     │
│                    ┌────────▼────────┐                            │
│                    │    Grafana      │                            │
│                    │  (VLAN 20)      │                            │
│                    │  - Dashboards   │                            │
│                    │  - Visualization│                            │
│                    │  - Alert UI     │                            │
│                    └─────────────────┘                            │
│                                                                  │
│  Alert Channels: Telegram, Email, Webhook                       │
└────────────────────────────────────────────────────────────────┘

Network Considerations (VLAN-Aware)

Following our zero-trust VLAN architecture from the previous article:

  • Zabbix Server lives on VLAN 20 (Servers)
  • Zabbix Agents on VLAN 10 (Management) push data to server via active checks
  • SNMP polling from Zabbix to VLAN 30 (IoT) requires explicit firewall allow rules
  • Grafana on VLAN 20, with an optional reverse proxy on VLAN 50 (Services) if you want external access

Firewall rules needed:

# Allow Zabbix agents → Zabbix server (active checks)
ALLOW  VLAN10 → VLAN20  TCP 10051  "Management agents → Zabbix"
ALLOW  VLAN20 → VLAN20  TCP 10051  "Server agents → Zabbix"

# Allow Zabbix server → IoT (SNMP polling, if desired)
ALLOW  VLAN20 → VLAN30  UDP 161  "Zabbix SNMP poll IoT"

# Allow Grafana access from Management VLAN
ALLOW  VLAN10 → VLAN20  TCP 3000  "MGMT → Grafana dashboard"
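
Once the rules are in place, it helps to confirm the TCP ones from a machine on the Management VLAN. The sketch below is one way to do that with nothing but the Python standard library. The helper names are my own, and the Grafana address 10.0.20.11 is taken from the Grafana configuration later in this article. It cannot verify the UDP 161 rule, since SNMP has no connection handshake to test.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# TCP rules from the firewall list above; run this from a host on VLAN 10
CHECKS = [
    ("10.0.20.10", 10051, "Zabbix server (active checks)"),
    ("10.0.20.11", 3000, "Grafana dashboard"),
]


def report(checks):
    """Print one line per rule showing whether the port is reachable."""
    for host, port, label in checks:
        state = "open" if tcp_port_open(host, port) else "blocked or down"
        print(f"{label:32} {host}:{port}  {state}")
```

Call `report(CHECKS)` after the services are running. A "blocked or down" result before that point only tells you nothing is listening yet.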

Deployment on Proxmox

Step 1: Create the Zabbix Server LXC

Proxmox LXC containers are perfect for monitoring — low overhead, fast startup, full Linux userspace.

# Download Debian 12 LXC template
pveam download local debian-12-standard_12.2-1_amd64.tar.zst

# Create container
pct create 200 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname zabbix \
  --memory 4096 \
  --swap 2048 \
  --cores 2 \
  --storage local-lvm \
  --rootfs local-lvm:32 \
  --net0 name=eth0,bridge=vmbr0,tag=20,ip=10.0.20.10/24,gw=10.0.20.1 \
  --unprivileged 1 \
  --onboot 1 \
  --start 1

Why LXC instead of VM: Zabbix doesn't need its own kernel. An LXC container shares the host kernel, so it trades some of a VM's isolation for far lower overhead. Your monitoring shouldn't be the heaviest thing on the host.

Step 2: Install Zabbix Server + PostgreSQL

# Enter the container
pct enter 200

# Install PostgreSQL
apt update && apt install -y postgresql postgresql-contrib

# Create Zabbix database
sudo -u postgres createuser --pwprompt zabbix
sudo -u postgres createdb -O zabbix -E Unicode -T template0 zabbix

# Add Zabbix repository
wget https://repo.zabbix.com/zabbix/7.2/debian/pool/main/z/zabbix-release/zabbix-release_7.2-1+debian12_all.deb
dpkg -i zabbix-release_7.2-1+debian12_all.deb
apt update

# Install Zabbix server, frontend, and agent
apt install -y zabbix-server-pgsql zabbix-frontend-php zabbix-apache-conf zabbix-sql-scripts zabbix-agent2

# Import initial schema
zcat /usr/share/zabbix-sql-scripts/postgresql/server.sql.gz | \
  sudo -u zabbix psql zabbix

# Configure Zabbix server
cat > /etc/zabbix/zabbix_server.conf << 'EOF'
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=YOUR_POSTGRES_PASSWORD_HERE
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=50
DebugLevel=3
StartPollers=5
StartPollersUnreachable=2
StartTrappers=5
StartDiscoverers=2
StartHTTPPollers=2
CacheSize=64M
HistoryCacheSize=32M
TrendCacheSize=8M
ValueCacheSize=32M
Timeout=10
EOF

# Start services
systemctl restart zabbix-server zabbix-agent2 apache2
systemctl enable zabbix-server zabbix-agent2 apache2

Step 3: Install Grafana

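# Run these in a second LXC for Grafana (10.0.20.11 in this setup), created the same way as the Zabbix container
# Add Grafana repository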
apt install -y apt-transport-https software-properties-common
wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key

echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | \
  tee /etc/apt/sources.list.d/grafana.list

apt update && apt install -y grafana

# Configure Grafana
cat > /etc/grafana/grafana.ini << 'EOF'
[server]
http_addr = 10.0.20.11
http_port = 3000
domain = grafana.commsnet.local

[security]
admin_user = admin
admin_password = CHANGE_ME_IMMEDIATELY

[database]
type = sqlite3

[analytics]
reporting_enabled = false
check_for_updates = false

[auth.anonymous]
enabled = false
EOF

systemctl restart grafana-server
systemctl enable grafana-server

Step 4: Connect Grafana to Zabbix

Install the Zabbix data source plugin in Grafana:

grafana-cli plugins install alexanderzobnin-zabbix-app
systemctl restart grafana-server

In Grafana UI (Configuration → Plugins → Zabbix):

  1. Enable the Zabbix app plugin
  2. Add data source:
    • Name: Zabbix
    • Type: Zabbix API
    • URL: http://10.0.20.10/zabbix/api_jsonrpc.php
    • Username: Admin
    • Password: Your Zabbix admin password
    • Trends: Enable (use trends for long-term graphs)

Configuring Zabbix Agents

Agent on Proxmox Host

# On the Proxmox host itself
apt install -y zabbix-agent2

cat > /etc/zabbix/zabbix_agent2.conf << 'EOF'
Server=10.0.20.10
ServerActive=10.0.20.10
Hostname=proxmox-host
LogFile=/var/log/zabbix/zabbix_agent2.log
DebugLevel=3

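# Custom metrics for Proxmox
# These commands need root; let the zabbix user run them (for example via sudo) or they return nothing
# Count cluster nodes reported as online
UserParameter=pve.cluster.status,/usr/bin/pvesh get /cluster/status --output-format json 2>/dev/null | grep -o '"online":1' | wc -l
# Count running VMs and containers (grep skips the header line and stopped guests)
UserParameter=pve.vm.count,/usr/sbin/qm list 2>/dev/null | grep -c running
UserParameter=pve.ct.count,/usr/sbin/pct list 2>/dev/null | grep -c running
# pvesm prints a table (Name Type Status Total Used Available %); $$ keeps awk's field reference literal
UserParameter=pve.storage.used[*],/usr/sbin/pvesm status --storage $1 2>/dev/null | awk 'NR==2 {print $$5}'
UserParameter=pve.storage.total[*],/usr/sbin/pvesm status --storage $1 2>/dev/null | awk 'NR==2 {print $$4}'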
EOF

systemctl restart zabbix-agent2
systemctl enable zabbix-agent2

Agent on pfSense

pfSense has a Zabbix agent package:

System → Package Manager → Available Packages → pfSense-zabbix-agent

Configuration:

Zabbix Server IP: 10.0.20.10
Zabbix Server Port: 10051
Hostname: pfsense
Enable active checks: Yes

pfSense-specific items to monitor:

  • CARP status (if using HA)
  • Gateway quality (packet loss, latency, jitter)
  • State table utilization
  • DHCP lease counts per VLAN
  • Firewall rule denials per VLAN (from our zero-trust setup)
  • OpenVPN/WireGuard client counts
  • Interface traffic per VLAN

Agent on Docker Hosts

# docker-compose.yml for Zabbix agent
version: '3.8'
services:
  zabbix-agent:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent
    restart: unless-stopped
    environment:
      - ZBX_SERVER_HOST=10.0.20.10
      - ZBX_HOSTNAME=docker-host-01
      - ZBX_ACTIVE_ALLOW=true
    volumes:
      - /:/hostfs:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    network_mode: host
    privileged: true

Docker-specific metrics:

# Additional UserParameters for Docker
UserParameter=docker.container.count,/usr/bin/docker ps -aq | wc -l
UserParameter=docker.container.running,/usr/bin/docker ps --filter status=running -q | wc -l
UserParameter=docker.image.count,/usr/bin/docker images -q | wc -l
UserParameter=docker.volume.count,/usr/bin/docker volume ls -q | wc -l

Zabbix Host Configuration

Adding Hosts in Zabbix UI

Configuration → Hosts → Create Host:

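| Host | Templates | Groups | Interface | Proxy |
|---|---|---|---|---|
| proxmox-host | Linux by Zabbix agent, Proxmox VE by Zabbix | Servers | Agent: 10.0.20.5:10050 | None |
| pfsense | pfSense by Zabbix | Network | Agent: 10.0.10.1:10050 | None |
| docker-host-01 | Linux by Zabbix agent, Docker by Zabbix | Servers | Agent: 10.0.20.20:10050 | None |
| unifi-switch | SNMP Generic, Ubiquiti Switch | Network | SNMP: 10.0.10.2:161 | None |
| cisco-2960 | SNMP Generic, Cisco Switch | Network | SNMP: 10.0.10.3:161 | None |

The agent interface uses port 10050, where the agent listens. Port 10051 is the server's own port, which active agents connect to.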

Template Customization

Zabbix templates are good out of the box but need tuning for homelab scale:

Linux by Zabbix agent — Adjust trigger thresholds:

| Trigger | Default | Homelab Adjusted | Reason |
|---|---|---|---|
| CPU load > 5min per core | 5 per core | 80% sustained 10min | Homelab CPUs burst, don't alert on spikes |
| Available memory < 20% | 20% | 10% | Homelab hosts use more memory; 20% is too sensitive |
| Disk space < 20% | 20% | 10% | Small disks fill faster; 20% on 100GB = 20GB free |
| Swap usage > 50% | 50% | 80% | Some swap usage is normal in homelabs |
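
The disk row is easier to judge in absolute terms. This small sketch converts a free-space percentage threshold into gigabytes for a few disk sizes. The function name and the example sizes other than 100 GB are my own choices, not values from the Zabbix template.

```python
def free_gb_at_threshold(disk_gb, free_pct):
    """Free space in GB that remains when the trigger fires."""
    return disk_gb * free_pct / 100


# 100 GB matches the example in the table; the other sizes are hypothetical
for disk_gb in (100, 500, 2000):
    default = free_gb_at_threshold(disk_gb, 20)
    adjusted = free_gb_at_threshold(disk_gb, 10)
    print(f"{disk_gb} GB disk: 20% threshold = {default:.0f} GB free, "
          f"10% threshold = {adjusted:.0f} GB free")
```

On the 100 GB disk the default fires with 20 GB still free, which is why the adjusted column drops it to 10%.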

pfSense template — Add custom items:

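# Custom pfSense items via UserParameter
# The PHP helper names below are placeholders; check which functions your pfSense release provides before deploying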
UserParameter=pfsense.gateway.loss[*],/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo get_gateway_loss('$1');"
UserParameter=pfsense.dhcp.leases[*],/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo count_dhcp_leases('$1');"
UserParameter=pfsense.firmware.version,/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo get_firmware_version();"

Building Grafana Dashboards

Dashboard 1: Infrastructure Overview

The single pane of glass for your entire homelab:

Panels:

Row: "Host Status" ────────────────────────────────
  [Stat]  Hosts Up          zabbix: hosts.count{status=0}
  [Stat]  Hosts Down        zabbix: hosts.count{status=1}
  [Stat]  Active Triggers   zabbix: triggers.count{value=1}

Row: "System Health" ──────────────────────────────
  [Time Series]  CPU Usage per Host    zabbix: system.cpu.util{host=*}
  [Gauge]        Memory % per Host     zabbix: vm.memory.util{host=*}
  [Time Series]  Disk I/O per Host     zabbix: vfs.dev.read{host=*}, vfs.dev.write{host=*}

Row: "Network" ─────────────────────────────────────
  [Time Series]  Traffic per VLAN      zabbix: net.if.in{host=pfsense,if=VLAN*}
  [Stat]         Firewall Denials/h    zabbix: pf.deny.count
  [Table]        Top Talkers          zabbix: net.if.total{host=*}

Row: "Storage" ─────────────────────────────────────
  [Gauge]   Proxmox Storage Used  zabbix: pve.storage.used[*]
  [Bar]     Docker Disk Usage     zabbix: vfs.fs.size{host=docker*,fs=/var/lib/docker}

Dashboard 2: pfSense Network Security

Dedicated to monitoring the zero-trust firewall:

Panels:

Row: "Firewall Activity" ──────────────────────────
  [Time Series]  Denials per VLAN/hour    zabbix: pf.deny{vlan=*}
  [Table]        Top Denied Sources       zabbix: pf.deny.src{groupby=src_ip}
  [Time Series]  Allow vs Deny Ratio      zabbix: pf.allow / pf.deny

Row: "Gateway Quality" ────────────────────────────
  [Time Series]  Packet Loss %             zabbix: pfsense.gateway.loss[*]
  [Time Series]  Latency ms               zabbix: pfsense.gateway.latency[*]
  [Stat]         Gateway Status            zabbix: pfsense.gateway.status

Row: "DHCP Leases" ────────────────────────────────
  [Stat]     MGMT Leases       zabbix: pfsense.dhcp.leases[MGMT]
  [Stat]     Server Leases     zabbix: pfsense.dhcp.leases[SERVERS]
  [Stat]     IoT Leases        zabbix: pfsense.dhcp.leases[IOT]
  [Stat]     Guest Leases      zabbix: pfsense.dhcp.leases[GUEST]
  [Table]    New Leases (24h)   zabbix: pfsense.dhcp.new_leases

Dashboard 3: Proxmox Virtualization

Panels:

Row: "Cluster Health" ─────────────────────────────
  [Stat]    Cluster Status          zabbix: pve.cluster.status
  [Stat]    VMs Running             zabbix: pve.vm.count
  [Stat]    CTs Running             zabbix: pve.ct.count

Row: "Resource Usage" ─────────────────────────────
  [Gauge]    CPU Total              zabbix: system.cpu.util{host=proxmox*}
  [Gauge]    Memory Total           zabbix: vm.memory.util{host=proxmox*}
  [Bar]      Storage per Pool      zabbix: pve.storage.used[*] / pve.storage.total[*]

Row: "VM/CT Details" ─────────────────────────────
  [Table]    All VMs + CPU/Mem/Disk  zabbix: pve.vm.{cpu,mem,disk}[*]
  [Table]    All CTs + CPU/Mem/Disk  zabbix: pve.ct.{cpu,mem,disk}[*]

Grafana Variables for Reusable Dashboards

Set up template variables so dashboards work across all hosts:

Variable: $host
  Type: Query
  Query: zabbix: hosts*
  Multi-value: Yes
  Include All: Yes

Variable: $vlan
  Type: Custom
  Values: MGMT, SERVERS, IOT, GUEST, SERVICES

Variable: $interval
  Type: Interval
  Values: 1m,5m,10m,30m,1h,6h,1d
  Auto: Yes

Alerting: Wake Me When It Matters

Zabbix Triggers → Grafana Alerts

The alerting pipeline:

Zabbix Agent → Zabbix Server (trigger fires) → Grafana Alert Rule → Notification Policy → Channel

Critical Alerts (Wake Me Up)

| Alert | Trigger Expression | Severity | Channel |
|---|---|---|---|
| Host down | nodata(/{HOST}/agent.ping,5m)=1 | Disaster | Telegram + Email |
| Disk > 90% | last(/{HOST}/vfs.fs.size[/,pused])>90 | High | Telegram |
| pfSense down | nodata(/{HOST}/agent.ping,3m)=1 on pfSense | Disaster | Telegram + Email |
| Gateway packet loss > 10% | last(/{HOST}/pfsense.gateway.loss)>10 | High | Telegram |
| Zabbix server down | Internal zabbix trigger | Disaster | Email (fallback) |

Warning Alerts (Check in Morning)

| Alert | Trigger Expression | Severity | Channel |
|---|---|---|---|
| CPU > 80% sustained | avg(/{HOST}/system.cpu.util,10m)>80 | Warning | Dashboard only |
| Memory > 85% | last(/{HOST}/vm.memory.util)>85 | Warning | Dashboard only |
| Certificate expiring < 14 days | last(/{HOST}/cert.days_left)<14 | Warning | Email digest |
| Docker container stopped | last(/{HOST}/docker.container.running)<expected | Warning | Dashboard only |

Information Alerts (Weekly Digest)

| Alert | Trigger Expression | Severity | Channel |
|---|---|---|---|
| New DHCP lease on MGMT VLAN | Event log match | Info | Weekly digest |
| Firmware update available | change(/{HOST}/pfsense.firmware.version)<>0 | Info | Weekly digest |
| Storage growth rate > 5%/week | trend(7d)>5 | Info | Weekly digest |

Telegram Alert Integration

Grafana supports Telegram natively. Create a bot via @BotFather:

Grafana → Alerting → Contact points → Add Contact Point
  Type: Telegram
  BOT API Token: YOUR_BOT_TOKEN
  Chat ID: YOUR_CHAT_ID

Notification Policy:
  Group by: alertname, severity
  Group wait: 30s
  Group interval: 5m
  Repeat interval: 4h

  Route: severity=disaster → Telegram immediately
  Route: severity=high → Telegram, 5m repeat
  Route: severity=warning → Email digest, 1d repeat
  Route: severity=info → Weekly email
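
The routing rules are simple enough to model in a few lines, which is handy for checking the policy before clicking it together in the UI. This is a simplified sketch, not Grafana's matcher. The contact point names, the "7d" repeat for the weekly digest, and the fallback route are all assumptions of mine.

```python
# Route table mirroring the notification policy above.
# "7d" is an assumed value; the policy only says "weekly".
DEFAULT_REPEAT = "4h"
ROUTES = [
    ("disaster", "telegram", DEFAULT_REPEAT),
    ("high", "telegram", "5m"),
    ("warning", "email-digest", "1d"),
    ("info", "weekly-email", "7d"),
]


def route(labels):
    """Return (contact_point, repeat_interval) for an alert's labels."""
    severity = labels.get("severity", "").lower()
    for match, contact_point, repeat in ROUTES:
        if severity == match:
            return contact_point, repeat
    # Assumed fallback so unlabelled alerts are not silently dropped
    return "email-digest", DEFAULT_REPEAT


for sev in ("disaster", "high", "warning", "info"):
    print(sev, "->", route({"severity": sev}))
```

An alert that arrives without a severity label lands on the fallback route, which is a reminder to label every rule.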

Performance Tuning

Zabbix Housekeeper

Zabbix's built-in housekeeper is notoriously slow with PostgreSQL. Replace it with partitioned tables:

-- Connect to the Zabbix database first (shell): sudo -u postgres psql zabbix
-- The TimescaleDB package must already be installed and listed in shared_preload_libraries

-- Enable the TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert history tables to hypertables; migrate_data moves rows that already exist
SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_uint', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_str', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('trends', 'clock', chunk_time_interval => 2592000, migrate_data => true);
SELECT create_hypertable('trends_uint', 'clock', chunk_time_interval => 2592000, migrate_data => true);

Disable the periodic runs of Zabbix's internal housekeeper (TimescaleDB retention policies handle cleanup now):

# /etc/zabbix/zabbix_server.conf
HousekeepingFrequency=0

Set retention policies:

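-- clock stores Unix seconds as an integer, so TimescaleDB needs an integer "now" function
CREATE OR REPLACE FUNCTION unix_now() RETURNS integer LANGUAGE sql STABLE AS
  $$ SELECT extract(epoch FROM now())::integer $$;
SELECT set_integer_now_func(t::regclass, 'unix_now')
  FROM unnest(ARRAY['history', 'history_uint', 'history_str', 'trends', 'trends_uint']) AS t;

-- Keep raw history for 14 days (1209600 seconds)
SELECT add_retention_policy('history', 1209600);
SELECT add_retention_policy('history_uint', 1209600);
SELECT add_retention_policy('history_str', 1209600);

-- Keep trends for 2 years (63072000 seconds)
SELECT add_retention_policy('trends', 63072000);
SELECT add_retention_policy('trends_uint', 63072000);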

Database Size Estimates

| Monitoring | Items | History/Day | 14-Day History | 2-Year Trends | Total DB Size |
|---|---|---|---|---|---|
| 5 hosts | ~500 | ~15 MB | ~210 MB | ~200 MB | ~500 MB |
| 10 hosts | ~1000 | ~30 MB | ~420 MB | ~400 MB | ~1 GB |
| 20 hosts | ~2000 | ~60 MB | ~840 MB | ~800 MB | ~2 GB |

A homelab with 10-20 hosts will use 1-2 GB of storage over 2 years. That's nothing.
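
The arithmetic behind the table is easy to rerun for your own numbers. The sketch below multiplies the per-day history figure by the retention period and adds the trends figure. It uses the table's estimates as inputs rather than measuring anything, and the function name is mine.

```python
def estimate_db_mb(history_mb_per_day, trends_mb, history_days=14):
    """Return (history_mb, history_plus_trends_mb) for one row of the table."""
    history_mb = history_mb_per_day * history_days
    return history_mb, history_mb + trends_mb


# Per-day history and 2-year trend figures copied from the table above
ROWS = {"5 hosts": (15, 200), "10 hosts": (30, 400), "20 hosts": (60, 800)}

for label, (per_day, trends) in ROWS.items():
    history, subtotal = estimate_db_mb(per_day, trends)
    print(f"{label}: {history} MB history + {trends} MB trends = {subtotal} MB")
```

For 5 hosts this gives 210 MB of history and 410 MB combined. The table's rounded total of about 500 MB is higher, which presumably leaves room for configuration data and indexes.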


Backup Strategy

Your monitoring data is valuable — it contains your baseline, your history, your incident timeline. Back it up.

Zabbix Database Backup

#!/bin/bash
# zabbix-backup.sh — Daily Zabbix database backup
set -euo pipefail  # stop on the first failed step so a broken dump is not reported as success
BACKUP_DIR="/mnt/nas/backups/zabbix"
DATE=$(date +%Y-%m-%d)
RETENTION_DAYS=30

# PostgreSQL dump
sudo -u postgres pg_dump zabbix | gzip > "${BACKUP_DIR}/zabbix_${DATE}.sql.gz"

# Zabbix config
tar czf "${BACKUP_DIR}/zabbix_config_${DATE}.tar.gz" \
  /etc/zabbix/ /etc/grafana/

# Cleanup old backups
find "${BACKUP_DIR}" -name "zabbix_*.sql.gz" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "zabbix_config_*.tar.gz" -mtime +${RETENTION_DAYS} -delete

echo "Backup complete: ${DATE}"

Grafana Dashboard Export

Grafana dashboards should be version-controlled:

#!/bin/bash
# grafana-export.sh — Export all dashboards as JSON
GRAFANA_URL="http://10.0.20.11:3000"
API_KEY="YOUR_GRAFANA_API_KEY"
OUTPUT_DIR="/home/commstech/grafana-dashboards"

# Get all dashboard UIDs
DASHBOARDS=$(curl -s -H "Authorization: Bearer ${API_KEY}" \
  "${GRAFANA_URL}/api/search?type=dash-db" | \
  jq -r '.[] | .uid')

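# Export each dashboard (UID is a read-only variable in bash, so the loop uses a different name)
mkdir -p "${OUTPUT_DIR}"
for DASH_UID in ${DASHBOARDS}; do
  curl -s -H "Authorization: Bearer ${API_KEY}" \
    "${GRAFANA_URL}/api/dashboards/uid/${DASH_UID}" | \
    jq '.dashboard' > "${OUTPUT_DIR}/${DASH_UID}.json"
done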

echo "Exported $(echo ${DASHBOARDS} | wc -w) dashboards"

Commit these to git. Your dashboards are code.


Cost Summary

| Item | Cost | Notes |
|---|---|---|
| Zabbix Server (LXC on Proxmox) | $0 | Already have hardware |
| Grafana (LXC on Proxmox) | $0 | Already have hardware |
| PostgreSQL + TimescaleDB | $0 | Open source |
| Zabbix Agents | $0 | Open source |
| Storage (2 GB over 2 years) | $0 | Negligible |
| Telegram bot for alerts | $0 | Free tier |
| Total Monthly Cost | $0 | Self-hosted, zero subscriptions |

Compare to Datadog at $15/host/month for 10 hosts = $150/month = $1,800/year. You're saving $1,800/year by self-hosting.


Dashboard Screenshots Description

Since this is a text article, here's what your dashboards should look like:

Infrastructure Overview Dashboard

  • Top row: Three large stat panels — green "5 Hosts Up", red "0 Hosts Down", orange "2 Active Warnings"
  • Middle left: Time series graph showing CPU usage for all hosts over last 1 hour, with 80% threshold line
  • Middle right: Gauge panels showing memory usage per host (color-coded: green < 60%, yellow 60-80%, red > 80%)
  • Bottom left: Network traffic stacked area chart per VLAN
  • Bottom right: Storage usage horizontal bar chart per Proxmox pool

pfSense Security Dashboard

  • Top row: Firewall deny rate time series — should show consistent low rate, any spike is suspicious
  • Middle: Table of top 10 denied source IPs with last attempt time
  • Bottom: DHCP lease count per VLAN as small bar charts, with "new in 24h" annotation

Next Steps

With monitoring in place, you can now:

  1. Set up automated remediation — Zabbix can run scripts on alert (restart a service, clear a cache)
  2. Add log monitoring — Forward syslog from pfSense, Proxmox, and Docker to Zabbix
  3. Implement capacity planning — Use trend data to predict when you'll run out of disk/CPU/memory
  4. Add synthetic monitoring — Zabbix web scenarios to check your services are actually responding
  5. Integrate with Home Assistant — Send Zabbix alerts to your home automation for visual/audio alerts
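
For the capacity planning step, the core idea is a straight-line projection from recent usage. Here is a minimal sketch that fits a least-squares slope to daily disk readings and estimates the days left until the disk is full. The function names and sample readings are hypothetical, and real usage is rarely this linear.

```python
def growth_per_day(samples):
    """Least-squares slope of (day, used_pct) samples, in percent per day."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one day")
    return num / den


def days_until_full(used_pct, pct_per_day):
    """Days until 100% at the current growth rate, or None if not growing."""
    if pct_per_day <= 0:
        return None
    return (100 - used_pct) / pct_per_day


# Hypothetical readings: a disk filling at 2% per day
readings = [(0, 60), (1, 62), (2, 64), (3, 66)]
rate = growth_per_day(readings)
print(f"growth: {rate:.1f}%/day, days left: {days_until_full(readings[-1][1], rate):.0f}")
```

Feeding it values exported from the trends tables gives a rough early warning. Treat the result as an estimate to sanity-check, not a deadline.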

Key Takeaways

  1. Self-hosted monitoring costs nothing but electricity — Zabbix + Grafana is enterprise-grade, free, and yours
  2. Zabbix for collection, Grafana for visualization — each tool does what it does best
  3. Active checks work across VLANs — agents push data, no need to open inbound ports
  4. Tune your triggers for homelab scale — enterprise defaults are too sensitive for home infrastructure
  5. TimescaleDB partitioning is essential — the built-in housekeeper will kill your database performance
  6. Alert on what matters, ignore what doesn't — disaster = wake me, warning = check in morning, info = weekly digest
  7. Version-control your dashboards — they're infrastructure code, not click-and-hope
  8. Back up your monitoring data — it's your operational history, and losing it means losing your baselines

CommsNet — Building infrastructure that respects your privacy and your intelligence.

Follow on Medium and Dev.to for more homelab, networking, and self-hosting content.
