Self-Hosted Monitoring Stack: Zabbix + Grafana for Home Infrastructure
Published: June 15, 2026 | CommsNet
You know that feeling when something breaks and you only find out because the website is down? That's not monitoring — that's embarrassment detection. Real monitoring tells you before things break. It shows you the memory leak that started three hours ago, the disk that's filling at 2% per day, the SSL certificate expiring in 12 days.
Enterprise monitoring platforms (Datadog, New Relic, Splunk) cost hundreds to thousands per month. For a homelab, that's absurd. But running blind is worse. The answer: self-hosted Zabbix for data collection and alerting, paired with Grafana for visualization. Together, they give you enterprise-grade observability at the cost of the electricity to run them.
In this article, I'll walk through deploying a complete Zabbix + Grafana monitoring stack on Proxmox, configuring agents across VLANs, building dashboards that actually tell you something, and setting up alerts that wake you up when they matter — not at 3 AM for a transient spike.
Why Zabbix + Grafana?
The Monitoring Landscape
| Solution | Cost | Data Ownership | Complexity | Alerting | Dashboards |
|---|---|---|---|---|---|
| Datadog | $15-23/host/mo | Cloud (theirs) | Low | Excellent | Excellent |
| Prometheus + Grafana | Free | Self-hosted | Medium | Good | Excellent |
| Zabbix + Grafana | Free | Self-hosted | Medium-High | Excellent | Excellent (with Grafana) |
| Netdata | Free | Self-hosted | Low | Basic | Good (built-in) |
| Uptime Kuma | Free | Self-hosted | Low | Basic | Basic |
Why Not Just Prometheus?
Prometheus is the darling of the cloud-native world, and for good reason. But for homelab monitoring, Zabbix has advantages:
- Agent-based collection works across VLANs: Prometheus pull-based scraping needs a firewall rule from the Prometheus server to every target. Zabbix agents in active mode push their data to the server, so one rule per VLAN toward the server's trapper port (TCP 10051) covers them.
- Auto-discovery: Zabbix can discover hosts, interfaces, and services automatically. With Prometheus, you're typically writing prometheus.yml targets by hand.
- Built-in templates: Zabbix has 400+ out-of-the-box templates for everything from Linux to pfSense to Proxmox to SNMP devices. Prometheus requires exporters for everything.
- Trigger logic: Zabbix triggers support expressions like "average of last 5 minutes > threshold AND last value > threshold". Prometheus alerting rules are powerful but harder to compose.
- Grafana integration: Zabbix data in Grafana gives you the best of both, Zabbix collection plus Grafana visualization.
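To make the trigger-logic point concrete, here is a minimal shell sketch of the "average over the window > threshold AND last value > threshold" idea. It is not Zabbix syntax; should_fire is a hypothetical helper that only illustrates why the combined condition ignores a single transient spike.

```shell
#!/usr/bin/env bash
# should_fire THRESHOLD SAMPLE...: succeed only if the average of the samples
# AND the most recent sample are both above THRESHOLD.
# Hypothetical helper for illustration; Zabbix evaluates this server-side.
should_fire() {
    local threshold=$1
    shift
    # Remaining arguments are the samples in the window, oldest first.
    printf '%s\n' "$@" | awk -v t="$threshold" '
        { sum += $1; last = $1 }
        END { exit !(NR > 0 && sum / NR > t && last > t) }'
}
```

For example, `should_fire 80 10 10 10 10 95` does not fire (one spike, low average), while `should_fire 80 85 90 95` does.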
Where Grafana Fits
Zabbix has its own dashboards, but they look like 2005. Grafana is the visualization layer:
- Beautiful, customizable dashboards
- Unified view across multiple data sources
- Annotation layers (deploy events, maintenance windows)
- Alerting with deduplication and routing
- Mobile-responsive (check your homelab from your phone)
Architecture
┌────────────────────────────────────────────────────────────────┐
│ Monitoring Architecture │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ Proxmox │ │ pfSense │ │ Docker │ │ IoT Devices │ │
│ │ Agent │ │ Agent │ │ Agent │ │ SNMP │ │
│ │ (VLAN20) │ │ (VLAN10) │ │ (VLAN20) │ │ (VLAN30) │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │
│ │ │ │ │ │
│ └──────────────┴──────┬───────┴─────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Zabbix Server │ │
│ │ (VLAN 20) │ │
│ │ - Collection │ │
│ │ - Alerting │ │
│ │ - Triggers │ │
│ └────────┬──────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Grafana │ │
│ │ (VLAN 20) │ │
│ │ - Dashboards │ │
│ │ - Visualization│ │
│ │ - Alert UI │ │
│ └─────────────────┘ │
│ │
│ Alert Channels: Telegram, Email, Webhook │
└────────────────────────────────────────────────────────────────┘
Network Considerations (VLAN-Aware)
Following our zero-trust VLAN architecture from the previous article:
- Zabbix Server lives on VLAN 20 (Servers)
- Zabbix Agents on VLAN 10 (Management) push data to server via active checks
- SNMP polling from Zabbix to VLAN 30 (IoT) requires explicit firewall allow rules
- Grafana on VLAN 20, with an optional reverse proxy on VLAN 50 (Services) if you want external access
Firewall rules needed:
# Allow Zabbix agents → Zabbix server (active checks)
ALLOW VLAN10 → VLAN20 TCP 10051 — "Management agents → Zabbix"
ALLOW VLAN20 → VLAN20 TCP 10051 — "Server agents → Zabbix"
# Allow Zabbix server → IoT (SNMP polling, if desired)
ALLOW VLAN20 → VLAN30 UDP 161 — "Zabbix SNMP poll IoT"
# Allow Grafana access from Management VLAN
ALLOW VLAN10 → VLAN20 TCP 3000 — "MGMT → Grafana dashboard"
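Once the rules are in place, it helps to confirm them from a machine in the source VLAN. The sketch below assumes bash with /dev/tcp support and the coreutils timeout command; port_open and check_rule are hypothetical helper names. It only covers TCP, so the UDP 161 SNMP rule has to be verified another way.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_rule LABEL HOST PORT: print a one-line verdict for one firewall rule.
check_rule() {
    if port_open "$2" "$3"; then
        echo "OK      $1 ($2:$3)"
    else
        echo "BLOCKED $1 ($2:$3)"
    fi
}
```

Run from a VLAN 10 host, `check_rule "MGMT to Zabbix" 10.0.20.10 10051` and `check_rule "MGMT to Grafana" 10.0.20.11 3000` should both report OK once the services are up.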
Deployment on Proxmox
Step 1: Create the Zabbix Server LXC
Proxmox LXC containers are perfect for monitoring — low overhead, fast startup, full Linux userspace.
# Download Debian 12 LXC template
pveam download local debian-12-standard_12.2-1_amd64.tar.zst
# Create container
pct create 200 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
--hostname zabbix \
--memory 4096 \
--swap 2048 \
--cores 2 \
--storage local-lvm \
--rootfs local-lvm:32 \
--net0 name=eth0,bridge=vmbr0,tag=20,ip=10.0.20.10/24,gw=10.0.20.1 \
--unprivileged 1 \
--onboot 1 \
--start 1
Why LXC instead of VM: Zabbix doesn't need its own kernel. An LXC container shares the host kernel, so it gives up some of a VM's isolation in exchange for much lower overhead. Your monitoring shouldn't be the heaviest thing on the host.
Step 2: Install Zabbix Server + PostgreSQL
# Enter the container
pct enter 200
# Install PostgreSQL (plus sudo and wget, which a minimal container template may lack)
apt update && apt install -y postgresql postgresql-contrib sudo wget
# Create Zabbix database
sudo -u postgres createuser --pwprompt zabbix
sudo -u postgres createdb -O zabbix -E Unicode -T template0 zabbix
# Add Zabbix repository
wget https://repo.zabbix.com/zabbix/7.2/debian/pool/main/z/zabbix-release/zabbix-release_7.2-1+debian12_all.deb
dpkg -i zabbix-release_7.2-1+debian12_all.deb
apt update
# Install Zabbix server, frontend, and agent
apt install -y zabbix-server-pgsql zabbix-frontend-php zabbix-apache-conf zabbix-sql-scripts zabbix-agent2
# Import initial schema
zcat /usr/share/zabbix-sql-scripts/postgresql/server.sql.gz | \
sudo -u zabbix psql zabbix
# Configure Zabbix server
cat > /etc/zabbix/zabbix_server.conf << 'EOF'
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=YOUR_POSTGRES_PASSWORD_HERE
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=50
DebugLevel=3
StartPollers=5
StartPollersUnreachable=2
StartTrappers=5
StartDiscoverers=2
StartHTTPPollers=2
CacheSize=64M
HistoryCacheSize=32M
TrendCacheSize=8M
ValueCacheSize=32M
Timeout=10
EOF
# Start services
systemctl restart zabbix-server zabbix-agent2 apache2
systemctl enable zabbix-server zabbix-agent2 apache2
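Before restarting, it is worth checking that the config file really contains what you meant to write, especially that the password placeholder is gone. A minimal sketch, assuming the plain KEY=value format shown above; conf_get and placeholder_left are hypothetical helpers, not Zabbix tools.

```shell
#!/usr/bin/env bash
# conf_get FILE KEY: print the value of KEY from a KEY=value config file.
# Comment lines are skipped; if KEY is repeated, the last occurrence is printed.
conf_get() {
    grep -v '^[[:space:]]*#' "$1" |
        awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2) } END { print v }'
}

# placeholder_left FILE: succeed if DBPassword still holds the article's placeholder.
placeholder_left() {
    [ "$(conf_get "$1" DBPassword)" = "YOUR_POSTGRES_PASSWORD_HERE" ]
}
```

Usage: `placeholder_left /etc/zabbix/zabbix_server.conf && echo "Set DBPassword first"`.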
Step 3: Install Grafana
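# Run these in the Grafana container (10.0.20.11 in this setup, created the same way as the Zabbix container in Step 1)
# Add Grafana repository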
apt install -y apt-transport-https software-properties-common
wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key
echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | \
tee /etc/apt/sources.list.d/grafana.list
apt update && apt install -y grafana
# Configure Grafana
cat > /etc/grafana/grafana.ini << 'EOF'
[server]
http_addr = 10.0.20.11
http_port = 3000
domain = grafana.commsnet.local
[security]
admin_user = admin
admin_password = CHANGE_ME_IMMEDIATELY
[database]
type = sqlite3
[analytics]
reporting_enabled = false
check_for_updates = false
[auth.anonymous]
enabled = false
EOF
systemctl restart grafana-server
systemctl enable grafana-server
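Grafana exposes an /api/health endpoint that reports whether its database is reachable, which makes a handy post-install check. The sketch below is an assumption-laden helper rather than anything Grafana ships: grafana_db_ok and wait_for_grafana are hypothetical names, and the check is a plain text match on the JSON response.

```shell
#!/usr/bin/env bash
# grafana_db_ok: read an /api/health response on stdin;
# succeed if the "database" field is "ok".
grafana_db_ok() {
    grep -q '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# wait_for_grafana URL [TRIES]: poll the health endpoint every 2 seconds.
wait_for_grafana() {
    local url=$1 tries=${2:-10} i
    for i in $(seq 1 "$tries"); do
        if curl -fsS "$url/api/health" 2>/dev/null | grafana_db_ok; then
            return 0
        fi
        sleep 2
    done
    return 1
}
```

Usage: `wait_for_grafana http://10.0.20.11:3000 && echo "Grafana is up"`.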
Step 4: Connect Grafana to Zabbix
Install the Zabbix data source plugin in Grafana:
grafana-cli plugins install alexanderzobnin-zabbix-app
systemctl restart grafana-server
In Grafana UI (Configuration → Plugins → Zabbix):
- Enable the Zabbix app plugin
- Add data source:
- Name: Zabbix
- Type: Zabbix API
- URL: http://10.0.20.10/zabbix/api_jsonrpc.php
- Username: Admin
- Password: Your Zabbix admin password
- Trends: Enable (use trends for long-term graphs)
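If the data source test fails, it helps to rule out the Zabbix API itself. The API speaks JSON-RPC 2.0, and the apiinfo.version method answers without a login. A minimal sketch using curl; zbx_payload and zbx_api_version are hypothetical helper names.

```shell
#!/usr/bin/env bash
# zbx_payload METHOD [PARAMS_JSON] [ID]: print a JSON-RPC 2.0 request body.
zbx_payload() {
    printf '{"jsonrpc":"2.0","method":"%s","params":%s,"id":%s}\n' \
        "$1" "${2:-[]}" "${3:-1}"
}

# zbx_api_version URL: ask the API for its version (no authentication needed).
zbx_api_version() {
    curl -s -X POST -H 'Content-Type: application/json-rpc' \
        -d "$(zbx_payload apiinfo.version)" "$1"
}
```

Usage: `zbx_api_version http://10.0.20.10/zabbix/api_jsonrpc.php` should return a JSON object whose result field is the server version. If it does, the URL and web frontend are fine and the problem is in the credentials or plugin settings.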
Configuring Zabbix Agents
Agent on Proxmox Host
# On the Proxmox host itself
apt install -y zabbix-agent2
cat > /etc/zabbix/zabbix_agent2.conf << 'EOF'
Server=10.0.20.10
ServerActive=10.0.20.10
Hostname=proxmox-host
LogFile=/var/log/zabbix/zabbix_agent2.log
DebugLevel=3
# Custom metrics for Proxmox
UserParameter=pve.cluster.status,/usr/bin/pvesh get /cluster/status --output-format json 2>/dev/null | grep -c '"online"'
UserParameter=pve.vm.count,/usr/bin/qm list 2>/dev/null | tail -n +2 | wc -l
UserParameter=pve.ct.count,/usr/bin/pct list 2>/dev/null | tail -n +2 | wc -l
UserParameter=pve.storage.used[*],/usr/bin/pvesm status --storage $1 --output json 2>/dev/null | grep -o '"used":[0-9]*' | cut -d: -f2
UserParameter=pve.storage.total[*],/usr/bin/pvesm status --storage $1 --output json 2>/dev/null | grep -o '"total":[0-9]*' | cut -d: -f2
EOF
systemctl restart zabbix-agent2
systemctl enable zabbix-agent2
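One detail to watch with the guest counters: `qm list` and `pct list` print a header row, so a raw line count is one higher than the number of guests, and it includes stopped guests too. The sketch below shows both ways of counting. The helper names are hypothetical, and the status check assumes the `qm list` column order (VMID, NAME, STATUS, ...).

```shell
#!/usr/bin/env bash
# count_guests: read `qm list` or `pct list` output on stdin and print the
# number of guests, skipping the header row and blank lines.
count_guests() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# count_running_vms: read `qm list` output on stdin and print how many rows
# have "running" in the STATUS (third) column.
count_running_vms() {
    awk 'NR > 1 && $3 == "running" { n++ } END { print n + 0 }'
}
```

Usage: `qm list | count_running_vms`. If the dashboard panel is labelled "VMs Running", this is the number it should show.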
Agent on pfSense
pfSense has a Zabbix agent package:
System → Package Manager → Available Packages → pfSense-zabbix-agent
Configuration:
Zabbix Server IP: 10.0.20.10
Zabbix Server Port: 10051
Hostname: pfsense
Enable active checks: Yes
pfSense-specific items to monitor:
- CARP status (if using HA)
- Gateway quality (packet loss, latency, jitter)
- State table utilization
- DHCP lease counts per VLAN
- Firewall rule denials per VLAN (from our zero-trust setup)
- OpenVPN/WireGuard client counts
- Interface traffic per VLAN
Agent on Docker Hosts
# docker-compose.yml for Zabbix agent
version: '3.8'
services:
  zabbix-agent:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent
    restart: unless-stopped
    environment:
      - ZBX_SERVER_HOST=10.0.20.10
      - ZBX_HOSTNAME=docker-host-01
      - ZBX_ACTIVE_ALLOW=true
    volumes:
      - /:/hostfs:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    network_mode: host
    privileged: true
Docker-specific metrics:
# Additional UserParameters for Docker
UserParameter=docker.container.count,/usr/bin/docker ps -q | wc -l
UserParameter=docker.container.running,/usr/bin/docker ps --filter status=running -q | wc -l
UserParameter=docker.image.count,/usr/bin/docker images -q | wc -l
UserParameter=docker.volume.count,/usr/bin/docker volume ls -q | wc -l
Zabbix Host Configuration
Adding Hosts in Zabbix UI
Data collection → Hosts → Create host (Configuration → Hosts before Zabbix 6.4):
| Host | Templates | Groups | Interface | Proxy |
|---|---|---|---|---|
| proxmox-host | Linux by Zabbix agent, Proxmox VE by Zabbix | Servers | Agent: 10.0.20.5:10050 | None |
| pfsense | pfSense by Zabbix | Network | Agent: 10.0.10.1:10050 | None |
| docker-host-01 | Linux by Zabbix agent, Docker by Zabbix | Servers | Agent: 10.0.20.20:10050 | None |
| unifi-switch | SNMP Generic, Ubiquiti Switch | Network | SNMP: 10.0.10.2:161 | None |
| cisco-2960 | SNMP Generic, Cisco Switch | Network | SNMP: 10.0.10.3:161 | None |
Template Customization
Zabbix templates are good out of the box but need tuning for homelab scale:
Linux by Zabbix agent — Adjust trigger thresholds:
| Trigger | Default | Homelab Adjusted | Reason |
|---|---|---|---|
| CPU load > 5min per core | 5 per core | 80% sustained 10min | Homelab CPUs burst, don't alert on spikes |
| Available memory < 20% | 20% | 10% | Homelab hosts use more memory; 20% is too sensitive |
| Disk space < 20% | 20% | 10% | Small disks fill faster; 20% on 100GB = 20GB free |
| Swap usage > 50% | 50% | 80% | Some swap usage is normal in homelabs |
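The percentages in the table translate into very different amounts of headroom depending on disk size, which is the reason for tuning them. A small sketch of the arithmetic, with free_gb_at as a hypothetical helper:

```shell
#!/usr/bin/env bash
# free_gb_at SIZE_GB PERCENT: free space (GB) left at the moment a
# "free space < PERCENT%" trigger fires on a disk of SIZE_GB.
free_gb_at() {
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.1f\n", s * p / 100 }'
}
```

On a 100 GB disk, `free_gb_at 100 20` gives 20.0 and `free_gb_at 100 10` gives 10.0. On a 2 TB disk the same 20% threshold would fire with 400 GB still free.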
pfSense template — Add custom items:
# Custom pfSense items via UserParameter
UserParameter=pfsense.gateway.loss[*],/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo get_gateway_loss('$1');"
UserParameter=pfsense.dhcp.leases[*],/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo count_dhcp_leases('$1');"
UserParameter=pfsense.firmware.version,/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo get_firmware_version();"
Building Grafana Dashboards
Dashboard 1: Infrastructure Overview
The single pane of glass for your entire homelab:
Panels:
Row: "Host Status" ────────────────────────────────
[Stat] Hosts Up zabbix: hosts.count{status=0}
[Stat] Hosts Down zabbix: hosts.count{status=1}
[Stat] Active Triggers zabbix: triggers.count{value=1}
Row: "System Health" ──────────────────────────────
[Time Series] CPU Usage per Host zabbix: system.cpu.util{host=*}
[Gauge] Memory % per Host zabbix: vm.memory.util{host=*}
[Time Series] Disk I/O per Host zabbix: vfs.dev.read{host=*}, vfs.dev.write{host=*}
Row: "Network" ─────────────────────────────────────
[Time Series] Traffic per VLAN zabbix: net.if.in{host=pfsense,if=VLAN*}
[Stat] Firewall Denials/h zabbix: pf.deny.count
[Table] Top Talkers zabbix: net.if.total{host=*}
Row: "Storage" ─────────────────────────────────────
[Gauge] Proxmox Storage Used zabbix: pve.storage.used[*]
[Bar] Docker Disk Usage zabbix: vfs.fs.size{host=docker*,fs=/var/lib/docker}
Dashboard 2: pfSense Network Security
Dedicated to monitoring the zero-trust firewall:
Panels:
Row: "Firewall Activity" ──────────────────────────
[Time Series] Denials per VLAN/hour zabbix: pf.deny{vlan=*}
[Table] Top Denied Sources zabbix: pf.deny.src{groupby=src_ip}
[Time Series] Allow vs Deny Ratio zabbix: pf.allow / pf.deny
Row: "Gateway Quality" ────────────────────────────
[Time Series] Packet Loss % zabbix: pfsense.gateway.loss[*]
[Time Series] Latency ms zabbix: pfsense.gateway.latency[*]
[Stat] Gateway Status zabbix: pfsense.gateway.status
Row: "DHCP Leases" ────────────────────────────────
[Stat] MGMT Leases zabbix: pfsense.dhcp.leases[MGMT]
[Stat] Server Leases zabbix: pfsense.dhcp.leases[SERVERS]
[Stat] IoT Leases zabbix: pfsense.dhcp.leases[IOT]
[Stat] Guest Leases zabbix: pfsense.dhcp.leases[GUEST]
[Table] New Leases (24h) zabbix: pfsense.dhcp.new_leases
Dashboard 3: Proxmox Virtualization
Panels:
Row: "Cluster Health" ─────────────────────────────
[Stat] Cluster Status zabbix: pve.cluster.status
[Stat] VMs Running zabbix: pve.vm.count
[Stat] CTs Running zabbix: pve.ct.count
Row: "Resource Usage" ─────────────────────────────
[Gauge] CPU Total zabbix: system.cpu.util{host=proxmox*}
[Gauge] Memory Total zabbix: vm.memory.util{host=proxmox*}
[Bar] Storage per Pool zabbix: pve.storage.used[*] / pve.storage.total[*]
Row: "VM/CT Details" ─────────────────────────────
[Table] All VMs + CPU/Mem/Disk zabbix: pve.vm.{cpu,mem,disk}[*]
[Table] All CTs + CPU/Mem/Disk zabbix: pve.ct.{cpu,mem,disk}[*]
Grafana Variables for Reusable Dashboards
Set up template variables so dashboards work across all hosts:
Variable: $host
Type: Query
Query: zabbix: hosts*
Multi-value: Yes
Include All: Yes
Variable: $vlan
Type: Custom
Values: MGMT, SERVERS, IOT, GUEST, SERVICES
Variable: $interval
Type: Interval
Values: 1m,5m,10m,30m,1h,6h,1d
Auto: Yes
Alerting: Wake Me When It Matters
Zabbix Triggers → Grafana Alerts
The alerting pipeline:
Zabbix Agent → Zabbix Server (trigger fires) → Grafana Alert Rule → Notification Policy → Channel
Critical Alerts (Wake Me Up)
| Alert | Trigger Expression | Severity | Channel |
|---|---|---|---|
| Host down | nodata(5m) | Disaster | Telegram + Email |
| Disk > 90% | last(/{HOST}/vfs.fs.size[/,pused])>90 | High | Telegram |
| pfSense down | nodata(3m) on pfSense | Disaster | Telegram + Email |
| Gateway packet loss > 10% | last(/{HOST}/pfsense.gateway.loss)>10 | High | Telegram |
| Zabbix server down | Internal zabbix trigger | Disaster | Email (fallback) |
Warning Alerts (Check in Morning)
| Alert | Trigger Expression | Severity | Channel |
|---|---|---|---|
| CPU > 80% sustained | avg(10m)>80 | Warning | Dashboard only |
| Memory > 85% | last(/{HOST}/vm.memory.util)>85 | Warning | Dashboard only |
| Certificate expiring < 14 days | last(/{HOST}/cert.days_left)<14 | Warning | Email digest |
| Docker container stopped | last(/{HOST}/docker.container.running)<expected | Warning | Dashboard only |
Information Alerts (Weekly Digest)
| Alert | Trigger Expression | Severity | Channel |
|---|---|---|---|
| New DHCP lease on MGMT VLAN | Event log match | Info | Weekly digest |
| Firmware update available | change(/{HOST}/pfsense.firmware.version)<>0 | Info | Weekly digest |
| Storage growth rate > 5%/week | trend(7d)>5 | Info | Weekly digest |
Telegram Alert Integration
Grafana supports Telegram natively. Create a bot via @BotFather:
Grafana → Alerting → Contact points → Add Contact Point
Type: Telegram
BOT API Token: YOUR_BOT_TOKEN
Chat ID: YOUR_CHAT_ID
Notification Policy:
Group by: alertname, severity
Group wait: 30s
Group interval: 5m
Repeat interval: 4h
Route: severity=disaster → Telegram immediately
Route: severity=high → Telegram, 5m repeat
Route: severity=warning → Email digest, 1d repeat
Route: severity=info → Weekly email
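The routing tree above is essentially a lookup from severity to channel and repeat interval. Here is a minimal sketch of that lookup in shell, useful if you script notifications outside Grafana. route_for is a hypothetical helper, and the fallback for unknown severities is an assumption, not something Grafana does by default.

```shell
#!/usr/bin/env bash
# route_for SEVERITY: print "<channel> <repeat>" following the notification
# policy described above. Unknown severities fall back to the dashboard only.
route_for() {
    case "$1" in
        disaster) echo "telegram immediately" ;;
        high)     echo "telegram 5m" ;;
        warning)  echo "email-digest 1d" ;;
        info)     echo "weekly-email 7d" ;;
        *)        echo "dashboard-only none" ;;
    esac
}
```

Usage: `read -r channel repeat <<< "$(route_for high)"` sets channel to telegram and repeat to 5m.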
Performance Tuning
Zabbix Housekeeper
Zabbix's built-in housekeeper is notoriously slow with PostgreSQL. Replace it with partitioned tables:
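-- Connect to the Zabbix database first (from a shell): sudo -u postgres psql zabbix
-- Enable the TimescaleDB extension (the timescaledb package must already be installed)
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Convert history tables to hypertables (TimescaleDB)
-- clock is integer epoch seconds, so chunk intervals are in seconds (86400 = 1 day, 2592000 = 30 days)
-- migrate_data => true is needed because the tables already contain rows
SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_uint', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_str', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('trends', 'clock', chunk_time_interval => 2592000, migrate_data => true);
SELECT create_hypertable('trends_uint', 'clock', chunk_time_interval => 2592000, migrate_data => true);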
Disable Zabbix internal housekeeper (TimescaleDB handles it now):
# /etc/zabbix/zabbix_server.conf
HousekeepingFrequency=0
Set retention policies:
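-- clock is an integer epoch, so TimescaleDB needs an integer "now" function (unix_now is a name chosen here) and integer drop_after values
CREATE OR REPLACE FUNCTION unix_now() RETURNS integer LANGUAGE sql STABLE AS $$ SELECT extract(epoch FROM now())::integer $$;
SELECT set_integer_now_func('history', 'unix_now');
SELECT set_integer_now_func('history_uint', 'unix_now');
SELECT set_integer_now_func('history_str', 'unix_now');
SELECT set_integer_now_func('trends', 'unix_now');
SELECT set_integer_now_func('trends_uint', 'unix_now');
-- Keep raw history for 14 days (1209600 seconds)
SELECT add_retention_policy('history', 1209600);
SELECT add_retention_policy('history_uint', 1209600);
SELECT add_retention_policy('history_str', 1209600);
-- Keep trends for 2 years (63072000 seconds)
SELECT add_retention_policy('trends', 63072000);
SELECT add_retention_policy('trends_uint', 63072000);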
Database Size Estimates
| Monitoring | Items | History/Day | 14-Day History | 2-Year Trends | Total DB Size |
|---|---|---|---|---|---|
| 5 hosts | ~500 | ~15 MB | ~210 MB | ~200 MB | ~500 MB |
| 10 hosts | ~1000 | ~30 MB | ~420 MB | ~400 MB | ~1 GB |
| 20 hosts | ~2000 | ~60 MB | ~840 MB | ~800 MB | ~2 GB |
A homelab with 10-20 hosts will use 1-2 GB of storage over 2 years. That's nothing.
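The table's figures follow from two per-host rates: roughly 3 MB of history per host per day (15 MB/day for 5 hosts) and roughly 40 MB of trends per host over 2 years (200 MB for 5 hosts). A small sketch that reproduces the history and trends columns; estimate_db_mb is a hypothetical helper, and the table's "Total DB Size" column is rounded up from this sum, presumably to leave room for indexes and configuration tables.

```shell
#!/usr/bin/env bash
# estimate_db_mb HOSTS [HISTORY_DAYS]: history + 2-year trends in MB,
# using the per-host rates implied by the table above.
estimate_db_mb() {
    local hosts=$1 days=${2:-14}
    local history_per_host_day=3   # MB, from ~15 MB/day for 5 hosts
    local trends_per_host=40       # MB over 2 years, from ~200 MB for 5 hosts
    echo $(( hosts * history_per_host_day * days + hosts * trends_per_host ))
}
```

`estimate_db_mb 10` prints 820 (420 MB of history plus 400 MB of trends). `estimate_db_mb 10 30` shows what extending raw history to 30 days would cost: 1300.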
Backup Strategy
Your monitoring data is valuable — it contains your baseline, your history, your incident timeline. Back it up.
Zabbix Database Backup
#!/bin/bash
# zabbix-backup.sh — Daily Zabbix database backup
BACKUP_DIR="/mnt/nas/backups/zabbix"
DATE=$(date +%Y-%m-%d)
RETENTION_DAYS=30
# PostgreSQL dump
sudo -u postgres pg_dump zabbix | gzip > "${BACKUP_DIR}/zabbix_${DATE}.sql.gz"
# Zabbix config
tar czf "${BACKUP_DIR}/zabbix_config_${DATE}.tar.gz" \
/etc/zabbix/ /etc/grafana/
# Cleanup old backups
find "${BACKUP_DIR}" -name "zabbix_*.sql.gz" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "zabbix_config_*.tar.gz" -mtime +${RETENTION_DAYS} -delete
echo "Backup complete: ${DATE}"
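A backup you have never checked is only a hope. As a minimal sanity check, the sketch below confirms that a dump file is non-empty and is a structurally valid gzip archive. verify_backup is a hypothetical helper. It cannot tell whether pg_dump itself finished cleanly, so consider adding `set -o pipefail` to the backup script as well.

```shell
#!/usr/bin/env bash
# verify_backup FILE: succeed only if FILE is non-empty and passes
# gzip's built-in integrity test.
verify_backup() {
    [ -s "$1" ] && gzip -t "$1" 2>/dev/null
}
```

Usage: `verify_backup "/mnt/nas/backups/zabbix/zabbix_$(date +%Y-%m-%d).sql.gz" || echo "Backup is missing or corrupt"`. A periodic test restore into a scratch database is still the only full proof.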
Grafana Dashboard Export
Grafana dashboards should be version-controlled:
#!/bin/bash
# grafana-export.sh — Export all dashboards as JSON
GRAFANA_URL="http://10.0.20.11:3000"
API_KEY="YOUR_GRAFANA_API_KEY"
OUTPUT_DIR="/home/commstech/grafana-dashboards"
# Get all dashboard UIDs
DASHBOARDS=$(curl -s -H "Authorization: Bearer ${API_KEY}" \
"${GRAFANA_URL}/api/search?type=dash-db" | \
jq -r '.[] | .uid')
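# Export each dashboard (UID is a read-only variable in bash, so the loop uses DASH_UID)
mkdir -p "${OUTPUT_DIR}"
for DASH_UID in ${DASHBOARDS}; do
  curl -s -H "Authorization: Bearer ${API_KEY}" \
    "${GRAFANA_URL}/api/dashboards/uid/${DASH_UID}" | \
    jq '.dashboard' > "${OUTPUT_DIR}/${DASH_UID}.json"
done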
echo "Exported $(echo ${DASHBOARDS} | wc -w) dashboards"
Commit these to git. Your dashboards are code.
Cost Summary
| Item | Cost | Notes |
|---|---|---|
| Zabbix Server (LXC on Proxmox) | $0 | Already have hardware |
| Grafana (LXC on Proxmox) | $0 | Already have hardware |
| PostgreSQL + TimescaleDB | $0 | Open source |
| Zabbix Agents | $0 | Open source |
| Storage (2 GB over 2 years) | $0 | Negligible |
| Telegram bot for alerts | $0 | Free tier |
| Total Monthly Cost | $0 | Self-hosted, zero subscriptions |
Compare to Datadog at $15/host/month for 10 hosts = $150/month = $1,800/year. You're saving $1,800/year by self-hosting.
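The comparison scales linearly with host count, so it is easy to redo for your own lab. A tiny sketch of the arithmetic; yearly_cost is a hypothetical helper, and the prices are the per-host figures from the comparison table at the top of the article.

```shell
#!/usr/bin/env bash
# yearly_cost HOSTS PRICE_PER_HOST_MONTH: yearly subscription cost in dollars.
yearly_cost() {
    echo $(( $1 * $2 * 12 ))
}
```

`yearly_cost 10 15` prints 1800, matching the figure above. At the upper end of the $15-23 range, `yearly_cost 10 23` prints 2760.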
Dashboard Screenshots Description
Since this is a text article, here's what your dashboards should look like:
Infrastructure Overview Dashboard
- Top row: Three large stat panels — green "5 Hosts Up", red "0 Hosts Down", orange "2 Active Warnings"
- Middle left: Time series graph showing CPU usage for all hosts over last 1 hour, with 80% threshold line
- Middle right: Gauge panels showing memory usage per host (color-coded: green < 60%, yellow 60-80%, red > 80%)
- Bottom left: Network traffic stacked area chart per VLAN
- Bottom right: Storage usage horizontal bar chart per Proxmox pool
pfSense Security Dashboard
- Top row: Firewall deny rate time series — should show consistent low rate, any spike is suspicious
- Middle: Table of top 10 denied source IPs with last attempt time
- Bottom: DHCP lease count per VLAN as small bar charts, with "new in 24h" annotation
Next Steps
With monitoring in place, you can now:
- Set up automated remediation — Zabbix can run scripts on alert (restart a service, clear a cache)
- Add log monitoring — Forward syslog from pfSense, Proxmox, and Docker to Zabbix
- Implement capacity planning — Use trend data to predict when you'll run out of disk/CPU/memory
- Add synthetic monitoring — Zabbix web scenarios to check your services are actually responding
- Integrate with Home Assistant — Send Zabbix alerts to your home automation for visual/audio alerts
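As a taste of the capacity-planning step, here is a minimal sketch that turns a fill rate into a deadline. It assumes linear growth, which is a simplification, and days_until_full is a hypothetical helper rather than a Zabbix function.

```shell
#!/usr/bin/env bash
# days_until_full USED_PCT GROWTH_PCT_PER_DAY: whole days left before usage
# reaches 100%, assuming growth stays linear. Prints "never" if not growing.
days_until_full() {
    awk -v u="$1" -v g="$2" 'BEGIN {
        if (g <= 0) { print "never"; exit }
        d = (100 - u) / g
        print (d < 0 ? 0 : int(d))
    }'
}
```

A disk at 80% that fills at 2% per day gives `days_until_full 80 2`, which prints 10. That is the kind of number worth putting on a dashboard next to the raw usage gauge.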
Key Takeaways
- Self-hosted monitoring costs nothing but electricity — Zabbix + Grafana is enterprise-grade, free, and yours
- Zabbix for collection, Grafana for visualization — each tool does what it does best
- Active checks work across VLANs because agents push data to the server on TCP 10051, so nothing needs to be opened toward the agents
- Tune your triggers for homelab scale — enterprise defaults are too sensitive for home infrastructure
- TimescaleDB partitioning is essential — the built-in housekeeper will kill your database performance
- Alert on what matters, ignore what doesn't — disaster = wake me, warning = check in morning, info = weekly digest
- Version-control your dashboards — they're infrastructure code, not click-and-hope
- Back up your monitoring data — it's your operational history, and losing it means losing your baselines
CommsNet — Building infrastructure that respects your privacy and your intelligence.