Luna Commsnet

Posted on Jun 22

Self-Hosted Monitoring Stack: Zabbix + Grafana for Home Infrastructure

#homelab #selfhosting #infrastructure #monitoring

Published: June 15, 2026 | CommsNet

You know that feeling when something breaks and you only find out because the website is down? That's not monitoring — that's embarrassment detection. Real monitoring tells you before things break. It shows you the memory leak that started three hours ago, the disk that's filling at 2% per day, the SSL certificate expiring in 12 days.

Enterprise monitoring platforms (Datadog, New Relic, Splunk) cost hundreds to thousands per month. For a homelab, that's absurd. But running blind is worse. The answer: self-hosted Zabbix for data collection and alerting, paired with Grafana for visualization. Together, they give you enterprise-grade observability at the cost of the electricity to run them.

In this article, I'll walk through deploying a complete Zabbix + Grafana monitoring stack on Proxmox, configuring agents across VLANs, building dashboards that actually tell you something, and setting up alerts that wake you up when they matter — not at 3 AM for a transient spike.

Why Zabbix + Grafana?

The Monitoring Landscape

Solution	Cost	Data Ownership	Complexity	Alerting	Dashboards
Datadog	$15-23/host/mo	Cloud (theirs)	Low	Excellent	Excellent
Prometheus + Grafana	Free	Self-hosted	Medium	Good	Excellent
Zabbix + Grafana	Free	Self-hosted	Medium-High	Excellent	Excellent (with Grafana)
Netdata	Free	Self-hosted	Low	Basic	Good (built-in)
Uptime Kuma	Free	Self-hosted	Low	Basic	Basic

Why Not Just Prometheus?

Prometheus is the darling of the cloud-native world, and for good reason. But for homelab monitoring, Zabbix has advantages:

Agent-based collection works across VLANs: Prometheus pulls from every target, so each monitored host needs an inbound firewall rule from the server. Zabbix agents in active mode push data to the server instead, so one rule per VLAN toward the server is enough.
Auto-discovery: Zabbix can discover hosts, interfaces, and services automatically. With Prometheus in a homelab, you're usually writing prometheus.yml targets by hand.
Built-in templates: Zabbix has 400+ out-of-the-box templates for everything from Linux to pfSense to Proxmox to SNMP devices. Prometheus requires exporters for everything.
Trigger logic: Zabbix triggers support expressions like "average of last 5 minutes > threshold AND last value > threshold". Prometheus alerting rules are powerful but harder to compose.
Grafana integration: Zabbix data in Grafana gives you the best of both, Zabbix collection plus Grafana visualization.
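
To make the trigger-logic point concrete, here is a minimal shell sketch of the "average of last 5 minutes > threshold AND last value > threshold" condition. This is not Zabbix code: the function name `should_fire` and the sample values are made up for illustration, and it only mimics the logic with integer percentages.

```shell
#!/usr/bin/env bash
# should_fire THRESHOLD SAMPLE...
# Hypothetical helper: prints "fire" only when BOTH the integer average of
# all samples AND the most recent sample exceed THRESHOLD, otherwise "ok".
should_fire() {
  local threshold=$1 sum=0 last=0 n v
  shift
  n=$#
  [ "$n" -gt 0 ] || { echo "ok"; return; }
  for v in "$@"; do
    sum=$(( sum + v ))
    last=$v
  done
  if [ $(( sum / n )) -gt "$threshold" ] && [ "$last" -gt "$threshold" ]; then
    echo "fire"
  else
    echo "ok"
  fi
}
```

With a threshold of 80, a single spike (`should_fire 80 20 25 95 22 21`) stays quiet because the average is low, while a sustained load (`should_fire 80 85 90 88 92 95`) fires. That is the behaviour you want from an alert that should not wake you for a transient spike.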

Where Grafana Fits

Zabbix has its own dashboards, but they look like they were designed in 2005. Grafana is the visualization layer:

Beautiful, customizable dashboards
Unified view across multiple data sources
Annotation layers (deploy events, maintenance windows)
Alerting with deduplication and routing
Mobile-responsive (check your homelab from your phone)

Architecture

┌────────────────────────────────────────────────────────────────┐
│                    Monitoring Architecture                       │
│                                                                  │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────┐  │
│  │ Proxmox  │   │ pfSense  │   │ Docker   │   │ IoT Devices  │  │
│  │ Agent    │   │ Agent    │   │ Agent    │   │ SNMP         │  │
│  │ (VLAN20) │   │ (VLAN10) │   │ (VLAN20) │   │ (VLAN30)    │  │
│  └────┬─────┘   └────┬─────┘   └────┬─────┘   └──────┬───────┘  │
│       │              │              │                 │          │
│       └──────────────┴──────┬───────┴─────────────────┘          │
│                             │                                     │
│                    ┌────────▼────────┐                            │
│                    │  Zabbix Server  │                            │
│                    │  (VLAN 20)      │                            │
│                    │  - Collection  │                            │
│                    │  - Alerting    │                            │
│                    │  - Triggers    │                            │
│                    └────────┬──────┘                            │
│                             │                                     │
│                    ┌────────▼────────┐                            │
│                    │    Grafana      │                            │
│                    │  (VLAN 20)      │                            │
│                    │  - Dashboards   │                            │
│                    │  - Visualization│                            │
│                    │  - Alert UI     │                            │
│                    └─────────────────┘                            │
│                                                                  │
│  Alert Channels: Telegram, Email, Webhook                       │
└────────────────────────────────────────────────────────────────┘

Network Considerations (VLAN-Aware)

Following our zero-trust VLAN architecture from the previous article:

Zabbix Server lives on VLAN 20 (Servers)
Zabbix Agents on VLAN 10 (Management) push data to server via active checks
SNMP polling from Zabbix to VLAN 30 (IoT) requires explicit firewall allow rules
Grafana on VLAN 20, with an optional reverse proxy on VLAN 50 (Services) if you want external access

Firewall rules needed:

# Allow Zabbix agents → Zabbix server (active checks)
ALLOW  VLAN10 → VLAN20  TCP 10051  — "Management agents → Zabbix"
ALLOW  VLAN20 → VLAN20  TCP 10051  — "Server agents → Zabbix"

# Allow Zabbix server → IoT (SNMP polling, if desired)
ALLOW  VLAN20 → VLAN30  UDP 161    — "Zabbix SNMP poll IoT"

# Allow Grafana access from Management VLAN
ALLOW  VLAN10 → VLAN20  TCP 3000   — "MGMT → Grafana dashboard"
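
Before installing anything, it can help to confirm that the TCP rules above actually pass traffic. The sketch below is one way to do that from a host on the source VLAN; `check_port` is a name I made up, and it relies on bash's `/dev/tcp` feature plus the coreutils `timeout` command. It only covers TCP, so the SNMP rule (UDP 161) needs a different test.

```shell
#!/usr/bin/env bash
# check_port HOST PORT
# Hypothetical helper: prints "open" if a TCP connection to HOST:PORT
# succeeds within 3 seconds, otherwise "closed".
check_port() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
```

For example, `check_port 10.0.20.10 10051` run from a VLAN 10 machine should print `open` once the Zabbix server is up and the rule is in place.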

Deployment on Proxmox

Step 1: Create the Zabbix Server LXC

Proxmox LXC containers are perfect for monitoring — low overhead, fast startup, full Linux userspace.

# Download Debian 12 LXC template
pveam download local debian-12-standard_12.2-1_amd64.tar.zst

# Create container
pct create 200 local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst \
  --hostname zabbix \
  --memory 4096 \
  --swap 2048 \
  --cores 2 \
  --storage local-lvm \
  --rootfs local-lvm:32 \
  --net0 name=eth0,bridge=vmbr0,tag=20,ip=10.0.20.10/24,gw=10.0.20.1 \
  --unprivileged 1 \
  --onboot 1 \
  --start 1

Why LXC instead of VM: Zabbix doesn't need its own kernel. LXC gives you 95% of a VM's isolation with 5% of the overhead. Your monitoring shouldn't be the heaviest thing on the host.

Step 2: Install Zabbix Server + PostgreSQL

# Enter the container
pct enter 200

# Install PostgreSQL
apt update && apt install -y postgresql postgresql-contrib

# Create Zabbix database
sudo -u postgres createuser --pwprompt zabbix
sudo -u postgres createdb -O zabbix -E Unicode -T template0 zabbix

# Add Zabbix repository
wget https://repo.zabbix.com/zabbix/7.2/debian/pool/main/z/zabbix-release/zabbix-release_7.2-1+debian12_all.deb
dpkg -i zabbix-release_7.2-1+debian12_all.deb
apt update

# Install Zabbix server, frontend (with PHP PostgreSQL support), and agent
apt install -y zabbix-server-pgsql zabbix-frontend-php php8.2-pgsql zabbix-apache-conf zabbix-sql-scripts zabbix-agent2

# Import initial schema
zcat /usr/share/zabbix-sql-scripts/postgresql/server.sql.gz | \
  sudo -u zabbix psql zabbix

# Configure Zabbix server
cat > /etc/zabbix/zabbix_server.conf << 'EOF'
DBHost=localhost
DBName=zabbix
DBUser=zabbix
DBPassword=YOUR_POSTGRES_PASSWORD_HERE
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=50
DebugLevel=3
StartPollers=5
StartPollersUnreachable=2
StartTrappers=5
StartDiscoverers=2
StartHTTPPollers=2
CacheSize=64M
HistoryCacheSize=32M
TrendCacheSize=8M
ValueCacheSize=32M
Timeout=10
EOF

# Start services
systemctl restart zabbix-server zabbix-agent2 apache2
systemctl enable zabbix-server zabbix-agent2 apache2

Step 3: Install Grafana

# Run these in the Grafana container (10.0.20.11 in this setup)
# Add Grafana repository
apt install -y apt-transport-https software-properties-common
wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key

echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | \
  tee /etc/apt/sources.list.d/grafana.list

apt update && apt install -y grafana

# Configure Grafana
cat > /etc/grafana/grafana.ini << 'EOF'
[server]
http_addr = 10.0.20.11
http_port = 3000
domain = grafana.commsnet.local

[security]
admin_user = admin
admin_password = CHANGE_ME_IMMEDIATELY

[database]
type = sqlite3

[analytics]
reporting_enabled = false
check_for_updates = false

[auth.anonymous]
enabled = false
EOF

systemctl restart grafana-server
systemctl enable grafana-server

Step 4: Connect Grafana to Zabbix

Install the Zabbix data source plugin in Grafana:

grafana-cli plugins install alexanderzobnin-zabbix-app
systemctl restart grafana-server

In Grafana UI (Configuration → Plugins → Zabbix):

Enable the Zabbix app plugin
Add data source:
- Name: Zabbix
- Type: Zabbix API
- URL: http://10.0.20.10/zabbix/api_jsonrpc.php
- Username: Admin
- Password: Your Zabbix admin password
- Trends: Enable (use trends for long-term graphs)
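
If the data source test fails, it is worth checking the API URL on its own first. The sketch below calls `apiinfo.version`, the one Zabbix API method that needs no login. The function names are mine, and the URL is the one used in this article; adjust it if your frontend lives elsewhere.

```shell
#!/usr/bin/env bash
# api_version_payload
# Prints the JSON-RPC body for the unauthenticated apiinfo.version method.
api_version_payload() {
  printf '{"jsonrpc":"2.0","method":"apiinfo.version","params":[],"id":1}'
}

# zabbix_api_version URL
# Hypothetical helper: posts the payload to URL and prints the raw JSON reply.
zabbix_api_version() {
  curl -s -X POST -H 'Content-Type: application/json-rpc' \
    -d "$(api_version_payload)" "$1"
}
```

Running `zabbix_api_version http://10.0.20.10/zabbix/api_jsonrpc.php` from the Grafana container should return a small JSON object whose `result` field is the server version. An HTML error page or an empty reply usually points at the URL or the firewall rather than at Grafana.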

Configuring Zabbix Agents

Agent on Proxmox Host

# On the Proxmox host itself
apt install -y zabbix-agent2

cat > /etc/zabbix/zabbix_agent2.conf << 'EOF'
Server=10.0.20.10
ServerActive=10.0.20.10
Hostname=proxmox-host
LogFile=/var/log/zabbix/zabbix_agent2.log
DebugLevel=3

# Custom metrics for Proxmox
# These CLI tools only work as root: add a sudoers rule for the zabbix user and prefix the commands with sudo
UserParameter=pve.cluster.status,/usr/bin/pvesh get /cluster/status --output-format json 2>/dev/null | grep -c '"online"'
UserParameter=pve.vm.count,/usr/sbin/qm list 2>/dev/null | awk '/running/ {n++} END {print n+0}'
UserParameter=pve.ct.count,/usr/sbin/pct list 2>/dev/null | awk '/running/ {n++} END {print n+0}'
# pvesm prints a plain table (sizes in KiB); $$ keeps awk field references intact in flexible parameters
UserParameter=pve.storage.used[*],/usr/sbin/pvesm status --storage $1 2>/dev/null | awk 'NR>1 {print $$5}'
UserParameter=pve.storage.total[*],/usr/sbin/pvesm status --storage $1 2>/dev/null | awk 'NR>1 {print $$4}'
EOF

systemctl restart zabbix-agent2
systemctl enable zabbix-agent2
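
The VM and container counters boil down to counting rows in the tables that `qm list` and `pct list` print. Here is a small sketch of that counting step in isolation, so you can try it on sample output before wiring it into the agent. `count_running` is a hypothetical helper name, and it assumes the status column contains the literal word `running`.

```shell
#!/usr/bin/env bash
# count_running
# Hypothetical helper: reads `qm list` or `pct list` output on stdin and
# prints how many rows contain the standalone word "running".
# The header row never matches, and empty input prints 0.
count_running() {
  awk '/(^|[[:space:]])running([[:space:]]|$)/ { n++ } END { print n + 0 }'
}
```

On the Proxmox host, `qm list | count_running` gives the number of running VMs. Once the agent is restarted, `zabbix_agent2 -t pve.vm.count` shows what Zabbix will actually receive for the item.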

Agent on pfSense

pfSense has a Zabbix agent package:

System → Package Manager → Available Packages → pfSense-zabbix-agent

Configuration:

Zabbix Server IP: 10.0.20.10
Zabbix Server Port: 10051
Hostname: pfsense
Enable active checks: Yes

pfSense-specific items to monitor:

CARP status (if using HA)
Gateway quality (packet loss, latency, jitter)
State table utilization
DHCP lease counts per VLAN
Firewall rule denials per VLAN (from our zero-trust setup)
OpenVPN/WireGuard client counts
Interface traffic per VLAN

Agent on Docker Hosts

# docker-compose.yml for Zabbix agent
version: '3.8'
services:
  zabbix-agent:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent
    restart: unless-stopped
    environment:
      - ZBX_SERVER_HOST=10.0.20.10
      - ZBX_HOSTNAME=docker-host-01
      - ZBX_ACTIVE_ALLOW=true
    volumes:
      - /:/hostfs:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    network_mode: host
    privileged: true

Docker-specific metrics:

# Additional UserParameters for Docker
UserParameter=docker.container.count,/usr/bin/docker ps -aq | wc -l
UserParameter=docker.container.running,/usr/bin/docker ps --filter status=running -q | wc -l
UserParameter=docker.image.count,/usr/bin/docker images -q | wc -l
UserParameter=docker.volume.count,/usr/bin/docker volume ls -q | wc -l

Zabbix Host Configuration

Adding Hosts in Zabbix UI

Data collection → Hosts → Create host:

Host	Templates	Groups	Interface	Proxy
proxmox-host	Linux by Zabbix agent, Proxmox VE by HTTP	Servers	Agent: 10.0.20.5:10050	None
pfsense	pfSense by Zabbix	Network	Agent: 10.0.10.1:10050	None
docker-host-01	Linux by Zabbix agent, Docker by Zabbix agent 2	Servers	Agent: 10.0.20.20:10050	None
unifi-switch	SNMP Generic, Ubiquiti Switch	Network	SNMP: 10.0.10.2:161	None
cisco-2960	SNMP Generic, Cisco Switch	Network	SNMP: 10.0.10.3:161	None

Template Customization

Zabbix templates are good out of the box but need tuning for homelab scale:

Linux by Zabbix agent — Adjust trigger thresholds:

Trigger	Default	Homelab Adjusted	Reason
High CPU load (5 min)	Load > 5 per core	CPU > 80% sustained for 10 min	Homelab CPUs burst, don't alert on spikes
Available memory < 20%	20%	10%	Homelab hosts use more memory; 20% is too sensitive
Disk space < 20%	20%	10%	On a 100 GB disk, 20% still leaves 20 GB free; alerting that early is noise
Swap usage > 50%	50%	80%	Some swap usage is normal in homelabs
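
To pick a disk threshold, I find it easier to think in gigabytes than in percentages. The tiny sketch below converts one into the other. `free_gb_at` is a made-up helper name, and it uses integer arithmetic, so results are rounded down to whole GB.

```shell
#!/usr/bin/env bash
# free_gb_at DISK_GB PCT
# Hypothetical helper: prints how many whole GB are still free at the moment
# a "free space below PCT%" trigger fires on a disk of DISK_GB gigabytes.
free_gb_at() {
  echo $(( $1 * $2 / 100 ))
}
```

`free_gb_at 100 20` prints `20` and `free_gb_at 100 10` prints `10`, which is the reasoning behind the adjusted value in the table. On a 4 TB pool the same 10% is still 400 GB, so large volumes may deserve an even lower percentage.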

pfSense template — Add custom items:

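# Custom pfSense items via UserParameter
# (the PHP helper names below are illustrative; check which functions your pfSense version provides)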
UserParameter=pfsense.gateway.loss[*],/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo get_gateway_loss('$1');"
UserParameter=pfsense.dhcp.leases[*],/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo count_dhcp_leases('$1');"
UserParameter=pfsense.firmware.version,/usr/local/bin/php -r "require '/etc/inc/util.inc'; echo get_firmware_version();"

Building Grafana Dashboards

Dashboard 1: Infrastructure Overview

The single pane of glass for your entire homelab:

Panels:

Row: "Host Status" ────────────────────────────────
  [Stat]  Hosts Up          zabbix: hosts.count{status=0}
  [Stat]  Hosts Down        zabbix: hosts.count{status=1}
  [Stat]  Active Triggers   zabbix: triggers.count{value=1}

Row: "System Health" ──────────────────────────────
  [Time Series]  CPU Usage per Host    zabbix: system.cpu.util{host=*}
  [Gauge]        Memory % per Host     zabbix: vm.memory.util{host=*}
  [Time Series]  Disk I/O per Host     zabbix: vfs.dev.read{host=*}, vfs.dev.write{host=*}

Row: "Network" ─────────────────────────────────────
  [Time Series]  Traffic per VLAN      zabbix: net.if.in{host=pfsense,if=VLAN*}
  [Stat]         Firewall Denials/h    zabbix: pf.deny.count
  [Table]        Top Talkers          zabbix: net.if.total{host=*}

Row: "Storage" ─────────────────────────────────────
  [Gauge]   Proxmox Storage Used  zabbix: pve.storage.used[*]
  [Bar]     Docker Disk Usage     zabbix: vfs.fs.size{host=docker*,fs=/var/lib/docker}

Dashboard 2: pfSense Network Security

Dedicated to monitoring the zero-trust firewall:

Panels:

Row: "Firewall Activity" ──────────────────────────
  [Time Series]  Denials per VLAN/hour    zabbix: pf.deny{vlan=*}
  [Table]        Top Denied Sources       zabbix: pf.deny.src{groupby=src_ip}
  [Time Series]  Allow vs Deny Ratio      zabbix: pf.allow / pf.deny

Row: "Gateway Quality" ────────────────────────────
  [Time Series]  Packet Loss %             zabbix: pfsense.gateway.loss[*]
  [Time Series]  Latency ms               zabbix: pfsense.gateway.latency[*]
  [Stat]         Gateway Status            zabbix: pfsense.gateway.status

Row: "DHCP Leases" ────────────────────────────────
  [Stat]     MGMT Leases       zabbix: pfsense.dhcp.leases[MGMT]
  [Stat]     Server Leases     zabbix: pfsense.dhcp.leases[SERVERS]
  [Stat]     IoT Leases        zabbix: pfsense.dhcp.leases[IOT]
  [Stat]     Guest Leases      zabbix: pfsense.dhcp.leases[GUEST]
  [Table]    New Leases (24h)   zabbix: pfsense.dhcp.new_leases

Dashboard 3: Proxmox Virtualization

Panels:

Row: "Cluster Health" ─────────────────────────────
  [Stat]    Cluster Status          zabbix: pve.cluster.status
  [Stat]    VMs Running             zabbix: pve.vm.count
  [Stat]    CTs Running             zabbix: pve.ct.count

Row: "Resource Usage" ─────────────────────────────
  [Gauge]    CPU Total              zabbix: system.cpu.util{host=proxmox*}
  [Gauge]    Memory Total           zabbix: vm.memory.util{host=proxmox*}
  [Bar]      Storage per Pool      zabbix: pve.storage.used[*] / pve.storage.total[*]

Row: "VM/CT Details" ─────────────────────────────
  [Table]    All VMs + CPU/Mem/Disk  zabbix: pve.vm.{cpu,mem,disk}[*]
  [Table]    All CTs + CPU/Mem/Disk  zabbix: pve.ct.{cpu,mem,disk}[*]

Grafana Variables for Reusable Dashboards

Set up template variables so dashboards work across all hosts:

Variable: $host
  Type: Query
  Query: zabbix: hosts*
  Multi-value: Yes
  Include All: Yes

Variable: $vlan
  Type: Custom
  Values: MGMT, SERVERS, IOT, GUEST, SERVICES

Variable: $interval
  Type: Interval
  Values: 1m,5m,10m,30m,1h,6h,1d
  Auto: Yes

Alerting: Wake Me When It Matters

Zabbix Triggers → Grafana Alerts

The alerting pipeline:

Zabbix Agent → Zabbix Server (trigger fires) → Grafana Alert Rule → Notification Policy → Channel

Critical Alerts (Wake Me Up)

Alert	Trigger Expression	Severity	Channel
Host down	`nodata(/{HOST}/agent.ping,5m)=1`	Disaster	Telegram + Email
Disk > 90%	`last(/{HOST}/vfs.fs.size[/,pused])>90`	High	Telegram
pfSense down	`nodata(/pfsense/agent.ping,3m)=1`	Disaster	Telegram + Email
Gateway packet loss > 10%	`last(/{HOST}/pfsense.gateway.loss)>10`	High	Telegram
Zabbix server down	Internal zabbix trigger	Disaster	Email (fallback)

Warning Alerts (Check in Morning)

Alert	Trigger Expression	Severity	Channel
CPU > 80% sustained	`avg(/{HOST}/system.cpu.util,10m)>80`	Warning	Dashboard only
Memory > 85%	`last(/{HOST}/vm.memory.util)>85`	Warning	Dashboard only
Certificate expiring < 14 days	`last(/{HOST}/cert.days_left)<14`	Warning	Email digest
Docker container stopped	`last(/{HOST}/docker.container.running)<expected`	Warning	Dashboard only

Information Alerts (Weekly Digest)

Alert	Trigger Expression	Severity	Channel
New DHCP lease on MGMT VLAN	Event log match	Info	Weekly digest
Firmware update available	`last(/{HOST}/pfsense.firmware.version,#1)<>last(/{HOST}/pfsense.firmware.version,#2)`	Info	Weekly digest
Storage growth rate > 5%/week	`trend(7d)>5`	Info	Weekly digest

Telegram Alert Integration

Grafana supports Telegram natively. Create a bot via @BotFather:

Grafana → Alerting → Contact points → Add Contact Point
  Type: Telegram
  BOT API Token: YOUR_BOT_TOKEN
  Chat ID: YOUR_CHAT_ID

Notification Policy:
  Group by: alertname, severity
  Group wait: 30s
  Group interval: 5m
  Repeat interval: 4h

  Route: severity=disaster → Telegram immediately
  Route: severity=high → Telegram, 5m repeat
  Route: severity=warning → Email digest, 1d repeat
  Route: severity=info → Weekly email
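
Before wiring the bot into Grafana, I like to confirm the token and chat ID with a direct call to the Telegram Bot API. The sketch below does that with `curl`. The function names are hypothetical; the token and chat ID are whatever @BotFather and your chat gave you.

```shell
#!/usr/bin/env bash
# telegram_url TOKEN
# Prints the Bot API sendMessage endpoint for the given bot token.
telegram_url() {
  printf 'https://api.telegram.org/bot%s/sendMessage' "$1"
}

# send_test_message TOKEN CHAT_ID TEXT
# Hypothetical helper: posts TEXT to CHAT_ID and prints the API's JSON reply.
send_test_message() {
  curl -s -X POST "$(telegram_url "$1")" \
    --data-urlencode "chat_id=$2" \
    --data-urlencode "text=$3"
}
```

A call such as `send_test_message "$BOT_TOKEN" "$CHAT_ID" "Zabbix test alert"` should make the message appear in the chat. If it does not, the JSON reply normally says whether the token or the chat ID is the problem, which saves a round of debugging inside Grafana.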

Performance Tuning

Zabbix Housekeeper

Zabbix's built-in housekeeper is notoriously slow with PostgreSQL. Replace it with partitioned tables:

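# Connect to the Zabbix database (the TimescaleDB package must already be
# installed and listed in shared_preload_libraries)
sudo -u postgres psql zabbix

-- Enable the TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert history tables to hypertables (TimescaleDB)
-- chunk_time_interval is in seconds; migrate_data is needed because the tables already contain rows
SELECT create_hypertable('history', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_uint', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('history_str', 'clock', chunk_time_interval => 86400, migrate_data => true);
SELECT create_hypertable('trends', 'clock', chunk_time_interval => 2592000, migrate_data => true);
SELECT create_hypertable('trends_uint', 'clock', chunk_time_interval => 2592000, migrate_data => true);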
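
The chunk intervals are plain seconds because Zabbix stores `clock` as a Unix timestamp. If you want different chunk sizes, this one-line sketch does the conversion; `days_to_seconds` is just a helper name I picked.

```shell
#!/usr/bin/env bash
# days_to_seconds DAYS
# Hypothetical helper: converts whole days to seconds for chunk_time_interval.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}
```

`days_to_seconds 1` gives 86400 (the daily history chunks) and `days_to_seconds 30` gives 2592000 (the monthly trend chunks).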

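Disable Zabbix's internal housekeeping for history and trends (TimescaleDB handles it now). In current Zabbix versions this is a frontend setting, not a zabbix_server.conf parameter:

Administration → Housekeeping
  History: untick "Enable internal housekeeping"
  Trends: untick "Enable internal housekeeping"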

Set retention policies:

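-- clock holds integer Unix timestamps, so each hypertable needs an integer "now"
-- function and retention given in seconds (unix_now is a helper name chosen here)
CREATE OR REPLACE FUNCTION unix_now() RETURNS integer LANGUAGE SQL STABLE AS
  $$ SELECT extract(epoch FROM now())::integer $$;
SELECT set_integer_now_func('history', 'unix_now');
SELECT set_integer_now_func('history_uint', 'unix_now');
SELECT set_integer_now_func('history_str', 'unix_now');
SELECT set_integer_now_func('trends', 'unix_now');
SELECT set_integer_now_func('trends_uint', 'unix_now');
-- Keep raw history for 14 days (1209600 seconds)
SELECT add_retention_policy('history', drop_after => 1209600);
SELECT add_retention_policy('history_uint', drop_after => 1209600);
SELECT add_retention_policy('history_str', drop_after => 1209600);
-- Keep trends for 2 years (63072000 seconds)
SELECT add_retention_policy('trends', drop_after => 63072000);
SELECT add_retention_policy('trends_uint', drop_after => 63072000);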

Database Size Estimates

Monitoring	Items	History/Day	14-Day History	2-Year Trends	Total DB Size
5 hosts	~500	~15 MB	~210 MB	~200 MB	~500 MB
10 hosts	~1000	~30 MB	~420 MB	~400 MB	~1 GB
20 hosts	~2000	~60 MB	~840 MB	~800 MB	~2 GB

A homelab with 10-20 hosts will use 1-2 GB of storage over 2 years. That's nothing.
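
If your host count falls between the rows, the table scales linearly, so a rough estimate is easy to script. The per-host figures in this sketch (about 3 MB of history per day and about 40 MB of trends over two years) are simply read off the table above, and `estimate_db_mb` is a hypothetical helper. Treat the result as a ballpark; the "Total DB Size" column is rounded up from these sums.

```shell
#!/usr/bin/env bash
# estimate_db_mb HOSTS [HISTORY_DAYS]
# Hypothetical helper: prints a rough database size in MB, assuming
# ~3 MB of raw history per host per day and ~40 MB of 2-year trends per host
# (both taken from the table in this article, not measured here).
estimate_db_mb() {
  local hosts=$1 history_days=${2:-14}
  local history_mb=$(( hosts * 3 * history_days ))
  local trends_mb=$(( hosts * 40 ))
  echo $(( history_mb + trends_mb ))
}
```

`estimate_db_mb 5` prints `410` and `estimate_db_mb 10` prints `820`, in line with the ~500 MB and ~1 GB rows. Passing a second argument shows what longer raw-history retention would cost, for example `estimate_db_mb 10 30`.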

Backup Strategy

Your monitoring data is valuable — it contains your baseline, your history, your incident timeline. Back it up.

Zabbix Database Backup

#!/bin/bash
# zabbix-backup.sh — Daily Zabbix database backup
# Stop on the first error so a failed dump is never reported as complete
set -euo pipefail
BACKUP_DIR="/mnt/nas/backups/zabbix"
DATE=$(date +%Y-%m-%d)
RETENTION_DAYS=30

# PostgreSQL dump
sudo -u postgres pg_dump zabbix | gzip > "${BACKUP_DIR}/zabbix_${DATE}.sql.gz"

# Zabbix config
tar czf "${BACKUP_DIR}/zabbix_config_${DATE}.tar.gz" \
  /etc/zabbix/ /etc/grafana/

# Cleanup old backups
find "${BACKUP_DIR}" -name "zabbix_*.sql.gz" -mtime +${RETENTION_DAYS} -delete
find "${BACKUP_DIR}" -name "zabbix_config_*.tar.gz" -mtime +${RETENTION_DAYS} -delete

echo "Backup complete: ${DATE}"
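
A backup you have never checked is only a hope. As a minimal sanity check, the sketch below confirms that a dump file is an intact gzip archive and that it decompresses to at least one byte. That catches the case where `pg_dump` failed and `gzip` happily compressed nothing. `verify_backup` is a made-up name, and this is not a substitute for an occasional real restore test.

```shell
#!/usr/bin/env bash
# verify_backup FILE
# Hypothetical helper: prints "ok" if FILE is an intact gzip archive with
# non-empty contents, otherwise "bad" (missing, corrupt, or empty dump).
verify_backup() {
  local file=$1 bytes
  gzip -t "$file" 2>/dev/null || { echo "bad"; return; }
  bytes=$(gzip -dc "$file" | head -c 1 | wc -c)
  if [ "$bytes" -gt 0 ]; then
    echo "ok"
  else
    echo "bad"
  fi
}
```

Calling `verify_backup "${BACKUP_DIR}/zabbix_${DATE}.sql.gz"` at the end of the backup script, and alerting when it prints `bad`, turns a silent failure into something you notice the same day.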

Grafana Dashboard Export

Grafana dashboards should be version-controlled:

#!/bin/bash
# grafana-export.sh — Export all dashboards as JSON
GRAFANA_URL="http://10.0.20.11:3000"
API_KEY="YOUR_GRAFANA_API_KEY"
OUTPUT_DIR="/home/commstech/grafana-dashboards"

# Get all dashboard UIDs
DASHBOARDS=$(curl -s -H "Authorization: Bearer ${API_KEY}" \
  "${GRAFANA_URL}/api/search?type=dash-db" | \
  jq -r '.[] | .uid')

# Export each dashboard (UID is a read-only variable in bash, so use another name)
mkdir -p "${OUTPUT_DIR}"
for DASH_UID in ${DASHBOARDS}; do
  curl -s -H "Authorization: Bearer ${API_KEY}" \
    "${GRAFANA_URL}/api/dashboards/uid/${DASH_UID}" | \
    jq '.dashboard' > "${OUTPUT_DIR}/${DASH_UID}.json"
done

echo "Exported $(echo ${DASHBOARDS} | wc -w) dashboards"

Commit these to git. Your dashboards are code.

Cost Summary

Item	Cost	Notes
Zabbix Server (LXC on Proxmox)	$0	Already have hardware
Grafana (LXC on Proxmox)	$0	Already have hardware
PostgreSQL + TimescaleDB	$0	Open source
Zabbix Agents	$0	Open source
Storage (2 GB over 2 years)	$0	Negligible
Telegram bot for alerts	$0	Free tier
Total Monthly Cost	$0	Self-hosted, zero subscriptions

Compare to Datadog at $15/host/month for 10 hosts = $150/month = $1,800/year. You're saving $1,800/year by self-hosting.

Dashboard Screenshots Description

Since this is a text article, here's what your dashboards should look like:

Infrastructure Overview Dashboard

Top row: Three large stat panels — green "5 Hosts Up", red "0 Hosts Down", orange "2 Active Warnings"
Middle left: Time series graph showing CPU usage for all hosts over last 1 hour, with 80% threshold line
Middle right: Gauge panels showing memory usage per host (color-coded: green < 60%, yellow 60-80%, red > 80%)
Bottom left: Network traffic stacked area chart per VLAN
Bottom right: Storage usage horizontal bar chart per Proxmox pool

pfSense Security Dashboard

Top row: Firewall deny rate time series — should show consistent low rate, any spike is suspicious
Middle: Table of top 10 denied source IPs with last attempt time
Bottom: DHCP lease count per VLAN as small bar charts, with "new in 24h" annotation

Next Steps

With monitoring in place, you can now:

Set up automated remediation — Zabbix can run scripts on alert (restart a service, clear a cache)
Add log monitoring — Forward syslog from pfSense, Proxmox, and Docker to Zabbix
Implement capacity planning — Use trend data to predict when you'll run out of disk/CPU/memory
Add synthetic monitoring — Zabbix web scenarios to check your services are actually responding
Integrate with Home Assistant — Send Zabbix alerts to your home automation for visual/audio alerts

Key Takeaways

Self-hosted monitoring costs nothing but electricity — Zabbix + Grafana is enterprise-grade, free, and yours
Zabbix for collection, Grafana for visualization — each tool does what it does best
Active checks work across VLANs: agents push data to the server on TCP 10051, so the monitored hosts need no inbound ports opened
Tune your triggers for homelab scale — enterprise defaults are too sensitive for home infrastructure
TimescaleDB partitioning is essential — the built-in housekeeper will kill your database performance
Alert on what matters, ignore what doesn't — disaster = wake me, warning = check in morning, info = weekly digest
Version-control your dashboards — they're infrastructure code, not click-and-hope
Back up your monitoring data — it's your operational history, and losing it means losing your baselines

CommsNet — Building infrastructure that respects your privacy and your intelligence.
