Comprehensive System Health Monitoring

Заказчик: AI | Опубликовано: 07.10.2025

I’m rolling out a full-scale monitoring and incident-management framework for my web, database, and mail servers and I’d like an expert to lead the build-out from design through hand-off. The core of the solution will be Nagios (or a compatible fork you recommend). Daily health checks need to cover CPU, memory, disk, key services and transactional probes, with clear dashboards and meaningful thresholds. On top of that, cron-driven email alerts must fire the moment a system restart, billing-tracking anomaly, or content CDR issue is detected, so scripting familiarity (bash or Python) is essential. Because my most frequent pain points are system outages, performance slowdowns, and data discrepancies, I also need a concise incident-response playbook that spells out escalation paths and SLAs. Hourly revenue monitoring for CRBT and SDP is another must—if numbers drift outside expected ranges, I want automatic alerts plus a short analysis note ready for the daily summary. Every week I meet with stakeholders, so you’ll package key metrics and incident summaries into a clean PowerPoint deck. When changes are needed, you’ll draft the Method of Procedure and Change Requests directly in our service-management portal, then coordinate with the dev team to test new features on our staging servers before they roll into production. Deliverables • Nagios (or equivalent) configuration files and dashboards • Cron scripts and alert templates for restarts, billing tracking, and content CDRs • Incident-response playbook covering outages, performance issues, data discrepancies • Revenue-monitoring job with hourly alerting logic and sample reports • Weekly slide template populated with first-week data as proof of format • Completed MOP and CR examples plus testing checklist for one sample release I’ll provide SSH access to the test environment and existing monitoring notes. If this sounds like your wheelhouse, tell me how you’d structure the project and the tools you’d use.