Some Webservio hosting systems experienced downtime today as a result of a UPS backup error.
- Self-test and individual battery module test results indicated that a full UPS battery replacement was needed. No simulated full power outage was necessary. All UPS batteries have been replaced and tested. All tests were successful, and the systems indicate 40+ minutes of runtime on battery power alone. Automated UPS self-test results will be verified weekly to ensure the integrity of the UPS system.
- Event Description
- Power to the Knoxville data facility was lost for approximately 16 seconds, beginning on 11/9/2018 at 3:43:47 PM and ending at 3:44:03 PM.
- As a result, all server equipment shut down for approximately 16 seconds.
- The CAT generator started automatically and ran for approximately 10 minutes.
- During this time, all server equipment powered back on.
- Facility power returned to normal.
- The UPS battery backup did not sustain power long enough to allow the CAT generator to turn on and fully stabilize prior to the ATS making the switch to generator power input.
- Incident Response
- Webservio engineers visually inspected all power distribution unit control panels and battery units. No errors or battery low indicators were found.
- Webservio engineers visually inspected all systems, monitoring graphs, and monitoring notifications, noting every system that did not come back online correctly. These included several VPS host servers; Webservio's billing, support, and phone systems; and four servers belonging to colocation clients.
- Webservio engineers connected consoles to each of these systems and worked with clients to resolve all system errors.
- All Webservio and client systems were restored by approximately 7:00 pm EST.
- Incident Resolution
- Webservio engineers reviewed the event logs from the UPS battery systems. The cause of the incident could not be determined from the logs alone.
- Webservio engineers have determined that a simulated power outage must be performed to fully inspect the UPS batteries, battery connections, loads, generator, and ATS functions in real time. Due to the sensitive nature of this testing, we are working to schedule the simulated power outage for late evening.
- Webservio staff will send notifications to all clients to inform everyone of the test schedule.
- Webservio engineering will analyze the results of the simulated power outage and use this analysis to inform decisions regarding the next steps to take to ensure the power reliability of the Infinity Data Center.
At approximately 3:30 pm EST today, the Knoxville data facility experienced a brief power flicker. Due to an error with the UPS battery backup, this caused some servers to reboot. After rebooting, some systems had to be restored manually, which resulted in downtime for Webservio hosting clients located at the Knoxville facility. Service was restored for all systems by 7:00 pm EST.
As part of this incident, it was necessary to manually restore the Webservio billing system. When the system was restored, a cron job ran automatically and generated invoice notices. PLEASE DISREGARD THESE INVOICE EMAILS. The incorrect invoice notices were sent between 6:30 and 6:38 pm EST on November 9, 2018. These invoices were not created in the system; you can verify this by logging into your account at https://billing.webservio.net/clientarea.php
If you have any questions, please create a ticket and we will be glad to respond.