CivicEngage: Service Disruption Nov 4 2019



  • Carl Bowen

    All sites have recovered. We apologize for the inconvenience caused to clients and citizens, and we will take measures to learn and improve from this. The issue appears to have been triggered by a vMotion of the server, performed to balance workload between hosts, which left the server unresponsive. This task is routinely completed without issue. The vMotion was reported as completing successfully, and the management dashboard showed much lower CPU usage, but the server was unresponsive. This forced the engineer to hard reboot the machine, and he also moved it back to the original host. Investigation is ongoing, but it appears that the server was operating at elevated CPU levels before the event and that the vMotion aggravated that condition. After the restart and second vMotion, the server appeared to boot normally, and it took 15-20 minutes from that point for sites to recover.
