E-Mail Server Failed to Start Properly Following UPS Failure

Email server is not working properly. Mail is being handled, but users cannot access the server. The problem followed a power failure on a UPS in the Data Center at about 8pm on Sunday, 3/6. The UPS was placed back online by 1am on 3/7, but the mail server's authentication services would not start properly when the server was rebooted. The cause is believed to be related to changes made to the authentication system the week of 3/1.
—————————

3:00am — After numerous attempts to have authentication services start correctly (using the newer configuration) it was decided to return to the older authentication system. This was successful. Jason will have to debug the system when he returns to work on Thursday, 3/10.

UPS Failure, Several Servers Shut Down

The Matrix UPS providing power to the SOUTH RACK failed at about 8pm, Sun, 3/6. Discovered about 10:45pm. NetSYS tech support switched UPS to BYPASS mode about 11pm.

Servers known to be affected include: Calendar, Mail, DNS, CCTV, FS, FSAPPS. Exact status of servers unknown.

NetSYS techs discussed strategy and decided to take down all servers with single-source power supplies and switch the UPS back to OPERATING mode. This was done; the UPS responded appropriately and all seemed to be working normally. Battery at 98%.

A special procedure was done for the iSeries (AS400) box (a server with a single-source power supply). NetSYS worked with Loretta (via phone) to power down the unit. Pressed the right button on the face plate, and power-up proceeded to the LOGIN screen on the console within 5 minutes.

As a precaution, the DNS server was moved to the NORTH Matrix until we are sure all problems with the SOUTH Matrix are resolved.

Some additional servers had to be restarted to ensure they were running properly. In the course of these restarts it was found that the MAIL server would not start properly. This may be due to changes made last week for the LDAP services. See the separate Outage log set up for the MAIL server.

ST crashed

ST crashed today during some general maintenance.

When DSrepair finished, it showed one error corrected. Hit the ESC key and the notification went away, but the screen never came back to normal. CPU was at 90+, and the monitor showed nothing abnormal in the busiest threads; a lot were 200,000 to 300,000. Had to power off and restart.