Two core servers need to be taken offline in order to create images of their operating systems. Having these images is vital for restoring the servers if they fail unexpectedly. The main services affected by this outage will be email, wireless access, and DHCP; however, ALL network services could experience disruption.
Monthly Archives: March 2005
Dad and Mail server down
Dad and Mail servers went down when the UPS signalling cable was removed.
This occurred during the replacement of the control box for the Matrix 5000 on the south rack. After the Matrix was back on-line, the servers were powered back on.
Webmail outage
Webmail failed when DAD failed. Had problems getting the SSL cert reinstalled.
Student Information System Outage
The Student Information System (iSeries, Campus Web) will be unavailable for routine maintenance. A two-hour interruption is anticipated.
The systems affected will be Jenzabar application software (TEAMS2000) and CampusWeb (https://campusweb.emu.edu). CampusWeb provides student access to their financial information (billings and FinAid) as well as final class grades.
No other systems are affected by this outage.
UPS Failure During Diagnostics
During diagnostic procedures following UPS failure of 3/6, the affected UPS failed again. This caused a number of single-source power supply servers to crash, most notably the SIS running on the IBM iSeries (Jenzabar TEAMS 2000), the EMU web server (www.emu.edu) and the Oracle Calendar server.
To prevent this from reoccurring, the following servers were moved to the north rack MATRIX UPS: IBM iSeries, TS, and Tserve. ZW is still connected to the additional small 280 UPS. All test servers located on the north rack have been powered off to increase run-time and decrease load on the MATRIX.
APC RMA number: 886725
contact: Ed
The control head is suspected to be the cause, given other issues currently present. A new one is being shipped. We will need to test to see why power was lost when the control head was removed. We will need to document when this occurred: when placed in bypass, when removed, when replaced, etc.
A couple of issues were also discovered in the configuration of the PowerChute agent on ZW. This is due to it being a fresh install; use the web client to modify the parameters.
E-Mail Server Failed to Start Properly Following UPS Failure
Email server is not working properly. Mail is being handled but users cannot access the server. Problem followed power failure on a UPS in Data Center about 8pm on Sunday, 3/6. UPS placed back online by 1am, 3/7 but mail server authentication services would not start properly on re-boot of server. It is believed to be related to changes made to the authentication system the week of 3/1.
—————————
3:00am — After numerous attempts to have authentication services start correctly (using the newer configuration) it was decided to return to the older authentication system. This was successful. Jason will have to debug the system when he returns to work on Thursday, 3/10.
UPS Failure, Several Servers Shut Down
The Matrix UPS providing power to the SOUTH RACK failed at about 8pm, Sun, 3/6. Discovered about 10:45pm. NetSYS tech support switched UPS to BYPASS mode about 11pm.
Servers known to be affected include: Calendar, Mail, DNS, CCTV, FS, FSAPPS. Exact status of servers unknown.
NetSYS techs discussed strategy and decided to take down all servers with single-source power supplies and switch UPS back to OPERATING mode. This was done and UPS responded appropriately and all seemed to be working normally. Battery at 98%.
Special procedure done to the iSeries (AS400) box (a single-source power supply server). NetSYS worked with Loretta (via phone) to power down the unit. Pressed the right button on face plate and power-up proceeded to LOGIN screen on console within 5 minutes.
As a precaution DNS server moved to NORTH Matrix until we are sure all problems with SOUTH Matrix are resolved.
Some additional servers had to be restarted to ensure they were running properly. In the course of doing these server restarts it was found that the MAIL server would not start properly. This may be due to changes made last week for the LDAP services. See separate Outage log set up for the MAIL server.
Campus Web Unavailable
The Campus Web website is currently unavailable. Technicians are working to get it back online.
Problem resolved about 1pm. Failed backup overnight blocked restart of web services. Clearing error messages allowed the script to continue.
ZW server hung
ZW server not responding. CPU at 75%
Rebooted it.
Lancaster network outage – all day
The firewall and router for the Lancaster campus will be upgraded on March 1. This means the internet connection (and connection to/from EMU) will be sporadic throughout the day. Due to the nature of the changes, it is impossible to say exactly when the network will be unavailable or for how long.