BB went down because of a power management problem and came back up clean.
BB and DNS outage
Blackboard and DNS servers went offline. Power to the UPS went out, which caused the power management software to shut the servers down gracefully. Power was restored and the servers were brought back online.
Faculty/Staff Novell Server Failure
The Faculty/Staff Novell server experienced a hard failure on the SYSTEM drives. Both drives in the mirrored set failed. Various restart efforts were attempted. Ultimately the SYSTEM disk had to be restored from the daily backups, and numerous problems were encountered during the restore.
Dan Marple Jr directed the recovery effort beginning at 7:30am, 3/6. The server was returned to service about 5:00pm.
The server appears to be stable but a final assessment will be made first thing on Friday morning, 3/7. Faculty/Staff can once again use their network drives and printers by restarting their computers and logging in to Novell.
All Day Outage for Network Software Upgrade
Upgraded the 6509 IOS and the MSFC IOS to allow use of the new copper-based GBICs. Changed all trunking from ISL to 802.1Q (dot1q). Had problems in the seminary with the link between the 3550-48 and the 3524 on the 1st floor. The cause is possibly a bad GigaStack on the 3524.
Upgraded the PIX to PIX OS 6.2.2 to allow for testing and use of VPNs for Lancaster as well as for the Windows client.
A flash card will now need to be purchased for the 6509 to allow for failover and for future upgrades – not enough memory on the board anymore.
All Novell servers except NS are patched to the latest service packs for the Pervasive software. I’m leery of doing NS since these SPs affect Btrieve, which Novastor uses.
Dan
Reconfigure RAID on IMAP Mail Server
Reconfigured the RAID disk on the IMAP email server (ms.emu.edu) so that the WRITE CACHE is enabled.
Although it was hoped this would be a fairly simple change, in practice an image of the 8 GB disk had to be made and the RAID then completely rebuilt. The actual reconfiguration effort began about 10am and was completed about 6pm. However, as a backup strategy in preparation for this procedure, the IMAP mail services were shut down before the nightly backup of the server at 2am.
We believe this should improve the I/O performance of the IMAP mail server.
8+ Hr Webmail Server Outage.
The web server service on the webmail server died about 4:00am, Sunday, 2/16. Adam Nolley notified Jack about 11:30am. Jack contacted Ben Beachy at home, who was then able to dial in to the system, diagnose the problem and fix it. Service was restored about 12:20pm. The cause was a program that was installed during a build of the server. The program was not being used, but its license expired and it did not shut down gracefully. The program has now been removed completely from the server.
Novell server FSAPPS crashed – 30 min downtime
Novell server FSAPPS crashed.
Abended in DS.nlm; the cause could not be determined from the logs.
DSREPAIR was run with the options for cleaning 613 errors about 15 min before the crash, but it was also run on all of the servers.
All-Day Server Outage – Saturday, 1/18/03
Info Systems is performing a major Novell server upgrade on Sat, 1/18/03. Work will begin shortly after 6am, and the Novell servers will be unavailable from that time. The outage could last as late as 9pm; however, it is hoped that it can be completed before 5pm.
During the outage there will be no access to the Novell servers (i.e. network drives and printers). Users will also NOT be able to use email during the outage.
Status of the outage will be posted on the Critical Information Notice on the EMU home page throughout the day.
10 Hr Webmail Server Failure
The weekly restart of the Webmail Apache server (webmail.emu.edu) failed due to a new config file entry. Only the Webmail server was affected. Mail services from ms.mail.emu were unaffected.
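One way to guard against this kind of failure is to check the config before the restart runs. The following is a minimal sketch only, assuming the weekly restart is scripted and that apachectl is on the path. The function name safe_restart and its parameters are hypothetical and not taken from the actual restart job.

```python
import subprocess


def safe_restart(check_cmd, restart_cmd, run=subprocess.run):
    """Run restart_cmd only if check_cmd exits with status 0.

    Returns (True, "") when the restart was issued, or
    (False, error_text) when the config check failed.
    The run parameter exists so the logic can be exercised without Apache.
    """
    check = run(check_cmd, capture_output=True, text=True)
    if check.returncode != 0:
        # Config check failed: leave the running server untouched
        # and hand back whatever the check wrote to stderr.
        return False, check.stderr.strip()
    run(restart_cmd, check=True)
    return True, ""


# Hypothetical usage from a weekly job (assumes apachectl is installed):
# ok, err = safe_restart(["apachectl", "configtest"], ["apachectl", "graceful"])
# if not ok: notify the admin with err instead of restarting
```

With a check like this, a bad config entry would leave the existing server process running rather than taking Webmail down until someone notices.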
15 Min WebMail Outage
WebMail experienced an outage of about 15 minutes. Users could not log in to WebMail. Other IMAP services were unaffected. Network Admin made some config changes to get it working again. No clear explanation for the failure was found.