The web server and calendar servers were down due to a UPS problem.
NOVELL SERVERS: 10 Minute Outage for REBOOT, 5:30pm, Today
A problem identified after Saturday's Novell software upgrade requires a reboot of all the Novell servers.
Info Systems will reboot the Novell servers at 5:30pm today. A message to save any open documents and log off Novell will be broadcast 10 minutes before the servers go down. The outage will last about 10 minutes.
This only affects network disks, network printers and authentication to email and calendar.
Users who do NOT log off Novell by 5:30pm should reboot their computers after the outage in order to reconnect properly to the Novell servers.
Novastor backup problems
The Novell backups failed after the SP6 Novell upgrade.
The Novastor “NNADMIN” was unable to see the Ultrium tape drive or the other Novell servers.
After searching the Novell and Novastor websites, we found a TID that stated: In Novastor, communication between Novell servers may fail after the installation of SP6. If this occurs, you must install the latest TCP stack drivers found on Novell’s website.
These were installed and seemed to take care of the Novell communication problem. However, we still had the problem with the Ultrium tape drive.
After much digging and trying different and updated drivers, we found that “NNADMIN” was referencing the wrong or a corrupted SCSI driver for the Ultrium tape drive. This led us to find that the correct driver was not being loaded due to an “invalid slot” failure. When the driver was loaded manually, it stated which slots must be referenced. After we corrected the slot value, everything seemed to work well.
NOTE:
During this testing we found that if you change the load order of drivers in the “startup.ncf” file, NNADMIN assigns different values to those drivers. This causes errors in the backup because the device you selected to back up to will no longer exist.
Currently adpt160m.ham is the only updated driver. If there are backup problems, it will need to be rolled back.
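Because a changed driver load order in “startup.ncf” is what breaks the backup device selection, it may help to compare the load order before and after any edit to that file. The following Python sketch is one way to do that and is not part of the Novastor or Novell tooling. The function names, the sample file contents, the slot numbers, and the EXAMPLE.HAM driver are all hypothetical; only adpt160m.ham comes from the notes above.

```python
def driver_load_order(ncf_text):
    """Return driver file names in the order their LOAD lines appear."""
    drivers = []
    for line in ncf_text.splitlines():
        parts = line.split()
        # Keep only lines whose first word is LOAD; skip everything else.
        if len(parts) >= 2 and parts[0].upper() == "LOAD":
            drivers.append(parts[1].upper())
    return drivers


def load_order_changed(old_text, new_text):
    """True when the two file versions load drivers in a different sequence."""
    return driver_load_order(old_text) != driver_load_order(new_text)


# Hypothetical file contents, for illustration only.
before = "LOAD EXAMPLE.HAM SLOT=1\nLOAD ADPT160M.HAM SLOT=3\n"
after = "LOAD ADPT160M.HAM SLOT=3\nLOAD EXAMPLE.HAM SLOT=1\n"

print(load_order_changed(before, after))  # True
```

If the check reports a change, the backup device selected in NNADMIN should be reviewed before the next backup runs.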
Novell Server Service Pack 6 Upgrade
Novell Service Pack 6 needs to be installed before the hardware upgrade migration planned for late October or early November.
The planned outage for this Service Pack upgrade is Saturday morning, 10/04/03.
Systems that will be UNAVAILABLE:
Novell Network Disks/Printers (FS, ST, FSAPPS, STAPPS)
Novell NDS Authentication Services:
– EMail (all clients)
– Calendar
Systems that WILL REMAIN available:
EMU Web Server (www.emu.edu)
Blackboard (http://bb.emu.edu)
Sadie (Library Catalog System)
AS400 (CampusWeb)
The installation is expected to take 6 to 9 hours. The process will begin about 6am.
——————————
Close-out Note as of 1:00pm, Saturday 10/04/03:
This upgrade process went very well. All servers returned to service by 11:30am, Saturday, 10/4/03.
Thank you, Dan Marple Jr, for doing a superb job of planning and implementation of this service pack upgrade that involved 6 production Novell servers. This is probably a first to have *NO GLITCHES*!
— Jack Rutt —
= = = = = = = Follow up Comments = = = = = = = =
In preparing for the SP6 upgrade I ran DSREPAIR. This gave me errors on 4 servers. This error was a -771 in reference to “error initializing schema cache”. No matter what I did I couldn’t resolve the error. Following TID 10063329 I did the following on the NS server:
Downed the server,
restarted it using “server -ndb”,
ran a local repair,
downed it again,
restarted it normally.
This seemed to resolve the errors. I decided not to do anything further till Sat morning. On Sat morning I bounced all servers and ran a DSREPAIR on all of them. No errors were reported. Why – I’m not sure.
What was accomplished this outage:
1. Installed SP6 on the following servers: NS, LD, FSAPPS, FS, ST, STAPPS.
2. Imaged all servers before and after the upgrade.
3. No drivers were changed during the upgrade.
4. No files were backed up using the SP backup routine.
5. FSAPPS was upgraded to 2GB memory.
— Dan Marple Jr. —
Novell server FSAPPS was crashing
The FSAPPS file server was behaving erratically. When we tried to run DSREPAIR and do an unattended repair, the NLM hung.
Other access seemed to be normal, so nothing was done initially. After an hour or so, things started to slow down, bindery connections were not permitted, and other “funky” things happened.
All users of GoldMine were notified to log off the system.
The server could NOT be brought down gracefully, so it was powered down, all volumes were “vrepaired”, and the server was restarted.
All seemed normal after that.
Possible causes:
1. Not enough memory,
2. Memory incompatibility: the server has two 512 chips and two 128 chips, and Novell recommends that all chips be the same.
— Dan Marple Jr. —
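As a rough check of the two suspected causes, the arithmetic can be sketched in Python. The module sizes come from the note above; treating them as megabytes is an assumption, and the variable names are hypothetical.

```python
# Module sizes from the note above; the unit (MB) is an assumption.
modules_mb = [512, 512, 128, 128]

total_mb = sum(modules_mb)
all_same_size = len(set(modules_mb)) == 1

print(total_mb)       # 1280
print(all_same_size)  # False
```

Under that assumption the server had 1280 MB installed in mixed module sizes, which is below the 2GB that FSAPPS was given during the SP6 outage.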
Brief Email Interruption
Users who were logged into webmail or IMAP at this time may need to log out and back in again. This is due to a small system change made in preparation for upcoming system upgrades.
EMail Server Failure – Incoming Internet Mail Rejected
Mail coming to the EMU email server from the Internet was rejected between 5:00pm and 6:40pm today (Wed, 9/17). We estimate this involved about 1,000 messages, many of which were likely spam. The senders of this mail should have received error messages.
Early Morning Calendar Outage
The calendar server was taken offline for about 30 minutes in order to make a backup image of the server OS.
Calendar Server Login Failure
Logins to the Calendar Server from non-web clients failed. The problem was a repeat of an occurrence on 08/31/2003 and seems to have occurred only since the upgrade to 9.0.4 was performed.
Jeremy is tracking the details of these failures so that a complete description can be given to Oracle in order to open a support ticket.
Blackboard server outage 7 AM Sept 5
A patch was applied to Blackboard at 7:45 AM. Applying it was unintrusive, and the system was unavailable for only a few seconds.