Novell eDirectory Security Patch Install – LDAP/Novell Down

A major security patch must be applied to the Novell Directory Services (NDS) system. The process will require several hours of downtime. Systems affected will be the Novell file servers and all other systems that authenticate with LDAP (e.g. email, calendar, certain web services).

Plans are to begin shortly before 7:00am on Saturday, 2/28. If all goes well, all systems should return to service by 2:00pm.

During the outage the following network services will NOT be available:

NOVELL FILE/PRINT: (P:, G:, Z: drives and network printers)
EMAIL: LDAP authentication will be down, so users cannot log in to email. No email will be lost because it will be held in queue.
CALENDAR: The Oracle Calendar LDAP authentication will be down. Users will not be able to log in to the calendar server.
CERTAIN WEB SERVICES: A few functions on the EMU web site require user login (e.g. access to the web directory from off-campus). These rely on LDAP, which will be unavailable.

Concurrent with this upgrade, technicians will be upgrading Linux servers. Each server will be down for a short time, and the entire upgrade process is expected to take less than two hours, between 8:00am and 10:00am.

All other network services should remain operational (e.g. Internet access for browsing and IM).
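
For anyone who wants to confirm when LDAP is answering again, here is a minimal sketch of a reachability check. It only tests whether the standard LDAP port (389) accepts a TCP connection. It does not attempt a login, so a True result means the service is listening, not that authentication works. The function name and the host name in the usage note are placeholders, not real EMU names.

```python
import socket


def ldap_port_open(host: str, port: int = 389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and connects; any failure
        # (refused, timed out, name not found) raises an OSError subclass.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling ldap_port_open("ldap.example.edu") with the real server name in place of the placeholder returns False during the outage and True once the server is accepting connections again.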

During the eDirectory upgrade, several items were noted. They are as follows:

On server FS:
Several schema changes produced errors. The attributes involved were SASSecretStore:Key, SASSecretStore:Data, PKIStore:keys and NDSPKI:Keystore. According to Novell’s knowledgebase, these could be errors in the install of 8.7.1, where the attributes should be marked as hidden but are not. Novell states that this error can be ignored and processing continued (TID 1008666), and that Novell can dial in to correct the problem.
There were also two further errors: “PKI Install encountered an error -641” and “NMAS object could not be installed – error -641”. After the installation I checked and things seemed to be installed correctly. I’m hoping these errors were caused by the “hidden field” problem described above.

On server LD:
A schema error occurred with houseIdentifier. Again according to Novell’s knowledgebase, this error means the schema wasn’t updated correctly with the schema enhancements of DSREPAIR. We know this not to be the case, so after everything was completed on all servers I issued a command for LD to obtain a new schema from the tree. There were also errors during the upgrade where several files were not replaced because newer ones were already on the server. These files are as follows: JCERT.JAR, JNET.JAR, JSSE.JAR

On server FSAPPS:
“An error occurred while installing product LDAP – error -254”. This occurred during the eDirectory upgrade. The server also crashed while it was trying to reboot (during the upgrade). This required me to power cycle the machine to continue.

On server ST:
The server hung after the reboot – it didn’t show a console screen. I was able to spawn another console process and dismount all the volumes and power it off.

After all servers were upgraded, I ran DSREPAIR to check for timesync and synchronization errors. ST wouldn’t sync (error 625). After running an unattended repair it seemed to work fine.

Also noted: SOMETIMES when DSREPAIR was run (unattended full), an error would appear on the console screen that stated:

NLSLSP: main was unsuccessful
SERVER-5.00-205: Module NLS FLAIM Database Engine cannot be unloaded at this time.
Module NLSLSP.NLM is being referenced
You must unload NLSFLAIM.NLM before you can unload NLSLSP.NLM
You must unload NLSTRAP.NLM before you can unload NLSLSP.NLM

2-28-2004 12:04:45 pm: SERVER-5.0-1400
Error unloading killed loadable module

This occurred on most servers, but not every time. I couldn’t find anything regarding this from Novell, and all still seemed to be fine on the servers afterwards, so time will tell if we need to deal with this.

Short Novell Server Outage – faculty staff servers

The faculty/staff Novell servers need to be rebooted to fix a few things in preparation for the next upgrade.

FS was rebooted. While it was coming back up, it did a hard boot when I tried to switch away from the GUI screen. It did come back up and DSREPAIR ran.

FSAPPS was rebooted. When it came back up, volume APPS wouldn’t load without a VREPAIR. After that, DSREPAIR still would not run. I tried unloading DS and the server crashed. After power cycling, it came back up fine and DSREPAIR did run.

STUDENT COMPUTERS DISCONNECTED TO SAVE THE NETWORK!

NOTE: Lacking a better description, for OUR purposes we are calling this virus episode the EMU SATURDAY VIRUS. It is a member of or variant in the Agobot family of viruses.

Residence Hall wall jacks for the students listed below have been disconnected because it is strongly suspected these computers have a virus that is flooding the EMU network and firewall with thousands of invalid data packets.

These computers need to be fixed before their room connections are turned back on. Students need to get the CD and instructions from the HelpDesk in the Campus Center. The instructions are on a sheet of paper that accompanies the CD.

Carefully following the instructions for using the CD will fix the computer. The student must then call the HelpDesk to have the network connection re-enabled. If, for some reason, the student computer again begins to flood the network, Info Systems will disable the connection and notify the student to bring the computer to the HelpDesk for further work.

As the list below is updated, reconnected computers will be removed from it. Computers shown on this list are disconnected as of the update time at the top of the list.

Students whose computers remain disconnected as of Tuesday, 1/29, 08:45
———————————————————————–
———————————————————————–

STUDENTS: IF YOUR NAME AND/OR WALL JACK IS ON THE LIST ABOVE — DO NOT ATTEMPT TO MOVE YOUR COMPUTER TO A DIFFERENT WALL JACK. YOUR COMPUTER HAS BEEN IDENTIFIED AS ONE THAT IS TAKING DOWN THE NETWORK! IF YOU HAVE ALREADY MOVED IT — PLEASE UNPLUG IT IMMEDIATELY!!

Problems with Internet Connection – Again at 11:00am, Sat, 1/24

Problems similar to those that began around 11:00pm, Fri 1/23, seem to have returned.

Info Systems brought a network administrator to campus at noon and began debugging the problem. At about 1:15pm it was determined that the problem is caused by a number of computers in the Residence Halls flooding the network with data that the firewall recognizes as “problem connections”. The firewall started to deny these connections, but the volume was too great for it to handle. Its memory filled up and inconsistent network behavior developed in many areas.

At 1:15pm the network administrators shut down all network connectivity going into and out of the entire Residence Hall network segment. We will try to isolate the computers that are causing the problem to see if we can restore network connectivity for the rest of the Residence Hall connections. This may take several hours.

A number of odd characteristics have been observed on the network during the past week, beginning with the extreme slowdown last Monday evening (1/19) that lasted about 4 hours. Another observation that Info Systems made on Thursday and Friday was that a number of computers in the Residence Hall network segment were transmitting packets of unusual data that looked somewhat like a virus. One of these computers was brought into the HelpDesk area and examined by technicians to see if the virus could be identified. A scan of that system revealed no identifiable virus, but an unusual fileset was found. Tech support submitted a notification to Sophos, asking them to comment on the findings. As of 1:00pm today (1/24) no response has been received from Sophos.

As of 2:00pm (1/24) we believe that a number of computers on the Residence Hall LAN segment have something on them that is sending lots of “bad data” to the network and choking the firewall.

By 4:00pm the problem computers were identified and their connections to the network were disabled. Info Systems will work with these students on Mon, 1/26, to find a solution to clean their computers and re-enable their network connections.
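
This post does not record how the problem computers were picked out. One common approach, sketched below purely as an illustration, is to count denied connections per source address and flag any source far above the rest. The function name, the threshold and the sample addresses are all made up for the example; real input would come from whatever the firewall logs.

```python
from collections import Counter


def find_flooders(denied_sources, threshold):
    """Return (source, count) pairs whose count exceeds threshold, busiest first."""
    counts = Counter(denied_sources)
    # most_common() yields sources ordered from highest count to lowest.
    return [(src, n) for src, n in counts.most_common() if n > threshold]


# Made-up sample: one entry per denied connection, keyed by source address.
sample = (["10.0.5.12"] * 5000 + ["10.0.5.40"] * 3200
          + ["10.0.7.8"] * 15 + ["10.0.9.3"] * 4)

for src, n in find_flooders(sample, threshold=1000):
    print(f"{src}: {n} denied connections")
```

With the sample data, only the two addresses with thousands of denied connections are reported. The two hosts with a handful of denials fall under the threshold and are ignored.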

Extreme Network Slowdown for 1 to 2 Hours, Sat, 1/24

About 9:15am, Sat, 1/24, Info Systems became aware that the EMU Internet connection was very slow, or non-existent. There were other anomalies such as problems accessing DNS servers, web server and ftp servers. Ping commands worked for some hosts and not others. No web pages could be accessed from off campus and accessing them from on-campus computers was very slow.

Diagnostic procedures pointed to a possible problem with the EMU firewall. The device was rebooted about 10:10am and all network services appeared to begin to function normally.

Internet Connection Reconfiguration

A new type of Internet connection will become available to EMU that will provide more flexibility in meeting future Internet connectivity needs. This new service will require some downtime for the Internet connection to EMU.

An outside consultant will assist with this operation. Coordination will be required with our current Internet provider, nTelos.

We anticipate that about an hour of interruption will be required. However, there may be lingering effects lasting up to several hours until the upstream DNS servers learn of this new type of connection. No email messages should be lost, only delayed.

Systems affected include: Internet connection, email, and off-campus access to on-campus web systems (e.g. Blackboard, Campus Web, WebMail). On-campus access to these web systems will NOT be affected.

Internet Connection is Slow or Non-Existent

About 4:35pm today (Mon, 1/19) EMU began experiencing slow Internet response times. The situation continues to worsen.

Info Systems is aware of the problem and is attempting to identify the cause. There is no projected time for resolution of the problem.

At the present time you may or may not be able to access off-campus web pages.
————————-

Investigation revealed an unusually high number of TCP packets being both sent to and received from the Internet connection (i.e. nearly 3X the normal rate both in and out). This information was obtained from discussions with an ISP engineer. Total bandwidth, however, was within a reasonable range. The only way to debug further was with advanced diagnostic equipment, which was not easily available at the time.
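
A quick bit of arithmetic shows what those two observations imply together: if the packet rate roughly triples while total bandwidth stays about the same, the average packet must be roughly one third of its usual size. The sketch below works that through. The throughput and packet-rate figures are hypothetical, chosen only to make the ratio visible, and it assumes bandwidth was unchanged, which is a simplification of "within a reasonable range".

```python
def average_packet_size(bytes_per_second: float, packets_per_second: float) -> float:
    """Average bytes per packet for a given throughput and packet rate."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return bytes_per_second / packets_per_second


# Hypothetical figures: same throughput, packet rate tripled.
normal = average_packet_size(150_000, 300)
incident = average_packet_size(150_000, 900)

print(f"normal:   {normal:.0f} bytes/packet")
print(f"incident: {incident:.0f} bytes/packet")
```

A flood of many small packets fits this pattern, though the figures alone cannot say what the packets were.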

About 9pm performance returned to near normal conditions without explanation.