Maintenance: Switch Restart


othelloRob
13th January 2009, 01:27
Due to an SNMP error being reported by the primary Management Card in one of our Cisco backbone switches, we will be removing the card from the unit and replacing it with a new one from our onsite spares.

We hope to determine from this if there is a fault in the specific piece of hardware, which will go in our test setup, or a more general issue with the switch IOS, which we will escalate to the manufacturers.

Installing a new Management Card will restart the switch, after which it has to re-learn the MAC addresses (unique network card identifiers) of all the connected devices.

This means that for approximately 4 minutes at 1am, some servers within our network will be unable to reach the outside world, until the switches relearn which port each device is connected to.

This will affect some clients in the 80.82.x.x and 80.76.x.x ranges...

Plesk (legacy) Shared Hosting
cPanel Shared Hosting
Virtuozzo (legacy) VPS
Some Colo and Dedicated Servers
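
For anyone wanting to check a server address against the ranges above, here is a minimal sketch. It assumes "80.82.x.x" and "80.76.x.x" mean the full /16 networks, and the function name is only illustrative. Since only some clients in those ranges are affected, a match means "possibly affected", not "definitely affected".

```python
import ipaddress

# Assumption: the announcement's "80.82.x.x" and "80.76.x.x" are read
# as the full /16 networks below.
POSSIBLY_AFFECTED = [
    ipaddress.ip_network("80.82.0.0/16"),
    ipaddress.ip_network("80.76.0.0/16"),
]


def may_be_affected(address: str) -> bool:
    """Return True if the IPv4 address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in POSSIBLY_AFFECTED)


# Example (addresses chosen only for illustration):
# may_be_affected("80.82.10.5")  -> inside 80.82.0.0/16
# may_be_affected("192.0.2.1")   -> outside both ranges
```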

Other IP ranges are not affected, so Downstream Transit clients, Xen VPS, HSphere/HELM/DirectAdmin Shared Hosting, VoIP systems and the majority of cPanel Reseller services will see no interruption.

Although the total reload time on the Cisco switches is ~7 minutes, services start to return after the initial self-tests (90 seconds). Any active incoming connections will drop and then re-establish. HTTP, FTP, SMTP/POP3/IMAP are all "self-repairing" in that respect, but SSH and some other protocols may need to be manually reconnected from the client end.
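
For sessions that need a manual reconnect, a client could wait for the port to answer again before retrying. The sketch below is not part of our tooling; the function name, timeouts and example host are assumptions chosen for illustration. It simply retries a TCP connection until it succeeds or an overall deadline passes.

```python
import socket
import time


def wait_for_port(host, port, total_timeout=600.0, interval=5.0, connect_timeout=3.0):
    """Retry a TCP connect until it succeeds or total_timeout seconds elapse.

    Returns True once a connection is accepted, False if the deadline passes.
    All default values are illustrative assumptions.
    """
    deadline = time.monotonic() + total_timeout
    while True:
        try:
            # A successful connect means the service is reachable again.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            pass  # refused, timed out or unreachable: try again later
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)


# Example (hypothetical host): wait up to 10 minutes for SSH to come back.
# if wait_for_port("server.example.org", 22):
#     print("port 22 is answering again, reconnect now")
```

The default deadline of 10 minutes is picked only to sit comfortably above the ~7 minute reload time quoted above.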

We are sorry for the very short notice, but the work is essential: without SNMP polling/checking working to the switch fabric, we are unable to monitor the state of servers and services for our alerting systems. A short outage for hardware replacement is preferable to being unable to automatically page the on-call engineers about service-related issues.

Rob

Roger
13th January 2009, 13:18
Thanks for the info and I appreciate the logic in what you are doing. Do I take it that you posted only a few minutes before the switch restart, so this is all water under the bridge? Or is this something that will happen overnight tonight (ie at 01:00 on Wednesday 14 January)?

othelloRob
13th January 2009, 13:27
It was done between 1.00am and 1.04am today (Tuesday 13 January). It takes a few minutes to undo all the screws and slot the card out, being careful not to snag the cables.

Switch restart was logged at 1.06am and everything was confirmed as online and accessible from our external monitoring systems at 1.15am.