othelloRob
13th January 2009, 01:27
Due to an SNMP error being reported by the primary Management Card in one of our Cisco backbone switches, we will be removing the card from the unit and replacing it with a new one from our onsite spares.
We hope to determine from this whether there is a fault in the specific piece of hardware, which will go into our test setup, or a more general issue with the switch IOS, which we will escalate to the manufacturer.
Installing a new Management Card will restart the switch so that it can re-learn the MAC addresses (unique network card numbers) of all the connected devices.
This means that for approximately 4 minutes at 1am, some servers within our network will be unable to see the outside world until the switches relearn which ports devices are connected to.
This will affect some clients in the 80.82.x.x and 80.76.x.x ranges...
Plesk (legacy) Shared Hosting
cPanel Shared Hosting
Virtuozzo (legacy) VPS
Some Colo and Dedicated Servers
Other IP ranges are not affected, so Downstream Transit clients, Xen VPS, HSphere/HELM/DirectAdmin Shared Hosting, VoIP systems and the majority of cPanel Reseller services will carry on as normal.
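If you want a quick way to check whether one of your addresses falls inside the ranges mentioned above, something like the following rough sketch would do. It assumes that "80.82.x.x" and "80.76.x.x" mean the full /16 blocks, and the function name is made up for illustration. Bear in mind that only some clients in those ranges are affected, so a match means "possibly affected", not "definitely affected".

```python
import ipaddress

# Assumption: the "x.x" notation in the notice is read as two /16 blocks.
POSSIBLY_AFFECTED_RANGES = [
    ipaddress.ip_network("80.82.0.0/16"),
    ipaddress.ip_network("80.76.0.0/16"),
]


def may_be_affected(address: str) -> bool:
    """Return True if the address lies inside one of the listed ranges.

    Hypothetical helper: a True result only means the address is in a range
    where some clients are affected, not that this server will drop.
    """
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in POSSIBLY_AFFECTED_RANGES)


if __name__ == "__main__":
    for candidate in ("80.82.10.20", "80.76.200.1", "192.0.2.15"):
        print(candidate, may_be_affected(candidate))
```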
Although the total reload time on the Cisco switches is ~7 minutes, services start to return after the initial self-tests (90 seconds). Any active incoming connections will drop and then re-establish. HTTP, FTP and SMTP/POP3/IMAP are all "self-repairing" in that respect, but SSH and some other protocols may need to be reconnected manually from the client end.
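For anyone who would rather script the wait than keep retrying by hand, here is a minimal sketch of a client-side check that polls a TCP port until it answers again. The function name, the 5 second polling interval and the 3 second connect timeout are illustrative assumptions; the 420 second default is simply the ~7 minute reload time quoted above.

```python
import socket
import time


def wait_for_port(host, port, timeout_total=420.0, interval=5.0):
    """Poll a TCP port until it accepts a connection.

    Returns True as soon as a connection succeeds, or False once
    timeout_total seconds have passed without success.
    Hypothetical helper; the defaults are assumptions, not measured values.
    """
    deadline = time.monotonic() + timeout_total
    while True:
        try:
            # Open and immediately close a connection just to test reachability.
            with socket.create_connection((host, port), timeout=3.0):
                return True
        except OSError:
            pass  # Still unreachable or refused; retry until the deadline.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Once `wait_for_port("your.server.example", 22)` returns True (the hostname is a placeholder), the SSH session can be reconnected from the client end.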
We are sorry for the very short notice, but the work is essential: without SNMP polling/checking working to the switch fabric, we are unable to monitor the state of servers and services for our alerting systems. A short outage for hardware replacement is preferable to not being able to automatically page the on-call engineers about service-related issues.
Rob