Norfolk Southern system-wide issue cancels VRE trains

Norfolk Southern system-wide issue cancels VRE trains today. No word on what the “system-wide issue” is.

Heard that NS is having ‘computer issues’. Which computers and where are unknown.

I recall that about 20 years ago CSX was hit by a computer virus that ended up crashing the CADS computers; it took about 72 hours to resolve all the issues and get back to normal.

I am unable to hit https://www.nscorp.com/

https://nscorp.mediaroom.com/2023-08-28-System-outage-update

ATLANTA, Aug. 28, 2023 /PRNewswire/ – Norfolk Southern Corporation (NYSE: NSC) provided an update Monday on a technology outage that impacted rail operations:

This morning, Norfolk Southern experienced a hardware-related technology outage that impacted rail operations. At this time, we have no indication that this was a cybersecurity incident. Our teams worked throughout the day and successfully restored all systems at 7:00 p.m. ET. We are safely bringing our rail network back online. Throughout this, we have been in contact with our customers and will work with them on updated timing for their shipments. We expect the impact to our operations to last at least a couple of weeks.

According to this source it’s a PTC system problem.

https://railfan.com/norfolk-southern-snarled-by-positive-train-control-outage/

More specifically, Railway Age said the problem was NS’s access to the PTC network, perhaps a failure on NS’s own side, since the reported computer outages seem to have happened at about the same time. NS said it was fixed by 7 p.m. yesterday (the website linked in the previous post was fully live by 10:30 p.m. Central at my location).

The ‘several weeks’ refers to the time needed for the delays and problems from the outage to ‘work themselves out’, not to additional repair or programming time.

Unintended consequences of putting an untested/untried system in place back at the beginning of PTC? Next??

(Insert Chad Thomas popcorn emoji here.)

I don’t know how NS performed their PTC installation.

CSX installed theirs incrementally, a few subdivisions at a time, with a big ‘Help Desk’ cadre to respond to issues as they came up; those issues were resolved before installation began on the next group of subdivisions. Many Dispatcher Desks operate territories that have both PTC and non-PTC subdivisions.

I find it ironic that NS’s spokesperson is named Conner Spielmaker. Making his spiel heard around the industry.

I find it interesting that NS says they expect full recovery in the coming weeks (!). Granted, it was evidently a system-wide outage, apparently for half a day, but the repercussions, given their time estimate, hint loudly at very poor recovery ability. And a bad choice of “leading tech provider”. You would think a rail system would have some backup capability rather than (presumably) putting everything on hold. Just more of the recent shutdown-mentality malaise that afflicts a floundering republic.

Perhaps their system got hacked…

Apparently that’s happened to a couple of area hospitals lately.

One has to understand how dependent companies are today on computers and their applications for EVERY aspect of their operation, at all levels from the field to Board Room decisions.

Computers have been designed to replace people, the job functions they once performed, and the manual paper systems they worked with. When the computer/application comes in, the people and their manual systems go, never to be replaced.

[quote user=“tree68”]

Perhaps their system got hacked…

Apparently that’s happened to a couple of area hospitals lately.

[/quote]

NS put out a statement saying an IT vendor’s system had a problem and it affected NS.

I’ve heard that UP had some issues at about the same time. It must not have been too bad; I’ve been on vacation, but haven’t heard much discussion of any problems from coworkers.

Jeff

The stuff that has been hitting hospitals has been ransomware attacks: pay us and get your data back. However, some have been hacked and had various forms of data stolen. Locally, Johns Hopkins has been hacked, fortunately just BEFORE I got referred into their system. My data is on too many medical information systems!

Now we are learning it was a software problem. Evidently, when a piece of hardware was installed, the existing software crashed. I can understand why NS thought the newly installed hardware was the cause. So it appears that the software did not like the new piece of hardware? That hardware probably passed all its tests before being put into service, and may have worked fine in other installations? The full explanation may take months??

Read all about it in the computer geek pubs.

Way back in the mid-1970s I worked in the TOPS (Computer) Control Center OC) for SP. On one swing shift, our system (computer) terminal monitor printer reported a station with an error code of “SNO”.

This code was not in our documentation. SP’s Communication Data Control team had no alarms or lights indicating a problem. Calling the yard with the terminal (an IBM 1050) didn’t turn up a problem. We sent a test message to the machine and it went through. Everything was working and there was nothing we could do, so we shrugged our shoulders, logged the event in our daily log, and finished our shift.

I came to work the next day to be greeted by two of SP’s System Programmers. They explained that while writing the program, they had identified all the conditions that could happen and assigned a code to each one. Then, just in case something COULD happen that they had not assigned a code to, they assigned “SNO”, for Should Not Occur.

Fortunately, the detail logs kept by the TOPS system -did- list more detail and the programmers were able to add the situation, and an additional code, to cover future events.
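The “Should Not Occur” trick is a catch-all default in an error-code lookup. Here is a minimal Python sketch of the idea; only “SNO” comes from the story, and the other condition names and codes are hypothetical placeholders, not anything from SP’s actual TOPS program.

```python
# Hypothetical condition-to-code table; only "SNO" comes from the story above.
KNOWN_CODES = {
    "line_down": "LDN",
    "terminal_busy": "BSY",
    "message_garbled": "GRB",
}

# Catch-all code for any condition the programmers did not anticipate.
SHOULD_NOT_OCCUR = "SNO"


def error_code(condition: str) -> str:
    """Return the code for a known condition, or SNO for anything unanticipated."""
    return KNOWN_CODES.get(condition, SHOULD_NOT_OCCUR)


def add_condition(condition: str, code: str) -> None:
    """Register a newly discovered condition, as the programmers did after the SNO report."""
    KNOWN_CODES[condition] = code
```

For example, `error_code("line_down")` gives `"LDN"`, while an unforeseen condition falls through to `"SNO"` until someone calls `add_condition` for it, which is roughly what the programmers did once the detail logs showed what had happened.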

Back ‘in the olden days’ B&O installed an IBM 370 computer to operate Chicago Terminal. The computer was connected to a number of ACI scanners that read the roughly three-foot-high reflective bar codes applied to cars in the 1970s as the AAR’s first attempt at Automatic Car Identification. Around 1978 or 79 that ‘experiment’ was stopped, because the bar codes would get too dirty to read, or get burned off the cars in thaw sheds or by loads of hot steel and similar high-temperature cargo. After the AAR ended the ACI requirements, the computer was patched to remove the ACI scanners.

About 1984 or 85, Chicago personnel undertook a routine maintenance shutdo

When I worked in an Army installation data processing center, our resident IBM tech told of a time he went on vacation.

It seems that IBM would put several functional sections on the cards for their computers. In slot one, section A would get used. In slot two, section B got used, etc.

It was not uncommon to put a card with a non-functional section B in slot one, as that slot did not use section B. And a given slot may use several sections.

Multiply that times numerous cards and you may see where I’m going with this.

The regular tech knew which cards had which section non-functional, so he would not put a card with a bad section A in slot one, etc.

The replacement tech didn’t know which cards had which sections OOS.

When the regular tech returned, it took him a while to sort everything out and get the subject computer back on line…
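The regular tech’s mental bookkeeping boils down to one rule: a card can go in a slot as long as none of its dead sections are ones that slot uses. A minimal Python sketch of that rule, assuming a made-up slot layout and card inventory (the post doesn’t give the real IBM assignments):

```python
# Sections each slot actually uses (hypothetical layout).
SLOT_USES = {
    1: {"A"},
    2: {"B"},
    3: {"A", "C"},
}

# Sections known to be non-functional on each card (hypothetical inventory).
CARD_BAD_SECTIONS = {
    "card-1": {"B"},
    "card-2": {"A"},
    "card-3": set(),
}


def card_fits(card: str, slot: int) -> bool:
    """A card is usable in a slot if none of its bad sections are used by that slot."""
    return CARD_BAD_SECTIONS[card].isdisjoint(SLOT_USES[slot])


def valid_slots(card: str) -> list[int]:
    """List every slot the card can safely occupy, in slot order."""
    return [slot for slot in sorted(SLOT_USES) if card_fits(card, slot)]
```

With this layout, `valid_slots("card-1")` is `[1, 3]` and `valid_slots("card-2")` is `[2]`. The replacement tech effectively had no `CARD_BAD_SECTIONS` table, so any swap could land a card in a slot that needed its dead section.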

During a routine maintenance procedure performed by a vendor, the vendor’s software created an error that caused the company’s primary and recovery data-storage systems to become unresponsive, which then affected its core operational systems. Norfolk Southern didn’t name the vendor but described it as a “leading global technology provider.”

https://www.wric.com/business/press-releases/cision/20230901PH98837/norfolk-southern-provides-technology-outage-update/

It was more than PTC, but remember that PTC sits on top of the heap. It talks to the train dispatching system, the signal system, the yard inventory system (consist data for the braking algorithm), crew call (sign-in for train crews), and more. If any one of these goes dark, PTC goes dark.
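“Any one of these goes dark, PTC goes dark” is a logical AND over PTC’s upstream systems. A minimal Python sketch, assuming a simple up/down status per system named in the post; the status map is hypothetical, and a real PTC back office is far more involved than one boolean per system.

```python
# Upstream systems named in the post; how their status is reported is hypothetical.
PTC_DEPENDENCIES = (
    "train_dispatching",
    "signal_system",
    "yard_inventory",  # consist data for the braking algorithm
    "crew_call",       # crew sign-in
)


def ptc_available(status: dict[str, bool]) -> bool:
    """PTC is up only if every dependency reports up; a missing entry counts as down."""
    return all(status.get(system, False) for system in PTC_DEPENDENCIES)


def dark_dependencies(status: dict[str, bool]) -> list[str]:
    """Name the dependencies that are down or not reporting."""
    return [system for system in PTC_DEPENDENCIES if not status.get(system, False)]
```

Marking only `crew_call` as down is enough to make `ptc_available` return `False`, which is the point of the post: PTC’s availability can be no better than its weakest upstream system.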

Why things didn’t fail over to backups is a good question.
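Part of the answer is in the NS update quoted above: the vendor’s error made the primary and the recovery storage unresponsive at the same time. Ordinary failover only helps when the backup doesn’t share the primary’s fault. A minimal Python sketch of plain in-order failover, with hypothetical function names and stand-in storage readers (nothing here reflects NS’s or the vendor’s actual systems):

```python
from typing import Callable


class StorageUnavailable(Exception):
    """Raised when a storage system does not respond."""


def read_with_failover(
    key: str, systems: list[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, reader) pair in order, primary first; return the first answer."""
    for name, reader in systems:
        try:
            return name, reader(key)
        except StorageUnavailable:
            continue  # this system is unresponsive; fall over to the next one
    raise StorageUnavailable("primary and recovery systems are all unresponsive")


def healthy(key: str) -> str:
    """Stand-in for a storage system that answers normally."""
    return f"record for {key}"


def hung(key: str) -> str:
    """Stand-in for a storage system knocked out by the same fault."""
    raise StorageUnavailable(key)
```

With `[("primary", hung), ("recovery", healthy)]` the read quietly succeeds from recovery, but with both entries `hung`, a common-mode failure like the one described, there is nothing left to fail over to and the call raises.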

When computerized applications are implemented in the areas you have mentioned, there are no longer enough knowledgeable people available, or paper systems in place, to back up a failed computer and its applications. In today’s business/railroad world, when the computer stops, so does everything else.