The JOURNAL
Web Exclusive
Troubleshoot Your Industrial Network
Learn how problems with your cabling infrastructure, electrical signaling and network can impact your industrial Ethernet.
By Fluke Networks
Ethernet is rapidly spreading across a range of industrial environments. This standardized, plug-and-play technology is easy to install and works 99% of the time. But when it doesn't work, it's not so easy to understand why. With more devices being added to the network and fewer people on the floor, most production and maintenance engineers are under a lot of pressure — especially since all plants want to run faster with minimal downtime.
A diagram of the parts of the network and the signal flow paths between various devices can help you visualize the problem and save valuable troubleshooting time.
Causes of LAN Problems
Because industrial Ethernet is the result of the trend in equipment moving from analog to digital signaling, network communication now has three elements:
- The Physical Layer: This is the cable infrastructure that transmits signals. It can be copper, fiber or wireless.
- The Network Layer: This is the Ethernet and Internet Protocol (IP) that controls the data communication.
- Electrical Signaling: Digital signals from equipment pass through switches and Virtual LANs (VLAN) to convey the data.
To establish error-free data communication, troubleshooting must establish that the cabling is good, digital signaling is consistent, and the communication protocol for the equipment is operating correctly.
Best Practices for Successful LAN Troubleshooting
Troubleshooting downtime issues can be time-intensive and costly. Following a standardized troubleshooting procedure will help save you countless hours. The basic troubleshooting sequence consists of these steps:
- Document your network. A diagram of the parts of the network and signal flow paths between various devices can help you visualize the problem and save valuable time. Theory and practical experience will determine which test to perform first. Consider experience with the same or similar equipment and related symptoms, and the probability of what's likely to fail or what suffers from repeated failures. A good practice is to start with the physical layer and work your way up the protocol stack.
- Collect all available information and analyze the symptoms. Ask yourself if you understand the symptoms, and verify or recreate the reported problem if possible.
- Localize and isolate the problems. It's best to subdivide or isolate the problems into a smaller functional section — removing the largest convenient section first. As you go step-by-step, eliminate culprits.
- If you can ping or simulate signals and get a response, you know there's power.
- If the device is properly connected, then you can proceed to the control or management device.
- Look at the programmable logic controller (PLC) for error codes. You might discover, for example, that it received several bad packets of information that caused it to turn off a particular port to protect the rest of the network.
Start at any convenient spot near the center point of the problem and divide the problem in half. Continue halving the problem until you've isolated it to the smallest possible section.
- Document what you did. Documentation can be used for future reference to help you troubleshoot the same or similar problems. You also can use the documentation to prepare reports on common network problems for management and/or departments, or to train new hires or members of the network-support team.
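The halving step described above is essentially a binary search along the signal path. The following is a minimal sketch, not a prescribed procedure: the segment names and the `path_ok` test (which stands in for a ping, cable test or signal check) are hypothetical, and it assumes a single fault that breaks everything downstream of it.

```python
from typing import Callable, List, Sequence


def isolate_fault(segments: Sequence[str], path_ok: Callable[[int], bool]) -> str:
    """Return the first faulty segment by repeatedly halving the path.

    path_ok(i) is a hypothetical test that reports whether the path from
    the start through segments[i] is healthy. Assumes exactly one fault,
    so every test at or beyond the fault fails.
    """
    lo, hi = 0, len(segments) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if path_ok(mid):
            lo = mid + 1  # everything up to mid works; fault is downstream
        else:
            hi = mid      # fault is at mid or upstream of it
    return segments[lo]


# Illustrative path from controller to sensor (names are made up).
segments = ["PLC", "switch A", "patch panel", "switch B", "junction box", "sensor"]
fault_index = 3          # pretend "switch B" has failed
tests_run: List[int] = []


def path_ok(i: int) -> bool:
    tests_run.append(i)  # record each test so the effort is visible
    return i < fault_index


suspect = isolate_fault(segments, path_ok)
print(suspect, "found after", len(tests_run), "tests")
```

With six segments, three tests are enough here, where checking each segment in turn could take up to six.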
With this basic regimen, let’s look at the specific troubleshooting issues you’re likely to encounter in each of three major areas of the network, beginning with cabling infrastructure.
Troubleshooting Cabling Issues
Poor cable terminations or cable faults: If a cable is suddenly cut, the system might simply stop. Or, with faulty cross-connection through a junction, for example, you might start getting CRC or FCS errors. These types of errors indicate incomplete signals being transmitted. The PLC has detected corruption in a frame that contains critical information and throws the frame away.
Now the PLC is still waiting for that information — glue temperature for the bottle labels, for example — so it asks again. If the re-transmission isn’t successful because of the faulty cable, either the system aborts, or the line produces bottles without labels because the glue wasn’t the correct temperature to adhere.
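To illustrate how a receiver detects a corrupted frame, here is a simplified sketch of a frame check sequence (FCS). It uses Python's `zlib.crc32`, which implements the same CRC-32 polynomial as Ethernet. The frame layout, the helper names and the payload are illustrative assumptions, not a full Ethernet frame.

```python
import zlib


def append_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload (simplified frame)."""
    fcs = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + fcs.to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored FCS."""
    payload, stored = frame[:-4], frame[-4:]
    return (zlib.crc32(payload) & 0xFFFFFFFF) == int.from_bytes(stored, "little")


# Hypothetical payload: a glue-temperature reading.
frame = append_fcs(b"glue_temp=182C")

# Simulate a faulty cable flipping a single bit in transit.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]

print(fcs_ok(frame))      # intact frame passes the check
print(fcs_ok(corrupted))  # damaged frame fails and would be discarded
```

A receiver that sees the check fail discards the frame, which is why a bad termination shows up as rising CRC or FCS error counts rather than as visibly wrong data.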
Insufficient cable rating: In a heavy industrial environment such as mining, for example, cable may be at risk from temperature extremes, water or being run over. Without sufficient protection, the insulation might soak through, be cut or short circuit. Standard office Ethernet cabling doesn't stand a chance under these conditions.
Make sure both cabling and connectors have appropriate IP ratings to withstand the particular hazards of your environment. In addition, use protective cabling in any areas with repetitive motion or where a link might be stepped on or crushed.
Many cable manufacturers offer cabling with boots and/or advanced engineered-material jackets designed specifically for harsh environments.
Issue with connectors: Both the cable and all network components on a manufacturing floor must be different from those used in office environments and data centers. In areas with extreme temperature swings, high humidity or chemical or high-pressure washes, consider robust connectors and components such as the M12 connector.
The M12 is a popular 4-pin connector with a rubberized boot in diameters ranging from 6.0 mm to 6.7 mm. It has been used in industrial and production environments for decades to transmit power and signals to optical sensors. In newer installations, it's also being used for bidirectional communications such as Cat 5e (rated to 100 MHz) Ethernet links. If you're using RJ-45, the IP67 rating is reached by using sealed connectors.
Consider any external machinery noise issues when deciding between UTP (unshielded), STP (shielded) or fiber-optic components.
Contaminated fiber optic cable end faces: Microscopic fiber-optic cable is particularly susceptible to contaminants clogging the ends. When light can't pass through, the signal isn't fully transmitted, and communication breaks down.
One international transportation system, for example, discovered that fine dust from routine wear on the rubber train tires was clogging the fiber optic communication cable. Trains weren't reliably receiving information about whether or not another was on the track ahead; this activated safety features that automatically stopped the trains. Unhappy commuters were kept waiting because of dust.
Troubleshooting Signaling Issues
Once you've eliminated cabling problems, you can move to the next major area of concern: electrical signaling issues. Here are some of the more troublesome problems.
Device installation errors: A common and difficult-to-troubleshoot issue is misconfigured devices. The original installer must select the connection speed between switch and sensor. They also can choose full- or half-duplex communication. But what if the sensor is configured for half duplex, and the switch on the other end is set to full duplex in error?
If the switch has something to say, it says it. Now what happens when the poor sensor tries to talk? "I've detected the speed of liquid." While the sensor transmits, the switch transmits at the same time. The sensor reacts: "Uh oh — somebody else transmitted. I need to stop, wait a moment and listen again." Now that the switch has stopped talking, the sensor re-sends the message, and again, coincidentally, the switch transmits at the same time.
Duplex mismatch error is very difficult to detect. When you troubleshoot, you'll find that the cabling is fine. Both switch and sensor are connected. There is little traffic. The only clue is that delay or latency will be extremely variable. As a result, the network appears "slow." It's helpful to have a tester that can tell you the configuration of each device so you can find a mismatch like this.
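If you can read the configured speed and duplex setting at both ends of a link (from a tester or from the devices themselves), comparing them is straightforward. The sketch below assumes that information has already been collected; the `PortConfig` structure and the device names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PortConfig:
    device: str
    speed_mbps: int
    duplex: str  # "full" or "half"


def link_mismatches(a: PortConfig, b: PortConfig) -> List[str]:
    """Return a description of every setting that differs between link ends."""
    problems = []
    if a.speed_mbps != b.speed_mbps:
        problems.append(
            f"speed mismatch: {a.device}={a.speed_mbps} Mbps, "
            f"{b.device}={b.speed_mbps} Mbps"
        )
    if a.duplex != b.duplex:
        problems.append(
            f"duplex mismatch: {a.device}={a.duplex}, {b.device}={b.duplex}"
        )
    return problems


# Illustrative link: the switch was set to full duplex in error.
switch = PortConfig("switch port 7", 100, "full")
sensor = PortConfig("flow sensor", 100, "half")

problems = link_mismatches(switch, sensor)
for p in problems:
    print(p)
```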
Device failures caused by a harsh electrical environment: Devices (Ethernet or proprietary) connected in an industrial network translate control-system variables and commands into digital information. These electrical signals in the form of digital ones and zeros are then transmitted across copper cables, network switches and repeaters to and from the control room. These electrical signals aren't impervious to the harsh electrical and climatic environment found in industrial plants and factories.
Several factors can influence or degrade electrical signaling, affecting network performance. High-energy sources such as lighting systems and heavy-duty machinery create electrical interference. Noise, static discharge and transients can cause network devices to reset or fail.
Monitoring the network one segment at a time can capture these events and help you isolate their source.
Improper cable installation: Issues such as excessive cable lengths, using the incorrect grade of cabling, loose or corroded connections, or simply too many or incorrect network terminations can attenuate and distort electrical signals. This causes excessive retransmissions or interruptions. Other cable installation issues that affect electrical signals include:
- Improper ground connections.
- Tight bend radius.
- Worn or broken cable shielding or connectors.
Heavy equipment operation: Operating near equipment that radiates electromagnetic waves, such as motors, induction furnaces and lighting systems, can distort and interrupt the electrical signaling. Any device or segment in an automation or process control system with electrical signal quality issues can affect overall performance of the Ethernet network.
High-voltage electrical loads: Transients, surges and harmonics are the most common electrical phenomena found on high-voltage electrical feeder and branch circuits with breakers and nonlinear loads. These and other phenomena, such as static discharge from rotating machinery, can cause disturbances to electrical signals. Ground-loop currents also are notorious for creating erroneous equipment failures that are hard to diagnose.
With attention to proper installation practices, minimizing sources of electrical disturbances and using appropriately IP-rated devices, you can mitigate many of these problems.
Troubleshooting Tips for Electrical Signal Integrity
As with all three areas of the network, follow a standard troubleshooting procedure when tracking down signaling problems.
- Always baseline network signaling by capturing waveforms and measurements at commissioning. Use these as a reference point, looking for any significant changes.
- Start at one side of the trunk and make measurements along the network cable at every junction or connection point.
- Document key parameters at each point.
- Look for significant changes.
- The further away from the power source, the more attenuation you should expect to see.
- Isolate sections one at a time, testing to see if signaling returns or improves.
- Isolate suspicious devices by removing them one at a time, testing to see if signaling returns or improves.
- Look for these factors:
- Changing DC offsets or waveform fluctuations, indicating grounding problems.
- Excessive high impulses or transients, evidence of HV equipment radiating energy onto the cabling.
- Correlations between changes and external events like equipment turning on or off.
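The baseline comparison in the tips above can be sketched as a simple tolerance check. The parameter names, the readings, the 10% relative tolerance and the 0.02 V absolute floor are all illustrative assumptions; use limits that suit your own equipment.

```python
from typing import Dict, List, Tuple

Readings = Dict[str, Dict[str, float]]  # measurement point -> parameter -> value


def significant_changes(
    baseline: Readings,
    current: Readings,
    rel_tol: float = 0.10,    # assumed: flag changes larger than 10% of baseline
    abs_floor: float = 0.02,  # assumed: ignore changes smaller than this
) -> List[Tuple[str, str, float, float]]:
    """List (point, parameter, baseline, current) for every significant change."""
    flagged = []
    for point, params in baseline.items():
        for name, ref in params.items():
            now = current[point][name]
            limit = max(rel_tol * abs(ref), abs_floor)
            if abs(now - ref) > limit:
                flagged.append((point, name, ref, now))
    return flagged


# Hypothetical readings captured at commissioning and again today.
baseline = {
    "junction 1": {"amplitude_v": 2.0, "dc_offset_v": 0.05},
    "junction 2": {"amplitude_v": 1.9, "dc_offset_v": 0.05},
}
current = {
    "junction 1": {"amplitude_v": 1.98, "dc_offset_v": 0.05},
    "junction 2": {"amplitude_v": 1.5, "dc_offset_v": 0.30},
}

changes = significant_changes(baseline, current)
for point, name, ref, now in changes:
    print(f"{point}: {name} changed from {ref} to {now}")
```

In this made-up data, only the second junction is flagged: its lower amplitude suggests extra attenuation, and its shifted DC offset points toward a grounding problem.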
The Open Systems Interconnection (OSI) model defines seven layers of network protocols and equipment communications and interoperation. You can perform basic tests on the first three layers to troubleshoot network issues.
Troubleshooting Network Issues
The Open Systems Interconnection (OSI) model, a technology standard maintained by the International Organization for Standardization (ISO), defines seven layers of network protocols and equipment communications and interoperation. For day-to-day testing of layers 1 through 3 (1 is Physical, 2 is Data Link, 3 is Network), you can perform a series of basic tests.
- First, test at the outlet where you connect your device. Is the device connected to a switch on the far end? Is the data switch open on the far end? If it's open, the tester will tell you at what length the cable is open. Either it's not connected or it's cut.
- If the device is connected to the switch, how is the switch configured? Is it set at 10 Mbps or 100 Mbps, half duplex or full duplex?
- Now measure utilization in the collision domain where it's attached.
Connect to the network and request an IP address to verify that the DHCP server is connected to the network. Ping tests confirm that the network is up and that devices and servers are properly connected.
Proper VLANs and multicasting segmentation: Before switch features became complex and varied, a LAN described all of the PCs and other devices connected to hubs, bridges and switches on one port of a router. The router defined the end of what was called a broadcast domain. Everything attached to that router port was within that broadcast domain, network or LAN (depending on which term was appropriate for the discussion). To change router ports was to change broadcast domains.
When switch features became more complex, it became possible to logically divide the switch into different broadcast domains. Instead of being physically separated by a different set of hubs, bridges and switches attached to another router port, broadcast domains can now be separated logically through the switch configuration. This logical separation is called a VLAN.
You might encounter cases of sensors in the network sending their information to multiple addresses instead of transmitting to the single, relevant address. A pressure sensor, for example, sends its measurements to bottling, labeling and capping, instead of sending only to the bottling machine. It broadcasts to all the machines: "here's the pressure." This causes unexpected network traffic.
Or if the bottling line is incorrectly segmented from regular network traffic, it may send bottling-machine pressure information at the same time as dozens of people in the corporate network are busily downloading a new network security fix, causing major ramifications on the production line. This could easily happen just by someone adding a port and connecting it wrong. But as a result of mixing the office and production "highways," you get unexpected variable traffic.
On the industrial side, traffic doesn't vary much, but there's much greater sensitivity to the delay or latency. When a sales manager is downloading his forecast file and it takes 20 seconds longer, he might not even notice. In the bottling line, the difference between 25 ms and 2,000 ms may result in bottles without labels.
Duplicate IP addresses and unexpected service behavior cause network havoc: These are problems that could adversely affect any LAN, but in an industrial network, the repercussions are more severe.
Each device operates with a unique IP address. IP addresses are used when end devices communicate point-to-point. Every PLC, switch and Ethernet-enabled sensor must have a unique IP address to operate normally. In an automation network, the communication between switch and PLC is restricted to only one area of control or task, called point-to-point, and each device is individually commanded to perform its task.
These IP addresses may be dynamically assigned (DHCP), automatically assigned a fixed address (DHCP or BOOTP) or manually assigned (static).
If a duplicate IP address appears — either because a mix of dynamic and static addressing was used, or the static address was accidentally duplicated — then either the commands for one specific PLC will be sent to both PLCs using the same IP address, or one PLC will operate normally for periods of time while the other fails to work at all. In either case, the affected PLCs won't be able to operate reliably. This is a difficult problem to identify from symptom analysis.
In addition, a DHCP server will attempt to allow a PLC to keep using the same IP address forever. However, if one of many situations arises on the network, the DHCP server will tell the PLC to begin using a different address. When this happens, it's possible that the controller managing the PLC won't know about the address change, and the PLC will cease to operate because the controller is suddenly unable to talk to it.
A good network tester will reduce troubleshooting of this sort of problem from hours to minutes.
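The duplicate-address check described above amounts to finding any IP address claimed by more than one hardware (MAC) address. This is a minimal sketch, assuming you already have a list of observed IP/MAC pairs from a tester or a switch's address table; the addresses shown are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_ips(observations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each IP address seen with more than one MAC to those MACs."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Hypothetical observations: two PLCs answer for 192.168.1.20.
observations = [
    ("192.168.1.10", "00:1D:9C:00:00:01"),
    ("192.168.1.20", "00:1D:9C:00:00:02"),
    ("192.168.1.20", "00:1D:9C:00:00:03"),
    ("192.168.1.10", "00:1d:9c:00:00:01"),  # same device seen twice
]

duplicates = find_duplicate_ips(observations)
for ip, macs in duplicates.items():
    print(f"{ip} is claimed by {', '.join(macs)}")
```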
Margin analysis: You add one more sensor, maybe to measure temperature of the bottles so the labels will properly adhere. That adds traffic. How will the human-machine interface (HMI) to PLC perform? Maybe you've upgraded your user interface to that lovely touch screen. But that adds traffic, so when you push buttons, there's no (or slow) response.
The question you need to ask continually is: if my traffic patterns change, how much will that affect performance?
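A rough way to approach that question is to estimate link utilization before and after a change. The sketch below is a back-of-the-envelope calculation under stated assumptions: the frame sizes, frame rates and 10 Mbps link speed are invented, and it ignores per-frame overhead such as the preamble and inter-frame gap.

```python
from typing import Iterable, Tuple

Flow = Tuple[int, int]  # (frame size in bytes, frames per second)


def utilization(flows: Iterable[Flow], link_mbps: float) -> float:
    """Fraction of link capacity used by the given traffic flows."""
    bits_per_s = sum(frame_bytes * 8 * frames_per_s
                     for frame_bytes, frames_per_s in flows)
    return bits_per_s / (link_mbps * 1_000_000)


# Hypothetical existing traffic on a 10 Mbps segment.
existing = [(128, 1000), (512, 200)]

# Hypothetical extra traffic from a new touch-screen HMI.
new_hmi = (1024, 500)

before = utilization(existing, link_mbps=10)
after = utilization(existing + [new_hmi], link_mbps=10)

print(f"before: {before:.1%}, after: {after:.1%}")
```

The gap between the "after" figure and the level at which your devices begin to miss deadlines is the margin you have left.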
Nondeterministic network behavior: If, for example, the symptom is a significant variation in latency, you can measure throughput, delay or latency, and jitter. Compare these to baselines for the network, and confirm that performance is in line with expectations.
If the symptom is excessive latency or delay, look at all the devices attached. What's the utilization level (or the number of cars on the highway)? Maybe you find excessive utilization of a collision domain. Look at the traffic. Do all these "cars" belong here? Find out where they're coming from and where they're going. Suddenly you're seeing "logging trucks" where you're expecting compact cars!
Now you need to find out why this unanticipated traffic is suddenly showing up on this highway and where it came from to track down the problem.
Measuring determinism means being able to establish the worst-case time to exchange information end-to-end, regardless of what other network traffic occurs. In troubleshooting nondeterministic behavior, consider throughput and latency or delay as well as variability and jitter. These measurements also must be made with traffic present or during real-time operation. Make these measurements across an individual switch or across a wide-area global network.
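Given a series of latency samples, the figures mentioned above can be summarized in a few lines. This is a sketch under assumptions: the sample values are invented, and jitter is taken here as the mean absolute difference between consecutive samples, which is one common definition among several.

```python
from statistics import mean
from typing import Dict, Sequence


def latency_summary(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize mean latency, worst-case latency and jitter in milliseconds."""
    # Jitter (assumed definition): average change between consecutive samples.
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {
        "mean_ms": mean(samples_ms),
        "worst_ms": max(samples_ms),
        "jitter_ms": mean(deltas) if deltas else 0.0,
    }


# Hypothetical round-trip times: mostly about 25 ms with one 2,000 ms outlier.
samples_ms = [25, 27, 24, 26, 2000, 25]

summary = latency_summary(samples_ms)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The single outlier barely registers if you look only at typical values, but it dominates the worst-case figure, which is the number that matters for a deterministic process.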
Network Uptime is Crucial to Profitability and Quality of Your Output
Networks aren't perfect. The ability to quickly and effectively troubleshoot is fundamental to your production process. One failure can trigger another, resulting in a significant loss of time and money. Now you know how to address many issues that could impact productivity in three major areas of Industrial Ethernet: the cabling infrastructure, electrical signaling and the network.
For more information, visit Fluke Networks at www.rockwellautomation.com/go/p-flukenetworks.