Designing fault-tolerant real-time computer systems with diversified bus architecture for nuclear power plants

Pages 521-525 | Received 19 Mar 2013, Accepted 19 Dec 2013, Published online: 20 Jan 2014

Abstract

Fault-tolerant real-time computer (FT-RTC) systems are widely used to ensure safe operation of nuclear power plants (NPP) and safe shutdown in the event of any untoward situation. Such systems require high reliability, high availability and the computational ability to perform measurement via sensors, control action via actuators, data communication and human interfacing via keyboard or display. These attributes must be implemented using best known methods, such as redundant system design using diversified bus architecture to avoid common cause failure, fail-safe design to avoid unsafe failure and diagnostic features to validate system operation. In this context, the system designer must select an efficient and highly reliable diversified bus architecture in order to realize a fault-tolerant system design. This paper presents a comparative study of the CompactPCI bus and Versa Module Eurocard (VME) bus architectures for designing FT-RTC systems with a switch over logic system (SOLS) for NPP.

1. Introduction

In nuclear power plants (NPP), fault-tolerant computer systems are increasingly used both in safety critical applications, such as reactor protection and actuation of safety systems, and in safety-related applications, such as process control and monitoring systems. The practice of designing and implementing fault-tolerant real-time computer (FT-RTC) systems has matured over the last several years. With the current state of technology, it is possible to develop FT-RTC systems that meet more stringent requirements important to the safety of NPP. Dependence on computer systems in NPP has increased manyfold, and these systems must therefore be highly reliable. Since its introduction in 1995, the CompactPCI (CPCI) bus architecture has established itself in industrial control applications and has become the fastest growing industrial bus architecture adopted by the major equipment manufacturers. Similarly, Versa Module Eurocard (VME) bus-based systems have already proven their reliability in various market segments. A key requirement when designing RTC systems is outstanding computing performance in the most severe environments, including high temperature, humidity, saline atmosphere, electromagnetic interference (EMI), seismic activity, vibration and high-dose gamma radiation during reactor operation and maintenance. For system designers involved with safety critical applications, the harsh reality is that there is no margin for error, no allowance for small changes and no time for addressing unanticipated problems. From initial requirements through design concept, component selection, implementation, testing, validation, environmental qualification and EMI/electromagnetic compatibility (EMI/EMC) compliance, developing FT-RTC systems for safety critical applications in NPP can literally be a matter of life or death. A thoughtful selection of the right system bus architecture is therefore especially important at the initial stage of designing the FT-RTC system architecture.

2. Proposed fault-tolerant system architecture

To handle a large number of geographically distributed field signals in NPP with high reliability and availability, this paper proposes two different backplane bus-based RTC systems with a switch over logic system (SOLS), an arrangement that prevents system failure due to commonality between the RTC systems, as shown in Figure 1. The use of diversified redundant system components or subsystems, such as the bus backplane, central processing unit (CPU) and I/O cards, power supply and software development tools, is an important design strategy to mitigate risk in safety-related applications in NPP. Safety-related computer-based systems are those systems that play a complementary role to the safety critical systems in achieving the safety of NPP. The reliable operation of safety-related computer-based systems may avoid the need to initiate the safety critical actions that protect against postulated initiating events.

Figure 1. Fault-tolerant system architecture with diversified RTC systems and SOLS.


Redundant hardware reduces risk by increasing the mean time between failures (MTBF) of the overall system. In its simplest form, redundancy means executing the same application software on two different hardware platforms, such as VME and CPCI. Using diversified hardware platforms in a “1-out-of-2” (1oo2) system architecture, the redundant capability is kept in the “hot standby” state. Typical application software includes reading various process parameters from sensors, engineering unit conversion, limit checking, PID control, generating control output, sending data over Ethernet to an upper level computer and running diagnostic features to monitor the watchdog timer, memory and peripheral I/O cards. The SOLS is a highly reliable, relay-based system with no processor and no software that checks the healthiness of both RTC systems. The term “healthiness” indicates that a system is working normally and that all its diagnostic features have been tested and satisfied as per requirements. Signals from field sensors (analog/digital) are duplicated and connected simultaneously to both the VME and CPCI systems, which process them and generate identical output signals, since the logic of the application software running on both systems is similar. These output signals are used for control, alarm annunciation, interlock and lamp indication purposes and are routed through SOLS to the final control element. SOLS receives both health signals and output signals from the diversified RTC systems. Health signals are generated by each RTC system after running the diagnostic features available on that system, including the CPU and I/O cards. Depending on system healthiness, SOLS routes the healthy system's output to the plant. The state transition diagram of SOLS shown in Figure 2 represents the two online states of the RTC systems. When an RTC system fails, complete switch over from the present online system to the hot standby system takes place within a few milliseconds, thereby producing bumpless output to the final control element in the field. In the diagram, the states are enclosed in circles: S1 Online corresponds to the VME system and S2 Online corresponds to the CPCI system.
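Two of the application steps listed above, engineering unit conversion and limit checking, can be illustrated with a minimal Python sketch. It assumes a linear sensor characteristic; the function names, the 12-bit count range, the 0 to 600 span and the alarm limits are all hypothetical and are not taken from the plant software.

```python
def to_engineering_units(raw_count: int, raw_max: int,
                         eu_min: float, eu_max: float) -> float:
    """Linearly scale a raw converter count (0..raw_max) to engineering units."""
    if not 0 <= raw_count <= raw_max:
        raise ValueError("raw count outside converter range")
    return eu_min + (raw_count / raw_max) * (eu_max - eu_min)


def check_limits(value: float, low: float, high: float) -> str:
    """Classify a converted value against its (hypothetical) alarm limits."""
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"


if __name__ == "__main__":
    # Hypothetical channel: 12-bit count mapped onto a 0..600 unit span.
    value = to_engineering_units(2048, 4095, 0.0, 600.0)
    print(round(value, 1), check_limits(value, 50.0, 550.0))
```

Because both RTC systems receive the same duplicated input and run similar logic, a routine of this kind would be expected to yield matching results on the VME and CPCI platforms.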

Figure 2. State transition diagram of SOLS.


Online status indicates that the healthy system's output is being routed to the plant. Depending on the healthiness of the RTC systems, SOLS changes its state as per the logic incorporated in the system. On system power-on, SOLS starts from the S1 Online state. On system power-off or failure of both systems, SOLS retains its last state and sends fail-safe output to the plant. A directed line connecting the circles indicates a transition between the states, and each directed line is labeled with the external cause of the transition. A transition from one state to the other takes place only when the online system fails and the other system is in healthy condition. During maintenance, either system's output can be routed to the plant by manual selection. This type of system architecture helps in achieving redundancy with high availability by avoiding common cause failure. The ideal bus architecture for an RTC system should also provide the features required to realize the above system architecture. Two standards have emerged as the dominant choices for such a system bus: VME and CPCI. Both have similar physical and performance specifications, along with support from numerous manufacturers that offer a range of processors, memory and peripherals. CPCI and VMEbus technologies can be used synergistically to achieve hardware/software redundancy and avoid common cause failure. Outwardly, they appear alike enough to compete for the same area of application, yet significant differences exist between the two bus specifications, and these differences provide the diversity required for our application.
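The switch-over rules described in this section can be summarized as a small state machine. The Python sketch below is purely illustrative: the actual SOLS is relay-based and contains no software, and the class name, the `FAIL_SAFE` placeholder and the method signature are assumptions made for the example. Manual selection during maintenance is deliberately left out.

```python
from enum import Enum


class Online(Enum):
    S1 = "VME"    # S1 Online: VME system output is routed to the plant
    S2 = "CPCI"   # S2 Online: CPCI system output is routed to the plant


FAIL_SAFE = "FAIL_SAFE"  # placeholder for the plant-specific fail-safe output


class SwitchOverLogic:
    """Software model of the SOLS state transitions (illustration only)."""

    def __init__(self) -> None:
        self.state = Online.S1  # on power-on, SOLS starts from S1 Online

    def update(self, s1_healthy: bool, s2_healthy: bool,
               s1_output, s2_output):
        """Apply one evaluation of the health signals; return (state, output)."""
        health = {Online.S1: s1_healthy, Online.S2: s2_healthy}
        other = Online.S2 if self.state is Online.S1 else Online.S1

        # Switch only when the online system fails and the standby is healthy.
        if not health[self.state] and health[other]:
            self.state = other

        # Both systems failed: retain the last state, send fail-safe output.
        if not s1_healthy and not s2_healthy:
            return self.state, FAIL_SAFE

        output = s1_output if self.state is Online.S1 else s2_output
        return self.state, output
```

Note that in this model the logic does not switch back when a failed system recovers, which follows from the rule that a transition occurs only on failure of the online system.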

3. Technological difference between CPCI and VMEbus architecture to achieve diversification

3.1. Bus origin

VMEbus was introduced by Motorola in 1981 and was designed as the I/O bus for Motorola's 68000 CPU. VMEbus has commanded almost half of the embedded computer board market.

CPCI bus grew out of the personal computer's internal peripheral component interconnect (PCI) bus. Because of its computer origins, the original PCI bus used edge connectors and had limited room for user defined I/O lines. In 1994, PCI Industrial Computer Manufacturers Group (PICMG) developed specifications that adapted PCI technology for use in industrial applications. PICMG first used personal computer's PCI/industry standard architecture (PCI/ISA) form factor (PICMG 1.x), then used PICMG 2.x which includes the definition of CompactPCI for Eurocard-based, rack mount applications [Citation1].

3.2. Functional characteristics

CPCI was created to meld the low-cost components of the personal computer bus with the high reliability of passive-backplane systems such as VME. Both CPCI and VMEbus are processor-independent, high-performance buses in the Eurocard physical form factor for interconnecting high-bandwidth peripherals and intelligent controllers with the most powerful CPUs. Concurrent operations on the processor/memory bus and the local bus can be realized. CPCI supports virtually all the processors available in the market, including Pentium, PowerPC, Sparc and Alpha [Citation2]; thus, it provides the same software support as workstation implementations and reduces software development time. Since a large number of functions are available from board-level manufacturers, including communication, digital signal processing (DSP) and industrial I/O, any specific board can be designed rapidly according to the CPCI specification. Owing to this broad industry support, the hardware design task is shortened when using the CPCI bus. Today, however, the competition is focused on software support, as applications are becoming more and more complex. CPCI boards have several features that make them well suited to the development of embedded systems for NPP: high-density 2-mm pin and socket connectors, excellent vibration and shock protection characteristics, shielding for EMI/radio frequency interference (EMI/RFI) protection, I/O connections on the front or rear of the backplane module and staged power pins for hot swapping. On the CPCI I/O bus, one host interacts with multiple slaves, whereas VMEbus supports peer multiprocessing. Technological distinctions certainly exist between the VME and CPCI buses [Citation1]. However, technical considerations are only some of the factors in designing diversified RTC systems for fault-tolerant system design.

3.3. Synchronous vs. asynchronous bus

The technical features that currently favor these buses are speed and the passiveness of the backplane. Typical data rates for VME and CPCI are 60 and 100 MB/s, respectively, which suit most of the safety applications in NPP. Furthermore, CPCI provides a synchronous interface to its peripherals, whereas VMEbus provides an asynchronous interface.

3.4. Bus bandwidth

Bus bandwidth plays a major role in designing high-speed RTC systems for distributed data acquisition and control applications in NPP. A 64-bit VMEbus can theoretically sustain 80 MB/s, and the total bandwidth in a VME system is shared between the devices. CPCI using a 64-bit bus transfers data at a rate of 264 MB/s [Citation3]. VME sends data in single-cycle or burst mode, whereas CPCI sends data in burst-optimized mode.
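The 264 MB/s figure for CPCI follows from simple arithmetic: a 64-bit bus moves 8 bytes per transfer, and at the 33 MHz clock mentioned in Section 3.5 this gives 8 × 33 = 264 MB/s. The Python sketch below reproduces that calculation, assuming one transfer per clock cycle in burst mode; the function names and the 1 MB block size are illustrative only. The VME figure of 80 MB/s is taken as quoted, since VMEbus is asynchronous and its rate is not derived from a bus clock.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a synchronous bus, assuming one transfer per clock cycle."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


def transfer_time_ms(block_mb: float, rate_mb_s: float) -> float:
    """Ideal time to move a block at a sustained rate, ignoring all overhead."""
    return block_mb / rate_mb_s * 1000.0


if __name__ == "__main__":
    cpci_rate = peak_bandwidth_mb_s(64, 33)   # 64-bit CPCI at 33 MHz
    vme_rate = 80.0                           # quoted theoretical VME64 figure
    print("CPCI peak MB/s:", cpci_rate)
    print("1 MB over CPCI (ms):", round(transfer_time_ms(1.0, cpci_rate), 2))
    print("1 MB over VME  (ms):", round(transfer_time_ms(1.0, vme_rate), 2))
```

These are peak figures; arbitration, addressing phases and wait states reduce the rate actually achieved on either bus.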

3.5. Reflected wave vs. incident wave switching

CPCI takes a radically different approach to bus termination. It eliminates the termination networks altogether and takes advantage of the reflected wave front. The bus driver is designed to drive the line only about “halfway” as the wave front propagates to the end of the line. When the wave front reaches the unterminated end of the bus, it is reflected back with double the strength, and the receivers switch as the wave front passes them a second time in the other direction. VME takes the “incident wave” approach and therefore needs proper termination of all bus lines to prevent unwanted reflections. CPCI is based on complementary metal oxide semiconductor (CMOS) technology, which means that steady-state direct currents are minimal, whereas VME is based on transistor–transistor logic (TTL) and therefore consumes more power [Citation1]. The CPCI bus must maintain a minimum clock cycle time of 30 ns at 33 MHz. The maximum clock skew, measured between the clock pins of two CPCI components, is 2 ns. The minimum clock slew rate is 1 V/ns and the maximum slew rate is 4 V/ns. The clock frequency can be changed as long as the clock edges remain clean and the minimum clock high time and clock low time are not violated.
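The clock limits quoted in this section (cycle time of at least 30 ns, skew of at most 2 ns between two components, edge rate between 1 and 4 V/ns) lend themselves to a simple acceptance check on bench measurements. The following Python sketch is one hypothetical way to express that check; the data class, field names and message strings are assumptions for illustration and do not come from the CPCI specification.

```python
from dataclasses import dataclass
from typing import List

# Limits as quoted in the text above.
MIN_PERIOD_NS = 30.0
MAX_SKEW_NS = 2.0
MIN_EDGE_RATE_V_PER_NS = 1.0
MAX_EDGE_RATE_V_PER_NS = 4.0


@dataclass
class ClockMeasurement:
    period_ns: float            # measured clock cycle time
    skew_ns: float              # skew between clock pins of two components
    edge_rate_v_per_ns: float   # measured rate of change of the clock edge


def clock_violations(m: ClockMeasurement) -> List[str]:
    """Return a list of limit violations; an empty list means all checks pass."""
    problems = []
    if m.period_ns < MIN_PERIOD_NS:
        problems.append("cycle time below 30 ns")
    if m.skew_ns > MAX_SKEW_NS:
        problems.append("skew above 2 ns")
    if not MIN_EDGE_RATE_V_PER_NS <= m.edge_rate_v_per_ns <= MAX_EDGE_RATE_V_PER_NS:
        problems.append("edge rate outside 1 to 4 V/ns")
    return problems


if __name__ == "__main__":
    print(clock_violations(ClockMeasurement(30.3, 1.5, 2.0)))
    print(clock_violations(ClockMeasurement(25.0, 2.5, 0.5)))
```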

3.6. Bus arbitration

When a CPCI bus master needs to use the bus, it must request the bus from the CPCI bus arbiter. The CPCI specification defines the timing of the request and grant handshaking, but not the procedure used to determine the winner of a competition. The algorithm used by the CPCI bus arbiter to decide which of the requesting bus masters will be granted use of the bus is system specific. In a VMEbus system with a single master, by contrast, the master asks for the bus, gets it and keeps it. When two or more VME masters request the bus at the same time on the same request level, proximity to slot 1 determines which master gets the bus [Citation4]. Arbitration is done by the system controller, which resides in slot 1.
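The VME tie-break rule described above can be sketched in a few lines of Python. The sketch assumes a prioritized arbiter in which a higher request level wins (VMEbus defines four request levels, and treating the highest-numbered level as the highest priority is an assumption about the arbiter mode in use); among requests on the same level, the board closest to slot 1 wins. The `BusRequest` type and `grant` function are hypothetical names.

```python
from typing import NamedTuple, Optional, Sequence


class BusRequest(NamedTuple):
    slot: int    # backplane slot of the requesting master (slot 1 = controller)
    level: int   # bus request level, 0..3 (assumed: 3 is highest priority)


def grant(requests: Sequence[BusRequest]) -> Optional[BusRequest]:
    """Pick the winning request: highest level first, then nearest to slot 1."""
    if not requests:
        return None
    return min(requests, key=lambda r: (-r.level, r.slot))


if __name__ == "__main__":
    pending = [BusRequest(slot=5, level=3),
               BusRequest(slot=2, level=3),
               BusRequest(slot=1, level=1)]
    print(grant(pending))
```

In hardware the slot-proximity rule is not computed; it falls out of the daisy-chained grant line, which a requesting board nearer slot 1 intercepts before boards further down the backplane see it.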

3.7. Rear I/O modules

CPCI offers rear I/O capability, which allows all the cables to be connected at the back of the chassis. These cables carry communication, sensor and actuator signals. Since swapping an adapter no longer requires disconnecting all the signals, this reduces wiring cost and maintenance time. This feature is also available in some versions of the 6U VMEbus standard. However, CPCI offers more free pins for I/O connections in the 6U format than VMEbus [Citation5].

3.8. Hot-pluggability and expandability

A key performance element of RTC systems is their support for hot-pluggability. The term “hot-pluggability” refers to the ability to remove a faulty peripheral I/O card and replace it with a good one in a running system. It is particularly relevant in safety critical systems. This mechanism allows dynamic configuration of the system during operation and helps reduce system downtime and cost while the other standby system is still in operation. In NPP, this feature is useful not only for the online system but also for systems that are reporting errors or malfunctions or are under maintenance. A CPCI system has four physical interrupt request (IRQ) lines and assigns the IRQ numbers during the hot-plug initialization process, which prevents interrupt conflicts. VME64x also supports hot-plug operation, and its interrupt structure is simpler than that of CPCI, since each interrupting device has a vector number that the system designer pre-assigns. This vector number is unique to the interrupting device, eliminating the chance of conflicts. The weakness of a VME64x system is that system configuration, resource assignment and conflict avoidance are the developer's responsibility, and the design documents must be thorough and accurate to support future additions or changes to the system at a later stage of development.
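Because interrupt vector assignment on VME is a design-time responsibility, a simple consistency check over the design table can catch duplicates before a card is added. The Python sketch below shows one possible check; the device names and vector values are invented for illustration, and the table format is an assumption rather than part of any VME tooling.

```python
from collections import defaultdict
from typing import Dict, List


def find_vector_conflicts(assignments: Dict[str, int]) -> Dict[int, List[str]]:
    """Return every vector number that is assigned to more than one device."""
    by_vector: Dict[int, List[str]] = defaultdict(list)
    for device, vector in assignments.items():
        by_vector[vector].append(device)
    return {vector: sorted(devices)
            for vector, devices in by_vector.items()
            if len(devices) > 1}


if __name__ == "__main__":
    # Hypothetical design table: device name -> pre-assigned vector number.
    design_table = {
        "analog_input_card": 0x40,
        "digital_input_card": 0x41,
        "relay_output_card": 0x40,   # deliberate clash with analog_input_card
    }
    print(find_vector_conflicts(design_table))
```

Running such a check whenever the design documents change is one way to keep the pre-assigned vectors unique as the system grows.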

One of the features commonly required in building FT-RTC system architecture for applications in the nuclear industry is expandability. This mechanism allows dynamic configuration of the system during operation. For example, it allows the addition or removal of an I/O card while the system is still running. This is particularly important for applications in which downtime can represent a significant cost. The maximum number of slots in a 19-inch bin supported by VMEbus is 21, whereas CPCI supports 8, which can be further expanded by using active bridging technology [Citation6].

3.9. Multiprocessing

The two buses differ in their behavior in multiprocessing. To achieve multiprocessing, the computer boards in a system must be able to communicate with each other and with the other cards in the system. The backplane provides a convenient communications path, but to use it each of the computer boards must be able to become the system master or bus master. Both buses support this approach. In VME, bus arbitration is done by hardware. The bus request lines pass through the backplane, while the bus grant is daisy-chained. When a computer board gets the bus, it can immediately start working with any other board on the bus and access any address in the VME address space. Because all the addressing is predefined by the developer, and the application software knows the addresses of all the boards and functions that the computer board may need to access, the software overhead is virtually nil [Citation7,8]. After thorough verification and validation, the application software can be allowed to run on the deployed platform.

CPCI also enables the boards to communicate with each other, but because of the plug and play mechanism, the addressing and interrupt structures are dynamic. The only board that knows the addressing and interrupt assignments is the system master. Other boards that may become bus masters are peripheral masters. When a peripheral master needs to communicate with another peripheral master, it must not only arbitrate for control of the bus, but also obtain the information about the other peripheral master from the system master before it can start communicating [Citation9,10]. This adds software overhead to any multiprocessing scheme on the CPCI bus.
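The contrast between the two addressing models can be made concrete with a toy Python sketch. It is not a driver and does not model real bus cycles: the address values, the board names and the `SystemMaster` class are all hypothetical, chosen only to show that a VME application can read addresses from a fixed table while a CPCI peripheral master has to ask the system master first.

```python
from typing import Dict, Iterable

# VME style: addresses are fixed at design time and known to the application.
VME_ADDRESS_MAP: Dict[str, int] = {
    "adc_card": 0x00100000,
    "dac_card": 0x00200000,
}


class SystemMaster:
    """Stand-in for a CPCI system master that assigns addresses at start-up."""

    def __init__(self, base: int = 0x80000000, window: int = 0x00010000) -> None:
        self._base = base
        self._window = window
        self._assigned: Dict[str, int] = {}

    def configure(self, boards: Iterable[str]) -> None:
        """Assign one address window per board, in the order boards are found."""
        for index, board in enumerate(boards):
            self._assigned[board] = self._base + index * self._window

    def lookup(self, board: str) -> int:
        """Extra step a peripheral master needs before talking to a peer."""
        return self._assigned[board]


if __name__ == "__main__":
    # VME: direct use of the predefined address.
    print(hex(VME_ADDRESS_MAP["adc_card"]))

    # CPCI: the result depends on what was found at configuration time.
    master = SystemMaster()
    master.configure(["dac_card", "adc_card"])
    print(hex(master.lookup("adc_card")))
```

The additional `lookup` call is the software overhead referred to above; in the VME case the equivalent information is a constant in the application.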

3.10. High availability

While multiprocessing is a little awkward on CPCI, the bus handles high-availability computing with ease. To achieve high availability, the system needs mechanisms for detecting malfunctions, communicating each malfunction to the right system components, allowing hot-swap of system cards and offering redundant functions. All of the features needed to implement high availability are available for CPCI. The nuclear industry has adopted and driven high availability because it is a major requirement for safe and reliable operation of NPP [Citation1,Citation3,Citation6,Citation10–14]. VME32 does not support these features. Other features that are not available in VME32 but are available in CPCI are clock synchronization across I/O and redundant functions.

3.11. Software support

The CPCI architecture allows concurrent development of the hardware platform and the application code. Because of its electrical compatibility with the PCI bus, the rugged embedded platform is identical at the software level to the general purpose computer development platform. Even if the hardware platform is not ready, software engineers can still develop and debug their application code on a general purpose computer. In contrast, deploying a Windows-based application on a VMEbus computer requires a great deal of adaptation.

4. Conclusion

Designing FT-RTC systems with diversified bus architecture for NPP requires a great deal of design consideration, from bus selection and system architecture through to demonstrating that the reliability, availability and fail-safe requirements for NPP have been achieved. The proposed system architecture, with two diversified RTC systems using CPCI and VMEbus together with SOLS, is a potential candidate for realizing fault-tolerant system architecture for NPP: it avoids common cause failure by adopting two different approaches to system design and software development for various safety-related applications. At present, the existing fault-tolerant real-time system architecture, realized using twin VMEbus-based RTC systems with SOLS, is being re-engineered by replacing one VMEbus-based RTC system with a CPCI bus-based RTC system, and the further improvement in system availability in the presence of common cause failure is being studied.

Acknowledgements

This project has been taken up as part of I&C system development for fast breeder reactor (FBR - 1&2) at Kalpakkam.

References

  • Peterson W. VME bus interface handbook. 4th ed. Fountain Hills (AZ): VFEA International Trade Association; 1997.
  • Abbott D. PCI bus demystified. Eagle Rock (VA): LLH Technology Publishing; 2000.
  • PICMG 2.0 D3.0 CompactPCI Specification. Wakefield (MA): PICMG; 1999.
  • Alderman R. The state of VMEbus and beyond. VMEbus Systems. Fountain Hills (AZ): VITA Technology; 2001.
  • Schmitz M, MEN Mikro Elektronik, Nuiten R, 3M Deutschland GmbH. New CompactPCI plus standards enhance CompactPCI compatibility with high-speed serial data transfer capabilities. Dominion Electronics [Internet]. 2009 Oct 29. Available from: http://www.dominion.net.au/news.php?id=216
  • Mollman R. CompactPCI – a growing alternative to COTS VMEbus systems. 2007. Available from: http://www.rtcmagazine.com/articles/view/100781
  • Alicke F, Bartholdy F, Blozis S, Dehmelt F, Forstner P, Holland N, Huchzermeier J. Comparing bus solutions. (Application Report). Texas Instruments; 2000. Available from: http://polimage.polito.it/~lavagno/esd/bus.pdf
  • Ye F, Kelly T. Criticality analysis for COTS software components. Paper presented at: Proceedings of the 22nd International Conference on Software Engineering, ICSE; 2000 June 4–11; Limerick, Ireland.
  • Trends in modular systems. Available from: www.picmgeu.org
  • IEEE Std. 1014-1987 – a Versatile Backplane Bus: VMEbus. Available from: http://standards.ieee.org/findstds/standard/1014-1987.html
  • Swaminathan P. Design aspects of safety critical instrumentation of nuclear installations. Paper presented at: Conference on Materials and Technologies for Fuel cycle, SERC; 2003 Dec 15–16; Chennai, India.
  • Design Safety Guide on Safety Critical System (AERB/SG -D10).
  • Design Safety Guide on Computer Based System (AERB/SG -D25).
  • Berding D. Innovations in backplane design. VMEbus Systems. Fountain Hills (AZ): VITA Technology; 1999.
