Saturday, September 24, 2011

Dynamic Analysis of Safety Critical Systems

From the desk of
Sandeep Patalay

Background: The dynamic analysis of safety-critical software is an important phase of the Independent Verification and Validation of the system. EN 50128 details the methods that shall be used for this phase of the verification life cycle. This phase is so critical to the project outcome that it demands meticulous planning and organisation. Here we discuss the dynamic analysis methods suggested by the CENELEC standards for SIL 4 software.

 Boundary Value Analysis

The aim of this method is to remove software errors occurring at parameter limits or boundaries. The input domain of the program is divided into a number of input classes. The tests should cover the boundaries and extremes of the classes, and should check that the boundaries in the input domain of the specification coincide with those in the program. The use of the value zero, whether directly or indirectly, is often error-prone and demands special attention:
  • Zero divisor;
  • Blank ASCII characters;
  • Empty stack or list element;
  • Null matrix;
  • Zero table entry.

Normally the boundaries for input have a direct correspondence to the boundaries of the output range. Test cases should be written to force the output to its limit values. Consider also whether it is possible to specify a test case which causes the output to exceed the specification boundary values. If the output is a sequence of data, for example a printed table, special attention should be paid to the first and last elements and to lists containing zero, one and two elements.
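As a sketch, the boundary cases for a single numeric input class can be generated mechanically. The function name and the speed range below are illustrative assumptions, not taken from the standard:

```python
def boundary_values(lo, hi):
    """Candidate test inputs for an integer input class [lo, hi]:
    each boundary, the neighbours just inside and just outside it,
    and zero whenever it falls within the range."""
    values = {lo - 1, lo, lo + 1, hi - 1, hi, hi + 1}
    if lo <= 0 <= hi:
        values.add(0)  # zero is often error-prone and deserves its own case
    return sorted(values)

# e.g. for a hypothetical speed command limited to 0..350 km/h:
print(boundary_values(0, 350))  # -> [-1, 0, 1, 349, 350, 351]
```

The out-of-range values (-1 and 351 here) are exactly the cases that probe whether the program's boundaries coincide with the specification's.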

 Error Guessing

The aim of this method is to remove common programming errors. Testing experience and intuition combined with knowledge and curiosity about the system under test may add some uncategorised test cases to the designed test case set. Special values or combinations of values may be error-prone. Some interesting test cases may be derived from inspection checklists. It may also be considered whether the system is robust enough. Can the buttons be pushed on the front-panel too fast or too often? What happens if two buttons are pushed simultaneously?

Error Seeding

The aim of this method is to ascertain whether a set of test cases is adequate. Some known error types are inserted in the program, and the program is executed with the test cases under test conditions. If only some of the seeded errors are found, the test case set is not adequate. The ratio of found seeded errors to the total number of seeded errors is an estimate of the ratio of found real errors to the total number of real errors. This gives a possibility of estimating the number of remaining errors, and thereby the remaining test effort. The detection of all the seeded errors may indicate either that the test case set is adequate, or that the seeded errors were too easy to find. The limitation of the method is that, in order to obtain any usable results, the error types as well as the seeding positions must reflect the statistical distribution of real errors.
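The estimate described above can be computed directly from the two ratios; the function name and the figures in the example are illustrative:

```python
def estimate_remaining_errors(seeded_total, seeded_found, real_found):
    """Estimate the number of real errors remaining, from the ratio of
    seeded errors found to seeded errors inserted."""
    if seeded_found == 0:
        raise ValueError("no seeded errors found: the test set is clearly inadequate")
    # found_real / total_real is assumed equal to seeded_found / seeded_total
    estimated_total = real_found * seeded_total / seeded_found
    return estimated_total - real_found

# 20 of 25 seeded errors found, along with 12 real errors:
print(estimate_remaining_errors(25, 20, 12))  # -> 3.0
```

The estimate inherits the limitation noted above: it is only as good as the match between the seeded errors and the statistical distribution of real errors.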

Performance Modelling

The aim of the method is to ensure that the working capacity of the system is sufficient to meet the specified requirements. The requirements specification includes throughput and response requirements for specific functions, perhaps combined with constraints on the use of total system resources. The proposed system design is compared against the stated requirements by:

  • Defining a model of the system processes, and their interactions,
  • Identifying the use of resources by each process (for example processor time, communications bandwidth, storage devices),
  • Identifying the distribution of demands placed upon the system under average and worst-case conditions,
  • Computing the mean and worst-case throughput and response times for the individual system functions.
For simple systems an analytic solution may be possible, whilst for more complex systems some form of simulation is required to obtain accurate results. Before detailed modelling, a simpler 'resource budget' check can be used which sums the resource requirements of all the processes. If the requirements exceed the designed system capacity, the design is infeasible. Even if the design passes this check, performance modelling may show that excessive delays and response times occur due to resource starvation. To avoid this situation, engineers often design systems to use only some fraction (e.g. 50%) of the total resources, so that the probability of resource starvation is reduced.
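The 'resource budget' check amounts to a summation and a comparison against the allowed fraction of capacity. The process names and load figures below are hypothetical:

```python
def resource_budget_ok(process_loads, capacity, headroom=0.5):
    """Simple 'resource budget' check: sum the worst-case resource
    demand of every process and compare it against the fraction of
    total capacity the design is allowed to use (e.g. 50%)."""
    demand = sum(process_loads.values())
    budget = capacity * headroom
    return demand <= budget, demand, budget

# hypothetical CPU demands in ms per 100 ms cycle, with the 50% headroom rule:
ok, demand, budget = resource_budget_ok(
    {"interlocking_logic": 18, "field_io": 9, "diagnostics": 6, "comms": 11},
    capacity=100)
print(ok, demand, budget)  # -> True 44 50.0
```

Passing this check is necessary but not sufficient: as noted above, modelling may still reveal starvation under worst-case demand distributions.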

Equivalence Classes and Input Partition Testing

The aim of this method is to test the software adequately using a minimum of test data. The test data is obtained by selecting the partitions of the input domain required to exercise the software. This testing strategy is based on the equivalence relation of the inputs, which determines a partition of the input domain.

Test cases are selected with the aim of covering all subsets of this partition. At least one test case is taken from each equivalence class. There are two basic possibilities for input partitioning which are:
  • Equivalence classes may be defined on the specification. The interpretation of the specification may be either input oriented, for example the values selected are treated in the same way or output oriented, for example the set of values leading to the same functional result; and
  • Equivalence classes may be defined on the internal structure of the program. In this case the equivalence class results are determined from static analysis of the program, for example the set of values leading to the same path being executed
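A minimal specification-based sketch, assuming a hypothetical rule that accepts speed commands in the range 0 to 350 km/h: the input domain splits into three equivalence classes, and one representative value is drawn from each.

```python
# Hypothetical specification under test: a speed command is accepted
# only if the requested speed lies within 0..350 km/h.
def accept_speed(v):
    return 0 <= v <= 350

# One representative test value per equivalence class of the input domain:
partitions = {
    "below range (invalid)": -40,
    "within range (valid)": 120,
    "above range (invalid)": 500,
}
for name, v in partitions.items():
    print(name, accept_speed(v))
```

Any other value from the same class (say 130 instead of 120) is assumed to be treated the same way by the program, which is what makes the small test set adequate.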

Structure Based Testing

The aim of this method is to apply tests which exercise certain subsets of the program structure. Based on an analysis of the program, a set of input data is chosen such that a large fraction of the selected program elements is exercised. The program elements exercised can vary depending upon the level of rigour required.

  • Statements: This is the least rigorous test since it is possible to execute all code statements without exercising both branches of a conditional statement.
  • Branches: Both sides of every branch should be checked. This may be impractical for some types of defensive code.
  • Compound Conditions: Every condition in a compound conditional branch (i.e. conditions linked by AND/OR) is exercised.
  • LCSAJ: A linear code sequence and jump is any linear sequence of code statements including conditional jumps terminated by a jump. Many potential sub-paths will be infeasible due to constraints on the input data imposed by the execution of earlier code.
  • Data Flow: Execution paths are selected on the basis of data usage, for example a path where the same variable is both written and read.
  • Call Graph: A program is composed of subroutines which may be invoked from other subroutines. The call graph is the tree of subroutine invocations in the program. Tests are designed to cover all invocations in the tree.
  • Entire Path: Execute all possible paths through the code. Complete testing is normally infeasible due to the very large number of potential paths.
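The gap between the first two levels of rigour can be shown on a tiny example; `clamp_negative` is a made-up function used only to illustrate coverage:

```python
def clamp_negative(x):
    if x > 0:       # branch point
        x = -x      # the only statement inside the conditional
    return x

# The input set {1} executes every statement (full statement coverage)
# yet never takes the false branch of `x > 0`; adding a non-positive
# input such as -2 exercises both branches.
statement_cover = [1]
branch_cover = [1, -2]
print([clamp_negative(x) for x in branch_cover])  # -> [-1, -2]
```

This is why statement coverage is described above as the least rigorous criterion: `if` statements without an `else` can be fully "covered" while half their behaviour goes untested.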

Saturday, September 17, 2011

The Computer Based Interlocking Architecture

From the desk of
 Sandeep Patalay


Solid-state interlocking systems for railways should ensure the following:
  1. Fail safety
  2. Availability
  3. Reliability
  4. Maintainability
 Architecture and methodology

Generally, the following three types of redundancy techniques are used to achieve fail-safety in the design of signalling systems:

Hardware Redundancy – In this case, two or more hardware modules of identical design, running common software, are used to carry out the safety functions, and their outputs are continuously compared. The hardware units operate in a tightly synchronised mode, with comparison of outputs in every clock cycle. Due to the tight synchronisation, it is not possible to use diverse hardware or software. In this method, although random failures are taken care of, it is difficult to ensure detection of systematic failures due to the use of identical hardware and software.

Software Redundancy – This approach uses a single hardware unit with diverse software. The two software modules are developed independently and generally utilize inverted data structures to take care of common mode failures. However, rigorous self check procedures are required to be adopted to compensate for use of a single Hardware unit.

Hybrid Model – The hardware units are loosely synchronised: the units operate in alternate cycles, and the outputs are compared after full operation of the two modules. Therefore, it is no longer required to use identical hardware and software. Although the systems installed in the field utilise identical hardware and software, the architecture permits the use of diverse hardware and software. Moreover, operation of the two units in alternate cycles permits the use of a common system bus and interface circuitry.
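All three techniques ultimately rest on comparing redundant results and forcing a safe state on disagreement. A minimal two-out-of-two comparison sketch, with hypothetical command names (the actual comparison in a real interlocking runs in hardware or firmware every cycle):

```python
def compare_channels(out_a, out_b, safe_state="ALL_SIGNALS_RED"):
    """Two-out-of-two output comparison: a command is issued only when
    both redundant channels agree; any disagreement is treated as a
    fault and the safe state is enforced instead."""
    if out_a == out_b:
        return out_a
    return safe_state  # mismatch => fail safe

print(compare_channels("SIGNAL_27_GREEN", "SIGNAL_27_GREEN"))  # -> SIGNAL_27_GREEN
print(compare_channels("SIGNAL_27_GREEN", "SIGNAL_27_RED"))    # -> ALL_SIGNALS_RED
```

The techniques differ mainly in what feeds this comparison (identical vs diverse hardware and software) and when it runs (every clock cycle vs alternate cycles), not in the comparison principle itself.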

Hardware and software are designed accordingly to ensure the above points. The various techniques used to meet these requirements are discussed below:



Table 1: Existing Fail-safe Methods employed in Design of Computer Based Interlocking Systems

Time Redundancy
  • Implementation: The same software is executed on the same hardware during two different time intervals (Refer: Figure 5: Time Redundancy).
  • Errors detected: Errors caused by transients; these are avoided by reading at two different time intervals.
  • Practical problems: A single hardware fault leads to shutdown of the system. This method is not used, since software faults are not completely found in validation, and the self-diagnostics employed for checking hardware faults are not complete.

Hardware Redundancy
  • Implementation: The same software is executed on two identical hardware channels (Refer: Figure 6: Hardware Redundancy).
  • Errors detected: Hardware faults are detected, since the outputs from both channels are compared, and a single hardware fault does not lead to shutdown of the system.
  • Practical problems: Software faults are not detected, since the same software runs on two identical hardware channels; software faults introduced at the design stage remain undetected.

Diverse Hardware
  • Implementation: Identical software is executed on different hardware versions (Refer: Figure 7: Hardware Diversity).
  • Errors detected: Hardware design faults introduced at the initial stage are detected.
  • Practical problems: Software faults at the design stage are still not detected.

Diverse Software
  • Implementation: Different software versions are executed on the same hardware during two different time intervals (Refer: Figure 8: Software Diversity).
  • Errors detected: Software faults at the design stage are detected.
  • Practical problems: Even though the software is diverse, it runs on a single hardware channel, so a single hardware fault leads to shutdown of the system.

Diverse Software on Redundant Hardware
  • Implementation: Different software versions are executed on two identical hardware channels (Refer: Figure 9: Diverse software on redundant hardware).
  • Errors detected: Software faults at the design stage are detected, and a single hardware fault does not lead to system shutdown.
  • Practical problems: Hardware faults at the design stage are not detected.

Diverse Software on Diverse Hardware
  • Implementation: Different software versions are executed on two different hardware channels (Refer: Figure 10: Diverse software on Diverse Hardware).
  • Errors detected: Software faults and hardware faults are detected at the design stage.
  • Practical problems: This method is rarely used, since the design complexity involved is high.

Sunday, September 11, 2011

CENELEC Standard: Faults and Effects


From the Desk of
Sandeep Patalay


Effects of single faults
It is necessary to ensure that the system/sub-system/equipment meets its THR (Tolerable Hazard Rate) in the event of a single random fault. It is necessary to ensure that SIL 3 and SIL 4 systems remain safe in the event of any kind of single random hardware fault which is recognised as possible. Faults whose effects have been demonstrated to be negligible may be ignored. This principle, which is known as fail-safety, can be achieved in several different ways:

1) Composite fail-safety
With this technique, each safety-related function is performed by at least two items. Each of these items shall be independent from all others, to avoid common-cause failures. Non-restrictive activities are allowed to progress only if the necessary number of items agree. A hazardous fault in one item shall be detected and negated in sufficient time to avoid a co-incident fault in a second item.

2) Reactive fail-safety
This technique allows a safety-related function to be performed by a single item, provided its safe operation is assured by rapid detection and negation of any hazardous fault (for example, by encoding, by multiple computation and comparison, or by continual testing). Although only one item performs the actual safety-related function, the checking/testing/detection function shall be regarded as a second item, which shall be independent to avoid common-cause failures.

3) Inherent fail-safety
This technique allows a safety-related function to be performed by a single item, provided all the credible failure modes of the item are non-hazardous. Any failure mode which is claimed to be incredible (for example, because of inherent physical properties) shall be justified using the procedure defined in Annex C. Inherent fail-safety may also be used for certain functions within Composite and Reactive fail-safe systems, for example to ensure independence between items, or to enforce shut-down if a hazardous fault is detected.

Whichever technique or combination of techniques is used, assurance that no single random hardware component failure mode is hazardous shall be demonstrated using appropriate structured analysis methods. The component failure modes to be considered in the analysis shall be identified using the procedures defined in Annex C.

In systems containing more than one item whose simultaneous malfunction could be hazardous, independence between items is a mandatory precondition for safety concerning single faults. Appropriate rules or guidelines shall be fulfilled to ensure this independence. The measures taken shall be effective for the whole life-cycle of the system. In addition, the system/sub-system design shall be arranged to minimise the potentially hazardous consequences of loss of independence caused by, for example, a systematic design fault, should one exist.


 Detection of single faults
A first fault (single fault) which could be hazardous, either alone or if combined with a second fault, shall be detected and a safe state enforced (i.e.: negated) in a time sufficiently short to fulfill the specified quantified safety target. Demonstration of this shall be achieved by a combination of Failure Modes and Effects Analysis (FMEA) and quantified assessment of Random Failure Integrity.

In the case of Composite fail-safety, this requirement means that a first fault shall be detected, and a safe state enforced, in a time sufficiently short to ensure that the risk of a second fault occurring during the detection-plus-negation time is smaller than the specified probabilistic target. In the case of Reactive fail-safety, this requirement means that the maximum total time taken for detection-plus-negation shall not exceed the specified limit for the duration of a transient, potentially hazardous, condition.
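For Composite fail-safety, the risk of a co-incident second fault can be approximated as the channel failure rate multiplied by the detection-plus-negation window (valid when the product is much less than 1). The figures below are illustrative, not values from the standard:

```python
def second_fault_risk(failure_rate_per_hour, negation_time_s):
    """Approximate probability that a second, independent channel fails
    during the detection-plus-negation window of the first fault.
    Uses the linear approximation p ~ rate * time, valid for
    rate * time << 1."""
    return failure_rate_per_hour * (negation_time_s / 3600.0)

# hypothetical figures: channel failure rate 1e-5 /h, 2 s to detect and negate
risk = second_fault_risk(1e-5, 2.0)
print(risk < 1e-8)  # compare against a specified probabilistic target
```

The same arithmetic, run in reverse, gives the maximum detection-plus-negation time that a given failure rate and probabilistic target permit.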

Effects of multiple faults
A multiple fault (for example, a double or triple fault) which could be hazardous, either directly or if combined with a further fault, shall be detected and a safe state enforced (i.e.: negated) in a time sufficiently short to fulfill the specified safety target. A suitable method, for example Fault Tree Analysis (FTA), shall be used to demonstrate the effects of multiple faults. The techniques used to achieve detection-plus-negation of multiple faults within the permitted time shall be shown, including supporting calculations.

Saturday, September 3, 2011

ISSUES IN TPWS (ETCS-LEVEL 1) OPERATIONS ON SOUTHERN RAILWAY


Background: TPWS (Train Protection and Warning System) is the term used by Indian Railways; it applies to the ETCS Level 1 concepts and the UIC/UNISIG specifications. It does not, in the Indian context, refer to the UK implementation, which is based on different technology.


The TPWS project on Southern Railway, installed in the Chennai Central/Chennai Beach – Gummidipundi section of Chennai division, was commissioned on 2nd May 2008 on 4 EMU rakes to begin with. The works on the balance 37 rakes were progressively completed over the next few months. Presently all 41 rakes proposed to be provided with TPWS on-board equipment are functional. The TPWS track-side equipment in the section was fully provided, commissioned and made functional right from the date of commissioning.

Problems: This TPWS project, based on the European Train Control System (ETCS) Level 1, faced many hurdles during the initial installation, prototype testing, and obtaining the required clearances from RDSO and CRS. The major problems noticed during initial revenue service included:
1. On-Board system not booting.
2. On-Board system going into System failure (SF) during booting.
3. SDMI ( Simplified Driver Machine Interface) going blank.
4. Speed display bouncing on the SDMI leading to braking.
5. Brake application in the rear non-driving motor coach on run.

Corrective Actions Taken by Railways: 
1. Intermittent BTM failure: Analysis revealed an antenna impedance mismatch; the standing wave ratio (SWR) was found to exceed the tolerance limit of 1.2 to 1.4. Interference from EMI was also suspected, and there was a problem in communication between the on-board computer (OBC) and the BTM. The corrective actions for these problems included modifying the existing antenna protection cover and providing copper-braided shields for the Tx-Rx cable between the antenna and the BTM, and for the COTDL and PROFIBUS cables between the OBC and the BTM. The BTM configuration files were also modified based on some internal parameters.

2. Error in Train Interface Unit: Analysis revealed a problem in communication between some modules of the OBC; screened twisted-pair cables have now been introduced to protect the signals from external noise and EMI.


3. Error in Speed Sensor: - To improve the performance of the Odometric system, the signal cables between OBC and speed sensors have been provided with copper braided shield firmly connected to the coach body. To suppress the noise in the 110V DC voltage derived from the motor coach battery, a filter has been provided at the input point of the OBC. The traction control relay has been shifted outside the OBC cubicle to reduce EMI. To improve earthing of the motor coach body, a 50 sq mm copper cable is to be connected between the EMU body and its bogie.

4. Back EMF from the EB & EP relay coils: To overcome this problem, the relay coils and EB valve solenoid coils are to be terminated with 180/200V MOVRs, and the bodies of the EB & SB relays are to be firmly connected to the coach body.

5. EB application in rear coach: To overcome the problem of application of the EB in the rear coach while running, the brake interface circuit has been modified to bypass the EB when the TPWS system is in sleeping mode (SM), i.e. when the cab is not the driving one.

6. SDMI Blanking:- To overcome the problem of SDMI blanking, its software has been upgraded. Apart from this, the OBC-SDMI communication cable connector cover which was earlier plastic has been changed to metallic. The OBC-SDMI communication cable and the SDMI power supply cable have been shielded with copper braids firmly connected to the coach body. A filter has been provided at the 110 VDC input point of the SDMI to suppress the ripples in the power supply.

(Source: IRSTE)

Thursday, September 1, 2011

India advances high-speed studies


Indian Railway Construction Company (Ircon) has appointed Mott MacDonald to carry out a pre-feasibility study on a 993km high-speed line from Delhi to Agra, Lucknow, Varanasi, and Patna.
Mott MacDonald will identify key issues for the development of the project including environmental impacts and assessment of viable technologies. It will also analyse operational and business requirements, including ridership, capital cost, cost-benefit analysis, and development of a planning and implementation schedule.
The project is part of the Indian Government's Vision 2020 long-term national development plan, which envisages four high-speed projects in separate areas of the country, all implemented as public-private partnerships.
A pre-feasibility study on the Ahmedabad - Mumbai - Pune route was recently presented to the Railways Board, and puts the cost of this 634km line at Rs 560bn ($US 12.7bn). Western Railway says trains will operate at up to 350km/h to provide an Ahmedabad - Mumbai journey time of around two hours, compared with 7h 5min by Shatabdi train at present.
Originally these proposals covered only the 555km Mumbai - Ahmedabad section, although the Maharashtra government has lobbied for Pune to be included.