The courses below focus on the design of reliable and fault-tolerant systems.
CS 425/ECE 428: Distributed Systems
Covers topics needed for a basic understanding of distributed computer systems: protocols, specification techniques, global states and their determination, reliable broadcast, transactions and commitment, security, and real-time systems. Prerequisite is CS 241. Available Fall 2006.
CS 536/ECE 542: Design of Fault-Tolerant Digital Systems
This course introduces a system-level (hardware and software) view of design issues in reliable computing. The material covers a broad spectrum of hardware and software error detection and recovery techniques. The lectures discuss how these techniques interact; e.g., which techniques can be provided in the hardware, operating system, and network communication layers, and which can be provided via a distributed software layer or in the application itself. Prerequisite is ECE 411 or equivalent.
ECE 543: Digital Systems Testing and Design for Testability
This course teaches the fundamentals of testing theory and practice for complex VLSI designs. The objective is to give students the ability to solve a wide range of non-trivial testing problems using practical and cost-effective techniques. Students will also learn to create their own test automation tools. Topics covered include fault modeling, fault simulation, automatic test generation for combinational and sequential circuits, functional testing of microprocessors, ALUs, and memories, design for testability, synthesis for testability, and built-in self-test and diagnosis. Prerequisites are ECE 411 and ECE 462 or equivalent. Available Spring 2007.
ECE 584: IC Reliability Engineering
This course describes the algorithms and procedures required to study the reliability of integrated circuit products. It covers reliability modeling, physical causes of semiconductor device failure, reliability model development and calibration, model-based reliability prediction, product testing and measurement, and failure diagnosis. Coverage emphasizes application to integrated circuit technology.
ECE 586CH: Coding Approaches to Reliable System Design
This course describes systematic and integrated approaches to the design and implementation of fault-tolerant combinational circuits and dynamic systems. The course blends techniques from coding theory, complexity theory, digital design, and control, automata, and system theory. The study of fault tolerance in systems that evolve over time follows a unifying approach that exposes the similarities between coding for reliable communication and coding for reliable computation. The course discusses several examples of systems of interest, including finite-state machines, discrete event systems, digital signal processing filters, and cellular automata. An introduction to the basic objectives and techniques of coding and of design for fault diagnosis and fault tolerance is provided.