Fault-Tolerant Systems (CS449/549)

Welcome to Fault-Tolerant Systems CS449/549, which is offered in the Spring Semester 2019 at the University of Idaho in Moscow and is also available through Engineering Outreach for off-campus students.

This web page contains information about the course, e.g. the syllabus, class notes, and pointers to interesting places. Material can be downloaded in PDF (or PostScript) format and will be made available in updated form as the class goes on. To get an idea of what this class is about, take a look at last time's page. However, materials and topics constantly change, and this class will be no exception. If you have comments, please let me know.

Engineering Outreach students: there are several things you should know. First, to contact me, you can call 800-824-2889 ext. 4078 (toll free). Please download the class material from the web page; this speeds up distribution and avoids shipping delays. Several assignments require access to local simulation tools. Engineering Outreach students need web access with ssh capability in order to use this software. Accounts on local workstations will be made available. I will talk more about this when the time comes.

Course description: This course addresses the design, modeling, analysis, and integration of hardware and software to achieve dependable computing systems employing on-line fault tolerance. It covers the concepts and terminology of Fault-Tolerant System Design, including: Reliability, Dependability, Maintainability, Redundancy, Error Detection, Damage Confinement, Error Recovery, Fault Treatment, Redundancy Management, Voting, Information Redundancy, Random Variables, cdf, pdf, Expectation, Bathtub Curve, MTTF, Reliability of Series/Parallel Systems, Stand-by Redundancy, M-of-N System, Reliability Block Diagrams, Fault Trees, Markov Process, Petri Nets, Generalized Stochastic Petri Nets, Recovery Strategies, Roll-back Recovery, Agreement and Consensus, Byzantine Clock Synchronisation, RAID, Fail-Stop Processes, Systems Diagnosis, and Case Studies. I always change the material slightly to account for interesting changes in the field.

Note: This class has a prerequisite of Computer Operating Systems (CS240) or permission of the instructor. In a 400/500-level computer science class, I expect working knowledge of Unix and MS operating systems.

  • Class Handouts:
  • Reading Assignments (so far):
  • Homeworks/Exams:

  • Special thanks to Dr. Roger Kieckhafer (MTU) for contributions to the material used in this class.