Fault-Tolerant Systems (CS449/549)

Welcome to Fault-Tolerant Systems CS449/549. This course is offered in the Fall Semester 2005 at the University of Idaho in Moscow and is also available through Engineering Outreach for off-campus students. The course is taught by Dr. Axel Krings.

This web page contains information about the course, e.g. the syllabus, class notes, and pointers to interesting places. Material can be downloaded in PDF (or PostScript) format and will be made available in updated form as the class goes on. To get an idea of what this class is about, take a look at last semester's page. However, materials and topics constantly change, and this class will be no exception. If you have comments, please let me know.

Engineering Outreach students: there are several things you should know. First of all, if you are trying to contact me, you can call 800-824-2889 ext. 4078 (toll free). Please download the class material from the web page. This speeds up the distribution process and avoids shipping delays. If you do not have a PDF viewer, you can get one free at Adobe; if you need a PostScript viewer, check out the Aladdin viewer. If for some reason you are not able to download the material, please contact Engineering Outreach. Several assignments require access to local simulation tools. Engineering Outreach students need web access with telnet capability in order to use this software. Accounts on local workstations will be made available.

Course description: This course addresses the design, modelling, analysis, and integration of hardware and software to achieve dependable computing systems employing on-line fault tolerance. It covers the concepts and terminology of Fault-Tolerant System Design, including: Reliability, Dependability, Maintainability, Redundancy, Error Detection, Damage Confinement, Error Recovery, Fault Treatment, Redundancy Management, Voting, Information Redundancy, Random Variables, cdf, pdf, Expectation, Bathtub Curve, MTTF, Reliability of Series/Parallel Systems, Stand-by Redundancy, M-of-N Systems, Reliability Block Diagrams, Fault Trees, Markov Processes, Petri Nets, Generalized Stochastic Petri Nets, Recovery Strategies, Roll-back Recovery, Agreement and Consensus, Byzantine Clock Synchronisation, RAID, Fail-Stop Processes, Systems Diagnosis, and Case Studies. I always change the material slightly to account for interesting developments in the field.

Note: This class has a prerequisite of Computer Organisation and Architecture (CS245) or permission of the instructor. In a 400/500-level computer science class, I expect a working knowledge of Unix and MS operating systems.
