CS 448/548: Survivable Systems and Networks
This page is ALWAYS under construction!!!
Welcome to CS448/548 Survivable Systems and Networks.
This course is offered in the Spring Semester 2011 at the
University of Idaho.
The course is taught by
Dr. Axel Krings.
The web site used the last time the course was taught can be viewed
here,
but be aware that each semester the format and material will change
to reflect the dynamic behavior of the research area.
This web-page
contains information about the course, e.g. syllabus, class notes, pointers
to interesting places etc.
Material can be downloaded in pdf and/or postscript format, and will be made
available in updated form as the class goes on.
If you have comments, please let me know.
Imagine what would happen if our critical infrastructures were to be compromised by malicious acts -- failure of communications, power, water, gas, banking & finance, emergency services etc. With increasing computer security concerns and the recognition of the vulnerability of our critical infrastructure to cyber terrorism, achieving Survivability of Systems under attack is vital in computing and networked systems, whether it is the systems themselves or the critical applications or infrastructures they control.
This course will focus on malicious acts and other faults and their impacts on systems, as well as techniques useful in the design of systems that can survive such acts. Survivability goes beyond computer & network security or fault-tolerance. The range of threats to survivability that must be considered is enormous, including hardware malfunctions, software flaws, environmental hazards, and malicious and accidental human acts.
But can one really design systems that can survive attacks and tolerate intrusions? You would be surprised to find out that there is an entire research area that deals with exactly that. Don't think of your laptop becoming invincible (no James Bond scenarios here). Think bigger: think of models that help analyze systems, model reliability, identify essential services, and explore the limits of redundancy. Think of what kind of faults or attack scenarios those systems may be subjected to. Now tap into the vast number of tools and solutions that exist, including agreement algorithms, N-version & N-variant software, new Hybrid Fault Models, new analysis approaches etc., and start designing your system!
Course description:
This course discusses issues of Survivability, Attributes of System
Survivability, Trustworthiness, Dependability and Assurance, Threats to
Survivability, Threats to Security, Threats to Reliability, Threats to
Performance, Requirements and Their Interdependence, Systemic Inadequacies,
Approaches for Overcoming Deficiencies, Evaluation Criteria, Attempts
at Standardization, Architectures for Survivability, Implementing and Configuring
for Survivability.
A wealth of literature has surfaced that deals with issues of system
survivability.
This class will be taught in several phases in which material
will be presented by the instructor and literature will be reviewed by
individual or groups of students.
The results will be individual and group
presentations as well as discussions of contemporary issues.
The exact list of topics and the class format are not final; both are a work in progress.
- Contact information:
- Axel Krings (PhD), JEB B30,
- Phone: 208-885-4078, fax: 208-885-9052.
- Engineering outreach students: dial toll free 800-824-2889 ext 4078
- Mailing address: Engineering Outreach, PO Box 441014,
Moscow, Idaho 83844-1014.
- Office Hours:
(see here)
- MWF 11:30-12:20 room JEB 026.
- Class Forum
- To access the class forum, first go to the CS main site and select "Forums".
(Yes, you may get a certificate warning message, but just ignore it.)
Next, you will be asked to log in using your CS login name and your CS password. You can then select the forum you would like to join.
If you don't have a CS account (this may apply to EO students), please see the FAQ.
- Spring 2011 Term Class Handouts:
- The handouts are ordered by sequence numbers, and the material covered in each lecture
is indicated next to the date.
- If there are any problems with accessing the handouts,
please let me know (email, phone, smoke signals, drums, ...)!
- Corrections: some slides may contain formatting errors, typos etc.
which have been addressed in class, but have not been reflected
in the notes posted here.
- Course syllabus: to be discussed in class.
- Lecture Support Material: Note that this represents only a subset of the issues presented in class!
Whereas the information below gives the general schedule of the lectures,
it does not always indicate the specific approaches, methods, mechanisms, basic concepts and building blocks.
These are derived using the reading assignments as "case studies"; the concepts are introduced as we discuss the papers.
Note that we will stretch out the material of the first few
classes in order to address background issues raised during
the presentation of the papers. This will help especially
students that have not taken computer security and fault-tolerant systems.
However, please do not confuse hand-waving with in-depth knowledge!
- Lecture 1 (01/12/11): [1/1-1/03]
Sequence 1 (pdf):
Introduction, Fault-tolerance primer, based on Reading assignment 1
- Lecture 2 (01/14/11): [1/4-1/07]
Sequence 2 (pdf):
Cont. Fault-tolerance primer, Standard Definitions, Assumptions and their Limitations, [Reading assignments 2 & 3]
- M.L. King Day (01/17/11): no class
- Lecture 3 (01/19/11): [1/8-2/05]
Sequence 3 (pdf):
Discussion of Ellison paper
- Lecture 4 (01/21/11): [2/06-3/09]
Sequence 4 (pdf):
Survivability Life Cycle, [Reading assignment 4]
- Lecture 5 (01/24/11): [3/10-3/16]
(I added some more slides to sequence 4 -- sorry :)
- Lecture 6 (01/26/11): [3/17-4/17]
Sequence 5 (pdf):
Survivable Network Analysis Method
- Lecture 7 (01/28/11): [4/18-5/07]
Sequence 6 (pdf):
Survivable Network System Analysis cont.,
A Case Study in Survivable Network System Analysis cont.
- Lecture 8 (01/31/11): [5/08-5/19]
Preliminary discussion on Fault Models [Reading assignment 5],
A Case Study in Survivable Network System Analysis cont.
- Lecture 9 (02/02/11): [5/20-6/19]
Sequence 7 (pdf):
SNA applications
- Lecture 10 (02/04/11): [7/01-8/11]
Sequence 8 (pdf):
Preparing to deal with and define faults (no matter what their origin): definitions
- Lecture 11 (02/07/11): [8/12-9/04]
Sequence 9 (pdf):
CS548 project discussion, Agreement Algorithms, basics
- Lecture 12 (02/09/11): [9/05-9/20]
Agreement Algorithms cont.
- Lecture 13 (02/11/11): [9/21-9/34]
Sequence 10 (pdf):
Hybrid Fault Models, [Based on the material in Reading Assignment 6]
- Lecture 14 (02/14/11): [10/01-10/10]
Hybrid Fault Models, cont.
- Lecture 15 (02/16/11): [10/10-11/07]
Sequence 11 (pdf):
Surviving Attacks and Intrusions: What can we Learn from Fault Models,
[Based on the material in Reading Assignment 7]
- Lecture 16 (02/18/11): [11/08-11/14]
Surviving Attacks and Intrusions: What do we need - what can we expect?
- Presidents' Day (02/21/11): no class
- Lecture 17 (02/23/11): [11/14-11/26]
Sequence 12 (pdf):
Basic Concepts and Taxonomy of Dependable and Secure Computing
[Reading Assignment 8]
- Lecture 18 (02/25/11): [11/27-12/14]
Concepts and Taxonomy of Dependable and Secure Computing
- Lecture 19 (02/28/11): [12/15-12/32]
Continuation of discussion on Reading Assignment 8, [Reading Assignment 9]
- Lecture 20 (03/02/11): [12/33-13/06]
Sequence 13 (pdf):
Recognition: Dealing with patterns.
- Lecture 21 (03/04/11): [13/07-14/08]
Sequence 14 (pdf):
Modeling background information, Markov chain basics. You might want to check out the Markov chain notes on the CS449 website
- Lecture 22 (03/07/11): [14/09-15/09]
Sequence 15 (pdf):
Markov Analysis of Software Specifications
- Lecture 23 (03/09/11): [15/10-15/22]
Markov analysis cont., [Reading Assignment 10]
- Lecture 24 (03/11/11): [16/01-16/10]
Sequence 16 (pdf):
Decentralizing services, Real-time attack recognition, [Ask questions for exam 1]
- Spring Break (03/14-18/11): no class
- Lecture 25 (03/21/11): [16/11-16/25]
Real-time attack recognition & recovery cont.
- Lecture 26 (03/23/11): [16/26-17/09]
Sequence 17 (pdf):
Decentralized Services: case study background: RAID (note: this will be only a brief outline of the material) [Reading Assignment 11]
- Lecture 27 (03/25/11): [17/10-18/10]
Sequence 18 (pdf):
Decentralized Services: case study Survivable Storage [Reading Assignment 12]
- Lecture 28 (03/28/11): class canceled
- Lecture 29 (03/30/11): [18/11-18/18]
Decentralized Services: case study Survivable Storage cont., Exam discussion
- Lecture 30 (04/01/11): [18/19-19/05]
Sequence 19 (pdf):
How to share a secret (derivation on board), [Reading Assignment 13]
- Lecture 31 (04/04/11): [19/05-20/06]
Sequence 20 (pdf):
Decentralized Services: case study SITAR, [Reading Assignment 14]
- Lecture 32 (04/06/11): [20/07-20/19]
SITAR cont., basic concepts of intrusion-tolerant systems,
Decentralized Services: case study Multi-variant Execution Models, [Reading Assignment 15]
- Lecture 33 (04/08/11): [20/20-21/04]
Sequence 21 (pdf):
Case study: Survivability architecture. Concepts:
N-version and N-variant executions
- Lecture 34 (04/11/11): [21/05-21/21]
N-variant executions, Petri-Nets, Stochastic Activity Networks, Probabilistic Automaton
- Lecture 35 (04/13/11): [21/22-21/28]
Discussion on modeling systems; example Markov chain and Petri net models of a simple system.
Check out the slide sequences on Petri Nets on the CS449/549 class website.
- Lecture 36 (04/15/11): [21/29-21/34]
examples: Petri Net, SAN, probabilistic automaton, and how to model a system to get an idea about its inherent survivability.
- Lecture 37 (04/18/11): [21/35-22/01]
Sequence 22 (pdf):
Survivability Quantification, Markov Models, [Reading Assignment 16]
- Lecture 38 (04/20/11): [22/02-22/10]
Survivability quantification, case study telephone system, analysis using common survivability definitions,
Performance model, Availability model, Composite model
- Lecture 39 (04/22/11): [22/11-22/17]
Dealing with the T1A1.2 survivability definition with unknown failure rates.
- Exam 2 (04/22/11): handed out. It is due Monday at class time. Bring a hard copy to class and email the pdf as described in the exam.
EO students, make arrangements after you have heard lecture 39. I will email you the exam and you will have 3 days to return it.
- Lecture 40 (04/25/11): [23/01-23/02]
Sequence 23 (pdf):
Design Methodology for Survivable Systems, IntelliDrive application domain
[Reading Assignment 17]
- Lecture 41 (04/27/11): [23/03-23/19]
Sequence 24 (pdf):
Design methodology for Survivable Systems, Real-time Monitoring, certifying and more.
- Lecture 42 (04/29/11): [23/20-24/13]
Sequence 25 (pdf):
Result certification and survivability of large computations [Reading Assignment 18]
- Lecture 43 (05/02/11): [24/14-24/29]
Sequence 26 (pdf):
Risk Basics
- Lecture 44 (05/04/11): [25/01-27/04]
Sequence 27 (pdf):
SP800-30 Risk Management Guide, Risk Management or Risk Analysis?
- Lecture 45 (05/06/11): [27/05-28/28]
Sequence 28 (pdf):
Risk Case Study: Firewall
- Final exam (Spring 2011 Final Exam Schedule)
- Reading Assignments (so far):
- Note: besides the reading assignments below there are references to papers in the slides. These papers should be looked at as well!
- 1) Fault-Tolerant Computing: Fundamental Concepts, by Victor P. Nelson *
- 2) Survivable Network Systems: An Emerging Discipline, by R. J. Ellison, D. A. Fisher, R. C. Linger, H. F. Lipson, T. Longstaff, N. R. Mead
(CMU-report-97tr013.pdf)
- 3) Survivable Network Analysis Method, by Nancy R. Mead, Robert J. Ellison, Richard C. Linger, Thomas Longstaff, John McHugh
(CMU-report-00tr013.pdf)
Note that this includes the previous report. Our focus will be on the material starting with chapter 3.
- 4) (CMU-report-98tr014.pdf)
- 5) The Byzantine Generals Problem, by Leslie Lamport, Robert Shostak and Marshall Pease,
ACM Transactions on Programming Languages and Systems, Volume 4, Issue 3, (July 1982).
This paper is mainly for students who have not taken CS449/549
and will bring them up to speed on topics related to fault models.
We will discuss their limitations in hostile environments later.
- 6) Thambidurai, P., and You-Keun Park, "Interactive Consistency with Multiple Failure Modes",
Reliable Distributed Systems, 10-12 Oct 1988, pp. 93-100.
Also look at the follow-up paper "Verification of Hybrid Byzantine Agreement Under Link Faults"
by P. Lincoln and J. Rushby that addresses a problem in the algorithm of Thambidurai and Park.
- 7) Axel Krings, and Zhanshan (Sam) Ma, "Surviving Attacks and Intrusions: What can we Learn from Fault Models",
Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS-42), Waikoloa, Big Island, Hawaii, January 5-8, 2009.
- 8) Basic Concepts and Taxonomy of Dependable and Secure Computing, Algirdas Avizienis, Jean-Claude Laprie, Brian Randell, and Carl Landwehr,
IEEE Transactions on Dependable and Secure Computing, Vol. 1, No. 1, January-March 2004
- 9) [Whi93] Whittaker James A., and J.H. Poore, Markov Analysis of Software Specifications,
ACM Transactions on Software Engineering and Methodology, Vol.2, No.1,
January 1993, pp. 93-106. (get from web)
- 10) A Two-Layer Approach to Survivability of Networked Computing Systems, Krings A.W., et al.
(pdf)
- 11) Patterson, D.A., et al., "A Case for Redundant Arrays of Inexpensive Disks (RAID)",
ACM SIGMOD Record, International Conference on Management of Data, Vol. 17, No. 3, pp. 109-116, June 1988.
Note: this is only a background paper (to be read keeping the date in mind).
- 12) Survivable Storage, CMU Tech. Report CMU-CS-01-120.
Also look at "Decentralized Recovery for Survivable Storage Systems", Theodore Ming-Tao Wong, May 2004, CMU-CS-04-119
- 13) Adi Shamir, "How to Share a Secret", Communications of the ACM, Vol. 22, No. 11, November 1979.
- 14) SITAR: A Scalable Intrusion-Tolerant Architecture for Distributed Services,
by Feiyi Wang, Fengmin Gong, Chandramouli Sargor, Katerina Goseva-Popstojanova, Kishor Trivedi, Frank Jou,
Proc. 2001 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 5-6 June, 2001
- 15) An Adaptive N-variant Software Architecture for Multi-Core Platforms: Models and Performance Analysis,
by Li Tan and Axel Krings, Proc. 11th Intl. Conference on Computational Science and its Applications (ICCSA 2011), June 20-23, 2011.
(*)
- 16) A General Framework for Network Survivability Quantification, by Y. Liu and Kishor Trivedi, Proc. 12th GI/ITG MMB, 2004.
- 17) John Munson, Axel Krings and Robert Hiromoto, The Architecture of a Reliable Software Monitoring System for Embedded Software Systems,
(pdf)
- 18) Krings Axel, Jean-Louis Roch, Samir Jafar and Sebastien Varrette,
"A Probabilistic Approach for Task and Result Certification of Large-scale Distributed Applications in Hostile Environments",
Proc. European Grid Conference (EGC2005), in LNCS 3470, Springer Verlag, February 14-16, 2005.
(pdf)
- Assignments (so far):
- Pointers to Research:
- needs to be cleaned up :-)
- DDoS issues
- Peter Neumann
- Survivability/Dependability Groups/Projects
- Critical Infrastructure Protection
- Groups/Reporting/Advisories:
- Interesting Links