COPYRIGHT NOTICE. COPYRIGHT 2008-2011 by Clinton Jeffery. For use only by the University of Idaho CS 384 class.
Team that requested to stay on Mercurial on Google Code: please give me specific instructions for checking out your project.
Teams already have a top-level project directory (phunctional/ and pummel/). They should contain src/, doc/, and web/ directories.
Goals:
Potential problems. You may need to...
Bidirectional one-to-one associations introduce a mutual dependency. You can almost just use two pointers, but be careful with the methods that establish and break such associations, to keep the two ends consistent. Things get interesting when multiplicity >1 is involved.
A key point is the methods that establish (and break) the objects' participation in a given association. In Figure 10-10, examine and understand the dovetailing between removeAccount in class Advertiser and setOwner in class Account.
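Here is a minimal sketch of that dovetailing in Java, using the Figure 10-10 class names; the method bodies are my illustration of the idea, not Bruegge's exact code:

import java.util.HashSet;
import java.util.Set;

// Sketch of keeping both ends of a bidirectional association consistent.
class Advertiser {
    private final Set<Account> accounts = new HashSet<Account>();

    // Breaking the link must update BOTH objects.
    public void removeAccount(Account a) {
        if (accounts.remove(a)) {
            a.setOwner(null);           // dovetails with setOwner below
        }
    }

    // Package-private, so clients cannot update one side alone.
    void addAccountInternal(Account a) {
        accounts.add(a);
    }
}

class Account {
    private Advertiser owner;

    public void setOwner(Advertiser newOwner) {
        if (owner == newOwner) {
            return;                     // already consistent
        }
        if (owner != null) {
            owner.removeAccount(this);  // break the old link first
        }
        owner = newOwner;
        if (newOwner != null) {
            newOwner.addAccountInternal(this);
        }
    }
}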
Anita Borg believed that technology affects all aspects of our economic, political, social and personal lives. A technology rebel with a cause, she fought tirelessly throughout her life to ensure that technology's impact would be a positive one. It was this vision that inspired Anita in 1997 to found the Institute for Women and Technology. Today this organization carries on her legacy and bears her name, the Anita Borg Institute for Women and Technology (www.anitaborg.org).
Dr. Anita Borg devoted her adult life to revolutionizing the way we think about technology and dismantling barriers that keep women and minorities from entering computing and technology fields. Her combination of technical expertise and fearless vision continues to inspire and motivate countless women to become active participants and leaders in creating technology.
In her honor, Google is proud to support women in technology with the 2012 Google Anita Borg Memorial Scholarship. Google hopes to encourage women to excel in computing and technology and become active role models and leaders in the field.
Google Anita Borg Scholarship recipients will each receive a $10,000 award for the 2012-2013 academic year. A group of female undergraduate and graduate students will be chosen from the applicant pool, and scholarships will be awarded based on the strength of each candidate's academic background and demonstrated leadership. All scholarship recipients and finalists will be invited to attend the Annual Google Scholars' Retreat in Mountain View, California in 2012.
You can hear from some of this year's Anita Borg Scholars on how receiving the scholarship has impacted them.
The Annual Google Scholars' Retreat at the Googleplex
We know how important a supportive peer network can be for a student's success. All Google scholarship recipients and finalists will be invited to visit Google headquarters in Mountain View, California for the 2012 Google Scholars' Retreat. The retreat will include workshops, speakers, panelists, breakout sessions and social activities scheduled over a 3-day period. Students will have the opportunity to explore the Googleplex and enjoy the San Francisco Bay Area as they get to know other talented computer science students from across the country.
The Retreat
Who can apply?
Applicants must satisfy all of the following criteria to be eligible:
The Google Anita Borg Memorial Scholarship is a global program. If you are a student who will not be enrolled at a university in the United States for the 2012-2013 academic year, please visit the Google Scholarships Homepage to learn more about our scholarship opportunities for students around the world.
Application process
Please complete the online application. If you are a first time user, you will need to register; if you are already registered, simply login!
You will also be asked to submit electronic versions of your resume, essay responses, transcripts, and the names and email addresses of your referrers. Please scan your transcripts and enrollment confirmation into electronic format (PDF preferred for all requested documents).
Deadline to apply: Monday, February 6, 2012.
Questions? Visit the Frequently Asked Questions page (FAQ) or email us at anitaborgscholarship@google.com.
Thoughts from Wray's "How Pair Programming Really Works"
Perhaps instead of moving it ALL to 383, we should view this section as adding depth to whatever we have said and done about this topic so far. Managing a software project includes:
A Gantt chart is a list of activities in mostly-chronological order, side-by-side with an array of corresponding bar-graph timelines. It is good for smaller projects.
A PERT (Project Evaluation Review Technique) chart is believed to scale better to larger projects; it gives up the linear list-of-activities format in favor of an arbitrary task dependency graph. Each node in the graph is annotated with schedule information. California's example uses the format on the left; Visio provides the more detailed one on the right. PERT charts can be processed computationally (like a spreadsheet), and by applying the durations and dependencies to a particular calendar timeline, the chart can be used to calculate the starts, ends, amount of slack, and critical path through the chart.
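Since PERT charts can be processed computationally, here is a minimal sketch of that computation, with made-up tasks: given durations and dependencies (listed in dependency order), compute each task's earliest finish; the largest finish is the critical path length.

import java.util.*;

public class PertSketch {
    public static void main(String[] args) {
        // Task durations, listed in dependency order (made-up numbers).
        Map<String, Integer> duration = new LinkedHashMap<>();
        duration.put("design", 5);
        duration.put("code", 10);
        duration.put("testplan", 3);
        duration.put("test", 4);

        // Dependencies: a task starts after all its predecessors finish.
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("code", List.of("design"));
        deps.put("testplan", List.of("design"));
        deps.put("test", List.of("code", "testplan"));

        Map<String, Integer> finish = new HashMap<>();
        for (String task : duration.keySet()) {
            int start = 0;
            for (String d : deps.getOrDefault(task, List.of())) {
                start = Math.max(start, finish.get(d));   // wait for predecessors
            }
            finish.put(task, start + duration.get(task));
        }
        // Critical path length = latest finish over all tasks:
        // design(5) -> code(10) -> test(4) = 19.
        System.out.println("project length: " + Collections.max(finish.values()));
    }
}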
Dr. J's take on the scheduling thing (have to try this some time):
Untested code is incorrect code. All code is guilty until tested innocent.
- various internet pundits
Testing is the process of looking for errors (not the process of demonstrating the program is correct). Bruegge gets more subtle, calling testing a matter of looking for differences between the design (expected behavior) and the implementation (observed behavior). Passing a set of tests does not guarantee the program is correct, but failing a set of tests does guarantee the program has bugs.
Testing is best done by someone other than the person who wrote the code. This is because the person who wrote the code would write tests that reflect the assumptions and perspectives they have already made, and cannot be objective.
Kinds of errors include:
Notes from current version of Pummel's docs
Kinds of testing include:
In-class exercise: if we wanted test cases for our semester project, what should be in them?
Is your test plan a product, or a tool? If you are using your test plan to sell your software, e.g. to a company that will use it in-house, they may want an impressive test plan to give them some confidence in your code. If you are making a product that requires a government or standards-organization approval, you may have to meet their standards. Otherwise...
A test plan is a valuable tool to the extent that it helps you manage your testing project and find bugs. Beyond that it is a diversion of resources. As a practical tool, instead of a product, your test documentation should:
- from [Kaner et al]
Keep in mind that test plans are like other software documentation: they are dynamic in nature and must be kept up to date. Therefore, they will have revision numbers. You may want to include author and contact information, including the revision history, as part of either the identifier section or as part of the introduction.
You may want to include any references to other plans, documents or items that contain information relevant to this project/process. If preferable, you can create a references section to contain all reference documents.
Identify the scope of the plan in relation to the software project plan it relates to. Other items may include resource and budget constraints, the scope of the testing effort, how testing relates to other evaluation activities (analysis and reviews), and possibly the process to be used for change control and for communication and coordination of key activities.
As this is the "Executive Summary" keep information brief and to the point.
Sprint Monday == Sprint Meeting
This can be controlled and defined by your local Configuration Management
(CM) process if you have one. This information includes version numbers,
configuration requirements where needed (especially if multiple versions of
the product are supported). It may also include key delivery schedule issues
for critical elements.
Remember, what you are testing is what you intend to deliver to the Client.
This section can be oriented to the level of the test plan. For higher
levels it may be by application or functional area, for lower levels it may
be by program, unit, module or build.
The past history of defects (bugs) discovered during Unit testing will help
identify potential areas within the software that are risky. If the unit
testing discovered a large number of defects or a tendency towards defects
in a particular area of the software, this is an indication of potential
future problems. It is the nature of defects to cluster and clump
together. An area that was defect-ridden earlier will most likely continue to
be defect-prone.
One good approach to define where the risks are is to have several
brainstorming sessions.
Set the level of risk for each feature. Use a simple rating scale such as
(H, M, L): High, Medium and Low. These types of levels are understandable to
a User. You should be prepared to discuss why a particular level was chosen.
It should be noted that Section 4 and Section 6 are very similar. The only
true difference is the point of view. Section 4 is a technical type
description including version numbers and other technical information and
Section 6 is from the User's viewpoint. Users do not understand technical
software terminology; they understand functions and processes as they relate
to their jobs.
This is a listing of what is NOT to be tested, from both the USERS' viewpoint of what the system does and a configuration management/version control view. This is not a technical description of the software, but a USERS' view of the functions.
Identify WHY the feature is not to be tested; there can be any number of reasons.
Specify if there are special requirements for the testing.
If the number or type of defects reaches a point where the follow-on testing
has no value, it makes no sense to continue the test; you are just wasting
resources.
Specify what constitutes stoppage for a test or series of tests and what is
the acceptable level of defects that will allow the testing to proceed past
the defects.
Testing after a truly fatal error will generate conditions that may be
identified as defects but are in fact ghost errors caused by the earlier
defects that were ignored.
If the project is being developed as a multi-party process, this plan may
only cover a portion of the total functions/features. This status needs to
be identified so that those other areas have plans developed for them and to
avoid wasting resources tracking defects that do not relate to this plan.
When a third party is developing the software, this section may contain
descriptions of those test tasks belonging to both the internal groups and
the external groups.
Training for any test tools to be used.
Section 4 and Section 15 also affect this section: what is to be tested and who is responsible for the testing and training.
This issue includes all areas of the plan. Here are some examples:
It is always best to tie all test dates directly to their related
development activity dates. This prevents the test team from being perceived
as the cause of a delay. For example, if system testing is to begin after
delivery of the final build, then system testing begins the day after
delivery. If the delivery is late, system testing starts from the day of
delivery, not on a specific date. This is called dependent or relative
dating.
The important thing to remember is that, if you do nothing at all, the usual
result is that testing is cut back or omitted completely, neither of which
should be an acceptable option.
At the master test plan level, this may be all involved parties.
When determining the approval process, keep in mind who the audience is:
Are there any obvious tools we should be using? If you have a choice between
manually documenting your test cases and adopting a tool for it, what are
your tool options and which would you prefer?
Among the most interesting open source candidates there are
Whole books have been written about methods of writing good tests, much of
which boils down to: write tests to challenge the boundary conditions and
assumptions that programmers typically make when writing code.
There are at least two useful kinds of coverage: statement coverage
(executing every statement), and path coverage (executing every path
through the code). Statement coverage is not sufficient to catch
all bugs, but path coverage tends to suffer from a combinatorial
explosion of possibilities. Exhaustive path coverage may not be
an option, but some weaker forms of path coverage are useful.
Coverage testing clarification: "all possible paths" is impractical
due to combinatorial explosion. "all nodes" is inadequate because it
misses too much. The right compromise is "cover all edges".
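A tiny made-up Java example of the difference: a single test executes every statement (all nodes) yet never takes the if's false edge, so only edge coverage forces the test that finds the bug.

// The single test f(1, 1) executes every statement below, giving 100%
// statement coverage, but only the true edge of the if is ever taken.
// Edge coverage demands a test like f(0, 2), which exposes the
// divide-by-zero on the false edge.
static int f(int a, int b) {
    int z = 0;
    if (a > 0) {
        z = a;
    }
    return b / z;      // fails whenever the false edge is taken
}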
Example coverage tools:
Hansel is an open source extension to JUnit, based on code developed at
the University of Oregon. It works with bytecode, not source code.
It appears to just do statement coverage. It's not much, but it's free
and it's better than nothing.
COCOMO is acronym-laden, and subject to perpetual tweaking and twisting of its
interpretation. Whatever I give in lecture notes about it will contradict
various COCOMO authoritative sources.
COCOMO starts from an estimate of SLOC (source lines of code). This includes declarations, but no comments, no generated code, and no test drivers. Boehm also refers to KDSI (thousands of delivered source instructions), which appears to be used more or less interchangeably with SLOC.
where EAF is an Effort Adjustment Factor derived from cost drivers,
and E is an exponent derived from the 5 scale drivers. EAF defaults
to 1 and E defaults to 1.0997. But since these are parameters, it is
largely the structure of the equation that matters.
Example: Average? 8KSLOC?
Then effort = 2.94 * (1.0) * (8)^1.0997 = 28.9 person-months
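The same calculation as runnable code, a sanity-check sketch using the two COCOMO equations given with the cost driver table later in these notes:

public class Cocomo {
    public static void main(String[] args) {
        double eaf = 1.0;      // effort adjustment factor (default)
        double e   = 1.0997;   // exponent from the scale drivers (default)
        double ksloc = 8.0;    // the "average" 8 KSLOC example above

        double effort = 2.94 * eaf * Math.pow(ksloc, e);   // person-months
        double months = 2.5 * Math.pow(effort, 0.38);      // calendar months

        System.out.printf("effort = %.1f person-months%n", effort);   // ~28.9
        System.out.printf("time to develop = %.1f months%n", months); // ~9.0
    }
}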
Calculators:
Sum of weights = "function points" of program.
Roles:
State-based testing is harder than it would seem; it is hard to automatically
generate the inputs needed before the test that are to put the system in the
state needed in order to test a given transition.
Questions:
Q: how were you supposed to do a test plan when we hadn't talked about
various kinds of software testing yet, and you'd only been given a
preliminary description of what a test plan is? This kind of thing
occurs over and over again when agile/spiral methods mix in with
traditional IEEE Standard (waterfall-style) documentation expectations.
A: Your job includes: ask questions, read lecture notes,
take a first stab at a "test plan", and refine from there, with feedback.
As for the whole waterfall-documents-in-a-spiral-world thing,
we should iteratively create and refine documents like we
do code, each sprint including coding goals, testing goals, and
documentation goals. At the end of the semester, or preferably sooner,
we should have accomplished the full set of documents and code for the
whole project.
Jeffery's conjecture:
Now, let's take a second pass at Test Plans.
Here is a Test Plan Template
based on LaTeX source.
Other forms of performance testing include volume testing (how does the
system handle big input datasets), security testing (by "tiger teams" ?),
timing tests, recovery tests (e.g. artificially crash the network or
other external resource).
At Microsoft there used to be the mantra:
Why bring it up here and now? Because Dr. J has to be able to read and
understand your code repository in order to grade it. What do I need?
Maybe not literate programming, but a clear, explained codebase.
Dimension 2: direct vs. indirect. Are we measuring things that are
objectively observable, or do we want to put a number on something
more abstract (typically, the *-"ilities")? Exercise: how many "ilities"
can you think of? How would you measure each?
Some "-ities" Bruegge mentions are stability, modularity, maturity.
Suppose one wants to measure some of these, or some of the nice ones
you came up with in class -- how do you put a number on them? If
project A has a "maturity score" of 1.7 and project B has a score of
3.4, does that mean project B is "twice as mature" as project A?
Given two (of many) measurement systems M and M' that we could use to
measure a property of our software, how do we compare them?
If we were comparing a measure in feet (M) with a measure
in meters (M'), there would be some constant c such that M=cM' (for
height, every reasonable measurement unit would be convertible into
every other, so the measurement of height uses a ratio scale type).
We want measure M(x) to preserve relations such that, for example,
M(x) < M(y) is equivalent to some underlying true relation R
between x and y. Don't define a measure unless you understand the
empirical relations (the scale type) for the attribute you are measuring.
I want to measure:
Thought exercise: how do we measure each of these? How much
work will it take?
Of course, eXtreme Programming (XP) advocates do in fact tell you to write
the tests before coding. Our HP advisory board member has been very...
adamant about wanting us to figure out how to do this.
Observation: it is quite possible to write black box tests ahead of time,
but whitebox tests can't be written until there is code. So, are your
"unit tests" black box tests, or whitebox tests, or both?
Corollary: testing can be included in the spiral model, and multiple
iterations of increasing testing go along with multiple iterations of
increasing code functionality.
Potential downsides: rewriting to reduce a particular complexity metric
may just move the complexity around, into unmeasured areas. For example,
one can reduce "cyclomatic complexity" internal to methods by writing
more methods, but does that help?
Strengths: scales well; can be applied to whole programs with about the same
effort as individual functions/methods. Some information-theoretic validity.
May actually give very high level languages their claimed benefit
of being "higher level".
Weaknesses: software science seems to be voodoo. "Volume" and
"potential volume" definitions seem to be just made up numbers.
The program level numbers might not have a stable enough scale type to let
you say that a number of .5 or above is "good" and below .5 is "bad".
Doesn't acknowledge control
or data flow paths as being fundamental to complexity.
(finish discussing Halstead's software science)
McCabe wanted a complexity measure
Note that although the cyclomatic complexity of a whole program is the sum
over all the subroutines and may go very high, McCabe was not worrying about
whole-program complexity; he was only worried about individual routine
complexity. So in applying his measure to our whole system we should be
interested in the maximum, and the distribution, of the individual routines'
complexity, not the sum.
Ez, practical cyclomatic complexity? PMD from sourceforge is said to be
integrated into Netbeans, Eclipse, etc.
lecture 19 starts here
There is a whole field called Formal Methods which deals with constructing
proofs of desired properties of programs. While historically these have
been used only in safety-critical systems such as radiation therapy machines,
or operating systems used in national security and defense hardware...there
is a general trend toward reducing the cost of these methods which seems
likely to end up in the mainstream someday.
The Unicon test suite attempts to validate, in a general way, the major
functions of the Unicon language; it is used by folks who build Unicon
from sources, especially those who build it on a new OS platform. The
unicon/tests/README file divides the testing into categories as follows:
Each subdirectory has a suite of tests and sample data, and a Makefile for
building and running tests. The master test/Makefile automates execution of
the general and posix tests, which are routinely run on new Unicon builds.
The general/ directory contains tests "inherited" from the Icon programming
language (50 files, 5K LOC):
The tests are all run from a script, which looks about like the following.
Each test is run from a for-loop, and its output diff'ed against an
expected output. Some differences are expected, such as
the test which prints out what operating system, version and so forth.
Sample test (diffwrds.icn):
M$ doesn't certify that your program is bug-free, but it may certify that
your program was written using current standards and API's. The large
body of software developers tends to prefer the status quo, while M$ has
good reasons to try and force everyone to migrate to whatever is new and hot.
The last time I noticed much about this, the public rollout to developers
of a new forthcoming version of Windows included lots of talk about a
new look and feel (you had to take advantage of it), and new installer
protocols (you had to register your software in a particular way during
installation so that the control panel would know how to uninstall you).
If you were willing to jump through these relatively simple hoops in support
of the M$ marketing push for their new OS, and then submit your software
(and maybe pay a modest fee), they would certify you as Windows compatible,
and you'd be eligible for subsidizing on your advertising fees as long as
you advertise your M$-compatibility.
The Windows 7 Software Logo Specification document can be downloaded free from
Microsoft; it covers topics such as the following. Much of this was
found in the Windows Vista logo specification document.
Web application certifications:
Phunctional:
Data Classification (CC1 and CC2) - what, you mean software certification
includes certification of the data?! Well, we are used to some data being
checked. Baselines, traceability, change control, change review, unauthorized
change protection, release of information...
How much independence is required during certification? Depending on your
level, some objectives may require external measurement, some may require
thorough internal (documented) measurement, and some may be left up to the
discretion of the software developer (e.g. for level "E" stuff).
DO-178B Required Software Verification:
But...there is also the title:
IEEE Computer Society Certified Software Development Professional
and the forthcoming title:
Certified Software Development Associate.
Mostly, the big and expensive test may make you more marketable in a job
search or as an independent software consultant. It is loosely inspired
by the examination systems available for other engineering disciplines.
It covers the SoftWare
Engineering Body of Knowledge (SWEBOK), a big book that sort of says what
should be covered in classes like CS 383/384. Any teacher of such a course
has to pick and choose what they cover, and the test lets you fill in your
gaps and prove that you are not just a Jeffery-product or UI-product, you
know what the IEEE CS thinks you need to know.
There are plenty of fancy commercial Bug Trackers. There
are popular open source ones. Check out
this comparison chart of bug trackers.
Dr. J's observations regarding personnel issues
Extended Static Checker for Java: a local class copy installed at
http://www2.cs.uidaho.edu/~jeffery/courses/384/escjava, but it is
only of rhetorical interest in non-Java project years. There is a copy
of the whole thing as a
.tar.gz file in case you have trouble
downloading from Ireland. My .bashrc for
CS lab machines had to have a couple things added:
In addition to your own prioritized task assignments, by the next sprint:
Consider the CMM levels 1-5, given below. Which ones are recognizable?
Part of your team's grade, not just individuals assigned to the task,
will be based on how your team did on testing, including what kinds and
how much testing can be documented. "Documented" includes: showing results
of test runs, bugs found (and possibly subsequently fixed), scripts that
allow as much as possible of the tests to be rerun automatically (for example,
invoking JUnit or similar), and/or manual how-to-run-"test X" instructions.
You can think of it thus: the milestone checklist primarily identifies
what has been implemented but says nothing about whether it was
implemented well. Testing still doesn't prove correctness or quality,
but it is necessary to have any hope of approaching those goals.
This is an app dominated by its PNG-writer.
Analysis: this result suggests two-thirds of execution time on this application
is spent in interp_0, the virtual machine interpreter's main loop. A lot
of time is also spent dereferencing (following a memory reference (pointer)
to obtain its value), and in type checking and conversion functions. The
program garbage collected 25 times, but apparently without spending any
significant time at it?! Statistical approximation has its pros and cons.
Basic questions:
CSCW tools are sometimes related to CASE
(Computer-Aided Software Engineering) tools. In general, CASE tools do
not have to focus on group interaction, and CSCW tools include many types
of work besides software engineering. A Venn diagram would probably show
a giant CSCW circle with a modest overlap to a much smaller CASE circle.
Is there any difference between "communication tool" and
"computer supported cooperative work tool"?
Microsoft Outlook is a ubiquitous scheduling tool for coordinating folks'
calendars and setting up meetings.
Many open source calendar applications are out there, but
UW Calendar
is probably important, because UW is my alma mater, and
because they seem to deliver major working tools (e.g. pine).
On the receiving end, the person sees a popup window informing them of the
invitation, which they can accept or reject. (What is suboptimal about this
invitation-response user interface?)
Another form of virtual community is the collaborative virtual environment.
I gave a colloquium talk on this topic recently.
Compared with a wiki, a collaborative virtual environment is:
Possible domains: games, education, software engineering, ...
Now, onto the code reviews. Would each team please suggest a source file,
or shall I pick some at random?
The different UNIX
vendors that supported X11 were all using different widget toolkits, so
portability was hard, even amongst Sun vs. HP vs. SGI vs. IBM, etc. The only
reasonable way I found to write for all of them was to write to a lower-level
X11 API
called Xlib. But that wasn't portable enough: Icon ran on lots of platforms
besides just UNIX. An M.S. student reimplemented all my X Windows code (on
the order of 15K LOC, which had doubled the size of the Icon VM) with massive
ifdef's for OS/2, proving the Icon graphics API was portable. But that wasn't
portable enough: we needed MS Windows, which was mostly a knock-off of OS/2.
So we refactored all the ifdef's out and defined a window-system abstraction
layer: a set of C functions and macros that were needed to support the higher
level Icon graphics API.
Graphics portability is work-in-progress. Further refactoring is needed
now to support Cocoa/Objective C native Apple graphics. Refactoring is also
needed to support Direct3D as an alternative
to OpenGL. Unicon's 3D graphics facilities were written in OpenGL by an
undergraduate student, Naomi Martinez, but with the advent of
Windows Vista, Microsoft messed up its OpenGL (probably deliberately)
to the point where it was too slow to be useful on most Windows machines.
The OpenGL code was originally under an #ifdef Graphics3D. One initial
problem was that about half that code was OpenGL-specific and half was
not and could be used by Direct3D. By brute force (defining Graphics3D
but disabling the includes for OpenGL header files), it was possible to
identify those parts of the 3D facilities that would not compile without
OpenGL. One can put all OpenGL code under an additional #ifdef HAVE_LIBGL
(the symbol used in our autoconf(1) script). Just inserting some
#ifdef's does not really accomplish refactoring; refactoring is when you end
up modifying your function set or classes (your API) to accommodate the change.
For example, the typical OO response to a need to become portable is to
split a class into platform-independent parent and platform-specific child.
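A sketch of that split, with made-up names loosely modeled on the window-system abstraction layer described above (not Unicon's actual API):

import java.util.List;

interface Primitive { }                          // illustrative supporting types
interface Scene { List<Primitive> primitives(); }

// Platform-independent logic lives in the parent...
abstract class Canvas3D {
    public void drawScene(Scene s) {
        beginFrame();
        for (Primitive p : s.primitives()) {
            render(p);
        }
        endFrame();
    }
    // ...and each backend supplies only the primitive operations.
    protected abstract void beginFrame();
    protected abstract void render(Primitive p);
    protected abstract void endFrame();
}

class OpenGLCanvas extends Canvas3D {
    protected void beginFrame() { /* glClear(...) etc. */ }
    protected void render(Primitive p) { /* OpenGL draw calls */ }
    protected void endFrame() { /* swap buffers */ }
}

class Direct3DCanvas extends Canvas3D {
    protected void beginFrame() { /* BeginScene() */ }
    protected void render(Primitive p) { /* Direct3D draw calls */ }
    protected void endFrame() { /* EndScene(), Present() */ }
}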
Unicon 3D needed refactoring for multiple reasons. A lot of functions
are ENTIRELY OpenGL, while others are complicated mixes. Also, the
Unicon 3D facilities code was not all cleanly pulled out into a single file;
it is spread/mixed across several files. Besides splitting a class, pulling
code out into a file of its own is a common operation in refactoring.
What happens during the Unicon3D refactor job when we realize that some
of our current operations can't be feasibly done under Direct3D? What
happens when we conclude that our current API doesn't let us take advantage
of some special Direct3D functionality?
Compiler (lexer, parser) duplications:
Editable Textlist Duplications:
How did we get into this mess: it took no effort at all. Students were assigned
tasks, and copy-and-modify was their natural default mode of operation.
How do we get out: much, much harder. Student employees have resisted
repeated commissionings to go refactor to eliminate the duplication.
Options?
Went well:
Check out this
Current State of Projects
TP.4.0 Test Items (Functions)
These are the things you intend to test within the scope of this test plan:
essentially, a list of what is to be tested. This can be developed from the
software application inventories as well as from other sources of
documentation and information.
Software Risk Issues
Identify what software is to be tested and what the critical areas are,
such as:
There are some inherent software risks such as complexity; these need to be
identified.
Another key area of risk is a misunderstanding of the original
requirements. This can occur at the management, user and developer
levels. Be aware of vague or unclear requirements and requirements that
cannot be tested.
Features to be Tested
This is a listing of what is to be tested from the USERS' viewpoint of what
the system does. This is not a technical description of the software, but a
USERS' view of the functions.
Features not to be Tested
Sections 6 and 7 are directly related to Sections 5 and 17. What will and
will not be tested are directly affected by the levels of acceptable risk
within the project, and what does not get tested affects the level of risk
of the project.
Approach (Strategy)
This is your overall test strategy for this test plan; it should be
appropriate to the level of the plan (master, acceptance, etc.) and should
be in agreement with all higher and lower levels of plans. Overall rules and
processes should be identified.
If this is a master test plan the overall project testing approach and
coverage requirements must also be identified.
Other information that may be useful in setting the approach includes:
How will meetings and other organizational processes be handled?
Item Pass/Fail Criteria
What are the Completion criteria for this plan? This is a critical aspect of any test plan and should be appropriate to the level of the plan.
This could be an individual test case level criterion or a unit level plan
or it can be general functional requirements for higher level plans.
What is the number and severity of defects located?
Suspension Criteria and Resumption Requirements
Know when to pause in a series of tests.
Test Deliverables
What is to be delivered as part of this plan?
One thing that is not a test deliverable is the software itself; that is
listed under test items and is delivered by development.
Remaining Test Tasks
If this is a multi-phase process or if the application is to be released in
increments there may be parts of the application that this plan does not
address. These areas need to be identified to avoid any confusion should
defects be reported back on those future functions. This will also allow the
users and testers to avoid incomplete functions and prevent waste of
resources chasing non-defects.
Environmental Needs
Are there any special requirements for this test plan, such as:
Staffing and Training needs
Training on the application/system.
Responsibilities
Who is in charge?
Schedule
Should be based on realistic and validated estimates. If the estimates for
the development of the application are inaccurate, the entire project plan
will slip, and the testing, as part of the overall project plan, will slip with it.
At this point, all relevant milestones should be identified with their
relationship to the development process identified. This will also help in
identifying and tracking potential slippage in the schedule caused by the
test process.
Planning Risks and Contingencies
What are the overall risks to the project with an emphasis on the testing
process?
Specify what will be done for various events, for example:
Requirements definition will be complete by January 1, 19XX, and, if the
requirements change after that date, the following actions will be taken:
Management is usually reluctant to accept scenarios such as the one above
even though they have seen it happen in the past.
Approvals
Who can approve the process as complete and allow the project to proceed to
the next level (depending on the level of the plan)?
Glossary
Used to define terms and acronyms used in the document, and testing in
general, to eliminate confusion and promote consistent communications.
Test Case Examples
Many of the test case examples you will find on the web are provided by
vendors who want to sell their software test-related products. There are
whole (expensive) products specifically for Test Case Management out there.
Such commercially-motivated examples might or might not be exemplary of
best practices. You can evaluate them to some extent by asking: How well
does this example fulfill the criterion given by Dr. J above?
Examples
(missing/broken) manual test case instructions
test case report from Vietnam
manual test case
alleged Microsoft-derived test case format
OpenOffice Test Case Template Example (thanks Cindy and Leah)
Historical (Past Class, java team) Test Example
JC = javac      # compiler and flags; assumed defined elsewhere in the original
JFLAGS =
.SUFFIXES: .java .class
.java.class:
	$(JC) $(JFLAGS) $*.java
What To Do?
What do we want to do for our class? We want test cases that are readable,
repeatable, and relevant. These criteria include being printable in report
form, traceable back to specific requirements, and readily evaluable as to
whether they turned up a problem or, sadly, failed to do so.
Corny Real-Life Example of an Automated Test Case
In the Icon programming language (a pair of large C programs),
a suite of test cases was developed by creating tests each time a new
feature was added to the language. The suite was augmented
with new test cases when pernicious bugs were reported by real
users. One test case, for the "synchronous threads" feature in the language,
looks like this
In this case, "diff coexpr.std coexpr.out" shows whether the test case
matches the expected output. One part of Bruegge's test case definition
that is missing is instructions on how to run the program in order for
it to produce the log; in this case there is a shell script that you run
in order to run the test case with the proper input and output redirection
(for example, to catch error output in the log).
Name coexpr
Location .../unicon/tests/general/coexpr (from coexpr.icn)
Input .../unicon/tests/general/coexpr.dat
Oracle .../unicon/tests/general/coexpr.std
Log .../unicon/tests/general/coexpr.out (generated each run)
Example of (white box) testing: Testing Loops
If your job is to write tests for some procedure that has a loop in it,
you can write tests that: skip the loop entirely (zero passes), make exactly
one pass, make two passes, make a typical number of passes, and make
N-1, N, and N+1 passes, where N is the maximum number of allowable passes.
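A JUnit sketch of these loop tests; the sum() method and its N = 100 cap are hypothetical, included only so the example is self-contained:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class SumLoopTest {
    static final int N = 100;   // maximum allowable passes (made up)

    // Hypothetical unit under test: sums at most N elements.
    static int sum(int[] a) {
        if (a.length > N) throw new IllegalArgumentException("too long");
        int total = 0;
        for (int i = 0; i < a.length; i++) total += a[i];
        return total;
    }

    @Test public void zeroPasses() { assertEquals(0, sum(new int[0])); }       // skip the loop
    @Test public void onePass()    { assertEquals(7, sum(new int[]{7})); }     // one pass
    @Test public void twoPasses()  { assertEquals(3, sum(new int[]{1, 2})); }  // two passes
    @Test public void typical()    { assertEquals(10, sum(new int[]{1, 2, 3, 4})); }
    @Test public void nMinusOne()  { assertEquals(0, sum(new int[N - 1])); }   // N-1 passes
    @Test public void exactlyN()   { assertEquals(0, sum(new int[N])); }       // N passes
    @Test(expected = IllegalArgumentException.class)
    public void nPlusOne()         { sum(new int[N + 1]); }                    // N+1: rejected
}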
Peek Back at Example Test Cases
Including the OpenOffice one Cindy recommended.
Objectives to Insert into your Sprint Planning
Unit Testing
One method of unit testing that is statistically shown to be cost-effective
is to read the source code! This may be done with the aid of a partner or
team, performing either a walkthrough (where the code author sets the agenda
and others review) or an inspection (where the leadership sets the agenda).
Unit Testing Documentation
Testing is like detective work?
A SE author named Lethbridge makes an unfortunate analogy between
programmers and criminals; they have a modus operandi, and once you
find what type of bugs a programmer is writing in one place, the programmer
may well repeat similar bugs elsewhere in the code.
In selecting test cases, look for equivalence classes
You usually cannot test all the possible inputs to a program or parameters
to a procedure that you wish to test. If you can identify what
ranges of values ought to evoke different kinds of responses,
it will help you minimize test cases to: one representative from each
class of expected answer, plus extra tests at the boundaries of the
equivalence classes to make sure the ranges are nonoverlapping.
(back to) Unit Testing (Bruegge 11.4.3)
Motivation:
Most important kinds of unit tests:
equivalence tests
partition the possible range of inputs into equivalence classes.
develop at least 2 test cases for each class: a valid input,
and an invalid input.
Example:
int getNumDaysInMonth(int month, int year) { ... }
Equivalence classes: months with 31 days, months with 30 days,
and February, which has leap years. Three equivalence classes
means at least 6 tests.
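A JUnit sketch of those tests; it assumes months are numbered 1-12, that the method under test is in scope (e.g., via static import), and that invalid input throws IllegalArgumentException, since the notes don't specify the failure behavior:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class MonthTest {
    @Test public void thirtyOneDayMonth() { assertEquals(31, getNumDaysInMonth(1, 2011)); }
    @Test public void thirtyDayMonth()    { assertEquals(30, getNumDaysInMonth(4, 2011)); }
    @Test public void februaryNonLeap()   { assertEquals(28, getNumDaysInMonth(2, 2011)); }
    @Test public void februaryLeap()      { assertEquals(29, getNumDaysInMonth(2, 2012)); }
    @Test(expected = IllegalArgumentException.class)
    public void invalidMonth()            { getNumDaysInMonth(13, 2011); }   // invalid input
    @Test(expected = IllegalArgumentException.class)
    public void invalidYear()             { getNumDaysInMonth(1, -1); }      // invalid input
}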
boundary tests
Focus on boundaries between equivalence classes. Developers and routine
tests often overlook boundaries (0, null input, y2k, etc.). Note:
watch out for combinations of invalid input. Two parameters
x and y might both have interesting values that separately would be
legal, but taken together denote an event that the code can't handle.
path tests
Comments on Current Hg
Pummel
Phunctional
Some C++ Unit Testing Frameworks
Question: how does one evaluate unit testing frameworks?
Some (Lethbridge) Bug Categories
Purpose of this list: construction of (mostly white box, mostly unit-)
test cases. A thorough white box tester might perform and document
having performed the following examination of the code to be tested.
For each unit to be tested:
    For each category given below:
        Can this kind of bug occur in your unit?
        For each place where it can, write one or more test cases that look for it.
Coverage Testing
Coverage means: writing tests that execute all the code. Since a significant
portion of errors are due to simple typos and logic mistakes, if we execute
every line of code we are likely to catch all such "easy" errors.
A couple more words on our sample coverage tools, Clover and Hansel.
Clover is a commercial product which works by instrumenting the source
code. It does statement and branch coverage, but not most of the
other forms of coverage. It might actually be cool.
Another commercial coverage tool is
JCover, which does
more types of coverage tests. There are no doubt dozens of others.
More on Coverage Testing
Steve Cornett
gives a nice summary of several kinds of coverage testing, including
some reference to the different subsets of path coverage that have
been proposed to make it practical.
Note that although these are phrased as
yes/no questions, a coverage tool doesn't just answer yes/no or even
give you a percentage: it gives you a percentage and shows in detail
each location or case in which the coverage property did not hold.
A couple other useful resources are Marick's
A Buyer's Guide to Code Coverage Terminology, and the related
How to Misuse
Code Coverage.
It is a challenge to even get this much coverage.
Software Project Estimation
Probably this material belongs in CS 383. I am moving it there in the future.
I looked at websites for some of this material, in addition to consulting
Roger S. Pressman's book on software engineering.
COCOMO
Boehm's COnstructive COst MOdel.
Barry Boehm is one of the luminary founding
fathers of software engineering, inventor of the spiral model of
software development, and one of the early predictors that software
costs would come to dwarf hardware costs in large computer systems.
Scale Drivers
COCOMO specifies 5 scale drivers:
Cost Drivers
COCOMO has ~15 parameters that assess not just the software to be
developed, they assess your environment and team as well. They are
rated one of: (very low, low, nominal, high, very high, extra high),
with the different values contributing multipliers that combine to
form an effort adjustment factor. From
Wikipedia:
Cost Drivers                                      Ratings
                                                  Very Low   Low    Nominal   High   Very High   Extra High
Product attributes
  Required software reliability                     0.75     0.88    1.00     1.15     1.40
  Size of application database                               0.94    1.00     1.08     1.16
  Complexity of the product                         0.70     0.85    1.00     1.15     1.30        1.65
Hardware attributes
  Run-time performance constraints                                   1.00     1.11     1.30        1.66
  Memory constraints                                                 1.00     1.06     1.21        1.56
  Volatility of the virtual machine environment              0.87    1.00     1.15     1.30
  Required turnabout time                                    0.87    1.00     1.07     1.15
Personnel attributes
  Analyst capability                                1.46     1.19    1.00     0.86     0.71
  Applications experience                           1.29     1.13    1.00     0.91     0.82
  Software engineer capability                      1.42     1.17    1.00     0.86     0.70
  Virtual machine experience                        1.21     1.10    1.00     0.90
  Programming language experience                   1.14     1.07    1.00     0.95
Project attributes
  Use of software tools                             1.24     1.10    1.00     0.91     0.82
  Application of software engineering methods       1.24     1.10    1.00     0.91     0.83
  Required development schedule                     1.23     1.08    1.00     1.04     1.10
COCOMO equations 1 and 2
Effort = 2.94 * EAF * (KSLOC)^E
Time_to_develop = 2.5 * (MM)^0.38
Pithy Software Engineering Quote of the Day
"Design without Code is just a Daydream. Code without Design is a nightmare."
-- attributed to Assaad Chalhoub, adapting it from a Japanese proverb.
Estimating SLOC
Basic ideas summarized: you can estimate SLOC from a detailed design, by
estimating lines per method for each class, and summing. Or you can do it
(possibly much earlier in your project) by calculating your "function points"
and estimating lines-per-function-point in your implementation language.
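A back-of-envelope sketch of the first approach; every number below is invented for illustration:

public class SlocEstimate {
    public static void main(String[] args) {
        int classes = 20;                 // from the detailed design
        int methodsPerClass = 8;          // average, from the design
        int linesPerMethod = 12;          // team's historical average
        int declOverheadPerClass = 15;    // fields, imports, class decl

        int sloc = classes * (methodsPerClass * linesPerMethod + declOverheadPerClass);
        System.out.println("estimated SLOC: " + sloc);   // 20 * (96 + 15) = 2220
    }
}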
Function Points
Perhaps this might be the 2nd type of thing you measure about a forthcoming
or under-construction software project (after "# of use cases").
Weight each of these; perhaps just designate as "simple", "average", or "complex".
Inspections
Idea: examine source code looking for defects.
Usability Testing (Bruegge 11.4.2)
Three types of usability tests:
state-based tests
Developed for OO systems. Compares end-states of the system after a set of
code is executed, instead of comparing outputs. Derive test cases from a
UML statechart. Test every transition in the statechart. See Figure 11-14.
polymorphism and testing
If you use "virtual" methods and/or polymorphism, how does it affect your
testing strategy? Need to execute a given polymorphic code with all of its
possible runtime types. Example (Fig 11-15): your network interface has
open/close/send/receive methods, it is an abstract class with several
concrete implementations. Test the clients that use the network interface
against each of the concrete implementations.
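A sketch of that strategy: the interface follows Fig 11-15's operations, and LoopbackNetwork is a made-up implementation standing in for each production subtype (run with java -ea so the asserts fire).

import java.util.ArrayDeque;
import java.util.Queue;

interface NetworkInterface {
    void open();
    void close();
    void send(byte[] data);
    byte[] receive();
}

// Made-up concrete implementation; a real suite would list each
// production subtype (Ethernet, WiFi, ...) alongside it.
class LoopbackNetwork implements NetworkInterface {
    private final Queue<byte[]> q = new ArrayDeque<>();
    public void open() { }
    public void close() { q.clear(); }
    public void send(byte[] data) { q.add(data); }
    public byte[] receive() { return q.poll(); }
}

public class PolymorphismTest {
    public static void main(String[] args) {
        // The same client-level checks run once per concrete subtype.
        NetworkInterface[] impls = { new LoopbackNetwork() };
        for (NetworkInterface n : impls) {
            n.open();
            n.send(new byte[]{1, 2, 3});
            assert n.receive().length == 3 : "echo failed for " + n.getClass();
            n.close();
        }
        System.out.println("all implementations passed");
    }
}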
From Use Cases to Markov Chains to Software Testing
This section is inspired by Axel Krings, who referred me to a paper
by James Whittaker and Jesse Poore.
Suppose you lay out a finite state machine of all user activity, based
on your use cases. You can
estimate (or perhaps empirically observe) the probabilities of each
user action at each state. If you pretend for a moment that the actions
taken at each state depend only on being in that state, and not how you
got there, the finite state machine is a Markov chain. While
user actions might not really follow true Markov randomness properties,
the Markov chain can certainly be used to generate a lot of test cases
automatically!
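A sketch of such a generator; the states and probabilities below are invented. Each random walk from Start to Exit is one generated test script.

import java.util.*;

public class MarkovTestGen {
    public static void main(String[] args) {
        // Usage chain: state -> list of (next state, probability).
        Map<String, List<Map.Entry<String, Double>>> chain = Map.of(
            "Start",  List.of(Map.entry("Login", 1.0)),
            "Login",  List.of(Map.entry("Browse", 0.8), Map.entry("Exit", 0.2)),
            "Browse", List.of(Map.entry("Browse", 0.5), Map.entry("Buy", 0.3),
                              Map.entry("Exit", 0.2)),
            "Buy",    List.of(Map.entry("Exit", 1.0))
        );
        Random rng = new Random(42);    // fixed seed: reproducible test case
        String state = "Start";
        List<String> testCase = new ArrayList<>(List.of(state));
        while (!state.equals("Exit")) {
            List<Map.Entry<String, Double>> options = chain.get(state);
            double roll = rng.nextDouble(), cum = 0.0;
            String next = options.get(options.size() - 1).getKey();  // fallback
            for (Map.Entry<String, Double> opt : options) {
                cum += opt.getValue();
                if (roll < cum) { next = opt.getKey(); break; }
            }
            state = next;
            testCase.add(state);
        }
        System.out.println(testCase);   // one random walk = one test script
    }
}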
Integration Testing
There are several ways to test combinations of units.
Big Bang
The "big bang": what happens when you link it all together? This has
the advantage of not requiring any additional test stubs that would be
needed to test partially integrated subsystems. But when things go wrong,
you have a needle in a haystack problem of finding the bugs.
Top Down
Top down = work from the user interface gradually deeper into the system.
This is a layered, breadth-first approach. Advantage: it is more "demo-able"
for customers. Subsets of functionality may become usable before the whole
system integration is completed.
Bottom Up Testing
Bottom up = test individual units first
Focus on small groups (2+) of components. Add components gradually.
Advantage: it is more debuggable and emphasizes meat-and-potatoes
over shallow surface behavior.
Sandwich Testing
Sandwich testing selects a "target layer" and tests it against the layers
above and below it. New target layers are selected to generate additional
tests until the whole system has been tested. If the target layers are
selected to gradually work from top and bottom into the middle, then
sandwich testing is a combination of top-down and bottom-up testing.
Phunctional
well organized, got sprint report done quickly
Pummel
Build report
cd puml && hg update
abort: untracked file in working directory differs from file in requested revision: 'doc/ssrs/ssrs.aux'
...
Please remove ssrs.{aux,log,pdf,toc} from the repository; LaTeX generates them.
Unit Testing, Revised
Wikipedia says we have a lot of unit test tools to choose from; which
do we use? So far the recommendation was CppUnit or QtTest. You are welcome
to evaluate and select something else if it has technical advantages to you.
Test Plans, Revisited
HW2 part 2 asked you to develop a test plan,
and create a doc/384-hw2.html that would presumably include links to
both your Gantt or PERT chart and your test plan.
Customers will only buy in to a newfangled development process when they
see it gives them some convincing combination of more control, better quality,
and/or less cost. Agile methods may focus more on customer and product than
on documentation, but documentation remains a key element in
communicating with the customer.
System Testing
Tests the whole system. There are many kinds of system tests.
Functional (requirements) testing
Looks for differences between the requirements and the system.
This is traditional blackbox testing. Test cases are derived from
the use cases. Try to create test cases with the highest probability
of finding the bugs the customer will care about the most.
Performance testing
This tests one of the non-functional requirements. Typically, a system
will be tested to see if it can handle the required user load (stress test)
with acceptable performance...which may require a fairly elaborate fake
user environment; consider what it would take to test how a web server
handles lots of hits.
Pilot test
Also called "field test", these tests may go out to a progressively larger
number of real customers.
Testing Odds and Ends
We will probably scratch our way through a few more testing topics
as needed in future lectures.
Acceptance test
Benchmark tests, possibly against competitors or against the system
being replaced (shadow testing). A specialized team from the customer
may be involved in the evaluation of the system.
Installation test
Anytime you install a program on a new computer, you may need to
verify that the program is running in the new environment. Software
is often shipped with a self-check capability, which might run when
first installed and then disable itself. More paranoid software
self-checks every time it is executed, or even periodically during
execution.
Managing test activities
Start the selection of test cases early; parallelize tests.
Develop a functional test for each use case. Develop test
drivers and stubs needed for unit tests. Some expert (Bruegge?)
says to allocate 25% of project resources for testing -- do you agree?
Planning the testing process
Midterm is Coming
It is time to set a Midterm Examination date. The midterm will be
Monday February 27, following a self-study / online review on Friday
February 24. The midterm
will cover aspects of software testing and mapping UML to code. Read
up, or better yet, do and ask questions.
How is your Unit Testing Going?
Do you ALL by now have some experience unit testing, and have
some tangible test driver code to show for it?
If anyone thinks they do not, ask your teammates to help you find
what to work on, or report to Dr. J for advice or reassignment.
Milestones
We want successively more accurate and more complete functional
approximations of the Gus project requirements document(s). I would
more or less like to see explicit sets of use cases attached to
specific milestone dates for demos in class. It is possible that
some use cases might be broken up into multiple milestone dates
and deliverables, but for the most part, use cases define the
usable pieces of functionality of your system.
Where we are At, Where we are Headed
You are, hopefully, in the heat of coding your project.
We are doing homeworks that mainly address this from a
testing point of view. Dr. J is trying to penetrate the
fog of war and understand what you are doing, without
imposing too many extra and onerous tasks on you.
Windows isn't done until Lotus won't run.
In software engineering class, we could have a highly unrelated and less
catchy saying:
the work is not done until it is documented, findable (i.e. by Dr. J
navigating in the repository) and reproducible (i.e. others can build/run/test
successfully, not just the author).
The concept of literate programming
Donald Knuth, one of the Vampire Mega-Lords of the computer science world,
proposed the more-or-less radical concept of literate programming.
While his concept has not taken the software engineering field by storm,
it raises many valid issues.
Knuth's solution (literate programming) is to write a hybrid document that
can be translated into either a book for humans to read, or a source code
base for a compiler to compile.
No Class Monday
It is a Federal and UI Holiday.
Myers' Checklist for Code Inspections
Figures 3.1 and 3.2 of [Myers] give a list of low-level things
to look for.
Source: Glenford [Myers], "The Art of Software Testing".
Data Reference
Computation
Data Declaration
Comparison
Control Flow
Input/Output
Interfaces
Other Checks
How do we normalize class participation?
It is typical in 383 and 384 that some folks are doing far more of
the work than others. This can be for any number of reasons, some more
sympathetic than others. Basic goals:
Software Metrics
Software metrics, or measurement, concerns itself with observing properties
of a software system. Engineers really want
In addition, the engineers' managers often want to validate / justify
what the team is doing, i.e. argue that they are using good or best methods.
Metrics Taxonomy
Dimension 1: static vs. dynamic. Are we measuring properties of the
source code or .exe (products)? Properties of the execution(s) (resources)?
Properties of the development activity (processes)?
Phunctional
Pummel
Why Define Software Metrics?
If we are ever going to "engineer" software, we have to have the
underlying science on which to base the engineering. Sciences are
all about explaining the data. Sciences are all about tying the
data to underlying principles. Scientists measure and observe things.
Software Metrics are a step towards placing software on a scientific
basis.
But How do we Define the Right Software Metrics?
Say we want to measure Quality.
Definitions have been proposed for many or most of the *-"ities".
Size-oriented, direct measures
Function-oriented, indirect measures
Build Check: pUML
Build Check: pummel
Metrics in Bruegge
Bruegge says precious little about software metrics. He mentions
management metrics versus quality metrics. Management metrics might
include how many development tasks have been completed, how much $
has been spent, team turnover rates... Quality metrics might include
change requests per week, defects discovered per test-hour or test-week,
how much code is changed per week ("breakage"? "rework"? "flux"?).
Simplified Notes from Fenton's Software Measurement: A Necessary Scientific Basis
Skipping most of the high powered math...
Measurement Relations and Scale Types
There are many, many ways to assign numbers to an attribute
with varying degrees of numeric validity.
This comes back to explain the topic of scale types mentioned last lecture.
Scale Type     Comment
nominal        "values" are names; measures can be compared if their names can be mapped onto each other
ordinal        values can be compared, but no origin or magnitude can be assured
ratio          values use a different scale
difference     values use a different origin (Celsius v. Kelvin)
interval       different origin and unit scale (Celsius v. Fahrenheit)
absolute       (directly observed property)
log-interval   values are exponentially related
Metrics in the Java World?
Mebbe I just need to assign you homework (in your spare time, on the side)
to try these tools out and report which ones give useful information.
Now, what about C++?
Metrics for our project?
What metrics do we need? How do we measure them?
Inspections
Metrics I Want
I am thinking about Software Metrics as part of the project
management/evaluation process, which blurs the line between
grading (which I do) and documentation (which you do).
CCCC Report #1
CCCC was part of Tim Littlefair's Australian Ph.D. project.
It generates webpages. It collects a number of metrics. It
appears to be readily available on Linux (easy install on my
Mint 12 machine in my office). Questions on my mind include:
What information does it pull out? Does it appear useful?
What potential uses might it be applied to?
Do you need a Ph.D. to interpret it? Do you need to read a
big manual to interpret it?
Sprint Meeting Notes
Pummel
Phunctional
Midterm Exam Discussion
Grading still in progress. We will discuss sample solutions.
Doomsday Speech
As far as I can tell from demos and looking at the Mercurial repositories:
12 principles that guide programming at Google
Ask yourself "which of these are agile methods?" and
"would any of these improve our team's effort?"
A "Weak Tests" Hypothesis
In a past semester, a student provided an excellent hypothesis
for why some of the tests in some of the homeworks have been missing
or trivial. That student said the
reason there aren't tests in many cases is because the code isn't
done -- meaning not written at all, or not finished enough to test,
anyhow. After all, how can you write tests if you don't have the
code written yet?
What Do Integration Tests Looks Like?
A set of integration tests would:
Note that from
http://hissa.nist.gov/HHRFdata/Artifacts/ITLdoc/235/chapter7.htm
there is a good observation, relevant to integration testing:
as component/subsystem size increases, coupling among sibling
components should decrease. If a system design follows this principle,
most integration tests will be near the leaf units.
What does an end-user system test look like?
Consider this
fragment from the Unicon checklist.
Software Complexity
Why measure? Because code that is too complex is more buggy and more
expensive to maintain.
Halstead's "Software Science"
One of the older proposed measures of software complexity takes almost
an information-theoretic approach, and measures complexity in terms of
some low-level, observable properties of the source code, in particular
from the following direct metrics:
Halstead defined the following metrics:
From all this, Halstead derived quantities such as volume, difficulty, and effort.
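A sketch of the standard derivations (the lists of direct metrics were elided from these notes); the operator/operand counts below are made up:

public class Halstead {
    public static void main(String[] args) {
        int n1 = 12, n2 = 9;     // distinct operators, distinct operands
        int N1 = 40, N2 = 28;    // total occurrences of each (invented counts)

        int    vocabulary = n1 + n2;                                        // n
        int    length     = N1 + N2;                                        // N
        double volume     = length * (Math.log(vocabulary) / Math.log(2));  // V = N log2 n
        double difficulty = (n1 / 2.0) * ((double) N2 / n2);                // D
        double effort     = difficulty * volume;                            // E = D * V

        System.out.printf("V = %.1f, D = %.1f, E = %.1f%n", volume, difficulty, effort);
    }
}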
McCabe's Cyclomatic Complexity
Given a flow graph G, the # of cycles (cyclomatic number) will be
v(G) = e - n + p
where e=#edges, n=#nodes, and p=#connected components.
Before McCabe came along, major corporations had been having
gigantic problems with overly complex subroutines, so the point where
they had instituted maximum size limits, such as each subroutine may have
at most 50 lines (IBM) or two pages (TRW). McCabe's point was that such
limits miss the boat: some nasty spaghetti may become overly complex in
far fewer lines, while plenty of far larger routines are not complex at
all and forcing them to be broken into pieces by arbitrarily limiting their
size only complicates and slows them down.
For program flow graphs, McCabe actually used v(G) = e - n + 2p, which for a single routine (p = 1) is e - n + 2.
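A worked example on a made-up method, using the standard shortcut that for structured single-entry single-exit code, v(G) is the number of binary decisions plus one:

// Count the elements of a[] above limit; p = 1, so v(G) = e - n + 2.
static int countAbove(int[] a, int limit) {
    int n = 0;
    if (limit < 0) {                        // decision 1
        limit = 0;
    }
    for (int i = 0; i < a.length; i++) {    // decision 2
        if (a[i] > limit) {                 // decision 3
            n++;
        }
    }
    return n;
}
// Three binary decisions, so v(G) = 3 + 1 = 4; drawing the flow
// graph and counting e - n + 2 gives the same answer.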
A Skeptic's Testimony
Richard Sharpe's comments on McCabe Cyclomatic Complexity:
the proof in the pudding.
McCabe and OOP
A student once asked how McCabe applies to OOP programs with complex
interconnections of objects. One answer was that cyclomatic complexity
measures control flow complexity without measuring data complexity and
is therefore incomplete, and that OOP systems often have a lot of data
complexity. Another answer is that McCabe's metric is generally applied
at the single function/method unit level, at which calls to subroutines
are abstracted/ignored. Measuring the control complexity of Java is just
as useful (in looking for red-flags) as in non-OO languages. A third
answer is: OOP programs tend to be broken down into smaller functions,
and so the individual functions' complexity may be lower (which is good),
but there must also be a coarser-grained complexity measure for the
call graph, and OO programs may have worse characteristics for that measure.
In-class Exercise: calculate McCabe's complexity metric for an
interesting project method you have written
Any Entrepreneurs?
There is an interdisciplinary Entrepreneurship Certificate
coordinated by the college of business. You get a certificate
(which also appears on your official transcripts) if you complete
introduction to entrepreneurship, enterprise accounting,
and two CS courses: a CS tech elective and our 481 capstone, which
you probably are taking anyhow. If any of you think you may want
to do your own startup someday, it might be useful.
Things to Include in Your Sprint Planning
A Few Thoughts on Measuring Complexity of our Projects
Software Quality
What we have said so far: quality is probably not equal to #bugs/#KLOC,
and probably not change requests or defect reports per week. Some folks
say it is entirely a matter of how users perceive the software and how
much value they obtain from it. Others argue quality might be a
multiplied combination of normalized measures of the following properties.
Understandability
Completeness
Conciseness
Portability
Consistency
Maintainability
Maintainability Index = MAX(0, (171 - 5.2 * ln(Halstead Volume) - 0.23 * (Cyclomatic Complexity) - 16.2 * ln(Lines of Code)) * 100 / 171)
Scores run from 0 to 100, with 0-9 a red flag, 10-19 a yellow alert, and 20-100 considered "green".
Halstead Volume is a measure of program size, and lines of code is another
measure of program size, so bigger things are going to be viewed as less
maintainable. (A small worked example appears after this list of properties.)
Testability
Usability
Reliability
Structured
Efficiency
Security
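Here is the promised worked example: a minimal C sketch of the
Maintainability Index formula quoted under Maintainability above. The
input numbers are invented for illustration, not taken from a real project.

/* Maintainability Index, per the formula above:
   MI = MAX(0, (171 - 5.2*ln(V) - 0.23*CC - 16.2*ln(LOC)) * 100/171)
   Compile with -lm. */
#include <stdio.h>
#include <math.h>

double maintainability_index(double halstead_volume,
                             double cyclomatic_complexity,
                             double lines_of_code)
{
    double mi = (171.0
                 - 5.2  * log(halstead_volume)
                 - 0.23 * cyclomatic_complexity
                 - 16.2 * log(lines_of_code)) * 100.0 / 171.0;
    return mi > 0.0 ? mi : 0.0;        /* the MAX(0, ...) clamp */
}

int main(void)
{
    /* hypothetical module: volume 4000, complexity 25, 500 lines */
    printf("MI = %.1f\n", maintainability_index(4000.0, 25.0, 500.0));
    return 0;
}

This hypothetical module scores about 12.5 -- a yellow alert by the
thresholds above.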
Issues we ought to work on
Option #1: brainstorm in-class.
Option #2: graded homework assignment.
Pummel
Phunctional
Announcement
Dr. J has to go to Seattle on Friday.
Each team please meet, take attendance, and work on
whatever is most needed for your sprint.
Feedback on Test Plan Prose
I was asked for feedback on the following test plan prose.
Integration Testing
This is the phase where GUI events are tested. During this phase, all use cases will be walked through manually while confirming the correct events are invoked based on input. For these individual use case tests, each test must be run as independently as possible with minimal setup. This is to observe the behavior of each use case completely independent of the others.
Functional Testing
The functional phase is where usage of the UML editor is tested to a much higher degree. Rather than testing each use case individually, there will be a variety of users selected to attempt to produce UML diagrams of different types and magnitudes. This will produce a very large variety of permutations of use cases, and allow us to observe how the use cases behave when used together.
Reflection
Pummel
Phunctional
Agile Methods Tips
Gamedev.net once posted (associated apparently with gdc2010) an interesting
article on agile methods, which has since disappeared into the ether. All
we have left are the following observations about doing agile methods well.
See if any will help in your remaining sprints.
Project Watch
Here's what I have (what am I missing?) from git pulls as of 3/23/11:
Software Verification
The process of checking whether a given system complies with a given
criterion. One common criterion would be: check/confirm that the software
complies with the design and that the design complies with the requirements.
Some folks would narrow
the definition to refer to a static analysis, that is, things
that are checked without running the program.
Software Validation
Validation is related to verification, but it generally refers to a process
of runtime checking that the software actually meets its requirements in
practice. This may include dynamic analysis.
Validation Testing: an old example
Prologue: This is approximately what I learned about testing from
a university, so according to Kaner, it should not be useful, or I should
not have learned anything from it.
The sub-directories here contain various test material for
Version 11.0 of Unicon and Version 9.4 of Icon.
bench     benchmarking suite
calling   calling C functions from Icon
general   main test suite
graphics  tests of graphic features
preproc   tests of the rtt (not Icon) preprocessor
samples   sample programs for quick tests
special   tests of special features
augment.icn collate.icn gc1.icn mem01c.icn prefix.icn struct.icn
btrees.icn concord.icn gc2.icn mem01x.icn prepro.icn tracer.icn
cfuncs.icn diffwrds.icn gener.icn mem02.icn proto.icn transmit.icn
checkc.icn endetab.icn helloc.icn mffsol.icn recent.icn var.icn
checkfpc.icn env.icn hellox.icn mindfa.icn recogn.icn wordcnt.icn
checkfpx.icn errors.icn ilib.icn numeric.icn roman.icn
checkx.icn evalx.icn kross.icn others.icn scan.icn
ck.icn fncs.icn large.icn over.icn sieve.icn
coexpr.icn fncs1.icn meander.icn pdco.icn string.icn
Some of these tests were introduced when new language features were
introduced and may constitute unit tests; many others were introduced when
a bug was reported and fixed (and hence, are regression tests). A
semi-conscious attempt has been made to use pretty much every language
feature, thus, the test suite forms somewhat of a validation of a Unicon
build.
# Test driver loop: for each named test, compile it, run it on its
# .dat input file (if one exists), and diff actual vs. expected output.
for F in $*; do
	F=`basename $F .std`
	F=`basename $F .icn`
	rm -f $F.out
	echo "Testing $F"
	$IC -s $F.icn || continue	# $IC names the Icon/Unicon translator
	if test -r $F.dat
	then
		./$F <$F.dat >$F.out 2>&1
	else
		./$F </dev/null >$F.out 2>&1
	fi
	diff $F.std $F.out		# no output here means the test passed
	rm -f $F
done
#
# D I F F E R E N T W O R D S
#
# This program lists all the different words in the input text.
# The definition of a "word" is naive.
procedure main()
   words := set()
   while text := read() do
      text ? while tab(upto(&letters)) do
         insert(words,tab(many(&letters)))
   every write(!sort(words))
end
Sample data file (diffwrds.dat), itself an Icon program, used here purely as input text:
procedure main()
   local limit, s, i
   limit := 100
   s := set([])
   every insert(s,1 to limit)
   every member(s,i := 2 to limit) do
      every delete(s,i + i to limit by i)
   primes := sort(s)
   write("There are ",*primes," primes in the first ",limit," integers.")
   write("The primes are:")
   every write(right(!primes,*limit + 1))
end
Sample expected output (diffwrds.std):
The
There
are
by
delete
do
end
every
first
i
in
insert
integers
limit
local
main
member
primes
procedure
right
s
set
sort
the
to
write
What I Have Learned About Testing
Remember, this was in an academic environment, so Kaner would dismiss it.
Software Certification
Certification Examples:
Certification of software usually includes certification of the process
used to create the software. Certification of software is also often
confused with certification of the people who write software.
Where we are at
Windows Certification
This section does not refer to certification of computing professionals,
but to certification of the software written by 3rd parties for use on
Microsoft platforms. Comparable certifications for other platforms
include
If M$ certifies you, you are legally allowed to use their logo on your box.
You have to re-certify each major or minor version in order to retain the logo.
Sprint Reflection
Pummel:
For next Monday:
Pummel:
Phunctional:
QSRs and CGMPs
Software engineers run into these certification requirements (the FDA's
Quality System Regulations and Current Good Manufacturing Practices)
mainly when writing software for use in medical devices.
Definitions
Intro to DO-178B (thanks to J. A.-F.)
Software Considerations in Airborne Systems and Equipment Certification,
published by RTCA and jointly developed with EUROCAE. As near as I can
tell RTCA is an industry consortium
that serves as an advisory committee to the FAA. At this writing RTCA charges
$160 for the downloadable e-version of DO-178B; I guess they are profiteering
from public information, despite their non-profit status. UI pays money
every year to be a member, and I can access a copy free but can't share it
with you.
So... the category your software gets labeled with determines how much
testing, verification, validation, or proof gets applied to it. I hope
the labeling is correct!
DO-178C
As of December 2011, a successor to DO-178B was approved which retains most of
the text of the DO-178B standard, while updating it to be more amenable to
modern development practices, with supplements addressing model-based
development, object-oriented technology, and formal methods.
How to be Certifiable
There is "Microsoft certified" and "Cisco certified", which usually
refers to passing an expensive test that covers a specific set of
user tasks on a specific version of software... this is the kind of
certification you'd expect to get from
"Lake Washington Vocational Technical School".
One more certification example
Courtesy of Bruce Bolden, please enjoy this
certification from codinghorror.com
Planning:
In addition to your own prioritized task assignments, please consider:
Product Support
Support for Using the Software
What kinds of support have you seen for folks who just need to use the
software?
A lot of this is really about how long will it take (how much it will cost)
to solve a problem. Humans timeout quickly, some more than others. If you
give them the tools to fix the problem themselves, working on it immediately,
they will probably be happier than if you make them wait for your fix.
Fixing Problems that Occur
How do you know a bug...is a bug?
Bug Trackers
Some past class projects have used
Trac.
Personnel Issues
From Bruegge Ch. 14:
Tasks \ Participant Bill Mary Sue Ed
control design 1,3 3
databases 3 3 1
UI 2 1,3
config mgt 2 3
Corollary: who is watching the watchmen? trust, but verify.
Static Checking, revisited
# environment setup (bash) for running ESC/Java and its Simplify prover
# from the course directory:
export PATH=/home/jeffery/html/courses/384/escjava:$PATH
export ESCTOOLS_RELEASE=/home/jeffery/html/courses/384/escjava
export SIMPLIFY=Simplify-1.5.4.linux
The same distribution, which tries to bundle a half-dozen platforms,
almost (and sort-of) works for me on Windows, but may be somewhat
sensitive about Java versions and such. It gives seemingly-bogus
messages about class libraries (on my Windows box) and doesn't
handle Java 1.5 stuff (in particular, Generics such as
Comparator<tile>). There is at least one system (KIV) that
claims to handle generics, but I haven't evaluated it yet.
Risk Management
(Bruegge pp. 607-609)
How to Do Risk Management
risk                                                    type
COTS component doesn't work                             technical
COTS component doesn't show up when needed              managerial
users hate/reject the user interface                    technical
middleware too slow to meet perf. requirement           technical
development of subsystems takes longer than scheduled   managerial
risk                                                    type         P    I    mitigation
COTS component doesn't work                             technical    0.1  0.9
COTS component doesn't show up when needed              managerial   0.3  0.8
users hate/reject the user interface                    technical    0.6  1.0
middleware too slow to meet perf. requirement           technical    0.2  0.9
development of subsystems takes longer than scheduled   managerial   0.8  0.9
One thing understated in some textbook descriptions of risk management is
that risk mitigation allocations compete with each other and with core
development resources. Some viable mitigation options may not be worth it.
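As a sketch of how such a table gets used: a common rule of thumb (an
assumption here, since the notes above don't define the columns) is to
rank risks by exposure = P * I, reading P as probability and I as impact.
A minimal C illustration using the numbers from the table:

/* Ranking the risks in the table above by exposure = P * I. */
#include <stdio.h>

struct risk { const char *name; double p, i; };

int main(void)
{
    struct risk risks[] = {
        { "COTS component doesn't work",                         0.1, 0.9 },
        { "COTS component doesn't show up when needed",          0.3, 0.8 },
        { "users hate/reject the user interface",                0.6, 1.0 },
        { "middleware too slow to meet perf. requirement",       0.2, 0.9 },
        { "development of subsystems takes longer than planned", 0.8, 0.9 },
    };
    int n = sizeof risks / sizeof risks[0];
    for (int j = 0; j < n; j++)
        printf("%-52s exposure = %.2f\n", risks[j].name,
               risks[j].p * risks[j].i);
    return 0;
}

By this measure the schedule-slip risk (0.72) and the user-interface
rejection risk (0.60) dominate, which is one way to decide where scarce
mitigation resources should go first.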
Capability Maturity Model (CMM and CMMI)
(Bruegge section 15.3)
On May 2 and 4 you will each summarize in 5-10
minutes your particular contribution to your team's project.
The final will be 12:30-2:30pm on XXXday May 10? Double-check.
Level 1: Initial
ad hoc; depends entirely on personnel; unmanaged
Level 2: Repeatable
projects use life-cycle models; basic management; client reviews and acceptance tests
Level 3: Defined
documents all managerial and technical activities across life cycle
Level 4: Managed
metrics for activities and deliverables. data collection throughout project. client knows about risks and measures used for project.
Level 5: Optimized
measurements are used to improve the model during the project
Release Day
Profiling
A profiler is an execution monitor which measures the number of executions
or amount of time spent executing the different parts of a program's code.
Profiling is motivated by the old 80-20 rule: if 80% of execution time is
spent in 20% of the code, then by identifying that 20% of the code we can
focus our attention on improving its correctness and performance.
Who Uses Profilers?
Application developers use profilers largely for performance tuning.
System platform providers use profilers to tune kernels, compiler runtime
systems, and libraries. As an undergrad I wrote a profiler (for C) which
was used to provide input for a code generator which would dynamically
improve its generated code based on application runs.
Kinds of Profiling
Profiling is somewhat related to test coverage; telling you what code has
not been executed is the same as telling you a profile count of 0.
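A minimal sketch of that connection (the function and counters below are
hypothetical): statement-level counters yield a profile, and any counter
still at zero is exactly an uncovered statement.

/* Hand-instrumented statement counters: the counts are a
   statement-level profile, and a count of 0 flags uncovered code. */
#include <stdio.h>

static long hits[3];                  /* one counter per site */

int abs_val(int x)
{
    hits[0]++;                        /* site 0: function entry */
    if (x < 0) { hits[1]++; return -x; }  /* site 1: negative branch */
    hits[2]++;                        /* site 2: non-negative branch */
    return x;
}

int main(void)
{
    abs_val(5);
    abs_val(7);                       /* the x < 0 branch never runs */
    for (int i = 0; i < 3; i++)
        printf("site %d: %ld%s\n", i, hits[i],
               hits[i] == 0 ? "   <-- not covered" : "");
    return 0;
}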
Profiler Granularity
Profilers vary in granularity; source-code granularities commonly include
function-level, statement-level, and expression-level. It is tempting to
work at the basic block level, since all instructions in a basic block
will execute the same number of times. Q: does basic block granularity
correspond to statement-level, or expression-level?
Java Profilers
The only Java profiler I have used was JProbe, a commercial tool which has
been around awhile and worked pretty well for me on a project at the
National Library of Medicine. JProbe puts out pretty output like this:
[JProbe screenshot omitted]
Profiling Example
As another profiling example, let's look at the Unicon virtual machine and see
where it spends its time. The Unicon virtual machine, named iconx, is in
many ways a typical giant C program. To profile it, I had to compile
and link with -pg as well as -g options, and then disable its internal
use of the UNIX profil(2) interface!
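For a hedged illustration of the same workflow on something smaller (the
program and file names here are invented), the gcc/gprof steps look like
this:

/* hotspot.c -- toy program for demonstrating gprof.
 *
 *   gcc -pg -g -o hotspot hotspot.c    # compile and link with -pg
 *   ./hotspot                          # run; writes gmon.out
 *   gprof hotspot gmon.out             # prints flat profile + call graph
 *
 * slow_sum should dominate the flat profile, much as interp_0
 * dominates the iconx profile below. */
#include <stdio.h>

static long slow_sum(long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += i % 7;                    /* deliberately cheap-but-hot work */
    return s;
}

int main(void)
{
    long total = 0;
    for (int k = 0; k < 1000; k++)
        total += slow_sum(100000);
    printf("%ld\n", total);
    return 0;
}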
One difference between iconx and some C programs is
that its inputs vary more widely than is normal: different programs may
use very different language features and spend their time in different
places in the virtual machine and its runtime system. We will look at
its profile when executing one particular program which is by definition
"representative" since it was sent to us by a user in Croatia.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
65.13 25.09 25.09 9876086 0.00 0.00 interp_0
6.63 27.64 2.56 108318639 0.00 0.00 deref_0
3.63 29.05 1.40 8472811 0.00 0.00 invoke
2.93 30.18 1.13 61891780 0.00 0.00 cnv_ec_int
2.39 31.09 0.92 28907412 0.00 0.00 Oasgn
2.23 31.95 0.86 17074006 0.00 0.00 Oplus
1.61 32.58 0.62 14237739 0.00 0.00 equiv
1.30 33.08 0.50 1355071 0.00 0.00 Zfind
1.22 33.55 0.47 634739 0.00 0.00 cstos
1.14 33.98 0.44 12019549 0.00 0.00 Onumeq
0.93 34.34 0.36 10561077 0.00 0.00 alcsubs_0
0.92 34.70 0.35 3273189 0.00 0.00 Ofield
0.88 35.04 0.34 862347 0.00 0.00 Obang
0.71 35.31 0.28 1562097 0.00 0.00 alcstr_0
0.66 35.57 0.26 6147174 0.00 0.00 lexcmp
0.65 35.82 0.25 25 10.00 10.00 adjust
0.60 36.05 0.23 25 9.20 9.20 compact
0.57 36.27 0.22 14175397 0.00 0.00 Oeqv
0.49 36.46 0.19 5398727 0.00 0.00 Olexeq
0.45 36.63 0.17 17073415 0.00 0.00 add
0.43 36.80 0.17 5214968 0.00 0.00 cvpos
0.39 36.95 0.15 4091331 0.00 0.00 Osize
0.38 37.09 0.14 1405720 0.00 0.00 Osubsc
0.36 37.23 0.14 5542081 0.00 0.00 cnv_c_int
0.35 37.37 0.14 1715559 0.00 0.00 Osect
0.29 37.48 0.11 459321 0.00 0.00 Ztab
0.23 37.57 0.09 6579734 0.00 0.00 cnv_tstr_0
0.19 37.65 0.07 deref_1
0.18 37.72 0.07 3277 0.02 0.02 cnv_eint
0.16 37.77 0.06 1005214 0.00 0.00 alcrecd_0
0.14 37.83 0.06 4179269 0.00 0.00 cnv_str_0
0.13 37.88 0.05 1088962 0.00 0.00 Olexne
0.13 37.93 0.05 870748 0.00 0.00 Ocater
0.13 37.98 0.05 Olexlt
0.12 38.02 0.04 2186145 0.00 0.00 Oneg
0.12 38.07 0.04 1005214 0.00 0.00 Omkrec
0.10 38.11 0.04 482109 0.00 0.00 retderef
0.10 38.15 0.04 Oneqv
0.10 38.19 0.04 cnv_tstr_1
0.08 38.22 0.03 341945 0.00 0.00 Onumlt
0.08 38.25 0.03 alcsubs_1
0.05 38.27 0.02 634739 0.00 0.00 Kletters
0.05 38.29 0.02 184281 0.00 0.00 Obscan
0.05 38.31 0.02 58899 0.00 0.00 sub
0.04 38.33 0.01 Orefresh
0.03 38.34 0.01 274449 0.00 0.00 Zmove
0.03 38.34 0.01 114371 0.00 0.00 memb
0.03 38.35 0.01 98987 0.00 0.00 Ollist
0.03 38.37 0.01 90644 0.00 0.00 itos
0.03 38.38 0.01 85123 0.00 0.00 Onull
0.03 38.38 0.01 58210 0.00 0.00 Onumge
0.03 38.40 0.01 27206 0.00 0.00 tvtbl_asgn
0.03 38.41 0.01 25048 0.00 0.00 Otoby
0.03 38.41 0.01 15488 0.00 0.00 hmake
0.03 38.42 0.01 26 0.38 0.41 Opowr
0.03 38.44 0.01 Orandom
0.03 38.45 0.01 cnv_cset_1
0.03 38.45 0.01 rtos
0.01 38.46 0.01 2186145 0.00 0.00 neg
0.01 38.47 0.01 454303 0.00 0.00 pollevent
0.01 38.47 0.01 81191 0.00 0.00 alctvtbl_0
0.01 38.48 0.01 3876 0.00 0.00 div3
0.01 38.48 0.01 1 5.00 5.00 ston
0.01 38.48 0.01 Onumber
0.01 38.49 0.01 Otabmat
0.01 38.49 0.01 alcselem_1
0.01 38.50 0.01 alctelem_1
0.01 38.51 0.01 cnv_real_1
0.01 38.51 0.01 handle_misc
0.01 38.52 0.01 order
0.01 38.52 0.01 printable
[... many additional functions omitted with 0.00 times ...]
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. This is the major sort for this
listing.
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, if this function is profiled,
else blank.
total the average number of milliseconds spent in this
ms/call function and its descendents per call, if this
function is profiled, else blank.
name the name of the function. This is the minor sort
for this listing. The index shows the location of
the function in the gprof listing. If the index is
in parenthesis it shows where it would appear in
the gprof listing if it were to be printed.
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 0.03% of 38.52 seconds
index % time self children called name
Computer Supported Collaborative Work
CSCW (sometimes called "groupware") is the field of using computers to
assist in the communication and coordination tasks of multi-person projects.
Pfeifer's Overview Pages
Someone from Canada has a nice overview of CSCW on their website.
CSCW Conferences
There are two primary research conferences on CSCW, held in alternating
years, one in North America (CSCW) and one in Europe (ECSCW). From
recent conference papers CSCW can be inferred to span topics such as:
E-mail, Chat, IM, newsgroups, WWW
The original CSCW tool, e-mail, is still the heaviest use of the Internet.
Many or most of the important CSCW ideas vastly predate the WWW.
Notes*, Outlook, UW Calendar
Lotus Notes, Domino, and related products comprise an "integrated
collaborative environment", providing messaging, calendaring, scheduling,
and an infrastructure for additional organization-specific applications.
Providing a single point of access, security, and high-availability for
these applications is a Good Thing.
SourceForge
Collaborative Editors
How do n users edit the same document at the same time? How do they see
each other's changes in real-time? How do they merge changes?
A collaborative editor example: ICI (part of CVE)
In the following example, a person wishing to collaborate on a given piece
of source code opens the file in question, clicks on the person that they
want to collaborate with, and clicks "Invite User" (the GUI has changed
a bit since this screenshot, but the idea is the same).
Wikis
Wiki-wiki means quick in Hawaiian, so this is a "quickie" CSCW tool.
So, if we created a wiki for this class, how will I know when I need to
go read it? An advanced Wiki would have some way to notify subscribers
of new content. Given that many people might edit a Wiki page at the
same time, how would a wiki keep from stomping others' work? An advanced
Wiki would have versioning and auto-merging, or full-on synchronous
collaborative editing.
Virtual Communities and Collaborative Virtual Environments
A wiki is an example of a virtual community: a persistent on-line
space in which people can communicate about topics of interest. Many other
forms of text-based virtual communities are out there, including USENET
newsgroups, MUDs, and mailing lists.
A conference on CVEs has been held several times, but the field's identity
remains split between the CSCW and VR (Virtual Reality) communities.
Additional CSCW Resources
End-of-Semester Checklist
Let's perform some arbitrary and capricious code reviews...
...to get you in the mood for instructor course evaluations. Remember,
course evaluations are vital to the operation of our department! I might
(by carefully and honestly assigning a grade you have earned) determine
whether you get to repeat 384 or not, but you (by carefully and honestly
evaluating the instructor and the course) not only suggest how to improve
the course, but whether I should keep my job. Let's bang out those course
evaluations. Did you learn anything? Why or why not? What should be done
differently?
End of Semester Presentations
XXX Team Talks will be on YYY
Figure you have T-2 minutes, where T=50/N
ZZZ Team Talks will be on YYY+2
Site Quickchecks
Refactoring: More Examples
A lot of my examples will naturally come from my research efforts...
Refactoring for Graphics Portability
Around 1990 I wrote "a whole lot" of X Windows code to allow rapid development
of visualization experiments in Icon instead of in C. The goal from the
beginning was multiplatform portable (like Icon) and easy to use (like my
good old TRS-80 Color Computer, where Tandy had extended Microsoft BASIC
with color graphics and music).
Code Duplication Hell
Unicon projects such as the CVE program from cve.sf.net are just as
susceptible to lack of design or bad implementation as projects in any
other language.
But how did we get to where we support four or more different copies of the
Unicon language translator front-end, and 5+ different copies of the GUI widget
that implements a multi-line editable textlist (text editor)? And how do
we refactor our way out of this mess?
Could Improve:
End-of-Semester Checklist
Do you remember those neat-o forms that I passed out to give you an
idea about computing your grade? Cross-reference the checklist
with the syllabus weighting, which said:
Attendance is required, as this course emphasizes collaboration.
The grading will be proportioned as follows: 20% for homeworks, 20%
for the midterm exam, 20% for the final exam, and 40% for a term project.
Final Examination Review
Welcome to the final exam review day. One way to review is to go back through
all of the lecture notes. Another way is to look at past exams. A third is to
discuss what Dr. J really wishes you learned out of the course.
Welcome to the Final Exam