IIMS 1994: Ring - computer administered testing in an IMM environment

Computer administered testing in an IMM environment: Research and development

Geoff Ring
Edith Cowan University, Western Australia

The purpose of this paper is to describe a research and development project which is attempting to evaluate the efficacy of using a state of the art CAT system to determine the mathematical competency levels of a group of professional trainees (first year preservice education students). The CAT system makes optimal use of the capabilities of current microcomputer hardware and software, particularly those features which result from the application of hypermedia principles and multimedia technologies to the overall testing environment as well as to the assessment of specific performance oriented tasks. The CAT system, which is characterised by powerful navigational tools, elements of multimedia, a variety of question formats, a high level of learner control, procedural help and several feedback options, will be demonstrated.

Introduction

The purpose of this paper is to describe a research and development project based on a state of the art computer administered testing system which makes optimal use of the capabilities of current microcomputer hardware and software, particularly those features which result from the application of hypermedia navigation principles and multimedia technologies.

Testing is used for a variety of purposes: determining what a student knows, assigning grades, determining university entry and deciding who should be employed. Computers have often been used in the testing process as an aid to the construction and scoring of "paper and pencil" tests. The use of computers to administer tests has received increasing attention in recent times. The term "computer administered testing" (CAT) implies a completely automated environment whereby the test is constructed using the computer, the student completes the test at the computer and the test results are concurrently scored by the computer. Research has shown a student preference for CAT over conventional paper and pencil testing provided the testing environment and the test itself are both well designed (Anderson & Trollip, 1982).

The presentation will be based on the demonstration of a prototype of a CAT system which is characterised by powerful navigational tools, elements of multimedia, a variety of question formats, a high level of learner control, procedural help and several feedback options. The prototype CAT system uses conventional test strategies and is based on three fundamental principles:

Students should not be disadvantaged in comparison with non-computer based testing;
Full use should be made of the computer based medium; and
Instructor control over the testing environment should be maximised.

The fact that CAT systems are not a significant part of education environments is largely due to their failure to adhere closely to these principles. In particular, many fail to provide the student with such basic capabilities as previewing and "browsing" (the ability to move freely among test items and re-answer them). The American Psychological Association (1986) has argued strongly that:

computerised administration [of tests] normally should provide test takers with at least the same degree of feedback and editorial control regarding their responses that they would experience in traditional testing formats (p. 12).

It is essential that state of the art CAT systems be thoroughly evaluated so that the full potential of the computer based medium for testing can be realised. The proliferation of networked microcomputers in education, commercial and industrial environments in recent years means that it is now feasible to administer sophisticated tests directly to students at a computer in a hypermedia/multimedia environment.

The administration of competency based testing is common among traditional educational institutions as well as being prevalent in commerce and industry. There is a need to capitalise further on the huge investment in computerisation in both educational institutions and the workplace. Well designed CAT systems have the potential to reduce instructor time and effort, improve test security and offer students a testing environment with a greater variety of question types and testing formats. They also offer immediate test scoring, feedback options and more reliable and valid information on students and test items.

Desirable features of CAT systems

The following three principles are suggested by Anderson and Trollip (1982) as the basis for designing CAT systems:

ensure easy access to needed information
maximise user control
install safety barriers and nets

These principles apply equally to both instructors and students. The first principle can be exemplified by the need for instructors to access test results and the need for students to get directions on how to use the testing system. Examples of the second principle are the instructor's ability to alter parameters such as type of feedback and item randomisation and the student's ability to change answers at any time. Examples of the third principle are the need to confirm significant decisions such as an instructor's indication that records are to be deleted or a student's indication that they wish to finish the test.

Instructor options

CAT systems should allow instructors to set a range of parameters such as student access, time limits, the pass mark, resources allowed and the availability of student options such as previewing, marking for review and sample question practice. Instructors should also have an option which enables them to "test the test" at any time (without having to enter the system as a student) in order to ensure that the items are correct and that the test is working properly. No records should be kept by the system in this case. Instructors should have easy access to detailed results for each student, summary statistics for the test and student comments. The results of statistical item analysis procedures should also be available to enable item performance as well as student performance to be studied. The capability of printing this information for instructors is desirable.

Item banks

A major advantage of CAT is the capability of having item banks of questions from which the actual test questions can be easily drawn. This allows the creation of different but statistically equivalent tests, a useful feature when students are taking a test in close proximity or when they are taking a test at different times. Since CAT systems offer the possibility of constructing tests by randomly selecting items from item banks, students who fail a test may see some of the same items again in another attempt. This problem is more serious when feedback is given after each question. To overcome this, CAT systems should allow instructors to select a specific item bank, or set of item banks, from which the items for a test will be drawn. Further, after the test items have been determined, the order of presentation of questions should be under instructor control with a randomising option.
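The selection procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the prototype's implementation: the function `draw_test`, the dictionary layout of `banks` and the `seen` parameter are all hypothetical, and skipping items met in an earlier attempt is only one possible way of reducing repeats.

```python
import random

def draw_test(banks, bank_names, n_items, seen=(), shuffle=True, rng=None):
    """Draw n_items from the item banks selected by the instructor.

    banks      -- dict mapping bank name to a list of item ids (assumed layout)
    bank_names -- the banks the instructor has selected for this test
    seen       -- item ids the student met in an earlier attempt (skipped)
    shuffle    -- instructor-controlled randomising of presentation order
    """
    rng = rng or random.Random()
    already_seen = set(seen)
    pool = [item for name in bank_names for item in banks[name]
            if item not in already_seen]
    if len(pool) < n_items:
        raise ValueError("not enough unseen items in the selected banks")
    chosen = rng.sample(pool, n_items)   # random selection without repeats
    if not shuffle:
        # Present the chosen items in the order they appear in the banks.
        chosen.sort(key=pool.index)
    return chosen
```

A second attempt would pass the items from the first attempt as `seen`, so that the student is not shown them again.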

Network environments

In network environments instructors should be able to view status information (eg, questions answered, time remaining) on all students and be able to communicate to students via electronic mail. This latter facility is particularly useful if an error in the test is detected and the instructor wishes to notify all students. Also, circumstances arise where instructors need to be able to delete or manipulate student records. Appropriate warnings ("safety nets") are necessary if such manipulations will permanently alter stored information.

Prior to testing

Before the testing process commences, students should be presented with clear directions concerning the use of the computer and the use of the testing system. This type of help should also be available during the testing process. The basic requirements for this stage are that the system be easy to use and that it be obvious how to start the test. Students should be presented with information such as the required passing score, the resources permitted (eg, books, clocks, etc), whether sample (practice) questions are available, whether the test questions can be previewed and how much time is allowed for both previewing and testing. The opportunity to practise on sample questions prior to taking the actual test is a particularly important feature for students not familiar with the type of computer based testing system and/or the type of interactions that will be required of them during the test. This feature helps to ensure a positive reaction to the test by students as does the implementation of appropriate "safety barriers and nets" such as the prevention of an accidental start (or finish) to a test by requiring confirmation of the relevant actions.

Screen design

Once the test begins, the student should be presented with a carefully designed screen showing the first question, relevant status information (such as question number and time remaining) and navigation controls. Well designed CAT systems usually partition the screen area to accommodate this type of information, taking account of the fact that while content related to a particular test item will change for each question, the basic layout of the status and navigation information will remain constant. Regardless of the exact nature of the screen layout used by a particular CAT system, it should remain consistent throughout the test.

Changing responses

The procedures for responding to test questions should be flexible enough to allow students to change their response or to not respond at all. Two situations arise with respect to changing responses: the first occurs when the student simply alters a response while the question is still on the screen; the second occurs when a student wishes to return to a question answered earlier and change their response. The first situation should always be possible whereas the second should be possible provided the CAT system has not been configured by the instructor to allow feedback after each question. An inability to change answers at all can be unfair to students and can affect the reliability and validity of the test. It is worth noting that allowing multiple attempts at questions makes it difficult to use adaptive testing where item presentation is dependent on previous responses.

Judging responses

When judging responses, format errors such as entering a numerical choice that does not exist in the question, clicking on an inappropriate area of the screen or dragging an object to an "out of bounds" area need not be accepted, thus allowing the student another response. This is a further example of an advantage CAT has over traditional testing; in the latter, inappropriate responses are only detected at the marking stage and must then be recorded as wrong answers.
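A format check of this kind might look like the sketch below. The function names `check_choice` and `in_bounds`, the messages and the rectangle representation are assumptions made for illustration; the point is only that a malformed response is rejected with a prompt rather than being recorded as a wrong answer.

```python
def check_choice(text, n_choices):
    """Validate a typed multiple choice response.

    Returns (choice, "") if the response is well formed, or
    (None, message) so the student can be asked to respond again.
    """
    try:
        choice = int(text.strip())
    except ValueError:
        return None, "Please enter a whole number."
    if not 1 <= choice <= n_choices:
        return None, f"Please choose a number from 1 to {n_choices}."
    return choice, ""

def in_bounds(x, y, region):
    """True if a click or drop at (x, y) lies inside the permitted region.

    region is (left, top, right, bottom) in screen coordinates (assumed).
    """
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom
```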

Feedback

CAT systems should allow the instructor to decide whether feedback is to be provided or omitted, and if provided, the type and timing of the feedback to be used. It should be possible to set the type of feedback (eg, indicating correctness or providing an explanation) as well as the timing of the feedback (eg, after each question, on completion of the test or not at all). Status information such as the current question number, the number of questions answered and the time remaining should be constantly updated and always visible. However, the displaying of status information such as the progressive test score should be determined by the instructor.

Navigation

Navigation through the test questions should, in general, not be constrained, with students having the ability to move easily through the questions in order (forward or reverse) or by directly "jumping" to a nominated question. It is also desirable to allow students to "mark" questions so that they can return to them later without having to search the whole test for them. Such flexible navigation must, however, be compatible with the level of feedback being given.

Unrestricted movement between questions, with the corresponding ability to change previous responses to questions, is clearly not compatible with the giving of feedback after each question. While flexible navigation options are generally desirable for standard CAT they are not suited to special types of computerised testing such as adaptive testing where the set of items which constitute the test is not predefined and is determined as the student progresses, the items chosen being determined by the student's performance level.
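The interaction between navigation and feedback can be made concrete with a small sketch. The class `Navigator` and its methods are hypothetical, and the rule that feedback after each question blocks only backward movement is an assumption chosen for illustration.

```python
class Navigator:
    """Tracks the current question and the questions marked for review."""

    def __init__(self, n_questions, feedback_after_each=False):
        if n_questions < 1:
            raise ValueError("a test needs at least one question")
        self.n_questions = n_questions
        self.feedback_after_each = feedback_after_each
        self.current = 1          # questions are numbered from 1
        self.marked = set()

    def jump(self, number):
        """Move directly to a nominated question."""
        if not 1 <= number <= self.n_questions:
            raise ValueError("no such question")
        if self.feedback_after_each and number < self.current:
            # Assumed rule: once feedback has been given, earlier
            # questions cannot be revisited.
            raise PermissionError("cannot return after feedback")
        self.current = number

    def forward(self):
        self.jump(min(self.current + 1, self.n_questions))

    def back(self):
        self.jump(max(self.current - 1, 1))

    def toggle_mark(self):
        """Mark or unmark the current question for later review."""
        self.marked ^= {self.current}

    def next_marked(self):
        """Jump to the next marked question, wrapping round; return it or None."""
        if not self.marked:
            return None
        later = sorted(q for q in self.marked if q > self.current)
        target = later[0] if later else min(self.marked)
        self.jump(target)
        return target
```

With `feedback_after_each=False` the student can move freely and return to marked questions; with it set to `True`, any attempt to go back raises `PermissionError`.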

Student comments

An option for enabling students to make comments at any time, which can be later read by the instructor, enables the gathering of useful information and allows students to express frustrations, raise objections and make complaints. Such comment files should be routinely checked by instructors and may in some cases affect final marks (for example, a comment may bring to light the fact that a particular question was ambiguous).

Termination

In addition to the automatic termination of a test based on the expiration of the time available, tests should also be able to be terminated at any time by either the student or the teacher. Teacher initiated termination may be necessary for several reasons (eg, when a student is found to be cheating) and student initiated termination is necessary to enable students to finish when they wish to do so (this is essential in situations which allow students to "browse" through questions in a flexible navigation environment). If termination is student requested the testing system should provide information on questions not answered and questions marked for review before asking the student to confirm termination. The likelihood of accidental termination by either the student or the teacher should be reduced by always requiring a confirmation procedure.
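A student requested termination with a confirmation step could be sketched as follows. The names `termination_summary` and `request_termination` are hypothetical, and `confirm` stands for whatever dialogue the system uses to ask the student; it is passed in as a function so that the sketch stays independent of any user interface.

```python
def termination_summary(n_questions, responses, marked):
    """List the questions not yet answered and those marked for review.

    responses -- dict mapping question number to the student's response
    marked    -- set of question numbers marked for review
    """
    unanswered = [q for q in range(1, n_questions + 1) if q not in responses]
    return {"unanswered": unanswered, "marked": sorted(marked)}

def request_termination(n_questions, responses, marked, confirm):
    """Show the summary and end the test only if the student confirms.

    confirm -- callable taking the summary and returning True or False
    """
    summary = termination_summary(n_questions, responses, marked)
    return bool(confirm(summary))
```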

Data gathering

No termination procedures should result in the loss of data. To ensure this and to help overcome the disadvantages associated with the dependence of CAT on machinery and power availability, student and system files should be updated after each response so that no data is lost if the test is terminated because of machine or power failure. Further, the system should be capable of resuming at the point of termination with all aspects of the original environment restored (eg, the clock showing the time remaining). As a minimum the system should collect the following data for each test item: the question number, an item objective code, the student's response, the correctness of the response and the correct answer.
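One way of meeting these requirements is to append a record to the student's file after every response and to rebuild the session from that file on restart. The sketch below assumes a file with one JSON record per line; the names `ItemRecord`, `save_response` and `resume` and the `seconds_remaining` field are illustrative and are not taken from the prototype.

```python
import json
import os
from dataclasses import dataclass, asdict

@dataclass
class ItemRecord:
    """The minimum data suggested above for each test item."""
    question_number: int
    objective_code: str
    response: str
    correct: bool
    correct_answer: str

def save_response(path, record, seconds_remaining):
    """Append one record and force it to disk before continuing."""
    entry = asdict(record)
    entry["seconds_remaining"] = seconds_remaining
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
        f.flush()
        os.fsync(f.fileno())   # ask the operating system to write it out now

def resume(path):
    """Return (latest record per question, seconds remaining at last save)."""
    responses, remaining = {}, None
    if not os.path.exists(path):
        return responses, remaining
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue       # ignore a line cut short by a failure
            remaining = entry.pop("seconds_remaining")
            responses[entry["question_number"]] = ItemRecord(**entry)
    return responses, remaining
```

Because a later record for the same question replaces the earlier one, changed answers are handled without rewriting the file.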

Time limits

While time limits are often appropriate for norm referenced or power tests, time limits which may affect performance are not generally recommended for criterion referenced or mastery tests. In tests with time limits a warning should be given as the time limit approaches.

After the test

After the testing process has concluded, students should normally be presented with a screen detailing their performance on all questions, together with an overall performance indicator. The capability of printing out this information for students is desirable. However, as with other options, the performance data to be shown and printed should be determined by the instructor. The option for enabling students to make comments should remain active after the test to allow student comments of a more general nature. Information concerning relevant online and offline learning resources should be accessible to students who have just completed the test and seen their results.

Security

Access to the system should only be possible by authorised instructors and students. Students should only be able to view data on their own performance and perhaps an indication of relative class standing. These requirements are relatively easy to meet with a network of microcomputers but for CAT using floppy disks, data encryption methods may be required to ensure security. Any system should attempt to ensure that a given student takes the right test and that the right student takes the test. The former task is not difficult and is usually handled by asking students for their name and/or an identification number. The latter task, which equates to preventing cheating, is more difficult to handle without some form of human intervention. For students doing the test on site and under supervision, the procedures are the same as for a conventional test. If students are taking the tests at remote locations some form of human supervision is normally required.

Research

A review of the literature

One concern that arises with the advent of computer administered testing is the possibility that the mode of testing influences how students perform on tests. Studies that have investigated differences in examinee performance on items administered in pen and paper form and computer based form have produced equivocal results (Spray, Ackerman, Reckase & Carlson, 1989) with most studies showing no overall influence (for example, Olsen, Maynes, Slawson & Ho, 1986), some indicating positive effects (for example, Moe & Johnson, 1988) and some finding negative effects (for example, Dimock & Cormier, 1991), with the positive and negative effects sometimes being associated with student characteristics such as age, gender, academic ability and computer experience.

A review of the literature has revealed that most research in this area has involved non-conventional testing (such as admissible probability measures testing, adaptive testing and related matters concerning item response theory) rather than conventional testing, and yet most of the CAT systems in use are based on conventional testing procedures (Bugbee & Bernt, 1990). Almost all CAT systems described in the literature have been characterised by the exclusive use of multiple choice test items and all have failed to make optimal use of the multimedia and hypermedia capabilities that are now possible using standard hardware and software systems.

According to Spray et al (1989),

Future studies involving computer administered test taking should incorporate those suggestions given in APA's (1986) "Guidelines", especially those sections pertaining to item administration considerations, because it would appear that these variables are at least part of the cause of significant score differences across item presentation media (p.270).

Very few CAT systems in common usage follow the spirit of the APA Guidelines in terms of item administration procedures. For example, the Guidelines state that "test takers should be able to verify the answer they have selected and should normally be given the opportunity to change it if they wish" (p. 17). In most studies, students taking the computer administered versions were only able to see an item once in a "single pass, no return" mode. For example, in a study by Moreno, Lee and Sympson (1985) "it was not possible for examinees to refer to previous items or to change an answer once the answer had been recorded by the computer" (pp.5-6). This flexibility was present in the paper based version. Similarly, Eaves and Smith (1986) in their study required the computer group "to deal with only one stimulus item at a time and, once examinees responded to an item, they were unable to change the response. In addition, examinees could not scan the entire test or otherwise skip items" (p.24). Again, the pen and paper group were allowed to move freely from item to item, to change presentation order, to review items and to change answers.

A few studies (see, for example, Spray et al, 1989) have attempted to make the computerised format for taking tests identical with the conventional paper based format. While some have done better than others in this respect, all have fallen short of fully achieving this objective. For example, none of the CAT systems provided students with the ability to jump from any question to any other question instantly. Further, the items were almost always presented exactly as they appeared in the paper format (usually in multiple choice format, without graphics), an approach which, while it may be useful for statistical comparisons, fails to utilise the unique capabilities of the computer in the provision of potentially more valid question types for testing the same objectives. Of the many research studies using CAT systems which have been found in the literature, none has employed a computer based testing environment with a format as flexible, or interactions as sophisticated, as the type being developed.

Question types

An exciting characteristic of CAT is the opportunity it provides for testing through innovative ways of presenting and answering questions. Some valuable forms of interaction available in computer based environments are not possible in traditional testing situations. For example, answers to questions may be given by one or more of the following means: "text entry" (typing on the computer keyboard), "click/touch" (using a mouse or touch screen to identify areas), "move object" (directly manipulating objects on the screen using a mouse or finger), "pull down menus" (selecting from items in a temporary list that overlays the screen on demand), "key press" (pressing a key on the computer keyboard), and "preset simulations" (computer simulations of an action observed by students as an integral part of a test question). The use of computer animation techniques in testing is supported by Hale, Okey, Shaw and Burns (1985) because

The computer has the potential to make vivid, situations that can only be statically (sic) portrayed in a paper and pencil test. Even use of pictures to accompany questions may do little to portray dynamic events involved in a problem (p.83).

The potential to improve test items by using the computer's interactive and multimedia capabilities to the maximum has led to the use of many creative testing strategies in the prototype CAT system already developed. For example, the use of constructed response items based on simulation techniques offers students tasks which are more realistic and closer to those they encounter in education and work settings. This should enhance the face validity of such items; that is, the perception among subject matter experts and test takers that such test items are better measures of the test objectives than the corresponding paper based multiple choice items. Although such items may measure somewhat different skills than their multiple choice counterparts (Ward, Frederiksen & Carlson, 1980), they may offer a "window" into the processes used to solve particular problems (Birenbaum & Tatsuoka, 1987) and they may better predict some aspects of educational performance (Frederiksen & Ward, 1978). The potential of these items to test recall rather than recognition is another good reason to explore their role in computer based assessment. The development of non-multiple choice items which can be reliably and accurately scored by a computer makes it possible to broaden the scope of standardised testing as well as diagnostic testing. This issue is becoming increasingly important as the emphasis continues to shift towards interactive, computer based assessment and performance based testing.

According to Fletcher and Collins (1987), the most frequently quoted disadvantage of paper tests is delayed feedback on scores and incorrect responses, whereas the immediacy of feedback on scores and incorrect responses was among the most commonly cited advantages of CAT systems. Research into the effects of "correctness" (right/wrong) feedback has been conducted but the results have not been conclusive (Wise & Plake, 1990). Several studies (for example, Rocklin & Thompson, 1985) have indicated that such feedback results in higher scores and lower test anxiety while others (for example, Wise, Plake, Pozehl, Barnes & Lukin, 1989) have found evidence of increased anxiety and lower test scores. Further research of the type proposed in this study is needed, not only to establish whether "correctness" feedback has a facilitative effect on test performance, but also to examine the differential effects of various types of feedback such as giving the correct answer, giving an explanation and showing a progressive score on the screen. Ring (1992) makes the point that the flexible navigation characteristics recommended in the APA Guidelines (1986) must be compatible with the level of feedback being given. For example, unrestricted movement between questions, with the ability to change previous responses, is clearly not compatible with the giving of feedback after each question.

Pilot study

Research involving a prototype version of the hypermedia/multimedia CAT system is being conducted at Edith Cowan University. The test content is based on existing pen and paper tests used at ECU to assess the mathematical competency of preservice primary teacher education students (Bana & Korbosky, 1983, 1984). A prototype 71 item version of the CAT system, enhanced with data capture routines, was completed for a pilot research study conducted towards the end of 1993.

The pilot study focussed on the following issues:

Differences between the CAT system and a near equivalent pen and paper test of mathematical competencies in relation to student performance on the tests, attitude and anxiety levels (computer use and mathematics), and the test mode preference of students.
Relationships between student attitude and anxiety levels, test performance, test mode preference and prior computer experience.
Differences in error patterns for mathematics test items in the CAT system compared with corresponding error patterns for items in the near equivalent pen and paper test.
The validity and reliability of the items in both versions of the test with a particular emphasis on those items unique to the computer based environment.

Figure 1: The model for administering the pen and paper and the computer based tests

The subjects of the CAT project consisted of two classes (n = 44) of first year BA (Primary Teaching) students. As part of the CAT project, "CAT awareness sessions" were conducted with all subjects prior to the administration of the following two tests: a 71 item computer based test and a 71 item near equivalent pen and paper test. Half of the students were given the pen and paper test first, followed a week later by the computer based test, and half were given the pen and paper test a week after the computer based test, as illustrated by Figure 1.
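The counterbalanced order of administration can be illustrated with a short sketch. The paper does not state how students were allocated to the two orders, so the random allocation used here is an assumption, as are the function name `assign_orders` and the group labels.

```python
import random

def assign_orders(students, rng=None):
    """Split students into two halves that take the tests in opposite orders.

    Random allocation is an assumption; the source only states that
    half of the students took each test first.
    """
    rng = rng or random.Random()
    shuffled = list(students)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "paper_first": shuffled[:half],      # pen and paper, then computer
        "computer_first": shuffled[half:],   # computer, then pen and paper
    }
```

For the 44 subjects in the pilot study this gives 22 students in each order.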

Conclusion

Tests are likely to remain a commonly used means of assessing learning. The development of non-multiple choice items which can be reliably and accurately scored by a computer makes it possible to broaden the scope of standardised testing as well as diagnostic testing. This issue is becoming increasingly important as the emphasis continues to shift towards interactive, computer based assessment and performance based testing in simulated environments.

Well designed CAT systems can provide relief to instructors and may well provide higher quality tests to students. Given the lack of guidance available from the existing research literature, the varying purposes of tests and the differing goals of instructors, the major implications for CAT system developers are threefold: (a) students should not be disadvantaged in comparison with non-computer based testing; (b) full use should be made of the computer based medium; and (c) instructor control over the testing environment should be maximised.

It is hoped that the research information and technical knowledge which will result from the project described in this paper will provide a sound basis for the use of such systems in a wide range of educational contexts, as well as in other professional, commercial and industrial settings.

References

American Psychological Association Committee on Professional Standards (COPS) and Committee on Psychological Tests and Assessment (CPTA) (1986). Guidelines for computer based tests and interpretations. Washington, DC: COPS/CPTA.

Anderson, R. & Trollip, S. (1982). A computer based private pilot (airplane) certification examination: A first step towards nationwide computer administration of FAA Certification exams. Journal of Computer Based Instruction, 8(3), 65-70.

Bana, J. & Korbosky, R. (1983). A comparative study of the geometric concepts of Year 7-8 pupils and preservice primary teachers. Paper presented at the Annual Conference of the Mathematics Education Research Group of Australasia. Perth, May 1983.

Bana, J. & Korbosky, R. (1984). Diagnostic study of measurement concepts of twelve year old pupils and preservice teachers. In the Proceedings of the Fifth International Congress on Mathematical Education. Adelaide: ICME.

Birenbaum, M. & Tatsuoka, K. (1987). Open ended versus multiple choice response formats: It does make a difference for diagnostic purposes. Applied Psychological Measurement, 11, 385-395.

Bugbee, A. C. Jr. & Bernt, F. M. (1990). Testing by computer: Findings in six years of use 1982-1988. Journal of Research on Computing in Education, 23(1), 87-100.

Dimock, P. H. & Cormier, P. (1991). The effects of format differences and computer experience on performance and anxiety on a computer administered test. Measurement and Evaluation in Counselling and Development, 24(3), 119-26.

Eaves, R. & Smith, E. (1986). The effect of media and amount of microcomputer experience on examination scores. Journal of Experimental Education, 55, 23-26.

Frederiksen, N. & Ward, W. (1978). Measures for the study of creativity in scientific problem solving. Applied Psychological Measurement, 2, 1-24.

Fletcher, P. & Collins, M. A. J. (1987). Computer administered versus written tests: Advantages and disadvantages. Journal of Computers in Mathematics and Science Teaching, 6(2), 38-43.

Hale, M., Okey, J., Shaw, E. & Burns, J. (1985). Using computer animation in science testing. Computers in the Schools, 2(1), 83-90.

Moe, K. C. & Johnson, M. F. (1988). Participants' reactions to computerised testing. Journal of Educational Computing Research, 4(1), 79-86.

Moreno, K., Lee, J. & Sympson, J. (1985). Effects of computerised versus paper and pencil test administration on examinee performance. (Available from Navy Personnel Research and Development Centre, San Diego, CA 92152)

Olsen, J., Maynes, D., Slawson, D. & Ho, K. (1986). Comparison and equating of paper and pencil, computer administered and computerised adaptive tests of achievement. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.

Ring, G. (1992). Computer administered testing. In Proceedings of the Second International Conference on Information Technology for Training and Education, 502-508. Brisbane: University of Queensland.

Rocklin, T. & Thompson, J. (1985). Interactive effects of test anxiety, test difficulty and feedback. Journal of Educational Psychology, 77, 368-372.

Spray, J., Ackerman, T., Reckase, M. & Carlson, J. (1989). Effect of the medium of item presentation on examinee performance and item characteristics. Journal of Educational Measurement, 26(3), 261-271.

Ward, W., Frederiksen, N. & Carlson, S. (1980). Construct validity of free response and machine scoreable forms of a test. Journal of Educational Measurement, 17, 11-29.

Wise, S. & Plake, B. (1990). Computer based testing in higher education. Measurement and Evaluation in Counselling and Development, 23(1), 3-10.

Wise, S., Plake, B., Pozehl, B., Barnes, L. & Lukin, L. (1989). Providing item feedback in computer based tests: Effects of initial success and failure. Educational and Psychological Measurement, 49, 479-486.

Author: Geoff Ring
Chairperson, Department of Computer Education,
Edith Cowan University, Mount Lawley Campus
2 Bradford Street, Mount Lawley, Western Australia 6050
Tel. +61 8 9370 6369 Fax. +61 8 9370 2910
Email: g.ring@cowan.edu.au
Please cite as: Ring, G. (1994). Computer administered testing in an IMM environment: Research and development. In C. McBeath and R. Atkinson (Eds), Proceedings of the Second International Interactive Multimedia Symposium, 478-484. Perth, Western Australia, 23-28 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1994/qz/ring1.html

This URL: http://www.aset.org.au/confs/iims/1994/qz/ring1.html
© 1994 Promaco Conventions. Reproduced by permission. Last revision: 15 Feb 2004. Editor: Roger Atkinson
Previous URL 18 Aug 2000 to 30 Sep 2002: http://cleo.murdoch.edu.au/gen/aset/confs/iims/94/qz/ring1.html