ASET 2002: Dowsing and Long- instructions for IT skills exercises and specifications for assessment tests

Transforming instructions for IT skills exercises into specifications for assessment tests

Roy D. Dowsing and S. Long
School of Information Systems
University of East Anglia

The automated assessment of authentic Information Technology skills relies on testing the candidate's output for conformity with the operations which should have been performed during the test. Since each test is unique, so are its test specifications. This paper describes transformations which can be applied to the instructions given to the candidate to generate the test specifications which drive automated IT skills assessors.

Introduction

The use of Information Technology (IT) has become ubiquitous and thus the testing of IT skills, both for educational and vocational use, is becoming commonplace. To improve the management of the assessment process, the cost of assessment and the time to assess, more of the assessment of such skills is becoming automated (Blayney, 1995) (Kennedy, 1999) (Dowsing et al., 2000).

There are many different levels of assessment of IT skills from the most basic function tests, for example, testing that the candidate knows how to use single functions of a particular IT tool, such as selecting a menu item, to authentic assessment (Fletcher, 1992) where the candidate undergoes a test which mimics a typical task which might be undertaken in the workplace, such as generating a memo or querying a database. Function tests are relatively simple to automate and are frequently used for formative assessment in training packages, for example, (SkillCheck, 2002). Authentic assessment tasks are much more difficult to assess mainly because the tasks are at a much higher level which allows the candidate much more flexibility in producing the answer. In fact the major problem with generating an assessor for authentic assessment is the ability to cope with all possible errors a candidate may make, that is, the assessor must be able to 'correctly' assess all possible candidate input. Here 'correctly' means to produce the same result as would be produced by an 'ideal' human examiner.

The model of assessment used here (Dowsing et al., 1999) consists of three phases.

1. Compare the candidate answer to one or more model answers. Differences between the candidate and model answer indicate possible errors.
2. Classify the errors using the assessment criteria.
3. Collate the classification of the individual errors and produce the overall assessment.

This paper is concerned with the assessment of authentic IT skills and in particular with the generation of the information required for the second phase of the assessment: the classification of errors. This work is part of an on-going research and development project, undertaken with a national UK Examinations Board, to produce automated assessors for a range of IT skills at varying levels. There are several different levels of tests with the intention that students can improve their IT skills by gradually progressing through the series of tests. The simplest tests involve candidates being asked to carry out specific modifications to a document, such as inserting some new information in a specific position, whereas higher-level tests give the candidate more flexibility, such as asking them to insert a date in the document without specifying exactly where. In these latter tests the assessment is more complex since there are many more allowable 'correct' answers. Thus some of the skills tested are purely technical - does the candidate know how to 'cut-and-paste' information - whilst others are concerned with applying these skills to typical workplace tasks. An example of a lower-level spreadsheet test is given later in this paper. In these IT examinations the tests taken by candidates are changed frequently, and this involves generating new 'correct' answers and working out which parts of the candidates' answers have to be tested for each of the skills being assessed. Generating this information by hand is error-prone, and hence automatic generation of this information from the examination instructions would improve the assessment process. This paper describes the process of automatic generation and illustrates, with an example, how the test specifications are generated from the instructions given to the candidate.

Visibility and assessment

In order to assess whether a candidate has performed an operation correctly, the result of the operation must be visible. Here visibility means that the assessor must be able to determine from the output produced whether the operation has been performed correctly. For example, if a candidate fails to insert an item, its absence has to be detected to correctly assess the operation. This requirement can be met either by capturing the system state after every operation or by capturing each operation as it is performed, so that later analysis can assess each action. Since capturing user operations which identify the functions invoked is operating system and IT tool dependent, virtually all authentic IT skills assessment captures system state rather than function operation. Capturing system state after every operation is both time consuming and storage intensive, but the state can be captured less frequently: relatively few operations interact, so the result of an operation will usually still be visible after several following operations have been performed. Thus, typically, a candidate undertaking an authentic skills assessment exercise will be asked to store the system state at intervals through the test to ensure that information which needs to be assessed is visible in at least one of the saved states. A typical multi-stage exercise will consist of a set of instructions interspersed with state saving instructions.

The problem

In the first phase of the assessment the candidate attempt is matched with one or more model answers to generate the 'best' match, where 'best' is defined as being the closest match to that which would be produced by an ideal human examiner. In the second phase of the assessment, sets of assessment criteria are applied to regions of the candidate document which correspond to operations which the candidate has been asked to perform. Consider the following example:

The candidate is asked to insert the text "red" before "kangaroo" in the sentence "One fine day Jane took her kangaroo for a walk."

The model answer would be "One fine day Jane took her red kangaroo for a walk".

In the matching phase, phase 1, the candidate attempt would be matched to this string. Assume that the candidate string was "One fine day Jane took her grey kangaroo for a walk". The matching process would show "red" unmatched in the model and "grey" unmatched in the candidate attempt.

In phase 2 the differences found in phase 1 are classified. Assume one of the assessment criteria is "Failure to insert text as specified is counted as 1 text error". In order to apply this rule, the position at which the text is inserted in the model answer has to be noted, together with the text itself. The matching process generates the correspondence between items in the model and candidate answers, thus allowing phase 2 to relate positions in the model to positions in the candidate answer. The classification test for this example is to apply the test for the correct insertion of the word "red" at the position immediately before "kangaroo" in the candidate answer. This is coded, for example, as [Apply criteria test n] [position x] [text], which is generic so that different examination papers can test different sets of tasks.
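The two phases of this example can be illustrated with a minimal Python sketch. It is not the assessor used in this work: word-level matching with the standard library's difflib stands in for the matching process, and the function names, the word-index form of "position" and the single criteria test are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def match_words(model, candidate):
    """Phase 1 (sketch): align the words of the two answers.

    Returns one (model_position, model_words, candidate_words) tuple for
    every region in which the model and candidate answers differ.
    """
    m, c = model.split(), candidate.split()
    matcher = SequenceMatcher(a=m, b=c, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((i1, m[i1:i2], c[j1:j2]))
    return diffs


def classify_insertion(diffs, position, text):
    """Phase 2 (sketch): hypothetical test for 'insert text as specified'.

    position is the word index of the inserted text in the model answer.
    Returns 1 (one text error) if any word of the inserted text was left
    unmatched by phase 1, otherwise 0.
    """
    length = len(text.split())
    for model_pos, model_words, _candidate_words in diffs:
        # Does the unmatched model region overlap the inserted text?
        if model_pos < position + length and position < model_pos + len(model_words):
            return 1
    return 0


model = "One fine day Jane took her red kangaroo for a walk"
attempt = "One fine day Jane took her grey kangaroo for a walk"
differences = match_words(model, attempt)
text_errors = classify_insertion(differences, position=6, text="red")
```

Here "red" is word 6 of the model answer (counting from 0), so the specification [Apply criteria test n] [position x] [text] corresponds to the call classify_insertion(differences, 6, "red").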

One of the inputs to phase two of the assessment consists of a set of specifications detailing which criteria tests to apply to the regions of the candidate answer corresponding to given regions of the model answer. Traditionally, such information is generated by a human being - the examiner - and used by human markers to assess candidate attempts. However, such generation is error-prone, and thus some means of automating the generation of this information would help reduce assessment errors.

Transformations on the exercise

Initially it appears possible to use the exercise instructions, one at a time and in the order given in the test, to assess the answer. This is how function tests operate: such tests examine each answer after the candidate has taken the appropriate action. However, this approach does not work for tests where the results presented correspond to the application of a set of instructions. In such cases the instructions given to the candidate need to be transformed, and often reordered, before they can be used as test specifications in the assessment process. The transformations required can be classified into three types:

Model independent transformations
These transformations are required irrespective of the assessment model used. The addresses of regions given in the instructions to the candidate refer to the current state of the document, whereas the test specifications refer to the saved state of the document. Thus if a stage of an examination contains insert or delete instructions, instructions before these operations will need to have their region addresses adjusted in the corresponding test specification.
Example: Consider the following fragment of a database test
1. Edit record 15 to show 9 birds bought.
2. Delete record 10.
3. Save your work in a file named "Part1".
The saved file will contain the edited record as record 14 and thus the assessment must be performed on this record rather than record 15 as given in the instruction.
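The address adjustment in this example can be sketched in Python as follows. The function name and the tuple encoding of operations are hypothetical, and the sketch assumes that an inserted record takes the given record number and pushes later records down by one.

```python
def saved_address(address, later_ops):
    """Map a record number used in an instruction to its number in the saved file.

    later_ops lists the ("insert" | "delete", record_number) operations that
    follow the instruction within the same stage, in the order performed.
    Returns None if the record itself is deleted before the save.
    """
    for kind, target in later_ops:
        if kind == "delete":
            if target == address:
                return None          # the record no longer exists
            if target < address:
                address -= 1         # records after the deletion move up
        elif kind == "insert":
            # Assumption: the new record becomes record `target`.
            if target <= address:
                address += 1         # records at or after it move down
        else:
            raise ValueError(f"unknown operation: {kind}")
    return address


# Instruction 1 edits record 15; instruction 2 then deletes record 10.
record_to_test = saved_address(15, [("delete", 10)])
```

For the fragment above, record_to_test is 14, the record number at which the edit must be assessed in the saved file.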
Model dependent transformations
These transformations depend on the model of assessment employed. Some assessment schemes assess instructions as late as possible in a multi-stage test to allow the candidate to correct any mistakes as late as possible. This means that the ordering of the test specifications can be completely different from the instructions.
Example: Consider the fragment of instructions below:
Place the number 2 into spreadsheet location A4
Save the state to file A
Place the string "abcd" in spreadsheet location B3
Save the state to file B
In this example file B will contain the desired values in cells A4 and B3 and thus the value in both of these cells can be tested in file B, that is, the order of operations in testing is different to that in the original exercise. In fact, in this case, the saving to file A is not required since all the information is available in file B. The rule for moving an operation across a save operation is that an operation can be moved provided the result of the operation is visible at the next save operation. Since IT tools use overwriting semantics, this rule means that as long as there is no operation on the same region in the following stage, the operation can be moved after the next save operation. This rule can actually be refined since some operations on a region conflict - overwrite a property or attribute of the item - whereas others do not. For example, the instructions referring to A4 in
Place the number 2 into cell A4
Save the state to file A
Format the contents of cell A4 as bold
Save the state to file B
do not conflict since operations on values and format are orthogonal. Hence the testing of the value and formatting can both be performed on file B.
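The rule for moving an operation across a save can be sketched as below. This is an illustration only: the Instruction record, the attribute names "value" and "format", and the numbering of the saves (file A as stage 1, file B as stage 2) are assumptions, and each instruction is taken to act on a single cell.

```python
from collections import namedtuple

# stage: number of the save that follows the instruction
# cell: the cell operated on; attribute: the property it overwrites
Instruction = namedtuple("Instruction", "stage cell attribute")


def latest_visible_stage(instr, instructions, last_stage):
    """Return the latest saved stage in which the result of instr is visible.

    The result stays visible across a save as long as no instruction in the
    following stage overwrites the same attribute of the same cell.
    """
    stage = instr.stage
    while stage < last_stage:
        overwritten = any(
            other.stage == stage + 1
            and other.cell == instr.cell
            and other.attribute == instr.attribute
            for other in instructions
        )
        if overwritten:
            break
        stage += 1
    return stage


exercise = [
    Instruction(1, "A4", "value"),   # Place the number 2 into cell A4
    Instruction(2, "A4", "format"),  # Format the contents of cell A4 as bold
]
test_value_in = latest_visible_stage(exercise[0], exercise, last_stage=2)
```

Because value and format are treated as separate attributes, test_value_in is 2: the value placed in A4 can be tested in file B. Had the second instruction overwritten the value of A4 instead, the result would be 1 and the test would have to use file A.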
Another model dependent transformation required in the particular tests investigated concerns replicate instructions. Such instructions ask the candidate to insert a formula into a cell and then to replicate that formula across the row or down the column. The test for replication has to be performed before the test for the original formula because of the technique used to remove knock-on or dependency errors (Blayney, 1995). A formula in a cell may give an incorrect value if any of the cells it depends on contain incorrect values. The normal assessment rule is that errors should only be penalised once, that is, knock-on errors should not be penalised. One method of ensuring this is to assess independent cells first and to change incorrect values to correct values after assessment. By this means dependent cells will always see correct values in the cells they depend on and any errors in the dependent cell are due to an incorrect formula. For the replicate action, the replication must be tested before the formula insertion since if the formula value is incorrect it will be replaced by the correct value and this is not the formula which the candidate has replicated. Thus any replicate operations have to be reordered in the test specifications.
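The removal of knock-on errors can be illustrated with a small Python sketch. It is a simplification under stated assumptions rather than the technique as implemented: formulas are restricted to sums over a list of cells, only one level of dependency is handled, and the function and parameter names are hypothetical.

```python
def assess_without_knock_on(model_values, model_formulas,
                            cand_values, cand_formulas):
    """Return the cells in error, penalising each mistake only once.

    *_values map data cells to numbers; *_formulas map formula cells to the
    tuple of cells they sum (an assumption made for this sketch).
    """
    errors = []
    corrected = dict(cand_values)

    # Assess independent cells first, replacing wrong values by correct ones.
    for cell, expected in model_values.items():
        if corrected.get(cell) != expected:
            errors.append(cell)
            corrected[cell] = expected

    # Dependent cells now see correct inputs, so any remaining difference
    # is due to the candidate's formula itself.
    for cell, model_refs in model_formulas.items():
        expected = sum(model_values[ref] for ref in model_refs)
        cand_refs = cand_formulas.get(cell, ())
        actual = sum(corrected.get(ref, 0) for ref in cand_refs)
        if actual != expected:
            errors.append(cell)
    return errors


model_values = {"B2": 1047, "C2": 245}
model_formulas = {"D2": ("B2", "C2")}
# The candidate mistyped B2 but entered the correct formula in D2.
errors = assess_without_knock_on(
    model_values, model_formulas,
    {"B2": 1074, "C2": 245}, {"D2": ("B2", "C2")},
)
```

Only B2 is reported: the wrong total that the candidate's spreadsheet shows in D2 is a knock-on error and is not penalised.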
Assessor dependent transformations
In addition to the assessor independent transformations outlined above there may be transformations required which are assessor specific. For example, the specific data needed for the assessor checks can either be included in the test specifications or be taken from the model answer. The instructions in the exercise contain this data, so if the assessor takes it from the model answer the data has to be removed from the test specification.
Examples of other assessor specific transformations include formatting of the specifications and adding parameterless tests which are always required, for example, tests for visibility of all information on a spreadsheet by checking column widths.

Practical implementation

The particular examination scheme used for this work is the CLAIT (Computer Literacy and Information Technology) series of IT Skills exercises operated by OCR Examinations in the UK (CLAIT, 1998). These tests cover a wide range of IT skills ranging from word processing, the use of spreadsheets and databases to web page creation and the use of email. There are several levels of test but the structure of the tests is similar; only the difficulty differs. Each test presents the candidate with a sequence of instructions, each associated with one or more assessment criteria, to obey in order to complete the test. At intervals through the test the candidate is required to save the current state of the exercise to a file on disk and it is these files which are assessed, either by human examiners marking printouts or by an automated assessor marking file contents. Each examination consists of two or more sections, one for data entry and the others for data manipulation. For example, in the first section of the spreadsheet test the candidate is asked to input data into a spreadsheet and set up formulae in some cells. In the second section the candidate is asked to make edits, to add a row/column, to delete a row/column and to format data items.

The individual instructions state the operations to perform on a range of cells in the spreadsheet. Discovering whether there is any overlap between the range in one instruction and that in another involves performing set intersection on the two ranges. Whilst this is conceptually simple, the implementation can become complicated since two ranges may not only fully overlap or be disjoint but may also partially overlap in many different ways. It is easier to convert each instruction on a range of cells into a set of operations on single cells. The cells addressed by any two of the resulting operations then either coincide or are disjoint. This is used to perform the model dependent transformations described above.
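The conversion of a range into single cells can be sketched in Python as follows. The helper names are hypothetical and the sketch assumes ranges are written in the usual spreadsheet notation used in the coded paper, such as "A1:D1".

```python
import re


def col_to_index(col):
    """Convert a column name to a number: 'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def index_to_col(n):
    """Convert a column number back to its name: 1 -> 'A', 27 -> 'AA'."""
    name = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def expand_range(ref):
    """Expand 'A1:D1' into its single cells; a single cell expands to itself."""
    m = re.fullmatch(r"([A-Z]+)(\d+)(?::([A-Z]+)(\d+))?", ref)
    if m is None:
        raise ValueError(f"not a cell or range: {ref!r}")
    c1, r1, c2, r2 = m.groups()
    c2, r2 = c2 or c1, r2 or r1
    first_col, last_col = sorted((col_to_index(c1), col_to_index(c2)))
    first_row, last_row = sorted((int(r1), int(r2)))
    return [
        f"{index_to_col(c)}{r}"
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


headings = expand_range("A1:D1")
overlap = set(headings) & set(expand_range("D1"))
```

Once every specification refers to single cells, testing for overlap reduces to comparing cell names, as in the set intersection on the last line.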

In the first phase of the transformation each instruction on a range of cells is converted into a set of specifications on single cells. At the same time the position of insert and delete instructions is determined and these instructions are decoded to determine whether they apply to rows or columns and to which row or column. This information is used in the second pass through the data to adjust cell addresses as the delete and insert instructions are moved to fixed positions in the test specification stream.

In the second phase, a copy of the insert test is added to the beginning of the first save section since the assessment of this examination penalises candidates who perform the insert too early. The insert specification in the second section is moved to the front of the section and the delete specification is moved to the end with consequent adjustment to the addresses of all the instructions in this section. The reason for this is that in this configuration the addresses of cells in section 1 and section 2 are directly comparable, that is, information will reside at the same cell address in both sections and hence overlap comparison is straightforward.

In the third phase of the algorithm each individual test specification in section 2 is compared with every specification in section 1. A section 1 specification which can be overwritten by any section 2 specification is marked as unmoveable; all other section 1 specifications are marked as moveable.

In phase four the individual specifications are moved to their required position. Firstly, those section 1 specifications which are concerned with format are moved to the front of section 1, followed by the rest of the specifications which cannot be moved from section 1. All the other specifications go in section 2. The format specifications which can be moved from section 1 and those from section 2 are moved to the front of section 2. Next the replicate specifications are moved after the format specifications and lastly the remainder of the specifications which have not been dealt with are moved to the end of section 2.
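The ordering within section 2 can be sketched with a stable sort. This is only an illustration of the ordering rule, not the transformation software: specifications are reduced to (cell, objective) pairs, the mapping of objective codes to categories ("4a" for format, "3b" for replication) is read off the example paper in this article, and the handling of section 1 and of the insert and delete specifications is omitted.

```python
# Assumed mapping of assessment objectives to categories.
FORMAT_OBJECTIVES = {"4a"}
REPLICATE_OBJECTIVES = {"3b"}


def order_section2(specs):
    """Order section 2 specifications: format, then replicate, then the rest.

    sorted() is stable, so specifications within each category keep
    their original relative order.
    """
    def priority(spec):
        _cell, objective = spec
        if objective in FORMAT_OBJECTIVES:
            return 0
        if objective in REPLICATE_OBJECTIVES:
            return 1
        return 2

    return sorted(specs, key=priority)


specs = [("A1", "2a"), ("E2", "3a"), ("E3", "3b"), ("A1", "4a"), ("B1", "4a")]
ordered = order_section2(specs)
```

For this input the two format specifications come first, followed by the replicate specification on E3, then the remaining specifications in their original order.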

As a final operation, the delete specification is moved from the end of section 2 to the front of section 2, with the appropriate adjustments being made to the ranges of the specifications traversed. The reason for this movement is that the output saved in section 2 has the delete operation performed so specifications in section 2 need to assume that the delete operation is performed first.

Example

To illustrate the process of transforming a set of examination instructions into a test specification, consider the following simplified examination:

Spreadsheet Examination Paper

Create a new blank spreadsheet. Enter the following headings in the first row of the spreadsheet:

NAME
MAY INCOME
JUNE INCOME
TOTAL INCOME [Assessment Objective 2a]
Under NAME, MAY INCOME and JUNE INCOME insert the following data:

Margaret Jones	1047	245
Philip Long	234	165
John Smith	1256	2322
[Assessment Objective 2a]
In the TOTAL INCOME cell for Margaret Jones insert a formula to compute the total income for May and June.

[Assessment Objective 3a]
Fill in the TOTAL INCOME cells for all the other people by replication of this formula.

[Assessment Objective 3b]
Save your work in a file called stage1
Ensure all the headings are left justified.

[Assessment Objective 4a]
Delete the row for Philip Long

[Assessment Objective 2c]
Insert a new column between JUNE INCOME and TOTAL INCOME called JULY INCOME. Populate the column with the following data:

Margaret Jones	1673
John Smith	231

Ensure that the formulae for TOTAL INCOME are updated to include the new figures.

[Assessment Objective 2b, 2a, 3c]
Save your work to a file called stage2

Coded paper (first two columns) and results of transformation to testing specification (last two columns)
Section 1	Section 2	Section 1	Section 2
"A1" "2a"	"A1:D1" "4a"	"D1","2b"	"D1","2b"
"NAME"	0	"A3","2c"	"A3","2c"
"B1" "2a"	"A3" "2c"	"E3","3b"	"A1","4a"
"MAY INCOME"	"D1" "2b"	"E4","3b"	"B1","4a"
"C1" "2a"	"D1" "2a"	"A3","2a"	"C1","4a"
"JUNE INCOME"	"JULY INCOME"	"B3","2a"	"E1","4a"
"D1" "2a"	"D2" "2a"	"C3","2a"	"E3","3b"
"TOTAL INCOME"	1673	"E2","3a"	"E2"
"A2" "2a"	"D3" "2a"		"E4" "3b"
"Margaret Jones"	231		"E2"
"A3" "2a"	"E2" "3a"		"A1","2a"
"Philip Long"	"=SUM(B2:D2)"		"B1","2a"
"A4" "2a"	"E3" "3b"		"C1","2a"
"John Smith"	"E2"		"E1","2a"
"B2" "2a"			"A2","2a"
1047			"A3","2a"
"B3" "2a"			"B2","2a"
234			"B3","2a"
"B4" "2a"			"C2","2a"
1256			"C3","2a"
"C2" "2a"			"D1","2a"
245			"D2","2a"
"C3" "2a"			"D3","2a"
165			"E2","3a"
"C4" "2a"
2322
"D2" "3a"
"=SUM(B2:C2)"
"D3:D4" "3b"
"D2"

Conclusions

The results here and further tests have shown that this approach to the automatic generation of test specifications for spreadsheet and database skills assessment works well. However, as described above, some of the transformation is model specific and some is assessor specific, so the actual transformation software is specific to a particular examination. Future work will investigate the possibility of producing a generic transformer which can be driven by the model definition and by the assessor interface definition.

References

Blayney, P. (1995), Use of a spreadsheet based marking system for assisted learning and assessment. Proceedings of the 12th Annual Conference of ASCILITE, 245-256. [verified 13 Aug 2002] http://www.ascilite.org.au/conferences/melbourne95/smtu/papers/blayney.pdf

CLAIT (1998). Tutor's Handbook and Syllabus, 3rd Edition, L706, OCR, Coventry, October.

Dowsing, R.D., Long, S. and Sleep, M.R. (1999). Assessing word processing skills by computer. Information Services & Use. IOS Press, Amsterdam, 15-24.

Dowsing, R.D., Long, S. & Craven, P. (2000). Electronic delivery and authentic assessment of IT skills across the Internet. International Conference on Advances in the Infrastructure for e-business, Science and Education on the Internet, SSGRR, L'Aquila, Proceedings on CD-ROM.

Fletcher, S. (1992). Competence-based Assessment Techniques. Kogan Page, London.

Kennedy, G.J. (1999). Automated scoring of practical tests in an introductory course in information technology. Computers and Advanced Technology in Education (CATE99), Cherry Hill, New Jersey, USA, IASTED/Acta Press.

SkillCheck, HR Press Software. http://www.individualsoftware.com/ [verified 13 Aug 2002]

Authors: Roy D. Dowsing and S. Long, School of Information Systems, University of East Anglia, Norwich NR4 7TJ, UK. Email: rdd@sys.uea.ac.uk
Dr Roy Dowsing has been a Senior Lecturer in the School of Information Systems at the University of East Anglia since 1979. In the last ten years his research interests have been almost exclusively in the field of automated assessment of IT skills, funded by the Higher Education Funding Councils and a major UK Examinations Board.

Please cite as: Dowsing, R. D. and Long, S. (2002). Transforming instructions for IT skills exercises into specifications for assessment tests. In S. McNamara and E. Stacey (Eds), Untangling the Web: Establishing Learning Links. Proceedings ASET Conference 2002. Melbourne, 7-10 July. http://www.aset.org.au/confs/2002/dowsing.html

Created 10 Aug 2002. Last revision: 10 Aug 2002.
© Australian Society for Educational Technology