This paper reports a study of the outcomes of students' work in self and peer marking of an examination in mechanical engineering design at the University of New South Wales. In addition to teachers' marking, students marked their own script, and another randomly assigned script, using a detailed marking protocol. Data are presented on the accuracy and reliability of students' assessments, and on survey information concerning their learning experiences from the exercise. These data are analysed and discussed in relation to the learning benefits which accrue, and to the aim of developing skills of self evaluation relevant to professional practice.
In engineering and other fields of professional training there has been a concern for developing students' ability to evaluate their own work in ways which are applicable to their future professional work in the discipline. According to Boud & Lublin (1983):
One of the most important processes that can occur in undergraduate education is the growth in students of the ability to be realistic judges of their own performance and the ability to monitor their own learning... If students are to be able to continue learning effectively after graduation and make a significant contribution to their own professional work, they must develop these skills of appraising their own achievements and that the foundation for this should occur at the undergraduate level, if not earlier (p.3).

This belief in the importance of developing skills of self evaluation during undergraduate education is also shared by graduates. In a survey of 1842 graduates from the University of New South Wales (Midgley & Petty, 1983), respondents were asked to rate the importance of the acquisition of different skills as part of undergraduate study, and the extent to which their university education contributed to this. From a list of nine different skills, 'evaluating one's own work' was listed as the second most important skill (after problem solving). However, only 20% thought that their course of study had made a 'considerable contribution', and a further 53% claimed 'some contribution' had been made to the acquisition of this skill.
It is our view that the ability to assess one's own work assumes particular importance for the specific task of developing student competence in engineering design work. This view is taken as a result of considering the complexities inherent in making judgements on the comparative worth of different solutions to a specific design problem.
At a basic level, design solutions can usually be checked through ensuring that supporting calculations are free from error, and through ascertaining that the design meets (or at least takes adequate account of) specified criteria. However, where judgements are required on the comparative worth of different approaches to solution, and on the different designs actually produced, the assessment task needs to encompass more complex elements such as creativity, economy and utility.
Although it may be argued that skill in design appraisal develops over time through professional practice, it is our belief that undergraduate study should provide opportunities for students to begin developing these skills of self appraisal. In particular, they should have the opportunity to assess critically their own design work and that of their peers, and to become conversant with, and able to apply, the methods and standards of assessment employed by their instructors.
A further consideration arises through academics' concerns about assessment. Orpen (1982) found that (with the exception of tenure and salary) university lecturers indicated more concern about marking and grading than about any other aspect of their job. Their concerns related to doubts about the reliability of assessments; the inordinate amount of time spent on assessment tasks; and the belief that typically students learnt little from their marks, other than how they stood in relation to their peers.
The present study is based on the experiences of students who sat the 1987 Session 1 examination in Mechanical Engineering Design 2. Of the 140 who sat the exam, 87 participated in the self and peer marking exercise during tutorial classes. Self assessment methods have been employed in this subject for several years and were reported earlier (Boud et al, 1986), but the procedures used in 1987 departed substantially from those of previous years. Previously, students were required to undertake a design project for the duration of Session 2 (e.g. design of a machine assembly), which was then submitted for assessment at the end of session. In addition, students were given the tasks of using a protocol which listed a set of factors (with mark weightings) providing criteria for marking their individual projects, and of making judgements on the extent to which their design had successfully applied each of these factors.
The assessment received for the subject comprised the mark determined by the lecturer marking the projects, plus an additional 4% for completion of the student's self assessment protocol. Details of the outcomes of three years' experience with this procedure are contained in Boud et al (1986).
In 1987 there were several major departures. First, in addition to students marking their own papers, peer marking was introduced. Second, these marking procedures were applied to students' formal Session 1 examination scripts in the subject, not to a project as in previous years. Finally, the marking scheme differed in character from that of earlier years. Previously, the variety of approaches used in projects necessitated a marking protocol which was essentially qualitative in form, although detailed in description.
For 1987, the marking schedule applied to the examination scripts was able to specify the particular steps of analysis and the design details required for resolution of the design problem.
Instructions were given for applying the marking schedule: for the calculations section, ten elements were identified, each assigned a maximum of 4 marks; for the drawing section, nine elements of varying weight totalled 60 marks. In a number of these elements student markers were required to exercise a considerable degree of judgement. For example, design proposals which incorporated different shapes required different calculations to determine strength; the location of critical cross sections and the supporting calculations also depended to some extent on the design chosen. In these instances there were departures from the model solution, requiring the marker to assess whether the calculations presented were equivalent to those of the provided solution. For other elements in the schedule, the marking task was one of identifying the relevant work in the exam script and conscientiously applying the marking instructions. For example, 4 marks were allocated if the drawing included 'machine specifications on the bore, top, spigot and base (one mark each)'.
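As an illustration only, the structure of such a marking schedule can be sketched in code. The element names below are hypothetical stand-ins (the full schedule is not reproduced here); only the totals, ten calculation elements worth 4 marks each and nine drawing elements totalling 60 marks, follow the description above.

```python
# Illustrative sketch of the 1987 marking schedule structure.
# Element names are hypothetical; only the mark totals (10 x 4 = 40 for
# calculations, nine drawing elements summing to 60) follow the paper.

CALCULATION_ELEMENTS = {f"calculation_step_{i}": 4 for i in range(1, 11)}  # 40 marks

DRAWING_ELEMENTS = {  # nine elements of varying weight, totalling 60 marks
    "general_arrangement": 10,
    "critical_cross_sections": 8,
    "dimensioning": 8,
    "machining_specifications": 4,   # e.g. bore, top, spigot, base (1 mark each)
    "tolerances_and_fits": 6,
    "material_selection": 6,
    "assembly_detail": 8,
    "drawing_standards": 5,
    "overall_practicability": 5,
}

def total_mark(awarded: dict) -> int:
    """Sum awarded marks, capping each element at its schedule maximum."""
    schedule = {**CALCULATION_ELEMENTS, **DRAWING_ELEMENTS}
    return sum(min(awarded.get(name, 0), maximum)
               for name, maximum in schedule.items())

assert sum(CALCULATION_ELEMENTS.values()) == 40
assert sum(DRAWING_ELEMENTS.values()) == 60
```

Capping each awarded element at its scheduled maximum mirrors the conscientious application of marking instructions described above.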
After students had completed the marking of their own scripts ('self assessment'), they were each given another student's script to mark ('peer assessment'). At each stage, care was taken to ensure that no notation (ticks, question marks etc) was made on any examination paper. At the time of carrying out this task students were not aware of the mark which had been determined by staff marking.
During second session, after they had received the results of their Session 1 examination (and were able to compare marks derived from lecturer, self and peer assessments), students were asked to complete a questionnaire which sought information on their experience of conducting self and peer assessment of examination scripts.
Within the current literature on self and peer assessment there is considerable concern for determining the extent to which these forms of assessment can be utilised as reliable and valid indicators of student performance, and the conditions under which such assessments can be suitably employed. The first issue on which results are presented is therefore the accuracy and reliability of students' marks when compared with those of the lecturer.
The second issue on which results are presented is whether the procedures used are acceptable to students and provide learning benefits which justify their use. Results derived from the questionnaire analysis are used to address this issue.
Table 1: Lecturer, self and peer marks: means and standard deviations

| | N | Mean | Standard deviation |
|---|---|---|---|
| Lecturer mark | 87 | 54.6 | 15.0 |
| Student self mark | 85 | 54.1 | 12.1 |
| Peer mark | 81 | 49.3 | 13.3 |
Further inspection of results indicated that the restriction in the range of marks awarded for self assessments resulted from a tendency for students with high lecturer marks to award themselves a lower self mark, and for students with low lecturer marks to award themselves a higher self mark. This phenomenon is illustrated in table 2, in which students are categorised into quartiles based on the scores obtained from lecturer marking.
Table 2: Mean lecturer and self marks by quartile group (based on lecturer marks)

| Quartile group | Number in quartile | Lecturer mark (mean) | Self mark (mean) | Difference of means (L-S) |
|---|---|---|---|---|
| 19-41 marks | 21 | 34.4 | 39.8 | -5.7 |
| 42-56 marks | 20 | 49.8 | 51.4 | -1.6 |
| 57-65 marks | 22 | 60.5 | 59.0 | +1.5 |
| 66-86 marks | 21 | 73.2 | 64.3 | +8.9 |
The group in the lowest quartile (receiving lecturer marks between 19 and 41) provided self marks which were, on average, 5.7 marks above those awarded by the lecturer. Those in the highest quartile (receiving lecturer marks between 66 and 86) produced self assessments which averaged 8.9 marks below those awarded by the lecturer.
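For illustration, the quartile comparison underlying table 2 can be reproduced with a short script. The paired marks below are invented examples, not the study's data; only the quartile boundaries follow the table.

```python
# Illustrative quartile analysis as in table 2: bin students by lecturer
# mark and compare mean lecturer and self marks within each bin.
# The paired marks are invented; the boundaries follow the table above.

from statistics import fmean

QUARTILES = [(19, 41), (42, 56), (57, 65), (66, 86)]  # lecturer-mark ranges

pairs = [(34, 40), (48, 50), (61, 59), (73, 65),
         (55, 54), (42, 47), (67, 63), (29, 38)]  # (lecturer, self)

for low, high in QUARTILES:
    group = [(l, s) for l, s in pairs if low <= l <= high]
    if not group:
        continue
    mean_l = fmean(l for l, _ in group)
    mean_s = fmean(s for _, s in group)
    print(f"{low}-{high}: n={len(group)}, "
          f"lecturer={mean_l:.1f}, self={mean_s:.1f}, L-S={mean_l - mean_s:+.1f}")
```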
Table 3: Agreement between lecturer, self and peer marks

| | Product moment correlation | Mean of absolute difference of scores |
|---|---|---|
| Lecturer - Self | 0.79 | 7.3 |
| Lecturer - Peer | 0.81 | 8.2 |
| Lecturer - (Self + Peer)* | 0.86 | 7.0 |
| Self - Peer | 0.78 | 7.6 |

* 'Self + Peer' refers to the score obtained by averaging the self and peer marks.
Using the lecturer's score as a benchmark, the accuracy of student self and peer marks can be gauged from the mean absolute difference between the marks awarded by self (and peer) assessors and those awarded by the lecturer. As illustrated in table 3, the average difference between individual self and lecturer scores was 7.3 marks, and between peer and lecturer scores 8.2 marks. It may be noted that although there is a slightly higher correlation between lecturer and peer marks, peer marks are less 'accurate' because, as indicated in table 1, peers applied a harder marking standard than the lecturer or self assessors.
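Again for illustration only, the two agreement statistics reported in table 3, the product moment (Pearson) correlation and the mean absolute difference, can be computed as follows. The mark lists are the same invented examples as above, not the study's data.

```python
# Illustrative computation of the two agreement statistics in table 3:
# Pearson product-moment correlation and mean absolute difference.
# The mark lists below are invented examples, not the study's data.

from statistics import fmean, pstdev

lecturer = [34, 48, 61, 73, 55, 42, 67, 29]
self_mark = [40, 50, 59, 65, 54, 47, 63, 38]

def pearson_r(xs, ys):
    """Product-moment correlation: mean of z-score products."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return fmean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def mean_abs_diff(xs, ys):
    """Mean of |x - y| over paired marks."""
    return fmean(abs(x - y) for x, y in zip(xs, ys))

print(f"r = {pearson_r(lecturer, self_mark):.2f}")
print(f"mean |difference| = {mean_abs_diff(lecturer, self_mark):.1f}")
```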
Use of the peer mark alone would also result in a moderately similar ordering of individual performance (r = 0.81), but the actual scores would tend to be substantially lower. A compression of the range of scores, similar to that observed for self marking, would also result. The effects of using self or peer marks as the basis for determining pass or fail status in the examination are illustrated in table 4. The criterion for a pass is taken as 50%.
Table 4: Pass/fail classifications from self, peer and combined marks compared with lecturer assessment (pass criterion 50%)

| | | Lecturer pass | Lecturer fail |
|---|---|---|---|
| Self assessment | Pass | 49 | 8 |
| | Fail | 4 | 24 |
| Peer assessment | Pass | 36 | 4 |
| | Fail | 15 | 26 |
| Self + Peer | Pass | 40 | 4 |
| | Fail | 12 | 24 |
If self marks alone were to be used as the determinant of exam results, then 8 students who had received a lecturer grading of less than 50% would have passed; and 4 students who received lecturer gradings exceeding 50% would have failed.
If peer marks alone were to be used, 4 students would pass whom the lecturer failed, and 15 would fail whom the lecturer passed.
A combined mark, averaging the peer and self marks for each individual, would result in 4 passing whom the lecturer failed, and 12 failing whom the lecturer passed. The combined mark was also more highly correlated with the lecturer mark (r = 0.86) than either the peer or self marks alone.
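The pass/fail cross-tabulations of table 4 can be derived in the same illustrative fashion, applying the 50% criterion to each pair of marks. As before, the marks are invented examples.

```python
# Cross-tabulating pass/fail decisions against the lecturer's, as in table 4.
# The paired marks are invented; the pass criterion is 50% as in the study.

from collections import Counter

PASS_MARK = 50

def crosstab(lecturer_marks, student_marks):
    """Count (student decision, lecturer decision) pairs."""
    cells = Counter()
    for lect, stud in zip(lecturer_marks, student_marks):
        stud_result = "pass" if stud >= PASS_MARK else "fail"
        lect_result = "pass" if lect >= PASS_MARK else "fail"
        cells[(stud_result, lect_result)] += 1
    return cells

lecturer = [34, 48, 61, 73, 55, 42, 67, 29]
self_mark = [40, 50, 59, 65, 54, 47, 63, 38]
print(crosstab(lecturer, self_mark))
# Counter({('pass', 'pass'): 4, ('fail', 'fail'): 3, ('pass', 'fail'): 1})
```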
Table 5: Students' attitudes to self and peer assessment

| Statement | 'Agree' |
|---|---|
| The ability to assess my own work is very important | 91% |
| The idea of self assessment is a good one | 82% |
| We should have more opportunities for self assessment | 65% |
| Students should be more involved in assessing other students | 50% |
There is strong endorsement of the importance of being able to assess one's own work, and of the idea of using self assessment. Half of the students also supported greater involvement in assessing other students.
The endorsements by 73% of students that participation had 'assisted in making a realistic assessment of my own abilities in the subject', and by 84% that it 'had made me more aware of what I need to know in the subject' (table 6), raise the issue of what was learnt, in what detail, and how students benefited through engagement in the exercise.
Table 6: Students' evaluations of the marking exercise

| Statement | 'Agree' |
|---|---|
| I found assessing my own work to be valuable | 83% |
| I found assessing another student's work valuable | 64% |
| This exercise assisted me in making a realistic assessment of my own abilities in this subject | 73% |
| This exercise has made me more aware of what I need to know in this subject | 84% |
| I would like to see some changes made in the procedures used in the exercise | 35% |
| I found it difficult to use the marking scheme | 13% |
| I found it difficult to follow the model answers | 18% |
| I don't think the rewards were sufficient for the amount of time I spent | 29% |
| The whole exercise of self and peer marking was a waste of time | 4% |
Table 7: Perceived benefits of the exercise

| | Substantial benefit | Some benefit | Little or no benefit |
|---|---|---|---|
| Improving my understanding of the subject matter | 24% | 53% | 18% |
| Improving exam performance through developing an understanding of what examiners look for in answers | 64% | 31% | 2% |
| Developing my ability to critically assess my own work later as a practising engineer | 28% | 51% | 18% |
| Developing my ability to assess the work of a colleague when I become a practising engineer | 20% | 53% | 21% |
Although the majority of students indicated some level of benefit on each of the four aspects included in table 7, the aspect most frequently rated as providing 'substantial benefit' (64%) was 'improving exam performance through developing an understanding of what examiners look for in answers'.
Of the 91 questionnaire responses, 73 (80%) included written answers to the open-ended question asking what had been learnt from the exercise. These were content analysed and grouped into four main categories.
Content specific: The most frequent response category related to learning which was specific to the content of the examination itself. Examples of students' descriptions include:
'I didn't look at the bending of the beam; didn't properly assess the mechanical requirements of the structure, such as wall thickness etc'

'A serious mistake was disregarding the residual component of force caused by the weight of the rotating body. My design was too bulky and should have been lighter'

In this category 36 students described learning in terms which were specific to the content of the examination problem, although the majority (21 students) also described learning that had occurred at a more general level.
Approach: 28 students described their learning in terms of identifying deficiencies in their approach to solving design problems of the kind set in the examination.
'Learnt about deficiencies in making assumptions for the calculations and made inefficient approaches for calculation of the design'

Design skills/problem solving skills: 19 students identified deficiencies in different skill areas related to analysis or drawing. Some examples include:

'I was lacking in knowledge of why I drew that certain design. Looking at the overall picture and planning to satisfy all the requirements'
'I should have spent more time doing scale sketches to have a better idea of what would be a better design'
'Not daring enough, suppressed thoughts'
'I need to do more work, examples, in determining what forces and stresses are involved in certain design problems'

A further 9 students also mentioned that they had developed an appreciation of the need to reallocate the time spent on different aspects of the design solution and drawing:

'I found problems in spacing my drawings, locating circles, and obtaining even firmness of lines'
Expectations: 26 students mentioned that the exercise had resulted in their developing a better understanding of what is expected by examiners in order to achieve high marks.
'I found that the material that I provided in the examination was not the material that the examiners wanted to see'

'I have a better appreciation of the detail needed and what the examiners are looking for with respect to ways of dimensioning and practicability'
Within the context of the situation analysed in the current study, it is apparent that considerable agreement exists between the three sets of marking procedures. The correlation of 0.86 between lecturer marks and the combined average of peer and self marks is quite high, and perhaps higher than would normally be obtained from repeat marking of examinations by academic staff. In large part, the nature of the exercise and the employment of a detailed marking protocol have provided a context in which a high level of confidence can be placed in the validity of students' assessments. This is seen as a particularly significant finding, since its applicability to assessments of the kind reported here - of design problems of an open-ended quality - represents an extension of what has previously been reported.
This opens up the prospect of extending the application of self and peer marking procedures to provide a complete basis for assessment, thereby obviating the need for staff to mark every script. However, since in the present study the students' assessments did not count formally as part of their examination result, some safeguards may need to be built into the procedures to ensure that the self marking system maintains its reliability and accuracy. The procedures and checks built into the self and peer marking scheme used by Boud & Holmes (1981) in Electrical Engineering on our own campus indicate a practical mechanism for achieving this.
The descriptions by students provide clear testimony that most students had learnt in ways which reach beyond knowledge of the solution to the examination question and identification of the sources of their own errors. Indeed, a majority of respondents had identified either deficiencies in particular skills relevant to producing engineering designs, or ways in which their approach to problem solving in design needed to be altered. A substantial number had also come to a clearer appreciation of what was expected in order to achieve satisfactory or superior performance in examinations.
The information we have on what students have learnt from the exercise can indicate no more than a step, and we believe an important first step, towards the development of self evaluation skills important to professional practice. In our opinion, effective development would require the systematic use of self and peer assessment throughout the students' whole course of study; its regular incorporation into formal assessment; and active collaboration between students and staff in determining the criteria for different assessments. Nonetheless, a successful first step has been taken.
Although the mark analyses indicate that a combination of self and peer assessments can provide a reliable alternative to lecturer marked scripts, we believe such an innovation would need cautious development supported by further studies. The introduction of self and peer assessment was premised on the conception of the importance of students developing the ability to assess their own design work. It was our belief that the experience of engagement in self and peer assessment would provide a modest, but nonetheless important, contribution to this. The results obtained support this belief.
Boud, D. & Holmes, W. H. (1981). Self and peer marking in an undergraduate engineering course. IEEE Transactions on Education, E-24, 4, 267-274.
Boud, D. & Lublin, J. (1983). Self Assessment in Professional Education: A Report to the Commonwealth Research and Development Committee. Tertiary Education Research Centre (UNSW).
Davis, J. & Rand, D. (1980). Self grading versus instructor grading. Journal of Educational Research, 73, 4, 207-211.
Falchikov, N. (1986). Product comparisons and process benefits of collaborative peer group and self assessments. Assessment and Evaluation in Higher Education, 11, 2, 146-166.
Midgley, D. & Petty, M. (1983). Final Report on the Alumni Association 1982 Survey of Graduate Opinion on General Education. UNSW Alumni Association, Kensington.
Orpen, C. (1982). Student versus lecturer assessment of learning: A research note. Higher Education, 11, 567-572.
Mr Douglas Magin is Senior Education Officer within the Tertiary Education Research Centre at the University of New South Wales. His major work has been within curriculum innovation and evaluation in higher education.
Dr Alex Churches is Senior Lecturer in the School of Mechanical and Industrial Engineering at the University of New South Wales. He has published widely in engineering education, with special interest in innovations in engineering design courses.

Please cite as: Magin, D. J. and Churches, A. E. (1988). What do students learn from self and peer assessment? In J. Steele and J. G. Hedberg (Eds), Designing for Learning in Industry and Education, 224-233. Proceedings of EdTech'88. Canberra: AJET Publications. http://www.aset.org.au/confs/edtech88/magin.html