Students often enjoy learning in teams and developing teamwork skills, but criticise team assessment as unfair if team members are equally rewarded for unequal contributions. This paper describes the design, implementation and evaluation of SPARK, a confidential, web based system for self and peer assessment of contributions to team tasks and team management roles, which enables shared team marks to be moderated to reflect individual contributions. The web based approach has several advantages over paper based approaches. For students, it enables the use of multiple assessment criteria that better reflect team contributions, improves familiarity with the assessment criteria and improves confidentiality, as students can access the system and change their ratings as often as they wish until a pre-determined cutoff date. For staff, it has the potential to improve student learning from teamwork tasks, and it saves time by automating the calculation of self and peer adjustments to assessment grades, enabling the system to be used in subjects with large enrolments. In 1999, the system was trialed and evaluated in several different subjects with different kinds of assessment tasks. Evaluation findings suggested that most students appreciated the confidentiality of the system and felt that it was a fair way of assessing team contributions. However, there was considerable variation across the different subjects. This paper presents four brief case studies which describe the different uses of the system, the responses of staff and students and the lessons learned. The differences point to the potential usefulness of the system. More critically, however, they provide evidence for the range of factors that teachers need to consider when planning to use SPARK and integrating it successfully into their subjects.
Group and team work are commonly used in higher education to facilitate peer learning and encourage students to develop their capacity to work as part of a team. There seems to be little argument about the value of teamwork, but its assessment has proved considerably more problematic (Conway, Kember, Sivan & Wu, 1993; Lejk, Wyvill & Farrow, 1996). One author has likened group assessment to a game, maintaining that the rules of the game advantage some students and disadvantage others, and that factors such as teamwork and contribution to a team are "essentially impossible to assess fairly" (Pitt, 2000, p. 240). However, assessment strongly influences students' learning (Ramsden, 1992; Biggs, 1999). If our courses include objectives about students' capacity to work as part of a team, and we value peer learning and collaboration, then we need some means of assessing teamwork in a fair and meaningful way which promotes peer collaboration (Sampson, Cohen, Boud & Anderson, 1999).
Peer and self assessment should be a justifiable way of assessing team contributions, as it gives team members the responsibility for negotiating and managing the balance of contributions and then assessing whether that balance has been achieved. Peer assessment of individuals' contributions to assessed teamwork is not a new idea, although the addition of self assessment is relatively innovative. While there is some debate about the inclusion of self assessment (Lejk et al., 1996), we believe it encourages students to reflect on their own contributions and capabilities. In fact, Boud, Cohen and Sampson (1999) favour self assessment informed by peer feedback on specific criteria in preference to peer assessment per se.
The SPARK system uses both self and peer assessment and was adapted from a well-designed and evaluated paper based peer assessment system in which students rated each other's contributions and the lecturer used the ratings to calculate adjustments to individual marks (Goldfinch, 1994). Goldfinch's approach uses multiple assessment criteria to encourage students to consider a range of different components of the task and team process when making their assessments. It was a simplification of an earlier approach by Goldfinch and Raeside (1990), which used a two-part assessment form: students were first prompted to identify the peers who made the greatest contribution to particular task elements, and then used these prompts to give peer ratings. A related, simplified approach was used by Conway et al. (1993), whose students found it reasonably acceptable and regarded it as fair. Cheng and Warren (2000) also used peer assessment against multiple criteria to moderate group marks, noting that the approach "facilitates the benefits of group work while providing opportunities for peer assessment" (p. 253).
While these related methods have all been reasonably effective in adjusting team marks to reflect individual contributions, they have all been paper based and involved a series of time consuming calculations to generate adjustment factors. This creates a disincentive for lecturers and delays the provision of feedback to students. The Goldfinch (1994) and Conway et al (1993) simplifications of the original Goldfinch and Raeside (1990) scheme were attempts to reduce this problem, but proved unmanageable with class sizes in excess of 500 students. The SPARK system deals with this problem by automating the processes of collecting the student assessments and completing the calculations. This is a major efficiency benefit, but efficiency was not the only rationale for designing the system.
SPARK also aims to improve students' learning from team tasks in several ways. Firstly, provided valid assessment criteria are chosen, the system should encourage students to negotiate how they will work as a team to achieve the best task result with equal contributions from all members. Secondly, using self and peer assessment encourages students to develop the capacity to reflect on and evaluate their own and others' contributions, and to become aware of their own strengths and needs as team members. Another major benefit of SPARK is that it is a relatively generic template which can be easily adapted to any learning context where group or team work and/or self and peer assessment are used.
The next section of this paper describes the SPARK system and its typical implementation in a subject. This is followed by a description and analysis of the iterative process of designing, developing, implementing and evaluating SPARK, first in one subject and then in three further subjects in very different contexts. The final sections discuss the lessons learned, the implications for the future use of SPARK, and the wider implications of the process for project developers who seek to disseminate generic templates across teachers, subjects, disciplines and institutions.
 
Figure 1: Staff/Instructor login screen
Figure 2 shows a typical screen within the system, showing the menu bar in the top panel and help information in the left hand panel.
 
Figure 2: SPARK screen for instructors to enter or import student details
Teachers begin using SPARK by entering subject and student details; student details can be batch imported. Assessment criteria then need to be decided, either by the teacher alone or by the teacher in negotiation with students. Teachers also need to decide which criteria will be used for prompting only, and which will contribute to the final assessment. The next, and very important, stage is helping students to understand how group assessment aligns with the subject objectives, the rationale for using SPARK, how they can use it to assist their teamwork and how the self and peer assessment will affect their marks. How groups are formed and facilitated are important issues but are not dealt with in this paper. If teams are formed by the academic, they can be batch imported; otherwise, students can register their own teams. Students can modify self-chosen team memberships until a close-off date. (SPARK does not allow any overlap between the team registration period and the subsequent rating period.)
Once students and teams are defined and assessment criteria are chosen, students can access the system as often as they wish to view and discuss the assessment criteria with their team members. After the group task is over, a defined rating period allows students to rate each member of their group confidentially. Figure 3 shows an excerpt from a typical self and peer assessment form, with some sample criteria and ratings where 0 = no contribution to the team for that aspect, 1 = below average, 2 = average and 3 = above average contribution.
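The scheduling rule in the paragraph above (teams can be changed until a close-off date, and the rating period must follow registration without overlapping it) can be expressed as a small check. This is a minimal Python sketch only; the function names `schedule_is_valid` and `can_modify_team` are hypothetical and do not describe how SPARK itself enforces the rule. Dates are treated as inclusive, so a rating period that opens on the registration close-off day counts as overlapping.

```python
from datetime import date


# Hypothetical helpers illustrating the scheduling rule described in the text;
# they are not taken from SPARK's implementation.
def schedule_is_valid(reg_open: date, reg_close: date,
                      rate_open: date, rate_close: date) -> bool:
    """True if both periods are well-formed and rating starts after registration ends."""
    if reg_close < reg_open or rate_close < rate_open:
        return False
    # Inclusive dates: sharing even one day would be an overlap.
    return rate_open > reg_close


def can_modify_team(today: date, reg_close: date) -> bool:
    """Self-chosen teams may be changed up to and including the close-off date."""
    return today <= reg_close
```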
After students have done their assessment, teachers can then use the system to calculate various self and peer assessment factors, based on the formulae published in Goldfinch (1994). These factors can be exported to a spreadsheet for calculating individual marks (if the purpose is summative assessment) or used as a source of feedback to students (if the purpose is formative assessment).
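The paper does not reproduce the Goldfinch (1994) formulae, so the following is a minimal Python sketch that assumes a simple ratio-based factor in that spirit: each member's total rating received (self rating included) divided by the team's average total. A factor above 1 indicates an above-average contribution as perceived by the team. The data layout and the names `spa_factors` and `export_csv` are hypothetical, not SPARK's code, and the published formula may differ in detail.

```python
import csv
import io
from statistics import mean

# Hypothetical layout: ratings[rater][ratee] is a list of 0-3 scores, one per
# criterion that contributes to the calculation. Self ratings are included.
Ratings = dict[str, dict[str, list[int]]]


def spa_factors(ratings: Ratings) -> dict[str, float]:
    """Assumed ratio factor: total rating received / team average total."""
    totals: dict[str, int] = {}
    for given in ratings.values():
        for ratee, scores in given.items():
            totals[ratee] = totals.get(ratee, 0) + sum(scores)
    avg = mean(totals.values())
    if avg == 0:  # nobody rated above zero: leave every mark unadjusted
        return {member: 1.0 for member in totals}
    return {member: total / avg for member, total in totals.items()}


def export_csv(factors: dict[str, float]) -> str:
    """Return the factors as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["student", "spa_factor"])
    for student, factor in sorted(factors.items()):
        writer.writerow([student, f"{factor:.3f}"])
    return buf.getvalue()
```

Because each factor is relative to the team average, the factors in a team always average to 1, so moderating a shared mark redistributes it within the team rather than inflating or deflating it overall.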
 
Figure 3: Excerpt from a typical SPARK self and peer assessment form
While the front end that students and staff see is written in HTML for the web (rather than for a proprietary network or program), the back-end database, operating system and programming approach have changed several times as technologies have advanced over the last five years. The current version runs on a Windows NT server with a Microsoft Access database. In 1998, queries to the Access database were written in Java and JavaScript. More recent programming has been aided by the use of ServletExec servlets. The system is still being refined.
The development team used a web supported conferencing program for project management, asynchronous discussion of issues, brainstorming and as a repository of key materials. Educational specialists in self and peer assessment were consulted for expert advice during the development period, at conferences, through visits and by email. Although students are central to any educational context, our evaluation also attempted to consider other stakeholders, looking at the effects on staff, academic departments, the institution and the wider context. This holistic approach reflects our view that unless all parties and issues are considered, a disjointed impression can be given. Details of the findings and broader issues will be published in a later paper. Issues of particular relevance to students and academics are woven through the following four cases.
By 1996, enrolments in the subject had risen to 850, and a series of "homebrew" web pages and simple applets were used to facilitate staff-student and student-student interaction and to provide access to additional materials. The first prototype of SPARK was developed and trialed in this web environment. Assessment criteria were developed with a focus group of previous students, which identified 16 sub-tasks involved in completing the case study. These task criteria were supplemented by a further six criteria related to team maintenance and leadership roles. Students registered their teams and had access to the criteria before the case study began. Then, following submission of the third stage of the case study, they submitted ratings of their own and their peers' contributions.
A number of benefits for students arose from using SPARK. Students perceived that the process was fairer because the rating items reflected a range of aspects of the team task and maintenance roles, and because the self and peer assessment could be done confidentially and changed as many times as they wanted until the deadline. The latter enabled students to change their ratings privately if others had publicly coerced them. Students also used the transparency of the items and of the method of calculation to manage team effort during the process, and it even affected their choice of group membership. Open-ended responses to the end-of-semester student feedback survey showed a dramatic reduction in complaints about the team assessment.
The main cost to students was that of access to SPARK. Fewer than 10% had external access to a web-connected PC, and students competed heavily for lab facilities around the time of the deadline. This problem has largely disappeared, as external web access amongst these students is now closer to 90%. The other access problem arose from occasional bugs in the software. When it comes to assessment, even occasional bugs can be very frustrating and stressful.
Staff experienced a number of benefits from incorporating SPARK. Firstly, it was possible to retain the case study as a valuable learning task; without a solution to complaints about free-riders, the task would probably have been dropped. Secondly, staff felt satisfied that the assessment process was fair. This was ethically important as well as increasing student satisfaction, and student comments that the case study was valuable for learning in the subject re-emerged. Thirdly, staff felt satisfied that progress had been made in a conscious attempt to develop students' ability to work in a team, a capability they knew was important in the profession. Without such easy data collection on multiple rating items, this would not have been possible. Another indicator of success was that, despite the growing number of teams, the number of team problems needing staff intervention fell substantially.
There were several costs to staff. Firstly, having to reprogram the system for several different platforms was time consuming. Technical bugs were bound to occur in such an environment, as they do in any development mode, but students are very unforgiving of technical bugs that affect assessment. Because student feedback on the web discussion forum was both anonymous and public, even if only a few students experienced problems, their comments could be loud and strong. While this was very discouraging during the developmental phase, it was a strong incentive to improve. Secondly, in the first three years, calculating the factors was very time consuming, taking up to three days in Excel because there could be between 250 and 300 groups. This was rectified in the generic version.
The lessons learned in this large class provided valuable insights into self and peer assessment and group work for the wider university community and also helped the subsequent development of the generic version.
SPARK was trialed in the subject in first semester 1999. Because of the development timeline, students first gained access to the system in the middle of the semester, at around the time they started the case study. The assessment criteria were taken directly from those used in Subject A rather than being customised; the lecturer judged them sufficiently appropriate because both assessment tasks involved a case study. As in Subject A, students chose their own teams.
Evaluation of SPARK involved a questionnaire followed by a focus group with both the day and evening classes, and a reflective diary kept by the lecturer. The student questionnaire included rating and open-ended questions about usability, reactions to the system, and perceptions of learning from the self and peer assessment process. Table 1 shows student responses to some of the rating questions.
Table 1: Student responses to selected rating questions (SA = strongly agree, SD = strongly disagree)
| Item | SA/Agree | SD/Disagree |
|---|---|---|
| The system was accessible | 79% | 8% |
| The system was easy to use | 70% | 13% |
| The process helped me learn more about teamwork | 40% | 24% |
| Identified aspects of teamwork I hadn't thought about before | 41% | 27% |
| Items were appropriate for assessing contributions | 69% | 9% |
| Encouraged greater effort | 40% | 33% |
| Able to give an honest assessment | 78% | 11% |
| Fair way of assessing team contributions | 69% | 18% |
The percentage of students who reported that the process had helped them to learn more about teamwork was encouraging, considering that most students had encountered team tasks in previous subjects. It was also interesting that 40% felt it encouraged them to make more effort whereas 33% disagreed, the latter often commenting that they were self motivated to contribute or wanted to do well and did not need the external incentive to make an effort.
Responses to the open questions and the focus group suggested that many students perceived the purpose of the system as encouraging equal contributions by team members, or controlling free-riders. While most students perceived that the system was fair, some disagreed, particularly if they had worked in groups of three rather than four. Some students clearly did not understand exactly how the self and peer assessment ratings would affect their marks. These perceptions appeared to reflect the way that the lecturer introduced and explained the system. The lecturer perceived the main benefit to be reducing free riding.
Disadvantages for both the lecturer and the students centred on the usability of the system, in particular bugs and other technical problems which occurred during development. Discussions of usability resulted in some changes to the system, including simplifying the password system to make it the same as that used in the web based learning system, and providing feedback messages to confirm that students' ratings had been submitted successfully.
SPARK was introduced and used for formative feedback to teams at the end of the first team assignment and for summative assessment at the end of the second. This was an innovative use, and some members of the teaching team expected that it would encourage teams to discuss how they worked and to work more effectively on the second task. The assessment criteria were the same as those used in Subject A. Students gained access to SPARK shortly before the end of the first task. Evaluation included rating and open-ended questions on SPARK as part of a standard subject evaluation questionnaire, a student focus group, lecturer reflection and a focus group with the teaching team. Students were asked fewer questions than in Subject B, because the teaching team wanted to ask many questions about other aspects of the new subject. Table 2 shows some responses.
Table 2: Student responses to selected rating questions (SA = strongly agree, SD = strongly disagree)
| Item | SA/Agree | SD/Disagree |
|---|---|---|
| Self and peer assessment feedback after assignment 1 helped the team to work more effectively on assignment 2 | 20% | 47% |
| Items were appropriate for assessing contributions to the team assignments | 42% | 37% |
| Team development tutorial activities helped me learn more about teamwork | 36% | 35% |
Introduction of SPARK in this subject had more disadvantages than benefits for students and lecturers, resulting in some valuable lessons for the development team. Formative use of the system created breakdowns in some teams when members who perceived themselves to have contributed equally ended up with different peer assessment ratings. Teaching team members reported more of these team problems than they usually experienced in similar subjects. Several factors appeared to contribute to this. The assessment criteria were taken from Subject A rather than specifically chosen to reflect the team tasks that students had to do. While some items on the Subject A form might be regarded as generic qualities of teamwork (see Figure 3), others were not. Only 42% of students agreed that the items were appropriate for assessing contributions, compared with 37% who disagreed; in Subject B, 69% had agreed and 9% had disagreed.
Further analysis of the open ended responses and discussion in the focus group yielded other reasons. Students did not fully understand how the system worked, and in particular how ratings on each of the assessment criteria would affect the overall self and peer ratings. They also felt that they had spent tutorial time in discussing how their teams would work, and their discussions were not reflected in the criteria used in SPARK. Formative feedback was given only in the form of the overall self and peer adjustment factor, rather than as a profile of contributions which could be discussed in a team.
Despite the problems, however, students generally perceived that SPARK had a useful purpose if it were appropriately implemented, as illustrated in the following quotes:
"made you think about how much each member and yourself contributed to different aspects of the assignments"
"to evaluate contributions from each team member by team members to get a fair distribution of marks. It still didn't work."
Students also described difficulties with accessing SPARK, and complained about the time taken to access the web and complete the process in a subject where web enabled learning resources were not otherwise integrated. Teaching team members also complained about technical problems and difficulties in calculating the required factors for formative purposes. While some teaching team members sought to improve SPARK's use and maintain it in the subject, others sought to drop it entirely. For the development team there were some significant lessons, which are discussed more fully later in this paper.
The subject has eight objectives: two related to knowledge development, four related to developing the capabilities to use that knowledge (e.g. critically evaluate problems and alternative solutions; effectively use analytical tools; competently use technology; communicate effectively to develop and maintain personal and professional relationships) and two related to values (i.e. being able to work self-critically in a group or autonomously, respecting different cultures and ethical and disciplinary approaches). Assessment is carefully aligned with learning activities to ensure that the subject objectives are achieved. Half of the grade (50%) is allocated to individual work. The remaining 50% is based on four group assessment tasks. A team presentation (20%) and three team tests (10%, based on the average of each team's best two results) are conducted in class but require significant preparation out of class. A team debate (10%) and a team topic tracking exercise (10%) are completed out of class time and submitted online.
With 50% of the grade based on group assessment tasks, students need to deal seriously with their own and others' ability to work in a group. Not only do groups face the familiar risk of free-riders, but the potential for dysfunction is higher because of language and cultural differences: up to 70-80% of the student cohort are international students from a wide variety of countries where English is not the native language. To maximise the potential benefit of working in a group, team membership stays fixed for the whole semester.
Following completion of the final team assessment task, students rated each team member. In 1998 the self and peer assessment process was completed on paper at the final face-to-face session, and the ratings were then entered manually by staff into an Excel spreadsheet which calculated the self and peer adjustment factor identified by Goldfinch (1994). In 1999, students rated each other online in SPARK over a one-week period, and staff used the system for the subsequent calculation of the self and peer assessment adjustment factors. Sixteen 'prompting' criteria were specifically chosen for the four team tasks. For example, students evaluated their own and their peers' contributions on two aspects of the topic tracking task (i.e. 'quality of postings' and 'quantity of postings'), five aspects of the debate, six of the presentation and three of the tests. These were followed by six 'final' criteria relating to an effective team. The process was evaluated using student questionnaires, a structured phone interview with almost all students, and reflective journals kept by the two lecturers who jointly taught the subject.
In the student feedback survey, only 8% did not feel that their ability to work in a team had been enhanced through the team exercises. This is a very positive result given that most are mature learners who would have had ample opportunity to develop this skill through work experience before enrolling. The phone interviews revealed that most students considered the rating items appropriate and felt the process was a fair and honest way of encouraging teamwork overall; only 10-14% disagreed with each of these statements. Most students thought SPARK should be implemented wherever group work is used. Interestingly, some 40% said that they did not put in greater effort because self and peer assessment was used. Combined with the previous data, this is a positive outcome, since it suggests that free riding was discouraged without pressuring already committed students to do more work.
Staff found that SPARK saved them considerable time previously spent on data entry and calculation. They also felt satisfied that the process had encouraged students to achieve the subject objectives, including the development of their ability to work in a team, in the more flexible learning mode.
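The weighting scheme described above can be made concrete with a short calculation. This Python sketch assumes every component is marked as a percentage (0-100) and, because the paper does not say how the 50% individual component is made up, treats it as a single mark; the function name `subject_grade` is hypothetical. It does not apply any self and peer adjustment, since the text does not specify how that was combined with these weights.

```python
def subject_grade(individual: float, presentation: float, tests: list[float],
                  debate: float, topic_tracking: float) -> float:
    """Weighted grade using the weights stated for this subject (all marks 0-100)."""
    if len(tests) != 3:
        raise ValueError("expected marks for the three team tests")
    # Only the best two of the three team tests count, averaged.
    best_two = sorted(tests, reverse=True)[:2]
    test_mark = sum(best_two) / 2
    return (0.50 * individual       # individual work
            + 0.20 * presentation   # team presentation
            + 0.10 * test_mark      # team tests (best two of three)
            + 0.10 * debate         # team debate
            + 0.10 * topic_tracking)  # team topic tracking exercise
```

For example, marks of 70 (individual), 80 (presentation), 60, 90 and 70 (tests), 75 (debate) and 85 (topic tracking) give 35 + 16 + 8 + 7.5 + 8.5 = 75.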
SPARK works best when students can see valid reasons for having a team task in the subject, and for using self and peer assessment of team contributions. It is only one approach for ascertaining students' contributions to teamwork and, like all approaches, needs to be educationally justifiable. Teachers wanting to use SPARK need to align its use as a learning activity and assessment tool with the learning objectives for the subject. This alignment is important in any subject, as it focuses students' learning towards desired outcomes (Biggs, 1999). Teachers need to consider what they hope students will learn from teamwork tasks and represent this in their subject objectives. Criteria for the self and peer assessment processes in SPARK then need to be chosen to reflect the objectives and the task and team management contributions necessary to complete the team task. Relevant criteria are crucial to the success of SPARK, as illustrated by the differences between Subject C and the other subjects where SPARK was trialed. Criteria are important in any form of assessment, but even more so when assessment involves processes which may be unfamiliar to many students. Involving current and/or past students in selecting the criteria can greatly enhance students' perception of the relevance and fairness of the self and peer assessment process.
Once criteria are decided, the teacher then needs to decide which items will be used in any calculation. If SPARK is used purely to assess teamwork processes, some criteria may simply be used to prompt students' memories of the task activities that the group undertook rather than to affect a self and peer assessment adjustment factor (cf. Goldfinch and Raeside, 1990). On the other hand, criteria relating to team task components may be used in the final calculations. Whatever approach is chosen, students need to be fully informed about how each of the criteria affects the self and peer ratings which are used to adjust their marks. This is critical for increasing students' perceptions of the fairness of the system.
As with any assessment, some students may query their result, and some group members may dispute the outcomes of the self and peer assessment process. Such disputes can be minimised by clearly articulating the process before the group task begins. Examples demonstrating the range of outcomes arising from different levels of contribution and ratings can be very illuminating for students, and are incorporated in the student interface. Other preventative measures or resolution mechanisms can also help. For example, students can be required to keep an individual and/or group diary of effort and events, which would be the first resource in the event of a dispute. The message from this is that SPARK is not a hands-off tool that teachers can simply put into a subject to manage teamwork. Teachers still need to think critically about its usefulness, make the process as transparent and open as possible for students, and maintain hands-on processes for communicating with teams and resolving conflicts.
The above issues point to the need for teachers to think carefully about how SPARK is integrated into the assessment for the subject and how it is made clear to students. Another set of lessons relates to the fact that SPARK is a web enabled template. In Subject C, SPARK was the main, if not the only, reason why students needed to access the web in the subject, whereas in Subject D many of the students' learning and assessment activities took place in a web enabled learning environment. If SPARK is the only subject activity which requires access to the web, students tend to regard it as an add-on and see access as much more of a problem. Our recommendation is that SPARK be used only in subjects where web enabled learning is already an integrated part of the learning environment.
A further major point relates to the context of trialing and evaluating a system while it is still in development. This has some major benefits for progressively improving the system, but also some disadvantages if development work does not keep to a planned timeframe. It is critical for any assessment related system to be accessible to students as early as possible in the semester and to remain accessible and easy to use throughout the assessment process. Downtime and bugs in SPARK were frustrating for students and stressful for staff if they were unable to obtain instant solutions. With a small project under development and one programmer providing technical support, it was not possible to provide support for students and staff 24 hours a day, seven days a week, although this is increasingly expected when systems are available via the web. Students and staff need to be aware of this, and staff need to have clear alternatives available if students find that they cannot gain access to systems at critical periods during the assessment.
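The distinction above between prompting-only criteria and criteria used in the calculation can be sketched as a flag on each criterion. This is a hypothetical Python illustration, not SPARK's data model: the `Criterion` class and `contributing_total` function are invented for the example. The two prompting criteria are the topic tracking items named in Subject D, while the counted criterion name is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Hypothetical criterion record; `contributes` is False for prompting-only items."""
    name: str
    contributes: bool


def contributing_total(criteria: list[Criterion], scores: dict[str, int]) -> int:
    """Sum one rater's scores for one ratee over the counted criteria only.

    Prompting-only criteria are still shown and rated, but are ignored here.
    """
    return sum(scores[c.name] for c in criteria if c.contributes)


criteria = [
    Criterion("quality of postings", contributes=False),    # prompting only
    Criterion("quantity of postings", contributes=False),   # prompting only
    Criterion("overall contribution to team effectiveness", contributes=True),  # hypothetical
]
```

Keeping the flag on the criterion itself makes it easy to show students exactly which items affect their marks, which the paragraph above identifies as critical for perceived fairness.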
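Worked examples of the kind the paragraph above recommends can be generated mechanically. This Python sketch assumes, as a hypothetical illustration only, that each member's mark is the shared team mark scaled by a ratio factor (their total rating received divided by the team average total); it is not SPARK's published calculation, and it does not cap marks at a maximum. The function name `individual_marks` and the rating totals are invented for the example.

```python
def individual_marks(team_mark: float, totals: dict[str, int]) -> dict[str, float]:
    """Scale a shared team mark by each member's total / team average (assumed scheme)."""
    avg = sum(totals.values()) / len(totals)
    if avg == 0:  # no usable ratings: everyone keeps the team mark
        return {member: team_mark for member in totals}
    return {member: team_mark * total / avg for member, total in totals.items()}


# Equal contributions: everyone receives the team mark.
equal = individual_marks(70, {"A": 16, "B": 16, "C": 16, "D": 16})
print(equal)       # {'A': 70.0, 'B': 70.0, 'C': 70.0, 'D': 70.0}

# One member rated as contributing much less than the others.
free_rider = individual_marks(70, {"A": 18, "B": 18, "C": 18, "D": 6})
print(free_rider)  # {'A': 84.0, 'B': 84.0, 'C': 84.0, 'D': 28.0}
```

In both cases the average of the individual marks equals the team mark, so the adjustment redistributes marks within the team according to the ratings rather than changing the team's overall result.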
In summary, the following factors were identified as characteristic of subject environments where SPARK was more successfully implemented:
Future directions for the development and use of SPARK include developing ways of providing students with formative profiles of their self assessments and the combined peer assessments against each of the individual criteria. This may enable students to see the differences between their own and their peers' perceptions of their contributions, and to discuss these in their teams. This would be considerably more informative, and hopefully more constructive, than the approach used in Subject C of giving students the numbers only. A second potential direction is using SPARK for self and peer assessment, against specified task criteria, of tasks which are not initially done in teams. This use has not yet been trialed, and careful thinking will be needed before moving in this direction.
We believe that SPARK has considerable potential as a "generic" template for improving group based assessment and students' capacity to work as part of a team, and has possible uses in other areas. However, the above points need to be addressed in all contexts where it is implemented, and teachers who do implement it need to continue to evaluate its benefits and downsides.
Our preliminary recommendations for those involved in developing and implementing a "generic" template in their subject include the following:
Clearly the best approach to this issue would be to encourage teachers to adopt student focused conceptions and approaches to implementation, and this would be our advice to staff developers seeking to assist with implementation, but the learning design of the template also needs to be clear rather than hidden. Template designers therefore should make the learning intentions of the template very explicit, and make explicit links between these intentions and useful strategies for implementation.
Biggs, J. (1999). Teaching for quality learning at university. Buckingham: SRHE and Open University Press.
Cheng, W. and Warren, M. (2000). Making a difference: using peers to assess individual students' contributions to a group project. Teaching in Higher Education, 5(2).
Conway, R., Kember, D., Sivan, A. and Wu, M. (1993). Peer assessment of an individual's contribution to a group work project. Assessment and Evaluation in Higher Education, 18(1), 45-56.
Freeman, M. (1995). Peer assessment by groups of group work. Assessment and Evaluation in Higher Education, 20(3), 289-300.
Goldfinch, J. and Raeside, R. (1990). Development of a peer assessment technique for obtaining individual marks on a group project. Assessment and Evaluation in Higher Education, 15(3), 21-31.
Goldfinch, J. (1994). Further developments in peer assessment of group projects. Assessment and Evaluation in Higher Education, 19(1), 29-35.
Lejk, M., Wyvill, M. and Farrow, S. (1996). A survey of methods of deriving individual grades from group assessments. Assessment and Evaluation in Higher Education, 21(3), 267-280.
Pitt, M. J. (2000). The application of games theory to group project assessment. Teaching in Higher Education, 5(2), 233-241.
Prosser, M. and Trigwell, K. (1999). Understanding learning and teaching: The Experience in Higher Education. Buckingham: SRHE and Open University Press.
Prosser, M., Trigwell, K. and Taylor, P. (1994). A phenomenographic study of academics' conceptions of science learning and teaching. Learning and Instruction, 4, 217-231.
Ramsden, P. (1992). Learning to Teach in Higher Education. USA: Routledge.
Sampson, J., Cohen, R., Boud, D. and Anderson, G. (1999). Reciprocal peer learning: A guide for staff and students. Sydney: University of Technology, Sydney.
Samuelowicz, K. and Bain, J. D. (1992). Conceptions of teaching held by academic teachers. Higher Education, 24, 93-112.
Authors: Mark Freeman, Faculty of Business, University of Technology, Sydney (Mark.Freeman@uts.edu.au); Jo McKenzie, Centre for Learning and Teaching, University of Technology, Sydney (Jo.McKenzie@uts.edu.au)
Please cite as: Freeman, M. and McKenzie, J. (2000). Self and peer assessment of student teamwork: Designing, implementing and evaluating SPARK, a confidential, web based system. In Flexible Learning for a Flexible Society, Proceedings of ASET-HERDSA 2000 Conference. Toowoomba, Qld, 2-5 July. ASET and HERDSA. http://www.aset.org.au/confs/aset-herdsa2000/procs/freeman.html