An application of speech recognition technology is being trialled in university lectures. A lecturer's speech is first digitally converted into electronic text for display via a data projector. After the lecture the transcript is made available online for students to access for revision. While this project is primarily aimed at students with disabilities, all students in the lecture have access to the screen text and can register to access the online lecture transcripts. The present preliminary study was designed to survey the reaction of the wider body of students in three university courses. The results showed that only a small proportion of the wider student body reported finding the screen text helpful in comprehending the lecture or in taking notes. While relatively few respondents said they had used the online lecture transcripts, a large proportion indicated they intended to use them for course revision. The findings, though not definitive, have implications for how speech recognition technology is presented so as to provide choice to university students about whether or not they access it to assist their learning.
Figure 1: Example of central screen comprising lecture slides and two lines of digitised screen text
| Item (% of respondents agreeing) | n = 64 | n = 84 | n = 18 |
|---|---|---|---|
| 1. Screen text was helpful | 11 | 18 | 11 |
| 2. Screen text helped when something was missed | 9 | 28 | 17 |
| 3. Screen text improved understanding | 3 | 12 | 6 |
| 4. Screen text improved notetaking | 6 | 15 | 11 |
| 5. Inaccuracies in screen text were distracting | 92 | 82 | 94 |
| 6. Inaccuracies were less of a problem over time | 47 | 65 | # |
| 7. Online lecture transcripts were helpful | 12 | 12 | 39 |
| 8. Online transcripts used instead of taking notes | 7 | 11 | 6 |
| 9. Plan to use online transcripts to revise | 45 | 41 | 56 |
| 10. Technology had a positive effect on lectures | 12 | 25 | 17 |

# Item inadvertently omitted from questionnaire
Other aspects of the screen text investigated were whether it improved understanding and notetaking. Again, the percentages of respondents experiencing these potential benefits were small. Improved understanding of lecture material was reported by between 3% and 12% of respondents across the three courses (see Item 3). The percentages reporting improved notetaking ranged from 6% to 15% (see Item 4). Almost all respondents reported being distracted by inaccuracies in the screen text (see Item 5, with rates of 92%, 82% and 94% for the three courses, respectively). However, it appears that many students adapted somewhat to this distraction, with 47% and 65% of respondents in the two courses where the item was asked reporting that inaccuracies were less of a problem as the semester progressed (see Item 6). This is supported by anecdotal reports from lecturers, who observed a decrease during the semester in overt disturbance (e.g., student laughter and comments) caused by screen text errors.
In most cases, only small percentages of respondents agreed with Items 7 and 8 (see Table 1). The exception is course CPH252, where a relatively high proportion (39%) agreed that the online transcripts were helpful. It is notable that appreciable percentages of respondents in all three courses (41% to 56%) stated that they planned to use the transcripts to revise for their exams (see Item 9). This result has to be interpreted in light of the proportions of enrolled students who had access to the online transcripts through having registered to participate in the project (ACC220 = 68%; BUS101 = 41%; CPH252 = 70%). It is likely that most of those who had registered reported that they planned to use the transcripts for revision. Finally, a minority of respondents (12% to 25%) felt that the technology had an overall positive effect on how the lectures were delivered (see Item 10).
Analysis of variance carried out on the mean ratings for each of the 10 items showed significant differences across courses for the following items (df = 2 in all cases): Item 1 (F = 3.76; p < 0.05), Item 2 (F = 6.96; p < 0.001), Item 3 (F = 6.47; p < 0.05), Item 4 (F = 3.54; p < 0.05), and Item 10 (F = 3.85; p < 0.05). Post-hoc comparisons using the Tamhane test (assuming non-homogeneous variances) indicated that for all five of these items, the mean ratings were not statistically different for BUS101 and CPH252 but that these two courses had significantly higher ratings than course ACC220. It was hypothesised that any differences in accuracy rates across courses might lead to differences in student reaction to the speech recognition technology. The mean accuracy rates in the three courses, based on differences between the spoken lecture and the screen text, were ACC220 = 74%, BUS101 = 85% and CPH252 = 56%. The fact that the course in which the technology was significantly less favourably rated on five of the ten items (viz., ACC220) did not have the lowest accuracy rate suggests that screen text accuracy is not solely responsible for the differences in student reaction. This is corroborated by the analysis of variance on the mean ratings for Items 5 and 6, about distraction from screen text inaccuracies, which showed no significant effect for type of course.
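For readers unfamiliar with the test, the one-way analysis of variance reported above compares the variation between course means with the variation within courses. The following is a minimal sketch only: the function name `one_way_anova_f`, the course labels and the ratings are all hypothetical (the individual ratings from the survey are not reported here), and a 1-5 rating scale is assumed purely for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of ratings per course.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1            # 2 when three courses are compared
    df_within = len(all_values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# HYPOTHETICAL ratings on an assumed 1-5 scale; not data from the study.
hypothetical_ratings = {
    "course_a": [2, 1, 2, 3, 2, 1],
    "course_b": [3, 2, 4, 3, 3, 2],
    "course_c": [3, 4, 2, 3, 3, 4],
}

f_stat, df_b, df_w = one_way_anova_f(list(hypothetical_ratings.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With three courses the between-groups degrees of freedom are 2, matching the df reported for every item. Obtaining a p-value additionally requires the F distribution, which a statistics package such as SciPy (`scipy.stats.f_oneway`) provides.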
The results show that simultaneous screen text in lectures was of assistance to only a small proportion of the wider body of students in the courses investigated. Without demographic data, however, it is not possible to determine whether the screen text was more helpful for students with disabilities and those of non-English speaking background. The present findings contrast with those from a study of students at a Canadian university where the Liberated Learning Project was implemented using comparable technology (Leitch & MacMillan, 2001). The sample comprised 54 students in two classes where automated screen text was displayed during the lecture. In response to the question 'How useful was the digitized lecture in terms of improving your understanding of the lecture?', 94% of respondents rated it as useful ('extremely useful' = 13%; 'useful' = 52%; 'somewhat useful' = 30%). This contrasts with the present study where only 7.9% of respondents overall agreed with the statement that the screen text 'improved my understanding of the lecture material' (see Item 3). Similarly, in the Canadian study 92% of respondents rated the screen text as useful in 'improving the way you take notes' whereas only 11.4% of respondents overall in the present study agreed that the screen text 'improved my notetaking' (see Item 4). While the samples and questionnaire items are not identical, it is clear that there are major differences in the experiences of students in the two studies. A likely reason for these differences is that the present format involved dual visual inputs (2 or 3 lines of text presented below computer slides - see Figure 1) whereas in the Canadian case, students had a full screen of text with no additional visual input. Further research is needed to test this and other possible explanations.
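The overall figures quoted for the present study can be approximated from Table 1 by weighting each course's percentage by its sample size. The sketch below is illustrative only: the function name `pooled_percentage` is introduced here, and the inputs are the rounded percentages printed in the table, so the results can differ slightly from figures computed on the raw responses.

```python
# Sample sizes and agreement percentages as printed in Table 1,
# in the table's column order.
SAMPLE_SIZES = [64, 84, 18]
ITEM_PERCENTAGES = {
    3: [3, 12, 6],    # Screen text improved understanding
    4: [6, 15, 11],   # Screen text improved notetaking
}


def pooled_percentage(percentages, sizes):
    """Weight each course's percentage by its number of respondents."""
    agreeing = sum(p / 100 * n for p, n in zip(percentages, sizes))
    return 100 * agreeing / sum(sizes)


for item, percentages in ITEM_PERCENTAGES.items():
    overall = pooled_percentage(percentages, SAMPLE_SIZES)
    print(f"Item {item}: {overall:.1f}% of respondents overall")
```

From the rounded table values this gives roughly 7.9% for Item 3 and 11.1% for Item 4, close to the reported 7.9% and 11.4%. The small gap for Item 4 is consistent with rounding in the published percentages.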
It may appear that accuracy is the critical variable in speech recognition, particularly in the light of the present findings that a minority of respondents said the screen text was helpful while the vast majority said inaccuracies were distracting (see Table 1). However, of the 13 respondents across the three courses who reported not being distracted by inaccuracies (by disagreeing with Item 5), only 4 also agreed in Item 1 that the screen text was helpful. Further, of the 24 respondents across the three courses who reported finding the screen text helpful, the clear majority (viz., 19 respondents) also reported being distracted. These results suggest that factors other than distraction from inaccuracies are involved in determining the perceived helpfulness of the speech recognition technology. Also, the analysis of variance results show that screen text accuracy does not explain the differences in student reaction across courses. The courses with the lowest and highest accuracy rates (viz., CPH252 and BUS101, respectively) did not differ significantly in their mean ratings on the items that showed course effects.
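The counts in this argument can be expressed as conditional proportions, which makes the weak link between distraction and helpfulness easier to see. The sketch below uses only the counts stated in the text and the Item 1 values from Table 1; the variable names are introduced here for illustration.

```python
# Counts reported in the text, pooled across the three courses.
not_distracted = 13             # disagreed with Item 5
not_distracted_and_helped = 4   # ...and also agreed with Item 1
helped = 24                     # agreed with Item 1
helped_and_distracted = 19      # ...and also agreed with Item 5

# Share of undistracted respondents who nevertheless found the text helpful.
share_helped_given_not_distracted = not_distracted_and_helped / not_distracted
# Share of helped respondents who were distracted all the same.
share_distracted_given_helped = helped_and_distracted / helped

# Cross-check: the 24 helped respondents should be recoverable, up to
# rounding, from the Item 1 percentages and sample sizes in Table 1.
approx_helped = sum(
    round(p / 100 * n) for p, n in zip([11, 18, 11], [64, 84, 18])
)

print(f"Helped, given not distracted: {share_helped_given_not_distracted:.0%}")
print(f"Distracted, given helped:     {share_distracted_given_helped:.0%}")
print(f"Helped respondents implied by Table 1: {approx_helped}")
```

Fewer than a third of the undistracted respondents found the screen text helpful, while about four in five of those who found it helpful were also distracted, which is the pattern the paragraph above relies on.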
Students who experience no difficulties hearing and comprehending lectures in English have the potential to access all the material presented (depending on their intelligence, concentration, etc.). Whether an additional input modality in the form of simultaneous screen text (with or without computer slides) enhances or impedes the learning of these students is a matter for empirical investigation. Research by Kalyuga and associates (Kalyuga et al., 1999; Kalyuga, 2000; Kalyuga et al., 2001) suggests that such students may experience 'cognitive overload' because of redundant information from the auditory and visual modalities. However, students who, for one reason or another, are unable to take in lecture material via the auditory modality may well benefit from accessing it visually from screen text. Also, online lecture transcripts derived from speech recognition technology provide a resource that has potential benefits for both students with disabilities and non-disabled students, particularly in course revision. Whether such transcripts can serve as flexible learning materials to be used instead of lectures is a matter for investigation. Because live lectures are typically more conversational than written text, lecture transcripts might not meet the instructional design requirements for fully effective resource-based learning (Moran, 1996). Clearly, speech recognition is an important and fruitful area for future research and development.
The present results indicate that only a small proportion of students reported finding that the screen text helped them in understanding the lecture material and in notetaking. This contrasts with the experience of students at a Canadian university where a different presentation format was used and the vast majority rated the technology as useful. Further research is needed to determine how simultaneous screen text in lectures affects learning in terms of both the subjective reaction of students and the objective impact on their comprehension and notetaking. The role of learning style is of potential relevance - visual learners, for example, may find screen text more useful than auditory learners. It is particularly important to test whether the use of multiple input media (spoken voice, screen text and computer slides) leads to cognitive overload as has been suggested by some previous studies.
Bain, K. & Paez, D. (2000). Speech recognition in lecture theatres: Liberated Learning Project and innovation to improve access to higher education using speech recognition technology. Proceedings of the Eighth Australian International Conference on Speech Science and Technology, Canberra, 5-7 December.
Birchard, K. (2002). Stanford U. will test a computerized transcription system. The Chronicle of Higher Education, 24 January. http://chronicle.com/free/2002/01/2002012401t.htm [accessed 11 February, 2002; verified 13 August, 2002].
Buckler, G. (2001). Recognizing voice recognition. Computer Dealer News, 17(22), 17. http://ezproxy.usc.edu.au:2053/pqdweb?Did=000000094519481&Fmt=4&Deli=1&Mtd=1&Idx=108&Sid=1&RQT=309 [accessed 11 February, 2002].
Coco, D.S. & Bagnall, J. (2000). The Liberated Learning Project: Improving access for persons with disabilities in higher education using automated speech recognition technology. Paper presented at PEPNet 2000 Conference, Denver, CO, 7 April. http://e-education.mtt.ca/display.jkg/SMULLP/LLP/Spring2000/Pepnet.htm [accessed 25 July, 2000].
Kalyuga, S. (2000). When using sound with a text or picture is not beneficial for learning. Australian Journal of Educational Technology, 16(2), 161-172. http://www.ascilite.org.au/ajet/ajet16/kalyuga.html
Kalyuga, S., Chandler, P. & Sweller, J. (1999). Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology, 13, 351-371.
Kalyuga, S., Chandler, P. & Sweller, J. (2001). Why text should not be presented simultaneously in written and auditory form. Unpublished manuscript, University of New South Wales, August.
Leitch, D., & MacMillan, T. (2001). Improving access for persons with disabilities in higher education using speech recognition technology: Year II progress report, Unpublished Report, Liberated Learning Project, Saint Mary's University, Halifax, Canada.
Marshall, P. (2002). Voice recognition: Sound technology. Federal Computer Week, 16(1), 32. http://ezproxy.usc.edu.au:2053/pqdweb?Did=000000098985328&Fmt=4&Deli=1&Mtd=1&Idx=23&Sid=1&RQT=309 [accessed 11 February, 2002].
Moran, L. (Convenor) (1996). Quality Guidelines for Resource-Based Learning, Working Party of Resource-Based Learning, National Council for Open and Distance Education, Canberra, October.
Author: Andy Hede, University of the Sunshine Coast, Queensland.
Email: firstname.lastname@example.org Web: http://www.usc.edu.au/
Please cite as: Hede, A. (2002). Student reaction to speech recognition technology in lectures. In S. McNamara and E. Stacey (Eds), Untangling the Web: Establishing Learning Links. Proceedings ASET Conference 2002. Melbourne, 7-10 July. http://www.aset.org.au/confs/2002/hede-a.html