ASET 2002: Hede - student reaction to speech recognition technology in lectures

Student reaction to speech recognition technology in lectures

Andy Hede
University of the Sunshine Coast, Queensland

An application of speech recognition technology is being trialled in university lectures. A lecturer's speech is first digitally converted into electronic text for display via a data projector. After the lecture the transcript is made available online for students to access for revision. While this project is primarily aimed at students with disabilities, all students in the lecture have access to the screen text and can register to access the online lecture transcripts. The present preliminary study was designed to survey the reaction of the wider body of students in three university courses. The results showed that only a small proportion of the wider student body reported finding the screen text helpful in comprehending the lecture or in taking notes. While relatively few respondents said they had used the online lecture transcripts, a large proportion indicated they intended to use them for course revision. The findings, though not definitive, have implications for how speech recognition technology is presented so as to provide choice to university students about whether or not they access it to assist their learning.

Introduction

Recent advances in computer technology have seen automated speech recognition become increasingly prevalent in the modern world (Attaran, 2000; Buckler, 2001) and a particularly important tool for those with disabilities (Marshall, 2002). One major application of this technology in higher education is the Liberated Learning Project, which aims to enable students with disabilities (including hearing impairment, learning difficulties and lack of mobility) to obtain full benefit from university lectures (see Note 1). The rationale is that the provision of supplementary modalities for accessing lecture material (viz., simultaneous screen text and online transcripts) may reduce the learning impact of a student's disability (Birchard, 2002; Bain & Paez, 2000; Coco & Bagnall, 2000).

Speech recognition technology implementation

The Liberated Learning speech recognition technology was applied in four courses offered at the University of the Sunshine Coast during second semester 2001, namely, ACC220 - Law of Business Associations; BUS101 - Applied Research Methods [in Business]; CPH252 - [Health] Needs Assessment and Planning; INT100 - International Politics: An Australian Perspective. The technology centres on a customised software package, Lecturer, which uses IBM's ViaVoice speech recognition software. Each lecturer was allocated a high-end laptop computer and received extensive training in use of the software. Lecturers created their own 'voice models' so that the unique characteristics of their voice could be recognised and so that any special terminology or names could be added to the lexicon in the software. The technology was applied in two distinct stages. The first stage, during lectures, entailed lecturers wearing a microphone which transmitted their spoken words to the computer, where the software converted them to digitised text. This text was then displayed via a data projector onto a central screen together with lecture notes presented as PowerPoint slides. Two lines of continuously scrolling text (three lines in the case of CPH252) were displayed below the slides (see Figure 1). The second stage, after lectures, involved an assistant editing the lecture text to insert punctuation and remove any recognition errors. The lecturer then checked the edited transcript and made any corrections before forwarding it to be posted onto a dedicated website where it could be accessed by students who had registered to participate in the project.

Figure 1: Example of central screen comprising lecture slides and two lines of digitised screen text

Methodology

A survey was conducted of students who attended three of the courses in which the speech recognition technology was used (course INT100 was not included because the lecturer was unavailable). The survey sample included both students who had registered to use the technology and had access to the online lecture transcripts and students who simply experienced the screen text displayed during lectures. A set of 10 items was added to the 'student feedback on teaching' questionnaire administered during class at the end of the teaching period. Students were asked to rate their agreement or disagreement with statements using a 5-point Likert scale ('disagree strongly, disagree, neutral, agree, agree strongly'). The items covered a range of aspects of student experience with the technology, the specific wording being as follows:

1. I found it helpful to look at the lecturer's voice text shown on the screen during the lecture.
2. I looked at the screen text because it helped to fill in the gaps when I missed something the lecturer said.
3. I found that the screen text improved my understanding of the lecture material.
4. I found that the screen text improved my notetaking during the lecture.
5. I found the inaccuracies in the screen text to be distracting.
6. I found the inaccuracies in the screen text were less of a problem as the semester progressed.
7. I found it helpful to use the lecture transcripts on the Liberated Learning website.
8. I used the lecture transcripts on the website instead of taking notes myself during the lecture.
9. I plan to use the lecture transcripts on the website to revise for the assessment.
10. I think the screen text technology had an overall positive effect on the way the lecturer delivered the lectures.

The questionnaire consisted of a question sheet and a separate response sheet designed for computer scanning. Lecturers distributed the questionnaire for completion at the end of class and response sheets were collected by an independent person who forwarded them for analysis. The number of respondents for the three courses was as follows: ACC220 = 64; BUS101 = 84; CPH252 = 18. The response rates based on course enrolment as distinct from class attendance were: 73%, 20% and 69% for the three courses, respectively. The low response rate in course BUS101 was due to the unusually low class attendance on the day of the survey - it was observed that the questionnaire was completed by virtually all students present.

Results

The data were entered into SPSS for analysis. Table 1 lists the percentage of respondents expressing agreement (ratings of 'agree strongly' or 'agree') with the 10 questionnaire items for each course. Only a small proportion of respondents reported finding that the screen text was helpful during lectures (see Item 1 in Table 1; viz., 11%, 18% and 11% for the three courses, respectively). Slightly higher percentages of respondents reported that the screen text helped fill the gaps when they missed something in the lecture (see Item 2). In the case of course BUS101, more than a quarter of respondents (28%) used the screen text in this way compared with only 9% for course ACC220. This difference (chi-square = 7.63; df = 1; p < 0.01) may be due in part to differences in the presentation style of the two lecturers concerned. The BUS101 lecturer was observed to have a much faster speech rate than the ACC220 lecturer, and hence BUS101 students may have been more likely to miss something that could be retrieved by checking the screen text.
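The comparison between courses on Item 2 can be illustrated with a Pearson chi-square test on a 2 x 2 table (agree versus not agree, by course). The following is a minimal sketch in Python, not the SPSS analysis used in the study. The agreement counts (6 of 64 and 24 of 84) are assumptions reconstructed from the rounded percentages, and any item-level missing responses are ignored, so the statistic it produces should not be expected to match the reported value of 7.63 exactly.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Assumed counts, reconstructed from rounded percentages (not the raw survey data).
acc_agree, acc_n = 6, 64    # ACC220: about 9% agreed with Item 2
bus_agree, bus_n = 24, 84   # BUS101: about 28% agreed with Item 2

statistic = chi_square_2x2(
    acc_agree, acc_n - acc_agree,
    bus_agree, bus_n - bus_agree,
)

# Critical value of the chi-square distribution for df = 1 at p = 0.01.
CRITICAL_DF1_P01 = 6.635

print(f"chi-square = {statistic:.2f}")
print("significant at p < 0.01:", statistic > CRITICAL_DF1_P01)
```

Under these assumed counts the statistic exceeds the critical value for df = 1 at the 0.01 level, which is consistent with the significance level reported above.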

Table 1: Percentage of respondents agreeing with statements about speech recognition technology for the three courses

Questionnaire Item                                      ACC220    BUS101    CPH252
                                                        n = 64    n = 84    n = 18
1. Screen text was helpful                                11        18        11
2. Screen text helped when something was missed            9        28        17
3. Screen text improved understanding                      3        12         6
4. Screen text improved notetaking                         6        15        11
5. Inaccuracies in screen text were distracting           92        82        94
6. Inaccuracies were less of a problem over time          47        65         #
7. Online lecture transcripts were helpful                12        12        39
8. Online transcripts used instead of taking notes         7        11         6
9. Plan to use online transcripts to revise               45        41        56
10. Technology had a positive effect on lectures          12        25        17

# Item inadvertently omitted from questionnaire

Other aspects of the screen text investigated were whether it improved understanding and notetaking. Again, the percentages of respondents experiencing these potential benefits were small. Improved understanding of lecture material was reported by between 3% and 12% of respondents across the three courses (see Item 3). The percentages reporting improved notetaking ranged from 6% to 15% (see Item 4). Almost all respondents reported being distracted by inaccuracies in the screen text (see Item 5 with rates of 92%, 82% and 94% for the three courses, respectively). However, it appears that many students adapted somewhat to this distraction effect, with 47% and 65% of respondents in the two courses where the item was included (ACC220 and BUS101, respectively) reporting that inaccuracies were less of a problem as the semester progressed (see Item 6). This is supported by anecdotal reports from lecturers who observed a decrease during the semester in overt disturbance (e.g., student laughter and comments) because of screen text errors.

In most cases, only small percentages rated agreement with items 7 and 8 (see Table 1). The exception is that for course CPH252 there was a high rate of agreement (viz., 39%) that the online transcripts were helpful. It is notable that appreciable percentages of respondents in all three courses (41% to 56%) stated that they planned to use the transcripts to revise for their exams (see Item 9). This result has to be interpreted in terms of the proportions of enrolled students who had access to the online transcripts by previously registering to participate in the project (viz., ACC220 = 68%; BUS101 = 41%; CPH252 = 70%). It is likely that most of those who had registered reported they planned to use the transcripts for revision. Finally, a minority of respondents (12% to 25%) felt that the technology had an overall positive effect on how the lectures were delivered (see Item 10).

Analysis of variance carried out on the mean ratings for each of the 10 items showed significant differences across courses for the following items (df = 2 in all cases): Item 1 (F = 3.76; p < 0.05), Item 2 (F = 6.96; p < 0.001), Item 3 (F = 6.47; p < 0.05), Item 4 (F = 3.54; p < 0.05), and Item 10 (F = 3.85; p < 0.05). Post-hoc comparisons using the Tamhane test (assuming non-homogeneous variances) indicated that for all five of these items, the mean ratings were not statistically different for BUS101 and CPH252 but that these two courses had significantly higher ratings than course ACC220. It was hypothesised that any differences in accuracy rates across courses might lead to differences in student reaction to the speech recognition technology. The mean accuracy rates in the three courses, based on differences between the spoken lecture and the screen text, were: ACC220 = 74%; BUS101 = 85%; CPH252 = 56%. The fact that the course in which the technology was significantly less favourably rated on five of the ten items (viz., ACC220) did not have the lowest accuracy rate suggests that screen text accuracy is not solely responsible for the differences in student reaction. This is corroborated by the analysis of variance on the mean ratings for items 5 and 6 about distraction from screen text inaccuracies, which showed no significant effect for type of course.
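For readers unfamiliar with the procedure, the F ratio in a one-way analysis of variance compares the variation between course means with the variation within courses. The following Python sketch shows the calculation only; it is not the SPSS analysis reported here. The ratings in it are invented for illustration, the 1 to 5 coding of the Likert scale is an assumption, and the course labels are hypothetical placeholders rather than the courses surveyed.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of lists of numeric ratings."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ratings (1 = disagree strongly ... 5 = agree strongly); not survey data.
ratings = {
    "course_a": [1, 2, 2, 1, 3],
    "course_b": [2, 3, 4, 3, 3],
    "course_c": [3, 2, 4, 3, 3],
}

f_stat, df_b, df_w = one_way_anova_f(list(ratings.values()))
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w})")
```

With three groups the between-groups degrees of freedom equal 2, as in the analyses reported above. The sketch does not implement the Tamhane post-hoc comparisons.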

Discussion

A major limitation of the present preliminary study is that the methodology restricted the amount of data that could be collected. By using the vehicle of student feedback questionnaires, only 10 Likert-scale items could be included and no demographic data were obtained. It is not known, for example, what proportion of respondents had registered to use the speech recognition technology and thereby access the online lecture transcripts. Nor is it known how much distraction students experienced because of screen text inaccuracies. A complete picture of student reaction to the technology will emerge only when further research is conducted. The methodology introduced two possible sources of bias. First, although the feedback questionnaires were distributed in a neutral manner, students would have assumed lecturers desired favourable responses about their teaching and the technology they used. Second, the survey sample was restricted to class attendees and may have over-represented those who were motivated in their studies and satisfied with the teaching they experienced. While it is not possible to determine the extent of any such bias, it is clear that in both cases it would have had a positive rather than a negative effect on the results.

The results show that simultaneous screen text in lectures was of assistance to only a small proportion of the wider body of students in the courses investigated. Without demographic data, however, it is not possible to determine whether the screen text was more helpful for students with disabilities and those of non-English speaking background. The present findings contrast with those from a study of students at a Canadian university where the Liberated Learning Project was implemented using comparable technology (Leitch & MacMillan, 2001). The sample comprised 54 students in two classes where automated screen text was displayed during the lecture. In response to the question 'How useful was the digitized lecture in terms of improving your understanding of the lecture?', 94% of respondents rated it as useful ('extremely useful' = 13%; 'useful' = 52%; 'somewhat useful' = 30%). This contrasts with the present study where only 7.9% of respondents overall agreed with the statement that the screen text 'improved my understanding of the lecture material' (see Item 3). Similarly, in the Canadian study 92% of respondents rated the screen text as useful in 'improving the way you take notes' whereas only 11.4% of respondents overall in the present study agreed that the screen text 'improved my notetaking' (see Item 4). While the samples and questionnaire items are not identical, it is clear that there are major differences in the experiences of students in the two studies. A likely reason for these differences is that the present format involved dual visual inputs (2 or 3 lines of text presented below computer slides - see Figure 1) whereas in the Canadian case, students had a full screen of text with no additional visual input. Further research is needed to test this and other possible explanations.
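The overall percentages quoted above can be approximated by weighting each course's percentage in Table 1 by its number of respondents. The following Python sketch shows that arithmetic. It works from the rounded table percentages rather than the raw response sheets, which is an assumption, so its results may differ from the reported figures by a few tenths of a percentage point.

```python
# Respondent numbers per course, as given in Table 1.
RESPONDENTS = {"ACC220": 64, "BUS101": 84, "CPH252": 18}


def pooled_percentage(course_percentages):
    """Weight each course's percentage agreement by its number of respondents."""
    total = sum(RESPONDENTS.values())
    weighted = sum(course_percentages[course] * n for course, n in RESPONDENTS.items())
    return weighted / total


# Rounded percentages agreeing, taken from Table 1.
item3_understanding = {"ACC220": 3, "BUS101": 12, "CPH252": 6}
item4_notetaking = {"ACC220": 6, "BUS101": 15, "CPH252": 11}

print(f"Item 3 overall: {pooled_percentage(item3_understanding):.1f}%")
print(f"Item 4 overall: {pooled_percentage(item4_notetaking):.1f}%")
```

For Item 3 this gives about 7.9%, in line with the figure quoted above. For Item 4 it gives about 11.1% against the reported 11.4%, a gap that is plausibly due to rounding in the table percentages.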

It may appear that accuracy is the critical variable in speech recognition particularly in the light of the present findings that a minority of respondents said the screen text was helpful while the vast majority said inaccuracies were distracting (see Table 1). However, of the 13 respondents across the three courses who reported not being distracted by inaccuracies (by disagreeing with Item 5), only 4 also agreed in Item 1 that the screen text was helpful. Further, of the 24 respondents across the three courses who reported finding the screen text helpful, the clear majority (viz., 19 respondents) also reported being distracted. These results suggest that factors other than distraction from inaccuracies are involved in determining the perceived helpfulness of the speech recognition technology. Also, the analysis of variance results show that screen text accuracy does not explain the differences in student reaction across courses. The courses with the lowest and highest accuracy rates were rated equally on the various items (viz., CPH252 and BUS101, respectively).

Students who experience no difficulties hearing and comprehending lectures in English have the potential to access all the material presented (depending on their intelligence, concentration, etc.). Whether an additional input modality in the form of simultaneous screen text (with or without computer slides) enhances or impedes the learning of these students is a matter for empirical investigation. Research by Kalyuga and associates (Kalyuga et al., 1999; Kalyuga, 2000; Kalyuga et al., 2001) suggests that such students may experience 'cognitive overload' because of redundant information from the auditory and visual modalities. However, students who, for one reason or another, are unable to take in lecture material via the auditory modality may well find a benefit in accessing it visually from screen text. Also, online lecture transcripts derived from speech recognition technology provide a resource that has potential benefits for both students with disabilities and non-disabled students, particularly in course revision. Whether such transcripts can serve as flexible learning materials to be used instead of lectures is a matter for investigation. Because live lectures are typically more conversational than written text, lecture transcripts might not meet the instructional design requirements for fully effective resource-based learning (Moran, 1996). Clearly, speech recognition is an important and fruitful area for future research and development.

Conclusion

The format initially being trialled for speech recognition technology in university lectures presents simultaneous screen text to all students, not only those with disabilities or language difficulties. The present study was designed to determine whether this technology was of assistance to the wider body of students in three courses at an Australian university. While not definitive, the finding that most students reported being distracted by inaccuracies in the screen text has implications for how the text is presented. In the current case, the screen text comprised two or three lines projected onto the main screen below the lecture slides. It may be difficult for students to avoid distraction and information overload with this presentation format. Given the finding that most respondents in this study did not find the screen text helpful overall, students should be given a clear choice about whether or not they access this technology. One option is to present the simultaneous text on a separate screen away from the main screen but positioned so that students can choose to sit so as to have both screens in their field of view. This would also allow more lines to be displayed, thus making it easier for students to check for material that might be missed in the lecturer's oral presentation.

The present results indicate that only a small proportion of students reported finding that the screen text helped them in understanding the lecture material and in notetaking. This contrasts with the experience of students at a Canadian university where a different presentation format was used and the vast majority rated the technology as useful. Further research is needed to determine how simultaneous screen text in lectures affects learning in terms of both the subjective reaction of students and the objective impact on their comprehension and notetaking. The role of learning style is of potential relevance - visual learners, for example, may find screen text more useful than auditory learners. It is particularly important to test whether the use of multiple input media (spoken voice, screen text and computer slides) leads to cognitive overload as has been suggested by some previous studies.

Endnote

The Liberated Learning Project is coordinated by Saint Mary's University, Halifax, Canada, and is sponsored by IBM. The project is managed in Australia by Di Paez, University of the Sunshine Coast. See http://www.liberatedlearning.com/ [verified 13 Aug 2002]

Acknowledgements

The author gratefully acknowledges the research assistance of Michaela Wilkes in questionnaire design, Phil Gorbett in data collection and Maxine Mitchell in data analysis.

References

Attaran, M. (2000). Voice recognition software programs: Are they right for you? Information Management & Computer Security, 8(1), 42-44.

Bain, K. & Paez, D. (2000). Speech recognition in lecture theatres: Liberated Learning Project and innovation to improve access to higher education using speech recognition technology. Proceedings of the Eighth Australian International Conference on Speech Science and Technology, Canberra, 5-7 December.

Birchard, K. (2002). Stanford U. will test a computerized transcription system. The Chronicle of Higher Education, 24 January. http://chronicle.com/free/2002/01/2002012401t.htm [viewed 11 Feb 2002, verified 13 Aug 2002].

Buckler, G. (2001). Recognizing voice recognition. Computer Dealer News, 17(22), 17. http://ezproxy.usc.edu.au:2053/pqdweb?Did=000000094519481&Fmt=4&Deli=1&Mtd=1&Idx=108&Sid=1&RQT=309 [accessed 11 February, 2002].

Coco, D.S. & Bagnall, J. (2000). The Liberated Learning Project: Improving access for persons with disabilities in higher education using automated speech recognition technology. Paper presented at PEPNet 2000 Conference, Denver, CO, 7 April. http://e-education.mtt.ca/display.jkg/SMULLP/LLP/Spring2000/Pepnet.htm [accessed 25 July, 2000].

Kalyuga, S. (2000). When using sound with a text or picture is not beneficial for learning. Australian Journal of Educational Technology, 16(2), 161-172. http://www.ascilite.org.au/ajet/ajet16/kalyuga.html

Kalyuga, S., Chandler, P. & Sweller, J. (1999). Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology, 13, 351-371.

Kalyuga, S., Chandler, P. & Sweller, J. (2001). Why text should not be presented simultaneously in written and auditory form. Unpublished manuscript, University of New South Wales, August.

Leitch, D., & MacMillan, T. (2001). Improving access for persons with disabilities in higher education using speech recognition technology: Year II progress report, Unpublished Report, Liberated Learning Project, Saint Mary's University, Halifax, Canada.

Marshall, P. (2002). Voice recognition: Sound technology. Federal Computer Week, 16(1), 32. http://ezproxy.usc.edu.au:2053/pqdweb?Did=000000098985328&Fmt=4&Deli=1&Mtd=1&Idx=23&Sid=1&RQT=309 [accessed 11 February, 2002].

Moran, L. (Convenor) (1996). Quality Guidelines for Resource-Based Learning, Working Party of Resource-Based Learning, National Council for Open and Distance Education, Canberra, October.

Author: Andy Hede, University of the Sunshine Coast, Queensland.
Email: hede@usc.edu.au Web: http://www.usc.edu.au/

Please cite as: Hede, A. (2002). Student reaction to speech recognition technology in lectures. In S. McNamara and E. Stacey (Eds), Untangling the Web: Establishing Learning Links. Proceedings ASET Conference 2002. Melbourne, 7-10 July. http://www.aset.org.au/confs/2002/hede-a.html

This URL: http://www.aset.org.au/confs/2002/hede-a.html
Created 13 Aug 2002. Last revision: 13 Aug 2002.
© Australian Society for Educational Technology
