IIMS 1992: Nelson - interactive digitised audio courseware on Amiga, Macintosh, and PC

Developing interactive digitised audio courseware on Amiga, Macintosh, and PC platforms: A comparison of common support facilities available

Larry R Nelson
Faculty of Education
Curtin University of Technology

Background

The impetus for the project described in this paper stems from work I have done in Indonesia, and from a new product release statement from London.

The Indonesian project was known as UNDP (United Nations Development Program) INS/831033. The project was meant to bring improved operating efficiency to a large branch of the Indonesian Ministry of Education. The part of the project I worked with had to do with the installation and application of MS DOS microcomputers; my 'team' focused on the development of computer training resources for staff.

As a peripheral, 'offline' aid to staff learning how to use a computer, CALL (computer aided language learning) software was developed by a subcontractor. The CALL system was written in Pascal. One of its subsections had to do with vocabulary building; a number of screens featuring pictures or drawings of computers and peripheral devices were used to help Indonesian staff learn the English language names for hardware components.

The CALL software was designed to be used on the simplest of PCs. It could run on a single floppy PC with 640 kilobytes of random access memory. It did require a colour display, but only at the 'CGA' level. No audio aids were built into the software; most readers would be aware that the audio playback capabilities of a simple IBM compatible PC are essentially nonexistent.

At about the time the first working prototype of the CALL system was released for use by staff, an article appeared in the November 1989 edition of the Australian Personal Computing magazine detailing the technical characteristics of a new, all solid state notebook size computer from PSION Computers, London.

The PSION MC-400 runs on an Intel 80C86 processor, uses eight conventional 'AA' (or MN 1500, LR6) batteries, and has not only an inbuilt speaker, but also a microphone. Along one of the sides of the MC-400, standard mini-plug jacks permit headphones and/or an external microphone to connect to the unit. A touch pad just below the unit's display screen emulates the use of a mouse as a pointing device; the screen itself has a matrix 640 pixels wide, by 400 high.

The addition of a voice processor module would allow users to record and play back their own diary notes, or to leave dictation for secretarial staff to type. According to a product information sheet from PSION, the new voice compression techniques employed in this module would permit eight minutes of speech to be stored in 64 kilobytes of central or secondary memory. If accurate, this would signify a giant step forward in digital speech recording technology; previously, getting a single minute of high quality, immediately useable digitised voice to fit in less than 512 kilobytes was doing well.
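The two storage figures quoted above can be put on a common per-minute footing. The following is only an illustrative calculation, assuming binary kilobytes (1024 bytes); the function name is hypothetical and nothing here comes from PSION documentation.

```python
# Storage per minute of speech implied by the two figures quoted in the text.
KB = 1024  # assumption: a kilobyte is taken as 1024 bytes


def bytes_per_minute(total_kb: float, minutes: float) -> float:
    """Average storage, in bytes, needed for one minute of speech."""
    return total_kb * KB / minutes


psion_claim = bytes_per_minute(64, 8)    # eight minutes in 64 kilobytes
earlier_norm = bytes_per_minute(512, 1)  # one minute in 512 kilobytes

for label, value in (("PSION claim", psion_claim), ("Earlier norm", earlier_norm)):
    bits_per_second = value * 8 / 60
    print(f"{label}: {value:,.0f} bytes/min, about {bits_per_second:,.0f} bits/s")

print(f"Implied improvement: {earlier_norm / psion_claim:.0f} times less storage")
```

On these assumptions the claim works out to eight kilobytes per minute of speech, a sixty-four fold reduction on the earlier figure.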

Many things about the MC-400 beckoned the would-be CALL software developer.

I submitted an application for a small research grant from Curtin University's Division of Arts, Education, and Social Sciences, and, with an award of under $3,000, began an investigation of the suitability of the MC-400 for CALL.

Instructional design

I mapped out an audio based CALL system which would have four basic learning modes.

Given a screen full of objects, such as a drawing or digitised picture of a personal computer and common peripherals, Mode 1, 'Shoot', would allow a user to point to any object, click on it, and hear the object's name spoken in the language of choice, by either a male or female native speaker.

Mode 2, 'Sequential', would permit the user to sit back and have the system step through the various objects one by one, highlighting or pointing to them on screen and saying their names in the language and gender selected. In this mode the user has only to listen, but can interrupt the presentation at any time.

Mode 3, 'Random', would have the system randomly select one of the objects, play its name over the computer's audio channel, and give the user a certain amount of time to find and 'hit' the object with the pointing device (mouse, track ball, or touch pad).

Mode 4, 'Voice', has the software pointing to or highlighting one of the objects at random, and then asking the user to say the name of the object into a microphone. When the user has finished, her or his voice is played back, followed immediately by the voice of a native speaker saying the name of the object. The objective of this mode is to permit the user to compare his or her pronunciation with that of an 'expert'.

These learning modes are far from being the only ones which might be developed in an interactive audio lesson. In terms of programming and management of hardware resources, however, they are suggested as being comprehensive in scope.
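The core logic of the four modes can be sketched independently of any platform. The sketch below is not the code of the prototypes described in this paper; the class, the bounding boxes, and the `record` and `play` callables are all hypothetical stand-ins for whatever the host system provides.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class LessonObject:
    """One named object on a lesson screen (bounding box is hypothetical)."""
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def shoot(objects: List[LessonObject], px: int, py: int) -> Optional[str]:
    """Mode 1: name of the object under the pointer, or None if none was hit."""
    for obj in objects:
        if obj.contains(px, py):
            return obj.name
    return None


def sequential(objects: List[LessonObject]) -> Iterator[str]:
    """Mode 2: yield each object's name in screen order."""
    for obj in objects:
        yield obj.name


def choose_target(objects: List[LessonObject], rng: random.Random) -> LessonObject:
    """Modes 3 and 4: pick one object at random."""
    return rng.choice(objects)


def is_hit(target: LessonObject, px: int, py: int,
           elapsed: float, time_limit: float) -> bool:
    """Mode 3: the user must hit the target before the time limit expires."""
    return elapsed <= time_limit and target.contains(px, py)


def voice_round(target: LessonObject,
                record: Callable[[], str],
                play: Callable[[str], None]) -> None:
    """Mode 4: play the user's recording, then the native speaker's clip."""
    user_clip = record()   # assumed to return an identifier for the recording
    play(user_clip)
    play(target.name)      # assumed to look up the native speaker's clip by name
```

Keeping recording and playback behind two callables reflects the requirement that the digitiser be invoked from within the lesson rather than as a separate program.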

Implementation on the PSION MC-400

Unfortunately there is not much to report here. While I have been using an MC-400 for over a year, the voice module has yet to materialise.

The MC-400 comes with a structured programming language called OPL; I have experimented with it, and believe that it has a fair chance of being capable of supporting the programming needed by each of the four learning modes mentioned above. However, the manufacturer has not yet made it clear, at least not to Australia based users, how to access the graphics ability of the computer, which is obviously considerable.

As of this writing, November 1991, the manufacturer was once again said to be on the verge of releasing the voice module. In fact I believe this to be true; PSION is, I understand, well regarded in Europe, where it has had a variety of computer devices on the market for many years.

I can report that the computer itself is a robust unit, and will indeed run for tens of hours on its batteries.

Implementation on Amiga hardware

I felt it highly desirable to have the CALL system operating on a portable computer. The lengthy delay in the release of the MC-400 voice module, however, necessitated a shift in focus.

For a variety of reasons, I decided to begin my experimentation with alternative 'platforms' by looking at the Amiga. A large factor here was the well known capability of the Amiga to support audio; an off the shelf Amiga 500 has no less than four audio channels built in, and the stock monitor has a set of stereo speakers in it, with a volume control. Another factor was price; the Amiga 500 is very economical. My thought was that, should the CALL system prove to be a potent language learning aid, a laboratory of 10 to 12 Amiga 500s could be set up for quite a bit less than an MS DOS lab, or a Macintosh lab.

I can summarise many hours of experimentation with the Amiga family by saying that it is without doubt capable of supporting CALL systems of the type addressed in this paper. I developed a working prototype on an Amiga 3000; it installed and ran easily on the next model down, the 2000. Given that it runs well on a 2000, there's no reason why it will not work on a model 500.

The Amigas do not come with a sound digitiser built into them; although all the playback facilities are there, one has to buy a separate digitiser and microphone for recording purposes. These are not expensive.

I strongly believe that language learners will favour the use of headphones over speakers. The Amigas do not have an output jack for phones; I bought an inexpensive headphones amplifier with bass and treble controls, and directed the standard audio out from the Amiga processing unit into this instead of the stereo amplifier in the Amiga's monitor.

Digitising as a transparent task

Getting the four learning modes to run as wanted was at times challenging.

An initial hurdle, one later re-encountered on MS-DOS machines and Macintoshes, had to do with the way manufacturers of sound digitisers appear to assume their hardware and software will be used. It seems that the orientation is one of providing a stand alone support facility, something which will allow sound to be digitised and stored on disk for recall by other software.

This design 'philosophy' is not compatible with the 'Voice' learning mode described above. I want to be able to open up the microphone for recording from within my application. I want the support of the digitiser without having it get in the way, in fact, without having the user even realise that the services of another program are being called on.

The REXX language is supported in the Amiga, where it is known as AREXX. Briefly, one could say that REXX is designed to support communication among applications running in a multi-tasking environment (somewhat like Windows' DDE, dynamic data exchange). Unfortunately, the present Amiga audio digitisers available in Australia do not support AREXX. When one which does comes along, it should be straightforward to use a digitiser as a transparent task.

This initial impasse was solved with the help of programming staff from a local Amiga supplier. A small program was written in C, compiled, and found to work well.

AmigaVision and CanDo

I had not used an icon based authoring system before this project, and so appreciated the opportunity to develop the first prototype with AmigaVision, Version 1.53g.

Some readers will have heard of Authorware and Icon Author for Macintoshes and IBM PC compatibles; AmigaVision is a similar product. It can be used by non-programmers to develop true multimedia applications on the Amiga; it is powerful, and does much to encourage the development of modularised 'code'.

Nonetheless, after getting the initial system working, I left it fairly quickly in favour of a hypertext system called CanDo. I felt the icons in AmigaVision at times cluttered the road, hindering rather than assisting. The version I used had no runtime support; it required users to have their own copy of AmigaVision (not a great expense at all, costing only about one per cent of a stand alone copy of Authorware, but still an inconvenience). And, AmigaVision 1.53g has no support for arrays. I wanted, for example, to have a Boolean array to indicate whether or not an object had been used in the lesson, and had to devise a work around using a long string variable of zeros and ones instead.
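The string-of-flags work around is easy to illustrate. The sketch below is written in Python rather than in AmigaVision, and the function names are hypothetical; it only shows the idea of using one character per object, '0' for unused and '1' for used, with objects numbered from 1.

```python
def new_flags(count: int) -> str:
    """A string of zeros, one character per object on the screen."""
    return "0" * count


def _check(flags: str, index: int) -> None:
    if not 1 <= index <= len(flags):
        raise IndexError(f"object number {index} is out of range")


def mark_used(flags: str, index: int) -> str:
    """Return a copy of flags with object number `index` marked as used."""
    _check(flags, index)
    return flags[:index - 1] + "1" + flags[index:]


def is_used(flags: str, index: int) -> bool:
    """True if object number `index` has already been used in the lesson."""
    _check(flags, index)
    return flags[index - 1] == "1"


def all_used(flags: str) -> bool:
    """True once every object on the screen has been used."""
    return "0" not in flags


flags = new_flags(15)
flags = mark_used(flags, 12)   # e.g. the keyboard, object 12
print(flags)
```

Because the string is rebuilt on every update, the approach only suits the small object counts found on a single lesson screen.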

To be fair to AmigaVision, the new version, 1.7, is 'just about ready'. It is said that it will provide runtime support, and also arrays. (Will it appear before the PSION voice module?)

CanDo is a flexible, adept hypertext system which is totally integrated with the Amiga's WorkBench operating system. In terms of the MS-DOS world, CanDo does more than LinkWay, but less than ToolBook. In terms of Apple HyperCard, my impression is that CanDo is more powerful than HyperCard 1, but less capable than HyperCard 2.

Unfortunately, CanDo is under-documented, and lacks the ability to detect a mouse button press unless the mouse is over a clickable object. It also has a cumbersome interface for program development, with what on other systems would be a system of drop down menus popping up and down at the very bottom of the screen.

Still, it has extensive power in the Amiga environment, and was found to provide very good support for developing the four learning modes.

Resource organisation

The fundamental lesson 'events' for the four learning modes centred on screens full of objects. Each screen became a 'drawer', or directory, on the hard disk. Background information intrinsic to the entire screen, such as text files found in popup message windows, was stored in files on this directory.

The objects used by the screen were created by a standard paint program, 'DeluxePaint', and placed in a subdirectory. The audio files were created on the Amiga using the 'FutureSound' program, edited as needed with 'Audio Engineer', and stored in another subdirectory. The audio input to FutureSound was from tape.

One can get quite lengthy path prefixes for the various files needed by the running program. For example, the complete generic specification for an Amiga based audio file containing the recording of an Indonesian male saying the Bahasa Indonesia word for the keyboard object found on the IBM-PC-Components screen would be:

VolumeName/ScreenName/AudioSubdirectory/LanguageGenderObject(12)

At run time the program makes appropriate substitutions for the key words in this string; for example, 'IBM-PC-Components' is put into ScreenName, 'Ind' is put into Language, and 'Male' into Gender. The 'keyboard' happens to be object number 12 on the IBM-PC-Components screen.
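The run time substitution can be sketched as a small function. The exact form of the file name is not given in full above, so joining the language, gender, and object number without separators is an assumption, as are the volume name 'Work' and the subdirectory name 'Audio' used in the example.

```python
def audio_path(volume: str, screen: str, audio_dir: str,
               language: str, gender: str, object_number: int) -> str:
    """Build the path of one audio clip from the generic specification."""
    # Assumption: the file name is Language + Gender + object number.
    filename = f"{language}{gender}{object_number}"
    return "/".join([volume, screen, audio_dir, filename])


# Indonesian male voice, keyboard (object 12) on the IBM-PC-Components screen.
print(audio_path("Work", "IBM-PC-Components", "Audio", "Ind", "Male", 12))
```

Holding the language and gender as parameters means that a change of speaker during a lesson only changes two arguments, not the directory layout.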

Hard disk sizes needed

Digitised audio files can consume large portions of a hard disk rapidly.

In the Amiga prototype which was developed, audio was digitised at 20 kHz with no compression. The 15 objects on the IBM-PC-Components screen, when spoken in English by a bilingual female adult (English as first language), took an average of 28,416 bytes per object, for a total of 426,245 bytes.

The 15 objects on this screen ranged from 'printer', taking 16,664 bytes, to 'left mouse button', which took 42,980 bytes to store.

It took about two megabytes of hard disk to store the audio clippings of four adults pronouncing the names of all 15 computer objects in two languages, Indonesian and English. In comparison, it took only 11,648 bytes to store the graphics of the objects, using the standard medium resolution video mode on the Amiga, with 16 colours.

Another screen developed in the Amiga prototype related to numbers. Thirty-two boxes were displayed on screen, each containing a different number. The average audio file for an object on this screen required 15 kilobytes, and, again, the total size of the audio clippings for four adults speaking the numbers came to about two megabytes.
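The disk figures for the two screens can be checked with a short calculation. Two assumptions are made that are not stated above: that samples are 8-bit mono (one byte per sample), and that each screen carries four recorded sets of clips, one per adult. On those assumptions the totals agree with the 'about two megabytes' reported.

```python
SAMPLE_RATE_HZ = 20_000   # from the text: 20 kHz, no compression
BYTES_PER_SAMPLE = 1      # assumption: 8-bit mono samples


def clip_seconds(size_bytes: int) -> float:
    """Playing time of an uncompressed clip of the given size."""
    return size_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def screen_bytes(objects: int, average_bytes: int, recorded_sets: int) -> int:
    """Total audio storage for one screen."""
    return objects * average_bytes * recorded_sets


components = screen_bytes(15, 28_416, 4)     # IBM-PC-Components screen
numbers = screen_bytes(32, 15 * 1024, 4)     # numbers screen, 15 KB average

print(f"'printer' clip: {clip_seconds(16_664):.2f} s")
print(f"'left mouse button' clip: {clip_seconds(42_980):.2f} s")
print(f"Components screen: {components:,} bytes")
print(f"Numbers screen: {numbers:,} bytes")
```

Under the same assumptions, halving the sampling frequency would halve each of these totals, since storage is proportional to the sample rate for uncompressed audio.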

This is obviously expensive in terms of hard disk space. The quality of the recorded audio, however, was judged to be excellent when played back through speakers, and still good to very good when headphones were used. By 'good' here is meant that the quality was perceived to be at least as good as that which would result had a conventional portable cassette recorder been used with headphones.

In the next prototype, experiments will be made with lower sampling frequencies, and with file compression. Initial trials in this regard indicate that it may be possible to obtain acceptable headphone quality with audio files taking only a third to a half of the disk space reported here.

To be noted is the need for a good audio file editor. Significant bits of disk space can be saved through the use of an editor which allows pauses in speech to be snipped from the recording. A good editor will also permit the volume level of the digitised audio track to be adjusted up or down without having to go back to the analogue source (tape). The ability to 'ramp' at the start or end of a clip helps to control for noise, and is another worthy feature to have in an editor of digitised audio files.
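The three editing operations mentioned above are simple to express on raw samples. The sketch below is not taken from any of the editors named in this paper; it assumes signed 8-bit samples held in a Python list, trims only leading and trailing silence (not pauses within a clip), and uses a linear ramp. The function names and the silence threshold are hypothetical.

```python
from typing import List

SAMPLE_MIN, SAMPLE_MAX = -128, 127   # assumption: signed 8-bit samples


def trim_silence(samples: List[int], threshold: int = 2) -> List[int]:
    """Drop leading and trailing samples whose magnitude is within threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end]


def apply_gain(samples: List[int], gain: float) -> List[int]:
    """Scale the volume up or down, clipping to the 8-bit range."""
    return [max(SAMPLE_MIN, min(SAMPLE_MAX, round(s * gain))) for s in samples]


def ramp(samples: List[int], ramp_len: int) -> List[int]:
    """Fade in over the first ramp_len samples and out over the last ramp_len."""
    n = len(samples)
    ramp_len = min(ramp_len, n // 2)
    out = list(samples)
    for i in range(ramp_len):
        factor = i / ramp_len
        out[i] = round(out[i] * factor)
        out[n - 1 - i] = round(out[n - 1 - i] * factor)
    return out


clip = [0, 1, 0, 50, -60, 0, 40, 1, 0]
print(trim_silence(clip))
```

Trimming directly reduces file size, whereas gain and ramping leave the size unchanged and affect only how the clip sounds on playback.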

Implementation on MS DOS hardware

When representatives from IBM Australia happened to view the Amiga prototype in operation, it was suggested that it might be worthwhile to see if IBM 'LinkWay' hypertext software, working through an 'M Audio Capture/Playback' card in a standard XT or AT, would do the job on the PC platform.

Several products have already been released which feature digitised audio playing back under LinkWay supervision, and IBM was able to provide LinkWay code which clearly demonstrated how both recording and playback can be accomplished using the M Audio card. This code suggests that it is possible to interact with the M Audio software in the background, even in record mode.

It would seem that there is potential here, but my efforts to obtain either an MCA or AT-bus version of the M Audio card were unsuccessful, despite several weeks of perseverance.

At the time this paper was prepared, Microsoft was just about to release the first version of its multimedia extensions for the PC. Hopefully these will bring much needed standardisation to the field, making it possible to use a number of audio digitisers and playback devices on the PC, with a choice of software to work with.

This is certainly not the case at the moment. LinkWay seems a capable program, and it runs under DOS (as opposed to Windows), but it doesn't support a range of sound digitisers. The 'ToolBook' hypertext system under Windows is as powerful as CanDo on the Amiga, and easier to use, but its initial version has no support for audio at all. The 'multimedia version' of ToolBook was released in Australia just as the due date for this paper came up.

The IBM 'StoryBoard Live!' package, another DOS product, can interface nicely with the well known 'SoundBlaster' card for the PC, permitting both recording and playback with no additional software. StoryBoard Live!, however, is not a program one could use for developing an interactive language learning aid.

Authorware Professional for Windows, 'APW', was released in Australia as this paper was in preparation, and provides support for the playback of audio files which may have been produced by a number of digitisers, including those which comply with the standards prescribed in the to be released Microsoft Multimedia Extensions. However, at this time it is not at all clear how or if an author of interactive audio courseware could record sound with APW.

In short, the digitised audio 'picture' on the PC has not yet approached a steady state condition. I will be continuing my investigation into the possibility of using LinkWay and the M Audio card under DOS, and will look at ToolBook, Authorware, and Microsoft's Visual Basic system as potential means for developing interactive audio under Windows. Of all of these, I would give priority to the LinkWay/DOS route as this would be the most promising way to get interactive audio to the widest possible audience, at least at the present time.

Implementation on Macintosh hardware

Some readers may consider the Mac to be a most logical environment for the development of an interactive system such as that of concern to this paper.

The Mac range has had an audio out port since the first model appeared some seven years ago. The addition of a now familiar Mac peripheral, Farallon's 'MacRecorder', made it possible to develop interactive audio HyperCard stacks years ago (in fact, an example of one shipped with what I am told were early versions of MacRecorder).

A barrier to systems development on the Mac has, in my opinion, long related to cost. In addition, I have felt the lack of colour support on the Macintosh to make it a less desirable delivery system for computer aided learning of any sort.

These impediments are disappearing. The very latest version of the Mac Classic, another product released in Australia as this project was ongoing, includes sound recording as well as playback. The Mac LC and the Mac IIsi have supported record and playback since their inception, and, as most readers will know, are colour based. Apple's Audio Palette, a weakly documented but quite capable sound editing resource, is standard system software on all new Macs. HyperCard 2.0 is now readily available, and Authorware Professional for the Macintosh, 'APM', has been in Australia since about mid 1991. Lower costs, colour, in-built digital audio recording and playback, and powerful authoring systems such as HyperCard 2.0 and Authorware bode well for the porting of the Amiga prototype to the Mac.

One of our postgraduate students, Geoff Rehn, had a working interactive audio CALL system going as of early November, 1991, running on a Mac LC, using both HyperCard and Authorware. The initial versions did not focus on all four of the learning modes mentioned above, but first impressions confirm that HyperCard will allow the job to be done quite easily, albeit without colour. APM will permit colour to be added, at some sacrifice in programming flexibility.

(Geoff Rehn's interest in interactive CALL relates to work with Aboriginal students at Edith Cowan University. Some of his work is further described in another Symposium paper; refer to Rehn, G. F., http://www.aset.org.au/confs/iims/1992/rehn.html)

Final comments

In the world of interactive computer based audio lesson material, it would not be at all inaccurate to apply the "hang about, mate, it's all happening now" maxim.

My own guess is that Australia based authors will have a full platter of interactive audio tools to work with as of mid 1992, no matter what platform they choose to operate with. In fact, the plate is already pretty well stocked in the Mac world; Amiga users presently need a non-standard special program in order to record without having the digitising software get in the way, but this need can be expected to disappear; PC users will require time to come to grips with the new Microsoft Multimedia Extensions. If there is haze in the picture as 1991 draws to a close, it would be in the PC area, but I do expect to be able to successfully pursue the LinkWay/DOS path mentioned earlier.

The technology behind PSION's about to be released voice module could represent a significant milestone for digital audio. PSION is known to be working with Texas Instruments, a firm which some readers will no doubt recognise as a major producer of audio chips and components. The ability to fit one minute of digitised voice in eight kilobytes of memory (eight minutes in 64 kilobytes, as claimed by PSION) and have it immediately useable, with no detectable delay for decompression, will be worth several gold medals (when it comes true).

Having myself been a student of a 'foreign' language, with a pressing need to have a working degree of listening comprehension in as short a time as possible, I see immediate value in interactive, audio based CALL. Colleagues and students who have seen the Amiga and Macintosh prototypes mentioned in this paper have been nearly effusive in their reaction and encouragement.

But they were not well briefed on the amount of work needed to assemble the interactive lessons. The final word is far from in. It may indeed be possible to develop and deliver interactive CALL using native speakers, but many questions remain regarding the price and nature of needed resources, and, above all, the cost effectiveness of the final product.

Please cite as: Nelson, L. R. (1992). Developing interactive digitised audio courseware on Amiga, Macintosh, and PC platforms: A comparison of common support facilities available. In Promaco Conventions (Ed.), Proceedings of the International Interactive Multimedia Symposium, 483-492. Perth, Western Australia, 27-31 January. Promaco Conventions. http://www.aset.org.au/confs/iims/1992/nelson.html

© 1992 Promaco Conventions. Reproduced by permission. Last revision: 5 Apr 2004. Editor: Roger Atkinson