Sophie Scott is the group leader of the Speech Communication Group at the Institute of Cognitive Neuroscience at University College London (UCL), UK. She was awarded a Ph.D. at UCL for research on the acoustic basis of rhythm in speech and then spent several years as a postdoctoral researcher at the Medical Research Council Cognition and Brain Sciences Unit in Cambridge (UK). She currently holds a Wellcome Trust Senior Fellowship and has been funded by the Wellcome Trust since 2001.
Her research uses functional imaging to investigate the cortical basis of human speech perception and production, applying models from primate auditory processing to the neural basis of human perception. She is particularly interested in the different kinds of information conveyed when we speak and how the acoustic information in our voices can be processed in different ways in the brain.
We learn to read in a very different way from how we learn to speak. Speech is embedded in our social interactions from the minute we are born, and even before birth we can hear our mother's voice in utero. These prelingual twins (youtube.com/watch?v=JmA2ClUvUY) show how you can understand verbal interactions before you even have words at your disposal.
Learning to read, in contrast, is something we largely do at school, where we are specifically instructed in how to do it. There are different writing systems:
• Logographic (e.g. Chinese): a written word, or a meaningful part of a word, is represented by a single written element (though that symbol may contain phonetic and semantic information).
• Syllabic (e.g. Cherokee): a written element conveys a whole syllable.
• Alphabetic (e.g. English): a single written element roughly represents a single speech sound.
Each of these systems has its own advantages and disadvantages. Notably, alphabetic writing systems differ widely in how easy they are to acquire. Children learning to read English, which is highly irregular in both spelling and pronunciation, do less well at reading non-words after a year of reading instruction than children learning to read Spanish or Finnish (Aro and Wimmer 2003); English-reading children only catch up with their Finnish peers in grade 4.
In addition to the undoubted many values of literacy, we can see the impact of learning to read in a variety of ways. For example, it is harder to name the colour of the
ink in the word green than in the word grown. This Stroop effect is commonly used to demonstrate how meaning can interfere with cognitive processes – if you are naming ink colours as fast as possible, competing colour names will slow you down. Importantly, this can only occur because once you are a skilled reader, you can’t ‘switch off’ your reading when trying to name the ink colours, which is how the competing semantic information can get into the system. As skilled readers, it is nearly impossible for us not to read words.
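For anyone who wants to see the manipulation spelled out, here is a minimal sketch (in Python, with an invented colour set and labels) of how congruent and incongruent colour-naming trials are typically put together; the prediction is simply that naming is slower on the incongruent trials.

```python
import random

# Invented colour set for an illustrative Stroop-style trial list.
COLOURS = ["red", "green", "blue", "yellow"]

def make_trials(n_per_condition=10):
    """Build a shuffled list of congruent and incongruent colour-naming trials."""
    trials = []
    for _ in range(n_per_condition):
        word = random.choice(COLOURS)
        # Congruent: the ink colour matches the colour the word names.
        trials.append({"word": word, "ink": word, "condition": "congruent"})
        # Incongruent: the ink colour clashes with the colour the word names.
        clash = random.choice([c for c in COLOURS if c != word])
        trials.append({"word": word, "ink": clash, "condition": "incongruent"})
    random.shuffle(trials)
    return trials

# The Stroop prediction: ink-colour naming is slower on incongruent trials,
# because a skilled reader cannot help reading the word itself.
for trial in make_trials(2):
    print(f"Name the ink colour: '{trial['word']}' printed in {trial['ink']} ({trial['condition']})")
```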
The skill of learning to read also forces us to engage with sounds in ways that differ from what we need to do to understand spoken language. Some abilities in the manipulation of speech sounds are present before we learn to read (e.g. being able to tell that two words rhyme), while others emerge as a consequence of learning to read. Thus, segmental skills – being able to break a word down into separate chunks corresponding to individual speech sounds – are something that we acquire when we learn to read. People who have never learned to read find it hard (though not impossible) to split ‘cat’ into ‘c’, ‘a’ and ‘t’.
Perhaps because of the skills we acquire when we learn to read, psychologists and cognitive neuroscientists often use these segmental skills as an index of speech perception ability, despite the fact that people who haven’t learnt to read, and who therefore find such tasks hard, can understand speech perfectly well. Because we can break spoken words down into smaller chunks, it is often assumed that this must be a central aspect of speech perception.
This bias towards segments in speech may have had other, more central effects on how we construe the problems of speech perception. It has been suggested, for example, that reading with an alphabetic system has biased us into the belief that the smallest units of speech, phonemes, are perceptual realities in terms of how we process speech from sound to meaning (Boucher, V. J. 1994, ‘Alphabet-Related Biases in Psycholinguistic Inquiries: Considerations for Direct Theories of Speech Production and Perception’, Journal of Phonetics, 22(1): 1-18). The argument goes that because we can segment speech into phonetic elements (a skill we acquire when we learn to read), and because we are immersed in a reading system which represents spoken words as sequences of alphabetic symbols, we implicitly assume speech itself to have these characteristics.
This assumption has had scientific consequences. For a long time, theories and models of spoken word comprehension incorporated a phonetic level of representation (e.g. the TRACE model). The problem with phonemes is that any one speech sound will be greatly altered by where it is in a word and by the sounds around it – in English the sound ‘l’ is very different at the start of the word ‘leaf’ than at the end of the word ‘bell’. There are also co-articulation effects, which refer to the ways that the same speech sound is affected by its neighbours: the ‘l’ at the start of ‘let’ differs acoustically from the ‘l’ at the start of ‘led’ because of differences between the final ‘t’ and ‘d’ phonemes. This covariance is highly useful to the listener, and it makes sense that the perceptual system would preserve this detail. If you are building a computer system to understand speech, for example, you don’t build one to identify particular phonemes; you build it to look across sequences of sound, either groups of phonemes or whole words. Indeed, more recent psychological models of human speech perception explicitly do not assume that phonemes need to be identified prior to comprehension (e.g. Shortlist B; Norris and McQueen 2008).
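To make that design choice concrete, the sketch below is purely illustrative (the feature frames and word templates are invented, and it is not any particular recogniser): it matches an incoming sequence of acoustic-like frames against stored whole-word templates using dynamic time warping, so no individual frame ever has to be assigned a phoneme label.

```python
import numpy as np

# Illustrative only: the "features" are made-up frame-by-frame vectors standing
# in for something like spectral frames, and the word templates are invented.

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping cost between two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# Hypothetical stored templates: one whole-word sequence of frames per word.
templates = {
    "leaf": np.array([[0.2, 0.9], [0.4, 0.8], [0.6, 0.3]]),
    "bell": np.array([[0.8, 0.1], [0.7, 0.2], [0.5, 0.6]]),
}

# An incoming utterance, again as a sequence of feature frames.
incoming = np.array([[0.25, 0.85], [0.45, 0.75], [0.55, 0.35]])

# Pick the word whose whole-sequence template best matches the input;
# no frame is ever labelled with a phoneme along the way.
best = min(templates, key=lambda w: dtw_distance(incoming, templates[w]))
print("Recognised word:", best)
```

Real systems use richer features and statistical sequence models, but the point stands: recognition operates over stretches of sound, not over isolated phonemes.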
At a brain level, can we see any sensitivity in speech perception areas to phonemes, as opposed to sequences of speech sounds? We recently investigated the neural activity seen when people silently rehearse pseudo-words. We varied how long the pseudo-words were in syllables (e.g. sapeth vs sapethetis) and how phonetically complex they were (e.g. sapeth vs stapreth). This enabled us to separate brain areas that are more activated when people try to maintain longer pseudo-words from those that are more activated when there are phonetically complex sequences in the material being rehearsed. Silent rehearsal recruits both auditory and motor brain systems, and both of these systems were sensitive to the length of the pseudo-words. In contrast, only the motor output systems were sensitive to the phonetic complexity of the pseudo-words, being more active when phonetically more complex sequences were rehearsed. This finding suggests that auditory areas are less sensitive to specific phonetic details than motor systems are. In turn, this may mean that if phonemes are ‘real’ phenomena in the language system, they are implemented in the motor systems, not in the perception systems. In other words, we may not need to extract phonemes to understand speech, but they may be important elements in speech production.
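The logic of pulling the two effects apart can be illustrated with a toy 2x2 factorial sketch (the numbers and region labels below are invented for illustration, not data from the study): with separate regressors for length and complexity, each effect can be estimated independently for each region's response.

```python
import numpy as np

# Toy illustration of separating a "length" effect from a "phonetic complexity"
# effect with a 2x2 factorial design. All numbers are invented placeholders.

# Design matrix, one row per condition: [intercept, long?, complex?]
X = np.array([
    [1, 0, 0],  # short, simple   (e.g. sapeth)
    [1, 0, 1],  # short, complex  (e.g. stapreth)
    [1, 1, 0],  # long, simple
    [1, 1, 1],  # long, complex
], dtype=float)

# Hypothetical mean responses (arbitrary units): an "auditory" region that
# tracks length only, and a "motor" region that tracks length and complexity.
responses = {
    "auditory": np.array([1.0, 1.0, 2.1, 2.0]),
    "motor":    np.array([1.0, 1.6, 2.0, 2.7]),
}

for region, y in responses.items():
    # Least-squares fit gives one estimate per column of the design matrix.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"{region}: length effect = {beta[1]:.2f}, complexity effect = {beta[2]:.2f}")
```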
My son is currently learning to read and write, and watching his delight at solving the ‘problem’ of the sounds in words and what rhymes with what is a joy and a privilege to see. Overhearing his dad explaining why ‘bird’ contains the letter ‘r’ (short answer: he had a heroic go at pronouncing it as ‘biRRRd’, as if he were from the Scottish Highlands) showed me both the problems that written English presents to someone learning it and the dominance reading can exert over what sounds we think there are in words.