Thursday, April 30, 2015

Factors affecting English vocabulary levels in foreign language speakers

The Test Your Vocabulary site has found a few factors that influence the size of a non-native speaker's vocabulary.

Academic performance helps, and can as much as double your vocabulary size. But that doesn't tell us what helps academic performance.

Classroom participation matters too, but it's not the top factor. It appears to give you up to a 50% boost in vocabulary.

Activity outside of class makes the biggest difference. Students who do "lots" of things in English outside of class have more than twice the vocabulary of those who "don't do much."

Living abroad gets you to 10,000 words and beyond. Up to one year abroad brings the average student from around 7,000 to around 10,000 words. After that, every year abroad adds around 850 more words, or around 2.3 per day. (Compare that to the average American adult, who learns about 0.85 words per day.)
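A quick check of the per-day arithmetic in the paragraph above, as a short Python sketch. The figures are the approximate ones quoted from the Test Your Vocabulary results; the variable names are only illustrative.

```python
# Approximate figures quoted from the Test Your Vocabulary results.
words_per_year_abroad = 850        # extra words per additional year abroad
days_per_year = 365
native_adult_words_per_day = 0.85  # average American adult

# Spread the yearly gain evenly over the year.
learner_words_per_day = words_per_year_abroad / days_per_year

# How many times the native adult rate is that?
speedup = learner_words_per_day / native_adult_words_per_day

print(f"{learner_words_per_day:.2f} words per day")  # 2.33 words per day
print(f"{speedup:.1f}x the native adult rate")       # 2.7x the native adult rate
```

So a learner in their second year abroad and beyond picks up roughly 2.3 words a day, nearly three times the native adult rate.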


Tuesday, April 21, 2015

100 most common spoken collocations

Here's a study that attempts to identify the 100 most frequent collocations in spoken English, for use as a teaching tool for beginner to intermediate learners of English.

Spoken collocations are very frequent. The researchers found 4,698 collocations using the top 1,000 pivot words.

"For most items at the same rank in the two lists [lists of collocations in spoken English and written English], spoken items are 50 per cent to 100 per cent more frequent than the items at the same rank in the written list. For example, the most frequent spoken collocation ‘you know’ has 27,348 occurrences in the 10 million running words, while the most frequent written collocation ‘of course’ has 2,698. Thus, collocations are particularly important in spoken language and courses focusing on spoken language should give particular emphasis to them, perhaps more so than in written English."


Figure 1 is a frequency comparison between single word types from the BNC spoken section (Leech et al.’s 2001 list) and collocations, showing how many collocations would meet the frequency cut-off points to get into the first four thousand words of English.


RK Collocation FRE
1 you know 27348
2 I think (that) 25862
3 a bit 7766
4 (always [155], never [87]) used to {INF} 7663
5 as well 5754
6 a lot of {N} 5750
7 {No.} pounds 5598
8 thank you 4789
9 {No.} years 4237
10 in fact 3009
11 very much 2818
12 {No.} pound 2719
13 talking about {sth} 2489
14 (about [91]) {No.} percent (of sth [580], in sth [54], on sth [44], for sth [38]) 2312
15 I suppose (that) 2281
16 at the moment 2176
17 a little bit 1935
18 looking at {sth} 1849
19 this morning 1846
20 (not) any more 1793
21 come on 1778
22 number {No.} 1661
23 come in (swe, sth) 1571
24 come back 1547
25 have a look 1471
26 in terms of {sth} 1463
27 last year 1347
28 so much 1334
29 {No.} years ago 1314
30 {Det-the [879], this [39], a [21]} county council 1273
31 this year 1255
32 go back 1250
33 last night 1244
34 rather than 1243
35 come out 1163
36 very good 1160
37 I hope (that [455]) {N, S V} 1155
38 {No.} times 1147
39 that way 1145
40 said well (that, what) {S V} 1135
41 at the end (of sth [737]) 1122
42 {Det-that [425], this [146], the [142]} sort of thing 1113
43 for example (if S V [30]) 1107
44 as far as 1079
45 said to {smo} 1076
46 mean (that) {S V} 1066
47 come on (to swe, smo [65]) 1059
48 {FREQUENCY, QUANTITY} a week 1056
49 all the time 1044
50 thank you very much 1041
51 too much 1034
52 over there 1017
53 that sort (of sth [953]) 1016
54 looking for {sth} 990
55 make sure (that [394]) {S V} 990
56 very well 987
57 {Det-the [47]} last week 956
58 in the morning 952
59 it seems {N, A, to INF, that S V} 945
60 next week 940
61 a number of {sth} 929
62 out there 929
63 what I mean 929
64 get in (swe, sth) 912
65 find out {sth} 908
66 know that (S V) 889
67 leave it 886
68 at home 884
69 and so on 872
70 (about [226]) {No.} minutes 867
71 (do) n’t mind (sth) 862
72 other people 839
73 not really 837
74 talking to {smo} 829
75 mind you 822
76 want it 819
77 much more 816
78 looked at {sth} 805
79 the other one 805
80 (at [207], about [110], till [50], by [24]) half past {No. 1-12} 798
81 some people 797
82 this week 794
83 this time 787
84 very nice 784
85 I see 756
86 I bet (S V) 746
87 these things 742
88 call it (A, N) 737
89 (be-verb) not sure 721
90 at the time 717
91 thought that {S V} 714
92 going out 712
93 it comes 712
94 go out 711
95 quite a lot 711
96 even if 707
97 last time 704
98 hang on 701
99 believe that (S V, N) 696
100 (be-verb, become-verb) interested in {sth} 689

* { } signals an obligatory type of word that must occur in the collocation, ( ) signals an optional but possible part of the collocation, and [ ] brackets the ‘frequency figure’.
RK refers to Rank and FRE to Frequency (the number of occurrences in the corpus).

Source: http://www.victoria.ac.nz/lals/about/staff/publications/paul-nation/2008-Shin-Collocations.pdf

Collocation categories

Grant and Nation (2006) suggest that further distinctions can be made between different types of collocations on the basis of the way the meanings of the parts contribute to the meaning of the whole. Grant and Nation (2006) distinguish core idioms, figuratives and literals using the two criteria of compositionality and figurativeness (Grant & Bauer, 2004).

Core idioms are non-compositional (the meanings of the parts do not really reflect the meaning of the whole) and non-figurative. Frequent examples include ‘so and so’, ‘and what have you’ and ‘by and large’. Grant and Bauer (2004) found 104 core idioms, and three more, ‘as well’, ‘as well as’ and ‘of course’, were found in this study. These three were not included in the idiom dictionaries that Grant and Bauer used as their data source.

Figuratives are also non-compositional, but they are figurative in that, by using an interpretation strategy, the literal meaning can be linked to the figurative meaning. Frequent examples include ‘stepping stones’, ‘at the end of the day’ and ‘head over heels’. Literals are compositional and non-figurative: the parts directly relate to the meaning of the whole, as in ‘thank you very much’, ‘all the time’ and ‘twice a week’. It is worth classifying these items into different categories because different categories of multi-word units need to be treated differently when they are taught and learned. Table 4 summarises the types of collocational groups discussed above.

TABLE 4 Types of Collocational Groups
Core idioms (e.g. by and large)      Cannot be predicted or analysed
Figuratives (e.g. stepping stone)    Cannot be predicted and need to be interpreted
ONCEs (e.g. a long face)             Only one element cannot be predicted
Literals (e.g. twice a week)         Can be predicted or analysed

The category of ONCEs may be unnecessary, because the one non-compositional element that ONCEs contain could be considered a polysemous or homonymous use of a word. For example, the word long in long face is used with the meaning of gloomy or worried, which is not related to the notion of length. So it was considered non-compositional, but some might argue that this use of long comes from the word's polysemy. That is, the word long can be used in more than one sense. [So ONCEs could be considered Core Idioms.]

Collocation Research

Here is an interesting study of collocations in the English language:
https://www.academia.edu/8563631/The_high_frequency_collocations_of_spoken_and_written_English

There are two significant findings by the researchers:
  1. The differences between the spoken and written sets: each has a very distinct set of collocations. Collocations are also much more frequent in spoken language than in written language.
  2. The large number of collocations meeting the criteria, and the large number of these that would qualify for inclusion in the most frequent 1,000 items in English if no distinction were made between single words and collocations.
The here-and-now nature of spoken language is reflected in items like this morning, at the moment, last night and over there, and the personal and interactional nature is reflected in items like thank you, thank you very much, you know, I think and come in.

Personally, I find the most significant finding to be the enormous difference in the frequency of the items. Although the total number of different items meeting the various criteria was virtually the same in both corpora (2,261 in the spoken corpus and 2,266 in the written corpus), the top 50 spoken collocations occurred 147,217 times, while the top 50 written collocations occurred only 48,782 times. That is, the top 50 spoken collocations occurred about three times as often as the top 50 written collocations. Spoken language makes much more frequent use of its common collocations than written language does. These results show that collocations play a more important role in spoken language than they do in written language, so spoken collocations particularly deserve attention in language teaching.
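The comparison of the two top-50 totals can be checked directly from the figures quoted above. A minimal Python sketch (the variable names are only illustrative):

```python
# Total occurrences of the top 50 collocations in each corpus, as quoted above.
spoken_top50_total = 147_217
written_top50_total = 48_782

# Number of different collocations meeting the criteria in each corpus.
spoken_types = 2_261
written_types = 2_266

ratio = spoken_top50_total / written_top50_total
print(f"Top 50 spoken vs written: {ratio:.2f}x")                        # Top 50 spoken vs written: 3.02x
print(f"Difference in number of types: {written_types - spoken_types}")  # Difference in number of types: 5
```

The two corpora have almost the same number of collocation types (a difference of 5), yet the spoken top 50 is used about three times as often as the written top 50.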


There were approximately 2,300 collocations in each of the spoken and written corpora.

All of the top 50 spoken collocations would qualify for entry into the most frequent 1,000 words of spoken English: each is within the frequency cut-off point for the first 1,000 single word types. By comparison, only 14 written collocations would make the top 1,000.

There are 162 collocations in the spoken corpus that would get into the top 2,000 words of spoken English, and 56 of these would be in the first 1,000. There are 41 collocations that would get into the top 2,000 words of written English, and 14 of these would be in the first 1,000. There are thus a large number of collocations of very high frequency.

Here is a list of the top 50 spoken collocations

Corpus resources

Word and Phrase

Word and Phrase focuses on collocations:
http://www.wordandphrase.info/frequencyList.asp

It shows:
  • The rank by frequency in the corpus
  • Definition
  • Collocations by frequency
  • The relative frequency in spoken, fiction, academic, newspapers and magazines, as well as the number of occurrences
  • Examples in use
It allows you to sort on a variety of parameters. Especially useful is the ability to look up collocations quickly.

The Corpus of Contemporary American English

This is the largest freely available corpus of English. It consists of 450 million words from 1990 to 2012.


Monday, April 13, 2015

Ijus ruhlized tha ti dont speak Engluhsh

As someone who is just starting out as an ESL teacher, I'm becoming aware of language in a way similar to a fish realizing that it's wet. Because I'm immersed in my own language, it's ubiquitous. Like a fish in water, I have no awareness of my linguistic environment because it's such an integral part of my being.

Something that has started to enter my awareness is the extent to which spoken language is nothing like the written word. In fact, they are two completely separate languages. Like every native speaker, I had always assumed that speaking is just an oral rendition of what's written. But as I listen to my own language for the first time, I feel like I'm listening to a very foreign tongue.

For my students, I go up to the board and write the conjugation for the verb to be: 
I am
You are
He is

But as I say the words out loud for them, I hear a completely different sound than what those letters would indicate. As I read them to my students I hear myself saying:
I yam
You ware
He yiz

Then suddenly I'm aware of myself dropping these y's and w's everywhere. I'm peppering my entire vocabulary with these extra little sounds. 

I yasssume that the yextent to which the y yand w sounds begin to wenter into the spaces between the yuntterances I make is to wa foreigner very strange.

See what I mean! I am speaking a strange language that I'm starting not to recognize. 

And then I break up words so that every syllable I say starts with a consonant, so that asking something simple like, "Can I have a bit of egg?" becomes, "Ca ni ha va bi tuh vegg?"

Seriously! I speak like that and if you listen to yourself you'll hear the same thing come out of your mouth. 

And it only gets more bizarre. I realize that an uh sound replaces numerous vowel sounds in words with more than two syllables. 

The ruhlization that uh ruhplacuhs numuhruhs vowuhl sounds is buhzarre. 

Not only yam I droppuhn the y yand w buht cuhmpletely yignuhruhn vowuhl sounds uhn gruntuhn my sentuhncuhs. 

Anifya havuhntauhready notuhced, I drop suhm ledders cuhmpletely, sluh ruthurs tughether, uhn turn allo duh t's intuh d's. 

The language we speak is not the English we write. They are completely different! It's a wonder that any foreigner can understand us!