Following her webinar, Annette and the English Profile team have kindly answered some of the questions we didn't have time for yesterday...
1. There is no data from my country. What can we do to get data from our country into English Profile? Who can submit data to English Profile?
The map that Annette showed indicated which countries we have data from for our Cambridge English Profile Corpus. Remember that the main source of data for English Profile was actually the Cambridge Learner Corpus, which has data from exam students from all over the world. The Cambridge English Profile Corpus is an additional corpus designed to collect non-exam data – to give a better balanced view of learner language. Yes, there are plenty of countries we would like to get more data from – so please do get involved. Anyone can become a data contributor – we welcome contributors from any country. If you would like to get involved, please go to www.englishprofile.org, click on ‘Get Involved’, and fill in the form. We will then help you to prepare and submit data.
2. Do you have to work in a school to contribute data?
No, anyone can contribute data to the Cambridge English Profile Corpus as long as they have access to learners of English. Individual learners can also submit their own data independently, if they are over 18. They simply need to complete one of our consent forms. However, to gain access to the corpus we ask that each contributor submit the work of at least 10 students.
3. Is English Profile free to everyone?
The English Vocabulary Profile is available to everyone for free at the moment. Subscribe for free on the English Profile website. Anyone can use it for personal reasons or for their teaching, but you need permission from Cambridge before you can use it for commercial purposes.
4. Is the corpus free for everyone to use?
The Cambridge Learner Corpus is an important source of data for English Profile. It is not available to the public directly, though teachers and learners benefit from it because Cambridge uses it to improve its learning materials.
5. How do you tag the data in your corpora?
Our corpora are tagged for part of speech information using an automated process. We also error code our learner corpus manually using a team of trained coders.
6. Who decides what level these words and expressions are? And based on what criteria?
English Profile is unique in that it draws heavily on learner data. With millions of words from exam scripts (where we know the precise level of the learner), we are able to identify which words, senses and phrases are typically mastered at each level. A team of lexicographers, led by Annette Capel, combined this information with other sources – major course-books, commonly used wordlists, exam item writer guidelines, etc – to produce the most authoritative view available on language levels. After the English Vocabulary Profile was first developed, it was trialled and validated for a year before release – and a number of changes were made as the result of feedback during that stage. There are articles in the English Profile Journal with more information on how levels were decided – visit www.englishprofile.org to see the Journal for free.
7. How many learners need to use a word for it to be considered typical for the level?
It was not an exact science. When using the Cambridge Learner Corpus, we usually looked for a minimum of 14 citations, representing a variety of first language backgrounds. This was not always possible with phrasal verbs and idioms, which will always be found in smaller numbers in text corpora. As said during the webinar, the CLC learner evidence was supplemented by other sources (see articles in the EP journal for more details).
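For readers who want to try a similar check on their own learner texts, here is a minimal sketch of the idea in Python. It is not the procedure the English Profile team used: the figure of 14 comes from the answer above, while the minimum number of first language backgrounds, the function name and the toy data are all assumptions made up for illustration.

```python
from collections import defaultdict

# 14 is the usual minimum mentioned in the answer above. The minimum number
# of first-language (L1) backgrounds is an assumed value for illustration.
MIN_CITATIONS = 14
MIN_L1_BACKGROUNDS = 3


def words_meeting_threshold(citations,
                            min_citations=MIN_CITATIONS,
                            min_l1=MIN_L1_BACKGROUNDS):
    """Return words with enough citations from enough L1 backgrounds.

    `citations` is an iterable of (word, first_language) pairs, one pair
    per occurrence of the word in learner text at a given level.
    """
    counts = defaultdict(int)
    backgrounds = defaultdict(set)
    for word, first_language in citations:
        counts[word] += 1
        backgrounds[word].add(first_language)
    return sorted(
        word for word in counts
        if counts[word] >= min_citations and len(backgrounds[word]) >= min_l1
    )


# Toy data invented for this example (not real corpus counts).
toy = (
    [("holiday", l1) for l1 in ["Spanish", "Polish", "Turkish"] * 5]   # 15 citations, 3 L1s
    + [("bargain", "Spanish")] * 20                                    # 20 citations, 1 L1
    + [("meadow", l1) for l1 in ["Spanish", "Polish", "Turkish"] * 2]  # 6 citations, 3 L1s
)

print(words_meeting_threshold(toy))  # ['holiday']
```

In the toy data, only "holiday" passes both checks: "bargain" is frequent but comes from a single first language background, and "meadow" has too few citations.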
8. Are word lists available for download to do lexical analysis (i.e. upload a student's writing and analyse the CEFR level of each piece of vocabulary in his/her writing)?
At the moment, the English Vocabulary Profile doesn’t have this function. Cambridge is able to do this type of activity in-house and has used it in research collaborations with other organisations, but unfortunately we’re not able to make it available to the public at the moment.
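If you keep your own small list of words and levels, a rough version of this kind of analysis can be sketched in a few lines of Python. This is only an illustration, not Cambridge's in-house tool: the word list, the function name and the matching rule are hypothetical, and only the "how come" level is one mentioned in this Q&A. The other levels are placeholders, not EVP data. The sketch also ignores word senses, which the EVP treats separately.

```python
import re

# Hypothetical mini word list. Only "how come" reflects a level mentioned
# in this Q&A; the other levels are placeholders, not EVP data.
LEVELS = {
    "how come": "C1",
    "family": "A1",
    "holiday": "A2",
}


def tag_levels(text, levels=LEVELS):
    """Return (item, level) pairs, preferring two-word phrases over single words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tagged = []
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if len(tokens) - i >= 2 and pair in levels:
            # A listed two-word phrase such as "how come" is tagged as one item.
            tagged.append((pair, levels[pair]))
            i += 2
        else:
            # Words missing from the list are marked "unknown".
            tagged.append((tokens[i], levels.get(tokens[i], "unknown")))
            i += 1
    return tagged


for item, level in tag_levels("How come your family is on holiday?"):
    print(f"{item}: {level}")
```

Words that are not in the list come back as "unknown", which is a reminder that the result is only as good as the list behind it.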
9. Do the samples you use in the corpus cover both written and oral performances of students?
At the moment, we have relied mainly on the Cambridge Learner Corpus, which contains written texts. However, we are building up the Cambridge English Profile Corpus with non-exam data and aim to add more spoken data. The data and insights from English Profile will be continuously developing as we increase the range of data.
10. Doesn't the data collected reflect more on the teacher's use of English, rather than the student's?
By collecting data from a huge range of sources, countries, types of school or college, we have reduced the likelihood of specific courses or teacher input being significant. Even with a single class and teacher, there is a big difference between what the teacher teaches and what the learner learns! This is especially true as learners now get a lot of input and practice outside the classroom, via the internet.
11. Is there a learner dictionary tagged with CEFR levels, using this English Profile information?
Yes, the Cambridge Learner’s Dictionary and the Cambridge Essential English Dictionary both have this. Both are from Cambridge University Press.
12. How is CEFR connected to directives of EU? Are the EU countries obligated to use CEFR?
There is no general directive that obligates EU countries to use the CEFR. However, countries throughout the EU, and more and more countries around the world, have realised that the CEFR helps them clarify learning goals, set standards and develop a more communicative approach to language teaching and assessment. Individual countries and institutions are increasingly adopting CEFR levels as the basis for describing their curriculum and objectives. Schools and teachers then find they need to understand the CEFR to know what is expected of them. But the CEFR is language-neutral – it describes what a learner at each level can do (e.g. talk about their family) but doesn’t say what language is needed to do that. English Profile aims to help teachers by working out what language is generally learned at each CEFR level, and using this to develop learning materials and exams that are appropriate for the level. With the English Vocabulary Profile, Cambridge developed an online tool to share some of that information with teachers around the world.
13. Does English Profile only look at British English and American English? What about other varieties, e.g. the idea of English as a Lingua Franca, or different interlanguages?
English Profile is all about interlanguages – what the features of learners’ language are. But we have organised the data into two broad categories – those learners who are using British English as their target, and those using American English as their target. Most other varieties are not codified sufficiently to act as a ‘target’, which makes it difficult to categorise what is correct or not.
14. What age were the learners in the project?
We have focused on secondary and adult learners worldwide. Young learners (under 11) tend to have different lexical requirements!
15. Can teachers use English Profile to see typical mistakes by students?
The data underlying English Profile does show us typical learner errors. We can analyse these by age, level or first language. That data is not directly accessible to teachers, but Cambridge has produced a number of books which provide information on common mistakes – and how to avoid them. Check the Cambridge University Press website and search for “Common Mistakes”.
16. Are there copyright issues for, e.g. ELT materials writers using EP data for writing ELT materials for CUP's publishing competitors?
Yes. Cambridge has made this resource available for teachers. If writers producing materials for other publishers or exam boards want to use it, they need permission from Cambridge first.
17. How can researchers benefit from this project?
Have a look at the researcher pages on our website. Interested researchers can attend our events (including the English Profile seminar held in Cambridge every February), use our free resources (such as the English Vocabulary Profile and the English Profile Journal) and submit a research proposal here. While English Profile is not a funding body, we do allow access to our corpora for approved researchers.
18. If the vocabulary used by a student writing in the FCE exam, for example, is mostly at level B1, will it have any influence on the results?
Vocabulary listed as B1 in the English Vocabulary Profile will form part of the B2 learner’s lexicon and if used appropriately within the task set, this would be perfectly adequate. That said, as an FCE examiner, I have often been impressed by words and phrases above B2 level, and confident use of a range of vocabulary is usually an indicator of better than average ability. Remember though that candidates are assessed on a number of scales for the Cambridge examinations: content, communicative achievement, organisation and language.
19. I introduce "I'm blue" at level B1; I want my students to know colloquial expressions. What do you think of that? My question: What level is "how come?" I couldn't find it when you showed COME.
As yet, we haven’t had much evidence of colloquial use to draw on. I would also say that this use of ‘blue’ may be less common now than it was within first language use – this was certainly the opinion of some lexicographers I worked with on the project. Great that you are teaching your students colloquial expressions though! How come? is currently given C1 level in the EVP and is listed both at ‘how’ and ‘come’. If we had spoken learner data I suspect this level might come down, and your own feedback will be taken into account as well.
20. Do you include examples with mistakes in the corpus, then?
The Cambridge Learner Corpus stands at around 50 million words at present, and over half of that text is error coded. In selecting learner examples for the EVP, I needed accurate uses of the word or phrase being described, and any errors around that use were corrected within square brackets. See the learner example at come to light for an example of the correction technique used.
21. What do you hope to get when you finish this project? Design standardized tests?
Cambridge University Press and Cambridge Assessment are currently using English Profile findings to develop and improve teaching and testing materials which are aligned to the CEFR. We also hope that this project will influence the field of ELT in a variety of ways, not least in helping teachers and syllabus designers understand what the CEFR means for English in terms of the actual language (vocabulary, grammar, etc) learners can use at each of the six levels.
22. What percentage of the learner corpus is hand written?
Most of the Cambridge Learner Corpus is based on written exam scripts, which were mostly hand-written, and had to be keyed in manually. However, the Cambridge English Profile Corpus is designed to accept data electronically.
23. Are there any corpora available now that you would recommend to teachers?
There are several native speaker corpora available online which can be useful to teachers in a variety of ways (see list below). Also remember that if you contribute data to our corpus you will automatically gain access to that too.
British National Corpus http://www.natcorp.ox.ac.uk/corpus/
American National Corpus http://www.americannationalcorpus.org
ICE – International Corpus of English http://ice-corpora.net/ice/
MICASE – Michigan Corpus of Academic Spoken English http://quod.lib.umich.edu/m/micase/
Corpus BYU http://corpus.byu.edu/bnc/
Corpus of Contemporary American English (COCA) http://www.americancorpus.org/