Looking for one in the BNC – Part II

Nicholas LaCara – February 2020 – Boston, MA

This is an update on some corpus work I've been doing on anaphoric one. To read up on the background, see my previous post, or go read the stunning conclusion.

Updates

So at this point I've worked through retagging the examples I initially collected. I can definitely say I've learned a lot by doing this. I worried in my last post that the tag set I thought up would be too detailed, but I encountered a number of issues I didn't anticipate or think about when I actually sat down to do the tagging. Given that things didn't go quite how I anticipated, I'll need to revise the tag set a bit and go through the data again, but now that I have a better sense of what I'm doing it shouldn't take so long.

Numeral one

I did end up collapsing two of the ambiguous tags I had thought about using, 1UA (ambiguous in context) and 1UC (not enough context), since I couldn't even remember what distinction I was trying to capture there. I added a new tag, though, for tokens of one that occurred at the beginning of a title or name (which happened around twenty times after a possessive).

There were several cases, though, mostly having to do with uses of cardinal one, that I don't think my original tag set did a good job of capturing. Most notably, many tokens of one occur in contexts where it does not quantify over the noun phrase but is instead part of a compound or else part of a complex cardinal or ordinal number. Below I give some examples of the cases I encountered.

Of the 589 apparent tokens of one following a possessive (as I'll discuss, some tokens of it's and you're are mis-tagged), I tagged 440 as the cardinal numeral, nearly 75%. However, although this is technically the right tag for many of these tokens, I worry that the counts do not accurately reflect the relation between possessives and one, or noun phrase ellipsis (NPE) following one after a possessive. When it comes to numeral one, I'm particularly interested in cases where it either quantifies over a noun (phrase), as in (1), or else is stranded by NPE, as in (2):

  1. CAB 581: He pushed at the gate and after a struggle it creaked and groaned open on its one rusty hinge.
  2. B03 2629: They were unbeaten all afternoon, despite six of their opponents having three serves, to their one!

However, many tokens of numeral one occur in contexts where they do neither. Several of these tokens were in numeral–noun compounds, such as one-point rise and one-pound coin below (the corpus data does not include the hyphen):

  1. A37 56: STERLING failed to make further headway yesterday following Thursday's one point rise in base rates
  2. AAV 1027: I stand in the queue for one of the new-fangled ticket machines clutching my one pound coin […]

Other places numeral one appears include as a component of cardinal and ordinal numbers, such as one hundred thousandth, and as part of modifiers such as one day:

  1. CBW 616: It is to welcome its one hundred thousandth member to the fold […]
  2. BNP 250: Home The Lynce, a small house in the grounds of Blenheim Palace, which will be Jamie's one day.

Also, as mentioned last time, there were a number of tokens of one that were the first word in a title (One Hundred Years of Solitude by Gabriel García Márquez appeared several times).

  1. AAF 523: […] something of the flavour of this part of Panama can be gleaned from Garcia Marquez's One Hundred Years Of Solitude […]

These tokens of one are all, properly speaking, cardinal numeral one, but from a syntactic perspective, they don't directly tell us much about the relation of one to possessives. They are part of syntactically distinct units – compounds, titles, and sentential modifiers – that don't interact with the structure of a noun phrase the way a simple numeral one does.

So as I refine my tagging going forward, I'm revising my tag set to take account of these differences. I suspect that this is overkill, but since I'm interested in patterns deeper than pure word order I'd like to be able to at least look at these distinctions in a more detailed way.
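As a rough illustration of how such distinctions might be pre-sorted before manual tagging, here is a minimal Python sketch. It is not the tag set or script described in this post: the function name, the labels (`title?`, `complex-number?`, `compound?`, `unsorted`) and the heuristics are all hypothetical, and the input is assumed to be a list of (word, C5 tag) pairs. Sentential modifiers like one day are deliberately left unsorted, since word order alone does not identify them.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, C5 tag), an assumed input format

# Words that continue a complex cardinal or ordinal after "one".
NUMBER_WORDS = {"hundred", "thousand", "million", "billion"}


def is_noun(tag: str) -> bool:
    """True for tags whose first (or only) reading is a common noun."""
    return tag.startswith("NN")


def presort_numeral_one(tokens: List[TaggedToken], i: int) -> str:
    """Suggest a hypothetical subcategory for the token of 'one' at index i.

    The result is only a hint for a human annotator, not a final tag.
    """
    word = tokens[i][0]
    following = tokens[i + 1:i + 3]
    # Capitalised "One" that is not sentence-initial may begin a title or name.
    if i > 0 and word == "One":
        return "title?"
    # "one hundred thousandth": part of a larger number.
    if following and following[0][0].lower() in NUMBER_WORDS:
        return "complex-number?"
    # "one point rise", "one pound coin": one + noun + noun.
    if len(following) == 2 and all(is_noun(tag) for _, tag in following):
        return "compound?"
    return "unsorted"


hinge = [("its", "DPS"), ("one", "CRD"), ("rusty", "AJ0"), ("hinge", "NN1")]
rise = [("Thursday", "NP0"), ("'s", "POS"), ("one", "CRD"),
        ("point", "NN1"), ("rise", "NN1")]
member = [("its", "DPS"), ("one", "CRD"), ("hundred", "CRD"),
          ("thousandth", "ORD"), ("member", "NN1")]
title = [("Marquez", "NP0"), ("'s", "POS"), ("One", "CRD"),
         ("Hundred", "CRD"), ("Years", "NN2")]
```

The tags in these sample sentences are illustrative guesses rather than the corpus's actual annotation. Anything the sketch labels would still need to be checked by hand, since the noun-noun test will also fire on ordinary noun phrases whose head happens to be a compound.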

Mis-tags in the corpus

Somewhat surprisingly, the automatic tagging in the corpus occasionally mis-tags it's as possessive. For example, the substring it's one thing in the example below is not tagged as if it means ‘it is one thing…’ but rather as ‘its one thing’:

  1. CEN 2250: It's one thing Keith finds so frightening – an awful responsibility.

This is actually pretty easy to take care of, and I've already modified my re-tagging script to allow me to change the tag of a preceding element to address the handful of cases like this and get more accurate counts.
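The actual re-tagging script is not shown in this post, so the following is only a sketch of what "changing the tag of a preceding element" could look like in Python. The `Token` class, the `retag_preceding` function and the sample tagging of the sentence are assumptions made for illustration; the tags themselves (PNP, POS, VBZ, CRD, NN1) are from the C5 tag set.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Token:
    word: str
    tag: str


def retag_preceding(tokens: List[Token], index: int, new_tag: str) -> List[Token]:
    """Return a copy of `tokens` in which the token before `index` has `new_tag`.

    The input list is left untouched so the original corpus tags stay available.
    """
    if not 0 < index < len(tokens):
        raise IndexError("no preceding token at this position")
    updated = list(tokens)
    updated[index - 1] = replace(updated[index - 1], tag=new_tag)
    return updated


# Hypothetical tagging of "It's one thing" with 's wrongly marked as possessive.
sentence = [Token("It", "PNP"), Token("'s", "POS"),
            Token("one", "CRD"), Token("thing", "NN1")]
one_index = 2  # position of the target token "one"
fixed = retag_preceding(sentence, one_index, "VBZ")
```

Returning a new list rather than editing in place makes it easy to compare counts before and after the correction.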

More troubling is that some actual possessives appear to be mis-tagged as the contracted form of is. In reading some of the contexts for the target sentences, I came across tokens of possessive -'s that must have been mis-tagged VBZ (3rd person singular present of be). As an example:

      well  Angie    's   one  's   like  boiling  hot
      AJ0   NP0-NN1  VBZ  PNI  VBZ  AV0   AJ0-VVG  AJ0
      ‘[…]well Angie's one's like boiling hot’ (KP5 300)

This example was not caught by my initial search, and when I searched for it I discovered that the token of 's following Angie is mis-tagged as VBZ rather than POS. Since this is a clear case of anaphoric one immediately following a possessive (one refers to radiators in context), this is exactly the kind of case I am interested in and so I would like for it to be included in the sample. At the moment, I will probably need to more carefully comb the data looking for these sorts of tokens.
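One way to comb the data for tokens like this is to look for 's tagged VBZ that sits between a noun and one. The Python sketch below assumes the sentences are available as lists of (word, tag) pairs; the function names and that input format are hypothetical, not part of any existing script.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, C5 tag), an assumed input format

# Noun tags, including proper nouns; ambiguity tags such as NP0-NN1 are split.
NOUN_TAG_PARTS = {"NN0", "NN1", "NN2", "NP0"}


def is_nominal(tag: str) -> bool:
    """True if any reading in a (possibly ambiguous) tag is a noun."""
    return any(part in NOUN_TAG_PARTS for part in tag.split("-"))


def find_suspect_possessives(tokens: List[TaggedToken]) -> List[int]:
    """Return indices of 's tagged VBZ that follow a noun and precede one(s)."""
    hits = []
    for i in range(1, len(tokens) - 1):
        word, tag = tokens[i]
        prev_tag = tokens[i - 1][1]
        next_word = tokens[i + 1][0].lower()
        if (word == "'s" and tag == "VBZ"
                and is_nominal(prev_tag)
                and next_word in {"one", "ones"}):
            hits.append(i)
    return hits


kp5_300 = [("well", "AJ0"), ("Angie", "NP0-NN1"), ("'s", "VBZ"),
           ("one", "PNI"), ("'s", "VBZ"), ("like", "AV0"),
           ("boiling", "AJ0-VVG"), ("hot", "AJ0")]
suspects = find_suspect_possessives(kp5_300)
```

On the KP5 300 example this flags only the 's after Angie; the second 's follows the pronoun one and is left alone. The pattern will also match genuine copulas (as in Angie's one of my friends), so every hit still needs to be read in context.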

tl;dr: This has yielded results

The upshot of all of this is that I came away with some real examples of anaphoric one in use after possessives:

  1. a. It's okay. I haven't got any more anyway. My present was better actually.
     b. No, my one was.
  2. a. I had a cardigan on.
     b. What cardigan?
     c. Your one.
  3. she changed the film cos sixteen pictures of it were my ones cos I hadn't used the film

So there has been some pay-off! But I'm looking forward to doing some more re-tagging to address the issues I discuss above and get clearer results. Check back soon.