CapitalizationStandardEnglish clarifications

Brian Schweitzer
Hi all,

This email is prompted by comments made in IRC by nikki and voiceinsideyou, with reference to words that should always be lowercase, and the rewrite of Guess Case I am currently working on.

1st item:
--------------------------------------------------------------------------------
Currently, the line for prepositions reads:

"Short prepositions (three letters or less): as, at, by, for, in, of, on, to -- except when used as adverbs or as an inseparable part of a verb"

This is problematic in two different ways.  First, it gives a definition of words that should always be lowercased: prepositions of *three* characters or less.  However, it then gives a list of words that, apart from "for", are all only *two* characters long.  Second, it does not use any wording to indicate whether the list should be considered a set of examples or definitive.

The Oxford Dictionary, as I pointed out in discussion on that page back in 9/2007, lists "but, cum, mid, off, per, qua, re, up, via" as the only other prepositions of *three* characters or fewer in standard English.  "but" is already covered by rule 2, regarding conjunctions, but the other 8 words are left unclear.

So, the question is, should the list "as, at, by, for, in, of, on, to" be considered definitive, and the language changed to read "Short prepositions (two characters or less):", or should the list be made complete, per the current definition, to read:

"Short prepositions (three letters or less): as, at, by, for, in, of, on, to, but, cum, mid, off, per, qua, re, up, via -- except when used as adverbs or as an inseparable part of a verb"

It's been suggested that "off" can also be used as a non-preposition.  However, I would suggest that the exact same argument applies to its counterpart, "on", which is in the current list of 8 words.  Also, I would suggest that, of the 17 words, the most problematic is also one in the existing list: "to".
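To make the two options concrete, here is a minimal Python sketch. It is not the actual Guess Case code: the names `CURRENT_LIST`, `FULL_LIST` and `lowercase_short_words` are made up for illustration, and keeping the first and last word untouched is an assumption that this thread does not state. A plain word list cannot detect the "except when used as adverbs or as an inseparable part of a verb" clause, so the sketch ignores it.

```python
# Hypothetical sketch, not Guess Case itself.
# The 8 words in the current guideline line.
CURRENT_LIST = {"as", "at", "by", "for", "in", "of", "on", "to"}
# The proposed complete list of 17 words.
FULL_LIST = CURRENT_LIST | {"but", "cum", "mid", "off", "per", "qua", "re", "up", "via"}


def lowercase_short_words(title, short_words):
    """Lowercase listed words, leaving the first and last word untouched.

    Assumption: first and last words keep their capitalization.
    The adverb / phrasal-verb exception is NOT handled here.
    """
    words = title.split()
    last = len(words) - 1
    result = []
    for i, word in enumerate(words):
        if 0 < i < last and word.lower() in short_words:
            result.append(word.lower())
        else:
            result.append(word)
    return " ".join(result)


print(lowercase_short_words("Turn Off the Lights", FULL_LIST))    # Turn off the Lights
print(lowercase_short_words("Turn On the Lights", CURRENT_LIST))  # Turn on the Lights
```

Both outputs lowercase a word that is really part of a phrasal verb, which is the sense in which "on" (already in the list) has the same problem as "off".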

2nd item
--------------------------------------------------------------------------------
While I'm talking about words missing from those lists, I should also mention that "for", "yet", and "so" are missing from the list of English coordinating conjunctions.

3rd item
--------------------------------------------------------------------------------
Point 4: Capitalize contractions and slang consistent with the rules above to the extent that such clearly apply. For example, do not capitalize o' for "of", 'n' or n' for "and".

While I see us using this for 'n', I don't see it for o'.  Do we really want "Ten o'Clock" over "Ten O'Clock"?  I've seen many more edits moving o'foo constructs towards O'Clock than towards o'Clock.  (The current behaviour of Guess Case isn't correct either way; it turns it into "Ten O'clock".)
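As a rough illustration of the two candidate outcomes, here is a small Python sketch. The function `capitalize_o_prefix` and its `capitalize_o` flag are hypothetical names, not anything from Guess Case. It only touches an o' that is directly attached to a following word, and leaves a standalone o' alone.

```python
import re

# o' (either case) at the start of a word, directly followed by letters,
# e.g. "o'clock". A standalone "o'" followed by a space does not match.
O_PREFIX = re.compile(r"\b[oO]'([A-Za-z])([A-Za-z]*)")


def capitalize_o_prefix(title, capitalize_o=True):
    """Capitalize the word attached to o', and optionally the o itself."""
    def fix(match):
        o = "O" if capitalize_o else "o"
        # Uppercase only the first letter after the apostrophe.
        return o + "'" + match.group(1).upper() + match.group(2)
    return O_PREFIX.sub(fix, title)


print(capitalize_o_prefix("Ten O'clock"))                      # Ten O'Clock
print(capitalize_o_prefix("Ten O'clock", capitalize_o=False))  # Ten o'Clock
```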

---------------------------------------------------------------------------------

If at all possible, it would be helpful to reach a decision on this soon, rather than having a long, drawn-out debate as we sometimes do.  That way, when we move the TemplateToolkit code into beta in a month or so for widespread testing, the new Guess Case functions will be relatively locked, and I won't be making last-minute changes to basic routines that will need testing for corner cases.

So, any objections to adding "but, cum, mid, off, per, qua, re, up, via" to the list in rule 2, so it is then comprehensive, and there's no confusion as to what exactly the list means?  How about adding "so, yet, for" to the list of coordinating conjunctions?  These two changes would make those lists complete for English.  Lastly, how about removing " o' " as the preferred capitalization, replacing it with " O' ", and a clarification that the word following o' should also be capitalized, giving "O'Clock"?

Thanks!
Brian

_______________________________________________
Musicbrainz-style mailing list
[hidden email]
http://lists.musicbrainz.org/mailman/listinfo/musicbrainz-style

Re: CapitalizationStandardEnglish clarifications

swisschris
+1

On Thu, Feb 19, 2009 at 10:38 AM, Brian Schweitzer <[hidden email]> wrote:


Re: CapitalizationStandardEnglish clarifications

Nikki-3
In reply to this post by Brian Schweitzer
Brian Schweitzer wrote:
> So, any objections to adding "but, cum, mid, off, per, qua, re, up, via" to
> the list in rule 2, so it is then comprehensive, and there's no confusion as
> to what exactly the list means?  How about adding "so, yet, for" to the list
> of coordinating conjunctions?  These two changes would make those lists
> complete for English.  Lastly, how about removing " o' " as the preferred
> capitalization, replacing it with " O' ", and a clarification that the word
> following o' should also be capitalized, giving "O'Clock"?

I don't agree with adding 'off', 'up', 'so' or 'yet'. Looking at a
selection of titles, 'off' and 'up' mostly occur as part of a phrasal
verb, 'so' and 'yet' mostly occur as an adverb. I would rather just keep
them capitalised than require that people understand English grammar in
enough detail to determine how it should be capitalised. If other people
would rather we make it more complicated, I would still oppose having
guess case lowercase them because they are predominately in the
uppercase category.

I would say o' on its own, O' attached to a noun.

Nikki


Re: CapitalizationStandardEnglish clarifications

Brian Schweitzer
On Thu, Feb 19, 2009 at 5:25 AM, Nikki <[hidden email]> wrote:
> Brian Schweitzer wrote:
>> So, any objections to adding "but, cum, mid, off, per, qua, re, up, via" to
>> the list in rule 2, so it is then comprehensive, and there's no confusion as
>> to what exactly the list means?  How about adding "so, yet, for" to the list
>> of coordinating conjunctions?  These two changes would make those lists
>> complete for English.  Lastly, how about removing " o' " as the preferred
>> capitalization, replacing it with " O' ", and a clarification that the word
>> following o' should also be capitalized, giving "O'Clock"?
>
> I don't agree with adding 'off', 'up', 'so' or 'yet'. Looking at a
> selection of titles, 'off' and 'up' mostly occur as part of a phrasal

Well, yes - the same is also just as true of "on".  I think there are two separate questions for each word:
1) Should Guess Case make the word lowercase by default?
2) Should it be in the list as part of the guideline to English capitalization?

Re #1, I could agree with adding "but, cum, mid, per, qua, re, via" to Guess Case, leaving "off, up" out of Guess Case's lowercased words exceptions list, and removing "on", for the same reasons as we would be excluding off and up.

Re #2, I think the list in the guideline itself should be comprehensive, and should include all of the words.
 
> verb, 'so' and 'yet' mostly occur as an adverb. I would rather just keep

In backwards order, as #1 here is kind of long:

Re #2, I would still add all three to the list in the guideline.

Re #1, I could agree on "yet" being left out of Guess Case, but I think "so" is regular enough that we can figure out fairly simply which way it is being used.

not clearly defined as either in English: "My Code Was Late so I Worked All Night"
adverb: "My Code Was Late and so I Worked All Night"
 
The former is a situation where "so" really is performing as both adverb and conjunction, but, lacking an alternate conjunction, its position is primarily conjunctive, not adverbial.  In the latter, the presence of "and" makes the role of "so" clearly adverbial, as there is another conjunction present.

To be honest, I actually think it would look more correct with the conjunctive "so" capitalized above, but the adverbial "so" made lowercase.  But that would violate both our English guidelines for not capitalizing conjunctions, and for capitalizing adverbs.

I could stand to see "so", in either usage, simply always lowercased.  However, if we were to read the current guideline's example, without "so" in the list, as non-comprehensive, it still seems it would be possible for "so" to be capitalized conditionally by Guess Case, with a low chance of it guessing incorrectly.  There really aren't many ways "so" can follow another conjunction (which is what makes "so" an adverb), given how few conjunctions there are in English, and how few of those are ever actually used with "so":

...but so...  - not used in English
...and so... - used in English
...or so... - used in English
...nor so... - not used in English
...yet so... - not used in English
...for so... - used (rarely) in English
...so so... - not used in English (not to be confused with so-so, where both are adverbial and modify each other)

That gives us a pretty basic pattern to look for: (and|f?or)\sso
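A quick Python sketch of how that pattern behaves. The word boundaries (`\b`) and the case-insensitive flag are additions of mine, not part of the pattern as written above, and `has_conjunction_before_so` is a made-up name. Without the boundaries, the bare pattern would also match inside longer words.

```python
import re

# The pattern from above, wrapped in word boundaries so that text such as
# "Hand Sold" is not mistaken for "and so". Boundaries and IGNORECASE are
# assumptions added for this sketch.
ADVERBIAL_SO = re.compile(r"\b(and|f?or)\sso\b", re.IGNORECASE)


def has_conjunction_before_so(title):
    """True if "so" directly follows "and", "or" or "for"."""
    return ADVERBIAL_SO.search(title) is not None


for title in (
    "It Was Raining and So I Ran",
    "A Mile or So",
    "My Code Was Late so I Worked All Night",
):
    print(title, "->", has_conjunction_before_so(title))
```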

However, that would still leave us with the situations where so appears in a non-ambiguously adverbial manner:
"He went so far away", "I got so wet!", etc.

Unlike the ambiguous case, there are just too many adjectives that "so" could be used with for us to check for them.  However, there is still room for us to catch the majority of cases where "so" is used as a conjunction, yet not "break" those cases where it is used as an adverb: ignore the above pattern match, and instead of matching on just \sso\s, match on ,\sso\s - i.e., match anywhere "so" is immediately preceded by a comma.  The comma does still sometimes get omitted, but correct English suggests a comma there, so most people do use one (as I just did here, actually).  This also has the benefit of being pretty much plug-n-play with the existing new GC rules, without a special case exception to be handled.  That would give us (as GC expected output):

1 This Is a Song for My Mother, so I Named It for Her
2 It Was Raining and So I Ran
3 You Went So Far Away
4 My Code Was Late So I Worked All Night

Cases 1, 2, and 3 are correct; only case 4 would still be incorrect - but it's strained grammar, and not used anywhere near as frequently as case 1, so we'd still be auto-fixing most, if not all, cases.
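The comma rule can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the new GC code: `apply_comma_so_rule` is a hypothetical name, and the sketch touches nothing but the word "so", leaving the rest of the title as it was given.

```python
import re

# Lowercase standalone "so" anywhere in the title.
BARE_SO = re.compile(r"\bso\b")
# "so" (any case) immediately preceded by a comma and one whitespace character.
COMMA_SO = re.compile(r",(\s)so\b", re.IGNORECASE)


def apply_comma_so_rule(title):
    """Capitalize "so" by default; lowercase it only right after a comma."""
    title = BARE_SO.sub("So", title)
    return COMMA_SO.sub(lambda m: "," + m.group(1) + "so", title)


for title in (
    "This Is a Song for My Mother, So I Named It for Her",
    "It Was Raining and so I Ran",
    "You Went so Far Away",
    "My Code Was Late so I Worked All Night",
):
    print(apply_comma_so_rule(title))
```

The last title comes out with "So" capitalized, which is the one remaining miss: a conjunctive "so" with no comma in front of it.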

> them capitalised than require that people understand English grammar in
> enough detail to determine how it should be capitalised. If other people
> would rather we make it more complicated, I would still oppose having
> guess case lowercase them because they are predominately in the
> uppercase category.
>
> I would say o' on its own, O' attached to a noun.

That makes the most sense to me.  Would you agree that it should be O'Clock, or would you think O'clock?

Brian


Re: CapitalizationStandardEnglish clarifications

Brant Gibbard
 
I would dispute several of these, with counterexamples (although they may not be in quite the grammatical sense you meant them):

> ...but so...  - not used in English

I would say it is used a great deal in English

> ...and so... - used in English
> ...or so... - used in English
> ...nor so... - not used in English

rare, and perhaps a bit archaic, but quite grammatical

> ...yet so... - not used in English
> ...for so... - used (rarely) in English
> ...so so... - not used in English (not to be confused with so-so, where both are adverbial and modify each other)



Re: CapitalizationStandardEnglish clarifications

Brant Gibbard
Ick, ick, ick!! I just realized that the second link, which I used because of the words in the page title and the quote at the top, is apparently a link to a site run by a bunch of Jörg Haider-type bigots. I do apologize. It is of course originally a quote from Romeo and Juliet.
 

Brant Gibbard
Toronto, ON
http://bgibbard.ca

 


From: [hidden email] [mailto:[hidden email]] On Behalf Of Brant Gibbard
Sent: February-19-09 10:13 AM
To: 'MusicBrainz style discussion'
Subject: Re: [mb-style] CapitalizationStandardEnglish clarifications

 
