Update: Google has indexed this page since I originally wrote it, but if you click the first two links I still show up as the only search result.

On Language and Possibilities

 

Every day of your life, you say something that has never been said before. Take that sentence I just uttered. Do a search for it in quotes on Google and you'll find out that no one else in the history of the internet has ever composed a phrase exactly like that.

The phrase "every day of your life" shows up 288,000 times on the web and 2,760 times in print, so it's fairly common. Similarly, the phrase "something that has never been said before" shows up 161 times on the web and 11 times in print. But no one recorded in publishing or on the web has linked those two phrases together in precisely the way I just did. Granted, people have expressed that same thought in plenty of different ways, and made similar observations in literature or language courses all over the world, but no one's ever taken the time to express it exactly how I did.

As another exercise, take a random phrase you say in a day, five or six words long, and search for it in quotes on Google. Unless it's a pure cliché, the odds are you'll get only a handful of results, and it should be no trouble at all to come up with an original phrase you've said but no one else has ever recorded.

Conversations are like snowflakes: no two are ever likely to be exactly alike. The reason that's true for snowflakes is that you're talking about over a quintillion water molecules which can all have different configurations, giving you more possibilities than you could ever keep track of. There are a septillion snowflakes a year, but even so it could rain snowflakes from now until the end of the universe and the odds of there ever being an exact duplicate are virtually nil.

Sentences are a little more limited in structure than snowflakes, but the range of possibilities is still baffling. Let's say to save yourself some time you wanted to get an index of every possible sentence in the English language that was around fourteen words or fewer. This may seem a bit long, but keep in mind the last sentence I used was 28 words and this one is 25 words long. If we say there are a million possible words we can use to construct these sentences, that gives us about 10^84 possible sentences. Just so we're clear, there are only about 10^80 atoms in the entire observable universe. Meaning, you couldn't write down every 14-word sentence even if you had all the matter in the universe to write on.

These numbers are all pretty mind-boggling, so I'm going to have to start putting them in context. That last comparison is kind of cheating, since most of us don't get by on a vocabulary of a million words. The Oxford English Dictionary stores 290,000 words with 616,500 forms, omitting slang, technical terms or jargon, and regional words. There are supposedly only 200,000 words in active use. Educated people have vocabularies of around 20,000 words, and use about 2,000 in a week's conversation. So if we go with that 2,000 figure, there are about 10^46 possible 14-word sentences to worry about. Just to put this in context, there are fewer than 10^43 seconds in the entire lifetime of the universe, before every single atom is destroyed by proton decay. Meaning, you could recite sentences at the rate of one per second from now until the universe ceases to exist in any meaningful way and you still wouldn't have nearly enough time.
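If you'd like to check the arithmetic yourself, here is a quick Python sketch. It assumes every ordering of words counts as a "sentence" (no grammar filter), and it takes the atom and second counts straight from the figures quoted above. The function and constant names are made up for illustration.

```python
# Count word sequences, treating every ordering as a "sentence" (no grammar filter).

def sequences_exact(vocab_size: int, length: int) -> int:
    """Number of sequences of exactly `length` words."""
    return vocab_size ** length


def sequences_up_to(vocab_size: int, max_length: int) -> int:
    """Number of sequences of 1 to `max_length` words."""
    return sum(vocab_size ** n for n in range(1, max_length + 1))


ATOMS_IN_UNIVERSE = 10 ** 80   # rough figure quoted in the text
SECONDS_AVAILABLE = 10 ** 43   # rough figure quoted in the text

for vocab in (1_000_000, 2_000):
    exact = sequences_exact(vocab, 14)
    total = sequences_up_to(vocab, 14)
    print(f"vocabulary {vocab:>9,}: {exact:.1e} with exactly 14 words, "
          f"{total:.1e} with at most 14 words")

# More million-word-vocabulary sentences than atoms?
print(sequences_exact(1_000_000, 14) > ATOMS_IN_UNIVERSE)
# More 2,000-word-vocabulary sentences than seconds?
print(sequences_exact(2_000, 14) > SECONDS_AVAILABLE)
```

Counting the shorter sentences too barely moves the totals, because the 14-word term dwarfs all the others put together.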

Now, you can reduce this number somewhat by only considering valid syntax forms, noun verb object, preposition object, etc. But as Yoda taught us, good writing and communication can break all the rules, and there's no saying that some sentence we regard as meaningless at the moment might not be downright profound in another context. So what this all means is that there are more simple sentences than all the matter and time left in the universe. But let's try bringing this all down to a more relatable context.

All the printed material in the WORLD adds up to roughly 5 terabytes, which is five trillion bytes, or simply five trillion characters. All of that space isn't even big enough to hold every possible four-word phrase (assuming a tiny 2,000-word vocabulary, which already gives 16 trillion phrases). Meaning, it's an empirical fact that you can come up with a four-word phrase that has never made it into print. If you consider an educated 20,000-word vocabulary, then all the books and journals ever written couldn't hold every three-word phrase.

Take the internet as another example. It has somewhere under 5 quadrillion bytes of pure text, which is still on the order of a thousand times more information than all the printed material in the world. And even that's not big enough to store every four-word phrase constructible with an "educated" vocabulary. And as one individual person, you're unlikely to be able to read and hear as much as 4 billion words (or 20 billion characters) over your entire life, meaning you don't even have enough time in your life to hear every three-word phrase that's out there.
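The storage comparisons work the same way, and a rough Python sketch makes them easy to replay. It assumes 5 bytes per word (the same ratio as 4 billion words to 20 billion characters) and ignores spaces and separators, which only makes the totals more generous. The names are hypothetical, and the capacities are the figures quoted above.

```python
BYTES_PER_WORD = 5  # assumption: same ratio as 4 billion words ~ 20 billion characters


def bytes_to_store_all_phrases(vocab_size: int, phrase_length: int) -> int:
    """Bytes needed to write out every possible phrase of `phrase_length` words once."""
    phrase_count = vocab_size ** phrase_length
    return phrase_count * phrase_length * BYTES_PER_WORD


PRINTED_BYTES = 5 * 10 ** 12        # all printed material, as quoted in the text
INTERNET_TEXT_BYTES = 5 * 10 ** 15  # pure text on the internet, as quoted in the text
LIFETIME_WORDS = 4 * 10 ** 9        # words one person might read and hear in a lifetime

cases = [
    ("4-word phrases, 2,000 words, vs. print", 2_000, 4, PRINTED_BYTES),
    ("3-word phrases, 20,000 words, vs. print", 20_000, 3, PRINTED_BYTES),
    ("4-word phrases, 20,000 words, vs. internet", 20_000, 4, INTERNET_TEXT_BYTES),
]

for label, vocab, length, capacity in cases:
    needed = bytes_to_store_all_phrases(vocab, length)
    print(f"{label}: need {needed:.1e} bytes, have {capacity:.1e}, "
          f"fits: {needed <= capacity}")

# A stream of N words contains at most N overlapping three-word phrases,
# so a lifetime of listening cannot cover them all if this prints True.
print(20_000 ** 3 > LIFETIME_WORDS)
```

None of the three cases fits, and the lifetime check comes out the same way even with the tiny 2,000-word vocabulary, since 2,000 cubed is 8 billion.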

In other words, you can come up with a one-liner no one in the history of the world has ever written down, a simple phrase no one's taken the trouble to say, or express ideas in a way no one's ever bothered to do. There are more possibilities in the English language and more ideas to communicate than there are people around to say them.

And as a corollary, it also means that plagiarism is really, really easy to detect. If you find a paragraph copied word for word, the odds of someone arriving at that same paragraph by pure chance are slim, on the scale of a meteor burning through your roof one day and landing cleanly on your dinner plate.

But take heart. Every day of your life, you likely have something original to say. There are more possibilities than we can even begin to conceive of. The trick is just sifting through all of them and finding the ones that matter the most.

 

 
