The hard part of the Babel Fish is blocking the drain in the floor.
Google Translate on a phone is 80% of the way there, for pure translation. I have seen people use it this way: hold the phone up and take turns talking while it translates in near real time. To be Babel Fish-magical it would need to be 10x faster and on a device 1000x smaller, and you'd need to be comfortable implanting it in your ear canal, which seems like the actually hard part.
If you'd settle for something that looks like an earbud, I think this is less than 10 years away. Maybe without the feature of auto-detecting the language in real time (though that may actually be easy, I don't know).
And then God will make your blueteeth incompatible and you'll just have a useless lump of copper in your ear.
The box of q-tips says not to put them in your ear, but I rarely use them for anything else.
For true real-time, you're always going to be somewhat limited to awkward translations given linguistic typology. If you're translating to English from a language that tends to put the verb last, either you have to wait until you hear that verb or like Yoda it will sound. I don't know if I would prefer that sort of output over waiting a few seconds so it can arrange the entire sentence appropriately.
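To make that tradeoff concrete, here's a toy sketch in the spirit of the "wait-k" policy from the simultaneous-translation literature: you buffer k source tokens before emitting anything, and unless the buffer covers the whole clause, a verb-final sentence can only come out in source order. The glossary and the German-style sentence are made up, and the lookup table stands in for a real model.

```python
# Toy sketch of the latency/ordering tradeoff in streaming translation.
# A verb-final source sentence (German subordinate-clause order) can't be
# reordered into English until the verb has actually been heard.

GLOSS = {  # made-up word-for-word glosses
    "ich": "I", "den": "the", "apfel": "apple",
    "gegessen": "eaten", "habe": "have",
}

def wait_k_translate(source, k):
    """Start emitting after buffering k source tokens.
    Toy policy: unless the whole sentence fits in the buffer
    (k >= len(source)), we can only gloss words in source order,
    which comes out Yoda-like for verb-final languages."""
    if k >= len(source):
        # Whole sentence buffered: free to reorder into English order.
        return ["I", "have", "eaten", "the", "apple"]
    return [GLOSS[w] for w in source]

sentence = ["ich", "den", "apfel", "gegessen", "habe"]
print(wait_k_translate(sentence, k=2))  # low latency: "I the apple eaten have"
print(wait_k_translate(sentence, k=5))  # full-sentence delay: "I have eaten the apple"
```

With k=2 you get the Yoda order; only waiting for the whole clause gets you normal English, which is exactly the few-seconds delay described above.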
I only want one if it comes with the option of translating things I don't want to hear into a language I don't understand. Then all I'll need are some peril-sensitive sunglasses.
Yeah, I think you can see this emergently in stuff like 90 Day Fiance where they're communicating primarily with Google Translate on a phone.
I would like to award Moby's comment NO TEA.
BTW, the boy just applied at your local university.
The engineering school in Believeland? We have multiple commenters who work there, if he has any questions.
Thanks. He may. He mostly wants a small liberal arts college, but his guidance counselor wanted him to look at a couple of different types and really has a good opinion about that school.
Accents and dialects? My parents basically can't use US automated phone menus, their English is accented.
I'd be curious how well Spanish voice transcription trained on whatever silicon valley thinks is appropriate works with Argentine, emphatic Cuban, or Salvadoran speakers.
I think one of the things I don't enjoy about a language barrier is how difficult it is to gauge whether you're actually connecting with the other person or not. You can have preliminaries, but you can't have a particularly interesting conversation. If I had a little babelfish, it all seems less intimidating and more interesting to actually connect with someone who I wouldn't otherwise be able to communicate well with.
I admire your extroversion. My concern about going to a place where I don't speak the language is getting lost and being unable to ask for directions, getting scammed and not having a clue about it or any recourse, needing anything that can't be expressed with gestures and being unable to get it... the inability to bond on a personal level with the tour guide or concierge is a low priority.
I find 13.last to be overly America-thentric.
It won't fit in your ear, it'll be Bluetooth and run off your phone. It won't be fast enough to actually be seamless, but still could be a game changer in a lot of situations.
13 will be a problem, though they've been getting better at that kind of thing.
I'm trying to study Thai again and part of the app (not Duolingo but a close imitator) is speech recognition that has me say a sentence and displays, in Thai, what it heard me say.
13.2: Apparently Castilian Spanish sounds classy and highly educated to Spanish speakers in the Americas, much like the British illusion.
I was complimented on my Spanish accent once.
17: I'm taking Duolingo French and when I get a question requiring a spoken answer, the app pretends not to understand and ignores me if I say it wrong, just like a real French person.
17.2: Sally could apparently crack her (mostly Dominican or NY Latino) Spanish class up hard by doing a Castilian accent picked up from some Spanish soap opera about a boarding school with Ana de Armas in it. It may sound classy, but that includes hilariously classy.
Is it the "c" as "th" thing that's funny?
17.2 The core of the British illusion in the US seems to be the belief that most British people talk like old fashioned BBC newsreaders, when the country is in fact full of regional dialects which are almost incomprehensible to middle class south easterners. I recall going for a walk with my parents near here when we were overtaken by a couple of guys talking broad Derbyshire. After they were out of earshot my mother asked, "What language were those two speaking?"
I believe the same is true of Spain. I have it on the word of a native son of that kingdom that middle class Madrileños have problems with working class Andalusians, although they are technically speaking the same language. I wonder how far you can deviate from posh metropolitan Castilian and still sound classy to American speakers.
The translation misfires may produce inappropriate emotional reactions that can frustrate speakers in real time. I remember some automated transcription software that was doing closed captioning for the 2019 Hugo Award ceremony, where some speakers were giving very serious speeches while the audience was cracking up loudly about the mistranscription. One I remember was "Lord of the Rings" got mistranscribed as "Bored of the Rings," which is the name of a somewhat infamous parody from the '70s. The convention chair wound up publishing an apology. I've got to imagine that translation mistakes could be even worse in some circumstances, where the audience may not be aware that the software has misfired.
I routinely mix up non-U English accents with Australian, both directions.
the belief that most British people talk like old fashioned BBC newsreaders
Mostly Hollywood, no? Not precisely like newsreaders, but in a compatible register.
22: I think that's most of it, yes.
I admire your extroversion. My concern about going to a place where I don't speak the language is getting lost and being unable to ask for directions, getting scammed and not having a clue about it or any recourse, needing anything that can't be expressed with gestures and being unable to get it... the inability to bond on a personal level with the tour guide or concierge is a low priority.
Now that you mention it, I actually don't like talking to people very much. Maybe I shouldn't wish to remove the language barrier.
I'm skeptical that a babelfish gadget could translate well enough to enable the kind of real connection that heebie seeks, at least for some (non-western?) languages. I've been thinking about translation lately because M and I have been watching a lot of Korean dramas, and the subtitles are so bad. I find myself frequently interrupting to correct the subtitles, but sometimes (often) there's just no adequate one-to-one translation of a word or a sentence, and I end up pausing the show to give a longer explanation of what's being said, or what it means, or why it has the impact that it does. (I'm an annoying person to watch tv with, and this is only one of the reasons.) There's just no easy way to translate the social meaning of spoken language.
On the other hand, I have a friend for whom English is his second language, and I swear, for years after meeting him I thought of him as being basically fluent. He's very charismatic and expressive and hilarious, so it's so easy to get his jokes and get what he's saying. A conversation with him is always a good time. Then I got an email from him and it made no sense at all, and I realized that actually, his English is terrible. It's his intangible communicative skills that make him so easy to connect with, despite his lack of actual language ability. I think you'd lose those intangible aspects if you were to try to communicate with people through an app.
Ok, what if instead of being in person, it's a blog, and instead of living in my ear, it's another website, and all you have to do is cut and paste each comment? I think that resolves everything.
mistranscription
The 4-foot tall duo of Pawnee Christian rappers who embellish Reservation Dogs episodes sell T-shirts with the weird nonword "skoden". That's transcriptionese as it appears on the show's autogenerated captions for "let's go then", which is the way the kids on the show say "let's book" or whatever is the standard English way to say that in a casual register. Lil Mike and Funny Bone are their stage names.
Simultaneous translation is as much of a myth as simultaneous comprehension.
There's just no easy way to translate the social meaning of spoken language.
preach.
Also, let me recommend the hilarious song "Facebook Drama" by Northern Cree on Spotify, maybe elsewhere too?
30. Now do humor or irony
My wife can tell which small village someone comes from in her home county (185 sq mi) by their accent.
I like to learn a few words of the local language when I go somewhere, especially places that are either American- or tourist-infested. Saying 'thank you' in Ladin in the Dolomites is worth it. Or Dari in Kabul.
I'm kind of charmed by the extent to which people I know from Kanton Uri post in Schwyzertütsch on Facebook. If I say the words out loud, I can usually figure out what is going on, but then it's not like Facebook comments are that complicated. The Facebook translation function just can't figure it out at all.
I know I linked this before, but I'm having an earworm, and maybe some of you will enjoy this again. https://www.youtube.com/watch?v=cHdAQL8WElU
31: 'Skoden' isn't just an auto-transcription thing: it's a real word!
https://www.cbc.ca/news/canada/sudbury/skoden-graffiti-water-tower-1.4746575
Yeah, it's a widespread thing in Native circles that's taken off especially with social media.
huh, that'll teach me to write without checking. New word to me, and not transcriptionese then. It's the opposite, a transcription that correctly preserves and records a new word.
As we've been saying for decades.
re: 33
I'm getting rusty, and out of touch, after 20+ years living in England, but at one time I could often place people quite accurately within Scotland based on accent alone, and if you add in dialect and vocabulary, that also helps.
I had experience of simultaneous translation (by translators) at the launch for: fl0rentinec0dex.getty.edu (replace the zeros with the letter o to find the link)
It's a cool project (my team did the digital side of the whole thing).*
They had translators translating from English to Spanish, and from Spanish to English during the launch conference/video call and I was impressed at how good it was given the terminology was often quite complex and academic (for the scholars) and quite technical (me) for the tech side.
The only time anyone complained, though, was when I was talking, because apparently I talk really quickly. Which would be true, as I tend to find academic-type North Americans talk slowly.
*N.B. if you have some aggressive pop-up blocker you might not get in, as there's a GDPR cookie consent banner.
On the site linked above, scroll down to get some canned links to different books and different kinds of content/searches.
"I'm hungry."
"Me too, let-squeet."
++++
"Lunch?"
"Yeah, squeet."
I handle written work by non-native English speakers, and they keep coming up with words that are perfectly good English -- and that I've never heard before. I assume it's a result of Google translate. So, for example: tergiversate, imprescriptible or isonomy.
(And yes, I know you smart people probably all know those words, but they were new to me.)
A colleague of mine is a miserably inept writer who would exhaust each person in the department sequentially, asking them to edit any written work he'd produced. He recently switched to a different department and sent us a really kind email of farewell. I read it and immediately (uncharitably) thought, "AI wrote this," and then my other colleague revealed that he had the same exact response.
43: in some cases it may also be the result of having learned Indian English, which, as any reader of the Times of India will know, has some absolutely splendid bits of vocabulary.
I had such a strong kneejerk reaction to this post that I completely ignored it for two days. Now I'll tell you though!
1. We are not close. We will never be close. Computer people do not actually understand how language works. Machine translation is great at English text, pretty good at some other languages' written forms, pretty good at text-to-speech for a few dialects of a few languages, and nowhere near comprehensible for everything else. And since every dialect of every language changes constantly, and very little of it is included in any of the machine-learning data sets, there's no way it will ever catch up.
Also, actually-simultaneous translation/interpretation is impossible. Translation happens at the phrasal (and bigger) level, not word by word.
2. OMG hearing people (or at least monolingual Americans.. but I repeat myself) are such BABIES. There are 1,000 ways to have interesting interactions with people across language barriers. No wonder everyone is always making the deaf people shoulder 100% of the communication burden.
(Present company excluded, I suppose)
There is a company that is selling glasses that have a little holographic ASL interpreter in the lenses though. I'll try that out as soon as I have several thousand extra dollars lying around. (the interpreters are real people, not machines, who have zero ability to decipher any signed language AFAIK)
They're babies. I'm fine. I rarely pay attention to people regardless.
46 Somehow, google (or whoever) has started just automatically captioning any audio that emanates from my computer. OMG it's really awful. It can't get two sentences in a row right, and usually can't get 5 words in a row right.
BTW, I was at a thing yesterday with the new director of that local art center, and she made a big deal about access, esp. for deaf people. I don't think this was for my benefit -- it seemed to be a matter of genuine conviction.
40- I watch all recorded lectures or training videos at 2x. There are very few people who speak fast enough at 1x that 2x isn't comprehensible.
There's a lot of things ChatGPT doesn't understand, but it really does produce impressively grammatical fluent English. The viewpoint that language was too hard for current technology was pretty convincing 5 years ago, but I find it a lot less convincing now.
48- They talk big but haven't made any actual changes that I know of. They still don't even put accessibility info in individual event announcements, which is the bare minimum simplest thing I've talked to them about at like 4 different meetings. So I'll believe they're actually working towards accessibility when I see something happen.
50- I don't think language is too *hard*, I just think it's a fundamentally different thing than machines can do.
51 Agreed. It's new, and her statements were future-looking. I'm not suggesting that you give them the benefit of the doubt. I think they meant it, and I think when they say they're going to build it into the budget for future events, that's a hopeful sign.
Proof is in the pudding, of course.
They've been saying that for a couple of years now...
It's a new "they" and I think this time is different.
I would love that! Here's to new Theys.
"We are not close. We will never be close. Computer people do not actually understand how language works. Machine translation is great at English text"
These sentences don't seem to be consistent with each other. Machine translation is great at English text, presumably it could be great at other languages' text even if it isn't yet - there's nothing special about English - and machine speech recognition is also very good and getting better, but we will never have accurate speech-to-speech translation?
The point about word by word translation is believable but that doesn't undermine the whole project. And I simply don't believe that dialects are evolving slowly enough that human speakers can keep understanding them but too fast for ML, with access to a far larger corpus, to keep pace.
OMG hearing people (or at least monolingual Americans.. but I repeat myself) are such BABIES. There are 1,000 ways to have interesting interactions with people across language barriers. No wonder everyone is always making the deaf people shoulder 100% of the communication burden.
Oh, piss off.
Modern language models don't translate word by word, they predict the next token in the context of the whole text or the fraction of it that fits in their attention window, whichever is shorter. It's not the same thing, and it's a huge part of the difference between eg 2015's word2vec and everything after Transformer in 2017.
I also don't see a good reason in principle not to have a long output window and predict the next X tokens, which amounts to predicting phrases, and iirc having longer attention and output windows is a fairly active research field as that's also useful for working with big documents, complex iterative prompts, and other stuff AI projects do all the time.
I'd also point out that if computer people don't understand how language works, at least they don't believe either that you need universal grammar and an innate language acquisition device OR universal general intelligence to get cogent text.
... I don't, either?
You all understand that speech and text aren't the same thing, right?
58- English has a special amount of data in the corpus, including far more online text and far more annotated speech, than any other language; it's also the language that the most people have been spending the most time working on.
This is also changing rapidly. If you want a SOTA Arabic model that's open source, it exists and at 7B parameters it's a relative snip, Apple Silicon Macs can run that.
There absolutely definitely are language models for signing! Here's one for Indian sign language that claims 99.75% F1 accuracy:
I think the key thing about simultaneous translation is that there's no reason why it would have to be word for word, which, as already pointed out, won't work in many cases, because that's not how language generally works.
But it doesn't have to be simultaneous word-by-word translation. All that matters is that it's quick enough to facilitate practical interactions between people. Human translators can do this, obviously.
It's absolutely true, though, that when good/adequate quality practical real-time translation becomes available--and I'm sure it will--speakers of languages with lower numbers of speakers, dialect speakers, and (as E.Messily has already pointed out) sign-language users are absolutely going to be at a disadvantage.
re: 66.last
Disadvantaged by scale and by funding, I meant, rather than anything intrinsic to the properties of those languages.
re: 49
That's not the same thing, though. The translator has to keep a certain amount of information in their head while they are "buffering" and getting ready to translate into the other language, and if I'm talking fast and using a fair amount of moderately technical language, they can struggle to keep up. Those self-same people, I'm sure, could do just fine if they were only listening to me, in terms of keeping up since they were all expert speakers of both of the languages involved.
64 et seq, yes, and some large entities have interests in improving non-English models. The EU, frex, and a good number of countries with legal obligations to support multiple languages. (By contrast, interestingly, the PRC holds that there is only one language and one dialect in China, and everyone already speaks it).
Despite which there are of course intra-Chinese translation apps which people have been relying on for years. One can imagine the technology and ideology interacting in ways humorous and dystopian.
65: are you joking? Did you read the abstract of that? Did you read the first two sentences? It is utter, obvious bullshit. I encourage you to explore the data set used, with special attention to whether still images or video was being analyzed, and whether or not the data consisted entirely of alphabet letters and the numerals 1 through 10. Also worth considering: how many people fluent in any sign language were involved? Or who had ever met a deaf person?
Written language changes much more slowly than spoken/signed language. I'm sure we will slowly see more and more accurate text-to-text translation for more and more languages. That doesn't get us any closer at all to simultaneous speech-to-speech interpretation.
The two paragraphs of 71 are responding to different things
"Normal verbal language is much more creative and cultivated than normal verbal language. The artistic spirit of life is given by the hand moment, body, and facial expression." So true.
Anyway as I was saying, computer people don't understand how language works
I also have a real knee-jerk reaction to E. Messily's bit in 46: We will never be close, because that implies that humans and human brains are magic, and I want to completely reject that.
It's absolutely true, though, that when good/adequate quality practical real-time translation becomes available--and I'm sure it will--speakers of languages with lower numbers of speakers, dialect speakers, and (as E.Messily has already pointed out) sign-language users are absolutely going to be at a disadvantage.
I don't mean this in a jerky way, but more than they already are? I mean, the current system is massively structured against anyone who uses a language with a small number of speakers/signers.
re: 75
I just meant specifically in the context of machine-translation.
But sure, they already are in a multitude of other ways, with no disagreement from me at all. So, potentially, made worse off than they already are, I guess, if machine translation and other seq2seq type use cases become more central to various areas of our lives.
74- Why? I mean why would that mean human brains are magic? There's lots of things computers can't do, that humans can. And vice versa.
(I can't tell how my tone is going to come across here but I'm just genuinely curious, not trying to be argumentative)
Technology, obviously, has the potential to enrich the lives of minority* language speakers in various ways. **
But I'm not optimistic in the short term, because, well, that's just not how current capitalism is working out.
* in the sense of languages with fewer speakers, not in the sense of languages with lower importance.
** and I've been involved in that in small ways in my job
And this specifically is a thing that humans can't do- no one can interpret literally simultaneously, and no one knows all the languages.
Although I probably think that we'll never be close to having computers do other things related to language that you think we will be, so maybe that's what you meant.
"Literally" simultaneously is a bit of a red herring, though. Even consecutive is useful, and you know what's a completely normal code switch? Slowing down, leaving out idiosyncratic words, and using a higher/more standard register when you talk to a foreigner!
82 I'm assured that speaking really loud helps too
Also, naming no names, mocking any slips they may make...
I take "things computers can't do" to be a strong claim that those things are algorithmically impossible, and I think physical simulation could be good enough to simulate brains, if not especially plausible, so a claim that there's a thing a brain can do that a computer can't is a claim that brains have some special quality that is impossible to simulate.
I don't know the right kinship term, but my wife's sister's husband's daughter (step-niece?) and her husband are both translators at the EU, and their entire job is simultaneous translation. I don't think anyone is confused about exactly what "simultaneous" means. The fact that it's slightly delayed, or the translation is only provided once the appropriate unit of language has finished (phrase, sentence, whatever)* and the meaning is clear enough to translate doesn't mean that it's not "simultaneous".
One thing I did learn from talking to them, which I didn't know before--although probably should have known if I'd thought about it at all--is that they always translate from another language into their native language, and not the other way round. They have at least three languages in common, but don't translate in the same direction.
* when I studied linguistics, we were provided examples, sometimes slightly artificial, in which the context window to determine meaning was surprisingly large. That is, where the meaning of some sentence or other was determined not just by the words within that sentence but by words in sentences elsewhere and not just in the sense of resolving coreferences.
85- we're starting from very different priors
the context window to determine meaning was surprisingly large. That is, where the meaning of some sentence or other was determined not just by the words within that sentence but by words in sentences elsewhere
In Korean, a lot of the parts of speech that give a statement specific meaning are left unverbalized in spoken conversation. You get them from context, which can include not only context within the conversation, but context far outside the conversation. I suppose it's not impossible that a computer program could factor in all of the relevant extra-conversational context, but it's hard to imagine how. You'd have to give it deep and ongoing access to a lot of information about yourself and your acquaintances, your relationships, your relative ages, etc. etc.
I've found that Google Translate for informal writing (like tweets) is absolutely useless garbage, and that's probably why.
I just used Google Translate to help Pebbles read a graphic novel that had an incantation in Spanish and it was clearly the coolest thing in the world to her. This has only increased her desire for a phone.
74: not magic, just harder than one might expect. Choice of tone, connotation, replacement/explanation of idioms, slang. Context windows are big, and tone is hard.
In Korean, a lot of the parts of speech that give a statement specific meaning are left unverbalized in spoken conversation. You get them from context, which can include not only context within the conversation, but context far outside the conversation. I suppose it's not impossible that a computer program could factor in all of the relevant extra-conversational context, but it's hard to imagine how. You'd have to give it deep and ongoing access to a lot of information about yourself and your acquaintances, your relationships, your relative ages, etc. etc.
This seems like an assertion that you cannot understand a Korean conversation unless you know the people involved pretty intimately, which seems unlikely. If you gave a Korean interpreter a tape of two unknown people talking in Korean, he'd throw up his hands and give up? "I have no idea what this means! I don't know how old they are or who their uncles are!"