I know this is not where I should start this, but I am trying to get some initial feedback before I jump into the middle of a pack of wolves.
My reason for doing this is that far too many quality video productions made in foreign languages have not been "voice-overed" for a secondary market. A notable example is the 4-season production of "Sissy", which is available only in German with English subtitles. I am especially upset that the production was redone with a French voice-over, which I can understand but my wife struggles with. An English version would be so much better. I would even purchase the DVD series if it had the English voice-over, not just the subtitles.
Text of my proposal ...
============================================================
PROPOSAL -- Multi-media -- Subtitle Files -- Enhanced Format
============================================================
2025-07-26 v1.0 Eric Marceau, Ottawa, Canada
========
Purpose:
========
The purpose of this proposal is to open the discussion on how
the subtitle files could be enhanced in such a way as to allow
for additional functionality, such as automated post-processing,
which was not originally conceived as possible using the "*.srt"
vehicle.
==============
Enhancement 1:
==============
Defining a pair of tags (<voice> and <tone>) to uniquely
associate subtitle text with individual movie characters or
actors. The <voice> tag could then be associated with its own
"specification file", namely:
- ${movie_filename}.srt.voices
providing the indexing to the "API" for the voice generation
function using line-item references such as:
1|gen_API_1_M|Jürgen Prochnow|normal
1|gen_API_1_E|Jürgen Prochnow|angry
1|gen_API_1_S|Jürgen Prochnow|whisper
1|gen_API_1_I|Jürgen Prochnow|reflective
2|gen_API_2_M|Stellan Skarsgård|normal
2|gen_API_2_G|Stellan Skarsgård|directive
2|gen_API_2_A|Stellan Skarsgård|yelling
2|gen_API_2_K|Stellan Skarsgård|worried
* * *
or simply
1|gen_API_1_M|Jürgen Prochnow
1|gen_API_1_E|Jürgen Prochnow
1|gen_API_1_S|Jürgen Prochnow
1|gen_API_1_I|Jürgen Prochnow
2|gen_API_2_M|Stellan Skarsgård
2|gen_API_2_G|Stellan Skarsgård
2|gen_API_2_A|Stellan Skarsgård
2|gen_API_2_K|Stellan Skarsgård
* * *
The "original" actors' names are used because that would reduce
the number of "original" voices that need to be cross-referenced
to a secondary language's substitute voice that would have the
same "textural quality" as the original. Cross-referencing of
specific voice-pairs would allow the building of a reusable
library that could be applied to multiple productions.
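As a rough illustration only, the line items of such a ".srt.voices" file could be read as sketched below. The field names (voice_id, api_key, actor, tone) and the function name parse_voices are assumptions made for this sketch; the proposal itself only fixes the pipe-separated layout with an optional fourth field.

```python
from typing import NamedTuple, Optional


class VoiceEntry(NamedTuple):
    voice_id: int          # number used in the <voice N> tag
    api_key: str           # e.g. "gen_API_1_M" (hypothetical generator index)
    actor: str             # original actor's name
    tone: Optional[str]    # present only in the 4-field form


def parse_voices(text: str) -> list[VoiceEntry]:
    """Parse pipe-separated voice line items, 3 or 4 fields per line."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split("|")
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 fields: {raw!r}")
        tone = fields[3] if len(fields) == 4 else None
        entries.append(VoiceEntry(int(fields[0]), fields[1], fields[2], tone))
    return entries
```

Keeping the tone optional lets one reader handle both the long and the short form of the file.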
======================
Ensuing Opportunities:
======================
a) The <voice> tag is offered as the identifier for the character
or actor whose voice is speaking.
b) The <tone> tag is offered as the identifier characterizing the
"flavour" of the voice, the manner in which the words are
spoken. The following tone values are only a suggested subset
of those that could possibly be defined:
- A|yelling
- C|loud
- E|angry
- G|directive
- I|reflective
- K|worried
- M|normal
- O|submissive
- Q|soft
- S|whisper
- U|crying
c) Using the above two tags as embedded directives,
auto-generate language-specific voice-over corresponding
to the subtitle text.
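A small sketch of how the tone letters might tie the two tags to a voice-generation index. It assumes that the last letter of an index such as "gen_API_1_I" is the tone code from the list above, which matches the sample line items but is not stated outright; the function name api_key is hypothetical.

```python
# Tone codes exactly as listed in item (b).
TONE_CODES = {
    "A": "yelling", "C": "loud", "E": "angry", "G": "directive",
    "I": "reflective", "K": "worried", "M": "normal", "O": "submissive",
    "Q": "soft", "S": "whisper", "U": "crying",
}

# Reverse lookup: tone name -> single-letter code.
CODE_FOR_TONE = {name: code for code, name in TONE_CODES.items()}


def api_key(voice_id: int, tone: str = "normal") -> str:
    """Build an index like 'gen_API_1_I' from <voice 1><tone reflective>."""
    if tone not in CODE_FOR_TONE:
        raise KeyError(f"unknown tone: {tone!r}")
    return f"gen_API_{voice_id}_{CODE_FOR_TONE[tone]}"
```

For example, the tags `<voice 1><tone reflective>` would resolve to the index "gen_API_1_I" under this assumption.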
======================
Formatting Proposal A:
======================
1
00:00:42,500 --> 00:00:45,085
<voice 1><tone reflective>WOMAN: Remember, remember
The 5th of November</tone></voice>
2
00:00:45,086 --> 00:00:47,755
<voice 1><tone reflective>The gunpowder treason and plot</tone></voice>
3
00:00:47,922 --> 00:00:50,382
<voice 1><tone reflective>I know of no reason
Why the gunpowder treason</tone></voice>
4
00:00:50,383 --> 00:00:52,301
<voice 1><tone reflective>Should ever be forgot</tone></voice>
======================
Formatting Proposal B:
======================
1
00:00:42,500 --> 00:00:45,085
<voice 1><tone reflective>
WOMAN: Remember, remember
The 5th of November
</tone></voice>
2
00:00:45,086 --> 00:00:47,755
<voice 1><tone reflective>
The gunpowder treason and plot
</tone></voice>
3
00:00:47,922 --> 00:00:50,382
<voice 1><tone reflective>
I know of no reason
Why the gunpowder treason
</tone></voice>
4
00:00:50,383 --> 00:00:52,301
<voice 1><tone reflective>
Should ever be forgot
</tone></voice>
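To show that both formatting proposals carry the same information, here is a minimal sketch of a reader that accepts either layout. The names Cue, to_ms and parse_cues are made up for this example, and it assumes every cue has exactly one <voice> element wrapping one <tone> element, as in the samples above.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    voice: int
    tone: str
    text: str


STAMP = r"(\d{2}:\d{2}:\d{2},\d{3})"
CUE_RE = re.compile(
    r"^(\d+)[ \t]*\n"                                  # cue number
    + STAMP + r" --> " + STAMP + r"[ \t]*\n"           # timing line
    + r"<voice (\d+)><tone (\w+)>(.*?)</tone></voice>",  # tagged text
    re.MULTILINE | re.DOTALL,
)


def to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS,mmm' to milliseconds."""
    hms, ms = stamp.split(",")
    h, m, s = (int(part) for part in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def parse_cues(srt_text: str) -> list[Cue]:
    """Extract cues; line breaks inside the tags are folded into spaces."""
    cues = []
    for match in CUE_RE.finditer(srt_text):
        index, start, end, voice, tone, body = match.groups()
        text = " ".join(body.split())
        cues.append(Cue(int(index), to_ms(start), to_ms(end),
                        int(voice), tone, text))
    return cues
```

Because whitespace inside the tags is normalised, Proposal A and Proposal B parse to identical cues, so the choice between them is mostly about readability for whoever edits the file by hand.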
I was visualizing something like the now-defunct CDDB: a public-access DB+API would allow remote users to "harvest" the subset relevant to a given movie, and a desktop caching DB would preserve previously downloaded entries so that a later movie "conversion" could re-use those cached files.
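To make the caching idea a bit more concrete, here is a minimal sketch of a desktop cache for voice pairs. Everything in it is hypothetical: no such public DB or API exists yet, so the remote lookup is passed in as a plain function (fetch_remote), and the table layout and class name are my own invention.

```python
import sqlite3
from typing import Callable


class VoicePairCache:
    """Local cache of (original actor, target language) -> substitute voice."""

    def __init__(self, path: str, fetch_remote: Callable[[str, str], str]):
        # path may be a file name, or ":memory:" for a throwaway cache
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS voice_pairs ("
            "actor TEXT, language TEXT, substitute TEXT, "
            "PRIMARY KEY (actor, language))"
        )
        self._fetch_remote = fetch_remote  # stand-in for the public API

    def lookup(self, actor: str, language: str) -> str:
        row = self._db.execute(
            "SELECT substitute FROM voice_pairs "
            "WHERE actor = ? AND language = ?",
            (actor, language),
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no remote call needed
        substitute = self._fetch_remote(actor, language)
        self._db.execute(
            "INSERT INTO voice_pairs VALUES (?, ?, ?)",
            (actor, language, substitute),
        )
        self._db.commit()
        return substitute
```

The second movie featuring the same actor would then hit the local table instead of going back to the public service.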
I am sure there would be motivation for something like that in the industry, especially for what North Americans call "Foreign Language Films". I can see non-North-American countries banding together to form such a "strategic initiative" to facilitate the repackaging of their movies for markets other than their home country, thereby increasing the market share and profitability of their own movie industries.
Countries who want to export their movies via the over-dubbing process could have creative control and preserve the quality/character of voices being portrayed by facilitating the voice-selection process themselves to match a given home-country voice with a foreign-language voice.
To clarify, the "system" would be for a production house that has the voice tracks separate from the remaining soundtrack to create a new over-dubbed movie; NOT for on-demand conversion at the time of playing the video.
You could also argue for this from an accessibility point of view; just because something has English subtitles doesn't make it particularly useful for someone who struggles with reading subtitles but still wants to watch a film (e.g. because of vision impairment).
I say this as someone who doesn't like dubbing at all, and would rather have subtitles any day.
To me, that sounds like a stunningly good, or at least very interesting, idea.
Just like CDDB was a community effort. It might just work.
Here is a very good example of exactly that (although not automated, but cast in the classic way).
The Japanese producer/writer/animator has control over the whole process, including the foreign-language (= English) dub, and it shows!
(By the way, both videos above have the same collection of audio and sub tracks. You can switch in the YouTube player between the original Japanese audio and the dubbed American version; you can also enable subs there.)
The quality of the English dub is so good (as in "real") that at moments it can be slightly better, content-wise, than the original Japanese dub. For example, the "popcorn" line in this example was very much approved by some Japanese viewers who watched both versions; although it was not "correctly" translated, it fitted idiomatically and culturally much better than a literal translation.
(By the way, the example videos are published on YouTube by the producer/writer/animator himself. No fees or anything, just for fun.)
For me it is the same: I always prefer the original audio + subs because it "fits" better,
besides the fact that I'm used to it. In my country we hardly do any dubs, only subs.
(The other thing is that most "Hollywood" voice actors sound like a bunch of bickering valley girls.)
The above example is probably one of the very few exceptions. (Don't laugh, I watch both versions because in either language they are just as funny and "authentic". Something that you rarely encounter.)
Maybe because you have grown up with subtitles, you are so used to them that they are no longer distracting.
For me, I am so focused on trying to read (I am a slower-than-average reader because of the way my brain parses what I read) that I don't get to savour or enjoy the scenery/action. I always have trouble with the "ticker-tape" news bar at the bottom of news broadcasts: I never have enough time, so I am forced to read through them at least twice to get it all.
So, for me, subtitles are an absolute last resort, and only if I am really desperate to see the movie!
I know what you mean, I have exactly that with some japanese stuff where the subtitles are passing so ridiculously fast that I have to pause it sometimes to avoid missing the action.
That is a bit of a drawback of expressive languages: Translations happen to be three times as long sometimes to convey the same meaning. It is comparable with idioms based on proverbs.
For instance: when I say "bell and clapper", everyone who grew up with my native language will know what I mean. (It is the shortened version of the proverb "He heard the bell chime but doesn't know where the clapper is".)
These three words can probably only be translated into English as "He talks about something he heard about but actually doesn't know anything about it".
3 words translated into 14 words.
You bet that subs would go crazy fast if the language being translated is rich enough in culture to convey rich meanings with only a few words (like Japanese).
Yes indeed. That is what YouTube is doing with auto-translations but, well, it's not quite there yet.
For translating between "western" languages it works quite well,
but when translating from Chinese or Japanese it can go so completely off the rails
that it ends up in gibberish.
Try the video that I posted: first with English subtitles and then with Japanese subtitles auto-translated to English, and you will see what I mean.
And although I assume that AI is already used, quite a lot gets lost in translation.
But I strongly believe that we'll get there in the end.
By the way, an extra difficulty is that meaning can change based on where you put the stress or accent in a word. For instance: HAshi means chopsticks in Japanese but haSHI means bridge.
The other thing is that stress or accent in western languages is made by increasing volume, but in Japanese and Chinese it is not in volume at all but in pitch. It will be quite a job to tackle that.
Yes, and there is good dubbing and a lot of very bad dubbing. I watched a movie where a female child's voice was dubbed by an old man, quite a distraction! Bad dubbing ruins a movie. I watch a lot of foreign films on Roku/YouTube with subtitles or closed captions that allow you to slow the speed of the subtitles, which makes them easier to read. Sometimes movies are not closed-captioned or they don't offer a language you understand, but that is not often if you understand English, at least in the US. Maybe YouTube offers the home language in the home countries? They do have a country setting.