Expanding embedded functionality of subtitle files

I know this is not where I should start this, but I am trying to get some initial feedback before I jump into the middle of a pack of wolves. :slight_smile:

My reason for doing this is that there are far too many quality video productions in foreign languages that have not been given a voice-over for a secondary market. Notably, the 4-season production of "Sissy", which is available only in German with English subtitles. I am especially frustrated that the production was redone with a French voice-over, which I can understand but my wife struggles with. An English version would be so much better; I would even purchase the DVD series if it had an English voice-over, not just the subtitles.


Text of my proposal ...

============================================================
PROPOSAL -- Multi-media -- Subtitle Files -- Enhanced Format
============================================================

2025-07-26	v1.0	Eric Marceau, Ottawa, Canada


========
Purpose:
========

The purpose of this proposal is to open a discussion on how 
subtitle files could be enhanced to allow additional 
functionality, such as automated post-processing, which was 
not originally conceived as possible using the "*.srt" 
vehicle.


==============
Enhancement 1:
==============

Define a pair of tags (<voice> and <tone>) to uniquely 
associate subtitle text with individual movie characters or 
actors.  The <voice> tag could then be associated with its 
own "specification file", namely:

	- ${movie_filename}.srt.voices

which provides the indexing into the "API" for the 
voice-generation function, using line-item references such as:

	1|gen_API_1_M|JĂŒrgen Prochnow|normal
	1|gen_API_1_E|JĂŒrgen Prochnow|angry
	1|gen_API_1_S|JĂŒrgen Prochnow|whisper
	1|gen_API_1_I|JĂŒrgen Prochnow|reflective
	2|gen_API_2_M|Stellan SkarsgÄrd|normal
	2|gen_API_2_G|Stellan SkarsgÄrd|directive
	2|gen_API_2_A|Stellan SkarsgÄrd|yelling
	2|gen_API_2_K|Stellan SkarsgÄrd|worried
	   * * *

or simply

	1|gen_API_1_M|JĂŒrgen Prochnow
	1|gen_API_1_E|JĂŒrgen Prochnow
	1|gen_API_1_S|JĂŒrgen Prochnow
	1|gen_API_1_I|JĂŒrgen Prochnow
	2|gen_API_2_M|Stellan SkarsgÄrd
	2|gen_API_2_G|Stellan SkarsgÄrd
	2|gen_API_2_A|Stellan SkarsgÄrd
	2|gen_API_2_K|Stellan SkarsgÄrd
	   * * *

The "original" Actor's names are used because that would reduce 
the number of "original" voices that need to be cross-referenced 
to a secondary language's substitute voice that would have the 
same "textural quality" as the original.   Cross-referencing of 
specific voice-pairs would allow the building of a reusable 
library that could be applied to multiple productions.
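
As a rough illustration of how a post-processing tool might 
consume such a specification file, here is a minimal Python 
sketch.  The field names and the parse_voices_file helper are 
my own assumptions; nothing here is part of an existing 
standard.

import sys

def parse_voices_file(path):
    """Read a hypothetical ${movie_filename}.srt.voices file.

    Each line is pipe-delimited:
        voice_index | generator_id | actor_name [| tone_name]
    where the trailing tone field is optional, matching the
    two sample layouts above.
    """
    entries = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = [f.strip() for f in line.split("|")]
            if len(fields) < 3 or not fields[0].isdigit():
                continue  # skip blank, malformed, or placeholder lines
            entries.append({
                "voice": int(fields[0]),
                "generator_id": fields[1],
                "actor": fields[2],
                "tone": fields[3] if len(fields) > 3 else None,
            })
    return entries

if __name__ == "__main__":
    for entry in parse_voices_file(sys.argv[1]):
        print(entry)

Running it against the first sample above would print one 
dictionary per line, e.g. {'voice': 1, 'generator_id': 
'gen_API_1_M', 'actor': 'JĂŒrgen Prochnow', 'tone': 'normal'}.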


======================
Ensuing Opportunities:
======================

a) The <voice> tag is offered as the identifier for the character 
     or actor whose voice is speaking.

b) The <tone> tag is offered as the identifier characterizing the 
     "flavour" of the voice, the manner in which the words are 
     spoken.  The tone values below are only a suggested subset 
     of those that could be defined (a lookup sketch follows 
     item (c) below):
	- A|yelling
	- C|loud
	- E|angry
	- G|directive
	- I|reflective
	- K|worried
	- M|normal
	- O|submissive
	- Q|soft
	- S|whisper
	- U|crying

c) Using the above two tags as embedded directives, 
     auto-generate language-specific voice-over corresponding 
     to the subtitle text.
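
As flagged in item (b), the single-letter tone codes line up 
with the identifiers in the sample specification file of 
Enhancement 1.  A minimal lookup sketch in Python could look 
as follows; the gen_API_<voice>_<code> layout is my reading of 
the sample lines, not an established convention.

# Hypothetical mapping of single-letter tone codes to tone names,
# taken from the suggested subset listed in item (b).
TONE_CODES = {
    "A": "yelling",    "C": "loud",       "E": "angry",
    "G": "directive",  "I": "reflective", "K": "worried",
    "M": "normal",     "O": "submissive", "Q": "soft",
    "S": "whisper",    "U": "crying",
}

def decode_generator_id(generator_id):
    """Split an identifier such as 'gen_API_1_E' into (voice_index, tone_name)."""
    parts = generator_id.split("_")
    return int(parts[2]), TONE_CODES.get(parts[3], "normal")

print(decode_generator_id("gen_API_1_E"))   # -> (1, 'angry')
print(decode_generator_id("gen_API_2_K"))   # -> (2, 'worried')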


======================
Formatting Proposal A:
======================

1
00:00:42,500 --> 00:00:45,085
<voice 1><tone reflective>WOMAN: Remember, remember
The 5th of November</tone></voice>

2
00:00:45,086 --> 00:00:47,755
<voice 1><tone reflective>The gunpowder treason and plot</tone></voice>

3
00:00:47,922 --> 00:00:50,382
<voice 1><tone reflective>I know of no reason
Why the gunpowder treason</tone></voice>

4
00:00:50,383 --> 00:00:52,301
<voice 1><tone reflective>Should ever be forgot</tone></voice>


======================
Formatting Proposal B:
======================

1
00:00:42,500 --> 00:00:45,085
<voice 1><tone reflective>
WOMAN: Remember, remember
The 5th of November
</tone></voice>

2
00:00:45,086 --> 00:00:47,755
<voice 1><tone reflective>
The gunpowder treason and plot
</tone></voice>

3
00:00:47,922 --> 00:00:50,382
<voice 1><tone reflective>
I know of no reason
Why the gunpowder treason
</tone></voice>

4
00:00:50,383 --> 00:00:52,301
<voice 1><tone reflective>
Should ever be forgot
</tone></voice>
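
For completeness, a minimal parsing sketch that handles cues 
in either Proposal A or Proposal B layout could look like the 
Python below.  The regular expression and the parse_tagged_srt 
name are illustrative assumptions; a real implementation would 
need full SRT parsing and error handling.

import re

# One cue: index, timestamps, then the tagged payload; the text may
# sit on the same line as the tags (Proposal A) or on its own lines
# (Proposal B), which re.DOTALL and the \s* gaps both accommodate.
CUE_PATTERN = re.compile(
    r"(?P<index>\d+)\s*\n"
    r"(?P<start>[\d:,]+)\s*-->\s*(?P<end>[\d:,]+)\s*\n"
    r"<voice (?P<voice>\d+)><tone (?P<tone>\w+)>\s*"
    r"(?P<text>.*?)\s*</tone></voice>",
    re.DOTALL,
)

def parse_tagged_srt(text):
    """Yield one dict per cue for either Proposal A or Proposal B layout."""
    for match in CUE_PATTERN.finditer(text):
        yield {
            "index": int(match.group("index")),
            "start": match.group("start"),
            "end": match.group("end"),
            "voice": int(match.group("voice")),
            "tone": match.group("tone"),
            "text": " ".join(match.group("text").split()),
        }

sample = """1
00:00:42,500 --> 00:00:45,085
<voice 1><tone reflective>
WOMAN: Remember, remember
The 5th of November
</tone></voice>
"""

for cue in parse_tagged_srt(sample):
    print(cue)
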
3 Likes

Uhhh
 let me get this straight:

You aim to feed subtitle files to a speech synthesizer with an audio database of thousands of well-known actor voices?

2 Likes

I was visualizing something like the now-defunct CDDB: a public-access DB+API would allow remote users to “harvest” the subset relevant to a given movie, and a desktop caching DB would preserve previously downloaded files locally so that another movie “conversion” could re-use those cached files for the newer task.
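
Purely to illustrate the “desktop cache in front of a shared database” idea, a sketch might look like the Python below; the cache location, base URL, and file layout are all hypothetical, since no such service exists.

from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

CACHE_DIR = Path.home() / ".cache" / "voice-profiles"   # hypothetical location
BASE_URL = "https://example.org/voices"                  # placeholder; no such service exists

def fetch_voice_profile(actor_name):
    """Return a voice profile for an actor, downloading it only on first use."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / (quote(actor_name, safe="") + ".profile")
    if cached.exists():
        return cached.read_bytes()   # re-use what an earlier conversion fetched
    with urlopen(f"{BASE_URL}/{quote(actor_name)}") as response:
        data = response.read()
    cached.write_bytes(data)         # keep it for the next movie's conversion
    return data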

I am sure there would be motivation for something like that in the industry, especially for what North Americans call “Foreign Language Films”. I can see non-North-American countries banding together to form such a “strategic initiative” to facilitate the repackaging of their movies for markets other than their home country, thereby increasing the market share and profitability of their own movie industries.

Countries that want to export their movies via the over-dubbing process could retain creative control and preserve the quality/character of the voices being portrayed by handling the voice-selection process themselves, matching a given home-country voice with a foreign-language voice.

To clarify, the “system” would be for a production house, with the voice tracks kept separate from the rest of the soundtrack, to create a new over-dubbed version of a movie; NOT for on-demand conversion at the time the video is played.

At least, that is my vision! (hallucination? :slight_smile: )

2 Likes

You could also argue for this from an accessibility point of view; just because something has English subtitles doesn't make it particularly useful for someone who struggles with reading subtitles but still wants to watch the film (e.g. vision impairment).

I say this as someone who doesn't like dubbing at all, and would rather have subtitles any day :smiley:

3 Likes

Wow!

To me, that sounds like a stunningly good, or at least very interesting, idea.

Just as CDDB was a community effort, it might just work :slight_smile:

Here is a very good example of exactly that (although not automated, but classically cast).
The Japanese producer/writer/animator has control over the whole process, including the foreign-language (= English) dub, and it shows! :slight_smile:

(By the way, both videos above have the same collection of audio and subtitle tracks. You can switch in the YouTube player between the original Japanese audio and the dubbed American version; you can also enable subs there.)

The quality of the English dub is so good (as in “real”) that at moments it can be slightly better, content-wise, than the original Japanese audio (e.g. the “popcorn” line in this example was very much approved by some Japanese viewers who watched both versions; although it was not “correctly” translated, it fitted idiomatically and culturally much better than a literal translation would have. :slight_smile: )

(By the way, the example videos are published on YouTube by the producer/writer/animator himself. No fees or anything, just for fun :slight_smile: )

For me it is the same: I always prefer the original audio + subs because it “fits” better, besides the fact that I’m used to it. In my country we hardly do any dubs, only subs.

( The other thing is that most “Hollywood” voice actors sound like a bunch of bickering valley girls :face_vomiting: )

The above example is probably one of the very few exceptions. (Don’t laugh, I watch both versions because in either language they are just as funny and “authentic”, something you rarely encounter.)

3 Likes

Maybe because you have grown up with subtitles, you are so used to them that they are no longer distracting.

For me, I am so focused on trying to read (I am a slower-than-average reader because of the way my brain parses what I read) that I don’t get to savour or enjoy the scenery/action. I always have trouble with the “ticker-tape” news bar at the bottom of news broadcasts, never having enough time, so I am forced to read through it at least twice to get it all.

So, for me, subtitles are an absolute last resort, and only if I am really desperate to see the movie!

2 Likes

That is indeed the case, I grew up with it :slight_smile:

I know what you mean; I have exactly that with some Japanese material where the subtitles pass so ridiculously fast that I sometimes have to pause to avoid missing the action. :laughing:

That is a bit of a drawback of expressive languages: translations sometimes end up three times as long to convey the same meaning. It is comparable with idioms based on proverbs.

For instance: when I say “bell and clapper”, everyone who grew up with my native language will know what I mean. ( It is the shortened version of the proverb “He heard the bell chime but doesn’t know where the clapper is” )

These three words can probably only be translated into English as “He talks about something he heard about but actually doesn’t know anything about it”.
Three words translated into fourteen words.

You can bet that subs go crazy fast when the translated language is culturally rich enough to convey rich meanings in only a few words (like Japanese).

2 Likes

That’s where AI assistance, using tools like “Les Atlas sĂ©mantiques”, would go a long way toward ensuring the most context-appropriate translation!

2 Likes

Yes indeed. That is what YouTube is doing with auto-translations, but, well, it’s not quite there yet.

For translating between “western” languages it works quite well, but when translating from Chinese or Japanese it can go so completely off the rails that it ends up as gibberish.

Try the video that I posted: first with English subtitles and then with Japanese subtitles auto-translated to English, and you will see what I mean.
And although I assume that AI is already used, quite a lot gets lost in translation. :laughing:
But I strongly believe that we’ll get there in the end.

By the way, an extra difficulty is that meaning can change based on where you put the stress or accent in a word. For instance, in Japanese hashi can mean chopsticks, but with a different pitch accent hashi means bridge.
The other thing is that stress or accent in western languages is expressed by increasing volume, but in Japanese and Chinese it is carried not by volume but by pitch. It will be quite a job to tackle that.

2 Likes

Yes, and there is good dubbing and a lot of very bad dubbing. I watched a movie where a female child’s voice was dubbed by an old man; quite a distraction! Bad dubbing ruins a movie.

I watch a lot of foreign films on Roku/YouTube with subtitles or closed captions that allow you to slow the speed of the subtitles, which makes them easier to read. Sometimes movies are not closed-captioned, or they don’t offer a language you understand, but that is not often if you understand English, at least in the US. Maybe YouTube offers the home language in the home countries? They do have a country setting.

2 Likes

Only if the uploaded stream contains those specific subtitles.
But if there are any subtitles, YouTube will be able to translate them on the fly.

Even if there are no subtitles, YouTube offers closed captions via speech recognition, which can also be translated on the fly.

Magic time we live in :innocent:

1 Like