How to convert a .txt subtitle file to .srt format?
-
I have a subtitle file that looks like this:

    00:00:44:" Myślę, więc jestem".|Kartezjusz, 1596-1650
    00:01:01:Trzynaste Pietro
    00:01:06:Podobno niewiedza uszczęśliwia.
    00:01:10:Po raz pierwszy w życiu|zgadzam się z tym.
    00:01:13:Wolałbym...
    00:01:15:nigdy nie odkryć|tej straszliwej prawdy.
    00:01:19:Teraz już wiem...

I'm not sure what format this is, but I want to convert the subtitles to .srt. Unfortunately, gnome-subtitles and subtitleeditor can't recognize this kind of format.

gnome-subtitles says:

    Unable to detect the subtitle format. Please check that the file type is supported.

subtitleeditor says:

    Please check that the file contains subtitles in a supported format.

file output:

    UTF-8 Unicode text

Is there a way to convert this file to .srt format?
-
Answer:
This is very similar to @goldilocks' approach but, IMO, simpler. It can deal with empty lines in the file and replaces | with a line break:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my ($time, $text);
    my $i = 0;    ## Counter for the subtitle index

    while (<>) {
        ## Skip bad lines
        next unless /^\s*([:\d]+)\s*:(.+)/;
        my ($next_time, $next_text) = ($1, $2);

        ## I am assuming that the | should be interpreted
        ## as a newline, remove this if I'm wrong.
        $next_text =~ s/\|/\n/g;

        ## Nothing to print while we are on the first subtitle. I could
        ## have tested $. here, but checking $time is safer in case the
        ## file contains an empty line at the beginning.
        if (defined $time) {
            ## Print the previous subtitle; it ends where this one starts
            $i++;
            print "$i\n$time,100 --> $next_time,000\n$text\n\n";
        }

        ## Save the current one for the next line
        ($time, $text) = ($next_time, $next_text);
    }

    ## Print the last subtitle. It will be displayed for a minute
    ## 'cause I'm lazy.
    if (defined $time && $time =~ /(\d+:)(\d+)(:\d+)/) {
        $i++;
        my $newtime = $1 . (sprintf "%02d", $2 + 1) . $3;
        print "$i\n$time,100 --> $newtime,000\n$text\n\n";
    }

Save the script as a file and make it executable, then run:

    ./script.pl subfile > good_subs.srt

The output I got on your sample was:

    1
    00:00:44,100 --> 00:01:01,000
    " Myślę, więc jestem".
    Kartezjusz, 1596-1650

    2
    00:01:01,100 --> 00:01:06,000
    Trzynaste Pietro

    3
    00:01:06,100 --> 00:01:10,000
    Podobno niewiedza uszczęśliwia.

    4
    00:01:10,100 --> 00:01:13,000
    Po raz pierwszy w życiu
    zgadzam się z tym.

    5
    00:01:13,100 --> 00:01:15,000
    Wolałbym...

    6
    00:01:15,100 --> 00:01:19,000
    nigdy nie odkryć
    tej straszliwej prawdy.

    7
    00:01:19,100 --> 00:02:19,000
    Teraz już wiem...
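If the one-minute shortcut for the last subtitle is a concern, the end time can be computed with a proper carry instead of bumping the minutes field, which would yield 60 when the minute is 59. The following is only a minimal sketch, assuming every timestamp is exactly HH:MM:SS; `add_seconds` is a hypothetical helper and not part of the script above:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

## Hypothetical helper (not part of the original script): add a number of
## seconds to an "HH:MM:SS" timestamp, carrying into minutes and hours.
sub add_seconds {
    my ($stamp, $delta) = @_;

    ## Split the timestamp into its fields; die on anything unexpected
    my ($h, $m, $s) = $stamp =~ /^(\d+):(\d\d):(\d\d)$/
        or die "Bad timestamp: $stamp\n";

    ## Do the arithmetic in whole seconds, then split back into fields
    my $total = $h * 3600 + $m * 60 + $s + $delta;
    return sprintf "%02d:%02d:%02d",
        int($total / 3600), int(($total % 3600) / 60), $total % 60;
}

print add_seconds("00:59:30", 60), "\n";    ## prints 01:00:30
```

In the script, a call such as `add_seconds($time, 60)` could stand in for the `sprintf` expression that builds `$newtime`. Note that the helper dies on a timestamp that is not in HH:MM:SS form, whereas the script's own pattern is more permissive.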
Mikhail Morfikov at Unix and Linux
Other answers
What Thorsten meant is something like this:

    #!/usr/bin/perl
    use strict;
    use warnings FATAL => qw(all);

    my $END = '!!ZZ_END';
    my $LastTitleDuration = 5;

    my $count = 1;
    my $line = <STDIN>;
    chomp $line;
    my $next = <STDIN>;

    while ($line) {
        ## At end of input, fake one more line to close the last subtitle
        $next = lastSubtitle($line) if !$next;
        last if !$next;
        chomp $next;

        if ($next !~ m/^\d\d:\d\d:\d\d:.+/) {
            print STDERR "Skipping bad data at line $.:\n$next\n";
            $next = <STDIN>;
            ## redo skips the continue block, so $line is not lost
            redo;
        }

        printf STDOUT "%d\r\n%s,100 --> %s,000\r\n%s\r\n\r\n",
            $count++,
            substr($line, 0, 8),
            substr($next, 0, 8),
            substr($line, 9)
        ;
    } continue {
        $line = $next;
        $next = <STDIN>;
    }

    sub lastSubtitle {
        my $line = shift;
        $line =~ /^(\d\d:\d\d:)(\d\d):(.+)/;
        return 0 if $3 eq $END;
        ## NB: no carry into the minutes, so the seconds can exceed 59
        return sprintf("$1%02d:$END", $2 + $LastTitleDuration);
    }

When I feed your sample data into this, I get:

    1
    00:00:44,100 --> 00:01:01,000
    " Myślę, więc jestem".|Kartezjusz, 1596-1650

    2
    00:01:01,100 --> 00:01:06,000
    Trzynaste Pietro

    3
    00:01:06,100 --> 00:01:10,000
    Podobno niewiedza uszczęśliwia.

    4
    00:01:10,100 --> 00:01:13,000
    Po raz pierwszy w życiu|zgadzam się z tym.

    5
    00:01:13,100 --> 00:01:15,000
    Wolałbym...

    6
    00:01:15,100 --> 00:01:19,000
    nigdy nie odkryć|tej straszliwej prawdy.

    7
    00:01:19,100 --> 00:01:24,000
    Teraz już wiem...

A couple of points:

- The subtitles actually start 1/10th of a second late so they do not overlap, and because I was too lazy to add in some math involving the second timestamp. They then stay on until 1/10th of a second before the next title.
- The last title stays up for $LastTitleDuration (5 seconds).
- I used CRLF line endings as per the http://en.wikipedia.org/wiki/SubRip#SubRip_.28.srt.29_structure_examples although that may not be necessary.
- It presumes the first line of input is not malformed. Beyond that, lines are checked and errors are reported to stderr, so:

      readAlongToSRT.pl < readAlong.txt > whatever.srt

  should create the file but still print errors to the screen.
- Blank or otherwise malformed lines are skipped with a warning.

See terdon's comment below re: the possible significance of | in the subtitle content. You may want to insert

    $line =~ s/\|/\r\n/g;

before the printf STDOUT line. The | has to be escaped, since an unescaped | is the regex alternation operator.

This took me 20 minutes and the only test data I had was those 7 lines, so don't count on it being perfect. If there are ever line breaks in the subtitles, that will cause a problem. I presumed there aren't; if there are, I suggest you remove them from the input first rather than trying to deal with them here.
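As a rough illustration of the | handling mentioned above: a literal pipe has to be written as `\|` in the pattern, because a bare `|` means alternation and matches the empty string instead. This is just a sketch; `split_pipes` and its `$eol` parameter are hypothetical names, not something from either script:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

## Hypothetical helper: turn the "|" separators in a subtitle's text into
## line breaks. $eol is "\n" or "\r\n", depending on the line endings wanted.
sub split_pipes {
    my ($text, $eol) = @_;
    $text =~ s/\|/$eol/g;    ## \| matches a literal pipe character
    return $text;
}

## Prints the sample subtitle on two lines
print split_pipes("Po raz pierwszy w życiu|zgadzam się z tym.", "\n"), "\n";
```

Passing "\r\n" as the second argument keeps the result consistent with the CRLF line endings used by the script above.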
TAFKA 'goldilocks'