How to convert a .txt subtitle file to .srt format?

I have a subtitle file that looks like this:

    00:00:44:" Myślę, więc jestem".|Kartezjusz, 1596-1650
    00:01:01:Trzynaste Pietro
    00:01:06:Podobno niewiedza uszczęśliwia.
    00:01:10:Po raz pierwszy w życiu|zgadzam się z tym.
    00:01:13:Wolałbym...
    00:01:15:nigdy nie odkryć|tej straszliwej prawdy.
    00:01:19:Teraz już wiem...

I'm not sure what format this is, but I want to convert the subtitles to .srt. Unfortunately, gnome-subtitles and subtitleeditor can't recognize this format.

gnome-subtitles says:

    Unable to detect the subtitle format. Please check that the file type is supported.

subtitleeditor says:

    Please check that the file contains subtitles in a supported format.

file output:

    UTF-8 Unicode text

Is there a way to convert this file to .srt format?
Answer:

This is very similar to @goldilocks' approach but, IMO, simpler. It can deal with empty lines in the file and replaces | with a line break:

    #!/usr/bin/env perl
    my ($time, $text, $next_time, $next_text);
    my ($c, $i) = (0, 0);
    while (<>) {
        ## skip bad lines
        next unless /^\s*([:\d]+)\s*:(.+)/;
        ## If this is the first line. I could have used $. but this is
        ## safer in case the file contains an empty line at the beginning.
        if ($c == 0) {
            $time = $1;
            $text = $2;
            $c++;
        }
        else {
            ## This is the counter for the subtitle index
            $i++;
            ## Save the current values
            $next_time = $1;
            $next_text = $2;
            ## I am assuming that the | should be interpreted
            ## as a newline, remove this if I'm wrong.
            $text =~ s/\|/\n/g;
            ## Print the previous subtitle
            print "$i\n$time,100 --> $next_time,000\n$text\n\n";
            ## Save the current one for the next line
            $time = $next_time;
            $text = $next_text;
        }
    }
    ## Print the last subtitle. It will be displayed for a minute
    ## 'cause I'm lazy.
    $i++;
    ## The last subtitle needs the same | to newline conversion
    $text =~ s/\|/\n/g;
    $time =~ /(\d+:)(\d+)(:\d+)/;
    my $newtime = $1 . (sprintf "%02d", $2+1) . $3;
    print "$i\n$time,100 --> $newtime,000\n$text\n\n";

Save the script as a file and make it executable, then run:

    ./script.pl subfile > good_subs.srt

The output I get on your sample was:

    1
    00:00:44,100 --> 00:01:01,000
    " Myślę, więc jestem".
    Kartezjusz, 1596-1650

    2
    00:01:01,100 --> 00:01:06,000
    Trzynaste Pietro

    3
    00:01:06,100 --> 00:01:10,000
    Podobno niewiedza uszczęśliwia.

    4
    00:01:10,100 --> 00:01:13,000
    Po raz pierwszy w życiu
    zgadzam się z tym.

    5
    00:01:13,100 --> 00:01:15,000
    Wolałbym...

    6
    00:01:15,100 --> 00:01:19,000
    nigdy nie odkryć
    tej straszliwej prawdy.

    7
    00:01:19,100 --> 00:02:19,000
    Teraz już wiem...
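The end time of the last subtitle above is built by adding 1 to the minutes field, which gives a minute value of 60 if the final subtitle starts in minute 59. A minimal sketch of one way around that, assuming plain HH:MM:SS timestamps like those in the sample: convert to seconds, add the duration, and format the result back. The helper name add_seconds is hypothetical and is not part of the script above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): shift an
# HH:MM:SS timestamp by a number of seconds, carrying any overflow
# into the minutes and hours fields.
sub add_seconds {
    my ($stamp, $delta) = @_;
    my ($h, $m, $s) = split /:/, $stamp;
    my $total = $h * 3600 + $m * 60 + $s + $delta;
    return sprintf "%02d:%02d:%02d",
        int($total / 3600),        # whole hours
        int($total % 3600 / 60),   # whole minutes left over
        $total % 60;               # seconds left over
}

print add_seconds("00:59:30", 60), "\n";   # prints 01:00:30
```

With a helper like this, the last end time could be computed as add_seconds($time, 60) instead of incrementing the minutes field directly.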

Mikhail Morfikov at Unix and Linux


Other answers

What Thorsten meant is something like this:

    #!/usr/bin/perl

    use strict;
    use warnings FATAL => qw(all);

    my $END = '!!ZZ_END';
    my $LastTitleDuration = 5;
    my $count = 1;

    my $line = <STDIN>;
    chomp $line;
    my $next = <STDIN>;

    while ($line) {
        $next = lastSubtitle($line) if !$next;
        last if !$next;
        chomp $next;

        if (!($next =~ m/^\d\d:\d\d:\d\d:.+/)) {
            print STDERR 'Skipping bad data at line '.($count+1).":\n$line\n";
            $next = <STDIN>;
            next;
        }

        printf STDOUT "%d\r\n%s,100 --> %s,000\r\n%s\r\n\r\n",
            $count++,
            substr($line, 0, 8),
            substr($next, 0, 8),
            substr($line, 9)
        ;
    } continue {
        $line = $next;
        $next = <STDIN>;
    }

    sub lastSubtitle {
        my $line = shift;
        $line =~ /^(\d\d:\d\d:)(\d\d):(.+)/;
        return 0 if $3 eq $END;
        ## %02d keeps the seconds field zero-padded
        return sprintf("$1%02d:$END", $2 + $LastTitleDuration);
    }

When I feed your sample data into this, I get:

    1
    00:00:44,100 --> 00:01:01,000
    " Myślę, więc jestem".|Kartezjusz, 1596-1650

    2
    00:01:01,100 --> 00:01:06,000
    Trzynaste Pietro

    3
    00:01:06,100 --> 00:01:10,000
    Podobno niewiedza uszczęśliwia.

    4
    00:01:10,100 --> 00:01:13,000
    Po raz pierwszy w życiu|zgadzam się z tym.

    5
    00:01:13,100 --> 00:01:15,000
    Wolałbym...

    6
    00:01:15,100 --> 00:01:19,000
    nigdy nie odkryć|tej straszliwej prawdy.

    7
    00:01:19,100 --> 00:01:24,000
    Teraz już wiem...

A couple of points:

- The subtitles actually start 1/10th of a second late so that they do not overlap, and because I was too lazy to add in some math involving the second timestamp. They then stay on until 1/10th of a second before the next title.
- The last title stays up for $LastTitleDuration (5 seconds).
- I used CRLF line endings as per the examples at http://en.wikipedia.org/wiki/SubRip#SubRip_.28.srt.29_structure_examples, although that may not be necessary.
- It presumes the first line of input is not malformed. Beyond that, lines are checked and errors are reported to stderr, so:

        readAlongToSRT.pl < readAlong.txt > whatever.srt

  should create the file but still print errors to the screen.
- Processing will stop at a blank line.

See terdon's comment below re: the possible significance of | in the subtitle content. You may want to insert

    $line =~ s/\|/\r\n/g;

before the printf STDOUT line.

This took me 20 minutes and the only test data I had was those 7 lines, so don't count on it being perfect. If there are ever line breaks in the subtitles, that will cause a problem. I presumed there aren't; if there are, I suggest you remove them from the input first rather than trying to deal with them here.
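One detail matters when substituting the pipe: in a Perl regex an unescaped | is the alternation operator, so a literal pipe has to be written as \|. A minimal sketch of the difference, using a made-up subtitle string rather than data from the question:

```perl
use strict;
use warnings;

my $sample = "first line|second line";   # made-up subtitle text

# Unescaped: | is an alternation between two empty patterns, so the
# substitution fires at every position in the string.
(my $broken = $sample) =~ s/|/-/g;

# Escaped: only the literal pipe is replaced, here with a CRLF.
(my $fixed = $sample) =~ s/\|/\r\n/g;

printf "unescaped: %d characters became %d\n",
    length($sample), length($broken);
print "escaped:\n$fixed\n";
```

The escaped form leaves the rest of the text untouched and produces exactly one line break per pipe.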

TAFKA 'goldilocks'
