SourceForge Q&A: YouTube Subtitle Conversion

Google2SRT

SourceForge’s “Staff Pick” Project of the Month for November is Google2SRT, a conversion tool that allows you to download, save, and convert subtitles from YouTube and Google Video to SubRip (.srt) format. Google2SRT administrator kom shared his thoughts about the project’s history, purpose, and direction.


Tell me about the Google2SRT project, please.

Google2SRT is a tool that downloads XML closed captions (CC, or subtitles) from YouTube or the former Google Video and converts them to SubRip (SRT), a format recognized by most video players.

What made you start this?

Back in 2007, there was a publicly available documentary on the former Google Video platform with non-embedded subtitles in many non-English languages. Some friends and I wanted to download it so we could watch it without the inconveniences of online streaming in our non-wireless house. The documentary’s authors also distributed the video via P2P; however, subtitles were not available there. I attempted to download the documentary from Google Video instead but couldn’t find any subtitles! I searched the Internet, among the dozens of video downloaders available in those days, to see if any of them would help us get the subtitles. Perhaps closed captions were not widely used back then, because I had to give up my search without results. When I investigated a bit, I realized subtitles were transmitted via a simple XML file, which could be easily transformed to SRT. So I downloaded the XML file and wrote a rudimentary Java application to convert it to SRT. That’s how we were able to enjoy that documentary offline and with subtitles!
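To illustrate the kind of XML-to-SRT conversion described above, here is a minimal Java sketch. It is not Google2SRT’s actual code. It assumes a caption file whose `<text>` elements carry `start` and `dur` attributes in seconds, which is one layout such caption files have used; the interview does not specify the format. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: converts an assumed caption XML layout to SubRip text.
class CaptionXmlToSrt {

    // Format a time in milliseconds as an SRT timestamp (HH:MM:SS,mmm).
    static String formatTimestamp(long millis) {
        long hours = millis / 3_600_000;
        long minutes = (millis / 60_000) % 60;
        long seconds = (millis / 1_000) % 60;
        long ms = millis % 1_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", hours, minutes, seconds, ms);
    }

    // Convert caption XML to SRT. Assumes every <text> element has
    // "start" and "dur" attributes expressed in seconds.
    static String convert(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList cues = doc.getElementsByTagName("text");
            StringBuilder srt = new StringBuilder();
            for (int i = 0; i < cues.getLength(); i++) {
                Element cue = (Element) cues.item(i);
                long start = Math.round(Double.parseDouble(cue.getAttribute("start")) * 1000);
                long dur = Math.round(Double.parseDouble(cue.getAttribute("dur")) * 1000);
                // Each SRT cue: sequence number, time range, text, blank line.
                srt.append(i + 1).append('\n')
                   .append(formatTimestamp(start)).append(" --> ")
                   .append(formatTimestamp(start + dur)).append('\n')
                   .append(cue.getTextContent().trim()).append("\n\n");
            }
            return srt.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse caption XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<transcript>"
                + "<text start=\"0.5\" dur=\"2.3\">Hello</text>"
                + "<text start=\"3\" dur=\"1.5\">World</text>"
                + "</transcript>";
        System.out.print(convert(xml));
    }
}
```

The sketch leaves out charset handling, missing attributes, and overlapping cues, all of which a real converter has to deal with.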

Has the original vision been achieved?

Yes, I achieved more than I ever thought possible. On one hand, Google Video and its CC unfortunately never received the attention they deserved and slowly died. On the other hand, in 2008, the omnipresent YouTube service implemented a practically identical protocol and XML format, which opened up room for this application’s audience to grow.

Who can benefit the most from your project?

Anyone who needs to download YouTube video subtitles for later offline use in an alternative video player: for example, people learning languages, people who face a language barrier, or people who are hearing-impaired.


What is the need for this particular subtitle conversion program?

It gives a user what Google received from another user: subtitles in SRT format. If Google allowed you to download these subtitles, Google2SRT would be useless.

What’s the best way to get the most out of using Google2SRT?

The application is quite simple, and so is its design. The latest release, v0.7, supports multiple videos (with multiple subtitles with multiple translations!). The design is so simple that it does not overwhelm the user. The application is documented but, as stated, it really is a simple tool.

What has your project team done to help build and nurture your community?

Basically, we offer new features when Google updates its functionality on YouTube, like translations, ASR, and a multi-lingual interface.

Have you all found that more frequent releases help build up your community of users?

There has never been a very frequent release schedule (some gaps go up to a year and a half!); however, activity has increased recently from users downloading the application, contributors offering translations, suggestions for fixes, or requests for features.

What was the first big thing that happened for your project?

In 2008, the big surprise was that YouTube practically absorbed and inherited the Google Video CC design, which encouraged us to add network support for Google Video and YouTube.

What helped make that happen?

A few years after YouTube replaced Google Video, Google Video vanished as a video sharing and streaming service.

What was the net result of that event?

YouTube got more and more subtitle-related features, like automatic translations and ASR (Automatic Speech Recognition) subtitles. And Google2SRT provided support for this functionality. Actually, nowadays “Google2SRT” could only mean “Google’s format to SRT” through YouTube, its only supported live service.

What is the next big thing for Google2SRT?

There are some user requests to process YouTube playlists and multiple offline XML files. The former can be partially achieved in v0.7 when the playlist’s list of URLs is provided in a text file (obtained from an alternative source). The latter, along with the ability to save XML files without converting them on the fly, is a pending addition for the next release.
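As a rough illustration of the text-file approach mentioned above, the following Java sketch reads a list of video URLs, one per line, and extracts the video identifier from each. It is not taken from Google2SRT. It assumes watch-style URLs that carry the identifier in a `v` query parameter, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pulls video IDs out of a text file of URLs.
class UrlListSketch {

    // Return the value of the "v" query parameter, or null if it is absent.
    static String videoId(String url) {
        String query = URI.create(url.trim()).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("v")) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    // Collect the IDs from all non-empty lines, skipping URLs without one.
    static List<String> videoIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String id = videoId(line);
            if (id != null) {
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java UrlListSketch <file-with-urls>");
            return;
        }
        for (String id : videoIds(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(id);
        }
    }
}
```

Each identifier could then be passed to whatever step fetches and converts the captions for that video.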

How long do you think that will take?

It is hard to say. This is a personal project that is enhanced from time to time in my spare hours.

Do you have the resources you need to make that happen?

We could definitely use more help and are always open to contributions. I would like to thank everybody who has contributed to Google2SRT by translating the application or the website, reporting bugs, or suggesting improvements, and especially JAYZMRT, who provided some valuable information regarding ASR retrieval.

If you had it to do over again, what would you do differently for Google2SRT and why?

Frankly, there are no real regrets, even when bugs are reported. Even the initial command-line release, v0.1, which was only available in Catalan and had fewer than 200 lines of actual code, still does the job of converting Google’s XML to SRT, apart from charset encoding issues and certain bugs that have since been fixed!

Is there anything else we should know? 

It is great news to be recognized as a SourceForge Project of the Month, especially taking into account the high quality, age, and size of some past projects of the month. Again, I would like to thank all the people who contributed to this project in one way or another, even if it was just to let me know that Google2SRT solved one of their small problems.
