HTML5 Audio Karaoke – a JavaScript audio text aligner

June 1, 2012 | Bible Tools, HTML/CSS, JavaScript | 10 Comments

What it Does

Based on some amazing work by my friend Weston Ruter, I’ve put together a little library that mashes together three things:

  1. some text (usually some HTML)
  2. an audio source reading that text (usually an mp3)
  3. a timing file (in this case, generated by CMU Sphinx)

The result is that when you press “play” the words are highlighted as they are read, and you can click on words to navigate through the audio. The magic comes from data produced by the CMU Sphinx library (based on Weston’s work) which creates the word timing information.

I put together two demo versions: one of Martin Luther King, Jr.’s “I Have a Dream” speech, and another of the English Bible using the English Standard Version, which has a great API. Unfortunately, the MLK speech didn’t align very well, so that demo is mostly useful as an example of how dependent the process is on a good alignment.

(note: right now it’s Chrome/Safari/IE9 only since it requires MP3 playback)

How it Works

Although I wanted to use a “standard” format like WebVTT, I also wanted the filesize to be compact since my intended project involved large datasets of 48 hours or more of audio (i.e. the Bible). So here’s the basic JSON format:

{"words":[
 ["in",0.03,0.18],
 ["the",0.18,0.28],
 ["beginning",0.28,0.88],
 ["god",0.88,1.35],
 ["created",1.35,1.93]
]}

Basically, it’s just an array of words, each with a start and end time in seconds. The array-of-arrays format is quite a bit smaller than using JSON objects with named keys, and unlike WebVTT it doesn’t require any parsing beyond reading the JSON (although that might change later). It would take a long time to produce something like this by hand, but Weston used the CMU Sphinx library to generate this data, and it has probably been about 90% accurate across the entire ESV Bible.
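To illustrate how this format can drive the highlighting, here is a minimal sketch of looking up the word being spoken at a given playback time. It assumes the entries are sorted by start time and don’t overlap, as in the sample above; `findWordIndex` is a hypothetical helper name for this example, not necessarily how the library does it.

```javascript
// Timing data in the format shown above: [word, startSeconds, endSeconds].
const timing = {"words":[
  ["in",0.03,0.18],
  ["the",0.18,0.28],
  ["beginning",0.28,0.88],
  ["god",0.88,1.35],
  ["created",1.35,1.93]
]};

// Hypothetical helper: binary search for the word whose [start, end) interval
// contains `time`. Returns the index into `words`, or -1 if no word is being
// spoken at that moment. Assumes `words` is sorted by start time.
function findWordIndex(words, time) {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [, start, end] = words[mid];
    if (time < start) {
      hi = mid - 1;        // the target is earlier in the array
    } else if (time >= end) {
      lo = mid + 1;        // the target is later in the array
    } else {
      return mid;          // start <= time < end
    }
  }
  return -1;
}
```

In a page, something like this could be called with the audio element’s `currentTime` from a `timeupdate` listener. A binary search keeps the lookup cheap even when a chapter has thousands of words.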

Once all the data is loaded, the AudioAligner class searches through a DOM node for the words in the array, skipping over classes or tags you define, and then links those words to the audio player.
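The matching step can be sketched roughly like this. It is a simplified illustration that works on a plain string instead of walking a DOM node, and `normalize` and `alignTokens` are hypothetical names for this example, not the AudioAligner API. Tokens that don’t match the next timed word (a verse number, for instance) are left without timing.

```javascript
// Hypothetical helper: reduce a displayed token to the lowercase,
// punctuation-free form used in the timing data.
function normalize(token) {
  return token.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Hypothetical helper: walk the text tokens in order and pair each one with
// the next timed word when they match. Non-matching tokens get null timing.
function alignTokens(text, words) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const aligned = [];
  let w = 0; // index of the next unmatched timed word
  for (const token of tokens) {
    if (w < words.length && normalize(token) === words[w][0]) {
      const [, start, end] = words[w];
      aligned.push({ token, start, end });
      w++;
    } else {
      aligned.push({ token, start: null, end: null });
    }
  }
  return aligned;
}

const sampleWords = [
  ["in",0.03,0.18],
  ["the",0.18,0.28],
  ["beginning",0.28,0.88],
  ["god",0.88,1.35],
  ["created",1.35,1.93]
];

// The leading "1" stands in for a verse number that should be skipped.
const aligned = alignTokens("1 In the beginning, God created", sampleWords);
```

One limitation of this sketch: if the timing data contains a word that never appears in the text, matching stalls at that point, so a real aligner would need some lookahead to recover.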

Demo

Again, the demo I put together uses the API provided by the creators of the English Standard Version (ESV) of the Bible. The API allows developers to request the text and the MP3, which are then mashed up with the timing files generated with CMU Sphinx.

HTML5 Karaoke Demo

If anyone’s interested in the library, please let me know in the comments and I’ll post it to Github.

10 Responses to “HTML5 Audio Karaoke – a JavaScript audio text aligner”

  1. Ryan says:

    I’m definitely interested and I’m looking to participate to help on some Bible web/app projects

  2. Ryan says:

    btw, this is awesome :)

  3. Winston Fassett says:

    Very cool! Yesterday I was reading up on web audio and ran across an experiment by the author of jPlayer that had some similarities, but it was doing manual audio syncing. I can’t speak to the underlying code, but the demo was fun to fiddle with, particularly using the text to navigate or as a soundboard, and the visualization bit was also nice.

    http://happyworm.com/blog/2010/12/05/drumbeat-demo-html5-audio-text-sync/

  4. Jay says:

    Yes! Please post the code to Github. I can see this being very useful for playing hymns + words – a hymn karaoke, sort of. Do you have any idea if CMU Sphinx works with other languages?

    Thanks.

  5. Alan McCann says:

    Hi:

    Very cool. Would you mind sharing on github?

    Thanks
    Alan

  6. Mark Boas says:

    Am I interested? AM I? This is amazing – I’m all over it. In fact I wanted to do something myself using CMU Sphinx. Please do put it on github – great work and thanks!

  7. RiaanP says:

    I totally hear you on the timing side of things.. phew, we blew a massive amount of money last year on R&D to build this very tool in Flex.. we basically tried to use Flex to analyse the audio graph and “cleverly” plot the words as it heard it in a fashion where you could then “make minor adjustments” to the plotted words on the audio graph.. needless to say, it is a VERY hard thing to get right and we eventually canned it after trying out existing hardware accelerated timing apps. We did end up using it for some client work, but it was so frustrating to work with. You can see it in action here: http://www.readright.co.za/stories/2009/11/jasper-an-outing-to-the-aquarium-read-along/

    So, note to all, this line is gospel: The magic comes from data produced by the CMU Sphinx library (based on Weston’s work) which creates the word timing information.

  8. Roy says:

    Would love to play with this! Did you get a chance to post it to github?

    Thanks!

  9. John Adams says:

    Appreciate your posting. Using Weston’s work, along with what I found in Silvia Pfeiffer’s book (Definitive Guide to HTML5 Video), I have created Read Along Videos…thus far highlighting (word by word) over 500000 words. (Sometimes doing the same book 3 times, as this has been an interesting learning process.)
    I find time codes are best gotten using Premiere and running Speech Analysis. (I remove everything except letters and numbers…no punctuation..none)…This creates an XMP file which can be exported using export option under file…unfortunately for me, I am a terrible javascript guy and could never figure out the time codes given in the file or exactly how the file worked…(it also gives word duration)…Fortunately for me Adobe Soundbooth will export the same audio file(metadata) with the time codes in seconds and hundredths of seconds into an XML file which I then convert to plain text…Unfortunately Adobe no longer sells Soundbooth(found mine on ebay)
    for a demo I am highlighting Voice of America newscasts and other features here:

    http://www.youtube.com/watch?v=gdD64Uc03C0&feature=plcp

  10. Jayant Rimza says:

    I am very much interested in library.


Hi, I'm John Dyer. In my day job, I build websites and create online seminary software for a seminary in Dallas. I also like to release open source tools including a pretty popular HTML5 video player and build tools that help people find best bible commentaries and do bible study. And just for fun, I also wrote a book on the theology of technology and media.
