HTML5 Audio Karaoke – a JavaScript audio text aligner

June 1, 2012 | Bible Tools, HTML/CSS, JavaScript | 23 Comments

What it Does

Based on some amazing work by my friend Weston Ruter, I’ve put together a little library that mashes together

  1. some text (usually some HTML)
  2. an audio source reading that text (usually an mp3)
  3. a timing file (in this case, generated by CMU Sphinx)

The result is that when you press “play” the words are highlighted as they are read, and you can click on words to navigate through the audio. The magic comes from data produced by the CMU Sphinx library (based on Weston’s work) which creates the word timing information.

I put together two demo versions, one of Martin Luther King, Jr.’s I Have A Dream speech and another of the English Bible using the English Standard Version, which has a great API. Unfortunately, the MLK speech didn’t align very well, so that demo isn’t very good other than as an example of how dependent the process is on a good alignment.

(note: right now it’s Chrome/Safari/IE9 only since it requires MP3 playback)

How it Works

Although I wanted to use a “standard” format like WebVTT, I also wanted the filesize to be compact since my intended project involved large datasets of 48 hours or more of audio (i.e. the Bible). So here’s the basic JSON format:


Basically, it’s just an array of words with a start and end time. The array of arrays format is quite a bit smaller than using JSON objects with named keys and doesn’t require any processing like WebVTT (although that might change later). It would take quite a bit of time to produce something like this by hand, but Weston used the CMU Sphinx library to generate this data, and it’s probably been about 90% accurate for the entire ESV Bible.
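To make the size argument concrete, here is a minimal sketch of what an array-of-arrays timing file could look like. The field order (`[word, start, end]`), the unit (seconds), and the sample values are all assumptions for illustration, not the actual data Sphinx produced.

```javascript
// Hypothetical timing data: each entry is [word, startSeconds, endSeconds].
// The words and times below are invented for illustration only.
const timings = [
  ["In", 0.0, 0.21],
  ["the", 0.21, 0.35],
  ["beginning", 0.35, 0.98],
];

// The same data as an array of objects with named keys, for comparison.
const asObjects = timings.map(([word, start, end]) => ({ word, start, end }));

// Compare the serialized length of the two shapes.
const compactSize = JSON.stringify(timings).length;
const verboseSize = JSON.stringify(asObjects).length;

console.log(compactSize, verboseSize);
```

The object form repeats the key names for every word, which is where the extra bytes come from once the file covers many hours of audio.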

Once all the data is loaded, the AudioAligner class searches through a DOM node for the words in the array, skipping over classes or tags you define, and then links those words to the audio player.
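The matching and lookup steps might be sketched roughly like this. This is not the AudioAligner source. The function names (`normalize`, `matchWords`, `findActiveIndex`), the `lookahead` parameter, and the `[word, start, end]` entry shape are all hypothetical, and the DOM walking is replaced by a plain array of text tokens so the logic can run anywhere.

```javascript
// Normalize a word for comparison: lowercase and drop anything that is not
// a-z or 0-9 (an English-only simplification).
function normalize(word) {
  return word.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Pair each timing entry with the next text token that matches it.
// Tokens that never match (verse numbers, headings) are skipped.
// Returns links[i] = index into tokens for timings[i], or -1 if not found.
function matchWords(tokens, timings, lookahead = 5) {
  const links = [];
  let cursor = 0;
  for (const [word] of timings) {
    const target = normalize(word);
    const limit = Math.min(tokens.length, cursor + lookahead);
    let found = -1;
    for (let j = cursor; j < limit; j++) {
      if (normalize(tokens[j]) === target) {
        found = j;
        break;
      }
    }
    links.push(found);
    if (found !== -1) cursor = found + 1;
  }
  return links;
}

// Binary search for the entry whose [start, end) interval contains `time`.
// Returns -1 during silence between words or outside the recording.
function findActiveIndex(timings, time) {
  let lo = 0;
  let hi = timings.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [, start, end] = timings[mid];
    if (time < start) hi = mid - 1;
    else if (time >= end) lo = mid + 1;
    else return mid;
  }
  return -1;
}

// Invented sample data: a verse number followed by five words.
const tokens = "1 In the beginning, God created".split(/\s+/);
const sampleTimings = [
  ["in", 0.0, 0.2],
  ["the", 0.2, 0.35],
  ["beginning", 0.35, 0.98],
  ["god", 1.1, 1.4],
  ["created", 1.4, 1.9],
];
const links = matchWords(tokens, sampleTimings);
const active = findActiveIndex(sampleTimings, 0.5);
console.log(links, active);
```

In a browser, the lookup would presumably be driven by the audio element’s `timeupdate` event and its `currentTime` property, and clicking a word would set `currentTime` to that word’s start time.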


Again, the demo I put together utilizes the API provided by the creators of the English Standard Version (ESV) of the Bible. The API allows developers to request the text and the MP3, and then this is mashed up with the timing files generated with CMU Sphinx.

HTML5 Karaoke Demo

If anyone’s interested in the library, please let me know in the comments and I’ll post it to Github.

23 responses to “HTML5 Audio Karaoke – a JavaScript audio text aligner”

  1. Ryan says:

    I’m definitely interested and I’m looking to participate to help on some Bible web/app projects

  2. Ryan says:

    btw, this is awesome 🙂

  3. Winston Fassett says:

    Very cool! Yesterday I was reading up on web audio and ran across an experiment by the author of jPlayer that had some similarities, but it was doing manual audio syncing. I can’t speak to the underlying code, but the demo was fun to fiddle with, particularly using the text to navigate or as a soundboard, and the visualization bit was also nice.

  4. Jay says:

    Yes! Please post the code to Github. I can see this being very useful for playing hymns + words – a hymn karaoke, sort of. Do you have any idea if CMU Sphinx works on other languages?


  5. Alan McCann says:


    Very cool. Would you mind sharing on github?


  6. Mark Boas says:

    Am I interested? AM I? This is amazing – I’m all over it. In fact I wanted to do something myself using CMU Sphinx. Please do put it on github – great work and thanks!

  7. RiaanP says:

    I totally hear you on the timing side of things.. phew, we blew a massive amount of money last year on R&D to build this very tool in Flex.. we basically tried to use Flex to analyse the audio graph and “cleverly” plot the words as it heard it in a fashion where you could then “make minor adjustments” to the plotted words on the audio graph.. needless to say, it is a VERY hard thing to get right and we eventually canned it after trying out existing hardware accelerated timing apps. We did end up using it for some client work, but it was so frustrating to work with. You can see it in action here:

    So, note to all, this line is gospel: The magic comes from data produced by the CMU Sphinx library (based on Weston’s work) which creates the word timing information.

  8. Roy says:

    Would love to play with this! Did you get a chance to post it to github?


  9. John Adams says:

    Appreciate your posting. Using Weston’s work, along with what I found in Silvia Pfeiffer’s book (Definitive Guide to HTML5 Video), I have created Read Along Videos, thus far highlighting (word by word) over 500000 words. (Sometimes doing the same book 3 times, as this has been an interesting learning process.)
    I find time codes are best gotten using Premiere and running Speech Analysis. (I remove everything except letters and numbers…no punctuation..none)…This creates an XMP file which can be exported using export option under file…unfortunately for me, I am a terrible javascript guy and could never figure out the time codes given in the file or exactly how the file worked…(it also gives word duration)…Fortunately for me Adobe Soundbooth will export the same audio file(metadata) with the time codes in seconds and hundredths of seconds into an XML file which I then convert to plain text…Unfortunately Adobe no longer sells Soundbooth(found mine on ebay)
    for a demo I am highlighting Voice of America newscasts and other features here:

  10. Jayant Rimza says:

    I am very much interested in library.

  11. I’m pretty sure I looked at your website several years ago. Thanks for showing Karaoke Text To Speech can work on the PC.

    I went ahead and hooked up ESPEAK, FESTIVAL, and then SAPI TTS voices on my server at home and, through my portal, I can do Karaoke Text To Speech on most webpages.

    Double click on the first word in a sentence (that is not a link). From then on the sentence is selected and a sound bar pops up. Wait until “Karaoke ON” happens in the popup and the sound bar lightens up a bit, then hit start. The speech will be spoken and the spoken word selected.

    Doesn’t work on ie11.

    is the webportal I refer to.

    Thanks for showing the effect can happen.

    I use it for several foreign languages as well as English.

  12. Amazing.. i love it.. its awesome.. Visit my web :

  13. Gary kuper says:

    I am very interested in your library and would love to work with the source code for a project of mine. Will you be making the source code available to the public? I look forward to hearing from you and learning more about the possibilities of this tool. Thank you. Sincerely, Gary.

  14. Daniel Moore says:

    I am interested in how this works and possibly improving on it.

  15. Aril says:

    Hi Irene! When you seal your wood canvas with Gesso you have to allow it to dry completely. I usually do multiple ones; that way they are ready to go when I’m ready to paint on them. I usually do at least 2 layers of Gesso on all sides including the back (3 layers is ideal) for longevity. Once the Gesso is dry, then you can sketch on it. I sketch on Vellum or Tracing paper first so all the erasing and corrections are done on that and not on the wood. Once I’m happy with the drawing then I transfer it using another piece of Vellum that I have rubbed completely with a 4B pencil. You can purchase transfer paper, but I just make it myself. By transferring the drawing, you keep the Gesso clean and then begin to paint. If you do not want to deal with transferring, you can always add a layer of clear coat to the pencil sketch that is on the wood, let it dry and then start painting. The lead will be sealed and will not mix with your colors. I hope that helps. You’ve given me a nice idea for a blog post and I’ll be explaining it using pictures. Take care, Maggie

  16. Jose Eduardo says:

    Hello, I’m very interested in your project. Can you make it available to me? I would like to use it in a project of mine.

  17. I am very interested in your library, how do i get it from github?

  18. Vick says:

    Amen brother, please send me the link on github with the library…thanks a bunch, Vick

  19. Raúl says:

    Please brother send me the github link its incredible.

  20. Pooja Donekal says:

    This is very helpful for one of the projects I am working on. Can you please provide me with the link?

  21. trying says:

    Here’s the Github link –

  22. Gregory Werking says:

    I would like to develop a website where users can upload their audio prayers to be listened to by later users as an online group prayer session. The text and audio of the prayers scroll across the screen as multiple international users enunciate the words of the prayer at the same time based on the cadence provided by the karaoke api. I need to develop an algorithm that will automatically determine if the upload is not malicious, accurate, and safe for a holy website. What are your thoughts?


  23. Fropt says:

    Hi, I’m interested in your solution. I teach music and I’m looking for tools that help me out with getting better results.
    Could you share your code please?

Hi, I'm John Dyer. In my day job, I build websites and create online seminary software for a seminary in Dallas. I also like to release open source tools including a pretty popular HTML5 video player and build tools that help people find best bible commentaries and do bible study. And just for fun, I also wrote a book on the theology of technology and media.
