Efficient transcription of audio files has been one of the major “missing links” in modern NLP — till now. Enter Hugging Face’s implementation of Facebook’s Wav2Vec2 model, which produces impressive out-of-the-box results.

Poet Amanda Gorman delivering the inauguration poem on Jan 20, 2021. Screen-capture via PBS NewsHour’s YouTube clip.

AI can’t write great poetry on its own (yet?). But it can now transcribe a poetry recital really well, if results from the Wav2Vec2 transformer model are anything to go by.

My trials using audio clips ranging in length from 62s to 12.5 minutes, including the evocative Inaugural Poem by youth poet Amanda Gorman, turned up pretty impressive results.

Efficient audio-to-text transcription has been one of the “missing links” in the modern Natural Language Processing (NLP) toolkit. Not anymore, it seems, thanks to Hugging Face’s implementation of the Wav2Vec2 model by Facebook.

What’s exciting about this is that it opens…
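For readers who want to try this kind of transcription, here is a minimal sketch. It is not the code from the post: the checkpoint name `facebook/wav2vec2-base-960h`, the use of `librosa` to load audio, and the fixed 30-second chunking for longer clips are all assumptions made for illustration, and `split_samples` and `transcribe` are hypothetical helper names.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # the assumed checkpoint expects 16 kHz audio


def split_samples(
    samples: Sequence[float],
    chunk_seconds: float = 30.0,
    sample_rate: int = SAMPLE_RATE,
) -> List[Sequence[float]]:
    """Cut a 1-D sequence of audio samples into fixed-length chunks."""
    size = int(chunk_seconds * sample_rate)
    if size <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def transcribe(path: str, model_name: str = "facebook/wav2vec2-base-960h") -> str:
    """Transcribe an audio file chunk by chunk (assumed workflow, see lead-in)."""
    # Third-party imports stay inside the function so the chunking helper
    # above can be used without these packages installed.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    # librosa resamples to the requested rate and returns mono audio.
    samples, _ = librosa.load(path, sr=SAMPLE_RATE)

    pieces = []
    for chunk in split_samples(samples):
        inputs = processor(chunk, sampling_rate=SAMPLE_RATE, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits
        # Greedy CTC decoding: pick the most likely token at each time step.
        predicted_ids = torch.argmax(logits, dim=-1)
        pieces.append(processor.batch_decode(predicted_ids)[0])
    return " ".join(pieces)
```

Calling `transcribe("speech.wav")` (a hypothetical file name) would return a single string. Cutting at fixed intervals can split a word across two chunks, and a very short final chunk may be too short for the model, so splitting on pauses is probably a better choice for real use.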


Small batch machine translation of speeches and news articles (English to Chinese/Tamil, and vice versa) in under 30 lines of code, using Hugging Face’s version of MarianMT and Facebook’s Fairseq.

Illustration: Chua Chin Hon

*UPDATED Dec 30, 2020*:

Facebook recently released its machine translation models for English to Tamil (and vice versa), and I was eager to give them a try, since Tamil is among the most under-served languages in machine learning and related language pairs are pretty hard to come by.

The new notebooks and toy datasets are in the repo. Or, go here for the demo for English-to-Tamil translation of speeches and news articles, and here for Tamil-to-English translation of the same type of material.

There are obvious problems with the quality of the translation in some parts. But machine…


Forecasters correctly predicted that Joe Biden would win, but vastly underestimated the strength of support for Donald Trump. A quick post-mortem suggests that the polls and forecasts became too bullish after August 31.


Despite the political drama in the home stretch of the US presidential election, two months-long forecasts have held surprisingly steady in projecting a clear win for Joe Biden on Nov 3.

Joe Biden and Donald Trump at their final presidential debate on Oct 23. Screen-grab via C-Span.

On the surface, the 2020 US Presidential election seems like a wild roller-coaster ride, with each surprising twist and turn of events inducing both panic and dread among voters and observers alike.

In comparison, the polls and forecasts in the run-up to the vote on November 3 have kept an almost Zen-like calm. Two months-long forecasts by FiveThirtyEight and weekly magazine The Economist point to an unambiguous win for challenger Joe Biden despite widespread fears of a contested election.

Meanwhile, White House incumbent Donald Trump has seen his chances of re-election decline steadily in the forecasts despite talk of…


Trump’s down in the polls, and forecasts by experts point to a decisive loss on November 3. He doesn’t even seem to be doing as well on Twitter as he did in 2016. Is it game over for the White House incumbent?


With about two weeks to go before the 2020 US Presidential Election, the statistics on multiple fronts are looking rather grim for White House incumbent Donald Trump.

Daily forecasts from data analysis outfit FiveThirtyEight and weekly magazine The Economist point to a resounding defeat for him on November 3. Trump even appears to be underperforming on Twitter upon closer examination of the metrics.

Is it game over for Trump? As FiveThirtyEight’s editor-in-chief Nate Silver has pointed out on numerous occasions, having a low chance is not the same as having no chance of winning. …


With all eyes on the outcome of the Nov 3 vote, which metric, poll or forecaster can accurately predict the outcome of a highly volatile race for the White House? Probably none in isolation, but an aggregate of reputable forecasts might help.

Screen-cap of first 2020 US Presidential Debate on September 29: C-Span’s YouTube live-feed.

Note to readers: The forecasts in this post were completed just as news broke that Donald Trump had tested positive for Covid-19. The impact of this major development won’t be clear for a while, and I’ll update the forecasts as things become clearer.

With about a month to go before the 2020 United States Presidential Election on November 3, all eyes are on the barrage of polls and forecasts for the highly volatile race for the White House. …


A practical use case on fine-tuning a DistilBERT model on a custom dataset, and testing its performance against more commonly used models like Logistic Regression and XGBoost.

Illustration of web app by: Chua Chin Hon

With the 2020 US election around the corner, concerns about electoral interference by state actors via social media and other online means are back in the spotlight in a big way.

Twitter was a major platform that Russia used to interfere with the 2016 US election, and few have doubts that Moscow, Beijing and others will turn to the platform yet again with new disinformation campaigns.

This post gives a broad overview of how you can build a state troll tweets detector by fine-tuning a transformer model (DistilBERT) on a custom dataset. …


AI text generation is one of the most exciting fields in NLP, but also a daunting one for beginners. This post aims to speed up the learning process for newcomers by combining and adapting several existing tutorials into a practical end-to-end walkthrough with notebooks and sample data for a conversational chatbot that can be used in an interactive app.

Singlish phrases such as “blur as a sotong” can be bewildering for non-Singaporeans. Can a transformer model make sense of it? Photo: Chua Chin Hon

Auto-text generation is undoubtedly one of the most exciting fields in NLP in recent years. But it’s also an area that’s relatively difficult for newcomers to navigate, due to the high bar for technical knowledge and resource requirements.

While there’s no shortage of helpful notebooks and tutorials out there, pulling the various threads together can be time-consuming. To help speed up the learning process for fellow newcomers, I’ve put together a simple end-to-end project to create an AI conversational chatbot that you can run in an interactive app.

I chose to frame the text generation project around a…


Compared to sentiment analysis or classification, text summarisation is a far less ubiquitous NLP task due to the time and resources needed to execute it well. Hugging Face’s transformers pipeline has changed that. Here’s a quick demo of how you can summarise short and long speeches easily.

Screen grabs from PAP.org.sg (left) and WP.sg (right).

Summarising a speech is more art than science, some might argue. But recent advances in NLP could well test the validity of that argument.

In particular, Hugging Face’s (HF) transformers summarisation pipeline has made the task easier, faster and more efficient to execute. Admittedly, there’s still a hit-and-miss quality to current results. But there are also flashes of brilliance that hint at the possibilities to come as language models become more sophisticated.

This post will demonstrate how you can easily use HF’s pipeline to summarise both short and long speeches. A minor work-around is needed for long speeches due to…
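The post’s own work-around for long speeches is not shown in this excerpt. One common approach, sketched below purely as an illustration, is to split the speech into chunks of whole sentences, summarise each chunk, and join the partial summaries. The helper names (`split_into_chunks`, `summarise_long_text`, `hf_summariser`), the 400-word chunk size and the `max_length`/`min_length` values are assumptions, not the author’s settings.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if n_words == 0:
            continue
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarise_long_text(
    text: str,
    summarise_chunk: Callable[[str], str],
    max_words: int = 400,
) -> str:
    """Summarise each chunk separately and join the partial summaries."""
    chunks = split_into_chunks(text, max_words)
    return " ".join(summarise_chunk(chunk) for chunk in chunks)


def hf_summariser() -> Callable[[str], str]:
    """Wrap Hugging Face's summarisation pipeline as a chunk summariser."""
    # Imported lazily so the chunking helpers work without transformers.
    from transformers import pipeline

    summariser = pipeline("summarization")

    def run(chunk: str) -> str:
        result = summariser(chunk, max_length=120, min_length=30, do_sample=False)
        return result[0]["summary_text"]

    return run
```

With `transformers` installed, `summarise_long_text(speech_text, hf_summariser())` would produce one stitched summary. Because each chunk is summarised in isolation, the result can read unevenly from one chunk to the next.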


Results from Singapore’s 2020 vote turned just about every pre-election prediction on its head. What happened? Here’s my post-mortem of the silent earthquake that rocked Singapore politics on July 10.

Elderly voters queuing up to vote on July 10, 2020, in an election that was shaped by Covid-19 in form, but not necessarily in substance. Photo: Chua Chin Hon

General Election (GE) 2020 has been widely dubbed the “Covid-19 election”. But results from the July 10 vote and my post-mortem suggest that the pandemic mostly shaped the election in form but not in substance.

Sure, voters had to don masks and endure snaking queues due to Covid-19 related precautions. Fears of new clusters of infection also prompted the authorities to ban outdoor rallies and curtail traditional retail politics.

But…

Chua Chin Hon

Data Science | Media | Politics
