Visual/data analysis of 271 people in Singapore who tested positive for Covid-19 despite having one or two shots of the vaccine. This is not a medical study.

Breakdown of 271 breakthrough Covid-19 cases in Singapore by gender, symptomatic and vaccination status. Interactive Flourish chart available here.

The Covid-19 pandemic has confounded medical experts and policy makers alike since it began in 2020. More than a year on, they are grappling with a new mystery involving “breakthrough cases”: people who caught Covid-19 despite being vaccinated.

No vaccine is 100% effective so it’s not a surprise that some vaccinated individuals would still test positive. But it remains unclear how these breakthrough infections occur, whether demographic or environmental factors play a bigger role, and how soon a booster shot is needed in the face of new and more contagious Covid-19 variants.

This post takes a look at…


Going straight from audio to a range of text-based NLP tasks such as translation, summarisation and sentiment analysis

Chart: Chua Chin Hon

The addition of the Wav2Vec2 model in Hugging Face’s transformers library has been one of the more exciting developments in NLP in recent months. Until then, it wasn’t easy to execute tasks like machine translation or sentiment analysis if you only had a long audio clip to work with.

But now you can link up an interesting combination of NLP tasks in one go: transcribe the audio clip with Wav2Vec2, and then use a variety of transformer models to summarize or translate the transcript. …
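The chaining idea described above (transcribe first, then hand the transcript to other text models) could be sketched roughly as follows. This is a minimal illustration rather than the post's actual code: the function names `run_audio_nlp_chain` and `build_hf_stages`, the file name `speech.wav`, the choice of the `facebook/wav2vec2-base-960h` checkpoint, the default summarization model, the English-to-French translation task and the lower-casing step are all assumptions made for the example. The Hugging Face stages also assume that `transformers`, a backend such as PyTorch, and ffmpeg (for reading audio files) are installed.

```python
from typing import Callable, Dict


def run_audio_nlp_chain(
    audio_path: str,
    transcribe: Callable[[str], str],
    tasks: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Transcribe one audio file, then run every text task on the transcript."""
    transcript = transcribe(audio_path)
    results = {"transcript": transcript}
    for name, task in tasks.items():
        results[name] = task(transcript)
    return results


def build_hf_stages():
    """Build the stages from Hugging Face pipelines (models download on first use)."""
    # Imported lazily so the chaining logic above works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
    summarizer = pipeline("summarization")
    translator = pipeline("translation_en_to_fr")

    def transcribe(path: str) -> str:
        # This checkpoint emits upper-case text without punctuation;
        # lower-casing is an assumed clean-up step for the downstream models.
        return asr(path)["text"].lower()

    tasks = {
        "summary": lambda text: summarizer(text)[0]["summary_text"],
        "translation": lambda text: translator(text)[0]["translation_text"],
    }
    return transcribe, tasks


# Hypothetical usage ("speech.wav" is a placeholder file name):
#   transcribe, tasks = build_hf_stages()
#   results = run_audio_nlp_chain("speech.wav", transcribe, tasks)
#   print(results["summary"])
```

Keeping the chaining function separate from the model loading means any stage can be swapped out, for example a different transcription model or a sentiment classifier, without touching the rest.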


Efficient transcription of audio files has been one of the major “missing links” in modern NLP — till now. Enter Hugging Face’s implementation of Facebook’s Wav2Vec2 model, which produces impressive out-of-the-box results.

Poet Amanda Gorman delivering the inauguration poem on Jan 20, 2021. Screen-capture via PBS NewsHour’s YouTube clip.

AI can’t write great poetry on its own (yet?). But it can now transcribe a poetry recital really well, if results from the Wav2Vec2 transformer model are anything to go by.

My trials using audio clips ranging in length from 62s to 12.5 minutes, including the evocative Inaugural Poem by youth poet Amanda Gorman, turned up pretty impressive results.

Efficient audio-to-text transcription has been one of the “missing links” in the modern Natural Language Processing (NLP) toolkit. Not anymore it seems, thanks to Hugging Face’s implementation of the Wav2Vec2 model by Facebook.

What’s exciting about this is that it opens…


Forecasters correctly predicted that Joe Biden would win, but vastly underestimated the strength of support for Donald Trump. A quick post-mortem suggests that the polls and forecasts became too bullish after August 31.


Despite the political drama in the home stretch of the US presidential election, two months-long forecasts have held surprisingly steady in projecting a clear win for Joe Biden on Nov 3.

Joe Biden and Donald Trump at their final presidential debate on Oct 23. Screen-grab via C-Span.

On the surface, the 2020 US Presidential election seems like a wild roller-coaster ride, with each surprising twist and turn of events inducing both panic and dread among voters and observers alike.

In comparison, the polls and forecasts in the run-up to the vote on November 3 have kept an almost Zen-like calm. Two months-long forecasts by FiveThirtyEight and weekly magazine The Economist point to an unambiguous win for challenger Joe Biden despite widespread fears of a contested election.

Meanwhile, White House incumbent Donald Trump has seen his chances of re-election decline steadily in the forecasts despite talk of…


Trump’s down in the polls, and forecasts by experts point to a decisive loss on November 3. He doesn’t even seem to be doing as well on Twitter as he did in 2016. Is it game over for the White House incumbent?

With about two weeks to go before the 2020 US Presidential Election, the statistics on multiple fronts are looking rather grim for White House incumbent Donald Trump.

Daily forecasts from data analysis outfit FiveThirtyEight and weekly magazine The Economist point to a resounding defeat for him on November 3. Trump even appears to be underperforming on Twitter upon closer examination of the metrics.

Is it game over for Trump? As FiveThirtyEight’s editor-in-chief Nate Silver has pointed out on numerous occasions, having a low chance is not the same as having no chance of winning. …


With all eyes on the outcome of the Nov 3 vote, which metric, poll or forecaster can accurately predict the outcome of a highly volatile race for the White House? Probably none in isolation, but an aggregate of reputable forecasts might help.

Screen-cap of first 2020 US Presidential Debate on September 29: C-Span’s YouTube live-feed.

Note to readers: The forecasts in this post were completed just as news broke that Donald Trump had tested positive for Covid-19. The impact of this major development won’t be clear for a while, and I’ll update the forecasts as things become clearer.

With about a month to go before the 2020 United States Presidential Election on November 3, all eyes are on the barrage of polls and forecasts for the highly volatile race for the White House. …


A practical use case on fine-tuning a Distilbert model on a custom dataset, and testing its performance against more commonly used models like Logistic Regression and XGBoost

Illustration of web app by: Chua Chin Hon

With the 2020 US election around the corner, concerns about electoral interference by state actors via social media and other online means are back in the spotlight in a big way.

Twitter was a major platform that Russia used to interfere with the 2016 US election, and few have doubts that Moscow, Beijing and others will turn to the platform yet again with new disinformation campaigns.

This post will give a broad overview of how you can build a state troll tweets detector by fine-tuning a transformer model (Distilbert) with a custom dataset. …


Small batch machine translation of speeches and news articles (English to Chinese/Tamil, and vice versa) in under 30 lines of code, using Hugging Face’s version of MarianMT and Facebook’s Fairseq.

Illustration: Chua Chin Hon

*UPDATED Dec 30, 2020*:

Facebook recently released its machine translation models for English to Tamil (and vice versa), and I was eager to give them a try since Tamil is among the most under-served languages in machine learning, and related language pairs are pretty hard to come by.

The new notebooks and toy datasets are in the repo. Or, go here for the demo for English-to-Tamil translation of speeches and news articles, and here for Tamil-to-English translation of the same type of material.

There are obvious problems with the quality of the translation in some parts. But machine…
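The small-batch translation approach mentioned in this post's summary could look roughly like the sketch below. It is not the code from the notebooks: the helper names `batches`, `translate_in_batches` and `load_marian_translator`, the batch size, and the `Helsinki-NLP/opus-mt-en-zh` checkpoint are assumptions chosen for illustration, and only the MarianMT half (English to Chinese) is covered, not the Fairseq models used for Tamil. The loader assumes `transformers`, PyTorch and `sentencepiece` are installed.

```python
from typing import Callable, Iterator, List


def batches(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_in_batches(
    sentences: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate sentences a few at a time, preserving their order."""
    translated: List[str] = []
    for batch in batches(sentences, batch_size):
        translated.extend(translate_batch(batch))
    return translated


def load_marian_translator(model_name: str = "Helsinki-NLP/opus-mt-en-zh"):
    """Return a translate_batch function backed by a MarianMT checkpoint."""
    # Imported lazily so the batching helpers work without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate_batch(batch: List[str]) -> List[str]:
        # Pad to the longest sentence in the batch and truncate over-long inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)

    return translate_batch


# Hypothetical usage (the sentences are placeholders):
#   translate_batch = load_marian_translator()
#   print(translate_in_batches(["Good morning.", "Thank you all."], translate_batch))
```

Splitting a speech or article into sentences and translating a handful at a time keeps memory use modest and avoids feeding the model inputs longer than it can handle.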


AI text generation is one of the most exciting fields in NLP, but also a daunting one for beginners. This post aims to speed up the learning process for newcomers by combining and adapting several existing tutorials into a practical end-to-end walkthrough with notebooks and sample data for a conversational chatbot that can be used in an interactive app.

Singlish phrases such as “blur as a sotong” can be bewildering for non-Singaporeans. Can a transformer model make sense of them? Photo: Chua Chin Hon

Auto-text generation is undoubtedly one of the most exciting fields in NLP in recent years. But it’s also an area that’s relatively difficult for newcomers to navigate, due to the high bar for technical knowledge and resource requirements.

While there’s no shortage of helpful notebooks and tutorials out there, pulling the various threads together can be time-consuming. To help speed up the learning process for fellow newcomers, I’ve put together an end-to-end project to create a simple AI conversational chatbot that you can run in an interactive app.

I chose to frame the text generation project around a…

Chua Chin Hon

Data Science | Media | Politics
