Small-batch machine translation of speeches and news articles (English to Chinese/Tamil, and vice versa) in under 30 lines of code, using Hugging Face’s version of MarianMT and Facebook’s Fairseq.

Illustration: Chua Chin Hon

*UPDATED Dec 30, 2020*:

Facebook recently released its machine translation models for English to Tamil (and vice versa). I was eager to give them a try, since Tamil is among the most under-served languages in machine learning, and models for language pairs involving Tamil are pretty hard to come by.

The new notebooks and toy datasets are in the repo. Or, go here for the demo for English-to-Tamil translation of speeches and news articles, and here for Tamil-to-English translation of the same type of material.

There are obvious problems with the quality of the translation in some parts. But machine translation gets about 70–80% of the job done, in my view, allowing human translators to work more efficiently. …
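For readers who want to try small-batch translation themselves, here is a minimal sketch of the MarianMT half of the workflow, assuming the `transformers`, `torch` and `sentencepiece` packages are installed. The helper names `batched` and `translate`, the batch size of 8 and the default checkpoint `Helsinki-NLP/opus-mt-zh-en` (Chinese to English) are illustrative choices, not necessarily the ones used in the original notebooks; the Fairseq English-Tamil models are not covered here. Some multi-target OPUS-MT checkpoints (English to Chinese among them, if memory serves) expect a target-language token such as `>>cmn_Hans<<` at the start of each sentence, so check the model card before swapping in another model.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate(sentences: List[str],
              model_name: str = "Helsinki-NLP/opus-mt-zh-en",
              batch_size: int = 8) -> List[str]:
    """Translate sentences in small batches with a MarianMT checkpoint.

    transformers (plus torch and sentencepiece) is imported lazily, so the
    batching helper above can be used without it.
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    results: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad and truncate so sentences of different lengths fit one tensor.
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        outputs = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(outputs,
                                              skip_special_tokens=True))
    return results
```

Keeping batches small limits memory use on a laptop or free Colab instance, at the cost of a few more calls to `generate`.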


Forecasters correctly predicted that Joe Biden would win, but vastly underestimated the strength of support for Donald Trump. A quick post-mortem suggests that the polls and forecasts became too bullish after August 31.



Despite the political drama in the home stretch of the US presidential election, two forecasts that have run for months have held surprisingly steady in projecting a clear win for Joe Biden on Nov 3.

Joe Biden and Donald Trump at their final presidential debate on Oct 23. Screen-grab via C-Span.

On the surface, the 2020 US Presidential election seems like a wild roller-coaster ride, with each surprising twist and turn of events inducing both panic and dread among voters and observers alike.

In comparison, the polls and forecasts in the run-up to the vote on November 3 have kept an almost Zen-like calm. Forecasts that FiveThirtyEight and weekly magazine The Economist have run for months point to an unambiguous win for challenger Joe Biden despite widespread fears of a contested election.

Meanwhile, White House incumbent Donald Trump has seen his chances of re-election decline steadily in the forecasts despite talk of an “October Surprise”. …


Trump’s down in the polls, and forecasts by experts point to a decisive loss on November 3. He doesn’t even seem to be doing as well on Twitter as he did in 2016. Is it game over for the White House incumbent?


With about two weeks to go before the 2020 US Presidential Election, the statistics on multiple fronts are looking rather grim for White House incumbent Donald Trump.

Daily forecasts from data analysis outfit FiveThirtyEight and weekly magazine The Economist point to a resounding defeat for him on November 3. Trump even appears to be underperforming on Twitter upon closer examination of the metrics.

Is it game over for Trump? As FiveThirtyEight’s editor-in-chief Nate Silver has pointed out on numerous occasions, having a low chance is not the same as having no chance of winning. …


With all eyes on the outcome of the Nov 3 vote, which metric, poll or forecaster can accurately predict the outcome of a highly volatile race for the White House? Probably none in isolation, but an aggregate of reputable forecasts might help.

Screen-cap of first 2020 US Presidential Debate on September 29: C-Span’s YouTube live-feed.

Note to readers: The forecasts in this post were completed just as news broke that Donald Trump had tested positive for Covid-19. The impact of this major development won’t be clear for a while, and I’ll update the forecasts as things become clearer.

With about a month to go before the 2020 United States Presidential Election on November 3, all eyes are on the barrage of polls and forecasts for the highly volatile race for the White House. …
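One simple way to aggregate several forecasts, as suggested above, is a (optionally weighted) average of each forecaster's win probability for a candidate. The sketch below assumes equal weights by default; the function name `aggregate_forecasts` and any weighting scheme are hypothetical illustrations, not the method used in the post.

```python
from typing import Dict, Optional


def aggregate_forecasts(probs: Dict[str, float],
                        weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of one candidate's win probability across forecasters.

    `probs` maps a forecaster's name to a probability in [0, 1]. If `weights`
    is omitted, every forecaster counts equally.
    """
    if not probs:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = {name: 1.0 for name in probs}
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(p * weights[name] for name, p in probs.items()) / total
```

Weighting by a forecaster's past accuracy is one possible refinement, though choosing those weights is itself a judgment call.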


A practical use case on fine-tuning a DistilBERT model on a custom dataset, and testing its performance against more commonly used models like Logistic Regression and XGBoost.

Illustration of web app by: Chua Chin Hon

With the 2020 US election around the corner, concerns about electoral interference by state actors via social media and other online means are back in the spotlight in a big way.

Twitter was a major platform that Russia used to interfere with the 2016 US election, and few have doubts that Moscow, Beijing and others will turn to the platform yet again with new disinformation campaigns.

This post will give a broad overview of how you can build a state troll tweets detector by fine-tuning a transformer model (DistilBERT) with a custom dataset. …
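As a rough sketch of what fine-tuning on a custom dataset can look like, the code below follows the custom-dataset pattern from Hugging Face's documentation, assuming `torch` and `transformers` are installed and that tweets are labelled 0 (regular) or 1 (state troll). The helper names `split_dataset` and `fine_tune_distilbert`, the `distilbert-base-uncased` checkpoint and the training settings are hypothetical choices for illustration, not the exact setup from the post.

```python
import random
from typing import List, Tuple


def split_dataset(texts: List[str], labels: List[int], test_frac: float = 0.2,
                  seed: int = 42) -> Tuple[List[str], List[int],
                                           List[str], List[int]]:
    """Shuffle (text, label) pairs reproducibly and hold out a test set."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must be the same length")
    pairs = list(zip(texts, labels))
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_frac)
    test, train = pairs[:n_test], pairs[n_test:]
    return ([t for t, _ in train], [y for _, y in train],
            [t for t, _ in test], [y for _, y in test])


def fine_tune_distilbert(train_texts: List[str], train_labels: List[int],
                         output_dir: str = "troll-detector"):
    """Fine-tune DistilBERT for binary classification (lazy heavy imports)."""
    import torch
    from transformers import (DistilBertForSequenceClassification,
                              DistilBertTokenizerFast, Trainer,
                              TrainingArguments)

    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
    encodings = tokenizer(train_texts, truncation=True, padding=True)

    class TweetDataset(torch.utils.data.Dataset):
        """Wraps tokenizer output and labels in the format Trainer expects."""

        def __init__(self, enc, labels):
            self.enc, self.labels = enc, labels

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[idx])
            return item

        def __len__(self):
            return len(self.labels)

    model = DistilBertForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=2,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=TweetDataset(encodings, train_labels))
    trainer.train()
    return trainer
```

Holding out the test split before any training makes it possible to compare the fine-tuned model fairly against baselines such as Logistic Regression or XGBoost on the same unseen tweets.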


AI text generation is one of the most exciting fields in NLP, but also a daunting one for beginners. This post aims to speed up the learning process for newcomers by combining and adapting several existing tutorials into a practical end-to-end walkthrough with notebooks and sample data for a conversational chatbot that can be used in an interactive app.

Singlish phrases such as “blur as a sotong” can be bewildering for non-Singaporeans. Can a transformer model make sense of it? Photo: Chua Chin Hon

Auto-text generation is undoubtedly one of the most exciting fields in NLP in recent years. But it’s also an area that’s relatively difficult for newcomers to navigate, due to the high bar for technical knowledge and resource requirements.

While there’s no shortage of helpful notebooks and tutorials out there, pulling the various threads together can be time-consuming. To help speed up the learning process for fellow newcomers, I’ve put together a simple end-to-end project to create an AI conversational chatbot that you can run in an interactive app.

I chose to frame the text generation project around a chatbot as we react more intuitively to conversations, and can easily tell whether the auto-generated text is any good. Chatbots are also ubiquitous enough that most of us would have a good sense of the expected baseline performance without having to consult a manual or an expert. If it’s bad, you’ll know right away without having to check a score or metric. …
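To make the idea concrete, here is a minimal sketch of a single chatbot turn, assuming a DialoGPT-style model from Hugging Face (for example `microsoft/DialoGPT-small`), where each conversational turn ends with the GPT-2 end-of-sequence token. The helper names `build_prompt` and `reply`, the four-turn history window and the sampling settings are illustrative assumptions rather than the exact setup from the post.

```python
from typing import List

EOS = "<|endoftext|>"  # GPT-2/DialoGPT end-of-sequence token


def build_prompt(turns: List[str], max_turns: int = 4, eos: str = EOS) -> str:
    """Join the most recent turns, each terminated by the EOS token."""
    recent = turns[-max_turns:] if max_turns > 0 else []
    return "".join(turn + eos for turn in recent)


def reply(turns: List[str], tokenizer, model, max_new_tokens: int = 60) -> str:
    """Generate the bot's next line from the recent conversation history.

    `tokenizer` and `model` are assumed to come from
    AutoTokenizer.from_pretrained / AutoModelForCausalLM.from_pretrained.
    """
    prompt = build_prompt(turns, eos=tokenizer.eos_token)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens,
                                pad_token_id=tokenizer.eos_token_id,
                                do_sample=True, top_k=50, top_p=0.95)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:],
                            skip_special_tokens=True)
```

Load the tokenizer and model once and reuse them across turns; capping the history keeps the prompt short, so replies stay fast and within the model's context window.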


Compared to sentiment analysis or classification, text summarisation is a far less ubiquitous NLP task due to the time and resources needed to execute it well. Hugging Face’s transformers pipeline has changed that. Here’s a quick demo of how you can summarise short and long speeches easily.

Screen grabs from PAP.org.sg (left) and WP.sg (right).

Summarising a speech is more art than science, some might argue. But recent advances in NLP could well test the validity of that argument.

In particular, Hugging Face’s (HF) transformers summarisation pipeline has made the task easier, faster and more efficient to execute. Admittedly, there’s still a hit-and-miss quality to current results. But there are also flashes of brilliance that hint at the possibilities to come as language models become more sophisticated.

This post will demonstrate how you can easily use HF’s pipeline to summarise both short and long speeches. A minor workaround is needed for long speeches due to the maximum sequence length of the models used in the pipeline. …
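One common workaround for long speeches is to split the text into chunks that fit under the model's limit, summarise each chunk, and join the results. The sketch below assumes that approach; it is not necessarily the exact workaround in the post. It counts words as a rough proxy for tokens (models tokenise into more pieces than words, so the 400-word default is a deliberately conservative guess), and the helper names `chunk_text` and `summarise_long` are hypothetical. It assumes `transformers` is installed for the summarisation step.

```python
import re
from typing import List


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own chunk.
    """
    if not text.strip():
        return []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarise_long(text: str, max_words: int = 400,
                   max_length: int = 130, min_length: int = 30) -> str:
    """Summarise each chunk with HF's summarisation pipeline, then join."""
    from transformers import pipeline

    summariser = pipeline("summarization")
    parts = summariser(chunk_text(text, max_words), max_length=max_length,
                       min_length=min_length, truncation=True)
    return " ".join(part["summary_text"] for part in parts)
```

Splitting on sentence boundaries avoids cutting a thought in half, although the combined summary can still read a little choppy where chunks meet.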


Results from Singapore’s 2020 vote turned just about every pre-election prediction on its head. What happened? Here’s my post-mortem of the silent earthquake that rocked Singapore politics on July 10.

Elderly voters queuing up to vote on July 10, 2020, in an election that was shaped by Covid-19 in form, but not necessarily in substance. Photo: Chua Chin Hon

General Election (GE) 2020 has been widely dubbed the “Covid-19 election”. But results from the July 10 vote and my post-mortem suggest that the pandemic mostly shaped the election in form but not in substance.

Sure, voters had to don masks and endure snaking queues due to Covid-19 related precautions. Fears of new clusters of infection also prompted the authorities to ban outdoor rallies and curtail traditional retail politics.

But the pandemic did not appear to be the central issue that framed Singaporean voters’ decisions. Instead, many were swayed by the Opposition’s call for a more balanced Parliament, so much so that the ruling People’s Action Party (PAP) suffered an eye-popping 8.62-percentage-point drop in vote share from the last election in 2015. …


Political calculations heading into Singapore’s upcoming polls have been disrupted by the Covid-19 outbreak. Will a “flight to safety” among jittery voters give the PAP a major boost? Or will the ruling party be punished for holding the election during a pandemic? Get up to speed with our quick recap of key trends behind the last seven elections.

A voting booth in Bedok during the 2015 General Election. Photo: Chua Chin Hon

Over the past three decades, Singapore has seen two major inflection points in voting trends — in 2001 and 2011, when electoral support for the ruling People’s Action Party (PAP) reached the respective highest and lowest points since Independence in 1965.

Signs are that General Election (GE) 2020 — framed by the most uncertain global outlook in decades due to the Covid-19 pandemic as well as generational leadership changes in the PAP and Opposition parties — will usher in another major political milestone. …

About

Chua Chin Hon

Data Science | Media | Politics
