Demarcation between “good” and “bad” in data-based decision making
Source: Dr. Jessica Zhao Jing-xin, Wisers Data Scientist 2021.02.12
To capture the attention of audiences, whose attention spans are short in an information-intensive era, public relations (PR) companies must work hard to build and maintain brand images, and must invest manpower and resources in marketing. Is it enough to publish 10 posts a day, or should we double that to 20? Do 1,000 Likes a day mean success, or should we strive for at least 1,001? This article offers insights into the demarcation between “good” and “bad” from a statistical perspective.
First, we need to decide what we are comparing against. If we compare with our own past performance, we want to see whether our brand profile has improved recently; if we compare with competitors in the industry, we want to check whether our marketing campaign has outperformed theirs over the same period. The two comparisons call for different methods.
To compare with past performance, we arrange the collected data in chronological order as a timeline (a time series); for example, plotting a brand’s share of voice (SOV) for each day of the month. The numbers on the timeline change every day, much like share prices.
Such a timeline raises many questions. On a day with a higher SOV, does the figure really reflect more mentions, or is it just random fluctuation? If comparing with the previous day’s figure is not enough, which day’s data should we compare it to? Econometrics offers many methods for analysing time series. Among them, the moving average (MA) is a simple tool for measuring overall performance. A simple MA is the average of the most recent n values; for instance, the 7-day MA is the average SOV over the previous seven days. To calculate the next day’s MA, we add the newest value to the window and drop the oldest one.
Put simply, the MA smooths the data and reveals the overall trend on the timeline. By comparing today’s SOV with it, we can judge whether there is a structural improvement rather than a one-off fluctuation. In this sense, the MA is an effective demarcation between “good” and “bad”.
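The rolling calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation; the function name `simple_moving_average` and the daily SOV figures are hypothetical and chosen only for the example.

```python
from collections import deque


def simple_moving_average(values, window=7):
    """Return the trailing simple MA for every day that has a full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()      # values currently inside the window
    running_sum = 0.0
    averages = []
    for value in values:
        recent.append(value)          # add the newest value
        running_sum += value
        if len(recent) > window:      # drop the oldest value
            running_sum -= recent.popleft()
        if len(recent) == window:
            averages.append(running_sum / window)
    return averages


# Hypothetical daily SOV figures in percent (illustration only).
daily_sov = [10, 12, 11, 13, 12, 14, 12, 19]
ma7 = simple_moving_average(daily_sov, window=7)

baseline = ma7[0]       # average of the seven days before the latest day
today = daily_sov[-1]   # the latest day's SOV
is_above_trend = today > baseline
```

Here the latest day is compared with the average of the seven days before it, not with the previous day alone, so a single noisy day has less influence on the verdict.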
Now take the comparison with competitors, again using SOV as the example. To learn how our brand performs within the industry, we need to look at the issue from another perspective. Assuming fair competition over the same period, we can treat every brand’s SOV as independent. We can then pool all brands’ SOV figures, describe them with a “probability distribution”, and find our position within it.
As for the “probability distribution”, the most common is the “normal distribution”. Its mean and standard deviation are easy to estimate, and it is a very idealised distribution: it is symmetric about the mean, and most values are concentrated in the middle. In a “normal distribution”, only about 0.13% of values lie more than three standard deviations above the mean. In other words, if our brand’s SOV is three standard deviations above the mean, our brand is well within the top 1% in terms of SOV. Conversely, if we define the top 1% as good, our brand’s SOV must exceed the industry average by about 2.33 standard deviations.
That figure, the industry average plus the number of standard deviations corresponding to the chosen top percentage, is the demarcation point between “good” and “bad” that we are trying to find.
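As a rough sketch of how such a cut-off could be computed under the normal assumption, the snippet below uses Python's standard `statistics.NormalDist`. The industry SOV figures are hypothetical, and the choice of the top 1% as “good” is only an example.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical SOV figures (percent), one per brand in the industry.
industry_sov = [4.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 9.5]

mu = mean(industry_sov)       # industry average
sigma = stdev(industry_sov)   # sample standard deviation

# Number of standard deviations above the mean that marks the top 1%
# of a normal distribution (about 2.33).
z_top_1pct = NormalDist().inv_cdf(0.99)

# Demarcation point under the normal assumption.
threshold = mu + z_top_1pct * sigma

# Share of a normal distribution lying more than three standard
# deviations above the mean (about 0.13%).
share_above_3sd = 1 - NormalDist().cdf(3)
```

A brand whose SOV exceeds `threshold` would count as “good” under this definition, provided the industry data really are close to normal.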
However, in the real world, does SOV really follow a “normal distribution”? The Wisers big data research team has collected data from Hong Kong social media over the past three years and found that the world is far from “ideal”. SOV, as well as other dimensions such as engagement and number of followers, follows an “exponential distribution” or even the more general “gamma distribution”, with extreme asymmetry and a pronounced heavy tail.
In that case, if you still simply apply the “normal distribution” approach, estimating the industry mean and standard deviation and counting standard deviations, you are likely to reach the wrong conclusion. Instead, you should examine the actual “probability distribution” of the data you are looking at in order to identify the line that demarcates “good” from “bad”.
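The effect can be illustrated with simulated data. The sketch below draws a skewed sample from an exponential distribution (a stand-in for real SOV data, not Wisers data), applies the normal-based top-1% cut-off, and compares it with the cut-off read directly from the sample. The seed, sample size, and rate are arbitrary assumptions.

```python
import random
from statistics import NormalDist, mean, quantiles, stdev

rng = random.Random(42)  # fixed seed so the simulation is repeatable

# Simulated right-skewed "SOV-like" values; exponential with rate 1.
sample = [rng.expovariate(1.0) for _ in range(100_000)]

# Cut-off that would mark the top 1% if the data were normal.
normal_cutoff = mean(sample) + NormalDist().inv_cdf(0.99) * stdev(sample)

# Share of the skewed sample that actually exceeds that cut-off.
share_above_normal_cutoff = sum(x > normal_cutoff for x in sample) / len(sample)

# Cut-off taken from the data itself: the 99th percentile of the sample.
empirical_cutoff = quantiles(sample, n=100)[-1]
share_above_empirical_cutoff = sum(x > empirical_cutoff for x in sample) / len(sample)
```

For an exponential distribution with rate 1, theory puts the normal-based cut-off near 3.33, which roughly 3.6% of values exceed, while the true top-1% line sits near 4.6. The normal rule therefore labels several times too many brands as “good”. A parametric alternative would be to fit a gamma distribution and read the cut-off from its quantile function, for example with `scipy.stats.gamma`.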
The Wisers Indices behind the WiseInfluencer KOL ranking list draw on a study of nearly 30,000 key opinion leader (KOL) accounts on Hong Kong social media, which found that the “gamma distribution” describes KOL data more accurately across all dimensions. On that basis, the Wisers Indices score is calculated to evaluate the overall performance of each KOL over a given period.