For this (very nerdy) article, I had help from statistician Jeff Andrews at the University of British Columbia. Any mistakes are solely my responsibility, not his.

So you want more website visitors?

If you’re like many other business owners and entrepreneurs, you’d like more website visitors. Photo by PhotoMIX Ltd. from Pexels.

If you’re like many other business owners and entrepreneurs, you would give your eye teeth for more visitors to your business website. I’d certainly like more. More website visitors means greater potential for more customers. (Of course, you care about the quality of your traffic too, not just the quantity.)

You know that content marketing is a way to attract more website visitors, and since you’re more of a writer than a podcaster or video star, you may be thinking of starting a blog. But because you take only calculated, sensible risks, you want to know how much impact blogging is likely to have on your business goals. What does the evidence say?

If you do a quick Google search for blogging statistics, you’ll find all sorts of websites saying that blogging will help get traffic to your website. But if you have a skeptical mind, you’ll want to dig a little deeper. And you’ll also want to read Jon Morrow’s article, “For All the Entrepreneurs Confused About How Content Marketing Actually Works,” to see why content marketing isn’t really about traffic anyway.

In the meantime, in this post, we’ll look into just one statistic. I’ve seen it repeated in numerous places, and you may have too:

Companies that blog have 55% more website visitors than those that don’t.

That sounds impressive. Let’s look at it in detail.

The source is this HubSpot article: Study Shows Business Blogging Leads to 55% More Website Visitors. The author says:

“I looked at data from 1,531 HubSpot customers (mostly small- and medium-sized businesses). 795 of the businesses in my sample blogged, 736 didn't.
The data was crystal clear: Companies that blog have far better marketing results. Specifically, the average company that blogs has 55% more visitors…”

In this blog post, we’ll look more closely at that statistic (which I’ll refer to as the “55% statistic”).

What I’m saying and what I’m not saying

Let me be explicit that I’m not saying the statistic is wrong, or that the author or HubSpot have been dishonest, or anything like that.

My point is only: Without more information, we skeptical marketers can’t know how seriously to take the 55% statistic.

The sample size (1,531 companies) is decent, so that’s good. But there are all sorts of things that should raise at least orange flags about this statistic and its importance:

1. It’s a non-random sample.
2. The author (and HubSpot) is biased towards finding a result that supports blogging.
3. Other factors aren’t taken into consideration.
4. Crucial terms aren’t defined.
5. The time spans aren’t specified.
6. The variability of the data isn’t taken into account.

We’ll look at each of these points in turn.

1. Non-random sample

If a researcher wanted to know the average height of adult Canadian women, of course he wouldn’t be able to get the measurements of every single Canadian woman to find out. Instead, he would get the measurements of everyone in a more manageable sample.

Let’s suppose his sample consisted of 1000 Canadian women. But everyone in that sample was a past or present member of the Canadian Women’s National Basketball team. It’s easy to see how that non-random sample would be biased. The average height of the sample would not reflect the average height of Canadian women in general. That’s why it’s necessary to have a random sample — to minimize the risk of a skewed result from a biased sample.

The average height of 1000 Canadian women basketball players would not represent the average height of Canadian women in general. A random sample reduces the risk of a misleading statistic. Photo by Chris Poss Photography.

The sample for the 55% statistic wasn’t random. It consisted entirely of HubSpot’s customers. And, for all we know, there may be something about HubSpot customers that makes them more likely to get good results from blogging than companies that aren’t HubSpot customers.

So even if HubSpot customers that blog get, on average, 55% more website visitors than those that don’t, the statistic for non-HubSpot customers could be completely different. (Just as the average height of Canadian women national basketball players is very different from the average height of Canadian women who have never played on the national basketball team.)

What’s an example of how HubSpot customers’ average blogging results could differ from the average results obtained by non-HubSpot customers?

HubSpot customers might be, on average, more content-marketing-savvy than those who don’t use HubSpot. So they may do a better job of optimizing and promoting their blogs. Because they’re more content-marketing-savvy, HubSpot customers that blog might get better results than non-HubSpot customers who blog. So, even if the 55% statistic holds for HubSpot customers, it may not hold for non-HubSpot customers.

Note that I’m not saying that this sample of HubSpot customers does differ in any way that matters from a group of non-HubSpot customers. It’s just that there’s a risk that it does. A random sample has less risk of getting results that are skewed in some way.
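If you like to see this kind of thing in code, here is a minimal Python sketch of the basketball example. Everything in it is an assumption for illustration: the population of 100,000 heights, the mean of 163 cm, the spread of 7 cm, and the shortcut of treating the tallest 1,000 people as the ‘basketball team’. None of it is real survey data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: 100,000 heights in cm (assumed mean 163, sd 7).
population = [random.gauss(163, 7) for _ in range(100_000)]

# Random sample: every member of the population is equally likely to be picked.
random_sample = random.sample(population, 1000)

# Biased sample: only the tallest 1,000 people, standing in for basketball players.
biased_sample = sorted(population)[-1000:]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"Population mean:    {population_mean:.1f} cm")
print(f"Random sample mean: {random_mean:.1f} cm")
print(f"Biased sample mean: {biased_mean:.1f} cm")
```

The random sample should land close to the population average, while the biased sample sits far above it. Both samples have the same size, so a large sample on its own doesn’t protect against this kind of skew.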

2. The author (and HubSpot) is biased towards finding a result that supports blogging.

HubSpot sells “inbound marketing software”. Blogging is a typical part of inbound marketing, so they have an interest in promoting blogging. This means that HubSpot and its authors are biased towards blogging and could unintentionally influence the results they ‘find’.

When academic researchers conduct studies, they take steps to make sure they don’t unintentionally influence the results. (If they’re honest, of course they don’t intentionally influence them. And the same is true for HubSpot.)

Suppose a scientist knows her career will get a big boost if she discovers result x. When she does her research, she really wants to find that x is true. But because she also cares about the truth, she takes steps to make sure she doesn’t unintentionally influence the results so that she ‘finds’ x.

A scientist who wanted to show that coffee helps people lose weight would take steps to ensure she doesn’t unintentionally bias her study’s results. Photo by Arthur Ogleznev on Unsplash.

For example, suppose our scientist, Dr Brown, wants to find out if drinking coffee helps people lose weight. She recruits a few hundred undergraduate students to take part in her experiment. They will drink different amounts of coffee — from zero to six cups every day — record their other calorie consumption, the amount of physical activity they engage in, etc. The subjects are weighed at the beginning of the experiment and again 12 months later.

One of the students comes to be weighed at the 12-month point. Suppose the scale fluctuates between 57.35 kg and 57.34 kg. Should the weight be rounded up to 57.4 kg or rounded down to 57.3 kg?

If Dr Brown did the weighing herself, and if she knew that the student was on the six cups a day regimen, you know what she would be inclined to do! She would be inclined to round down because she wants the larger coffee consumption to result in lower weights. Even if she genuinely believed that 57.3 kg was the correct weight to be recorded, she’s only human and her beliefs can be influenced by her desires even when she can’t detect that influence.

To avoid this kind of bias, Dr Brown has assistants weigh the subjects instead. The subjects are instructed not to talk about the experiment when they’re being weighed. And — crucially — the assistants don’t know how much coffee any of the subjects drank. They may not even know that the research has anything to do with coffee consumption. This way, the assistants can’t unintentionally influence the weight recordings in favour of finding the result Dr Brown wants.

This kind of experimental ‘blinding’ is normal practice in scientific research and it means we can have more confidence in the results. But the author of the HubSpot article was likely looking for results to support blogging, so he was more likely to ‘find’ them.

Does that mean the resulting statistic is false? No, of course not. Again, it’s a matter of risk. If Dr Brown had weighed all the subjects herself, does that mean that whatever result she reports is false? No. It just means that it would be at greater risk of being false than if she used assistants as described above. We could be less confident in the results.

The same goes for the 55% statistic. Because we know the author and HubSpot have a bias in favour of blogging, we can be less confident in the statistic they report.

3. Other factors aren’t taken into consideration.

Imagine some researchers discovered that 15-year-olds who play, on average, 5 or more hours of video games every day tend to do worse at school than kids who spend less time playing video games. You can imagine the headlines about how awful video games are.

But suppose the researchers didn’t pay any attention to other factors that could have detrimental effects on school performance, such as having neglectful parents. Perhaps many kids who play 5 or more hours of video games every day also have neglectful parents. We would want to know: Is it the 5+ hours of video games or the neglectful parents (or something else) that is causing kids to do worse in school? The researchers would need to compare at least these two groups of kids with those who play fewer than 5 hours of video games a day:

A. Those who play 5 or more hours of video games and have neglectful parents.

B. Those who play 5 or more hours of video games but don’t have neglectful parents.

Suppose it turns out that group B doesn’t do worse in school — despite playing 5+ hours of video games a day. Then we’d be able to see that playing 5 or more hours of video games isn’t the factor that’s making a difference to school performance. To know whether playing 5+ hours of video games is having an effect on school performance, the researchers have to look at other factors that could have an effect.

Similarly, perhaps the companies that blog also tend to spend a significant amount of money on advertising (for example). We wouldn’t know how much of the difference in website visitors is due to advertising, rather than due to blogging.

Companies that blog also might be more likely to do any of the following that could result in more website visitors:

Use Facebook or Twitter effectively.
Have a mobile responsive website.
Create videos and post on YouTube.
Send effective email newsletters.
Have an on-staff SEO expert.
Send out direct mail.

So we can’t know if blogging really had an effect on the number of website visitors, or if something else that happens to go along with blogging — such as advertising or good use of social media — caused the higher traffic.

Again, I’m not saying that the 55% statistic is false. I’m saying that, without information about other factors that could have an effect on the number of website visitors, we don’t know how much weight to give the statistic.

For all we know from this statistic, without more information, we might have better reason to focus on something else if our goal is to increase our number of website visitors.
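The worry in this section can be simulated. In the Python sketch below, I’ve deliberately built a made-up world in which blogging has no effect at all on visitors, but advertising does, and advertisers are more likely to blog. Every number (the 50% share of advertisers, the 80% and 20% blogging rates, the 300 and 200 visitor baselines) is an assumption chosen for illustration, not anything from the HubSpot data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def simulate_company():
    """Return (blogs, advertises, weekly_visitors) for one hypothetical company."""
    advertises = random.random() < 0.5
    # Assumption: advertisers are more likely to blog (80% vs 20%).
    blogs = random.random() < (0.8 if advertises else 0.2)
    # Assumption: advertising adds visitors; blogging adds nothing at all.
    base = 300 if advertises else 200
    visitors = random.gauss(base, 30)
    return blogs, advertises, visitors

companies = [simulate_company() for _ in range(20_000)]

def mean_visitors(blogs, advertises=None):
    """Average visitors for companies matching the given flags."""
    rows = [v for b, a, v in companies
            if b == blogs and (advertises is None or a == advertises)]
    return statistics.mean(rows)

# Naive comparison: bloggers vs non-bloggers, ignoring advertising entirely.
naive_lift = mean_visitors(True) / mean_visitors(False) - 1

# Stratified comparison: the same question asked separately among
# advertisers and among non-advertisers.
lift_among_advertisers = mean_visitors(True, True) / mean_visitors(False, True) - 1
lift_among_non_advertisers = mean_visitors(True, False) / mean_visitors(False, False) - 1

print(f"Naive lift from blogging:        {naive_lift:+.1%}")
print(f"Lift among advertisers only:     {lift_among_advertisers:+.1%}")
print(f"Lift among non-advertisers only: {lift_among_non_advertisers:+.1%}")
```

The naive comparison shows bloggers well ahead, even though blogging does nothing in this simulated world. Once we compare like with like, the gap all but disappears. That doesn’t show the 55% statistic is driven by advertising. It only shows that a gap like this can appear without blogging causing it.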

4. Crucial terms aren’t defined.

The author doesn’t define his terms, and different definitions may lead to different stats that sound less impressive.

For example, does he count a company that hasn’t updated its blog for 6 months as a company that does or doesn’t blog? What about not updating for 3 months? Or 2 months?

The different definitions of a ‘company that blogs’ could make a difference to how the statistics turn out. And bias could influence the choices made, so that the statistic turns out more favourable to blogging. We don’t know if other definitions would have resulted in a statistic that would appear to support not blogging. Again, this means that we should be careful about granting significance to the resulting 55% statistic.
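Here is a small Python sketch of how much the definition can matter. The ten companies and their numbers are entirely invented, and the function name `lift_for_cutoff` is my own. The only thing that changes between runs is how recently a company must have posted to count as ‘a company that blogs’.

```python
from statistics import mean

# Hypothetical companies: (days since last blog post, weekly visitors).
# None means the company has never published a post. All values are invented.
companies = [
    (7, 520), (14, 480), (30, 450), (60, 300), (90, 260),
    (180, 240), (365, 250), (None, 230), (None, 210), (None, 260),
]

def lift_for_cutoff(cutoff_days):
    """Percentage lift in average visitors when 'a company that blogs'
    means 'has posted within the last cutoff_days days'."""
    bloggers = [v for days, v in companies
                if days is not None and days <= cutoff_days]
    others = [v for days, v in companies
              if days is None or days > cutoff_days]
    return (mean(bloggers) / mean(others) - 1) * 100

for cutoff in (30, 90, 180):
    print(f"Posted within {cutoff:>3} days counts as blogging: "
          f"{lift_for_cutoff(cutoff):.0f}% more visitors")
```

With this toy data, the same ten companies give a noticeably different headline number depending on the cutoff. With real data the effect might be larger, smaller, or go the other way, which is exactly why we’d want to know the definition that was used.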

Another term that’s not defined is ‘small- or medium-sized business’. The author says that most of HubSpot’s customers fall into this category, but you might be surprised by how large a ‘small business’ can be, according to some definitions.

Since HubSpot is an American company, I looked up how the US government’s Small Business Administration defines ‘small business’. In some industries, small businesses can have up to 1,500 employees or annual receipts of up to US$38.5 million.

For example, a commercial bakery is still counted as a small business if it has 1,000 employees, and a women’s clothing store is still small if its annual receipts are US$27.5 million. Oddly, a men’s clothing store is counted as small only until it brings in US$11 million. (You can scroll down on this page to find a table of the different size standards: Small Business Size Standards.)

If the sample of HubSpot’s customers is mostly companies of this size (and larger), it’s easy to see that the 55% statistic might not apply to small businesses of between 1 and 10 people, say. Once again, we don’t have enough information to decide whether the 55% statistic is a good reason to blog.

5. The time spans aren’t specified.

This is a quick one. The author doesn’t mention the time spans in question. So even if everything else was good about the 55% statistic, we still wouldn’t know if blogging would be worth the effort.

If a company has to publish a blog post every day for 10 years to get 55% more lifetime website visitors than companies that don’t blog, that’s quite different from blogging once a month and getting 55% more monthly visitors in 6 months!

Without knowing the time spans in question, we can’t know how much weight to give the statistic when deciding whether to blog or not.

6. The variability of the data isn’t taken into account.

This section is a bit trickier than the others, so buckle in and hold on to your hat…

The problem is that the author doesn’t address the variability of the data. And this matters. I’ll use an extreme example to make this point clear.

In scenario A, imagine that every single company that blogged got exactly 300 website visitors per week and every single company that didn’t blog got exactly 200 website visitors per week. (I told you this was an extreme example! Bear with me.)

Of course, this means that blogging companies get, on average, 300 website visitors per week — which is 50% more than the average for non-blogging companies (200 visitors per week).

Something other than chance must account for the way the data is distributed so neatly and closely around the two averages (200 and 300). That neat clustering would be very unlikely to happen by chance. In this case, it’s very likely there’s an explanatory relationship between blogging and number of site visitors. (E.g. Blogging causes a website to get more visitors.)

Now let’s imagine scenario B. When we list the number of weekly website visitors for the blogging companies, the numbers are all over the place:

0, 10, 25, 500, 965… And the average works out to be 300 visitors per week.

Similarly, let’s imagine that when we list the number of weekly website visitors for the non-blogging companies, the numbers are also all over the place:

0, 10, 20, 400, 570… And the average works out to be 200 visitors per week.

Because the numbers for the blogging and non-blogging companies are all over the place in scenario B, with no neat clustering around the averages, it may well be that the number of visitors has nothing to do with blogging. Other factors may be making those numbers vary, more or less at random.

Scenario A and scenario B both have groups of data that have the same average number of website visitors. But in scenario A, there’s very likely to be an explanatory relationship between blogging/not-blogging and the number of website visitors.

But in scenario B, it’s much more likely that there is not an explanatory relationship between blogging/not-blogging and the number of website visitors. In scenario B, the number of website visitors probably doesn’t depend on blogging. (Or, if it does, the effect is very small.)

Talking only about averages masks the importance of the variability of the data. And variability matters when we’re looking for explanatory relationships — like whether or not blogging will be likely to help increase the number of website visitors.

Of course, this is an extreme example to make the point clear. But the point is the same even if we make the data in scenario A realistic and a bit more variable. If there’s still a relatively nice, tidy cluster around the two averages in scenario A, then there’s likely to be an explanatory relationship, unlike in the more random scenario B.
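For readers who want to check the two scenarios, here is a minimal Python sketch. It rests on two assumptions of mine. First, I treat each group as having exactly five companies: five copies of 300 and 200 for scenario A, and just the five numbers listed for each group in scenario B (which do average 300 and 200). Second, I use an exact permutation test, which is one common way to ask “how often would a gap this big show up if the blogging label were irrelevant?” It is not necessarily the analysis a statistician would choose for the real data.

```python
from itertools import combinations
from statistics import mean, stdev

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test on the difference in means.

    Returns the share of all possible regroupings of the pooled data in which
    the first group's mean exceeds the second's by at least the observed gap.
    """
    pooled = group_a + group_b
    observed_gap = mean(group_a) - mean(group_b)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen_set]
        second = [pooled[i] for i in indices if i not in chosen_set]
        if mean(first) - mean(second) >= observed_gap:
            hits += 1
        total += 1
    return hits / total

# Scenario A: no variability at all within each group (assumed 5 companies each).
scenario_a_blog = [300] * 5
scenario_a_no_blog = [200] * 5

# Scenario B: the five values listed for each group in the text above.
scenario_b_blog = [0, 10, 25, 500, 965]
scenario_b_no_blog = [0, 10, 20, 400, 570]

p_a = permutation_p_value(scenario_a_blog, scenario_a_no_blog)
p_b = permutation_p_value(scenario_b_blog, scenario_b_no_blog)

for name, blog, no_blog, p in [
    ("A", scenario_a_blog, scenario_a_no_blog, p_a),
    ("B", scenario_b_blog, scenario_b_no_blog, p_b),
]:
    lift = mean(blog) / mean(no_blog) - 1
    print(f"Scenario {name}: lift {lift:.0%}, "
          f"spread (sd) {stdev(blog):.0f} vs {stdev(no_blog):.0f}, "
          f"share of regroupings with a gap this large: {p:.3f}")
```

Both scenarios show the same 50% lift in the averages. In scenario A, only 1 of the 252 possible regroupings produces a gap as large as the one observed (about 0.004), so chance is a poor explanation. In scenario B, 86 of the 252 do (about 0.34), so a gap that size would turn up about a third of the time even if blogging made no difference.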

Once again, we don’t have important information that would help us decide how seriously to take the 55% statistic. So we can’t know if this stat provides any reason to blog, if our goal is more website visitors.

Conclusion

There are other questions we could ask about this statistic. (For example, we haven’t talked explicitly about outliers.) But this post is long enough for one day. And, remember, my point is not that the statistic is wrong or that the author or HubSpot have been dishonest. For all we know, they did a full statistical analysis, but didn’t want to include all the technical details for their readers.

To repeat, my point is:

If your goal is to get more website visitors, this statistic — in the absence of more information — can’t help you decide whether blogging is a good method for achieving that goal.

PS Well done if you read all the way to the end of a statistics post! *Round of applause.*


Why blog for business? Do companies that blog really get 55% more website visitors? Six reasons to be skeptical.