Sampling of data

Hello, I’m trying to understand the concept of sampling and how it affects our data.
We have a website where Piwik PRO has been implemented for approximately a month, and in that time there have been over a million visits.

When creating reports, the sample size is 100% by default. However, the article “What is data sampling and how does it work?” says that Piwik PRO does not sample data by default. When I change the sample size in the reports to 10%, the visit count increases by a few thousand.

What does this mean and what is the correct approach here?

Thanks for the help!

Hi, I recommend using sampling when you’d like to speed up reports that take too long to load. That usually happens for sites with tens or hundreds of millions of events per month.

Start with a higher sample size and lower it step by step until you find a good balance between precision and speed. The smaller the sample size, the lower the precision. Precision also depends on the overall size of the data set: with a large data set, a sample can still be precise, but if the data set is not big enough, results can be far off the real values.

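Sampling takes only a fraction of the data, and the final metric values are calculated based on that sample. This is why the visit count can change when you switch to a 10% sample: the value is an estimate extrapolated from the sample rather than an exact count.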
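To illustrate the idea, here is a minimal Python sketch. It is not how Piwik PRO implements sampling; the function name `estimate_visits`, the random selection scheme, and the synthetic list of one million visits are all assumptions made for illustration. It only shows the general principle: pick a fraction of the data, count within it, and scale the count back up.

```python
import random


def estimate_visits(visits, sample_rate, seed=0):
    """Estimate the total number of visits from a random sample.

    Hypothetical helper for illustration only; not a Piwik PRO API.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if sample_rate == 1:
        # A 100% sample is the full data set, so the count is exact.
        return len(visits)
    rng = random.Random(seed)
    # Keep each visit with probability `sample_rate`.
    sampled = sum(1 for _ in visits if rng.random() < sample_rate)
    # Scale the sampled count back up to estimate the full total.
    return round(sampled / sample_rate)


visits = range(1_000_000)              # synthetic data: one million visits
exact = estimate_visits(visits, 1.0)   # no sampling
approx = estimate_visits(visits, 0.10) # 10% sample, extrapolated
print(exact, approx, approx - exact)
```

With a 10% sample the estimate will typically land near the true total without matching it, and the gap can be positive or negative depending on which visits happen to be picked. With this kind of random selection, a deviation of a few thousand on a million visits would not be unusual.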

Hello Jarek,

thank you for your response.

I have not actively chosen sampling; it seems as if it’s set by default in my dashboard (see screenshot). How do I “turn off” sampling and use raw data?

[Screenshot: Anteckning 2024-04-24 091126]

With a 100% sample size selected, you’re working on the full data set. That is the default option, and it means sampling is not enabled.

Oh I see!! Thank you for the clarification.
