Hi,
in some accounts, I see a significant percentage of visitors using ISPs like “Google Cloud”, “Microsoft Corporation” or “Amazon.com”. For the most part, this can safely be assumed to be non-human traffic, and especially in “consent-free” setups, collecting hits from bots is quite likely. Additionally, the Browser dimension shows entries like “Headless Chrome”, and even “PhantomJS” turns up among many other suspicious entries.
While blocking tracking for suspicious User Agents directly in the browser is possible and can even be configured in the site settings, detecting bots by processing the IP information would require considerable effort and resources. It would be nice if there were a way to filter this data before such hits end up in reports. Could an exclusion list for ISPs, analogous to the existing one for IP addresses, be a solution?
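To illustrate what I have in mind, here is a minimal sketch of such a filter. All names in it (`EXCLUDED_ISPS`, `should_track`, the `isp` field) are hypothetical and just mirror how the existing IP exclusion behaves; the ISP value would presumably come from the usual geolocation lookup:

```python
# Hypothetical sketch of an ISP exclusion list, analogous to the existing
# IP exclusion. None of these names are actual tracker internals.

EXCLUDED_ISPS = {"Google Cloud", "Microsoft Corporation", "Amazon.com"}

def should_track(hit: dict) -> bool:
    """Drop a hit before it reaches the reports if its ISP is excluded."""
    return hit.get("isp", "") not in EXCLUDED_ISPS

# A hit resolved to "Google Cloud" would be discarded as likely bot traffic:
hit = {"ip": "34.64.0.1", "isp": "Google Cloud"}
print(should_track(hit))  # False -> hit is filtered out before reporting
```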
In the meantime I might have to use the option to exclude user agents quite extensively. A question regarding this feature: are agents like “Chrome-Lighthouse” redundant entries here? I assume this is one of the agents already blocked by the option “Don’t collect data from known crawlers”. Is there a public list of crawlers or user agents that are covered by this option?
Additionally, I wonder why I see “Headless Chrome” in the Browser dimension in reports, while the User Agent string contains “HeadlessChrome” without a space. If I want to exclude those hits, which version would be the correct one to enter in the settings? I entered both, just to be sure.
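My guess is that the exclusion setting does a plain substring match against the raw User Agent string, while “Headless Chrome” is just the normalized label shown in reports. A quick sketch of that assumption (the UA string below is an example, and the substring-match behaviour is my guess, not confirmed):

```python
# Assumes the exclusion setting does a plain substring match on the raw
# User Agent string -- my assumption, not confirmed behaviour.

RAW_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) HeadlessChrome/119.0.0.0 Safari/537.36")

print("HeadlessChrome" in RAW_UA)   # True  -> matches the raw UA string
print("Headless Chrome" in RAW_UA)  # False -> the space only exists in reports
```

Under that assumption, only “HeadlessChrome” (without the space) would ever match, which is why entering both seemed like the safer choice.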
By the way: do hits from filtered bots/crawlers still count when calculating hit budgets?
best,
Markus