Hello! I likely have a bot-related issue with data limits. Would it be possible to obtain the log data from the site? The data limit is already 100% used for October because of broken events (over 300,000 in October, normally about 6,000 per full month). Last night 20% of the data limit was used, but I could not find any broken events from the prior 6 hours, which makes it hard to spot the issue. Thank you!
Hi there!
Unfortunately, there is no way of exporting these past broken events if they are not available in the tracker debugger. What I would recommend is observing the debugger frequently to catch broken events. We have an API for the tracker debugger, which you could use to get the broken events automatically, for example.
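To illustrate the idea of checking the debugger automatically, here is a minimal polling sketch. It does not call the real API: `fetch_events` is a hypothetical callable standing in for whatever request you make to the tracker debugger API, and the `id` and `type` fields are assumed names, so adjust them to the actual response.

```python
import time
from typing import Callable, Dict, Iterable, List, Set

Event = Dict[str, object]


def collect_new_broken(fetch_events: Callable[[], Iterable[Event]],
                       seen: Set[str]) -> List[Event]:
    """Run one polling pass and return broken events not seen before."""
    fresh = []
    for event in fetch_events():
        event_id = str(event.get("id"))  # "id" is an assumed field name
        # "type" == "broken" is an assumed way of marking broken events
        if event.get("type") == "broken" and event_id not in seen:
            seen.add(event_id)
            fresh.append(event)
    return fresh


def poll(fetch_events: Callable[[], Iterable[Event]],
         rounds: int,
         interval_seconds: float = 300.0,
         sleep: Callable[[float], None] = time.sleep) -> List[Event]:
    """Poll a fixed number of times and keep every distinct broken event."""
    seen: Set[str] = set()
    collected: List[Event] = []
    for i in range(rounds):
        collected.extend(collect_new_broken(fetch_events, seen))
        if i < rounds - 1:
            sleep(interval_seconds)  # wait before the next pass
    return collected
```

Storing the collected events somewhere persistent (a file or a database) would keep them available after they have dropped out of the debugger.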
Hi,
you could try the API solution that I recommended in another thread, which also sounds as if there is a bigger problem:
But seeing events is one thing, doing something about it might still be a task for Piwik PRO as those events seem to be sent towards the endpoint directly without a website in between. Maybe @sararekowska can reach out internally to start a broader inspection of what is going on there.
If you can share any details about the events you find using the API, it would help to pinpoint the cause and / or intent behind those requests.
best,
Markus
Hi,
I also agree that this raises concerns and requires urgent action from Piwik PRO.
The instance’s data collection is rather small (it uses approximately 25% of the data limit per month). We found that a Yandex bot was responsible for around 500,000 requests. As a precaution, we blocked “Yandex” and any variations of that name, along with other bots identified in the logs, including Bingbot, Baidu, and Comscore, even though their impact was minimal. This adjustment will ensure that these requests are categorized as “Excluded,” regardless of any potential issues they may cause.
The issue is that we were unable to block the bot ourselves because it did not appear in the debugger, as the broken event requests were coming in without parameters. By the time we got the logs, the data limit had already been exceeded.
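For anyone who gets hold of the logs and wants to see which bots are behind the requests, here is a rough sketch of counting requests per bot by user agent. It assumes the log lines follow the common combined access-log layout, with the user agent as the last quoted field; the actual log format may differ. The pattern list mirrors the bot names mentioned above, and the matching is a simple case-insensitive substring check.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Substrings to look for in the user agent (names taken from the post above)
BOT_PATTERNS = ["yandex", "bingbot", "baidu", "comscore"]

# Assumption: the user agent is the last double-quoted field on the line
UA_RE = re.compile(r'"([^"]*)"\s*$')


def bot_name(user_agent: str) -> Optional[str]:
    """Return the first matching bot pattern, or None for other clients."""
    ua = user_agent.lower()
    for pattern in BOT_PATTERNS:
        if pattern in ua:
            return pattern
    return None


def count_bot_requests(log_lines: Iterable[str]) -> Counter:
    """Count requests per known bot pattern across the given log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = UA_RE.search(line)
        if not match:
            continue  # line does not end with a quoted user agent
        name = bot_name(match.group(1))
        if name:
            counts[name] += 1
    return counts
```

Calling `count_bot_requests(open("access.log"))` on a log file (the file name is only an example) and then `.most_common()` on the result lists the heaviest bots first.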
The situation is extremely stressful for us and the clients to whom we recommended Piwik Pro over GA4, mainly because of its strong support and privacy-first approach.
Hi @mbaersch !
I’m really happy to tell you that we were able to solve the issue by blocking the bots via the robots.txt file through WordPress. Now it seems that the data limit is no longer being exceeded, so I believe we’ve got the bot issue under control. Thank you also to Piwik support for helping us, a thousand times over. This was truly excellent service from Piwik.
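For others trying the same fix, here is a small sketch for checking robots.txt rules before relying on them. The rules shown are an assumed example, not the actual file used on this site, and they only affect crawlers that honor robots.txt. The check uses Python's standard `urllib.robotparser`, which matches on the crawler's product token (for example `YandexBot`) rather than on a full user agent string.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: block two crawlers, allow everyone else
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str = "/") -> bool:
    """Check whether the given crawler token may fetch the path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent in ("YandexBot", "Baiduspider", "Googlebot"):
        print(agent, "allowed" if is_allowed(parser, agent) else "blocked")
```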
Hi,
good to know that the attack indeed went through the website and was not directed at the instance's tracking endpoint directly. Could you share some details about how tracking was involved? Did the attacks load regular pages or was there active tracking code on 404 pages? I am just trying to collect some information for others who might run into the same problem.
best,
Markus