How to anonymize data from URL

Some of our URLs contain information that could be considered personal data.
With standard page view analytics this is an issue, because we want to track non-personal data without consent.
I can’t find a way to anonymize the standard data being collected.

Does anyone know how to deal with this?

Example to clarify:
Our URLs look like:
https://domain.com/en/account/[someidentification]/products/

When we collect page views or custom events, the action name contains this identification and the page view contains the full URL. I’m looking for a way to substitute it with non-meaningful data.
Can this be done?

To be on the safe side, I’d recommend anonymizing it at the tracking level. You can do this with a custom tag that is triggered on each page view:

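<script>
// Reuse the tracker command queue if it already exists.
var _paq = _paq || [];
// Rebuild the URL from its parts and report it as the page URL.
// As written, this sends the URL unchanged; apply a replace to the relevant part to anonymize it.
_paq.push(["setCustomUrl", window.location.protocol + "//" + window.location.host + window.location.pathname + window.location.search + window.location.hash]);
</script>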

These are all the elements used to build the URL. You can extend this with an inline replace (e.g. based on a regexp) on the part that holds the identifier, which in your example is window.location.pathname.
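As a rough sketch of such a replace, assuming the identifier is always the path segment right after /account/ as in the example URL from the question: the helper name anonymizeUrl and the placeholder "redacted" are made up for illustration, so adjust the pattern to your real URL structure.

```javascript
// Hypothetical helper (not part of any tracking library): replaces the path
// segment that follows /account/ with a fixed placeholder.
// Assumption: the identifier always sits directly after /account/.
function anonymizeUrl(href) {
  var url = new URL(href);
  url.pathname = url.pathname.replace(/\/account\/[^\/]+/, "/account/redacted");
  return url.toString();
}

// Browser-only part: report the anonymized URL instead of the real one.
if (typeof window !== "undefined") {
  var _paq = (window._paq = window._paq || []);
  _paq.push(["setCustomUrl", anonymizeUrl(window.location.href)]);
}
```

With this, https://domain.com/en/account/abc123/products/ would be reported as https://domain.com/en/account/redacted/products/, while the query string and hash are left as they are. If those can also carry identifiers, they would need their own replace.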

Unfortunately, we’re in a React setup with some dynamic content. We already had to go back to logging page views manually using your React library to avoid duplicates, etc.

Even with custom events, there is always some field showing the URL where the event originated, which leaks the personal info.
I feel there’s a need to anonymize parts of the URL. Reading the solution above, I’m afraid there’s no such way?

There’s no way to do it after the fact for now. Also, gathering PII and removing it afterwards is certainly not a viable solution. But I fully understand the need and will file a feature request for a data governance solution that would change some event values before storing them in our backend.

Getting back to what I suggested - I guess my example would also cover all custom events triggered after the URL has been amended.
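For a setup where page views are logged manually, the same idea could be sketched roughly like this. It assumes the _paq command queue is available next to the React library, that the identifier follows /account/ as in the example URL, and that the setDocumentTitle command is available for overriding the action name. The function names and the generic title are hypothetical.

```javascript
// Hypothetical helper: strips the identifier that follows /account/.
function redactAccountId(href) {
  return href.replace(/\/account\/[^\/?#]+/, "/account/redacted");
}

// Queues an anonymized page view. `queue` is expected to be the _paq array;
// `safeTitle` is a generic title chosen by the caller, without personal data.
function trackAnonymizedPageView(queue, href, safeTitle) {
  // Set the custom URL first so that the page view, and events tracked
  // afterwards, carry the amended URL rather than the real one.
  queue.push(["setCustomUrl", redactAccountId(href)]);
  queue.push(["setDocumentTitle", safeTitle]);
  queue.push(["trackPageView"]);
}
```

In a single-page app this would be called on every route change, before any custom events for the new view are sent, for example trackAnonymizedPageView(window._paq, window.location.href, "Account products").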
