How to track PDF and other document downloads/opens?

In this post we’ll discuss:

  • how does Piwik PRO track document downloads/opens?
  • what are the limitations of the default document tracking?
  • what are potential workarounds to improve document tracking?

By default, Piwik PRO tracks the links that visitors click to download files on your website. The mechanism detects links to files by their extension (e.g. zip, pdf, doc, docx) and introduces a short delay (500 ms) between the visitor clicking the link and being redirected to the file.
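To make the extension check concrete, here is a minimal sketch of how a link could be classified as a download. The helper name `isDownloadLink` and the placeholder origin are hypothetical, and this is not the tracker's actual implementation, only an illustration of the idea.

```javascript
// Hypothetical helper, not part of the Piwik PRO tracker.
// Returns true when the link's path ends with one of the given extensions.
function isDownloadLink(href, extensions) {
  // Resolve relative links against a placeholder origin so URL parsing works.
  var pathname = new URL(href, 'https://example.org').pathname.toLowerCase();
  var lastDot = pathname.lastIndexOf('.');
  var lastSlash = pathname.lastIndexOf('/');
  // No dot in the last path segment means there is no file extension.
  if (lastDot === -1 || lastDot < lastSlash) return false;
  return extensions.indexOf(pathname.slice(lastDot + 1)) !== -1;
}

var extensions = ['zip', 'pdf', 'doc', 'docx'];
isDownloadLink('/files/report1.pdf?v=2', extensions); // true
isDownloadLink('/blog/post.html', extensions);        // false
```

The query string is ignored because only the path is inspected, so a link such as report1.pdf?v=2 still counts as a PDF download.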

To tag other elements on the website as files, ignore them, or customize the list of extensions, check out the developer’s documentation.
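As a rough illustration of these customizations, the sketch below queues a few settings on `_paq`. The method names are taken from the JavaScript tracking API as we understand it, and the class names and extension lists are made-up examples, so verify each call against the developer's documentation before relying on it.

```javascript
var _paq = _paq || [];

// Replace the default list of extensions that count as downloads.
_paq.push(['setDownloadExtensions', 'pdf|docx|xlsx']);

// Add one more extension to the list set above.
_paq.push(['addDownloadExtensions', 'epub']);

// Treat links with this (example) CSS class as downloads, whatever their extension.
_paq.push(['setDownloadClasses', 'is-download']);

// Do not track links with this (example) CSS class at all.
_paq.push(['setIgnoreClasses', 'no-tracking']);

// Delay between the click and the redirect, in milliseconds.
_paq.push(['setLinkTrackingTimer', 500]);
```

These calls need to be queued on pages where the tracking code is installed, since `_paq` is only processed once the tracker script loads.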

The above solution only lets you track clicks on links to documents on a website that has the Piwik PRO tracker installed.

How to track direct document downloads/opens that come from email, Slack, or a link pasted into the web browser?

If the file is requested directly (that is, not by clicking a link on the website), Piwik PRO cannot track it, because no tracking tag is executed. There are, however, workarounds.

Workaround 1: Create an HTML page with the document embedded on the page.

  1. Create an HTML file for the specific file download, e.g. for the “report1.pdf” file it could be named “report1.pdf.html”. Copy the source code below and set the values of the variables:
  • instance_url - the address of your Piwik PRO instance, e.g. myinstance.piwik.pro
  • website_id - your website ID; to find it, go to Administration and choose a website from the list. The ID is displayed in grey below the website name
  • file_url - the full URL of the file that you want to track
<!DOCTYPE html>
<html>
  <body>
   <iframe id="document_preview" width="100%" height="1200px"></iframe>
   <script type="text/javascript">
    var instance_url = 'YOURINSTANCE.piwik.pro'; // e.g. example.piwik.pro
    var website_id = 'YOUR_WEBSITE_ID'; // e.g. c971334f-c735-46f2-b7a9-4672cf54584e
    var file_url = 'YOUR_FILE_URL'; // e.g. https://example.org/path/to/report1.pdf
    document.getElementById("document_preview").src = file_url;
    var _paq = _paq || [];
    _paq.push(['trackLink', file_url, 'download']);
    (function(p,i,w,ik) {
        var g=ik.createElement('script'),s=ik.getElementsByTagName('script')[0];
        _paq.push(['setTrackerUrl', p]);
        _paq.push(['setSiteId', w]);
        g.type='text/javascript';g.async=true;g.defer=true;g.src=i;s.parentNode.insertBefore(g,s);
    })('//'+instance_url+'/ppms.php','//'+instance_url+'/ppms.js',website_id,document)
   </script>
  </body>
</html>
  2. Place the HTML file (e.g. report1.pdf.html) in a directory on your web server, so that it is available at e.g. https://example.org/report1.pdf.html

  3. Use the above link instead of the direct PDF file link in your email or when sharing the file.

  4. The file will be loaded in the iframe and displayed to the visitor, and Piwik PRO will be able to track the document open.

Workaround 2: Create an HTML page that will automatically start download of the file.

This method is very similar to Workaround 1, but instead of displaying the document in the browser, it automatically starts the download of the file.

To start the download of the file for the user, replace the content of the HTML file from Step 1 with:

<!DOCTYPE html>
<html>
  <body>
  <script type="text/javascript">
    var instance_url = 'YOURINSTANCE.piwik.pro'; // e.g. example.piwik.pro
    var website_id = 'YOUR_WEBSITE_ID'; // e.g. c971334f-c735-46f2-b7a9-4672cf54584e
    var file_url = 'YOUR_FILE_URL'; // e.g. https://example.org/path/to/report1.pdf
    setTimeout(function() { window.location = file_url },1000)
    var _paq = _paq || [];
    _paq.push(['trackLink', file_url, 'download']);
    (function(p,i,w,ik) {
        var g=ik.createElement('script'),s=ik.getElementsByTagName('script')[0];
        _paq.push(['setTrackerUrl', p]);
        _paq.push(['setSiteId', w]);
        g.type='text/javascript';g.async=true;g.defer=true;g.src=i;s.parentNode.insertBefore(g,s);
    })('//'+instance_url+'/ppms.php','//'+instance_url+'/ppms.js',website_id,document)
  </script>
  <p>Your download should start shortly. If it doesn't,
  <a href="#" onclick="window.location = file_url; return false;">click here</a>.</p>
  </body>
</html>

Remember to set the instance_url, website_id and file_url variables to the correct values identifying your Piwik PRO instance and the file that you plan to track.
Steps 2-4 remain the same as in Workaround 1, with the difference that instead of displaying the file, the download of the file will start automatically.

Workaround 3: Use the web log file import.

You can use the HTTP server log files, which contain all requests for files from your web server. Please note that this method produces a much more limited data set than JavaScript tracking, so you may want to process only the log entries related to file downloads while keeping the visitor behaviour tracking via the JavaScript tag.
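One way to keep only the download-related entries is to pre-filter the log before importing it. The sketch below assumes the common/combined access log format, counts only successful (200) GET requests, and uses a hypothetical helper name and extension list; adjust all of these to your server's setup.

```javascript
// Example extension list; extend it to match the files you serve.
var DOWNLOAD_PATH = /\.(zip|pdf|docx?)$/i;

// Hypothetical helper: returns only the log lines that describe
// a successful GET request for a document.
function filterDownloadLines(logText) {
  return logText.split('\n').filter(function (line) {
    // Common/combined log format: ... "GET /path HTTP/1.1" 200 ...
    var match = /"GET (\S+) HTTP\/[\d.]+" (\d{3}) /.exec(line);
    if (!match) return false;
    var path = match[1].split('?')[0]; // drop the query string
    return match[2] === '200' && DOWNLOAD_PATH.test(path);
  });
}
```

The filtered lines can then be written to a separate file and passed to the importer, so page views stay tracked by the JavaScript tag only.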

Here is full documentation of the web log importer:

IMPORTANT: The data imported from web log files will not be merged into sessions tracked by JavaScript tag.

You can find more information about downloads report here.

Workaround 4: Server-side tracking API.

Use the REST Tracking API to track downloads of the files on the server side (or for all visitor tracking).

This needs to be implemented in the endpoint of your web application that is serving the files for downloads (if you have one). The endpoint will have to call Piwik PRO’s API to register the download.
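A minimal sketch of such a call, assuming a Node.js 18+ backend with the global `fetch`. The function names are hypothetical, and the query parameters (`idsite`, `rec`, `url`, `download`) are our assumption of what the tracking endpoint expects, so check them against the Tracking API documentation before use.

```javascript
// Hypothetical helper: builds the tracking request URL for one download.
// The parameter names are assumptions; verify them in the API documentation.
function buildDownloadTrackingUrl(instanceUrl, websiteId, fileUrl) {
  var params = new URLSearchParams({
    idsite: websiteId, // website ID, as used in the JavaScript snippets
    rec: '1',          // ask the tracker to record the request
    url: fileUrl,      // URL the action is attributed to
    download: fileUrl  // marks the action as a file download
  });
  return 'https://' + instanceUrl + '/ppms.php?' + params.toString();
}

// Hypothetical helper: call this from the endpoint that serves the file.
// Failures are only logged, so a tracking problem never blocks the download.
function trackDownload(instanceUrl, websiteId, fileUrl) {
  return fetch(buildDownloadTrackingUrl(instanceUrl, websiteId, fileUrl))
    .catch(function (err) {
      console.error('Download tracking failed:', err);
    });
}
```

In the file-serving endpoint, you would call trackDownload('YOURINSTANCE.piwik.pro', 'YOUR_WEBSITE_ID', fileUrl) just before or while streaming the file to the visitor.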