What are upload filters?

Sven Krumrey

The European Parliament recently voted on a major reform of EU copyright law. The new rules are meant to give copyright owners better protection for their intellectual property. Upload filters are particularly controversial: they analyze, and potentially block, videos, songs and images during the upload process if they deem them in violation of intellectual property rights. All over the world, copyright owners - movie makers, musicians and authors - have been following the discussion and now feel their finest hour has come. But what are upload filters, how do they work - and why are they so controversial?

This is what the internet will be like

First of all: politicians generally avoid the term itself because it is so unpopular. It raises Orwellian fears of a surveillance society, and no party wants any part of that. So they prefer "content recognition technologies", implemented, you guessed it, through upload filters. Avoiding the bad word keeps citizens/voters calm, even though the technical implementation is virtually identical. Makes sense! From now on, every upload (to Facebook, Instagram, Twitter, YouTube etc.) is to be scrutinized for potential copyright infringement. Profit-oriented websites older than three years must comply or face fines.

But effectively preventing uploads of copyrighted material requires comparing each upload against a database to determine cause for suspicion and, if there is one, having a capable human review the data. Currently, that's an impossible task, simply because of the sheer volume of uploads (YouTube alone receives hundreds of hours of video every minute). First, a database containing samples or, for practicality, hash values of all copyrighted material would have to be created. A hash value is the "essence" of a file distilled into a short, practically unique piece of information. Once the database is in place, every uploaded file would likewise be distilled into a hash value and pattern-matched against it. A positive match would result in the affected upload, e.g. a video, being blocked.
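To make the matching step concrete, here is a minimal sketch in Python of how such a filter could work in principle. The byte strings and the "database" are illustrative assumptions, and a plain cryptographic hash stands in for the perceptual fingerprints real content-recognition systems would use:

```python
import hashlib

# Hypothetical database of hash values of known copyrighted files.
# (Real systems use perceptual fingerprints, not plain cryptographic hashes.)
known_hashes = {
    hashlib.sha256(b"blockbuster-movie-data").hexdigest(),
    hashlib.sha256(b"hit-song-data").hexdigest(),
}

def is_blocked(upload_bytes: bytes) -> bool:
    """Distill the upload into a hash value and match it against the database."""
    fingerprint = hashlib.sha256(upload_bytes).hexdigest()
    return fingerprint in known_hashes

print(is_blocked(b"blockbuster-movie-data"))  # True: exact copy, upload blocked
print(is_blocked(b"my-vacation-video"))       # False: no match, upload allowed
```

The lookup itself is cheap; the hard part, as described above, is building and maintaining a database that actually covers all copyrighted material.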

Where things get tricky is when uploaded files aren't 1:1 copies but slightly modified versions. Let's assume I decided to upload a recent Hollywood blockbuster to YouTube. It should be fairly obvious to anyone that I'd be infringing copyright and would face the owner's resolute veto. But what about a parody? What if I took only portions of the video, re-edited them and added effects, commentary and music to create new and, hopefully, humorous content? The hash value of my work would differ from that of the source material, so pattern-matching would come up empty. Parodies, allusions and fan edits make up a large part of the cultural backbone of the web. In these cases, algorithms would have to detect, and understand, the fine line between IP theft and satire. That's where the system falls apart. Artificial intelligence, however intriguing and versatile it may be, is devoid of the humor and higher cognitive functions that enable us humans to grasp subtle nuances.
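The fragility of exact matching is easy to demonstrate: with a cryptographic hash, even a small edit produces a completely different value, so a re-edited parody no longer matches the original. A toy sketch, where the byte strings merely stand in for real video files:

```python
import hashlib

original = b"blockbuster-movie-data"
parody = original + b" plus my commentary, effects and music"  # slightly modified re-edit

h_original = hashlib.sha256(original).hexdigest()
h_parody = hashlib.sha256(parody).hexdigest()

# The two hashes bear no resemblance to each other,
# so a database lookup on the parody finds no match.
print(h_original == h_parody)  # False
```

This is why real systems must fall back on fuzzier perceptual matching - which is exactly where the false positives and missed satire discussed above come from.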

Artificial intelligence doesn't really get us

Let's look at another example: imagine you bought a neat LEGO model but have lost interest and intend to sell it on eBay. You take a photo of the box and try to upload it. eBay's upload filter compares your photo to whatever data LEGO provided - and blocks it due to similarities. Nobody will ever see your listing. What a bother! In the future, uploads to Facebook involving a screenshot of a newspaper excerpt, a caricature or a GIF based on a movie may never make it past automated filters. Whether your intent is to spread humor, convey information or educate, filters will be a major pain and block what shouldn't be blocked. And since portals will be held directly accountable, they'll likely pull out the big guns and overshoot the mark by a mile. Anything that remotely resembles illegal content will be blocked - after all, the next lawsuit from a copyright owner is always just a few clicks away. Critics therefore expect large-scale overblocking to be the rule rather than the exception.

Normally, newspaper articles fall under copyright protection. But how would Facebook even recognize the excerpt from a current article in a provincial paper you're trying to upload? Nothing is older than yesterday's newspaper, which is why only current news gets shared online. Does that mean all publishers would have to constantly upload their latest editions to keep the databases up to date? I guess so. How else would online portals be able to detect violations? Excerpts are an even tougher nut to crack and would likely require automated, up-to-the-minute optical character recognition. The alternative would be algorithms that block anything bearing even the slightest resemblance to actual articles, shooting down your local club newsletter, historical editions and your neighbors' wedding newspaper in the process. Educational sites like Wikipedia might also be in serious trouble, since their articles are chock-full of quotes, images and texts that usually fall under fair use provisions but might now have to be reevaluated to steer clear of major lawsuits. Skeptics are already predicting the intellectual impoverishment of the internet.

Live streams are another tricky issue. Will gamers be allowed to keep streaming their gameplay live? The games themselves are copyright-protected, but, from a technical standpoint, the question of how to monitor live streams effectively has so far not been definitively answered. And AI isn't ready to come to the rescue just yet. YouTube has been working for years on automated object and face recognition, but its filters still operate on a rudimentary level, with high error rates and plenty of false positives. Humans are falsely flagged as nude, classic Renaissance works of art are deemed pornographic and, at times, dark-skinned people are identified as apes. And these filters are supposed to reliably police online uploads from now on? That seems a bit of a stretch.

A license to silence detractors

Another aspect, especially criticized by privacy groups, is that upload filters are the perfect tool for full-scale online censorship. Sure, the laws don't mention it, but the technical means will be available as soon as the reform is implemented. How hard will it be for governments to resist the temptation to keep close tabs on their citizens and dispose of unwelcome opinions straightaway? It'll certainly be very tempting to silence political adversaries once access to every uploaded file is guaranteed. An image with government-critical content is gaining popularity? A slight tweak to the upload filter will take care of it! A nightmare scenario, indeed!

At this time, readers outside the EU might think themselves blissfully safe from this latest EU-driven attack on the unregulated internet. Good for them! But don't celebrate just yet. US and other international lobbyists have been pushing for similar laws not just in Europe but across the globe. There's no denying that copyright law is out of sync with the internet and in need of reform. However, the proposed solutions differ greatly from country to country. I hope you'll be able to enjoy a mostly free and unregulated internet for years to come, wherever you may live!

What I would like to know: Should politicians have a say on matters they obviously know nothing about?
