March 17, 2022
There are metaphors, the metaverse, and companies owned by Meta – and then there’s metadata. Metadata is data that provides information about another piece of data. In the music space, metadata contains critical information about ownership and can be the difference between millions in royalties and millions in missed opportunities. Without proper attribution, labels, publishers, artists, and other rightsholders can’t be accurately compensated or credited when their work is used by others. At Pex, our mission is attribution for all, so we care deeply about metadata.
For audio and video content, metadata can include information such as title, recording artists, songwriters, record label, release title, International Standard Recording Code (ISRC), and other identifiers. Together, these data points form a source of truth that can be referenced any time the audio or video file is used. We use this metadata in conjunction with our advanced content identification technology to help verify the true owners of a piece of copyrighted content and to bring proper attribution and monetization to those owners when their content is used by others.
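To make the fields above concrete, here is a minimal sketch of how a single recording's metadata might be represented. The class name, field names, and the sample ISRC are all hypothetical and chosen for illustration; they are not Pex's schema. The validation follows the commonly described 12-character ISRC layout (two-letter prefix, three-character registrant code, two-digit year, five-digit designation code).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Commonly described ISRC layout (12 characters once hyphens are removed):
# 2-letter prefix, 3 alphanumeric registrant characters,
# 2-digit year of reference, 5-digit designation code.
ISRC_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}[0-9]{2}[0-9]{5}$")


def normalize_isrc(raw: str) -> str:
    """Strip hyphens and whitespace, uppercase, and check the ISRC layout."""
    candidate = re.sub(r"[\s-]", "", raw).upper()
    if not ISRC_PATTERN.match(candidate):
        raise ValueError(f"not a well-formed ISRC: {raw!r}")
    return candidate


@dataclass
class RecordingMetadata:
    """Hypothetical record holding the metadata fields named in the text."""
    title: str
    recording_artists: List[str]
    isrc: str
    songwriters: List[str] = field(default_factory=list)
    record_label: Optional[str] = None
    release_title: Optional[str] = None

    def __post_init__(self) -> None:
        # Store the identifier in one canonical form so records can be compared.
        self.isrc = normalize_isrc(self.isrc)


# The ISRC below is invented for illustration, not a real registration.
record = RecordingMetadata(
    title="Example Song",
    recording_artists=["Example Artist"],
    isrc="us-abc-22-00001",
)
```

Normalizing the identifier at construction time means two sources that write the same ISRC with different hyphenation or casing still resolve to the same key.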
The ability to share and confirm this metadata is essential for a fair Internet. However, tracking and verifying ownership information can be a challenge for multiple reasons:
- The transfer of data leads to inaccuracies
- Ownership changes hands often
- Most copyrights go unregistered
- Content goes viral quickly and determining the true owner can sometimes be impossible
Thankfully, with a combination of technology, machine learning, and human experts, we can improve metadata processes and make determining ownership easy. When it comes to attribution for all, our greatest tool is accurate and complete metadata.
Pex’s approach to metadata
At Pex, we believe all metadata is good metadata, but this doesn’t mean all metadata is clean metadata. Simply put, all possible sources of metadata for a given industry should be weighed and considered, to determine which are most valuable.
We’ve adopted this philosophy in our approach to data sourcing. For our music metadata, we take a holistic view of the available data sources and constantly take steps to vet and research what is most valuable. We place high value on datasets that wildly intelligent data experts across the music industry have worked hard to develop. Even when a data source is deemed inadequate, knowing that it isn’t valuable provides benefits in other ways, such as showing us what dirty metadata looks like for certain songs and where we can expect it.
Leveraging good metadata
What happens when even basic metadata, such as a song’s title, differs across various data sources? In most cases, nothing. “Tubthumping” on Spotify is enjoyed by an individual user, and the same song on Apple Music is enjoyed by someone else, even if it is incorrectly spelled as “Tubstumping.” These are isolated user experiences; a listener has little reason to care that the track title on their phone is incorrect so long as they can find what they are looking for.
However, if we put these two song titles into a product that demands accuracy – such as our Attribution Engine – a problem arises. We can’t presume to have an authoritative record of the hit Chumbawamba track with both “Tubthumping” and “Tubstumping” titles in our dataset – we need a unique track title. Attribution Engine needs the truth, so we developed a process for determining the most true metadata.
Pex seeks to define an authority by scoring each dataset we source metadata from. We rely on both subject matter experts and machine learning to score datasets. If we have a reason to think that one streaming service’s data on Chumbawamba is more accurate than another’s, the more trusted source is scored higher. This may seem like a simple “this or that,” but Pex takes it a step further by scoring each data point from a source individually. This level of granularity, combined with the numerous datasets we source from, is what makes the Pex Registry so high quality.
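The idea of scoring each data point per source, rather than scoring a source as a whole, can be sketched in a few lines. This is an illustrative sketch only, not Pex's actual scoring method: the source names, trust scores, label values, and the `resolve` function are all invented for the example, and a simple weighted vote stands in for whatever combination of expert judgment and machine learning is used in practice.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Observation(NamedTuple):
    """One value for one metadata field, as reported by one source."""
    source: str
    field_name: str
    value: str


# Hypothetical trust scores keyed by (source, field): a source can be
# trusted for titles and still be weak on label names, or the reverse.
TRUST: Dict[Tuple[str, str], float] = {
    ("service_a", "title"): 0.9,
    ("service_b", "title"): 0.4,
    ("service_a", "record_label"): 0.3,
    ("service_b", "record_label"): 0.8,
}


def resolve(
    observations: List[Observation],
    trust: Dict[Tuple[str, str], float],
    default_score: float = 0.1,
) -> Dict[str, str]:
    """Pick, per field, the value with the highest summed trust score."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for obs in observations:
        score = trust.get((obs.source, obs.field_name), default_score)
        totals[obs.field_name][obs.value] += score
    return {name: max(values, key=values.get) for name, values in totals.items()}


observations = [
    Observation("service_a", "title", "Tubthumping"),
    Observation("service_b", "title", "Tubstumping"),
    Observation("service_a", "record_label", "Label X"),
    Observation("service_b", "record_label", "Label Y"),
]

resolved = resolve(observations, TRUST)
print(resolved)
# {'title': 'Tubthumping', 'record_label': 'Label Y'}
```

Note that the resolved record mixes sources: the title comes from one service and the label from the other, which is the practical difference between per-field scoring and a single "this or that" choice between datasets.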
The music metadata that matters most
While every piece of metadata has potential value, the critical pieces provided directly by rightsholders are identifiers and ownership. With these two pieces of data for every registered asset (and the accompanying digital fingerprint), Attribution Engine can effectively identify matches of content and map the content back to its owners – bringing attribution to the Internet in a way that isn’t possible without centralized and vetted metadata. This metadata and identification process enables control and insights for rightsholders who are often left in the dark about their content.
Crafting an honest and reliable metadata future
As a digital rights technology company, we are building toward a world where only a single identifier, such as an ISRC or a digital fingerprint, is required for the registration and attribution of content. In that world, our intelligent systems will recognize this identifier, apply the clean and accurate metadata we have available, and deliver it to a user. Similarly, we are building toward a world where composition rightsholders have a stronger and more reliable link to their recording counterparts. A significant chunk of our work here is going into our melody matching efforts, but metadata can help as well.
We’ve developed a way to have a bird’s-eye view of the music metadata landscape while simultaneously using our aggregated dataset for confirmation of what metadata is “most true” at any given moment. With this solid foundation, we can strengthen our matching models and provide confirmation of the linked sound recordings and compositions our melody matcher supplies. We anticipate our machine learning models will form links between recordings and compositions by using just the raw metadata available for each entity. These incredibly valuable predictions will help connect composition rights to recordings in cases where that was never before possible.
Metadata shouldn’t be a burden on anyone, but instead a service to everyone in recognizing the people who worked so hard to create the content we all love and share. Our metadata and our approach to metadata give rightsholders another tool in the fight for transparency and proper attribution.
Written for Pex