Keywords are often ambiguous. Here’s how AI understands subtle differences in concepts—and recommends content—based on its learnings.
The English language is full of ambiguity, which makes it difficult to automatically understand and process asset metadata across our various Shutterstock sites.
Let’s look at an example. If a digital asset has the keyword “amazon,” it is unclear on its own whether it is referring to the river, the company, or the mythical group of female warriors. In the Shutterstock image catalog, we have examples of all three of these things.
When cross-licensing assets from different sites, e.g., recommending music to go with videos, it is important to understand the contextual differences between all three of these senses of the same keyword. We would not necessarily want to recommend a Brazilian indigenous song for a video of packing robots in a warehouse.
This example shows the importance of having a unified taxonomy of concepts. This taxonomy distinguishes between the various meanings of a keyword in order to fully understand the terms associated with our digital assets.
Luckily, an extremely extensive crowdsourced taxonomy is publicly available for use, called Wikidata. It basically acts as a broader information layer above Wikipedia that allows machines to understand the relationships between unambiguous concepts, called entities.
At Shutterstock, we use machine learning to link assets and metadata to concepts from Wikidata. This allows us to understand our assets at a deeper level. It also enriches our metadata with Wikidata’s implicit, publicly available knowledge about the entities found in an asset.
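To make the linking idea concrete, here is a minimal sketch in Python. It is not our production model: it swaps the machine learning step for a simple keyword-overlap heuristic, and the entity IDs, context terms, and the `link_keyword` function are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str            # placeholder ID, not a real Wikidata identifier
    label: str
    context_terms: frozenset  # hand-picked terms that hint at this sense


# Hypothetical candidate senses for one ambiguous keyword.
CANDIDATES = {
    "amazon": [
        Entity("river", "Amazon River",
               frozenset({"river", "rainforest", "brazil", "jungle", "water"})),
        Entity("company", "Amazon (company)",
               frozenset({"warehouse", "package", "delivery", "robot", "logistics"})),
        Entity("myth", "Amazons (mythology)",
               frozenset({"warrior", "myth", "greek", "woman", "armor"})),
    ]
}


def link_keyword(keyword, other_keywords):
    """Pick the sense whose context terms overlap most with the asset's other keywords.

    Returns None when the keyword is unknown or nothing in the context helps.
    """
    candidates = CANDIDATES.get(keyword.lower(), [])
    if not candidates:
        return None
    context = {k.lower() for k in other_keywords}
    best = max(candidates, key=lambda e: len(e.context_terms & context))
    if not best.context_terms & context:
        return None  # still ambiguous: no supporting evidence for any sense
    return best


print(link_keyword("amazon", ["warehouse", "robot", "box"]).label)
```

The point of the sketch is the shape of the problem: one keyword, several candidate entities, and the rest of the asset's metadata as the evidence that picks between them.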
Based on these links to Wikidata, let’s look at how we enrich our understanding of the third asset—a photo of the Brooklyn Bridge—with crowdsourced knowledge. For example, asset linking allows machines to understand a ton of information about the Brooklyn Bridge, such as:
- It is a suspension bridge designed by architect John Augustus Roebling.
- GPS coordinates for the Brooklyn Bridge are 40°42’20.4″N, 73°59’46.8″W.
- It’s located in the United States of America, in New York City, between Manhattan and Brooklyn.
- In Persian, its label is پل بروکلین. We also know its translations in many other languages. This is crucial because it helps localize website features that display the labels of these entities.
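The facts listed above can be pictured as one structured record per entity. The sketch below is illustrative only: the dictionary layout and the `dms_to_decimal` and `localized_label` helpers are assumptions made for this example, not Wikidata's actual data format or API.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# Simplified, hypothetical record holding the facts from the list above.
BROOKLYN_BRIDGE = {
    "labels": {"en": "Brooklyn Bridge", "fa": "پل بروکلین"},
    "instance_of": "suspension bridge",
    "designer": "John Augustus Roebling",
    "country": "United States of America",
    "coordinates": (
        dms_to_decimal(40, 42, 20.4, "N"),
        dms_to_decimal(73, 59, 46.8, "W"),
    ),
}


def localized_label(entity, language, fallback="en"):
    """Return the label in the requested language, or the fallback label."""
    labels = entity["labels"]
    return labels.get(language, labels[fallback])


print(localized_label(BROOKLYN_BRIDGE, "fa"))
print(BROOKLYN_BRIDGE["coordinates"])
```

Falling back to English when a translation is missing is just one possible design choice; a real site might prefer a regional fallback chain instead.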
Let’s take one more step back with the bridge example. With all of this highly specific knowledge about bridges, we can also recognize and understand the many other kinds of bridges in our catalog.
Sharing Trends and Supply-Demand Information with Contributors
This is just scratching the surface, though. This technology provides insights to contributors, clueing them in to demand for certain topics in photography, design, and videography. In turn, they can make more informed decisions about the subject matter of their future work.
Contributors can ask deeper questions, like:
- What types of entities are highly licensed, but undersupplied, in terms of contributor uploads?
- What types of entities are oversupplied and receiving very little licensing?
- What entities are currently experiencing atypical amounts of interest?
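The first two questions come down to comparing licensing against uploads per entity. Here is a rough sketch of that comparison, assuming we already have license and upload counts for each entity. The `classify_supply_demand` function, its thresholds, and the sample numbers are all made up for illustration.

```python
def classify_supply_demand(stats, low=0.5, high=2.0):
    """Label each entity by its licenses-per-upload ratio.

    stats maps an entity name to a (licenses, uploads) pair.
    The low/high thresholds are arbitrary placeholders.
    """
    labels = {}
    for entity, (licenses, uploads) in stats.items():
        if licenses == 0 and uploads == 0:
            labels[entity] = "no data"
            continue
        # No uploads but some licenses: demand with no fresh supply at all.
        ratio = float("inf") if uploads == 0 else licenses / uploads
        if ratio >= high:
            labels[entity] = "undersupplied"
        elif ratio <= low:
            labels[entity] = "oversupplied"
        else:
            labels[entity] = "balanced"
    return labels


# Invented counts, for illustration only.
sample = {
    "entity_a": (400, 100),  # many licenses per upload
    "entity_b": (10, 200),   # many uploads, few licenses
    "entity_c": (100, 100),
}
print(classify_supply_demand(sample))
```

A single ratio is a blunt instrument. In practice one would probably also account for catalog size, time windows, and seasonality before advising contributors.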
Beyond this, we can reveal interesting licensing trends within our catalog. A good example of the last bulleted item (atypical amounts of interest) is the entity “Flag of Ukraine” during late February 2022. Interest in this particular entity skyrocketed. Alongside this spike of interest in the Ukrainian flag, we saw similar spikes for images associated with both “Natural Gas” and “Petroleum.”
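One simple way to flag atypical interest like this is to compare each day's count for an entity with its own recent history. The sketch below uses a trailing-window z-score. It is an assumption about how such detection could work, not a description of our actual system, and the `find_spikes` function and the sample counts are invented.

```python
from statistics import mean, stdev


def find_spikes(daily_counts, window=7, threshold=3.0):
    """Return indexes of days whose count is far above the trailing window.

    A day is flagged when it sits more than `threshold` standard
    deviations above the mean of the previous `window` days.
    """
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        # Floor the deviation at 1 so a flat history cannot divide by zero
        # or turn a tiny bump into a huge score.
        sigma = max(stdev(history), 1.0)
        z_score = (daily_counts[i] - mu) / sigma
        if z_score > threshold:
            spikes.append(i)
    return spikes


# Invented daily license counts for one entity.
counts = [10, 12, 11, 9, 10, 11, 10, 95, 12]
print(find_spikes(counts))
```

Running the same check per entity also makes it easy to see which entities spike on the same days, which is how related spikes could surface together.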
Other long-term correlations are also interesting, such as the relationship between COVID-19 and remote working concepts.
Finally, we can also examine periodic interests, such as the Olympic Games and the World Cup. Note the relatively higher interest in the 2016 Olympics, followed by a lack of interest in 2020, since the Games were postponed until 2021!
AI Provides Opportunities to Cross-License Content
As mentioned earlier, it is important to recommend contextually relevant assets across different media types. When a customer purchases a photo of the Amazon River, we want to recommend rainforest sound effects, Brazilian music, video clips of river cruises, and more. Currently, Shutterstock is experimenting with such systems, all based on entities, using Wikidata knowledge to build a graph of links between various multimedia assets.
For example, if someone purchases a video of a Nova Scotia Duck Tolling Retriever in the water, we want to recommend sound effects relating to dogs and water splashing noises. By handling ambiguity at the taxonomy level, we can mitigate issues that arise, and recommend the most relevant content for our customers. We hope to soon be able to expose these tools and insights to contributors, as well as develop further use cases for this technology.
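The entity-based recommendation idea can be sketched in a few lines. This is a toy version under stated assumptions: every asset is already tagged with a set of entities, and relevance is scored by plain set overlap (Jaccard similarity) rather than by a real knowledge graph. The `Asset` class, the `recommend` function, and the sample catalog are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    asset_id: str
    media_type: str      # e.g. "video", "sound_effect", "music"
    entities: frozenset  # entities the asset has been linked to


def recommend(purchased, catalog, top_n=3):
    """Rank assets of other media types by entity overlap with the purchase."""
    scored = []
    for asset in catalog:
        if asset.media_type == purchased.media_type:
            continue  # cross-licensing: only suggest other media types
        shared = purchased.entities & asset.entities
        if shared:
            union = purchased.entities | asset.entities
            scored.append((len(shared) / len(union), asset))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in scored[:top_n]]


# Invented catalog, for illustration only.
video = Asset("v1", "video", frozenset({"dog", "water"}))
catalog = [
    Asset("s1", "sound_effect", frozenset({"dog"})),
    Asset("s2", "sound_effect", frozenset({"water", "splashing"})),
    Asset("m1", "music", frozenset({"brazil", "amazon_river"})),
    Asset("v2", "video", frozenset({"dog", "water"})),
]
print([a.asset_id for a in recommend(video, catalog)])
```

Because the matching happens on entities rather than raw keywords, a sound effect tagged with the river sense of "amazon" would never be matched to a warehouse video tagged with the company sense.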