We live in a world where all digital systems are powered by metadata. This metadata is everywhere, and sometimes in places you don’t even expect.
Metadata has been with us since the first librarian made a list of the items on a shelf of handwritten scrolls. The term “meta” comes from a Greek word that denotes “alongside, with, after, next,” and metadata often comes in two forms: elements in a record alongside the item, or embedded within the item itself (The Dublin Core Metadata Initiative).
Sometimes this metadata is visible to the end user for search, but often, there’s a lot of ‘behind-the-scenes’ metadata as well; metadata that provides valuable business information about rights management, asset creation and distribution, preservation, provenance, authorship, and more.
Metadata is the driving force behind digital asset management (DAM), but not everyone understands how embedded metadata can help with day-to-day DAM use. There are many freely available tools, such as Phil Harvey’s ExifTool or Python scripts, that let you edit, apply, and manage metadata to improve the overall DAM experience.
The embedded metadata exercises in this post can be applied to a number of different goals, so think of them as building blocks for your own embedded metadata workflows.
There are many different types of metadata, most often broken out into three main categories: descriptive metadata, structural metadata, and administrative metadata. Metadata can live within files as embedded metadata actually stored in the file itself, or it can also live outside of the file.
Sidecar metadata: A separate file that describes another file. It’s not attached to or embedded in the file itself, but rather, it has a relationship to the file. Examples of sidecar metadata include .xmp files and .xml files.
Platform / tool-specific metadata: Metadata that lives inside a particular system, for example, a DAM platform.
On page markup / structured data / webpage markup / semantic web data: Structured data is essentially metadata that is used to describe elements on a webpage. Structured data extends to describe files as elements on a web page as well, and can therefore also be considered metadata that relates to digital files that are published on the web.
Embedded metadata lives inside of the resource or item that it describes. It can draw on a variety of namespaces, from Dublin Core to custom XMP namespaces for your own company. A namespace is “a logical grouping of metadata terms. Namespaces allow unique identification of metadata terms to allow those terms to be unambiguously used across applications.” (National Archives of Australia Glossary)
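To make the idea of a namespace more concrete, here is a minimal Python sketch. The table lists a few well-known XMP namespace prefixes and their URIs; the helper name `expand` is hypothetical and only there for illustration.

```python
# A few well-known XMP namespace prefixes and the URIs they stand for.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "photoshop": "http://ns.adobe.com/photoshop/1.0/",
}


def expand(qualified_name):
    """Turn a prefixed term such as 'dc:rights' into its full, unambiguous form."""
    prefix, _, term = qualified_name.partition(":")
    return NAMESPACES[prefix] + term


print(expand("dc:rights"))  # http://purl.org/dc/elements/1.1/rights
```

Two applications that both write “rights” can mean different things, but two applications that both write dc:rights are pointing at the same term.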
This information can be applied in a variety of ways, ranging from the information systems you use on a daily basis, to command-line interfaces, to custom applications specifically designed for applying and viewing embedded metadata. Note that the exercises later in this article will only focus on ExifTool and Python, even though there are a wide variety of metadata tools available.
Some types of embedded metadata are created automatically by applications, while others require user input. When you fill out information in a “File Info” panel in an application, you are usually writing to the embedded metadata of the file itself.
We’re pretty wealthy when it comes to the quantity and quality of metadata standards available today. Jenn Riley’s metadata universe can give you a clear picture of how many standards exist and how they differ, but many are domain-specific.
Widely used standards that you’ll most commonly find while working with embedded metadata in digital assets include XMP, IPTC, EXIF, and Dublin Core. (Please note this is not an all-encompassing guide to metadata standards. If you need more information, check out Jenn Riley’s Understanding Metadata primer from NISO.)
You don’t need to know every standard, nor would that be easy to do. In fact, in Metadata Principles and Practicalities, the authors write that application profiles allow mixing and matching schemas, because there is no one schema that perfectly meets functional requirements in all systems. You’ll find that application profiles are quite common in DAM systems.
Now that you are familiar with types of metadata and embedded metadata, the following exercises will prove useful when working with digital files, including how to:
So to get started, you’ll need to download the following:
Just getting started with terminal or command line? Learn the Command Line at Codecademy.
This exercise is useful for exploring what types of data are already in a file, its origin and creation, and opportunities for enrichment.
Steps to reproduce this exercise yourself:
1. Open up Terminal or a Shell window and cd to where your files are located. Example:
2. Type the command “exiftool yourfilename.jpg”, or give the path to a folder to scan the entire folder, e.g.
exiftool /Users/EmilyKolvitz/Desktop/Foldername/
Hint: use -X if you want to see namespaces, like this:
exiftool -X /Users/EmilyKolvitz/Desktop/Foldername
3. Take a look at the data available for the file. Example:
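The same exploration can be scripted. The sketch below assumes ExifTool is installed and on your PATH. It asks for JSON output with group names (the -j and -G options) and then sorts the tags by group so you can see which kinds of metadata a file already carries. The helper names `read_metadata` and `group_tags` are hypothetical.

```python
import json
import subprocess
from collections import defaultdict


def read_metadata(path):
    """Run `exiftool -j -G <path>` and return one dict per file (needs ExifTool)."""
    result = subprocess.run(
        ["exiftool", "-j", "-G", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def group_tags(record):
    """Group 'Group:Tag' keys (for example 'EXIF:Artist') by their group name."""
    groups = defaultdict(dict)
    for key, value in record.items():
        group, sep, tag = key.partition(":")
        if sep:  # keys without a group, such as 'SourceFile', are skipped
            groups[group][tag] = value
    return dict(groups)
```

Calling `group_tags` on each record returned by `read_metadata("yourfilename.jpg")` gives you a quick overview of what is stored under EXIF, XMP, IPTC, and so on, which is a handy starting point when looking for opportunities for enrichment.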
This exercise will help you to manipulate and enrich metadata for large sets of digital assets. It’s a batch process for automating the repetitive, time-consuming work of applying data.
exiftool -artist="Bynder Photographer" -copyright="2017 Bynder B.V." thenthefilename.jpg
(or the path to the folder instead of filename if you want to apply to all files in the folder)
Result: The fields “Artist” and “Copyright” were updated on that file with new values:
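If you would rather drive this from Python, the sketch below builds the same kind of command from a dictionary of tag names and values. It assumes ExifTool is installed and on your PATH; `build_tag_command` and `apply_tags` are hypothetical helper names.

```python
import shutil
import subprocess


def build_tag_command(target, tags):
    """Build an ExifTool command that writes each tag=value pair to `target`.

    `target` can be a single file or a folder, as in the command above.
    """
    args = ["exiftool"]
    for name, value in tags.items():
        args.append(f"-{name}={value}")
    args.append(target)
    return args


def apply_tags(target, tags):
    """Run the command; raises if ExifTool cannot be found on the PATH."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on the PATH")
    return subprocess.run(build_tag_command(target, tags), check=True)


print(build_tag_command(
    "thenthefilename.jpg",
    {"artist": "Bynder Photographer", "copyright": "2017 Bynder B.V."},
))
```

Passing the arguments as a list means values with spaces need no extra quoting. By default ExifTool keeps a backup of each file it changes, with _original added to the name, so you can compare before and after.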
This exercise will help you migrate data and files from one system to another. Let’s say you have files on a local shared network drive and you want to upload them to a DAM repository. You may want to export all associated metadata to a .CSV file for a batch import along with assets, and apply additional metadata to the record for the file before ingesting. You may also want to check the data for quality before uploading it to a new system. Remember: good data in = good data out, and garbage in = garbage out.
Output: This command makes a handy spreadsheet of all the embedded metadata
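The export command itself is not reproduced above. ExifTool’s -csv option is one common way to produce such a spreadsheet (for example, exiftool -csv -r Foldername > metadata.csv), though that is an assumption about what was used here. Once you have the CSV, a small Python check like the sketch below can flag rows with missing values before you upload anything. The function name `find_missing` and the sample rows are illustrative only.

```python
import csv
import io


def find_missing(csv_file, required_fields):
    """Return (SourceFile, missing_fields) pairs for rows lacking required values."""
    problems = []
    for row in csv.DictReader(csv_file):
        missing = [f for f in required_fields if not (row.get(f) or "").strip()]
        if missing:
            problems.append((row.get("SourceFile", ""), missing))
    return problems


# Illustrative data only, shaped like a CSV export with a SourceFile column.
sample = io.StringIO(
    "SourceFile,Artist,Copyright\n"
    "a.jpg,Bynder Photographer,2017 Bynder B.V.\n"
    "b.jpg,,2017 Bynder B.V.\n"
)
print(find_missing(sample, ["Artist", "Copyright"]))  # [('b.jpg', ['Artist'])]
```

In practice you would open the exported file with `open("metadata.csv", newline="")` and pass in whichever fields your DAM treats as required.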
Let’s say you don’t have a file-naming convention. This exercise can help you to create a file-naming convention from already existing metadata inside the file itself, or even just do some simple stuff like put the file dimensions in the existing filename.
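As a rough illustration of the “dimensions in the filename” idea, here is a minimal Python sketch. It assumes you already know the width and height, for example from the ImageWidth and ImageHeight values ExifTool reports. The helper names and the `_<width>x<height>` pattern are hypothetical choices, not a prescribed convention.

```python
from pathlib import Path


def name_with_dimensions(filename, width, height):
    """Return the filename with '_<width>x<height>' inserted before the extension."""
    path = Path(filename)
    return str(path.with_name(f"{path.stem}_{width}x{height}{path.suffix}"))


def rename_with_dimensions(filename, width, height):
    """Rename the file on disk; width and height are supplied by the caller."""
    new_name = name_with_dimensions(filename, width, height)
    Path(filename).rename(new_name)
    return new_name


print(name_with_dimensions("yourfilename.jpg", 1920, 1080))  # yourfilename_1920x1080.jpg
```

Keeping the naming logic separate from the actual rename makes it easy to preview the new names on a folder before changing anything.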
This exercise is useful if you are combining data from disparate systems into one CSV to enrich digital files. A prime example of this would be marrying product data to digital files, where you have an export of product data from one system without specific filenames, and an export of digital files and associated metadata from another system without the product data.
If you have the SKU in your filename, it’s easy to combine these two sheets using that shared value, resulting in enriched digital files in an automated fashion.
Let’s say you have one spreadsheet with product data, including SKU, but no associated files. You also have one spreadsheet with filenames and a SKU. You can combine them into one spreadsheet and use that data to enrich digital assets before ingesting them into a DAM system, or to apply it to the files as embedded metadata (as demonstrated in the next exercise).
An example of the sheet with no filenames, but with a lot of product data:
An example of the sheet with filenames and SKU, but missing the rest of the product data:
Here is the result when you ‘marry’ them together with a simple script:
And this script is very simple:
Make sure your CSVs have at least one matching key/field. In this example, both of these CSVs have the field ‘sku.’ We will merge these files based on that common key using the ok.py script.
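The script itself is not reproduced here, so the following is a minimal sketch of the same idea rather than the original ok.py. It uses only Python’s standard csv module and joins the two sheets on the shared ‘sku’ column. The function names and the sample rows are hypothetical.

```python
import csv
import io


def merge_on_key(files_csv, products_csv, key="sku"):
    """Join each file row with the product row that shares the same key value."""
    products = {row[key]: row for row in csv.DictReader(products_csv)}
    merged = []
    for row in csv.DictReader(files_csv):
        combined = dict(row)
        combined.update(products.get(row[key], {}))  # no match: keep file row as-is
        merged.append(combined)
    return merged


def write_csv(rows, out_file):
    """Write merged rows, using the union of all column names as the header."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)


# Illustrative data only.
files = io.StringIO("filename,sku\nshoe_123.jpg,123\n")
products = io.StringIO("sku,product_name,color\n123,Runner,Blue\n")
print(merge_on_key(files, products))
```

Files whose SKU has no match in the product sheet are kept with their product columns left blank, which makes gaps easy to spot in the merged spreadsheet.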
Now, after completing Exercise 5 you may want to apply this data to a big batch of files. For some DAM systems, you’ll be able to ingest the spreadsheet along with the files to apply the metadata in the system itself, but maybe you also want to populate the embedded metadata on these files and have it map automatically instead.
To write this data to your files, you’ll need to pull out ExifTool again:
1. Change the name of the ‘filename’ column in your CSV to ‘SourceFile’ and move it to the first column in the sheet
2. Update your ExifTool config file with your custom namespace (name it whatever you feel is suitable) and XMP tag names
In this example, two quick edits have been made to the config file found at the ExifTool site:
Edit 1: Adding ‘bynder’ after UserDefined on line 126:
Edit 2: Adding the namespace ‘bynder’ again as well as custom tags from lines 245-259:
3. Save your file. Then move your config file to the directory where ExifTool is run, renaming it as you move it, e.g. mv /Users/emilykolvitz/Desktop/config /usr/local/bin/.ExifTool_config
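Step 1 above can also be scripted instead of done by hand in a spreadsheet. The sketch below renames the ‘filename’ column to ‘SourceFile’ and moves it to the front; the function name `prepare_for_exiftool` is hypothetical.

```python
import csv
import io


def prepare_for_exiftool(in_file, out_file, filename_column="filename"):
    """Rename the filename column to 'SourceFile' and make it the first column."""
    reader = csv.DictReader(in_file)
    others = [name for name in reader.fieldnames if name != filename_column]
    writer = csv.DictWriter(out_file, fieldnames=["SourceFile"] + others)
    writer.writeheader()
    for row in reader:
        out_row = {name: row[name] for name in others}
        out_row["SourceFile"] = row[filename_column]
        writer.writerow(out_row)


# Illustrative data only.
source = io.StringIO("sku,filename,color\n123,shoe_123.jpg,Blue\n")
target = io.StringIO()
prepare_for_exiftool(source, target)
print(target.getvalue())
```

ExifTool’s -csv=yourfile.csv option can then read the prepared sheet. The remaining column headers need to match tag names ExifTool recognizes, including any you defined in your config file.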
Now that’s done, ready to check it out?
So you’re now able to add your own custom namespace and user-defined tags to your digital assets.
Finally, you may wonder how this information can be ingested into your DAM. DAM tools differ in how they are configured and set up, but with many of them you can specify what data fields you’d like to map to when ingesting files, and also what data you’d like to export when downloading files. In most DAM systems, you should be able to:
It’s also important to note that in some DAM systems, certain fields are mapped automatically without configuration on the part of the administrator, whereas some fields need to be configured to map correctly.
Example of mapping Dublin Core rights (dc:rights) upon ingest and also export:
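Conceptually, such a mapping is just a lookup table between embedded tags and DAM fields. The Python sketch below shows the idea in both directions. The DAM field names and helper functions are hypothetical and do not reflect any particular product’s configuration.

```python
# Hypothetical mapping from embedded tags to DAM field names.
FIELD_MAPPING = {
    "dc:rights": "Copyright",
    "dc:creator": "Photographer",
}


def map_on_ingest(embedded, mapping=FIELD_MAPPING):
    """Translate embedded tag values into DAM fields, ignoring unmapped tags."""
    return {mapping[tag]: value for tag, value in embedded.items() if tag in mapping}


def map_on_export(dam_fields, mapping=FIELD_MAPPING):
    """Reverse the mapping to write DAM field values back to embedded tags."""
    reverse = {field: tag for tag, field in mapping.items()}
    return {reverse[name]: value for name, value in dam_fields.items() if name in reverse}


print(map_on_ingest({"dc:rights": "2017 Bynder B.V."}))  # {'Copyright': '2017 Bynder B.V.'}
print(map_on_export({"Copyright": "2017 Bynder B.V."}))  # {'dc:rights': '2017 Bynder B.V.'}
```

Using one table for both directions keeps ingest and export consistent, so a value that comes in as dc:rights goes back out as dc:rights.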
Sometimes you may want to use embedded metadata, and other times using embedded metadata is not the standard practice (such as on e-Commerce websites.) It can be tricky sometimes to figure out when to use embedded metadata and when to avoid it.
You should use embedded metadata when:
You shouldn’t use embedded metadata when:
Finally, one really useful resource about embedded metadata is the embedded metadata initiative, which is definitely worth checking out. In 2016, they did an extensive survey of social media sites to see which ones preserved embedded metadata during ingest or upon download. You can view the results here: http://www.embeddedmetadata.org/social-media-test-results.php.
Do you have any other ways to supercharge your work with embedded metadata workflows? If so, we’d love to hear them via chat!