The Friendly Python: Applying Automation to Media Archive Management

Bret Shefter
Software Engineer | Iron Mountain Media & Archive Services
April 30, 2021

A recent client in the music industry wanted to transfer hundreds of assets, all PDF documents from the 1990s, to Iron Mountain Media and Archive Services. Whenever the client's operators restored an audio recording, they recorded painstaking details about it, creating one PDF per recording with a few dozen fields. These fields covered both the original recording and the restoration process: the artist's name, date, producer, original audio engineer, original sound levels, tape condition, restoration engineer, restoration levels, and even the baking temperature, if baking was needed.

As we dug into the project, we realized that some of these records originated from another company, and they were so old that the PDFs didn't even have fields: they were just single images with text in them. The client needed all of this information transferred into one massive Excel spreadsheet that could then be ingested into a database like FileMaker, so that their entire company would have access to the content. 

Ordinarily, scenarios like these would require multiple team members to spend hundreds of manual labor hours reviewing or tracking thousands of data points to complete the customer’s request. But using Python, we’re able to automate data mining requests like these to easily and efficiently deliver exactly what the client needs, at a tremendous savings of time and money to the client.

Python is a scripting language with which we can create customized automation, accomplishing tasks like the client scenario above in a fraction of the time it would take to do them by hand. Task automation uses scripts (sets of instructions performed on a computer system) to make a computer do almost anything a human data analyst can do, but much faster and without the transcription errors that creep into manual data entry.

To solve this client challenge with Python, we created two separate automation processes. First, instead of employing a crew to open each individual PDF and manually type in all the data, we wrote a Python script to quickly read the data from every field in each PDF and save it in the database exactly as it appeared in the original document. For most of the PDFs, this script was able to read the information from the internal fields and assign it to spreadsheet columns based on the PDF field names. Second, to address the older PDFs (the ones that were just a single flat image), we created another Python script that used OCR (Optical Character Recognition) to scrape the text from the image and then parsed that text into the appropriate categories.
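The two processes described above can be illustrated with a minimal sketch. It is not the script we delivered: the field names, column headers, and the "Label: value" layout of the OCR text are all hypothetical, and it writes CSV (which Excel can open) rather than a native Excel file. It also assumes the form fields have already been read into a dictionary by a PDF library and that the OCR text has already been produced by an OCR engine, so only the mapping and parsing steps are shown.

```python
import csv
import io
import re

# Hypothetical mapping from PDF form-field names to spreadsheet column headers.
FIELD_TO_COLUMN = {
    "artist_name": "Artist",
    "session_date": "Date",
    "producer": "Producer",
    "tape_condition": "Tape Condition",
}
COLUMNS = list(FIELD_TO_COLUMN.values())


def row_from_form_fields(fields):
    """Build one spreadsheet row from a {field name: value} dictionary."""
    row = {column: "" for column in COLUMNS}
    for name, value in fields.items():
        column = FIELD_TO_COLUMN.get(name)
        if column is not None:  # ignore fields the spreadsheet has no column for
            row[column] = str(value).strip()
    return row


# Assumption: OCR output for a flat-image PDF contains "Label: value" lines.
LABEL_LINE = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def row_from_ocr_text(text):
    """Parse OCR text into the same columns, matching labels case-insensitively."""
    row = {column: "" for column in COLUMNS}
    by_lower = {column.lower(): column for column in COLUMNS}
    for line in text.splitlines():
        match = LABEL_LINE.match(line)
        if not match:
            continue  # skip lines that are not "Label: value"
        column = by_lower.get(match.group(1).lower())
        if column is not None:
            row[column] = match.group(2)
    return row


def rows_to_csv(rows):
    """Serialize rows to CSV text with one header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Both paths produce rows with identical columns, so records from fielded PDFs and from flat-image PDFs can land in the same spreadsheet.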

In both cases, some fields required additional processing to ensure they were in the specific format the client needed, which the automation also handled. And get this: the client is still using the automation today!
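As an illustration of what such additional processing might look like, here is a small sketch that normalizes dates and stray whitespace. The accepted date layouts and the ISO output format are assumptions made for the example, not the client's actual requirements.

```python
from datetime import datetime

# Hypothetical date layouts that might appear in the source PDFs.
DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y", "%Y-%m-%d")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or the trimmed input if no layout matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return text  # left unchanged so a person can review it later


def normalize_text(raw):
    """Collapse runs of whitespace, which OCR output often contains."""
    return " ".join(raw.split())
```

Returning unrecognized dates unchanged, rather than raising an error, keeps a long batch run going and leaves the odd cases visible in the spreadsheet for manual review.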

Since scripts can also send commands to control hardware, Media and Archive Services can process assets around the clock on behalf of customers, and at a remarkable rate. Python enables us to work smarter and faster, so that we can serve more customers with a more diverse set of needs. Media and Archive Services has created its own internal script repository and serves customers with automated scripts at three locations: Hollywood, CA; Boyers, PA; and Moonachie, NJ.