Philadelphia's cultural institutions are scrambling this week after an internal review flagged thousands of duplicate and mislabeled photographs in publicly accessible digital archives, a problem that has quietly undermined search results, grant applications, and historical research for years. The push to clean up those records is now accelerating, with at least three major organizations announcing concrete steps before the end of the summer.
The issue gained traction locally after the Philadelphia City Archives, based on Broad Street, circulated a working memo to partner institutions in late June outlining the scope of the problem inside its own holdings. Staff identified more than 4,200 image records flagged as potential duplicates within a single collection catalogued between 2018 and 2024. The memo recommended adopting a standardized deduplication protocol before the city's next fiscal review in September.
What Happened This Week
On July 1, the Free Library of Philadelphia confirmed it had begun its own audit of the digitized collections housed at the Parkway Central branch on Vine Street. Library staff are using open-source image-matching software to cross-reference roughly 80,000 photographs held in the Philadelphia Photo Arts Center partnership collection. Preliminary results, shared during a Tuesday morning stakeholder call, suggested a duplication rate of between six and nine percent. Across 80,000 photographs, that would mean potentially 4,800 to 7,200 records pointing to the same underlying image file under different catalog numbers.
The timing matters. Several Philadelphia institutions are mid-cycle on federal grants tied to digital preservation standards set by the Institute of Museum and Library Services. Duplicate records can inflate collection size figures reported to funders, a discrepancy that carries compliance risk. The IMLS awarded Philadelphia-area organizations more than $2.1 million in grants during fiscal year 2025, according to publicly available IMLS grant data, and clean metadata is a core requirement for renewal applications.
The Philadelphia Museum of Art, on the Benjamin Franklin Parkway, separately confirmed this week that it had completed the first phase of its own deduplication project, covering approximately 12,000 images from its prints and drawings department. A museum spokesperson said the process had been underway since January, but that results were being validated before any public announcement. As of Friday morning, the museum had not released figures on how many duplicates were found.
A Practical Fix, and What Comes Next
The solution being discussed most actively across institutions is not technically complex, but it is labor-intensive. When automated tools flag ambiguous matches, such as two photographs taken seconds apart at the same location or multiple scans of a single print at different resolutions, staff must reconcile the entries manually. Archivists at the Historical Society of Pennsylvania, on Locust Street in Center City, say that kind of human review is the bottleneck. The Society has posted two temporary cataloging positions at $22 per hour to help clear the backlog before October.
A broader coordination effort is expected to take shape through the Greater Philadelphia Cultural Alliance, which has offered to host a working group meeting later this month. The goal would be to agree on a shared metadata standard, likely Dublin Core with local extensions, so that institutions exchanging image records don't reintroduce duplicates through data imports.
For community researchers, genealogists, and journalists who rely on these collections, archivists' near-term advice is straightforward: before citing an image found on a digital portal, confirm its catalog number with the holding institution's contact desk, since some records currently visible online may be consolidated or removed as the cleanup proceeds. Staff at the Philadelphia City Archives' Broad Street office can confirm details on specific records.
The Fourth of July holiday slowed some of the interagency coordination this week, and several outdoor meetings were cancelled or moved indoors after extreme heat kept city workers inside. Staff at the Free Library said the internal audit work nonetheless continued uninterrupted. Formal recommendations are expected before Labor Day.