The Daily Philadelphia

All of Philadelphia, every day

How Philadelphia's City Archives Ended Up With Thousands of Duplicate Images — and What Went Wrong Along the Way

A decade of fragmented digitisation projects, shifting vendors, and budget gaps left the city's visual record riddled with redundant files that nobody caught until the damage was done.

By Philadelphia News Desk · Published 4 July 2026, 2:51 PM

Updated 4 July 2026, 10:32 PM

How we reported this

This article was generated by AI from the linked public sources. The Daily Philadelphia is independently owned and covers Philadelphia news free from advertiser or sponsor influence.

Photo: Ítalo Delani Lopez on Pexels

Philadelphia's Department of Records is sitting on a backlog of roughly 40,000 duplicate image files spread across at least three separate digital storage systems, according to city staff familiar with the remediation effort currently underway at the Municipal Services Building on JFK Boulevard. The problem didn't happen overnight. It is the product of nearly ten years of piecemeal scanning contracts, a mid-project switch in content management platforms, and at least two rounds of emergency budget cuts that stripped the quality-control staffing needed to catch redundant uploads before they compounded.

The timing matters because Philadelphia is in the middle of a broader push to modernise its public records infrastructure ahead of the 2026 municipal budget cycle. Duplicate files inflate storage costs, slow public-records search tools, and — critically — can cause confusion when archivists and researchers retrieve what they believe is a unique historical image only to discover it is one of several near-identical versions with conflicting metadata tags. For a city whose photographic holdings include building permit imagery from South Philadelphia rowhouse blocks, street-grid surveys from Kensington, and neighbourhood documentation going back to the mid-twentieth century, that confusion has real consequences for planning decisions and historical scholarship alike.

Where the Problem Started

The roots trace back to around 2016, when the city launched its first large-scale digitisation push under a contract awarded through the Philadelphia Water Department's facilities documentation programme, alongside an adjacent initiative housed at the Free Library of Philadelphia's Parkway Central branch on Vine Street. The two projects used different scanning specifications and different file-naming conventions. When the city attempted to consolidate those collections into a single repository between 2019 and 2021, automated migration scripts pulled source files without deduplication checks. The result was a layered archive in which the same image sometimes exists in three versions (the original TIFF, a compressed JPEG derivative, and a second JPEG generated during the failed consolidation), each tagged with different creation dates and different department codes.

A subsequent shift to a new content management system in 2022, part of a citywide technology modernisation contract, introduced a fourth layer. Because the legacy platform exported metadata inconsistently, the new system treated files it had already ingested as new assets when staff attempted manual re-uploads to fill gaps. Nobody had a single authoritative file manifest against which to check.
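
To illustrate what an authoritative manifest would have provided, here is a minimal sketch that identifies a file by a hash of its contents rather than by its exported metadata. It is an assumption-laden illustration, not the city's system: the `Manifest` class, the choice of SHA-256, and the asset IDs are all hypothetical.

```python
import hashlib


class Manifest:
    """Hypothetical manifest mapping a content digest to the first asset ID seen."""

    def __init__(self) -> None:
        self._by_digest: dict[str, str] = {}

    def ingest(self, asset_id: str, content: bytes) -> str:
        """Register content and return the asset ID that owns these bytes.

        A re-upload of bytes already in the manifest returns the original
        asset ID instead of creating a new entry, whatever its metadata says.
        """
        digest = hashlib.sha256(content).hexdigest()
        # setdefault keeps the existing entry when the digest is already known.
        return self._by_digest.setdefault(digest, asset_id)


manifest = Manifest()
first = manifest.ingest("asset-0001", b"scanned image bytes")
again = manifest.ingest("asset-0002", b"scanned image bytes")  # manual re-upload
print(first, again)  # both refer to the same underlying asset
```

Under this scheme a manual re-upload with inconsistent metadata would resolve to the existing asset rather than being stored as a new one.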

The Philadelphia City Archives, which operates under the Department of Records and maintains holdings at its repository on Broad Street, flagged the duplication issue in an internal review completed in late 2024. That review put storage costs for the combined digital holdings at roughly $180,000 a year, a figure staff believe could be cut by at least 20 percent, or about $36,000 annually, once duplicate and derivative files are properly rationalised. The Archives has not released that review publicly.

The Path to Remediation

City staff began a structured deduplication project in early 2025 using open-source file-hashing tools adapted from a model piloted by the Temple University Libraries digital preservation team in North Philadelphia. The approach involves generating cryptographic hash values for every file in the archive, cross-referencing those values against a master index, and flagging matches for human review before any deletion is authorised. Human sign-off before deletion is a hard requirement, because not every duplicate is truly redundant: some apparent duplicates carry unique annotations or represent intentional format derivatives that must be retained under the city's own records retention schedule, last updated in January 2023.
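
The hash-and-flag workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the department's actual tooling: the article does not name the tools or the hash algorithm, so SHA-256 and the function names `sha256_of` and `flag_for_review` are hypothetical choices. The sketch only reports groups of matching files and never deletes anything, mirroring the human-review requirement.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_for_review(root: Path) -> dict[str, list[Path]]:
    """Group files under root by digest and keep groups with two or more files.

    The returned groups are candidates for human review only; nothing is deleted.
    """
    index: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in index.items() if len(paths) > 1}
```

Note that a cryptographic hash matches only byte-identical files. A TIFF and a JPEG derived from it have different digests, so format derivatives of the same image would not be grouped by a check like this one.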

Progress has been slow. The team working the project numbers fewer than five full-time-equivalent staff positions, and the July 4th holiday weekend has paused field work this week. Estimates from inside the department suggest the active deduplication phase could run through at least the first quarter of 2027 before a clean, consolidated image repository is operational.

For residents, the practical upshot is limited but real. Anyone requesting historical building photographs or neighbourhood survey images through the city's Right-to-Know portal may continue to receive inconsistently formatted files, or — in some cases — multiple versions of the same image. The department's records office has advised requesters to note in their submissions if they receive what appears to be a duplicate delivery, so staff can log it as part of the ongoing audit. That feedback loop, modest as it sounds, is currently one of the few mechanisms the city has for catching duplicates that automated hashing has not yet reached.

About this article

Published by The Daily Philadelphia

Covering news in Philadelphia. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
