Philadelphia's effort to digitize its historical photography collection has hit a wall. Thousands of duplicate images — some catalogued two, three, or even four times under different file names — have accumulated inside the Philadelphia Digital Archives system managed by the City Archives office on Calder Way, slowing public search tools and drawing complaints from researchers, community historians, and city planners who rely on the database daily.
The problem matters now because the city is at a decision point. A five-year digitization contract with the vendor handling the bulk of the scanning work expires in the fall of 2026, and any structural fix to the duplicate image problem must be built into the next procurement cycle or it risks being locked out for another half-decade. Meanwhile, the Free Library of Philadelphia's Parkway Central branch — which cross-references the Archives database for its own Local History and Genealogy Room — has flagged the redundancy issue to city council staff as a barrier to a joint public access portal the two institutions have been trying to launch since 2024.
How the Backlog Built Up
Duplicate images accumulate for mundane reasons: different departments scan the same source photograph independently, batch uploads from neighborhood organizations like the Preservation Alliance for Greater Philadelphia bring in files that already exist in the system, and legacy migrations from older storage formats sometimes created multiple records for a single image. In internal documents shared at a March 2026 public records committee meeting, City Archives staff estimated that roughly 12 percent of the approximately 280,000 digitized images in the current collection, or about 33,600 records, are flagged as probable duplicates. That is not an unusually high rate for collections of this size, but it becomes a practical problem when the search interface surfaces three near-identical photographs of the same 1960s Kensington Avenue streetscape and users cannot tell which carries the authoritative metadata.
The Preservation Alliance, which donated a significant portion of the neighborhood photography currently in the system, has been working with archivists to develop a deduplication protocol since late 2025. The process involves matching image hashes — unique digital fingerprints generated from file content — against the existing catalogue, then flagging pairs for human review before any record is deleted. That human review step is where the bottleneck sits. The Archives currently has two full-time digital asset staff assigned to the project, a number that preservation advocates say is insufficient for a backlog of this scale.
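For readers curious how hash matching works in practice, the following is a minimal sketch, not the Archives' actual tooling. It assumes SHA-256 as the fingerprint (the protocol's real algorithm has not been made public here), and the function names `fingerprint` and `flag_for_review` are hypothetical. Note that a content hash of this kind only catches byte-for-byte identical files; two separate scans of the same print would produce different hashes and would need a different matching technique.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest of the raw file bytes (assumed algorithm)."""
    return hashlib.sha256(content).hexdigest()


def flag_for_review(
    catalog: dict[str, bytes], incoming: dict[str, bytes]
) -> list[tuple[str, str]]:
    """Pair each incoming file with any catalog file that has the same hash.

    Nothing is deleted here; the pairs are only queued for a human reviewer.
    """
    # Index the existing catalog by fingerprint.
    by_hash: dict[str, list[str]] = {}
    for name, content in catalog.items():
        by_hash.setdefault(fingerprint(content), []).append(name)

    # Look up each incoming file and record every match as a review pair.
    review_queue: list[tuple[str, str]] = []
    for name, content in incoming.items():
        for existing in by_hash.get(fingerprint(content), []):
            review_queue.append((existing, name))
    return review_queue
```

In this sketch the output is a list of (existing file, incoming file) pairs, which corresponds to the queue that staff would then check by hand before removing any record.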
The Decisions That Will Define the Next Phase
Three choices face city officials between now and the end of September. First, whether to allocate additional staffing resources to the deduplication review, a budget question that lands with the city's Office of Arts, Culture and the Creative Economy, which oversees the Archives. Second, whether the next digitization contract will include mandatory deduplication protocols as a vendor requirement, something that was absent from the 2021 contract. Third, how the city handles the legal and curatorial question of deletion: city archivists must determine whether duplicate records that contain slightly different metadata (a different neighborhood tag, a slightly different date estimate) should be merged rather than simply removed, which is a more labor-intensive process but preserves more information.
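The difference between merging and removing can be illustrated with a short sketch. This is an assumption about what a merge might look like, not a description of the city's system: the function `merge_records` and the field names are hypothetical. The idea is that conflicting values are kept side by side for an archivist to resolve, rather than one record's values being discarded.

```python
def merge_records(
    primary: dict[str, str], duplicate: dict[str, str]
) -> dict[str, list[str]]:
    """Combine two records, keeping every distinct value for each field."""
    merged: dict[str, list[str]] = {}
    # Visit the primary record's fields first, then any fields only the duplicate has.
    fields = list(primary) + [f for f in duplicate if f not in primary]
    for field in fields:
        values: list[str] = []
        for record in (primary, duplicate):
            value = record.get(field)
            if value is not None and value not in values:
                values.append(value)
        merged[field] = values
    return merged


# Hypothetical records for the same photograph, tagged differently.
record_a = {"neighborhood": "Kensington", "date": "c. 1962"}
record_b = {"neighborhood": "Fishtown", "date": "c. 1962", "source": "batch upload"}
merged = merge_records(record_a, record_b)
```

Here the merged record carries both neighborhood tags, a single date estimate, and the extra source field. Simple deletion of the second record would have lost the alternate tag and the source note.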
The Free Library partnership adds urgency. The joint portal, which would allow users to search both the City Archives and the Parkway Central Local History collections from a single interface, has been in planning since January 2024. Library officials have indicated that launching with a duplicate-laden dataset would undermine public trust in the tool's reliability. A clean, deduplicated collection is essentially a precondition for the project moving forward.
Researchers and neighborhood historians who use the collection regularly, particularly those documenting blocks in neighborhoods like Fishtown, Germantown, and Point Breeze, have a practical stake in the outcome. A merged, properly tagged database would significantly improve their ability to trace the visual history of specific streets and buildings. The coming procurement decision, expected before October 1, 2026, is the clearest near-term signal of how seriously the city intends to treat the problem.