Philadelphia's Department of Records is sitting on what is estimated to be tens of thousands of duplicate digital images spread across municipal databases. Auditors flagged the problem as far back as 2019, and it has grown steadily worse as city agencies digitized paper files without coordinating their efforts. The issue came into sharper focus this spring, when the city's Office of Innovation and Technology began a structured review of storage costs tied to redundant files held on servers at the Municipal Services Building on JFK Boulevard.
The timing matters. Philadelphia is midway through a broader digital infrastructure overhaul that city technology officials have tied to the 2023 Digital Services Strategic Plan. That plan committed the city to reducing redundant data holdings and cutting cloud storage expenditures, goals that are now colliding with a backlog that predates the plan by years. With the city's fiscal year 2027 budget under pressure, the storage bill for duplicate files has become harder to ignore.
How the Backlog Built Up
The roots of the problem run back to the early 2010s, when individual city departments began digitizing records independently, without a shared naming convention or a central repository. The Philadelphia City Archives, located at 3101 Market Street in University City, maintains official historical records, but day-to-day operational images — permit photographs, inspection records, code enforcement documentation — were handled separately by agencies including the Department of Licenses and Inspections and the Philadelphia Water Department.
When L&I accelerated its digitization push after the 2013 building collapse on Market Street near 22nd, inspectors began attaching photographs to digital case files in volume. Without deduplication software running at intake, the same image could be uploaded multiple times to the same case — once by the inspector in the field, again when a supervisor reviewed the file, and sometimes a third time when the record was archived. The Water Department ran into similar patterns as it digitized decades of infrastructure photographs for the combined sewer system that runs under neighborhoods from Fishtown to Southwest Philadelphia.
By 2019, an internal audit — details of which were summarized in city budget documents reviewed by The Daily Philadelphia — noted that duplicate image files were consuming measurable portions of allocated server storage across at least four departments. The audit recommended implementing deduplication protocols at the point of upload, but the recommendation was not funded in the fiscal year 2020 budget, and the COVID-19 pandemic pushed the issue further down the priority list through 2021 and 2022.
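The audit's recommendation, deduplication at the point of upload, can be illustrated with a minimal sketch. This is not the city's system or any vendor's product: the `IntakeStore` class, its method names, and the case IDs are hypothetical, and the example assumes that "duplicate" means a byte-for-byte identical file attached to the same case. It fingerprints each upload with a SHA-256 hash and refuses a second copy.

```python
import hashlib


class IntakeStore:
    """Hypothetical in-memory stand-in for a case-file image store."""

    def __init__(self):
        # Maps case ID -> {content hash: first filename seen with that content}
        self._by_case = {}

    def upload(self, case_id, filename, data):
        """Store the image unless identical bytes already exist on the case.

        Returns True if the file was stored, False if it was a duplicate.
        """
        digest = hashlib.sha256(data).hexdigest()
        seen = self._by_case.setdefault(case_id, {})
        if digest in seen:
            return False  # same bytes already attached to this case
        seen[digest] = filename
        return True

    def image_count(self, case_id):
        """Number of distinct images stored for a case."""
        return len(self._by_case.get(case_id, {}))


store = IntakeStore()
photo = b"bytes of one inspection photo"  # placeholder content, not real image data

store.upload("case-1", "field.jpg", photo)       # inspector's upload is stored
store.upload("case-1", "review.jpg", photo)      # supervisor's copy is rejected
store.upload("case-1", "archive.jpg", photo)     # archival copy is rejected
print(store.image_count("case-1"))
```

An exact hash only catches identical files. A photograph that was resized, re-saved, or converted to another format produces a different hash and would pass through this check, so older collections may need more than this approach.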
The Cost of Inaction
Neither cloud nor on-premises storage is free. Industry pricing for government-grade managed storage is high enough that even modest volumes of redundant files become a budget line worth scrutinizing. The city has not published a dollar figure attributable solely to duplicate image storage, but the Office of Innovation and Technology's review, which began in March 2026, is expected to produce a cost estimate before the end of the third quarter.
Philadelphia is not alone in facing this problem. New York City's Department of Records and Information Services (DORIS) undertook a similar deduplication project for its digital archive between 2021 and 2023, a comparison that city technology staff have referenced in internal presentations. But Philadelphia's situation is complicated by its multiple legacy systems, some running software that dates to the early 2000s, which store images in formats that modern deduplication tools do not always process cleanly.
The problem also reaches community groups that rely on city records. Some of them, including preservation advocates in neighborhoods like Germantown and Strawberry Mansion who regularly request historic permit photographs, have at times received duplicate or mislabeled image files in response to Right-to-Know requests. That adds a practical dimension to what might otherwise seem like a back-office data management problem.
The Office of Innovation and Technology has said it plans to release a remediation framework by September 2026. Departments will then be asked to designate a records liaison responsible for image intake standards. For residents or organizations that regularly file Right-to-Know requests involving photographs or scanned documents, city officials have suggested checking the OpenDataPhilly data portal for updated dataset versions once the deduplication review concludes. No firm public release date for cleaned datasets has been announced.