Philadelphia's Office of Innovation and Technology is working through a sprawling cleanup of the city's digital asset libraries, after an internal audit identified thousands of duplicate images clogging government databases — copies that have quietly inflated storage costs and slowed down public-facing platforms for at least three years.
The problem matters now because the city is mid-renovation on its flagship Philly311 service portal, a platform used by roughly 1.4 million residents to report potholes, file complaints, and track neighborhood service requests. Duplicate image files attached to service tickets have been flagged as one driver behind slow load times on the portal, particularly on mobile devices in neighborhoods like Kensington and West Philadelphia where cellular connections, not broadband, are the primary way people access city services.
A Problem Years in the Making
The roots of the duplication crisis trace back to 2019, when the city migrated its legacy content management system onto new cloud infrastructure managed under a contract with a third-party vendor. During that migration, automated syncing tools failed to detect identical image files stored under different filenames, a known failure mode in large-scale data transfers. Files were moved, re-moved, and re-tagged in batches, generating redundant copies each time. Nobody caught it systematically at the time.
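To illustrate how identical files with different names can be caught, here is a minimal sketch that groups files by a hash of their contents rather than by filename. It is not the city's or the vendor's tooling; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root: Path) -> list[list[Path]]:
    """Group files under root by content hash; keep groups of two or more.

    Hypothetical helper: filenames are ignored entirely, so two copies of
    the same bytes land in one group no matter what they are called.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

A check like this only catches byte-for-byte copies. An image that has been resized or re-encoded produces a different hash and would need a different comparison method.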
Then the pandemic hit. Between March 2020 and late 2021, the city's IT staff were redeployed to support emergency operations, and routine database hygiene, the kind of work that catches duplicate records before they compound, largely stopped. By the time normal operations resumed, the Free Library of Philadelphia, the Philadelphia City Planning Commission, and the Department of Streets had all added their own image libraries to shared city storage environments, each with its own naming conventions and no cross-department deduplication protocol.
The Office of Innovation and Technology did not respond to a request for comment before publication. However, budget documents submitted to City Council in spring 2026 referenced a line item for "digital asset management remediation" valued at $340,000 — a figure that covers both the audit work and the procurement of automated deduplication software now being piloted across three city departments.
What the Cleanup Actually Involves
The practical work of duplicate image replacement — not just deletion, but the substitution of canonical, correctly tagged files across all systems that reference a given image — is more complex than it sounds. A photograph of the Reading Terminal Market used in twelve separate Planning Commission documents, each pulling from a slightly different file path, cannot simply be deleted. Every instance must be mapped, the master file identified, and cross-system references updated. For a database that grew largely without governance standards, that mapping is being done partly by hand.
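The replacement step can be sketched roughly as follows: pick one master copy per group of duplicates, build a map from every redundant path to that master, and rewrite each reference before anything is deleted. The file paths, document names, and the shortest-path rule for choosing a master are all hypothetical; the city's actual criteria are not described in the budget documents.

```python
def choose_canonical(paths: list[str]) -> str:
    """Pick a master copy from a group of duplicates.

    Assumption for illustration: the shortest path wins, with ties
    broken alphabetically. A real project would apply its own rule.
    """
    return min(paths, key=lambda p: (len(p), p))


def build_remap(duplicate_groups: list[list[str]]) -> dict[str, str]:
    """Map every redundant path to the canonical path of its group."""
    remap: dict[str, str] = {}
    for group in duplicate_groups:
        canonical = choose_canonical(group)
        for path in group:
            if path != canonical:
                remap[path] = canonical
    return remap


def rewrite_references(references: dict[str, str],
                       remap: dict[str, str]) -> dict[str, str]:
    """Return a copy of document-to-image references with duplicates
    pointed at the canonical file; other references pass through."""
    return {doc: remap.get(path, path) for doc, path in references.items()}


# Hypothetical example: one photo stored under three different paths.
groups = [[
    "/planning/2021/market.jpg",
    "/planning/market.jpg",
    "/shared/imgs/rtm_final.jpg",
]]
references = {
    "zoning-report": "/planning/2021/market.jpg",
    "annual-plan": "/shared/imgs/rtm_final.jpg",
    "budget-memo": "/other/chart.png",
}
updated = rewrite_references(references, build_remap(groups))
```

Only after every reference resolves to the master file is it safe to remove the redundant copies, which is why the mapping has to come first.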
The city is running a pilot of the deduplication software at the Free Library of Philadelphia's digital collections branch on Vine Street and within the Philadelphia Water Department's internal communications archive. Results from those pilots, according to the spring 2026 budget documents, are expected to inform a citywide rollout scheduled for the first quarter of 2027.
Duplicate image problems are not unique to Philadelphia. Chicago's data portal encountered similar issues after a 2018 infrastructure migration, and New York City's open data program spent considerable resources between 2020 and 2022 standardizing image asset governance across its agencies. The difference in Philadelphia's case is the timeline — the problem was identified later and compounded longer before resources were assigned to fix it.
For residents, and for the journalists, advocates, and developers who rely on city data, the practical advice right now is straightforward: if you are building an application or archival project using images pulled from Philadelphia's open data portal, verify file checksums independently rather than assuming that files with different names are different images. The city's Office of Innovation and Technology has published a data dictionary update on the city's developer resources page at data.phila.gov that flags known duplicate clusters by dataset. The full remediation is not done. But at least, as of this Fourth of July, the city knows what it's dealing with.