The Daily Philadelphia

All of Philadelphia, every day

News

By the Numbers: How Duplicate and Low-Quality Images Are Costing Philadelphia's Public Databases Millions in Storage and Staff Time

City agencies and neighborhood archives are sitting on redundant image files that inflate storage costs and slow down public access—and the numbers tell a damning story.

By Philadelphia News Desk · Published 4 July 2026, 3:06 PM

Updated 4 July 2026, 11:01 PM

How we reported this

This article was generated by AI from the linked public sources. The Daily Philadelphia is independently owned and covers Philadelphia news free from advertiser or sponsor influence.

Photo: Committee on Veterans' Affairs / Public domain (Wikimedia Commons)

Philadelphia's municipal digital infrastructure holds tens of thousands of duplicate image files spread across at least a dozen city-managed databases, a problem that database administrators and archivists say is consuming significant budget resources while degrading public access to records. The issue is not glamorous, but the cost is real.

The timing matters. With the city's Office of Innovation and Technology midway through a $4.2 million digital modernization contract running through December 2026, administrators are under pressure to audit and clean legacy datasets before migrating them to new cloud infrastructure. Duplicate images (photographs, scanned permits, zoning maps, code-enforcement snapshots) represent a sizable chunk of the data problem.

The Free Library of Philadelphia's digital collections unit, based at the Parkway Central branch on Vine Street, began a systematic deduplication audit in February 2026. Staff there found that roughly 18 percent of image assets in one historical photograph collection were either exact duplicates or near-identical variants differing only in file name or minor compression artifacts. A share that large is not a rounding error: it translates directly into unnecessary cloud storage fees and cataloguer hours spent tagging the same image twice.

The Storage Math Behind the Problem

Storage sounds cheap until you run the numbers at municipal scale. Commercial cloud providers typically charge between $0.02 and $0.023 per gigabyte per month for standard storage tiers. A mid-size city agency holding 40 terabytes of image assets (not unusual for a department like the Philadelphia Department of Licenses and Inspections, which photographs properties at every inspection cycle) pays roughly $800 to $920 per month just on storage, before accounting for retrieval fees. If 15 to 20 percent of those files are duplicates and they take up a proportional share of the space, roughly $120 to $184 of that monthly bill pays for data the city already stores elsewhere.
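The storage figures in this section can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 terabyte = 1,000 gigabytes) and assumes that duplicates account for the same share of bytes as they do of files, which the article does not state. The function names are hypothetical.

```python
def monthly_storage_cost(terabytes, price_per_gb):
    """Monthly storage bill in dollars, assuming 1 TB = 1,000 GB."""
    return terabytes * 1_000 * price_per_gb


def duplicate_spend(monthly_cost, duplicate_share):
    """Portion of the monthly bill spent on redundant copies.

    Assumes duplicates take up a share of bytes equal to their
    share of files (an assumption, not a figure from the article).
    """
    return monthly_cost * duplicate_share


# Figures quoted in the article: 40 TB at $0.02 to $0.023 per GB per month.
low_bill = monthly_storage_cost(40, 0.02)
high_bill = monthly_storage_cost(40, 0.023)

# Duplicate share quoted in the article: 15 to 20 percent.
low_waste = duplicate_spend(low_bill, 0.15)
high_waste = duplicate_spend(high_bill, 0.20)

print(f"Monthly bill: ${low_bill:,.0f} to ${high_bill:,.0f}")
print(f"Spent on duplicates: ${low_waste:,.0f} to ${high_waste:,.0f}")
```

Retrieval and request fees are left out, so this gives a floor on the redundant spend rather than a full estimate.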

The Philadelphia Water Department's GIS and mapping division, located at 1101 Market Street, has confronted a similar problem with scanned infrastructure diagrams. When analysts began preparing legacy pipe-mapping images for integration into the city's updated GIS platform earlier this year, they found duplicate scan sets that added unnecessary processing time to batch operations. Deduplication tools—software that computes perceptual hash values for images and flags matches above a similarity threshold—have existed for years, but adoption at the city level has been inconsistent.
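To make the idea of a perceptual hash concrete, here is a minimal sketch of one simple variant, the average hash. It is not the tool any city agency is known to use. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small pixel grid (real tools do this step with an imaging library), and the distance threshold of 5 bits is an arbitrary illustrative value. Flagging pairs at or below a maximum bit distance is equivalent to flagging pairs above a similarity threshold.

```python
def average_hash(pixels):
    """Fingerprint a grayscale grid: one bit per pixel, set to 1
    when the pixel is brighter than the grid's mean brightness."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions where two fingerprints differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_near_duplicate(hash_a, hash_b, max_distance=5):
    """Flag a pair whose fingerprints differ in at most max_distance bits.
    The default of 5 is an assumed value for illustration."""
    return hamming_distance(hash_a, hash_b) <= max_distance


# A synthetic 8x8 gradient stands in for a downscaled scan.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
# Slight brightness shift, standing in for a re-saved copy of the same image.
resaved = [[min(255, v + 3) for v in row] for row in original]
# An inverted image, standing in for an unrelated picture.
unrelated = [[252 - v for v in row] for row in original]

h_original = average_hash(original)
h_resaved = average_hash(resaved)
h_unrelated = average_hash(unrelated)

print(is_near_duplicate(h_original, h_resaved))
print(is_near_duplicate(h_original, h_unrelated))
```

Because the hash depends on each pixel's brightness relative to the mean rather than on exact byte values, a re-saved copy tends to land on the same or a nearby fingerprint while a different picture does not.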

Across the country, peer cities that have completed structured deduplication projects report storage reductions of 12 to 28 percent on image-heavy datasets, according to published case studies from municipal IT conferences. Philadelphia has not yet published comparable figures for its own ongoing effort.

What Deduplication Actually Involves—and What Comes Next

The technical process is straightforward in concept: software scans a file collection, generates a fingerprint for each image, and surfaces files with matching or near-matching fingerprints for human review. The human review step is where the labor costs accumulate. An archivist or records officer still has to decide which version of a duplicate is the canonical file, update metadata references, and retire the redundant copy without breaking any links in public-facing portals.
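The scan, fingerprint, and review steps described above can be sketched for the simplest case, exact duplicates. This is a hedged illustration, not any agency's actual workflow: it uses SHA-256 from Python's standard library as the fingerprint, takes file contents as an in-memory mapping rather than reading a real archive, and the file names and the `review_queue` helper are hypothetical. Near-identical variants would need a perceptual hash instead, since a cryptographic hash changes completely when a single byte differs.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    files: mapping of file name -> raw bytes.
    Returns a sorted list of name groups, each with two or more members.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        fingerprint = hashlib.sha256(data).hexdigest()
        groups[fingerprint].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


def review_queue(duplicate_groups):
    """Turn each group into a suggestion for a human reviewer.

    The first name alphabetically is only a suggested canonical file;
    the archivist still decides which copy to keep.
    """
    return [
        {"suggested_canonical": group[0], "candidates_to_retire": group[1:]}
        for group in duplicate_groups
    ]


# Hypothetical file names and contents for illustration.
sample_files = {
    "permit_001.tif": b"scan-A",
    "permit_001_copy.tif": b"scan-A",
    "zoning_map.tif": b"scan-B",
}

duplicates = find_exact_duplicates(sample_files)
queue = review_queue(duplicates)
print(queue)
```

The code stops at producing a review queue on purpose: updating metadata references and retiring a copy without breaking portal links are the manual decisions the article identifies as the costly part.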

The Philadelphia City Archives, housed at 548 Spring Garden Street, is one of three city agencies that have formally budgeted staff hours for image deduplication work in their fiscal year 2026 operating plans. The work is ongoing. Officials there have not publicly specified a completion date for the review.

For residents or community organizations—neighborhood groups in Kensington, civic associations in Fishtown, historical societies operating along Germantown Avenue—the practical implication is that public image portals may intermittently show updated or consolidated records as the cleanup work proceeds. Searches that previously returned multiple near-identical results for a single address or block should, over time, return cleaner and more navigable results.

Anyone submitting images or scanned documents to city systems through public portals should save originals locally. During active deduplication migrations, records managers advise keeping a personal copy of any submission through at least the end of the current fiscal year, which closes June 30, 2027.

About this article

Published by The Daily Philadelphia

Covering news in Philadelphia. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
