WikiMediaCache is a microservice that caches Wikimedia Commons geotagged photo coordinates and serves them to the Freemap mobile app.
Instead of querying the Wikimedia Commons API directly from the mobile app (which has bbox size limits and rate constraints), this service pre-caches all photo coordinates from the monthly Wikimedia data dump and exposes a simple bbox query endpoint.
GET /pictures?bbox=<minLon>,<minLat>,<maxLon>,<maxLat>&clientId=<uniqueId>&prefetch=<true/false>
Returns a list of photos within the bounding box:
[
{ "pageId": 12345678, "lat": 48.1486, "lon": 17.1077 },
...
]
Returns up to 5000 results. Full original metadata (author, license, description) is still fetched directly from the Wikimedia API on demand in the mobile app.
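A client request to this endpoint could be assembled roughly as follows. This is a minimal sketch: the helper name buildPicturesUrl, the PicturesBbox type, and the base URL used in the usage note are illustrative assumptions, and the ordering check reflects only the parameter order shown above, not documented server behavior.

```typescript
// Hypothetical client-side helper; none of these names come from the service itself.
interface PicturesBbox {
  minLon: number;
  minLat: number;
  maxLon: number;
  maxLat: number;
}

function buildPicturesUrl(
  baseUrl: string,
  bbox: PicturesBbox,
  clientId: string,
  prefetch: boolean,
): string {
  // Assumption: the bbox must be ordered as minLon,minLat,maxLon,maxLat.
  if (bbox.minLon > bbox.maxLon || bbox.minLat > bbox.maxLat) {
    throw new RangeError("bbox must be ordered as minLon,minLat,maxLon,maxLat");
  }
  const bboxParam = [bbox.minLon, bbox.minLat, bbox.maxLon, bbox.maxLat].join(",");
  return (
    `${baseUrl}/pictures?bbox=${bboxParam}` +
    `&clientId=${encodeURIComponent(clientId)}` +
    `&prefetch=${prefetch}`
  );
}
```

For example, buildPicturesUrl("http://localhost:4000", { minLon: 17, minLat: 48, maxLon: 17.5, maxLat: 48.5 }, "device-1", true) produces a URL whose query string matches the pattern above.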
GET /wikimedia/thumbnail/<pageId>?size=<pixels>
Example: /wikimedia/thumbnail/12345678?size=120 (size is optional and defaults to 120). The backend automatically snaps the requested size to a standard Wikimedia thumbnail size (120, 250, 500...) to maximize the cache hit rate.
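The size snapping can be pictured with the sketch below. It is an illustration only: it lists just the three sizes named above (the service's real list is longer), and the rounding rule (round up to the next listed size, cap at the largest) is an assumption, since the exact rule is not documented here.

```typescript
// Only the sizes named in this document; the real list continues beyond 500.
const KNOWN_SIZES: readonly number[] = [120, 250, 500];

// Hypothetical helper. Assumption: round up to the next known size,
// and fall back to the largest known size for oversized requests.
function snapThumbnailSize(requested?: number): number {
  const size = requested ?? 120; // documented default
  let largest = 120;
  for (const candidate of KNOWN_SIZES) {
    if (size <= candidate) return candidate;
    largest = candidate;
  }
  return largest;
}
```

Snapping means many slightly different client requests map to one cached file per photo and size bucket, which is what raises the hit rate.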
Endpoint behavior and caching details:
- In-RAM Index: Upon startup, the service loads all existing filenames from the /cache directory into memory (Set<string>). This allows it to check whether a file is already downloaded in O(1) time without any CPU-blocking fs.existsSync disk calls, dramatically improving performance during mass coordinate requests.
- Cache Hit: If the thumbnail exists, it immediately returns the image (MIME type: image/jpeg).
- Cache Miss (Queued): If the thumbnail is not downloaded yet, the backend adds it to an asynchronous download queue and returns HTTP 503 Service Unavailable with a Retry-After: 15 header. The app shouldn't block, but rather silently retry fetching it later.
- Dead/Deleted files: If the upstream Wikimedia API reports that the file no longer exists (e.g., No imageinfo), the server dynamically generates a small placeholder JPEG (a red cross on a gray background) and saves it permanently to the cache under the same pageId. Subsequent requests for this deleted file will return 200 OK with the placeholder, thus preventing infinite retry loops from the mobile app.
- Queue Cleanup: The background queue tracks the activity of each clientId. If a client disconnects or stops panning for more than 120 seconds, any pending downloads associated with their clientId are silently dropped to save bandwidth.
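The hit, miss, and queue-cleanup behavior above can be modeled in memory as follows. This is a simplified sketch, not the service's code: the class and method names, the cache filename format, and the choice to attribute a queued download to the first client that requested it are all assumptions.

```typescript
// Illustrative model of the behavior described above; all names are hypothetical.
type Lookup =
  | { status: 200 }
  | { status: 503; retryAfterSeconds: number };

const CLIENT_TIMEOUT_MS = 120_000; // 120 seconds of inactivity
const RETRY_AFTER_SECONDS = 15;

class ThumbnailCacheModel {
  private cached: Set<string>; // filenames already on disk
  private queue: { file: string; clientId: string }[] = [];
  private lastSeen = new Map<string, number>(); // clientId -> last request time (ms)

  constructor(existingFiles: string[]) {
    this.cached = new Set(existingFiles);
  }

  request(file: string, clientId: string, nowMs: number): Lookup {
    this.lastSeen.set(clientId, nowMs);
    if (this.cached.has(file)) return { status: 200 }; // O(1) lookup, no disk access
    if (!this.queue.some((task) => task.file === file)) {
      this.queue.push({ file, clientId });
    }
    return { status: 503, retryAfterSeconds: RETRY_AFTER_SECONDS };
  }

  // Called when a background download (or placeholder generation) finishes.
  markDownloaded(file: string): void {
    this.cached.add(file);
    this.queue = this.queue.filter((task) => task.file !== file);
  }

  // Drops queued tasks whose client has been silent for more than 120 seconds.
  dropInactive(nowMs: number): number {
    const before = this.queue.length;
    this.queue = this.queue.filter((task) => {
      const seen = this.lastSeen.get(task.clientId);
      return seen !== undefined && nowMs - seen <= CLIENT_TIMEOUT_MS;
    });
    return before - this.queue.length;
  }

  get queueLength(): number {
    return this.queue.length;
  }
}
```

Because a dead file's placeholder is stored like any other thumbnail, it would pass through markDownloaded in this model and be served as a normal 200 afterwards.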
GET /status
Returns the current status of the service, uptime, and the number of images currently waiting in the background download queue. It also provides cache statistics.
{
"service": "WikiMediaCache",
"version": "1.1.1",
"status": "ok",
"uptime": 1234.56,
"queueLength": 0,
"cachedFilesCount": 1500,
"cachedFilesSizeBytes": 20480000,
"activeClients24hCount": 42,
"cacheHitsCount": 1234,
"cacheMissesCount": 456,
"deadFilesCount": 5,
"apiErrorCount": 12
}
GET /cancel-prefetch?clientId=<uniqueId>
Silently removes all pending background download tasks from the queue for the specified clientId. This is useful when the user pans away to a different area and the old thumbnails are no longer needed.
Note: The server will also automatically drop queue items for clients that have not been active (sent any requests) in the last 120 seconds.
GET /cleanup-cache?region=svk&mode=outside
A background task that scans all downloaded thumbnails and deletes those that fall outside (or inside) a specific bounding box. This frees up disk space by removing globally downloaded images while keeping local ones (e.g. SVK/CZ).
Parameters:
- region: Use predefined bounding boxes. Available options: svk, svk_cz, or all (completely wipes the entire cache directory, bypassing spatial checks).
- bbox: Use a custom bounding box: minLon,minLat,maxLon,maxLat.
- mode: outside (default, deletes everything outside the bbox) or inside (deletes everything inside the bbox).
Security: This endpoint checks ctx.ip and rejects any request that doesn't originate from 127.0.0.1 or ::1 (returns 403 Forbidden).
Response:
Returns 202 Accepted immediately. The cleanup task runs asynchronously in the background.
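The per-thumbnail decision and the localhost check can be sketched as below. The function and type names are hypothetical, and the bounding box in the usage note is an arbitrary example, not the definition of any predefined region.

```typescript
// Hypothetical helpers illustrating the cleanup rules described above.
type CleanupMode = "outside" | "inside";

interface CleanupBbox {
  minLon: number;
  minLat: number;
  maxLon: number;
  maxLat: number;
}

function isInsideBbox(lon: number, lat: number, b: CleanupBbox): boolean {
  return lon >= b.minLon && lon <= b.maxLon && lat >= b.minLat && lat <= b.maxLat;
}

// mode "outside" deletes photos outside the bbox; mode "inside" deletes those inside it.
function shouldDelete(
  lon: number,
  lat: number,
  b: CleanupBbox,
  mode: CleanupMode = "outside",
): boolean {
  const inside = isInsideBbox(lon, lat, b);
  return mode === "outside" ? !inside : inside;
}

// Mirrors the documented rule: only 127.0.0.1 and ::1 are accepted.
function isLoopback(ip: string): boolean {
  return ip === "127.0.0.1" || ip === "::1";
}
```

With an example box of { minLon: 16, minLat: 47, maxLon: 23, maxLat: 50 }, a photo at lon 17.1077, lat 48.1486 is kept in the default outside mode and deleted in inside mode.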
GET /metrics
Exposes standard Prometheus-compatible metrics, perfect for monitoring via Grafana or Uptime Kuma.
# HELP wikimedia_cache_files_count Number of cached thumbnail files
# TYPE wikimedia_cache_files_count gauge
wikimedia_cache_files_count 1500
# HELP wikimedia_cache_size_bytes Total size of cached files
# TYPE wikimedia_cache_size_bytes gauge
wikimedia_cache_size_bytes 20480000
# HELP wikimedia_active_clients_24h Number of unique clients in the last 24h
# TYPE wikimedia_active_clients_24h gauge
wikimedia_active_clients_24h 42
# HELP wikimedia_cache_hits_total Total number of cache hits
# TYPE wikimedia_cache_hits_total counter
wikimedia_cache_hits_total 1234
# HELP wikimedia_cache_misses_total Total number of cache misses
# TYPE wikimedia_cache_misses_total counter
wikimedia_cache_misses_total 456
# HELP wikimedia_api_errors_total Total number of upstream API errors
# TYPE wikimedia_api_errors_total counter
wikimedia_api_errors_total 12
# HELP wikimedia_dead_files_total Total number of deleted files placeholder hits
# TYPE wikimedia_dead_files_total counter
wikimedia_dead_files_total 5
# HELP wikimedia_prefetch_queue_length Number of items waiting to be downloaded
# TYPE wikimedia_prefetch_queue_length gauge
wikimedia_prefetch_queue_length 0
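For ad-hoc checks without a Prometheus server, output like the above can be read with a small parser. The sketch below is an assumption-laden illustration: parseMetrics is a hypothetical helper and handles only unlabeled samples of the form "name value", as in the example output, not the full exposition format.

```typescript
// Hypothetical helper: parses unlabeled "name value" samples and skips # comments.
function parseMetrics(text: string): Map<string, number> {
  const out = new Map<string, number>();
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue; // HELP/TYPE lines and blanks
    const [name, value] = line.split(/\s+/);
    if (name === undefined || value === undefined) continue;
    const parsed = Number(value);
    if (!Number.isNaN(parsed)) out.set(name, parsed);
  }
  return out;
}
```

A monitoring script could then read parseMetrics(body).get("wikimedia_prefetch_queue_length") to see how many downloads are still pending.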
Requires PostgreSQL with the PostGIS extension.
sudo su - postgres
createuser freemap
createdb -E UTF8 -O freemap freemap
psql -d freemap -c "CREATE EXTENSION postgis;"
exit
Create a .env file in the project root:
DB_HOST=localhost
DB_PORT=5432
DB_USER=freemap
DB_PASSWORD=freemap
DB_NAME=freemap
PORT=4000
By default, the server requires the following HTTP header for endpoints (except /status and /metrics):
X-Freemap-API-Key: your_secret_api_key_here
Security (Hashed Keys):
Create a file named api_keys.txt in the root directory (next to package.json). Add one SHA-256 hash per line. The server will hash incoming API keys from clients and compare them against this list.
To generate a hash for a new key in your terminal:
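echo -n "my_super_secret_key" | sha256sum
Note that sha256sum prints the hash followed by two spaces and a dash (the marker for standard input); copy only the hexadecimal hash into api_keys.txt.
If api_keys.txt is missing or empty, the server will reject all API requests with 401 Unauthorized.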
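The hashed-key check can be sketched with Node's built-in crypto module. The function names are hypothetical, and tolerating a trailing " -" on each line is a convenience of this sketch, not documented server behavior.

```typescript
import { createHash } from "node:crypto";

// SHA-256 of a UTF-8 string, as lowercase hex (same value that sha256sum prints).
function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

// Hypothetical parser for api_keys.txt content: one hash per line.
// Assumption: only the first whitespace-separated token of each line is used.
function parseKeyFile(content: string): Set<string> {
  const hashes = new Set<string>();
  for (const line of content.split("\n")) {
    const token = line.trim().split(/\s+/)[0];
    if (token) hashes.add(token.toLowerCase());
  }
  return hashes;
}

// Returns false for a missing key or an empty list, which maps to 401 Unauthorized.
function isAuthorized(apiKey: string | undefined, allowed: Set<string>): boolean {
  if (!apiKey || allowed.size === 0) return false;
  return allowed.has(sha256Hex(apiKey));
}
```

Storing only hashes means that a leaked api_keys.txt does not directly reveal the keys that clients send in the X-Freemap-API-Key header.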
The table and spatial index are created automatically on first start.
npm install
npx tsx src/index.ts
Download and import the full Wikimedia Commons geo tags dump (run once, then monthly):
cd /opt/WikiMediaCache
NODE_OPTIONS="--max-old-space-size=4096" npx tsx src/importDump.ts
This streams commonswiki-latest-geo_tags.sql.gz (~700 MB) from dumps.wikimedia.org, filters primary Earth coordinates, and upserts them into the database. Currently only type === 'camera' is imported; other types available on Wikimedia include object, landmark, church, city, mountain, etc. (see the Tools section below to analyze the dump yourself). A full import takes roughly 10–20 minutes depending on connection speed.
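The row filter described above might look like the following sketch. The column names gt_page_id and gt_type appear elsewhere in this document; gt_globe and gt_primary are assumed from the MediaWiki GeoData geo_tags table, and the GeoTagRow type and keepRow function are hypothetical.

```typescript
// Hypothetical shape of one parsed geo_tags row (only the columns used here).
interface GeoTagRow {
  gt_page_id: number;
  gt_globe: string; // assumed column: "earth" for terrestrial coordinates
  gt_primary: number; // assumed column: 1 marks the page's primary coordinate
  gt_lat: number;
  gt_lon: number;
  gt_type: string | null;
}

// Keeps only primary Earth coordinates of type "camera".
function keepRow(row: GeoTagRow): boolean {
  return row.gt_primary === 1 && row.gt_globe === "earth" && row.gt_type === "camera";
}
```

Importing another type would, in this sketch, only require widening the gt_type comparison.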
This script includes several advanced mechanisms to ensure reliability:
- NODE_OPTIONS="--max-old-space-size=4096": Required parameter. Processing and parsing the 700 MB SQL dump requires a significant amount of memory. This flag allows Node.js to use up to 4 GB of RAM (the default limit is 2 GB, which may cause the script to crash with an Out of Memory error).
- Resume download: If the download fails or is interrupted, the script creates a /tmp/geo_tags.sql.gz.part file. On the next run, it automatically resumes the download from the exact byte where it left off.
- Checkpointing: The database insertion saves its state (last processed record ID) to /tmp/geo_tags_import.checkpoint after each batch. If the script crashes or is terminated, it will resume from the exact position where it stopped.
- Redownload: The script does not download the dump again if the /tmp/geo_tags.sql.gz file already exists. To force a redownload of a fresh dump, you must manually delete it first: rm /tmp/geo_tags.sql.gz
- Deduplication (ON CONFLICT protection): The SQL dump often contains the same photo several times in a row. The script deduplicates each batch before sending it to PostgreSQL, because a single INSERT ... ON CONFLICT DO UPDATE statement fails if it would update the same row twice.
- Automatic Cache Cleanup: After the database import completes, the script scans the /cache directory. It cross-references all downloaded thumbnails with the newly imported database records. Any thumbnail whose page_id no longer exists in the updated Wikimedia dump (e.g., deleted images) is automatically unlinked and permanently deleted from the disk to free up space.
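Two of the mechanisms above, batch deduplication and download resumption, can be sketched as small pure functions. The names are hypothetical, and keeping the last occurrence of a duplicated pageId is an assumption about which copy wins.

```typescript
// Hypothetical row shape for one photo about to be upserted.
interface PhotoRow {
  pageId: number;
  lat: number;
  lon: number;
}

// Removes duplicate pageIds within one batch so that a single upsert statement
// never touches the same row twice. Assumption: the last occurrence wins.
function dedupeBatch(rows: PhotoRow[]): PhotoRow[] {
  const byId = new Map<number, PhotoRow>();
  for (const row of rows) byId.set(row.pageId, row);
  return Array.from(byId.values());
}

// HTTP Range header for continuing a partial download from a given byte offset.
function resumeHeaders(bytesOnDisk: number): Record<string, string> {
  return bytesOnDisk > 0 ? { Range: `bytes=${bytesOnDisk}-` } : {};
}
```

In this sketch, resumeHeaders would be given the current size of the .part file, and an empty object means the download starts from the beginning.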
Run monthly to stay in sync with the Wikimedia Commons dump schedule (dumps are published around the 1st of each month):
0 3 2 * * cd /opt/WikiMediaCache && NODE_OPTIONS="--max-old-space-size=4096" npx tsx src/importDump.ts >> /var/log/wikimediacache-import.log 2>&1
The tools directory contains utility Python scripts for analyzing and debugging the raw Wikimedia data dump:
- analyze.py: Reads the SQL dump stream and aggregates the top 100 gt_type and gt_country values. Useful for checking what kind of tags exist in the dump.
  python3 tools/analyze.py /tmp/geo_tags.sql.gz
- find.py: Fast search utility to find all raw SQL columns for a specific gt_page_id without loading the whole file into memory. Useful for debugging specific photos.
  python3 tools/find.py /tmp/geo_tags.sql.gz <pageId>