Skip to content

Database Schema

Sonar Catalog supports two backends:

BackendUse CaseSetup
SQLite (default)Single-machine, zero-configWorks out of the box
PostgreSQLMulti-machine, concurrent accessRequires pip install ".[postgres]"

Both backends use the same schema. SQLite additionally uses FTS5 for full-text search.

One row per unique content hash. The single source of truth for file identity.

ColumnTypeDescription
content_hashTEXT PKBLAKE3 or SHA-256 hash of full file contents
file_sizeBIGINTFile size in bytes
partial_hashTEXTHash of first partial_hash_size bytes (fingerprint)
hash_algorithmTEXT"blake3" or "sha256"
mime_typeTEXTMIME type (if python-magic is available)
file_typeTEXTOutput of file command
sonar_formatTEXTDetected sonar format ("xtf", "jsf", etc.)
first_seenTIMESTAMPWhen this hash was first cataloged
last_seenTIMESTAMPWhen this hash was last seen

Every place a file is accessible. Multiple rows per content hash if the file exists on multiple NFS servers.

ColumnTypeDescription
idBIGINT PKAuto-increment ID
content_hashTEXT FKReferences files.content_hash
nfs_serverTEXTOrigin NFS server hostname
nfs_exportTEXTServer export path (e.g., /export/survey)
remote_pathTEXTFull path on the origin server
canonical_pathTEXT UKserver:/full/path — the authoritative pointer
is_localBOOLEANTrue if on local disk, not NFS
access_pathTEXTLocal mount path used to reach the file
access_hostnameTEXTHostname of the machine that crawled this
mount_sourceTEXTHow the mount was resolved (proc_mounts, autofs_map, fstab, local)
file_nameTEXTFilename only
directoryTEXTParent directory on origin server

Extended metadata and navigation data. One row per content hash.

ColumnTypeDescription
content_hashTEXT PKReferences files.content_hash
metadataTEXTJSON blob containing track array and properties
lat_min, lat_maxREALLatitude bounds of nav track
lon_min, lon_maxREALLongitude bounds of nav track
lat_center, lon_centerREALTrack centroid
has_navINTEGER1 if nav data extracted, 0 otherwise

Discovered hosts and their scan status.

Scan history for tracking crawl runs.

-- Fast lookup by content hash
CREATE INDEX idx_loc_hash ON locations(content_hash);
-- Find files by server
CREATE INDEX idx_loc_server ON locations(nfs_server);
-- Find files by format
CREATE INDEX idx_files_format ON files(sonar_format);
-- Spatial queries on nav data
CREATE INDEX idx_meta_has_nav ON file_metadata(has_nav);
CREATE INDEX idx_meta_bbox ON file_metadata(lat_min, lat_max, lon_min, lon_max);
CREATE INDEX idx_meta_center ON file_metadata(lat_center, lon_center);

SQLite builds an FTS5 full-text index on filenames for fast search:

CREATE VIRTUAL TABLE IF NOT EXISTS files_fts USING fts5(
file_name, content_hash, content='', tokenize='unicode61'
);

Rebuild with:

Terminal window
sonar-catalog rebuild-index
{
"database": {
"backend": "postgresql",
"pg_host": "db.example.com",
"pg_port": 5432,
"pg_database": "sonar_catalog",
"pg_user": "sonar_catalog",
"pg_password": "secret"
}
}

Requires: pip install ".[postgres]"