Crawling Files

sonar-catalog crawl /mnt/sonar-nas-01

This walks the directory tree, hashes each file, resolves its NFS origin, detects the sonar format, and inserts everything into the catalog.
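Conceptually, the crawl is a walk-hash-record loop. The sketch below is illustrative Python, not the tool's actual internals: the hash algorithm (SHA-256 here) and the record fields are assumptions, and the real crawler additionally resolves the NFS origin and detects the sonar format before inserting.

import hashlib
import os

def hash_file(path, chunk_size=1 << 20):
    # Full-content hash; SHA-256 is an assumption chosen for illustration.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def crawl(root):
    """Conceptual walk-hash-record loop; not the tool's actual internals."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            # In the real tool, the record would also carry the resolved
            # NFS origin (host/export) and the detected sonar format.
            yield {
                "path": path,
                "size": st.st_size,
                "mtime": st.st_mtime,
                "digest": hash_file(path),
            }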

If the path isn’t on an NFS mount (or mount resolution fails), you can manually specify the host:

sonar-catalog crawl /data/survey --host sonar-server-01 --ip 192.168.1.10

Combine discovery and crawling in one step:

sonar-catalog crawl-all

This runs the full discovery engine, then crawls every accessible NFS mount.

By default, incremental mode is enabled. On subsequent crawls, files whose mtime and size haven’t changed are skipped entirely — no hashing needed.
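A minimal sketch of that skip test, assuming the catalog stores the size and mtime seen on the previous crawl (the actual schema may differ):

import os

def unchanged(path, prev):
    """Incremental-mode skip test: `prev` is the catalog entry from the
    previous crawl, assumed to hold that crawl's size and mtime."""
    if prev is None:
        return False  # new file: must be hashed
    st = os.stat(path)
    return st.st_size == prev["size"] and st.st_mtime == prev["mtime"]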

Disable for a full rescan:

{
  "crawler": {
    "incremental": false
  }
}

Only crawl specific file types:

{
  "crawler": {
    "include_extensions": [".xtf", ".jsf", ".s7k", ".all", ".kmall"]
  }
}

Limit by file size, in bytes (here a 1 KiB minimum and a 10 GiB maximum):

{
  "crawler": {
    "min_file_size": 1024,
    "max_file_size": 10737418240
  }
}

Exclude directories from the crawl:

{
  "crawler": {
    "exclude_dirs": [".git", ".svn", "__pycache__", ".Trash", "lost+found"]
  }
}

Performance settings:

Setting               Default     Description
hash_workers          4           Parallel hashing threads
batch_size            1000        Files per DB batch insert
partial_hash_size     4194304     Bytes used for the partial fingerprint (4 MB)
checkpoint_interval   5000        Files between checkpoint saves
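
The partial fingerprint hashes only the first partial_hash_size bytes of a file, a cheap pre-screen before committing to a full-content hash. A sketch under that assumption (hash algorithm and function name are illustrative):

import hashlib

def partial_fingerprint(path, partial_hash_size=4 * 1024 * 1024):
    """Hash only the first `partial_hash_size` bytes (4 MB default).

    Illustrative: a cheap pre-filter so only files whose prefixes
    collide need a full-content hash.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        h.update(f.read(partial_hash_size))
    return h.hexdigest()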

For large NFS shares (millions of files), increase batch_size, checkpoint_interval, and hash_workers:

{
  "crawler": {
    "batch_size": 5000,
    "checkpoint_interval": 25000,
    "hash_workers": 8
  }
}
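
To see why larger batches and checkpoint intervals help at this scale, here is an illustrative batching loop: fewer commits means less per-file round-trip overhead, and fewer checkpoint writes means less bookkeeping. SQLite and the table schema are assumptions made for the sketch; the real catalog backend may differ.

import json
import sqlite3  # e.g. conn = sqlite3.connect("catalog.db"); backend is assumed

def insert_batched(conn, records, batch_size=1000, checkpoint_interval=5000,
                   checkpoint_path="crawl.checkpoint"):
    """Sketch: group inserts into batches and checkpoint progress
    so an interrupted crawl can resume. Schema is illustrative."""
    batch, processed = [], 0
    for rec in records:
        batch.append((rec["path"], rec["size"], rec["digest"]))
        if len(batch) >= batch_size:
            conn.executemany(
                "INSERT INTO files (path, size, digest) VALUES (?, ?, ?)",
                batch,
            )
            conn.commit()  # one commit per batch, not per file
            processed += len(batch)
            batch.clear()
            if processed % checkpoint_interval == 0:
                # Persist progress so a restart can skip completed work.
                with open(checkpoint_path, "w") as f:
                    json.dump({"processed": processed}, f)
    if batch:  # flush the final partial batch
        conn.executemany(
            "INSERT INTO files (path, size, digest) VALUES (?, ?, ?)",
            batch,
        )
        conn.commit()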