System Architecture
Database Table Abbreviations
- B - bedrijf
- BA - bedrijf_adres
- BIQ - bedrijf_index_queue
- BK - bedrijf_kvk
- BL - bedrijf_label
- BN - bedrijf_nieuw
- BT - bedrijf_taxonomie (SBI codes and rechtsvorm, i.e. legal form)
- BW - bedrijf_website
- BWQ - bedrijf_website_queue
- DH - datahunter_log
- L - label
- LR - label_relations
- N - netlist
- NM - netlist_match
- NR - netlist_record
- W - website
- WG - website_guesses
- WL - website_label
- WQ - website_queue
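The abbreviations above can be kept as a small lookup table, so that shorthand such as "BW, BWQ, W" in the sections that follow expands mechanically. This is a minimal sketch: the names `TABLE_ABBREVIATIONS` and `expand` are hypothetical and not part of the codebase; only the abbreviation-to-table pairs come from the list above.

```python
# Abbreviation -> table name, copied from the list above.
# The dict name and the helper below are illustrative assumptions.
TABLE_ABBREVIATIONS = {
    "B": "bedrijf",
    "BA": "bedrijf_adres",
    "BIQ": "bedrijf_index_queue",
    "BK": "bedrijf_kvk",
    "BL": "bedrijf_label",
    "BN": "bedrijf_nieuw",
    "BT": "bedrijf_taxonomie",
    "BW": "bedrijf_website",
    "BWQ": "bedrijf_website_queue",
    "DH": "datahunter_log",
    "L": "label",
    "LR": "label_relations",
    "N": "netlist",
    "NM": "netlist_match",
    "NR": "netlist_record",
    "W": "website",
    "WG": "website_guesses",
    "WL": "website_label",
    "WQ": "website_queue",
}


def expand(abbreviations: str) -> list[str]:
    """Expand a comma-separated string such as "BW, BWQ, W" into table names."""
    names = []
    for token in abbreviations.split(","):
        key = token.strip()
        if key not in TABLE_ABBREVIATIONS:
            raise KeyError(f"unknown table abbreviation: {key!r}")
        names.append(TABLE_ABBREVIATIONS[key])
    return names
```

For example, `expand("BW, BWQ, W, WG")` yields the four tables that DFE reads and writes.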
CRUD Operations with Database
- Spotbot Indexer: Reads BIQ | Writes timestamp to B
- DFE: Reads/Writes BW, BWQ, W, WG
- Datahunter: Reads WQ | Writes datahunter_date timestamp to W and NFS | Reads/Writes DH
- ML-API: Reads B, BA, BK, BW, W and from NFS | Writes WL, BL and to NFS
- ref_scrapers: Reads/Writes N, NR, NM, BL, W, BW, LR
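The read/write list above can also be expressed as an access matrix, which makes it easy to ask which components write to a given table. The sketch below is illustrative only: `ACCESS` and `writers_of` are hypothetical names, NFS is treated as a pseudo-resource alongside the tables, and "Reads/Writes" entries are assumed to mean both read and write access to every listed table.

```python
# Component -> resources it reads and writes, transcribed from the list above.
# "NFS" is not a database table; it is included here as a pseudo-resource.
_DFE = {"BW", "BWQ", "W", "WG"}
_REF = {"N", "NR", "NM", "BL", "W", "BW", "LR"}

ACCESS = {
    "Spotbot Indexer": {"reads": {"BIQ"}, "writes": {"B"}},
    "DFE": {"reads": set(_DFE), "writes": set(_DFE)},
    "Datahunter": {"reads": {"WQ", "DH"}, "writes": {"W", "NFS", "DH"}},
    "ML-API": {
        "reads": {"B", "BA", "BK", "BW", "W", "NFS"},
        "writes": {"WL", "BL", "NFS"},
    },
    "ref_scrapers": {"reads": set(_REF), "writes": set(_REF)},
}


def writers_of(resource: str) -> list[str]:
    """Return the components that write to the given table or resource."""
    return sorted(
        name for name, access in ACCESS.items() if resource in access["writes"]
    )


def readers_of(resource: str) -> list[str]:
    """Return the components that read from the given table or resource."""
    return sorted(
        name for name, access in ACCESS.items() if resource in access["reads"]
    )
```

Under these assumptions, `writers_of("W")` lists DFE, Datahunter and ref_scrapers, which shows that the website table has several writers and is a natural place to check when changing its schema.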
Server Deployment (Ansible)
Seven servers are managed with Ansible playbooks:
- albus - SOLR, Spotbot Indexer, dh_full (Datahunter instance)
- hagrid - NFS (website scraped content and domain summaries)
- hermione - Database (PostgreSQL)
- luna - GitLab
- ron - Web host (any page except "website") and QMODUS cronjobs
- severus - ML-API, dh_fast (Datahunter instance), ref_scrapers, DFE, Web-ML orchestration
- spotter - Ansible deployments, VPN
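The server layout above can be captured as a mapping from host to services, for instance to find where a component runs before deploying it. This is a sketch only: `SERVERS` and `hosts_of` are hypothetical names, the service labels are shortened from the list above, and nothing here reflects the actual Ansible inventory format used in the project.

```python
# Host -> services, transcribed from the server list above.
# Labels are shortened; the structure itself is an illustrative assumption.
SERVERS = {
    "albus": ["SOLR", "Spotbot Indexer", "dh_full"],
    "hagrid": ["NFS"],
    "hermione": ["PostgreSQL"],
    "luna": ["GitLab"],
    "ron": ["Web host", "QMODUS cronjobs"],
    "severus": ["ML-API", "dh_fast", "ref_scrapers", "DFE", "Web-ML orchestration"],
    "spotter": ["Ansible deployments", "VPN"],
}


def hosts_of(service: str) -> list[str]:
    """Return every host that runs the given service, in declaration order."""
    return [host for host, services in SERVERS.items() if service in services]
```

For example, `hosts_of("dh_fast")` and `hosts_of("dh_full")` return different hosts, reflecting that the two Datahunter instances run on severus and albus respectively.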