Core Infrastructure
DATAHUB (PostgreSQL DB + SOLR)
Multi-relation PostgreSQL database that stores all entity data, with schema changes managed by Flyway migrations. Includes a SOLR search engine for indexing and querying entities.
Main Application
QMODUS (PHP)
Core project including Online Innovatiespotter, Bedrijfsspotter, API, Q-nection, Dashboard, Internal Dashboard, Admin, Subscription Management, and cron jobs via scripts/CLI.
Innovatiespotter New (WordPress)
Main commercial website, built with WordPress, Elementor Pro, and the Martex theme (Jthemes).
Data Pipeline
DFE / Digital Footprint Explorer (Python/SQLAlchemy)
Runs the Website Guessing Pipeline/Process (WGP), coordinates the scraping queue (determines scraping priority and queues websites), validates Working URLs, and provides the event-driven web orchestration layer.
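The queue coordination described above can be illustrated with a minimal sketch. It assumes that a lower priority number means "scrape sooner" and that ties are served in insertion order; the `ScrapeQueue` class, its method names, and the priority values are hypothetical and are not taken from the DFE codebase.

```python
import heapq


class ScrapeQueue:
    """Hypothetical in-memory priority queue for websites awaiting scraping."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: equal priorities leave in insertion order

    def enqueue(self, url, priority):
        # Assumed convention: lower priority value = scraped sooner.
        heapq.heappush(self._heap, (priority, self._counter, url))
        self._counter += 1

    def next_url(self):
        # Return the most urgent URL, or None when the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = ScrapeQueue()
queue.enqueue("https://example.com", priority=5)
queue.enqueue("https://example.org", priority=1)
first = queue.next_url()  # the priority-1 URL is handed out first
```

In the real system the queue state lives in PostgreSQL and RabbitMQ rather than in process memory; the sketch only shows the ordering idea.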
Datahunter (NestJS/TypeScript)
Distributed web scraping service with dynamic scrapers that adapt to different website structures. Operates two isolated instances: dh_full (processes Correct URLs with comprehensive extraction) and dh_fast (processes Working URLs, optimized for speed). Uses RabbitMQ for task distribution and PostgreSQL for queue orchestration, timestamps, and logging. Extracted content is stored as compressed files, and metadata as structured JSON files.
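As a rough illustration of the split between the two instances and of the storage layout, the sketch below routes a URL by its status and writes one compressed content file plus one JSON metadata file. It is written in Python for brevity, although Datahunter itself is NestJS/TypeScript. The status labels `"correct"` and `"working"`, the use of gzip, the file naming, and both function names are assumptions made for the example.

```python
import gzip
import json
from pathlib import Path

# Hypothetical mapping from URL status to the Datahunter instance that handles it.
INSTANCE_BY_STATUS = {"correct": "dh_full", "working": "dh_fast"}


def pick_instance(url_status):
    """Return the instance name for a URL status, or raise on an unknown status."""
    if url_status not in INSTANCE_BY_STATUS:
        raise ValueError(f"unknown URL status: {url_status!r}")
    return INSTANCE_BY_STATUS[url_status]


def store_result(out_dir, name, content, metadata):
    """Write content as a gzip file and metadata as JSON; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    content_path = out_dir / f"{name}.html.gz"  # assumed naming scheme
    meta_path = out_dir / f"{name}.json"
    with gzip.open(content_path, "wt", encoding="utf-8") as fh:
        fh.write(content)
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return content_path, meta_path
```

Keeping the metadata outside the compressed file means it can be read or filtered without decompressing the page content.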
ML-API (Python/Flask)
Machine learning classification service that uses RNN/LLM models for website categorization, generates domain summaries (LLM), and provides the event-driven ML orchestration layer.
Supporting & Development Projects
Spotbot Indexer (Java/Gradle)
Indexes the database into SOLR with "bedrijf" as the root document (main entity ID). Related data such as websites and labels are child documents, and all scraped website content is concatenated into the `q_website` field.
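The document shape described above might look roughly like the following sketch, written in Python for brevity even though the indexer itself is Java. Only `q_website` and the "bedrijf" root come from the description; the `type`, `url`, and `name` fields, the ID scheme, the use of the `_childDocuments_` key, and the placement of `q_website` on the root document are assumptions for illustration.

```python
def build_bedrijf_document(bedrijf_id, websites, labels):
    """Build one nested document: a bedrijf root with website and label children."""
    children = []
    for site in websites:
        children.append({
            "id": f"{bedrijf_id}-website-{site['id']}",  # hypothetical ID scheme
            "type": "website",
            "url": site["url"],
        })
    for label in labels:
        children.append({
            "id": f"{bedrijf_id}-label-{label}",
            "type": "label",
            "name": label,
        })
    return {
        "id": str(bedrijf_id),
        "type": "bedrijf",
        # All scraped content of all websites, joined into one searchable field.
        "q_website": " ".join(s["content"] for s in websites if s.get("content")),
        "_childDocuments_": children,
    }


doc = build_bedrijf_document(
    42,
    websites=[
        {"id": 1, "url": "https://example.com", "content": "solar panels"},
        {"id": 2, "url": "https://example.org", "content": "battery storage"},
    ],
    labels=["energy"],
)
```

With this layout a single full-text query against `q_website` matches a company regardless of which of its websites contained the text.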
ML Models / Innovative Insight Engine (Notebooks)
Notebooks for the development of machine learning models for website classification (preprocessing, model training, and output analysis) and local experiments.
ref_scrapers / Innovatie-Ecosystem
Scrapers and matchfinders for netlists: awards, subsidies, networks, and more.
Ansible (IaC)
Overview, management, setup, configuration, and automation of the Spotter servers via Ansible (Infrastructure as Code), plus cron job scheduling.