🗂️ Projects Overview

All system components at a glance

Core Infrastructure

DATAHUB (PostgreSQL DB + SOLR)

Relational PostgreSQL database storing all entity data, with the schema managed by Flyway migrations. Includes a SOLR search engine for indexing and querying entities.

Main Application

QMODUS (PHP)

Core project including Online Innovatiespotter, Bedrijfsspotter, API, Q-nection, Dashboard, Internal Dashboard, Admin, Subscription Management, and cron jobs via scripts/CLI.

Innovatiespotter New (WordPress)

Main commercial website, built with WordPress, Elementor Pro, and Martex Jthemes.

Data Pipeline

DFE / Digital Footprint Explorer (Python/SQLAlchemy)

Covers the Website Guessing Pipeline/Process (WGP), queue coordination (determines scraping priority and queues websites), Working URL validation, and the event-driven web orchestration layer.
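The queue coordination step can be pictured as a priority queue. The sketch below is only an illustration under assumptions: the class and method names (`ScrapeQueue`, `enqueue`, `next_url`) and the "lower number means scraped sooner" convention are hypothetical and not taken from DFE itself.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedWebsite:
    # Ordering uses (priority, seq); the URL itself is excluded from comparison.
    priority: int
    seq: int
    url: str = field(compare=False)


class ScrapeQueue:
    """Hypothetical in-memory queue: lower priority value is scraped first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps insertion order among equal priorities.
        self._counter = itertools.count()

    def enqueue(self, url, priority):
        heapq.heappush(self._heap, QueuedWebsite(priority, next(self._counter), url))

    def next_url(self):
        # Returns None when nothing is queued.
        return heapq.heappop(self._heap).url if self._heap else None
```

Websites with equal priority come out in the order they were queued, which keeps the behaviour predictable when many sites share the same score.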

Datahunter (NestJS/TypeScript)

Distributed web scraping service with dynamic scrapers that adapt to different website structures. Operates two isolated instances: dh_full (processes Correct URLs with comprehensive extraction) and dh_fast (processes Working URLs, optimized for speed). Uses RabbitMQ for task distribution and PostgreSQL for queue orchestration, timestamps, and logging. Extracted content is stored as compressed files and metadata as structured JSON files.
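As a rough illustration of how tasks might be split between the two instances, the sketch below maps a URL status to an instance name and builds a JSON task body. Only the instance names dh_full and dh_fast and their URL types come from the description above; the status strings, the message fields, and the function name `build_task` are assumptions made for this example.

```python
import json

# Instance names come from the overview; the status keys are hypothetical labels.
INSTANCE_FOR_STATUS = {
    "correct": "dh_full",  # Correct URLs: comprehensive extraction
    "working": "dh_fast",  # Working URLs: optimized for speed
}


def build_task(url: str, url_status: str) -> tuple[str, bytes]:
    """Return (instance name, encoded JSON body) for one scraping task."""
    try:
        instance = INSTANCE_FOR_STATUS[url_status]
    except KeyError:
        raise ValueError(f"unknown URL status: {url_status!r}") from None
    # Assumed message shape; the real task format is not documented here.
    body = json.dumps({"url": url, "instance": instance}).encode("utf-8")
    return instance, body
```

With a RabbitMQ client such as pika, the returned instance name could serve as the routing key when publishing the body, assuming one queue per instance.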

ML-API (Python/Flask)

Machine learning classification service that uses RNN/LLM models for website categorization, generates domain summaries (LLM), and provides the event-driven ML orchestration layer.

Supporting & Development Projects

Spotbot Indexer (Java/Gradle)

Indexes the database into SOLR with "bedrijf" (company) as the root document (main entity ID). Related data such as websites and labels are child documents, and all scraped website content is concatenated into the `q_website` field.
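The resulting document shape can be sketched as follows. This is an illustration in Python rather than the indexer's actual Java code. It uses the `_childDocuments_` key that SOLR accepts for nested documents in JSON updates; every field name except `q_website`, and the placement of `q_website` on the root document, is an assumption.

```python
def build_bedrijf_document(bedrijf_id, websites, labels):
    """Build one nested document with "bedrijf" as root (hypothetical field names)."""
    children = []
    for site in websites:
        children.append({
            "id": f"{bedrijf_id}-website-{site['id']}",
            "type": "website",
            "url": site["url"],
        })
    for label in labels:
        children.append({
            "id": f"{bedrijf_id}-label-{label}",
            "type": "label",
            "name": label,
        })
    return {
        "id": str(bedrijf_id),
        "type": "bedrijf",
        # All scraped website content concatenated into a single field.
        "q_website": " ".join(s["content"] for s in websites if s.get("content")),
        "_childDocuments_": children,
    }
```

Keeping the concatenated text on the root document means a full-text query can match a company directly, without a parent/child join at query time.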

ML Models / Innovative Insight Engine (Notebooks)

Notebooks for the development of machine learning models for website classification (preprocessing, model training, and output analysis) and local experiments.

ref_scrapers / Innovatie-Ecosystem

Netlists: awards, subsidies, networks, and more (scraper, matchfinder).

Ansible (IaC)

Overview, management, setup, configuration, and automation of the Spotter servers via Ansible (Infrastructure as Code), plus cron job scheduling.