📖 Business Glossary

Business terminology and platform features

Entity Types

⚠️ Entity (SOLR/SQL field/table bedrijf)
All records, with or without an address, including stichting (foundation), vereniging (association), and other legal forms. Also includes inactive records, such as bankrupt ones.
Be careful: this legacy IT name covers far more than what the Business team defines as "companies".
Registration / Registratie
All records (including stichting, vereniging, etc.) that are active and have an address.
Company
All records that are active and have an address, excluding certain rechtsvormen (legal forms) such as stichting and vereniging.
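The three record sets above are nested: every Company is a Registration, and every Registration is an Entity. A minimal Python sketch of that hierarchy follows. The `Record` fields (`rechtsvorm`, `active`, `has_address`) are hypothetical stand-ins for whatever the bedrijf table actually stores, and the exclusion set is deliberately partial because the glossary only names stichting and vereniging.

```python
from dataclasses import dataclass

# Partial list: the glossary names stichting and vereniging but the full
# set of excluded rechtsvormen is not specified here.
EXCLUDED_RECHTSVORMEN = {"stichting", "vereniging"}


@dataclass
class Record:
    # Hypothetical fields; the real bedrijf schema is not described here.
    name: str
    rechtsvorm: str
    active: bool
    has_address: bool


def is_registration(record: Record) -> bool:
    # Active records with an address, regardless of rechtsvorm.
    return record.active and record.has_address


def is_company(record: Record) -> bool:
    # Registrations minus the excluded rechtsvormen.
    return is_registration(record) and record.rechtsvorm.lower() not in EXCLUDED_RECHTSVORMEN


def classify(record: Record) -> str:
    """Return the narrowest term that applies: Company within Registration within Entity."""
    if is_company(record):
        return "Company"
    if is_registration(record):
        return "Registration"
    return "Entity"  # every record in bedrijf is at least an Entity


bv = Record("Example BV", "bv", active=True, has_address=True)
club = Record("Example Vereniging", "vereniging", active=True, has_address=True)
bankrupt = Record("Old BV", "bv", active=False, has_address=True)
print(classify(bv), classify(club), classify(bankrupt))  # Company Registration Entity
```

Ordering the checks from narrowest to broadest makes the "legacy Entity is much wider than Company" warning explicit: a record only falls through to "Entity" when it fails the stricter definitions.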
Startups
Companies founded less than 10 years ago that are (or might be) innovative.
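The age part of the Startups definition can be checked mechanically; the innovation part cannot, since "(or might be) innovative" is a judgment. This sketch, with the hypothetical helpers `years_between` and `is_startup`, assumes a founding date is available per record and takes innovativeness as an input flag.

```python
from datetime import date


def years_between(start: date, end: date) -> int:
    # Whole years elapsed, subtracting one if this year's anniversary has not passed yet.
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def is_startup(founded: date, innovative: bool, today: date) -> bool:
    # Founded less than 10 years ago AND (possibly) innovative.
    # How innovativeness is decided is not defined here, so it is passed in.
    return years_between(founded, today) < 10 and innovative


print(is_startup(date(2016, 6, 1), True, today=date(2026, 5, 31)))   # True (9 full years)
print(is_startup(date(2016, 5, 31), True, today=date(2026, 5, 31)))  # False (exactly 10 years)
```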
Innovative Registration/Company
What the client considers innovative, i.e. whatever is covered by the Innovatiethemas.

Innovation & Classification

Innovatiethema (Innovation Theme)
Our innovation labels: agrifood, bouw (construction), circulaire economie (circular economy), energie (energy), health (LSH), ICT, hightech (HTSM), logistiek (logistics), sociale impact (social impact), water.
Stored as public labels (im_public_*).
Note: "Topic" is the deprecated term for Innovatiethema.
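Since an innovative registration or company is one covered by an Innovatiethema, and themes are stored as `im_public_*` labels, a check can be sketched as below. The slugs after the prefix (for example `circulaire_economie`) are assumed spellings, not confirmed label names, and `themes_of` / `is_innovative` are hypothetical helpers.

```python
PUBLIC_PREFIX = "im_public_"

# The ten Innovatiethemas listed above. The slug spelling used in the real
# im_public_* labels is an assumption.
INNOVATIE_THEMAS = {
    "agrifood", "bouw", "circulaire_economie", "energie", "health",
    "ict", "hightech", "logistiek", "sociale_impact", "water",
}


def themes_of(labels: set[str]) -> set[str]:
    """Return the Innovatiethemas found among a company's public labels."""
    found = set()
    for label in labels:
        if label.startswith(PUBLIC_PREFIX):
            slug = label[len(PUBLIC_PREFIX):]
            if slug in INNOVATIE_THEMAS:
                found.add(slug)
    return found


def is_innovative(labels: set[str]) -> bool:
    # Innovative = covered by at least one Innovatiethema.
    return bool(themes_of(labels))


labels = {"im_public_energie", "im_public_water", "some_internal_label"}
print(sorted(themes_of(labels)))  # ['energie', 'water']
```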
Label
Classification markers attached to companies for categorization, filtering, and search.
Ecosystem
Refers to Netlists: awards, subsidies, networks, and more (ref_scrapers project).

Platform Components & Features

Online / Online Innovatiespotter
Client-facing platform for searching companies and accessing dashboards.
Bedrijfsspotter
Internal platform for company data management and processing.
Website
Refers to https://www.innovatiespotter.nl/, not the client platform.
Subscription
Clients with paid access to Online who can perform their own searches.
Dashboard
Client-specific dashboards created in Online.
Internal Dashboard
Dashboard showing user activity in Bedrijfsspotter (internal use only).
Admin
Bedrijfsspotter admin page for system management.
Qnection
Manual quality-control process for validating companies.
API
RESTful API providing clients with structured JSON access to entity data and search functionality.
Detailed documentation is available at https://online.innovatiespotter.nl/api/documentation.
Query (Business Context)
SOLR search queries. When the Business team talks about "queries", they mean SOLR, not SQL.
Query (IT Context)
Either an SQL or a SOLR query; specify which one is meant whenever the context does not make it clear.
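To make the business meaning of "query" concrete, here is a sketch that builds a SOLR select URL with the standard `q`, `fq`, `rows`, and `wt` request parameters. The endpoint host and core name are hypothetical. `url_active` and `url_char_threshold` are the SOLR fields named in this glossary; matching `url_active` with a `[* TO *]` existence filter assumes the field is populated only when a correct URL exists.

```python
from urllib.parse import urlencode

# Hypothetical SOLR endpoint; the real host and core name are not given here.
SOLR_SELECT = "http://localhost:8983/solr/bedrijf/select"


def solr_select_url(filters: list[str], rows: int = 10) -> str:
    """Build a SOLR select URL: match all documents, then narrow with fq filters."""
    params = [("q", "*:*")]
    params += [("fq", f) for f in filters]  # each fq narrows the result set further
    params += [("rows", str(rows)), ("wt", "json")]
    return f"{SOLR_SELECT}?{urlencode(params)}"


# Example business question: "entities with a correct URL whose scraped
# website passed the character threshold".
url = solr_select_url(["url_active:[* TO *]", "url_char_threshold:true"])
print(url)
```

Using `fq` for the fixed conditions keeps `q` free for the free-text part of a search; SOLR can also cache filter queries independently of the main query.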

Website Categories

Working URL [Priority 3]
Active and secure websites. Accessible only via SQL, not indexed in SOLR.
Correct URL [Priority 2] → url_active in SOLR
Active, secure, verified, and not blacklisted websites.
Scraped URL → url_scraped in SOLR
Websites with successfully scraped content stored in the database.
ML Eligible URL [Priority 1] → url_word_threshold in SOLR
Scraped URLs with more than 100 words, subject to additional filters on number of employees and rechtsvorm.
rescrape_priority label [Priority 0]
Manually flagged websites to be re-scraped with the highest priority.
dfe_priority label
Manually flagged companies to be processed by WGP (Website Guessing Process) with the highest priority.
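The bracketed tiers can be read as an ordering in which Priority 0 is scraped first. Below is a sketch of picking a website's tier from the category flags above. The boolean inputs and the function name `scrape_priority` are hypothetical, and `dfe_priority` is left out because it feeds WGP rather than the scraper.

```python
from typing import Optional


def scrape_priority(
    rescrape_flag: bool,
    ml_eligible: bool,
    correct: bool,
    working: bool,
) -> Optional[int]:
    """Return the most urgent tier that applies (0 = highest), or None.

    Hypothetical boolean summaries of the categories above:
      rescrape_flag -> rescrape_priority label  [Priority 0]
      ml_eligible   -> ML Eligible URL          [Priority 1]
      correct       -> Correct URL              [Priority 2]
      working       -> Working URL              [Priority 3]
    """
    if rescrape_flag:
        return 0
    if ml_eligible:
        return 1
    if correct:
        return 2
    if working:
        return 3
    return None  # not a working URL, so nothing to scrape


print(scrape_priority(False, False, True, True))  # 2
```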

Website Scraping

Character threshold to scrape → url_char_threshold in SOLR
500 characters minimum per website. The SOLR field is true if at least one active_url has more than 500 characters.
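The `url_char_threshold` rule ("at least one active_url has more than 500 characters") maps directly onto an `any` check. The input shape, a dict from active URL to its scraped character count, is an assumption.

```python
CHAR_THRESHOLD = 500


def url_char_threshold(chars_per_active_url: dict[str, int]) -> bool:
    """True if at least one active_url has more than 500 scraped characters."""
    return any(n > CHAR_THRESHOLD for n in chars_per_active_url.values())


print(url_char_threshold({"https://www.example.nl": 320, "https://shop.example.nl": 812}))  # True
print(url_char_threshold({"https://www.example.nl": 500}))  # False: 500 is not more than 500
```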
Word threshold for ML processing
100 words minimum.
url_scraped_chars in SOLR
Integer: total character count of all scraped content.
url_scraped_words in SOLR
Integer: total word count of all scraped content.
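The two integer fields are totals over all scraped content. In this sketch, counting words by whitespace split is an assumption (the real tokenizer is not documented here), and the word check uses the "more than 100 words" wording from ML Eligible URL; `scraped_totals` and `meets_word_threshold` are hypothetical names.

```python
WORD_THRESHOLD = 100


def scraped_totals(pages: list[str]) -> tuple[int, int]:
    """Return (url_scraped_chars, url_scraped_words) over all scraped pages.

    Splitting on whitespace to count words is an assumption.
    """
    chars = sum(len(p) for p in pages)
    words = sum(len(p.split()) for p in pages)
    return chars, words


def meets_word_threshold(words: int) -> bool:
    # ML eligibility needs more than 100 words; the extra filters on
    # employees and rechtsvorm are not modelled here.
    return words > WORD_THRESHOLD


pages = ["Welkom bij Example BV", "Wij maken slimme sensoren"]
print(scraped_totals(pages))  # (46, 8)
```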
Maximum pages per website
20 pages.
Maximum crawl depth
How many levels deep the scraper follows links from the homepage.
Example: the homepage www.example.nl/home is at depth 0; a page linked directly from it, such as www.example.nl/nl/products, is at depth 1.
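Crawl depth and the 20-page cap can be illustrated with a breadth-first crawl. This sketch runs over an in-memory link graph instead of fetching pages, the helper name `crawl_order` is hypothetical, and `max_depth` is a parameter because the glossary does not state its value.

```python
from collections import deque

MAX_PAGES = 20  # maximum pages per website, from the glossary


def crawl_order(links: dict[str, list[str]], homepage: str, max_depth: int,
                max_pages: int = MAX_PAGES) -> dict[str, int]:
    """Breadth-first crawl; returns {url: depth} for pages that would be fetched.

    `links` stands in for downloading a page and extracting its links.
    """
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        url = queue.popleft()
        depth = depths[url]
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target in depths:
                continue  # already discovered
            if len(depths) >= max_pages:
                return depths  # page budget exhausted
            depths[target] = depth + 1
            queue.append(target)
    return depths


site = {
    "www.example.nl/home": ["www.example.nl/nl/products", "www.example.nl/nl/contact"],
    "www.example.nl/nl/products": ["www.example.nl/nl/products/sensor"],
}
print(crawl_order(site, "www.example.nl/home", max_depth=1))
```

Breadth-first order matters here: it fills the page budget with shallow pages first, so a low page cap does not get spent on one deep branch.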
dh_full (Datahunter Full)
Scrapes Priority 0, 1, and 2 URLs completely, using the maximum number of pages and depth.
dh_fast (Datahunter Fast)
Scrapes Priority 3 URLs partially, with a limited number of pages and depth.
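The split between the two Datahunter modes follows the priority tiers from Website Categories. Here is a sketch of that routing with the hypothetical function `scraper_mode`; the page and depth limits for dh_fast are not specified here, so they are only described in comments.

```python
def scraper_mode(priority: int) -> str:
    """Pick the Datahunter mode for a URL's priority tier (0 = highest)."""
    if priority in (0, 1, 2):
        return "dh_full"  # complete crawl: maximum pages (20) and depth
    if priority == 3:
        return "dh_fast"  # partial crawl: limited pages and depth
    raise ValueError(f"unknown priority tier: {priority}")


print([scraper_mode(p) for p in range(4)])  # ['dh_full', 'dh_full', 'dh_full', 'dh_fast']
```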
url_summary in SOLR
LLM-generated summary of website content and company activities.