🏆 ref_scrapers

Innovation ecosystem netlists and company matching

Netlists Definition

What are Netlists?

Purpose: Lists compiled by the business team that together define the innovation ecosystem

Examples: Awards, subsidies, networks, innovation programs, and other ecosystem connections

Source: External innovation ecosystem data (see Data Sources page)

Data Acquisition

Netlist Record Sources

Two methods for adding netlist records:

  • Manual CSV import: Business team uploads CSV files with netlist records
  • Scraper: Automated scraping of external sources (currently performs poorly, so its usefulness is limited)
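A minimal sketch of the manual CSV route, for illustration only. The column names (netlist, company_name, website) and the NetlistRecord type are assumptions; the actual CSV layout used by the business team is not described on this page.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class NetlistRecord:
    netlist: str        # hypothetical column: name of the netlist (e.g. an award)
    company_name: str   # hypothetical column: company name as written in the source list
    website: str        # hypothetical column: may be empty


def parse_netlist_csv(text: str) -> list[NetlistRecord]:
    """Parse CSV text into records, skipping rows that have no company name."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        name = (row.get("company_name") or "").strip()
        if not name:
            continue  # a row without a company cannot be matched later
        records.append(
            NetlistRecord(
                netlist=(row.get("netlist") or "").strip(),
                company_name=name,
                website=(row.get("website") or "").strip(),
            )
        )
    return records
```

Rows without a company name are dropped here because they could not be matched afterwards; whether the real import does the same is an assumption.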

Company Matching

Matchfinder Process

Purpose: Match netlist records with entities in the database

Outcome: When the match is highly confident and the company has no website on record:

  • Website added to BW (bedrijf_website)
  • Website added to W (website)
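The outcome rule above can be sketched as a small decision function. This is an illustration under assumptions, not the Matchfinder implementation: the Match type, the 0.95 threshold, and the idea that the website comes from the netlist record are all hypothetical. Only the table names bedrijf_website and website come from this page.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.95  # hypothetical value for "very confident"


@dataclass
class Match:
    company_id: int
    confidence: float                # hypothetical score in [0, 1]
    netlist_website: Optional[str]   # website found on the netlist record (assumed source)
    known_website: Optional[str]     # website already stored for the company, if any


def websites_to_add(match: Match, threshold: float = CONFIDENCE_THRESHOLD) -> list[tuple[str, int, str]]:
    """Return (table, company_id, url) rows to write, or an empty list if the rule does not apply."""
    if match.confidence < threshold:
        return []  # match is not confident enough
    if match.known_website:
        return []  # company already has a website
    if not match.netlist_website:
        return []  # nothing to add
    return [
        ("bedrijf_website", match.company_id, match.netlist_website),
        ("website", match.company_id, match.netlist_website),
    ]
```

The function only decides what to write; the actual inserts into BW and W, and the order in which they must happen, are left to the caller.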

Sector Classification

Label Relations (LR Table)

Table: label_relations (LR)

Purpose: Lets the business team define which sectors are relevant for each netlist

Examples:

  • sector_duurzaamheid: Netlist classified as sustainable
  • sector Agrifood: Netlist classified as Agrifood-related

SOLR indexing: Indexed under netlist_sectors for efficient filtering
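As an illustration of filtering on this index, the sketch below builds standard Solr query parameters with a filter query (fq) on netlist_sectors. It assumes netlist_sectors holds the sector label names as plain string values; the helper name sector_filter_params is hypothetical.

```python
from urllib.parse import urlencode


def sector_filter_params(sectors: list[str]) -> dict[str, str]:
    """Build Solr query parameters that keep only documents in any of the given sectors."""
    # Quote each value so it is treated as an exact phrase; escape embedded quotes.
    quoted = " OR ".join('"{}"'.format(s.replace('"', '\\"')) for s in sectors)
    return {"q": "*:*", "fq": "netlist_sectors:({})".format(quoted)}


# Example: URL-encoded query string for one sector.
query_string = urlencode(sector_filter_params(["sector_duurzaamheid"]))
```

Using fq rather than q lets Solr cache the filter independently of the main query, which is typically why sector fields are indexed for filtering.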

Sustainability Processing

sync_duurzaamheid Cronjob

Purpose: Sync sustainability classifications to SOLR

Process:

  1. Read the label table (each netlist is stored as a ref_ label)
  2. Check whether the label's omschrijving (description) contains sustainable = true
  3. Update SOLR sustainability filters accordingly
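The first two steps can be sketched as follows. This is a rough illustration, not the cronjob's code: the Label type is hypothetical, and it assumes the flag appears literally as text such as "sustainable = true" inside omschrijving. The actual storage format is not specified here.

```python
import re
from dataclasses import dataclass

# Assumed textual form of the flag inside the description.
SUSTAINABLE_FLAG = re.compile(r"\bsustainable\s*=\s*true\b", re.IGNORECASE)


@dataclass
class Label:
    name: str           # netlists are labels whose name starts with "ref_"
    omschrijving: str   # free-text description of the label


def sustainable_netlists(labels: list[Label]) -> list[str]:
    """Return the names of netlist labels whose description marks them as sustainable."""
    return [
        label.name
        for label in labels
        if label.name.startswith("ref_")
        and SUSTAINABLE_FLAG.search(label.omschrijving or "")
    ]
```

The resulting names would then drive step 3, the update of the sustainability filters in SOLR, which is omitted here because the index schema is not documented on this page.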