I have two existing PowerShell scripts:

- cveindexer.ps1 (fetches all CVEs from the NIST/NVD database and stores them in NDJSON format)
- CPEMatcher.ps1 (generates CPEs from the information in the NDJSON data)

Goal: From a list of non-normalized vendor/product names, generate the correct CPE identifiers as used in the NIST CPE dictionary.

The main challenge: Vendor and product names in my source data are inconsistent, localized, or contain typos, compared to the official naming in the NIST CPE database.

Examples of issues I want to solve:

Different spellings / aliases
- Dell Inc. ↔ dell
- win.rar GmbH ↔ rarlab
- Notepad++ Team ↔ don_ho

Language / format variations
- Microsoft OneNote - de-de ↔ onenote
- Citrix Workspace 2405 ↔ workspace_app

Typos / “dirty” agent strings
- SOPHOS Network Thret Prot ↔ sophos:network_threat_protection

I already have a list of software components (non-normalized vendor/product names) that should be mapped to the correct CPEs.

What I’m Looking For

I’m looking for a freelancer who can:

- Design and implement a robust matching pipeline that maps my raw vendor/product names to valid CPEs from the NIST CPE dictionary.
- Combine classical approaches (fuzzy matching, aliases, tokenization, normalization, etc.) with a modern NLP model (e.g., DistilBERT) to improve match quality.
- Export or run the NLP model via ONNX so it can be integrated into PowerShell (e.g., from within CPEMatcher.ps1).
- Handle edge cases such as:
  - Multiple CPE candidates per input (ranking and choosing the best match)
  - Version handling where applicable
  - Language variants and common vendor/product synonyms

Technical Direction (my current idea)

- Use DistilBERT or a similar model to generate embeddings for:
  - Raw vendor/product strings from my list
  - Official CPE title/metadata from NIST
- Compute similarity (e.g., cosine similarity) to find the most likely CPE matches.
- Combine this with:
  - Fuzzy string matching (Levenshtein, Jaro-Winkler, etc.)
  - Custom alias dictionaries (e.g. win.rar GmbH → rarlab)
  - Normalization rules (lowercasing, removing locale suffixes like -de-de, etc.)
- Package the model as ONNX and demonstrate how to call it from PowerShell.

I’m open to your suggestions on the best architecture, tools, or libraries, as long as the final solution is automatable, reproducible, and works well with PowerShell and the NVD/NIST data.

Deliverables

- Updated or new CPEMatcher.ps1 (or additional scripts/modules) that:
  - Takes my list of non-normalized software components as input
  - Outputs the corresponding CPE(s) with a confidence score
- A documented matching pipeline, including:
  - How the model is trained/fine-tuned (if applicable)
  - How the ONNX model is generated and integrated
  - How to update the NVD/CPE data and re-run the matching
- Optional: A small evaluation report or script that:
  - Measures matching quality on a sample set
  - Shows examples of correct/incorrect matches for validation

Required Skills

- Strong experience with NLP and text similarity (BERT/DistilBERT or similar models)
- Experience exporting models to ONNX
- Good knowledge of PowerShell scripting
- Familiarity with NIST NVD / CPE / CVE data structures
- Experience with fuzzy matching and string normalization

The work needs to be done within the next 9 hours. Please provide a PoC so that we can move on fast.
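
For illustration only, the classical part of the pipeline described under Technical Direction (normalization rules, alias dictionary, fuzzy ranking) could be prototyped roughly as follows. This is a minimal sketch in Python, not the expected PowerShell deliverable. The alias table, the regular expressions, and all function names are hypothetical, and the standard library's `difflib.SequenceMatcher` is used here as a stand-in for Levenshtein or Jaro-Winkler.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table; the entries are the examples from this post.
VENDOR_ALIASES = {
    "dell inc.": "dell",
    "win.rar gmbh": "rarlab",
    "notepad++ team": "don_ho",
}

# Assumed pattern for locale suffixes such as " - de-de" at the end of a name.
LOCALE_SUFFIX = re.compile(r"\s*-\s*[a-z]{2}-[a-z]{2}$")
# Assumed pattern for trailing version-like tokens such as " 2405" or " 7.1.2".
TRAILING_VERSION = re.compile(r"\s+\d+(\.\d+)*$")


def normalize_vendor(raw):
    """Lowercase the vendor and replace it with a known alias if one exists."""
    key = raw.strip().lower()
    return VENDOR_ALIASES.get(key, key)


def normalize_product(raw):
    """Lowercase, drop locale suffix and trailing version, join tokens with '_'."""
    name = raw.strip().lower()
    name = LOCALE_SUFFIX.sub("", name)
    name = TRAILING_VERSION.sub("", name)
    return re.sub(r"[^a-z0-9+]+", "_", name).strip("_")


def rank_candidates(raw_product, cpe_products, top_n=3):
    """Return the top_n CPE product names by similarity ratio, best first."""
    query = normalize_product(raw_product)
    scored = [(SequenceMatcher(None, query, c).ratio(), c) for c in cpe_products]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top_n]]


if __name__ == "__main__":
    products = ["network_threat_protection", "endpoint_agent", "onenote"]
    print(normalize_vendor("win.rar GmbH"))
    print(normalize_product("Microsoft OneNote - de-de"))
    print(rank_candidates("SOPHOS Network Thret Prot", products))
```

The similarity ratio doubles as a rough confidence value. In practice the candidate list would come from the NIST CPE dictionary rather than a hard-coded list.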
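
The embedding side of the idea (cosine similarity combined with a fuzzy score, then emitting a CPE with a confidence value) could be sketched along these lines. The vectors below are toy placeholders for what a DistilBERT-style model would produce. The 0.6 weighting, the candidate dictionary layout, and the function names are assumptions made for the example, not tuned or prescribed values. The output string follows the CPE 2.3 formatted-string layout with unspecified fields left as `*`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(fuzzy, cosine, embedding_weight=0.6):
    """Weighted mix of embedding and fuzzy scores; the weight is an assumption."""
    return embedding_weight * cosine + (1 - embedding_weight) * fuzzy


def to_cpe23(vendor, product, version="*", part="a"):
    """Build a CPE 2.3 formatted string; the remaining seven fields stay '*'."""
    return f"cpe:2.3:{part}:{vendor}:{product}:{version}:*:*:*:*:*:*:*"


def best_match(query_vec, candidates, embedding_weight=0.6):
    """Pick the candidate with the highest combined score.

    Each candidate is a dict with keys vendor, product, vector, fuzzy
    (a hypothetical layout chosen for this sketch).
    """
    best = max(
        candidates,
        key=lambda c: combined_score(
            c["fuzzy"], cosine_similarity(query_vec, c["vector"]), embedding_weight
        ),
    )
    score = combined_score(
        best["fuzzy"], cosine_similarity(query_vec, best["vector"]), embedding_weight
    )
    return to_cpe23(best["vendor"], best["product"]), round(score, 3)


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real model embeddings.
    candidates = [
        {"vendor": "sophos", "product": "network_threat_protection",
         "vector": [0.9, 0.1], "fuzzy": 0.8},
        {"vendor": "microsoft", "product": "onenote",
         "vector": [0.1, 0.9], "fuzzy": 0.2},
    ]
    print(best_match([1.0, 0.0], candidates))
```

Keeping the scoring as a plain weighted sum makes it easy to port to PowerShell, where only the embedding vectors themselves would need to come from the ONNX model.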