Automated Student List Enrichment via LinkedIn

Budget: $250

Title: Build Automated Pipeline to Enrich Student List with LinkedIn Profiles + Experience Flags

Summary: I have a list of ~2000 current university Masters / PhD students that includes name, email, and admit year. I want to build a pipeline that (1) finds each person's LinkedIn profile, (2) pulls their work history from a data provider API, and (3) classifies each person into two yes/no flags based on career background. I’m looking for someone who can build this end-to-end in Python and deliver a final CSV.

Deliverable: A CSV with one row per student containing:
* Full name (Provided)
* Degree Type (Provided)
* Email (Provided)
* LinkedIn profile URL
* Flag_A_big_tech (yes/no)
* Flag_A_big_tech_eng (yes/no)
* Flag_B_startup (yes/no)
* Flag_B_startup_eng (yes/no)
* Explanation_big_tech (short text, where they worked, title, why classification decision made)
* Explanation_startup (short text, where they worked, title, why classification decision made)
* (Multiple Columns) Full LinkedIn Data From API Provider (e.g., companies worked for, title - read more here https://brightdata.com/cp/scrapers/api/gd_l1viktl72bvl7bjuj0/pdp/overview?camp=plg)
* Prompt Used To Evaluate Work History / Engineering Classification for Flag A
* Prompt Used To Evaluate Work History / Engineering Classification for Flag B
Also: documented Python code used to generate it.

Project Details:

Input:
* You’ll receive a CSV with ~1,500 rows.
* Columns include: name, email (many have a .edu address), admit year, degree type (Masters or PhD).
* Example row:
  * Name: “Joe Roger”
  * Email: “[Joe_Roger01@x]”
  * Student Type: “Masters”
  * Admitted Year: “2022”

Step 1. Profile Resolution / LinkedIn URL Matching
* For each student, use enrichment / people data APIs to identify their LinkedIn profile URL.
* If email is available, use that as a strong identifier (preferred).
* If email is not available, do a fuzzy search based on name + university + degree program + admit year.
* Fall back to a search API to find the LinkedIn URL if the data brokers don't work.
* Output: stable LinkedIn URL for that person.

Step 2. Work History Enrichment
* Once you have a LinkedIn URL, call an API (e.g. Bright Data / Proxycurl / People Data Labs / similar) that returns structured full work/education history:
  * Company
  * Title
  * Start date / End date
* Save that JSON for each person (for audit/debug).
(You will need to conduct QA to make sure these people actually match the targets.)

Step 3. LLM Classification
For each person, call an LLM twice with their tagged work history to classify if they have:

A) Big Tech Experience
Definition: TRUE if they worked in any capacity (intern/full-time/etc.) at Fortune 500 tech companies, FAANG-style companies, or late-stage unicorns (Google, Meta, Amazon, Apple, Microsoft, Nvidia, Tesla, Stripe, Databricks, Palantir, Uber, Airbnb, Salesforce, etc.).
Prompt output (JSON):
{
  "big_tech_flag": "yes" or "no",
  "big_tech_eng_flag": "yes" or "no", (were they in a technical role at the company, e.g. software engineer)
  "explanation": "short explanation citing specific companies/roles"
}

B) Startup / VC-Backed Experience Flag
Definition: TRUE if they founded or worked at a venture-backed startup anywhere from Seed → IPO OR had an “early operator” style title (“Founding Engineer”, “First Sales Hire”, “Head of X” at a small startup, “Co-Founder”, etc.). This LLM call should have internet access so it can search the companies in the work history to check if they have startup experience.
Prompt output (JSON):
{
  "startup_flag": "yes" or "no",
  "startup_eng_flag": "yes" or "no", (were they in a technical role at the company, e.g. software engineer)
  "explanation": "short explanation citing specific companies/roles"
}

Step 4. Final Output
Produce a single CSV with columns:
* Full name (Provided)
* Degree Type (Provided)
* Email (Provided)
* LinkedIn profile URL
* Flag_A_big_tech (yes/no)
* Flag_A_big_tech_eng (yes/no)
* Flag_B_startup (yes/no)
* Flag_B_startup_eng (yes/no)
* Explanation_big_tech (short text, where they worked, title, why classification decision made)
* Explanation_startup (short text, where they worked, title, why classification decision made)
* (Multiple Columns) Full LinkedIn Data From API Provider (e.g., companies worked for, title - read more here https://brightdata.com/cp/scrapers/api/gd_l1viktl72bvl7bjuj0/pdp/overview?camp=plg)
* Prompt Used To Evaluate Work History / Engineering Classification for Flag A
* Prompt Used To Evaluate Work History / Engineering Classification for Flag B

Also deliver:
* All intermediate JSON for each person’s work history
* The full Python codebase (clean, commented, runnable locally)
* README with instructions on:
  * Required API keys / environment variables
  * How to re-run the pipeline on a new list

Technical Requirements:
* Strong Python skills
* Experience using enrichment / people data / lead gen APIs at scale (People Data Labs, Proxycurl, Clearbit, ZoomInfo, Apollo, etc.)
* Able to handle rate limiting and retries
* Comfortable calling an LLM programmatically and parsing JSON output
* Comfortable with fuzzy matching / deduping people with similar names

What to include in your proposal:
1. A short description of your relevant experience doing data enrichment / lead list building / LinkedIn data work.
2. Which enrichment API(s) you plan to use and why.
3. Rough sense of expected match rate for LinkedIn URLs (doesn’t need to be exact, but I want to know how you think about disambiguating common names).
4. Estimated total project fee.

This is a contained project: I give you a CSV of ~2000 names, you return (a) the enriched CSV with the two flags and explanations, and (b) code + README so I can run this again on future lists.
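For reference, a minimal sketch of one way the name disambiguation from Step 1 could work, using only the Python standard library. The function names, the candidate dict keys ("name", "url"), and the threshold and margin values are all illustrative assumptions, not part of the spec. A real pipeline would also weigh university, degree program, and admit year.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z\s]", " ", ascii_only.lower())
    return " ".join(cleaned.split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1], insensitive to token order ("Roger, Joe")."""
    tokens_a = " ".join(sorted(normalize_name(a).split()))
    tokens_b = " ".join(sorted(normalize_name(b).split()))
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


def pick_candidate(student_name, candidates, threshold=0.85, margin=0.05):
    """Return the best-matching candidate, or None if weak or ambiguous.

    candidates: list of dicts with hypothetical keys "name" and "url".
    threshold and margin are placeholder values to tune during QA.
    """
    scored = sorted(
        ((name_similarity(student_name, c["name"]), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None  # no candidate is close enough
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None  # two near-identical names: send to manual QA
    return scored[0][1]
```

Returning None for near-ties rather than guessing keeps common names out of the output until they have been checked by hand, at the cost of a lower automatic match rate.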
NOTE: I can pay for API fees. Please provide an estimate of your fee and, separately, your estimate of the API fees for this work. Code should be easy to read, concise, and well documented. I can code and will do a code review. Looking forward to working with you.
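As an illustration of the "parsing JSON output" requirement, here is a rough sketch of validating the Flag A reply from Step 3 and mapping it to the CSV columns. The function name parse_flag_a is hypothetical, and the consistency check (an engineering flag of "yes" requires a big tech flag of "yes") is an assumption inferred from the flag definitions rather than something stated in the spec.

```python
import json

FLAG_A_KEYS = ("big_tech_flag", "big_tech_eng_flag")


def parse_flag_a(raw: str) -> dict:
    """Parse the Flag A LLM reply into CSV column values.

    Raises ValueError for malformed or out-of-schema replies so the
    caller can retry the LLM call or route the row to manual review.
    """
    # Models sometimes wrap JSON in extra prose; keep the outermost {...}.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

    flags = {}
    for key in FLAG_A_KEYS:
        value = str(data.get(key, "")).strip().lower()
        if value not in ("yes", "no"):
            raise ValueError(f"{key} must be 'yes' or 'no', got {value!r}")
        flags[key] = value

    # Assumption: a technical role at a big tech company implies the
    # big tech flag itself, so eng=yes with big_tech=no is rejected.
    if flags["big_tech_eng_flag"] == "yes" and flags["big_tech_flag"] == "no":
        raise ValueError("big_tech_eng_flag is 'yes' but big_tech_flag is 'no'")

    return {
        "Flag_A_big_tech": flags["big_tech_flag"],
        "Flag_A_big_tech_eng": flags["big_tech_eng_flag"],
        "Explanation_big_tech": str(data.get("explanation", "")).strip(),
    }
```

The same shape would apply to Flag B with the startup_flag and startup_eng_flag keys.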

Python
