Research Article Text Mining for Meta-Analysis

Client: AI | Published: 01.11.2025
Budget: $250

Dear Esteemed Experts,

I am issuing a Request for Proposal (RFP) for the following work.

Title: Automated Text Mining and Meta-Analysis Dataset Creation from 85 Research Articles

Project Overview
I am conducting a meta-analysis of carefully selected research papers. The goal is to automatically extract structured quantitative data from 85 peer-reviewed academic papers (PDFs) and create a meta-analysis-ready dataset in CSV/Excel format.

Scope of Work

Phase 1 – Text Mining and Data Extraction
The contractor will:
• Parse the 85 provided PDF research articles using text mining or semi-automated extraction.
• Identify and extract key fields from each paper’s abstract, methods, results, and tables.
• Populate a standardized meta-analysis table with the following variables: study_id, effect_id, authors, year, title, doi, region, country_sample, asset_class, method, outcome_type, outcome_scale, yi, sei, vi, direction_original, where_found, extraction_mode, extraction_confidence, rob_overall.

Expected output: meta_85_master.csv – a fully populated meta-analysis dataset, validated and traceable (where_found = table/page reference).

Phase 2 – Full Meta-Analysis Execution (optional)
As an additional deliverable, the contractor may run the full meta-analysis and produce all statistical outputs and figures using the provided R/Python scripts.
Tasks:
• Execute REML random-effects models (metafor / statsmodels).
• Perform multilevel and RVE (robust variance estimation) meta-regressions.
• Conduct heterogeneity, publication-bias, PET-PEESE, and Egger tests.
• Generate publication-ready figures (forest, funnel, PET-PEESE, influence plots).
• Export final results to:
  o tables/ (main, moderator, bias, subgroup)
  o figures/ (forest, funnel, contour, PET-PEESE)
  o report/ (HTML + PDF)

Deliverables
1. meta_85_master.csv – standardized dataset
2. validation_report.csv – duplicate, missing-value, and format checks
3. (Optional) Complete analysis bundle (artifacts/) with all tables, figures, and reports
4. validate.py – short script confirming dataset consistency (vi = sei², no duplicates)

Provided Materials (for selected applicants)
Qualified applicants will receive:
• 85 PDF research papers
• Excel file with initial metadata (titles, DOIs, and base fields)
• Full bilingual “Meta-Analysis Data Extraction Protocol” and Execution Specification (R + Python)

Requirements
• Experience with text mining, NLP, or data extraction from academic PDFs
• Proficiency in Python (pandas, regex, pdfminer, Camelot, PyMuPDF) or R
• Understanding of meta-analysis methods (REML, PET-PEESE, RVE) preferred
• Ability to ensure reproducible outputs and proper validation
• Optional: capacity to run and deliver all analytical results (R/Python)

Signing an NDA is also required.

Submission
Please include:
• A brief description of your experience with academic data extraction or meta-analysis.
• Whether you can also deliver Phase 2 (full meta-analysis execution).
• The tools or frameworks you plan to use (e.g., Python, R, NLP models, OCR/LLM).
• A base quote for the text mining and, if possible, an optional quote for the statistical outputs.

Note: Only shortlisted applicants will receive the complete package (85 PDFs + Excel + specifications). All work must ensure data traceability and reproducibility.
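The extraction requirements mention regex-based parsing and a where_found page reference for traceability. The sketch below illustrates that idea under stated assumptions. It scans per-page text for an estimate followed by a parenthesized number and records the page it came from. The page strings could come from PyMuPDF's `Page.get_text()`, for instance. The pattern assumes the parenthesized value is a standard error. Many tables instead report t-statistics or p-values in that position, so every hit still needs manual confirmation against the table notes. The function `scan_pages`, the `p. N` format of where_found, the effect_id scheme and the `regex` extraction_mode label are all hypothetical.

```python
import re

# Hypothetical heuristic: an estimate followed by a parenthesized number,
# e.g. "0.123 (0.045)" or "-0.30 (0.10)". It ASSUMES the parentheses hold a
# standard error; t-statistics or p-values in that position would be misread.
COEF_SE = re.compile(r"(?<![\d.])(-?\d*\.\d+)\s*\((\d*\.\d+)\)")


def scan_pages(pages, study_id):
    """Return candidate effect rows from per-page text, one dict per match.

    where_found records the 1-based page number so each value can be traced
    back to the PDF; the exact table still has to be confirmed by a reviewer.
    """
    rows = []
    for page_no, text in enumerate(pages, start=1):
        for match in COEF_SE.finditer(text):
            yi, sei = float(match.group(1)), float(match.group(2))
            if sei <= 0:
                continue  # a zero "standard error" cannot be used as an effect
            rows.append({
                "study_id": study_id,
                "effect_id": f"{study_id}_{len(rows) + 1}",  # hypothetical ID scheme
                "yi": yi,
                "sei": sei,
                "vi": sei ** 2,  # sampling variance derived from the SE
                "where_found": f"p. {page_no}",  # hypothetical format
                "extraction_mode": "regex",  # hypothetical label
            })
    return rows
```

Each returned dict covers only a subset of the required columns. The bibliographic fields and the risk-of-bias rating would still come from the provided metadata file and manual review.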
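Deliverable 4 asks for a short validate.py that confirms vi = sei² and the absence of duplicates. What follows is a minimal standard-library sketch, not the protocol's or any contractor's script. It assumes the following:
• The pair (study_id, effect_id) identifies one effect.
• study_id, effect_id, yi, sei, vi and where_found must be non-empty.
• vi must match sei² within a relative tolerance of 1e-6. This should be loosened if the CSV stores rounded values.
The check names and the row/check/detail layout of validation_report.csv are hypothetical.

```python
"""validate.py: minimal consistency checks for meta_85_master.csv (a sketch).

Assumptions, not taken from the protocol: (study_id, effect_id) identifies an
effect, the REQUIRED columns must be non-empty, and vi must equal sei**2 within
a relative tolerance (loosen rel_tol if the CSV stores rounded values).
"""
import csv
import math
import sys

# Hypothetical choice of mandatory columns; the protocol may require more.
REQUIRED = ["study_id", "effect_id", "yi", "sei", "vi", "where_found"]


def validate_rows(rows, rel_tol=1e-6):
    """Return (row_number, check, detail) tuples; an empty list means no issues.

    Row numbers count data rows from 1, excluding the CSV header.
    """
    issues = []
    first_seen = {}
    for i, row in enumerate(rows, start=1):
        # Missing values in the required columns.
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                issues.append((i, "missing", col))
        # Duplicate (study_id, effect_id) keys.
        key = (row.get("study_id"), row.get("effect_id"))
        if key in first_seen:
            issues.append((i, "duplicate", f"same key as row {first_seen[key]}"))
        else:
            first_seen[key] = i
        # Numeric format first, then the vi = sei^2 identity.
        try:
            yi, sei, vi = (float(row[c]) for c in ("yi", "sei", "vi"))
        except (KeyError, TypeError, ValueError):
            issues.append((i, "format", "yi/sei/vi not numeric"))
            continue
        if not all(math.isfinite(x) for x in (yi, sei, vi)) or sei <= 0:
            issues.append((i, "format", "non-finite value or sei <= 0"))
        elif not math.isclose(vi, sei ** 2, rel_tol=rel_tol):
            issues.append((i, "vi_mismatch", f"vi={vi} but sei**2={sei ** 2:.6g}"))
    return issues


def main(in_path, out_path="validation_report.csv"):
    """Validate in_path, write the report to out_path, return an exit code."""
    # utf-8-sig also strips the byte-order mark that Excel often writes.
    with open(in_path, newline="", encoding="utf-8-sig") as f:
        issues = validate_rows(list(csv.DictReader(f)))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["row", "check", "detail"])
        writer.writerows(issues)
    return 1 if issues else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python validate.py meta_85_master.csv [validation_report.csv]
    sys.exit(main(*sys.argv[1:3]))
```

A non-zero exit code when any issue is found lets the check gate a reproducible pipeline, for example as a step before the analysis scripts run.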
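The optional analysis tasks call for REML random-effects models via metafor or statsmodels. For an independent cross-check of the pooled estimate, a minimal standard-library sketch of the REML estimator of τ² is shown below. It solves the REML fixed-point equation τ² = Σwᵢ²((yᵢ − μ̂)² − vᵢ) / Σwᵢ² + 1/Σwᵢ, with wᵢ = 1/(vᵢ + τ²), by simple iteration and truncates τ² at zero. This is not necessarily the algorithm metafor uses. The sketch also reports Cochran's Q with I² = (Q − df)/Q. metafor computes I² from τ² for REML fits, so its values may differ. The function name `reml_random_effects` is hypothetical and not part of the provided scripts.

```python
import math


def reml_random_effects(yi, vi, tol=1e-12, max_iter=1000):
    """Random-effects pooling with a REML estimate of tau^2 (a sketch).

    Iterates tau2 = sum(w^2 * ((y - mu)^2 - v)) / sum(w^2) + 1 / sum(w),
    with w = 1 / (v + tau2), truncating tau2 at 0.
    Returns (mu, se_mu, tau2, Q, I2).
    """
    if len(yi) != len(vi) or len(yi) < 2:
        raise ValueError("need at least two effects with matching variances")
    if any(v <= 0 for v in vi):
        raise ValueError("sampling variances must be positive")

    def pooled(tau2):
        # Inverse-variance weights and the weighted mean for a given tau^2.
        w = [1.0 / (v + tau2) for v in vi]
        return w, sum(wj * y for wj, y in zip(w, yi)) / sum(w)

    tau2 = 0.0
    for _ in range(max_iter):
        w, mu = pooled(tau2)
        sw, sw2 = sum(w), sum(wj * wj for wj in w)
        num = sum(wj * wj * ((y - mu) ** 2 - v) for wj, y, v in zip(w, yi, vi))
        new = max(0.0, num / sw2 + 1.0 / sw)
        converged = abs(new - tau2) < tol
        tau2 = new
        if converged:
            break

    w, mu = pooled(tau2)
    se_mu = math.sqrt(1.0 / sum(w))

    # Cochran's Q uses fixed-effect (tau^2 = 0) weights; I^2 = (Q - df) / Q.
    w_fe, mu_fe = pooled(0.0)
    q = sum(wj * (y - mu_fe) ** 2 for wj, y in zip(w_fe, yi))
    df = len(yi) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return mu, se_mu, tau2, q, i2
```

When all sampling variances are equal, this estimator reduces to τ² = max(0, s² − v), where s² is the sample variance of the effects. That gives a quick sanity check before comparing against metafor's `rma()` output on the real data.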
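For the publication-bias tasks, the classic Egger test and the PET and PEESE meta-regressions can each be written as a one-predictor weighted least-squares fit:
• Egger: OLS of yᵢ/seᵢ on 1/seᵢ.
• PET: WLS of yᵢ on seᵢ with weights 1/vᵢ.
• PEESE: WLS of yᵢ on vᵢ with weights 1/vᵢ.
The sketch below uses only the standard library, and the helper names `wls_line` and `egger_pet_peese` are hypothetical. It returns Egger's intercept t-statistic with its degrees of freedom, plus the PET and PEESE intercepts. It stops short of p-values, which need the t distribution (for example `scipy.stats.t.sf`). It also leaves out the conditional PET-PEESE decision rule, which should follow the provided protocol. The provided scripts may use other variants (metafor's `regtest()` supports several), so results should be compared with care.

```python
import math


def wls_line(x, y, w):
    """Weighted least-squares fit y = a + b*x; returns (a, b, se_a, se_b).

    The residual variance is estimated from the data (multiplicative-error
    model), as in the classic Egger and PET-PEESE regressions.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least three effects")
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    s2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / (n - 2)
    return a, b, math.sqrt(s2 * (1 / sw + xbar ** 2 / sxx)), math.sqrt(s2 / sxx)


def egger_pet_peese(yi, sei):
    """Return Egger's intercept t-statistic (with df) and PET/PEESE estimates."""
    vi = [s * s for s in sei]
    inv_v = [1 / v for v in vi]
    # Egger: OLS of the standardized effect (y/se) on precision (1/se).
    z = [y / s for y, s in zip(yi, sei)]
    prec = [1 / s for s in sei]
    a, _, se_a, _ = wls_line(prec, z, [1.0] * len(yi))
    # PET: WLS of y on se with weights 1/vi; the intercept is the adjusted effect.
    pet, _, pet_se, _ = wls_line(sei, yi, inv_v)
    # PEESE: WLS of y on vi (= se^2) with weights 1/vi.
    peese, _, peese_se, _ = wls_line(vi, yi, inv_v)
    return {
        "egger_t": a / se_a, "egger_df": len(yi) - 2,
        "pet": pet, "pet_se": pet_se,
        "peese": peese, "peese_se": peese_se,
    }
```

Dividing the PET model yᵢ = b₀ + b₁seᵢ + eᵢ by seᵢ gives Egger's regression with the two coefficients swapped. Egger's intercept therefore equals the PET slope, and comparing the two is a cheap internal consistency check on any implementation.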