Reddit Financial Data Extraction

Customer: AI | Published: 17.03.2026

I’m building a long-horizon study on how Reddit conversations shape market sentiment, and I need a clean historical dataset to start. Please collect every post and all nested comments from the past five years in r/TeslaMotors, r/apple and r/amazon, plus the NVIDIA subreddit. For every record I need the full text together with the author name and the exact timestamp. Posts and their comments must sit in separate, clearly labelled columns inside one CSV file so I can load everything straight into pandas without reshaping.

Deliverables
• A well-commented Python script (PRAW, Pushshift, PSAW or similar are fine) that:
  – pulls the data in chronological order,
  – gracefully handles rate limits and missing items, and
  – writes the final CSV exactly as specified.
• The generated CSV covering the full five-year window. The window length should be configurable so the scrape can be rerun as a rolling window.
• A brief README showing command-line usage and environment requirements so I can reproduce the scrape on my side.

I will consider the job finished when I can run the script on a fresh machine, regenerate the same schema, and spot-check random rows against Reddit for accuracy.
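To make the expected output concrete, here is a minimal sketch of the CSV layout, the rolling window and the retry behaviour described above. It is an illustration under stated assumptions, not a required implementation: the column names, the helper names (`window_bounds`, `with_retry`, `rows_for_post`, `write_csv`) and the plain-dict input shape are all hypothetical, and the actual fetching is left out. With PRAW, for example, the flat comment list could probably come from `submission.comments.replace_more(limit=None)` followed by `submission.comments.list()`.

```python
import csv
import io
import time
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one row per record, post and comment fields in separate columns.
COLUMNS = [
    "subreddit", "record_type",
    "post_id", "post_author", "post_created_utc", "post_text",
    "comment_id", "comment_parent_id", "comment_author",
    "comment_created_utc", "comment_text",
]


def window_bounds(years=5, now=None):
    """Return (start, end) of a rolling window that ends at `now` (UTC)."""
    end = now or datetime.now(timezone.utc)
    return end - timedelta(days=365 * years), end


def with_retry(fetch, retries=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def iso_utc(epoch_seconds):
    """Format a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def rows_for_post(post):
    """Yield the post row first, then its comments, oldest first."""
    base = {
        "subreddit": post["subreddit"],
        "post_id": post["id"],
        "post_author": post.get("author") or "[deleted]",  # missing author
        "post_created_utc": iso_utc(post["created_utc"]),
        "post_text": post.get("text", ""),
    }
    yield {**base, "record_type": "post"}
    for c in sorted(post.get("comments", []), key=lambda c: c["created_utc"]):
        yield {
            **base,
            "record_type": "comment",
            "comment_id": c["id"],
            "comment_parent_id": c["parent_id"],  # preserves the nesting
            "comment_author": c.get("author") or "[deleted]",
            "comment_created_utc": iso_utc(c["created_utc"]),
            "comment_text": c.get("text", ""),
        }


def write_csv(posts, start, end, out):
    """Write posts created inside [start, end] to `out` in chronological order."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    lo, hi = start.timestamp(), end.timestamp()
    in_window = [p for p in posts if lo <= p["created_utc"] <= hi]
    for post in sorted(in_window, key=lambda p: p["created_utc"]):
        writer.writerows(rows_for_post(post))


if __name__ == "__main__":
    start, end = window_bounds(years=5)
    sample = [{"subreddit": "apple", "id": "p1", "author": "alice",
               "created_utc": end.timestamp() - 3600, "text": "hello",
               "comments": []}]
    buf = io.StringIO()
    write_csv(sample, start, end, buf)
    print(buf.getvalue())
```

In this sketch the post fields are repeated on every comment row, so the file loads with a single `pandas.read_csv` call and needs no join, and `comment_parent_id` keeps the reply tree recoverable. Changing `years` in `window_bounds` is what makes the window rolling. A real script would likely narrow `retry_on` to the client library's rate-limit exceptions rather than retrying on every error.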