Predictive Mixed-Data LLM Design

Client: AI | Published: 14.04.2026

I want to build a custom large-language model that goes beyond text-only chat. The goal is a predictive engine that can read free-form text, combine it with numerical features, and return forward-looking insights. In practice that means designing an architecture able to embed and fuse both modalities, training it on my mixed dataset, and validating that the model can reliably forecast the target variables we care about.

You will take me from data preprocessing through to an inference-ready checkpoint. I expect clean Python code (PyTorch or TensorFlow), sensible use of the Transformers library or similar, and a clear explanation of why each modelling choice was made. Please include evaluation notebooks that show the lift over conventional baselines, and provide an API-style script so I can drop the model straight into production once testing is complete.

Deliverables

• End-to-end training pipeline with documented source code
• Trained model weights and reproducible environment files
• Evaluation report demonstrating predictive performance on unseen mixed data
• Simple inference script or REST endpoint instructions

If you have prior experience blending tabular and textual inputs, or have leveraged architectures such as TabTransformer, RETAIN-style attention, or multimodal adapters on top of LLM backbones, mention it; those skills will be invaluable here.
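To make the "embed and fuse both modalities" requirement concrete, here is a minimal, hypothetical PyTorch sketch of a late-fusion head: a pooled text embedding (which in practice would come from an LLM encoder) is concatenated with a projected tabular feature vector and passed to a small regression head. The class name, dimensions, and the use of random tensors as stand-ins for real embeddings are all illustrative assumptions, not the required design.

```python
import torch
import torch.nn as nn

class TextTabularFusion(nn.Module):
    """Illustrative fusion head: concatenate a pooled text embedding
    with a projected tabular feature vector, then regress the target."""

    def __init__(self, text_dim: int, tab_dim: int, hidden: int = 64):
        super().__init__()
        # Project raw numerical features into the fusion space.
        self.tab_proj = nn.Sequential(nn.Linear(tab_dim, hidden), nn.ReLU())
        # Joint head over the concatenated representation.
        self.head = nn.Sequential(
            nn.Linear(text_dim + hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # single forward-looking target
        )

    def forward(self, text_emb: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([text_emb, self.tab_proj(tab)], dim=-1)
        return self.head(fused)

# In a real pipeline, text_emb would be mean-pooled hidden states from a
# frozen or fine-tuned LLM backbone; random tensors stand in here.
model = TextTabularFusion(text_dim=768, tab_dim=12)
text_emb = torch.randn(4, 768)   # batch of 4 pooled text embeddings
tab = torch.randn(4, 12)         # 12 numerical features per row
pred = model(text_emb, tab)
print(pred.shape)  # torch.Size([4, 1])
```

Late fusion like this keeps the text encoder swappable; richer alternatives mentioned in the brief (TabTransformer-style column attention, multimodal adapters) inject the tabular signal earlier in the backbone instead.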