MSc Thesis: Scaling foundation models for time series analysis
About Us
In our previous work (Turgut et al., 2024), we developed a foundation model for time series analysis capable of handling diverse tasks, ranging from weather forecasting to disease classification. Our approach has shown strong generalisation capabilities across multiple domains, and we are now looking to scale and optimise our model to unlock the next generation of time series models.
Objective of the Thesis
The goal of this master's thesis is to scale our time series foundation model, building on state-of-the-art practices from large language models (LLMs) and recent research in model optimisation. The project involves systematic experimentation and methodology development across the following three key areas.
1. Data Curation
Data Mixture & Scaling: Investigate how to mix pre-training datasets for optimal generalisation. (Llama Team, 2024)
Synthetic Data: Explore the role of synthetic data in improving model scaling and performance. (Hemmat et al., 2025)
Data Quality & Sample Selection: Implement strategies similar to Gunasekar et al. (2023) and Llama Team (2024), which introduce annealing phases in which high-quality samples boost model performance during the final stage of pre-training (see the data-mixture sketch after this list).
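As a rough illustration of the data mixture and annealing points above, the sketch below samples several source datasets with tunable mixture weights and switches to a high-quality mixture for the final annealing stage. It assumes the sources are exposed as PyTorch Dataset objects; the function name, dataset names, and weights are illustrative placeholders rather than part of our existing codebase.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def make_mixture_loader(datasets, mixture_weights, batch_size=256):
    """Build a loader that samples each source dataset with a fixed probability.

    datasets        : list of torch.utils.data.Dataset objects (one per source domain)
    mixture_weights : floats summing to 1, the sampling probability per source
    """
    combined = ConcatDataset(datasets)
    # Per-sample weight = domain probability / domain size, so each domain as a
    # whole is drawn with probability equal to its mixture weight.
    per_sample_weights = torch.cat([
        torch.full((len(ds),), w / len(ds))
        for ds, w in zip(datasets, mixture_weights)
    ])
    sampler = WeightedRandomSampler(per_sample_weights,
                                    num_samples=len(combined),
                                    replacement=True)
    return DataLoader(combined, batch_size=batch_size, sampler=sampler)

# Main pre-training stage: broad mixture over all source domains (names hypothetical).
# main_loader = make_mixture_loader([weather_ds, ecg_ds, traffic_ds], [0.5, 0.3, 0.2])
# Annealing stage: up-weight a curated high-quality subset for the final steps.
# anneal_loader = make_mixture_loader([weather_ds, curated_ds], [0.2, 0.8])
```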
2. Training Recipe
Context Parallelism for Long Sequences: Explore attention mechanisms for handling extended time series contexts efficiently. (Llama Team, 2024)
Learning Rate Scheduling: Investigate the role of novel schedulers such as warmup-stable-decay (WSD). Determine the optimal ratio of warmup and annealing steps, and explore the role of peak and minimum learning rates (see the scheduler sketch after this list). (Hu et al., 2024)
Hyperparameter Search: Use the Tensor Programs framework (µP/µTransfer) to transfer optimal hyperparameter settings from small proxy models to the full-scale model. (Yang et al., 2022)
Scaling Laws: Investigate how the loss of the time series model scales with model size, dataset size, and compute. (Kaplan et al., 2020)
Inference Optimisation: Investigate techniques like KV caching for faster inference on long sequences. (Pope et al., 2023)
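To make the scheduling point concrete, here is a minimal WSD schedule expressed as a PyTorch LambdaLR multiplier. The warmup/decay fractions, the minimum-to-peak learning rate ratio, and the cosine shape of the decay phase are illustrative assumptions to be swept in the thesis, not settings we prescribe.

```python
import math
from torch.optim.lr_scheduler import LambdaLR

def wsd_scheduler(optimizer, total_steps, warmup_frac=0.01, decay_frac=0.1,
                  min_lr_ratio=0.1):
    """Warmup-stable-decay (WSD) schedule as a LambdaLR multiplier of the peak LR.

    The warmup and decay (annealing) ratios and the minimum-to-peak LR ratio are
    exactly the quantities to be swept; the defaults here are illustrative.
    """
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = int(decay_frac * total_steps)
    stable_end = total_steps - decay_steps

    def lr_lambda(step):
        if step < warmup_steps:              # linear warmup to the peak LR
            return step / max(1, warmup_steps)
        if step < stable_end:                # stable phase at the peak LR
            return 1.0
        # decay (annealing) phase: cosine from the peak down to min_lr_ratio * peak
        progress = (step - stable_end) / max(1, decay_steps)
        return min_lr_ratio + (1 - min_lr_ratio) * 0.5 * (1 + math.cos(math.pi * progress))

    return LambdaLR(optimizer, lr_lambda)
```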
3. Post-Training Optimisation
Investigate the effect of weight averaging across the top-k models, instead of relying on a single best model after pre-training (see the sketch below). (Llama Team, 2024)
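A minimal sketch of uniform checkpoint averaging, assuming the top-k models share the same architecture and are stored as plain PyTorch state dicts with floating-point entries; how the top-k checkpoints are selected (e.g. by validation loss) is left to the experiment, and all names here are illustrative.

```python
import torch

def average_checkpoints(checkpoint_paths):
    """Uniformly average the weights of the top-k checkpoints.

    Assumes all state-dict entries are floating-point tensors; integer buffers
    (e.g. step counters) would instead need to be copied from one checkpoint.
    """
    avg_state = None
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")
        if avg_state is None:
            avg_state = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float()
    return {k: v / len(checkpoint_paths) for k, v in avg_state.items()}

# model.load_state_dict(average_checkpoints(["ckpt_1.pt", "ckpt_2.pt", "ckpt_3.pt"]))
```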
What we offer
The opportunity to contribute to an ongoing project with the aim of publishing at top-tier conferences (e.g. NeurIPS, ICLR, ICML).
Potential transition into a PhD project at TUM.
Access to state-of-the-art hardware and close supervision from an interdisciplinary network of time series experts.
Expected Contributions
A set of scaling laws for foundation models on time series, with special emphasis on data quality, synthetic augmentation, and training dynamics.
A comprehensive training recipe for time series models that combines best practices from cutting-edge LLM training.
Open-source codebase and documentation to enable reproducibility and future research.
Required Skills
Strong background in deep learning, transformer architectures, and optimisation methods.
Experience with PyTorch, Hugging Face, and large-scale training frameworks, including multi-GPU development.
Familiarity with literature on scaling laws, data-centric AI, and learning rate scheduling is a plus.
Strong interest in teamwork and interdisciplinary research.
How to Apply
Please send your CV and transcript to oezguen.turgut@tum.de using the subject line “MSc Thesis: Scaling foundation models for time series analysis”. Include brief summaries of your previous deep learning projects (mentioning your specific contributions and the frameworks used) and, if available, links to relevant GitHub repositories.
References
Hemmat et al., 2025 (https://arxiv.org/pdf/2502.15588)
Gunasekar et al., 2023 (https://arxiv.org/pdf/2306.11644)
Hu et al., 2024 (https://arxiv.org/pdf/2404.06395)
Kaplan et al., 2020 (https://arxiv.org/pdf/2001.08361)
Llama Team, 2024 (https://arxiv.org/pdf/2407.21783)
Pope et al., 2023 (https://proceedings.mlsys.org/paper_files/paper/2023/file/c4be71ab8d24cdfb45e3d06dbfca2780-Paper-mlsys2023.pdf)
Turgut et al., 2024 (https://arxiv.org/pdf/2410.07299)
Yang et al., 2022 (https://arxiv.org/pdf/2203.03466)