ML Modeling Fundamentals: Overview and Study Plan
This section is built for a hybrid ML Modeling & Fundamentals interview: explain theory from first principles, reason about modeling choices, and implement small NumPy/PyTorch snippets.
The target domain is autonomous driving simulation, but the ideas also transfer to robotics, ranking, perception, prediction, and model debugging.
What you should be able to do
After this section, you should be able to:
- Choose the right metric instead of optimizing blindly.
- Explain precision, recall, F1, PR-AUC, calibration, ADE/FDE, and ranking metrics.
- Explain why loss functions create incentives.
- Use weighted cross entropy, focal loss, and multi-modal trajectory losses.
- Explain diffusion from first principles: forward noise, reverse denoising, epsilon prediction.
- Explain why diffusion helps generate diverse future scenes.
- Evaluate simulation outputs for realism, safety, map validity, and physical feasibility.
- Debug a model that is not learning using a systematic checklist.
- Write small PyTorch snippets in CoderPad.
Recommended study order
1. Start with metrics
Read: ML Modeling Metrics
Why first: metrics tell you what “good” means. Without them, losses and models are easy to misuse.
Focus on:
- confusion matrix,
- precision vs recall,
- thresholding,
- PR-AUC for rare events,
- calibration,
- ADE/FDE limitations.
Notebook: ml_modeling_metrics_colab.ipynb
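As a quick illustration of the confusion matrix, precision vs recall, and thresholding items above, here is a minimal NumPy sketch. The function name `binary_metrics` and the toy labels and scores are made up for this example; they are not taken from the linked article or notebook.

```python
import numpy as np

def binary_metrics(y_true, scores, threshold=0.5):
    """Confusion-matrix counts and precision/recall/F1 at a single threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(scores) >= threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}

# Toy rare-event data: 2 positives among 10 samples (values are illustrative).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.7, 0.35]

low = binary_metrics(y_true, scores, threshold=0.3)   # catches both positives
high = binary_metrics(y_true, scores, threshold=0.5)  # misses one positive
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 1.0 while precision drops from 0.5 to 0.4. A classifier that always predicts "negative" would score 80% accuracy here with zero recall, which is the usual argument against reporting accuracy alone for rare events.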
2. Learn custom losses
Read: Custom Losses
Why next: losses are how you make the model care about the right mistakes.
Focus on:
- weighted cross entropy,
- focal loss,
- why hard examples are not always good examples,
- why MSE fails for multi-modal futures,
- multi-modal trajectory loss.
Notebook: custom_losses_colab.ipynb
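A minimal sketch of binary focal loss in NumPy, assuming the common form FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The defaults `gamma=2.0` and `alpha=0.25` are widely used starting values, not values taken from the linked article, and the function name is hypothetical.

```python
import numpy as np

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Mean binary focal loss computed from raw logits.

    logits:  array of shape (N,), unnormalized scores
    targets: array of shape (N,), values in {0, 1}
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    # Per-example cross entropy -log(p_t), written as softplus(z) - y * z
    # so that no explicit log of a probability is needed.
    ce = np.logaddexp(0.0, logits) - targets * logits
    p = 1.0 / (1.0 + np.exp(-logits))
    # p_t is the probability assigned to the true class.
    p_t = targets * p + (1.0 - targets) * (1.0 - p)
    # alpha weights positives, (1 - alpha) weights negatives.
    alpha_t = targets * alpha + (1.0 - targets) * (1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of confidently correct examples.
    return float(np.mean(alpha_t * (1.0 - p_t) ** gamma * ce))

easy = binary_focal_loss([4.0], [1])    # confident and correct
hard = binary_focal_loss([-4.0], [1])   # confident and wrong
```

With `gamma=0` the modulating factor disappears and the loss reduces to alpha-weighted cross entropy. In PyTorch, the plain weighted cross entropy on the checklist is usually a single call such as `torch.nn.functional.cross_entropy(logits, targets, weight=class_weights)`, where `class_weights` is a tensor you supply.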
3. Learn diffusion basics
Read: Diffusion Basics
Why next: diffusion is easier once you understand loss design and multi-modal prediction.
Focus on:
- forward noising,
- reverse denoising,
- epsilon prediction,
- timestep conditioning,
- why sampling gives multiple plausible outputs.
Notebook: diffusion_basics_colab.ipynb
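The forward noising step can be sketched in a few lines of NumPy. This assumes the standard DDPM closed form, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and a linear beta schedule; the schedule endpoints, the helper `make_alpha_bar`, and the toy trajectory shapes are assumptions for illustration.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form. Returns (x_t, eps).

    x0: (B, ...) clean samples, t: (B,) integer timesteps.
    """
    eps = rng.standard_normal(x0.shape)
    # Reshape per-sample coefficients so they broadcast over trailing dims.
    a = alpha_bar[t].reshape(-1, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((4, 12, 2))   # 4 toy trajectories, 12 steps, (x, y)
t = np.array([0, 10, 500, 999])        # one timestep per sample
x_t, eps = q_sample(x0, t, alpha_bar, rng)
```

Returning `eps` alongside `x_t` is deliberate: under epsilon prediction the network is trained to regress exactly this noise from `(x_t, t)`, so the training target comes for free from the sampling step.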
4. Apply diffusion to simulation
Read: Diffusion for Simulation
Why next: this connects the math to autonomous driving and robotics.
Focus on:
- conditioning on map/history/lights/route/intent,
- controllability,
- rare scenario generation,
- realism vs safety tradeoffs.
Notebook: diffusion_for_simulation_colab.ipynb
5. Evaluate generated scenarios
Read: Simulation Metrics
Why next: generated scenarios need different evaluation than ordinary prediction.
Focus on:
- collision,
- offroad,
- wrong-way,
- kinematic infeasibility,
- log divergence,
- realism vs safety.
Notebook: simulation_metrics_colab.ipynb
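Of the checks listed above, kinematic infeasibility is the easiest to sketch without map data. The version below uses finite differences on xy positions; the thresholds (`max_speed=40.0` m/s, `max_accel=8.0` m/s^2), the sampling interval, and the function name are illustrative assumptions, not values from the linked article.

```python
import numpy as np

def kinematic_violations(traj, dt=0.1, max_speed=40.0, max_accel=8.0):
    """Flag speed/acceleration violations in one trajectory.

    traj: (T, 2) xy positions in meters, sampled every dt seconds (T >= 3).
    """
    traj = np.asarray(traj, dtype=np.float64)
    vel = np.diff(traj, axis=0) / dt             # (T-1, 2) finite-difference velocity
    speed = np.linalg.norm(vel, axis=1)          # (T-1,)
    accel = np.diff(vel, axis=0) / dt            # (T-2, 2) finite-difference acceleration
    accel_mag = np.linalg.norm(accel, axis=1)    # (T-2,)
    return {
        "max_speed": float(speed.max()),
        "max_accel": float(accel_mag.max()),
        "speed_violation": bool((speed > max_speed).any()),
        "accel_violation": bool((accel_mag > max_accel).any()),
    }

# Constant 10 m/s along x: should pass both checks.
smooth = np.stack([np.arange(10) * 1.0, np.zeros(10)], axis=1)
# Same path with a 50 m "teleport" halfway through: should fail both.
jump = smooth.copy()
jump[5:, 0] += 50.0

ok = kinematic_violations(smooth)
bad = kinematic_violations(jump)
```

Finite differences amplify position noise, so in practice the thresholds are usually tuned per agent type and sampling rate rather than fixed globally.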
6. Finish with debugging
Read: Debugging a Model That Is Not Learning
Why last: debugging requires knowing what the model, loss, and metrics are supposed to do.
Focus on:
- overfit-one-batch test,
- data skew,
- bad labels,
- normalization bugs,
- learning rate problems,
- leakage,
- gradient inspection.
Notebook: debugging_model_not_learning_colab.ipynb
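The overfit-one-batch test and gradient inspection can be shown without a deep learning framework. The sketch below fits a tiny linear model to one fixed batch with hand-written gradients and logs the gradient norm at each step; the model, learning rate, and data are toy assumptions chosen so the example stays self-contained. In PyTorch the same idea is a normal training loop run repeatedly on a single batch.

```python
import numpy as np

def overfit_one_batch(x, y, lr=0.1, steps=1000):
    """Gradient descent on MSE for a linear model, using one fixed batch.

    Returns (losses, grad_norms), one entry per step.
    """
    rng = np.random.default_rng(0)
    w = rng.standard_normal((x.shape[1], 1)) * 0.01
    b = np.zeros((1,))
    losses, grad_norms = [], []
    for _ in range(steps):
        err = x @ w + b - y                       # (N, 1) residuals
        losses.append(float(np.mean(err ** 2)))
        grad_w = 2.0 * x.T @ err / len(x)         # d(MSE)/dw
        grad_b = 2.0 * err.mean(axis=0)           # d(MSE)/db
        grad_norms.append(float(np.sqrt(np.sum(grad_w ** 2) + np.sum(grad_b ** 2))))
        w -= lr * grad_w
        b -= lr * grad_b
    return losses, grad_norms

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 3))
# Noiseless linear labels, so a working pipeline should drive the loss toward 0.
y = x @ np.array([[1.5], [-2.0], [0.5]]) + 0.3
losses, grad_norms = overfit_one_batch(x, y)
```

If the loss on a single memorizable batch does not fall toward zero, the problem is more likely in the data pipeline, loss, or optimizer wiring than in model capacity. Gradient norms that are zero from the first step, or that grow without bound, narrow the search further.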
One-week study plan
Day 1: Metrics
Goal: never say “accuracy” blindly.
Tasks:
- Read the metrics article.
- Run the notebook.
- Explain precision and recall out loud using a pedestrian-crossing example.
- Practice threshold tradeoffs.
Day 2: Custom losses
Goal: understand how losses change model incentives.
Tasks:
- Read the custom losses article.
- Run the notebook.
- Implement focal loss from memory.
- Explain why MSE fails for multi-modal futures.
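A small numeric sketch of the MSE failure and one common remedy. If an agent turns left or right with equal probability, the single output that minimizes expected squared error is the average of the two futures, which matches neither. A min-over-modes (winner-takes-all) loss scores only the closest of K predicted modes. The function name and the toy left/right trajectories are assumptions for illustration.

```python
import numpy as np

def min_over_modes_loss(pred_modes, gt):
    """Winner-takes-all trajectory loss.

    pred_modes: (K, T, 2) K candidate futures, gt: (T, 2) logged future.
    Returns (loss, best_mode_index), where loss is the ADE of the closest mode.
    """
    # Average displacement error of each mode against the ground truth.
    ade_per_mode = np.linalg.norm(pred_modes - gt[None], axis=-1).mean(axis=-1)  # (K,)
    best = int(np.argmin(ade_per_mode))
    return float(ade_per_mode[best]), best

T = 5
s = np.arange(1, T + 1, dtype=np.float64)
left = np.stack([s, 0.5 * s], axis=-1)     # drifts toward +y
right = np.stack([s, -0.5 * s], axis=-1)   # drifts toward -y
mean_pred = 0.5 * (left + right)           # what a single MSE-trained output tends toward

loss_two_modes, best = min_over_modes_loss(np.stack([left, right]), left)
loss_mean, _ = min_over_modes_loss(mean_pred[None], left)
```

Here the two-mode prediction reaches a loss of 0.0 against the left-turn future, while the averaged single prediction is off by 1.5 m on average. Full multi-modal losses typically add a classification term that trains mode probabilities toward the winning mode; that part is omitted here.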
Day 3: Diffusion basics
Goal: explain DDPM without paper notation overload.
Tasks:
- Read the diffusion basics article.
- Run the noising notebook.
- Write the equation for x_t from memory.
- Explain why predicting epsilon is convenient.
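For reference, the closed-form forward equation in standard DDPM notation, together with the simplified epsilon-prediction objective. The symbols follow the usual convention and are assumed here rather than taken from the linked article:

$$
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
$$

$$
\mathcal{L}_{\text{simple}} = \mathbb{E}_{x_0,\, t,\, \epsilon}\left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2 \right]
$$

One reason epsilon prediction is convenient: the target is a sample from a standard normal at every timestep, and given a predicted epsilon the first equation can be solved for an estimate of x_0.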
Day 4: Diffusion for simulation
Goal: connect diffusion to future-scene generation.
Tasks:
- Read the diffusion for simulation article.
- Run the conditional trajectory notebook.
- Explain map/history/light/route conditioning.
- Discuss rare scenario generation tradeoffs.
Day 5: Simulation metrics
Goal: evaluate generated driving scenarios rigorously.
Tasks:
- Read the simulation metrics article.
- Run the metrics notebook.
- Implement ADE/FDE and kinematic checks.
- Explain realism vs safety.
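A minimal ADE/FDE sketch in NumPy, assuming predictions and ground truth are arrays of shape (N, T, 2) holding xy positions. The function name and the toy values are illustrative.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and final displacement error.

    pred, gt: (N, T, 2) xy positions. Returns (ADE, FDE) averaged over N.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)   # (N, T) per-step Euclidean error
    ade = float(dist.mean())                    # mean over agents and timesteps
    fde = float(dist[:, -1].mean())             # mean over agents at the last step
    return ade, fde

gt = np.zeros((1, 3, 2))
pred = np.array([[[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]])  # errors of 5, 0, 10 meters
ade, fde = ade_fde(pred, gt)
```

On this toy input the per-step errors are 5, 0, and 10 meters, so ADE is 5.0 and FDE is 10.0. Both metrics compare against a single logged future, so they can penalize a generated trajectory that is plausible but differs from what happened in the log.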
Day 6: Debugging
Goal: have a reliable debugging playbook.
Tasks:
- Read the debugging article.
- Run the overfit-one-batch notebook.
- Practice diagnosing train/eval loss patterns.
- List autonomous-driving-specific data bugs.
Day 7: Mock interview loop
Goal: combine theory, modeling judgment, and code.
Practice prompts:
- “Why is accuracy bad for rare event detection?”
- “Implement focal loss.”
- “Why does MSE fail for multi-modal trajectories?”
- “Explain diffusion forward and reverse processes.”
- “How would you condition a diffusion model on map and traffic lights?”
- “How do you evaluate generated scenarios?”
- “Your model is not learning. Walk me through your debugging process.”
60-minute cram plan
If you only have one hour:
0-10 min Metrics: precision, recall, PR-AUC, calibration
10-20 min Losses: weighted CE, focal loss
20-30 min Multi-modal trajectory loss and MSE failure
30-40 min Diffusion basics: x_t equation and epsilon prediction
40-50 min Simulation: conditioning and scenario metrics
50-60 min Debugging: overfit-one-batch and gradient inspection
CoderPad checklist
Be ready to implement:
- confusion matrix metrics,
- precision/recall/F1,
- focal loss,
- weighted cross entropy call,
- multi-modal trajectory loss,
- ADE/FDE,
- kinematic speed/acceleration checks,
- overfit-one-batch training loop,
- gradient norm helper,
- diffusion q_sample.
Interview stance
A strong answer usually has this shape:
- Define the task and the cost of mistakes.
- Choose metrics that reflect that cost.
- Choose a loss that optimizes toward the metric.
- Discuss tradeoffs and failure modes.
- Give a small implementation.
- Say how you would debug it.
That structure works for most ML modeling questions in autonomous driving simulation.