Personalized Training for SOTA AI Math
Are you a mid- or senior-level AI/ML professional who appreciates the power of math?
I’m Prof. Tom Yeh. I offer a personalized 4-hour one-on-one training program designed to quickly get you up to speed on the math behind 15 state-of-the-art (SOTA) AI models, algorithms, and architectures, all by hand. ✍️
This program is for you if:
- You have at least 3 years of experience building AI products for real-world applications.
- You are a technical leader in your AI/ML team.
- You are already using many SOTA models, algorithms, and architectures in your products, but mostly as black boxes.
- You recognize the importance of understanding the math inside those black boxes.
- But you are simply too busy to find the 150 hours needed to dive deep into 15 papers and the hundreds of equations they contain.
Instead of 150 hours, you can spend just 4 hours with us as follows:
1-on-1 consultation with me (30 minutes)
- I want to get to know you and understand your goals.
- Together we choose 15 SOTA exercises that best match your needs and interests.
Personalized training sessions (3 hours)
- My senior PhD students will work with you one-on-one to go through the 15 SOTA exercises of your choice.
- They will work around your busy calendar.
- You can schedule either three 1-hour sessions or two 1.5-hour sessions.
1-on-1 exit meeting with me (30 minutes)
- I want to know how your training sessions went.
- I want to address all your remaining questions.
Here is a sample of the SOTA exercises you can choose from:
Attention:
- Multi-Head Attention (MHA)
- Grouped Query Attention (GQA)
- Infinite Attention
- Multi-Head Latent Attention (MLA)
- Native Sparse Attention (NSA)
Feed-Forward Network:
- Position-Wise Feed-Forward Network
- Mixture of Experts
- Sparse Mixture of Experts
GPU:
- Tiled Matrix Multiplication
- Systolic Array
- Flash Attention
Techniques:
- BatchNorm vs LayerNorm vs RMSNorm
- Low-Rank Adaptation (LoRA)
- Rotary Positional Embedding (RoPE)
- BitNet
- Sparse Auto-Encoder (SAE)
Alternatives to Transformers:
- Kolmogorov-Arnold Networks (KAN)
- LSTM vs xLSTM
- Mamba
- Receptance Weighted Key Value (RWKV)
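To give a feel for the kind of math these exercises unpack, here is the standard formulation of the first topic on the list, Multi-Head Attention, as published in the Transformer literature. This is only an illustrative sample, not an excerpt from the program's own by-hand materials.

```latex
% Scaled dot-product attention: Q, K, V are the query, key, and value
% matrices, and d_k is the key dimension used for scaling.
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
\]
% Multi-Head Attention runs h such attention operations in parallel on
% learned projections and concatenates the results:
\[
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O},
\qquad
\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})
\]
```

The by-hand exercises walk through equations like these with small, concrete numbers so you can compute every step yourself.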
As a professor, I have spent hundreds of hours reading papers on SOTA models, algorithms, and architectures, and even more time designing original materials that reimagine abstract math equations as concrete by-hand ✍️ exercises. Using these exercises, I believe our program can help you make the most of your time and level up!