Arxiv Papers

The paper introduces the Phased Consistency Model (PCM) to improve text-conditioned image generation in the latent space, outperforming existing models across multiple generation steps. https://arxiv.org/abs//2405.18407 YouTube: https://www.youtube.com/@ArxivPapers TikTok: https://www.tiktok.com/@arxiv_papers Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016 Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers --- Support this podcast: https://podcasters.spotify.com/pod/show/arxiv-papers/support

Duration:00:11:54

Phased Consistency Model

Duration:00:12:00

[QA] Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Understanding model scaling is crucial for designing effective training setups and architectures. This paper questions the reliance on cosine learning-rate schedules, which tie each run to a fixed training duration, and proposes a simpler alternative with predictable scaling behavior and improved performance. https://arxiv.org/abs//2405.18392

Duration:00:08:30

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Duration:00:11:06

[QA] On the Origin of Llamas: Model Tree Heritage Recovery

The paper introduces Model Tree Heritage Recovery (MoTHer Recovery), which infers the relationships between models from their weights and reconstructs model hierarchies such as those of Llama 2 and Stable Diffusion. https://arxiv.org/abs//2405.18432

Duration:00:07:49

On the Origin of Llamas: Model Tree Heritage Recovery

Duration:00:15:01

[QA] Transformers Can Do Arithmetic with the Right Embeddings

Adding position embeddings to digits in transformers improves performance on arithmetic tasks, enabling the models to solve larger problems and improving multi-step reasoning tasks such as sorting and multiplication. https://arxiv.org/abs//2405.17399

Duration:00:07:51

Transformers Can Do Arithmetic with the Right Embeddings

Duration:00:12:45

[QA] EM Distillation for One-step Diffusion Models

EM Distillation (EMD) proposes a maximum-likelihood-based approach for distilling diffusion models into efficient one-step generators, achieving better FID scores than existing methods on ImageNet. https://arxiv.org/abs//2405.16852

Duration:00:08:10

EM Distillation for One-step Diffusion Models

Duration:00:16:26

[QA] Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

The paper explores whether transformers can learn implicit reasoning through grokking, showing that the level of generalization varies across reasoning types, and suggests changes to the transformer architecture for better reasoning. https://arxiv.org/abs//2405.15071

Duration:00:09:50

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

Duration:00:18:56

[QA] Are Long-LLMs A Necessity For Long-Context Tasks?

The proposed LC-Boost framework enables short-context LLMs to handle long-context tasks effectively by adaptively accessing and using the context, improving performance while consuming fewer resources. https://arxiv.org/abs//2405.15318

Duration:00:08:24

Are Long-LLMs A Necessity For Long-Context Tasks?

Duration:00:16:19

[QA] AGILE: A Novel Framework of LLM Agents

The AGILE framework improves LLM agents on conversational tasks by incorporating memory, tools, consultation with experts, and reinforcement learning, and it outperforms GPT-4 on question-answering tasks. https://arxiv.org/abs//2405.14751

Duration:00:08:23

AGILE: A Novel Framework of LLM Agents

Duration:00:17:47

[QA] Thermodynamic Natural Gradient Descent

Natural gradient descent (NGD) can match the computational complexity of first-order methods when run on appropriate hardware, enabling a new hybrid digital-analog algorithm for efficient large-scale training of neural networks. https://arxiv.org/abs//2405.13817

Duration:00:08:37

Thermodynamic Natural Gradient Descent

Duration:00:15:42

[QA] DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Training on Lean 4 proof data generated at scale from math competition problems improves theorem proving in large language models, with the resulting model outperforming GPT-4. https://arxiv.org/abs//2405.14333

Duration:00:08:21

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data