GiantPandaCV, author: 每时AI

Fused AllGather_MatMul: An Engineering Implementation in Triton

10 PM, 2025/01/24, by GiantPandaCV

0x0. Preface
yifuwang, in https://github.com/yifuwang/sym

10 PM, 2025/01/21, by GiantPandaCV

My course notes; feel free to follow along: https://github.com/BBuf/how-to-optim-a

10 PM, 2025/01/20, by GiantPandaCV

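Regarding MLA, I'd first like to briefly record how my understanding of it developed:
I first learned about MLA right around the time it came out. Having read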

2 PM, 2025/01/20, by GiantPandaCV

Blog source: https://pytorch.org/blog/accelerating-gemms-t

10 PM, 2025/01/16, by GiantPandaCV

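Shanghai AI Laboratory has upgraded its InternLM (书生) large model with the release of InternLM3.0. A refined data framework raises data efficiency and thought density, cuts training cost by more than 75%, and combines regular conversation with deep-thinking capability in a single model.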

2 PM, 2025/01/15, by GiantPandaCV

Blog source: https://pytorch.org/blog/cutlass-ping-pong-ge

10 PM, 2025/01/10, by GiantPandaCV

SmartFlowAI
Author: 企鹅火烈鸟🦩
The full text is about 2,400 characters; estimated reading

10 PM, 2025/01/08, by GiantPandaCV

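PyTorch implements Float8 training with FSDP2, DTensor, and torchao, raising training throughput by 50%. The post demonstrates that Float8 is effective across a range of model sizes and validates model quality against evaluation benchmarks.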

10 PM, 2025/01/06, by GiantPandaCV

Blog source: https://pytorch.org/blog/llama-into-torchtune