GPT-Fast — blazingly fast inference with PyTorch (w/ Horace He)



10 comments on “GPT-Fast — blazingly fast inference with PyTorch (w/ Horace He)”

  1. @orrimoch5226

    Wow! It was very educational and practical!
    I liked the graphics in the presentation!
    Great job by both of you!
    Thanks!

  2. @xl0xl0xl0

    One thing that was not super clear to me: are we loading the next weight matrix (assuming there is enough SRAM) while the previous matmul + activation is being computed? (See the overlap sketch after the comments.)

  3. @xl0xl0xl0

    Wow, this presentation was excellent. Straight to the point. No over-complicating, no over-simplifying, no trying to sound smart by obscuring simple things. Thank you Horace!

  4. @XartakoNP

    I didn't understand one of the points made. On a couple of occasions Horace mentions that we are loading all the weights (into the registers, I assume) with every token — that's also what the diagram shows at https://youtu.be/18YupYsH5vY?t=1972 . Is that what's happening? Can the registers hold all the model weights at once? If that were the case, why do you need to load them every time instead of leaving them untouched? I hope that's not too stupid a question. (See the bandwidth arithmetic after the comments.)

  5. @SinanAkkoyun

    How does PPL look with int4 quantization? Also, given GPTQ, how high is the tokens/s with gpt-fast? (See the int4 sketch after the comments.)

  6. @tljstewart

    Awesome talk. Can Triton target TPUs?

  7. @mufgideon

    Is there any Discord for this channel's community?

  8. @xmorse

    Re: the questions about why gpt-fast is faster than the CUDA version: kernel fusion. Merging several kernels into one is faster than launching multiple hand-written ones. (See the torch.compile sketch after the comments.)

  9. @kimchi_taco

    Speculative decoding is a major thing, right? If so, it's not a very fair comparison… (See the speculative-decoding sketch after the comments.)

  10. @TheAIEpiphany

    Horace He joined us to walk us through what one can do with native PyTorch when it comes to accelerating inference!
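
On comment 2's question about overlapping the next weight load with the current matmul: inside a single GPU kernel this is software pipelining (the kernel stages the next tile into shared memory while the current tile computes), and it is done by the kernel itself, not by eager PyTorch. As a coarse analogue one level up, a side CUDA stream can copy the next layer's weights while the main stream computes. A minimal sketch, with all shapes and names illustrative:

```python
import torch

def pipelined_forward(x, cpu_weights):
    """Overlap the copy of layer i+1's weights with layer i's matmul."""
    copy_stream = torch.cuda.Stream()
    with torch.cuda.stream(copy_stream):          # prefetch the first matrix
        w_next = cpu_weights[0].to("cuda", non_blocking=True)
    for i in range(len(cpu_weights)):
        torch.cuda.current_stream().wait_stream(copy_stream)  # copy finished?
        w = w_next
        w.record_stream(torch.cuda.current_stream())
        if i + 1 < len(cpu_weights):
            with torch.cuda.stream(copy_stream):  # prefetch the next matrix
                w_next = cpu_weights[i + 1].to("cuda", non_blocking=True)
        x = torch.relu(x @ w)                     # compute overlaps the copy
    return x

if torch.cuda.is_available():
    weights = [torch.randn(1024, 1024).pin_memory() for _ in range(4)]
    y = pipelined_forward(torch.randn(1, 1024, device="cuda"), weights)
```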
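
On comment 4: the weights cannot all sit in registers or SRAM. An A100 has on the order of tens of MB of on-chip memory, while a 7B-parameter model in fp16 is about 14 GB, so every generated token re-reads all the weights from off-chip HBM. That makes small-batch decoding memory-bandwidth bound, and it gives a simple tokens/s ceiling. Back-of-the-envelope arithmetic with illustrative numbers:

```python
# Decode roofline: at batch size 1, generating one token reads every weight
# once, so HBM bandwidth caps tokens/s.
n_params = 7e9        # assumed 7B-parameter model
bytes_per_param = 2   # fp16
bandwidth = 2.0e12    # bytes/s, roughly an A100-80GB

weight_bytes = n_params * bytes_per_param   # ~14 GB read per token
ceiling = bandwidth / weight_bytes          # ~143 tokens/s upper bound
print(f"{weight_bytes / 1e9:.0f} GB per token, ceiling {ceiling:.0f} tok/s")
```

This is also why quantization helps decoding: fewer bytes per parameter means fewer bytes read per token, which raises the ceiling.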
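
On comment 5: for context, a rough sketch of the group-wise weight-only int4 idea. This is an illustration, not gpt-fast's actual packing or kernels (real int4 kernels pack two 4-bit values per byte); the rounding error printed at the end is the source of the PPL degradation being asked about:

```python
import torch

def quantize_int4(w: torch.Tensor, group_size: int = 32):
    """Quantize groups of `group_size` weights to integers in [-8, 7]."""
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    scale = (wg.abs().amax(dim=-1, keepdim=True) / 7.0).clamp(min=1e-8)
    q = torch.clamp(torch.round(wg / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_int4(q, scale, shape):
    return (q.float() * scale).reshape(shape)

w = torch.randn(128, 256)
q, s = quantize_int4(w)
err = (w - dequantize_int4(q, s, w.shape)).abs().max().item()
print(f"max abs reconstruction error: {err:.4f}")
```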
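
Comment 8's point in code: torch.compile fuses chains of pointwise ops into one generated kernel, so intermediates stay on-chip instead of round-tripping through HBM between separate kernel launches. A small sketch (the tanh approximation of GELU is standard; the rest is illustrative):

```python
import torch

def gelu_bias(x, bias):
    # In eager mode, each op here is a separate kernel launch with its own
    # read/write of the full tensor from/to HBM.
    y = x + bias
    return 0.5 * y * (1.0 + torch.tanh(0.79788456 * (y + 0.044715 * y**3)))

compiled = torch.compile(gelu_bias)  # one fused kernel for the whole chain

if torch.cuda.is_available():
    x = torch.randn(4096, 4096, device="cuda")
    b = torch.randn(4096, device="cuda")
    torch.testing.assert_close(compiled(x, b), gelu_bias(x, b))
```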
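
On comment 9: speculative decoding is one ingredient among several (compilation, int8/int4 quantization, tensor parallelism), and speedups are typically reported with and without it. The mechanic, as a simplified greedy-verification sketch (the actual method uses rejection sampling to preserve the target distribution; `draft` and `target` are assumed callables mapping a 1-D int64 token tensor to per-position logits):

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, tokens: torch.Tensor, k: int = 4):
    """Draft k tokens with the cheap model, verify them all with a single
    forward pass of the big model, and keep the agreeing prefix."""
    ctx = tokens
    for _ in range(k):                      # autoregressive drafting
        ctx = torch.cat([ctx, draft(ctx)[-1].argmax()[None]])
    logits = target(ctx)                    # one pass scores every position
    # logits[i] predicts token i+1, so these are the target's picks for
    # the k drafted positions:
    target_pred = logits[len(tokens) - 1 : -1].argmax(dim=-1)
    drafted = ctx[len(tokens):]
    n_accept = int((drafted == target_pred).long().cumprod(0).sum())
    if n_accept == k:                       # all accepted: bonus token for free
        extra = logits[-1].argmax()[None]
    else:                                   # first mismatch: take target's token
        extra = target_pred[n_accept : n_accept + 1]
    return torch.cat([tokens, drafted[:n_accept], extra])
```

Each call costs one target forward pass but can advance the sequence by up to k + 1 tokens, which is exactly the win when the target model is bandwidth-bound.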
