Commits · 798858f9f1043874a5d11b6609c63a41304a5694 · Shenguo Wang / Flash Attention

04 Sep, 2023 4 commits
- Fix test_baichuan · 798858f9
  Tri Dao authored 1 year ago
  
  798858f9
- [Gen] Add back num_last_tokens in gpt.py · 7b33743a
  Tri Dao authored 1 year ago
  
  7b33743a
- Remove unused sdPsum in dot_do_o function · 5953c4f5
  Tri Dao authored 1 year ago
  
  5953c4f5
- [Rotary] Implement varlen rotary · b28ec236
  Tri Dao authored 1 year ago
  
  b28ec236
03 Sep, 2023 6 commits
- [Rotary] Clean up rotary Triton implementation a bit · 861c8257
  Tri Dao authored 1 year ago
  
  861c8257
- [Rotary] Speed up rotary kernel when interleaved=True · 1c523c1c
  Tri Dao authored 1 year ago
  
  1c523c1c
- Fix splitKV combine function when local LSEs are all -inf · 26d7d92f
  Tri Dao authored 1 year ago
  
  26d7d92f
- [Rotary] Pass max_seqlen from mha.py to rotary during inference · de2949f3
  Tri Dao authored 1 year ago
  
  de2949f3
- [Rotary] Implement rotary in Triton · 942fcbf0
  Tri Dao authored 1 year ago
  
  942fcbf0
- [CI] Add CUDA 12.2 · 08e98471
  Tri Dao authored 1 year ago
  
  08e98471
01 Sep, 2023 2 commits
- Remove commented out code in bwd (#512) · 37e32feb
  Sophia Wisdom authored 1 year ago
```
* Remove lots of comments

* Remove unused traits
```
  37e32feb
- Remove old code in utils.h (#511) · dd8a7549
  Sophia Wisdom authored 1 year ago
  
  dd8a7549
30 Aug, 2023 5 commits
- bump cutlass submodule (#504) · 866a9d33
  Aman Gupta Karmani authored 1 year ago
  
  866a9d33
- Support LLaMa2 and CodeLLaMa (#491) · c9d4a816
  dan_the_3rd authored 1 year ago
```
Co-authored-by: danthe3rd <danthe3rd>
```
  c9d4a816
- Support MQA + MP for decoding (#490) · 011ec323
  dan_the_3rd authored 1 year ago
```
Co-authored-by: danthe3rd <danthe3rd>
```
  011ec323
- [bugfix] handle_x not define when using checkpoint_lvl = 2 (#502) · 0cb595ad
  GAOXinyu authored 1 year ago
```
when using checkpoint_lvl=2, we all_gather_raw(x) without async_op=True.
So we don't need to wait for handle. Just skip.
```
  0cb595ad
- Fix typo with lse_max == -INFINITY · 31920dda
  Tri Dao authored 1 year ago
  
  31920dda
29 Aug, 2023 4 commits
- [Gen] Minor fix to modify logits for top_p · 8a326bbc
  Tri Dao authored 1 year ago
  
  8a326bbc
- fix citation in README (#501) · 1d817a8f
  Jeffrey Quesnelle authored 1 year ago
  
  1d817a8f
- add unpad_input_for_concatenated_sequences (#499) · 8f6f48d8
  Su Zhu authored 1 year ago
```
* add unpad_input_for_concatenated_sequences

* modify docstring
```
  8f6f48d8
- Implement splitKV attention · b1fbbd83
  Tri Dao authored 1 year ago
  
  b1fbbd83
28 Aug, 2023 4 commits
- Use generate_kernels.py script from Driss Guessous · 7a983df7
  Tri Dao authored 1 year ago
  
  7a983df7
- [ft_attention] Fix for seqlen=8136 (#488) · c3f2a632
  dan_the_3rd authored 1 year ago
```
When seqlen=8136, `smem_sz = 48840`, and apparently starting the kernel returns an `invalid argument` CUDA error.

`48840 < 48 * 1024` but apparently it's still above the limit somehow..?
Tested on A100
```
  c3f2a632
- Update Cutlass to v3.2.0 · 757058d4
  Tri Dao authored 1 year ago
  
  757058d4
- [Gen] Clone logits before returning when cg=True · 9f42cb6e
  Tri Dao authored 1 year ago
  
  9f42cb6e
27 Aug, 2023 1 commit
- [GPT] Generalize last_token_only arg to num_last_tokens · f8aea6ea
  Tri Dao authored 1 year ago
  
  f8aea6ea
26 Aug, 2023 6 commits
- [Gen] Fix decode function not using top_p during iterative decoding · 7a3bd55f
  Tri Dao authored 1 year ago
  
  7a3bd55f
- [Gen] Refactor decode function a bit · 847abe65
  Tri Dao authored 1 year ago
  
  847abe65
- [GPT] Test generation when passing in multiple tokens · 371e2065
  Tri Dao authored 1 year ago
  
  371e2065
- [GPT] Move more tests to test_gpt.py · c000c3a2
  Tri Dao authored 1 year ago
  
  c000c3a2
- Change causal for CrossAttention in mha.py to align to bottom right · a2974e85
  Tri Dao authored 1 year ago
  
  a2974e85
- [GPT] Move GPT and OPT generation tests to test_{gpt,opt}.py · 9b713872
  Tri Dao authored 1 year ago
  
  9b713872
25 Aug, 2023 4 commits
- Move pyproject.toml to flash-attn and tests dir to avoid PEP 517 · 73bd3f3b
  Tri Dao authored 1 year ago
  
  73bd3f3b
- add benchmark for xformers fa2 wrapper (#492) · b4b6e903
  Aman Gupta Karmani authored 1 year ago
  
  b4b6e903
- Add newlines to README · 45ba93cd
  Tri Dao authored 1 year ago
  
  45ba93cd
- Change causal mask to be aligned to bottom-right instead of top-left · 9e5e8bc9
  Tri Dao authored 1 year ago
  
  9e5e8bc9
24 Aug, 2023 2 commits
- Support flash attention 2 with causal masking when KV's seq length is longer... · e07aa036
  BoxiangW authored 1 year ago
```
Support flash attention 2 with causal masking when KV's seq length is longer than Q's seq length. (#436)
```
  e07aa036
- add llama support to GPTPreTrainedModel.from_pretrained (#479) · e0b09891
  Aman Gupta Karmani authored 1 year ago
  
  e0b09891
22 Aug, 2023 2 commits
- Bump version to 2.0.9 · 6711b3bc
  Tri Dao authored 1 year ago
  
  6711b3bc
- [GPT] Fix loading weights from HF hub · ef6d8c75
  Tri Dao authored 1 year ago
  
  ef6d8c75

GitLab

Menu