Commits · 6711b3bc40073e7ced2a4c7d8266feec7e6e137f · Shenguo Wang / Flash Attention

22 Aug, 2023 2 commits
- Bump version to 2.0.9 · 6711b3bc
  Tri Dao authored 1 year ago
- [GPT] Fix loading weights from HF hub · ef6d8c75
  Tri Dao authored 1 year ago
21 Aug, 2023 1 commit
- FEAT: add codes which supporting for baichuan-inc/Baichuan-7B (#425) · a8c35b4f
  GAOXinyu authored 1 year ago
20 Aug, 2023 2 commits
- handle uneven heads across ranks when combining state_dicts; resolves #467 (#468) · 25d6b1db
  Xuechen Li authored 1 year ago
```
* q

* add comment.
```
- Import torch before flash_attn_2_cuda · d431f167
  Tri Dao authored 1 year ago
19 Aug, 2023 2 commits
- Run isort and black on test files · 0e8c46ae
  Tri Dao authored 1 year ago
- map custom model state_dict back to huggingface format (#465) · 7fcd3e6a
  Xuechen Li authored 1 year ago
```
* fix name.

* set inv function.

* add map back function.

* handle gqa.

* add type annotation to avoid confusion.

* fix docstr.

* test inverse remap logic.
```
18 Aug, 2023 5 commits
- Run isort and black on python files · f1a73d07
  Tri Dao authored 1 year ago
- Don't need to set TORCH_CUDA_ARCH_LIST in setup.py · cbb4cf5f
  Tri Dao authored 1 year ago
- support when num_heads is not divisible by world_size; resolves #459 (#461) · bb4cded1
  Xuechen Li authored 1 year ago
```
* uneql rank.

* trim.

* enable passing in number of heads for each rank.

* simplify.

* simplify.

* cleanup.

* fix col parallel.

* fix bug with row parallel.

* fit out proj.

* refac.

* fix sharding logic.

* refac sharding.

* refac.

* support multiple of.

* make fn reuseable.

* fix bug in dimensions.

* scaffold.

* test uneven heads.

* fix test by adding barrier.

* refac.

* reuse code.

* clean up.
```
- [ViT] Run black on vit.py · ada4710d
  Tri Dao authored 1 year ago
- [ViT] Minor fix so it runs · a81900d4
  Tri Dao authored 1 year ago
17 Aug, 2023 4 commits
- [GPT] Run black on gpt.py · 4b661a56
  Tri Dao authored 1 year ago
- [MHA] Run black on mha.py · bec5b3d3
  Tri Dao authored 1 year ago
- [FusedDense] Allow Row/ColumnParallelLinear to have uneven split · cb0daccc
  Tri Dao authored 1 year ago
- [FusedDense] Run black on fused_dense.py · bcfa7c97
  Tri Dao authored 1 year ago
16 Aug, 2023 2 commits
- Bump to v2.0.8 · 2286d7ce
  Tri Dao authored 1 year ago
- Fix Bwd NaN for varlen when seqlen_q >> seqlen_k and causal · c65b5106
  Tri Dao authored 1 year ago
15 Aug, 2023 1 commit
- enable loading hf llama checkpoints for training (#446) · 0f7853c6
  Xuechen Li authored 1 year ago

* prelim.

* add hf convertion fn.

* mlp.

* change name.

* fix bug.

* inverse permute.

* change comment.

* revert style changes.

* fix.

* add doc.

* revert.

* enable load safe.

* fix safe load.

* fix import.

* fix typing-related lints.

* fix ckpt loading logic.

* make single gpu work.

* test with parallel.

* ckpt format.

* enable pretrained state dict.

* remove unused imports.

* remove unused.

* mark idea related.
14 Aug, 2023 4 commits
- Bump to v2.0.7 · c60851a8
  Tri Dao authored 1 year ago
- fix binary wheel installation when nvcc is not available (#448) · aab603af
  Aman Gupta Karmani authored 1 year ago
- [CI] Fix MATRIX_CUDA_VERSION check · f8dccfc9
  Tri Dao authored 1 year ago
- Use single thread compilation for cuda12.1, torch2.1 to avoid OOM CI · 9c531bdc
  Tri Dao authored 1 year ago
13 Aug, 2023 7 commits
- Bump to v2.0.6 · 67ae6fd7
  Tri Dao authored 1 year ago
- Fix wheel building · 2ddeaa40
  Tri Dao authored 1 year ago
- Merge branch 'piercefreeman-feature/demo-wheels' · d8ec6a2f
  Tri Dao authored 1 year ago

* piercefreeman-feature/demo-wheels: (25 commits)
  Install standard non-wheel package
  Remove release creation
  Build wheel on each push
  Isolate 2.0.0 & cuda12
  Clean setup.py imports
  Remove builder project
  Bump version
  Add notes to github action workflow
  Add torch dependency to final build
  Exclude cuda erroring builds
  Exclude additional disallowed matrix params
  Full version matrix
  Add CUDA 11.7
  Release is actually unsupported
  echo OS version
  Temp disable deploy
  OS version build numbers
  Restore full build matrix
  Refactor and clean of setup.py
  Strip cuda name from torch version
  ...

- Merge branch 'feature/demo-wheels' of... · 3c458cff
  Tri Dao authored 1 year ago

Merge branch 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention into piercefreeman-feature/demo-wheels

* 'feature/demo-wheels' of https://github.com/piercefreeman/flash-attention: (25 commits)
  Install standard non-wheel package
  Remove release creation
  Build wheel on each push
  Isolate 2.0.0 & cuda12
  Clean setup.py imports
  Remove builder project
  Bump version
  Add notes to github action workflow
  Add torch dependency to final build
  Exclude cuda erroring builds
  Exclude additional disallowed matrix params
  Full version matrix
  Add CUDA 11.7
  Release is actually unsupported
  echo OS version
  Temp disable deploy
  OS version build numbers
  Restore full build matrix
  Refactor and clean of setup.py
  Strip cuda name from torch version
  ...

- Prepare for Cutlass 3.2 · dbd79237
  Tri Dao authored 1 year ago
- Bump to v2.0.5 · c5e87b11
  Tri Dao authored 1 year ago
- Update to Cutlass 3.1 · 3524e13c
  Tri Dao authored 1 year ago
11 Aug, 2023 4 commits
- Install standard non-wheel package · 6ef3bd80
  Pierce Freeman authored 1 year ago
- Remove release creation · ecc65354
  Pierce Freeman authored 1 year ago
- Build wheel on each push · bc6d4992
  Pierce Freeman authored 1 year ago
- Isolate 2.0.0 & cuda12 · 565615c6
  Pierce Freeman authored 1 year ago
10 Aug, 2023 1 commit
- [MLP] Change the check for out_features being None · 364a5b4a
  Tri Dao authored 1 year ago
01 Aug, 2023 5 commits
- Bump to v2.0.4 · d30f2e1c
  Tri Dao authored 1 year ago
- Fix race condition in bwd (overwriting sK) · 1c41d2b0
  Tri Dao authored 1 year ago
- Bump to v2.0.3 · a4e5d1ed
  Tri Dao authored 1 year ago
- [Docs] Fix docstring about Q nheads being divisible by KV nheads · 8f4cd4c1
  Tri Dao authored 1 year ago
- Fix masking of bwd when seqlen is not divisible by 128 · a4f148b6
  Tri Dao authored 1 year ago
