- 26 May, 2024 7 commits
- 23 May, 2024 1 commit
lancerts authored
-
- 06 May, 2024 1 commit
Wei Ji authored
Set `packaging` and `ninja` as build time dependencies rather than runtime dependencies.
-
- 26 Apr, 2024 3 commits
- 08 Apr, 2024 4 commits
- 05 Apr, 2024 1 commit
Ivan Komarov authored
All integer parameters are specialized by default, so the two parameters removed in this commit could lead to kernel re-compilation, even if they were completely unused.
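The effect described in this commit message can be illustrated with a toy model. The sketch below is not the project's code and does not use any real compiler API: `ToyJit`, `scale`, and `unused_seqlen` are hypothetical names, and keying the cache on the exact integer value is a simplifying assumption (a real JIT compiler may key on properties of the value instead). It only shows why an integer argument that the kernel never reads can still force a new compilation.

```python
class ToyJit:
    """Toy stand-in for a JIT compiler that specializes on integer arguments."""

    def __init__(self, fn, do_not_specialize=()):
        self.fn = fn
        self.do_not_specialize = set(do_not_specialize)
        self.cache = {}
        self.compile_count = 0

    def __call__(self, **kwargs):
        # Every specialized integer argument becomes part of the cache key,
        # whether or not the function body ever reads it.
        key = tuple(
            (name, value)
            for name, value in sorted(kwargs.items())
            if isinstance(value, int) and name not in self.do_not_specialize
        )
        if key not in self.cache:
            self.compile_count += 1  # a new key stands in for a recompilation
            self.cache[key] = self.fn
        return self.cache[key](**kwargs)


def scale(x, factor, unused_seqlen):
    # `unused_seqlen` is never read, yet it still influences the cache key.
    return x * factor


specialized = ToyJit(scale)
relaxed = ToyJit(scale, do_not_specialize={"unused_seqlen"})
for seqlen in (128, 256, 512):
    specialized(x=2.0, factor=3, unused_seqlen=seqlen)
    relaxed(x=2.0, factor=3, unused_seqlen=seqlen)

print(specialized.compile_count)  # 3: one "compilation" per distinct seqlen
print(relaxed.compile_count)      # 1: the unused argument no longer matters
```

Removing the unused parameters altogether, as the commit does, has the same effect as the `relaxed` variant here: the cache key no longer changes with values the kernel does not depend on.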
-
- 28 Mar, 2024 2 commits
Driss Guessous authored
-
ljss authored
-
- 19 Mar, 2024 1 commit
Tri Dao authored
-
- 15 Mar, 2024 3 commits
Markus Krimmel authored
-
Driss Guessous authored
-
Grigory Sizov authored
* Enable paged attention in varlen forward
* Format + fix padding
-
- 14 Mar, 2024 2 commits
Arvind Sundararajan authored
-
Chirag Jain authored
-
- 02 Mar, 2024 2 commits
- 21 Feb, 2024 4 commits
- 20 Feb, 2024 1 commit
Tri Dao authored
-
- 18 Feb, 2024 1 commit
Qubitium authored
Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid thread starvation, which can occur when ninja decides how many workers to spawn or when MAX_JOBS is set by manual "guesses". The logic takes the minimum of two automatically calculated MAX_JOBS values: one based on CPU cores and one based on free memory. This should let flash-attn compile close to optimally in any consumer or server environment. (#832)
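The min-of-two-metrics rule in this message can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name `estimate_max_jobs` and the `memory_per_job_gb` default of 4 GB are assumptions chosen for the example, and the caller is expected to supply the free-memory figure from whatever source is available.

```python
import os


def estimate_max_jobs(cpu_cores, free_memory_gb, memory_per_job_gb=4.0):
    """Return a parallel job count bounded by both CPU cores and free memory.

    `memory_per_job_gb` is a hypothetical per-compile-job memory budget;
    tune it to the peak memory one compiler process needs on your machine.
    """
    # Limit 1: no more jobs than CPU cores, and at least one job.
    jobs_by_cores = max(1, cpu_cores)
    # Limit 2: no more jobs than fit into free memory, and at least one job.
    jobs_by_memory = max(1, int(free_memory_gb // memory_per_job_gb))
    # The tighter of the two limits wins.
    return min(jobs_by_cores, jobs_by_memory)


# Example: use the detected core count with an assumed 16 GB of free memory.
print(estimate_max_jobs(os.cpu_count() or 1, 16.0))
```

Taking the minimum means a many-core machine with little free memory is throttled by memory, which avoids OOM and swapping, while a machine with plenty of memory is still capped at its core count, which avoids oversubscribing threads.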
-
- 10 Feb, 2024 4 commits
Tri Dao authored
-
Tri Dao authored
-
Tri Dao authored
-
Brian Hirsh authored
-
- 08 Feb, 2024 1 commit
Grigory Sizov authored
-
- 31 Jan, 2024 2 commits