    Qubitium authored
    Optimize compilation to (1) avoid OOM, (2) minimize swap usage, and (3) avoid thread starvation, whether ninja decides how many workers to spawn or MAX_JOBS is set by a manual "guess". The logic takes the minimum of two auto-calculated MAX_JOBS values: one based on CPU cores and one based on free memory. This should let flash-attn compile close to the most efficient way in any consumer or server environment. (#832)
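    The "min of two metrics" rule can be sketched in plain Python. This is not the code from this commit. `compute_max_jobs`, `read_free_memory_bytes`, `resolve_max_jobs` and the 9 GiB-per-job budget are hypothetical, and the sketch assumes that a MAX_JOBS value the user has already set is left alone. Free memory is read with `os.sysconf`, which works on Linux but not on every platform.

    ```python
    import os

    # Hypothetical peak RAM cost of one compile job, in GiB (assumption, not measured here).
    ASSUMED_GIB_PER_JOB = 9


    def compute_max_jobs(cpu_cores: int, free_mem_bytes: int,
                         gib_per_job: int = ASSUMED_GIB_PER_JOB) -> int:
        """Return min(jobs allowed by cores, jobs allowed by free memory), never below 1."""
        # Metric 1: CPU cores (assumes one job per core).
        jobs_by_cores = max(1, cpu_cores)
        # Metric 2: free memory divided by the assumed peak memory of one job.
        jobs_by_memory = free_mem_bytes // (gib_per_job * 1024 ** 3)
        # The smaller value limits both OOM/swap (memory) and oversubscription (cores).
        return max(1, min(jobs_by_cores, jobs_by_memory))


    def read_free_memory_bytes() -> int:
        # Linux-only: available physical pages times page size.
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


    def resolve_max_jobs(environ=os.environ) -> str:
        # Assumption for this sketch: a user-provided MAX_JOBS takes precedence.
        if environ.get("MAX_JOBS"):
            return environ["MAX_JOBS"]
        jobs = compute_max_jobs(os.cpu_count() or 1, read_free_memory_bytes())
        environ["MAX_JOBS"] = str(jobs)
        return environ["MAX_JOBS"]
    ```

    On a 16-core machine with 64 GiB free, memory is the binding limit (64 // 9 = 7 jobs). On a 4-core machine with the same memory, the core count wins (4 jobs).
    
    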
    
    f45bbb4c
setup.py 14.6 KB