This article is based on the research paper 'FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours'
DeepMind launched AlphaFold 2 last year, which made headlines for its incredible accuracy in predicting protein structure. The success of AlphaFold demonstrated that deep neural networks could be used to solve complex and critical structural biology problems.
FastFold, developed by a group of researchers from the National University of Singapore, is an efficient formulation of the protein structure prediction model for both training and inference. Although AlphaFold 2 is a game-changer in protein structure prediction, its training and inference remain time-consuming and expensive, and this is the problem the research team set out to address.
What makes it better? FastFold cuts the overall training time of AlphaFold 2 from 11 days to 67 hours and delivers up to 9.5x speedups for inference on long sequences, while scaling to an aggregate 6.02 PetaFLOPS with 90.1% parallel efficiency. As a result, FastFold significantly reduces cost while improving both training and inference.
How did researchers contribute to this model?
The researchers optimized AlphaFold's operators based on AlphaFold-specific performance characteristics; combined with kernel fusion, their kernel implementations achieved substantial speedups. They also proposed Dynamic Axial Parallelism, which incurs lower communication overhead than other model-parallelism methods. For communication optimization, the proposed Duality Async operation implements computation-communication overlap in dynamic computational graph frameworks such as PyTorch.
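To make the kernel-fusion idea concrete, here is a minimal, purely illustrative sketch in plain Python (not FastFold's actual CUDA kernels): instead of running one "kernel" per elementwise operation and materializing the intermediate result in memory, a fused kernel applies all operations in a single pass, avoiding extra memory traffic and launch overhead.

```python
# Illustrative sketch of kernel fusion (conceptual only; the real FastFold
# kernels are fused CUDA implementations of AlphaFold's operators).

def unfused(xs):
    # Pass 1: bias add -- the intermediate list stands in for a buffer
    # written to and read back from GPU memory.
    tmp = [x + 1.0 for x in xs]
    # Pass 2: ReLU activation, a second "kernel launch" over the data.
    return [max(t, 0.0) for t in tmp]

def fused(xs):
    # Single pass: bias add and activation fused, no intermediate buffer.
    return [max(x + 1.0, 0.0) for x in xs]

data = [-2.0, -0.5, 0.0, 1.5]
assert unfused(data) == fused(data)  # same result, one fewer memory round-trip
```

The payoff on real hardware comes from reading and writing device memory once instead of twice, which matters for the memory-bound attention and gating operators in AlphaFold.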
The AlphaFold model training was successfully scaled up to 512 NVIDIA A100 GPUs, yielding a total of 6.02 PetaFLOPs at the training stage. Our FastFold accelerates large sequences by 7.5 to 9.5 times and enables inference on extremely long sequences at the inference stage. The total duration of the training is reduced from 11 days to 67 hours, which translates into significant savings.
The original AlphaFold model is made up of several parts:
- Input Embedder: This component encodes the multiple sequence alignment (MSA) and pairwise information of the target sequence into the MSA representation, which captures the co-evolutionary information of all similar sequences, and the pair representation, which carries interaction information between pairs of residues in the sequence.
- Evoformer Blocks: The MSA stack and pair stack update the MSA and pair representations. The highly processed representations are then fed into the Structure Module, which produces a three-dimensional structure prediction for the protein.
- Evoformer Backbone: FastFold reduces communication overhead in the Evoformer backbone by using Dynamic Axial Parallelism, a novel model-parallelism method that exceeds existing standard tensor parallelism in scaling efficiency.
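The core idea of axial parallelism can be sketched as follows. This is a simplified, single-process simulation using plain Python lists, not the paper's distributed implementation: the N x N pair representation is partitioned along one axis across devices, row-wise operations run locally on each shard, and an all-to-all style re-partition switches the sharding to the other axis for column-wise operations.

```python
# Simplified simulation of axial sharding (assumed toy setup: a 4x4 "pair
# representation" and 2 simulated devices; real FastFold shards tensors
# across GPUs with collective communication).

def shard_rows(matrix, num_devices):
    """Split a matrix into contiguous row blocks, one per device."""
    step = len(matrix) // num_devices
    return [matrix[i * step:(i + 1) * step] for i in range(num_devices)]

def repartition_to_columns(row_shards):
    """Simulate the all-to-all: reassemble rows, then re-shard along columns."""
    full = [row for shard in row_shards for row in shard]
    transposed = [list(col) for col in zip(*full)]
    return shard_rows(transposed, len(row_shards))

# 4x4 pair representation, 2 "devices"
pair = [[r * 4 + c for c in range(4)] for r in range(4)]
row_shards = shard_rows(pair, 2)                 # device 0: rows 0-1, device 1: rows 2-3
col_shards = repartition_to_columns(row_shards)  # device 0 now holds columns 0-1
```

Sharding along only one axis at a time is what keeps the communication volume low: each re-partition is a single all-to-all exchange rather than the repeated all-reduces that standard tensor parallelism requires.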
For communication optimization, the researchers created a Duality Async operation to implement computation-communication overlap in dynamic computational graph frameworks like PyTorch.
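The pattern behind computation-communication overlap can be illustrated with a minimal sketch, assuming Python threads as a stand-in for asynchronous collective operations (the actual Duality Async operation integrates with PyTorch's autograd and NCCL collectives). The transfer for the next data chunk is launched before computing on the current one, so communication latency is hidden behind computation.

```python
import threading
import time

def fake_all_gather(chunk, out, idx):
    # Stand-in for an asynchronous collective: simulated network latency,
    # then the "received" data lands in the output buffer.
    time.sleep(0.01)
    out[idx] = chunk

def overlapped_pipeline(chunks):
    received = [None] * len(chunks)
    results = []
    # Kick off communication for chunk 0 before any computation.
    t = threading.Thread(target=fake_all_gather, args=(chunks[0], received, 0))
    t.start()
    for i in range(len(chunks)):
        t.join()  # wait only for chunk i's transfer to finish
        if i + 1 < len(chunks):
            # Launch the next transfer BEFORE computing on the current chunk,
            # so communication and computation proceed in parallel.
            t = threading.Thread(target=fake_all_gather,
                                 args=(chunks[i + 1], received, i + 1))
            t.start()
        results.append(received[i] * 2)  # local "computation" on chunk i
    return results

print(overlapped_pipeline([1, 2, 3]))  # each chunk doubled, transfers overlapped
```

In a dynamic-graph framework like PyTorch there is no ahead-of-time schedule to exploit, which is why the paper pairs each asynchronous communication launch with a dual operation that waits for it at exactly the point the result is needed.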
The researchers also compared FastFold against AlphaFold and OpenFold. FastFold dramatically reduces the time and cost of training and inference for protein structure prediction models: it cuts AlphaFold's overall training time from 11 days to 67 hours, achieves speedups of 7.5x to 9.5x for long-sequence inference, and scales to an aggregate 6.02 PetaFLOPS with 90.1% parallel efficiency.
FastFold’s excellent model-parallelism scaling efficiency makes it a viable answer to AlphaFold’s enormous training and inference costs.