IBM Research says its software performs deep learning training fully synchronously with very low communication overhead. As a result, when scaled to a large cluster with hundreds of NVIDIA GPUs on the ImageNet-22K dataset, it achieved a record image recognition accuracy of 33.8% on 7.5 million images from the dataset, versus the previous best published result of 29.8% by Microsoft. This distributed deep learning (DDL) approach enabled IBM to train a ResNet-101 neural network model in just 7 hours by leveraging tens of servers equipped with hundreds of NVIDIA GPUs; the same model had taken Microsoft 10 days to train.
To achieve this, IBM Research created the DDL code and algorithms to overcome issues inherent to scaling these otherwise powerful deep learning frameworks, according to the company.
“These results are on a benchmark designed to test deep learning algorithms and systems to the extreme, so while 33.8% might not sound like a lot, it’s a result that is noticeably higher than prior publications,” wrote Hillery Hunter, IBM Fellow (pictured). “Given any random image, this trained AI model will give its top-choice object (Top-1 accuracy), amongst 22,000 options, with an accuracy of 33.8%. Our technology will enable other AI models trained for specific tasks, such as detecting cancer cells in medical images, to be much more accurate and trained in hours, re-trained in seconds.”
With this approach, IBM Research also beat Facebook’s previously impressive time. The best prior scaling result for 256 GPUs came from Facebook’s AI Research (FAIR) team, which used a smaller deep learning model, ResNet-50, on a smaller dataset, ImageNet-1K, which has about 1.3 million images. With a large minibatch size of 8,192 across 256 GPUs, Facebook researchers trained the ResNet-50 model in one hour while maintaining the same level of accuracy as a 256-minibatch baseline (around 89%). They accomplished this with a linear scaling rule, which adjusts the learning rate as a function of minibatch size, and a new warmup scheme that overcomes optimization challenges early in training by gradually ramping the learning rate from a small value up to the scaled value, helping to maintain accuracy.
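The linear scaling rule and gradual warmup described above can be sketched in a few lines. This is a minimal illustration, not Facebook’s or IBM’s actual implementation; the function name `lr_schedule` and all parameter values are hypothetical.

```python
def lr_schedule(step, warmup_steps, base_lr, batch_size, base_batch=256):
    """Learning rate for a given training step under the linear scaling
    rule with gradual warmup (illustrative sketch)."""
    # Linear scaling rule: scale the reference learning rate in
    # proportion to the minibatch size relative to the baseline.
    target_lr = base_lr * batch_size / base_batch
    if step < warmup_steps:
        # Gradual warmup: ramp linearly from base_lr toward target_lr
        # over the first warmup_steps updates to avoid early instability.
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr
```

For example, with a base rate of 0.1 tuned for a minibatch of 256, a minibatch of 8,192 yields a post-warmup rate of 0.1 × 8192 / 256 = 3.2, reached gradually rather than applied from the first step.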
For a ResNet-50 model on the same dataset as Facebook, the IBM Research DDL software achieved a scaling efficiency of 95% using Caffe, running on a cluster of 64 “Minsky” Power S822LC systems with four NVIDIA P100 GPUs each, and completed training in 50 minutes, compared with Facebook’s previous record of one hour. For training the larger ResNet-101 model on 7.5 million images from the ImageNet-22K dataset, with an image batch size of 5,120, IBM Research achieved a scaling efficiency of 88%.
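For context, scaling efficiency is commonly defined as the actual speedup over a single GPU divided by the ideal linear speedup. A quick sketch of that metric (one common definition, not necessarily the exact one IBM used; the function name and sample numbers are hypothetical):

```python
def scaling_efficiency(throughput_n, throughput_1, n_gpus):
    """Fraction of ideal linear speedup achieved on n_gpus devices.

    throughput_n: training throughput (e.g. images/sec) on n_gpus GPUs
    throughput_1: training throughput on a single GPU
    """
    actual_speedup = throughput_n / throughput_1
    ideal_speedup = n_gpus
    return actual_speedup / ideal_speedup
```

At 95% efficiency, 256 GPUs deliver roughly 0.95 × 256 ≈ 243 times the throughput of one GPU, rather than the ideal 256×.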
For developers and data scientists, the IBM Research DDL software presents an API that each of the deep learning frameworks can hook into to scale to multiple servers, according to the company. IBM Research, which has released a technical preview in version 4 of its PowerAI enterprise deep learning software offering, expects that making these DDL features publicly available will lead to more high-accuracy runs as others leverage the power of clusters for AI model training.
View the IBM Research blog post.