Based in Malmö, Sweden, Mapillary provides a service for sharing geotagged photos and aims to represent the whole world, not only streets, with crowdsourced photos. Its research group, Mapillary Research, has a mission of "pushing the boundaries of visual computing through machine intelligence." Research done by the group targets fundamental challenges in computer vision and machine learning. One such area is deep learning, and with a newly developed technique, the group can improve on the winning semantic segmentation method of this year’s Large-Scale Scene Understanding Workshop on the challenging Mapillary Vistas Dataset, setting a new state of the art, according to the company.
Mapillary uses computer vision to extract map data from street-level images. Its platform is device-agnostic, but the extracted map data can grow very large, owing to both the number of images and their resolution. Semantic segmentation helps to understand images at the pixel level, forming the basis of true scene understanding, but it presents two major challenges. First, recognition models must be trained that can absorb all the relevant information from the training data. Second, once trained, these models must be applied to new and previously unseen images so that they recognize all the objects the user is interested in.
To address the first challenge, Mapillary developed a novel, memory-saving approach to training recognition models. In a technical paper, the group presents its technique, In-Place Activated Batch Normalization (INPLACE-ABN), which replaces the conventionally used succession of BatchNorm + Activation layers with a single plugin layer. According to the paper, this avoids invasive framework surgery while remaining straightforward to apply in existing deep learning frameworks.
To provide context, Mapillary stated that its previous models were trained using Caffe, where the group could use only a single crop of 480 x 480 pixels per GPU when training on Mapillary Vistas. The group has since migrated to PyTorch, which, together with the memory-saving idea, drastically increases data throughput to three crops per GPU, each of size 776 x 776. This, according to the company, means it can pack about eight times more data onto GPUs during training than it could before.
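The "about eight times" figure can be checked directly from the crop sizes quoted above. The short calculation below is only a sanity check; it counts pixels per GPU and assumes nothing else about the two setups (the variable names are illustrative).

```python
# Pixels per GPU in each training setup, using the figures quoted in the article.
caffe_crops, caffe_side = 1, 480      # one 480 x 480 crop per GPU (Caffe)
pytorch_crops, pytorch_side = 3, 776  # three 776 x 776 crops per GPU (PyTorch)

caffe_pixels = caffe_crops * caffe_side ** 2
pytorch_pixels = pytorch_crops * pytorch_side ** 2
ratio = pytorch_pixels / caffe_pixels

print(f"Caffe:   {caffe_pixels} pixels per GPU")
print(f"PyTorch: {pytorch_pixels} pixels per GPU")
print(f"Ratio:   {ratio:.2f}x")
```

The ratio comes out slightly under eight, consistent with the company's rounded claim.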
Mapillary’s proposed solution, according to the company, allows it to recover the necessary quantities by re-computing them from saved intermediate results in a computationally very efficient way.
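The idea of re-computing quantities from a saved result can be illustrated with a toy example. The sketch below is not Mapillary's implementation. It assumes the activation is an invertible leaky ReLU with slope 0.01 and a non-zero scale `gamma`, and it uses plain Python lists instead of GPU tensors. All function names are hypothetical. It shows that keeping only the layer output is enough to recover the normalized values that a conventional BatchNorm + Activation pair would otherwise store separately.

```python
SLOPE = 0.01  # assumed negative slope of the leaky ReLU (not stated in the article)

def leaky_relu(y, slope=SLOPE):
    return y if y >= 0 else slope * y

def leaky_relu_inverse(z, slope=SLOPE):
    # A positive slope preserves the sign, so the activation can be undone exactly.
    return z if z >= 0 else z / slope

def forward(xs, gamma, beta, eps=1e-5):
    """Batch-normalize xs, scale and shift, then apply the leaky ReLU."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    x_hat = [(x - mean) / (var + eps) ** 0.5 for x in xs]
    zs = [leaky_relu(gamma * h + beta) for h in x_hat]
    # x_hat is returned only so the recovery can be checked below;
    # the point of the sketch is that it would not need to be stored.
    return zs, x_hat

def recover_normalized(zs, gamma, beta):
    """Re-compute the normalized values from the saved output alone (gamma != 0)."""
    return [(leaky_relu_inverse(z) - beta) / gamma for z in zs]

xs = [-2.0, -0.5, 0.0, 1.5, 3.0]
gamma, beta = 1.5, -0.2
zs, x_hat = forward(xs, gamma, beta)
recovered = recover_normalized(zs, gamma, beta)
max_error = max(abs(a - b) for a, b in zip(x_hat, recovered))
print(f"largest recovery error: {max_error:.2e}")
```

The recovery costs one extra elementwise pass over the saved output, which is the kind of small computational overhead traded for not keeping a second buffer in memory.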
"In essence, we can save ~50% of GPU memory in exchange for minor computational overhead of only 0.8–2.0%," wrote Peter Kontschieder for Mapillary.