November 2017 snapshots: Thermal imaging on MTV, deep learning research, 360° vision system in the NFL, Ford and Domino's partner on self-driving pizza delivery cars.

In the November 2017 snapshots, learn about thermal cameras that were used at the MTV Video Music Awards, deep learning software from IBM Research that achieved record performance, 360° vision systems that are now deployed in multiple NFL stadiums, and Ford and Domino's partnership on self-driving pizza delivery car testing.

Self-driving pizza delivery cars being tested by Ford and Domino's

Domino's Pizza (Ann Arbor, MI, USA; www.dominos.com) and Ford Motor Company (Dearborn, MI, USA; www.ford.com) have announced a collaboration to understand the role that self-driving vehicles can play in the delivery of pizza.

Research from the two companies will examine customer reactions to interacting with a self-driving vehicle as part of the delivery experience. This research, according to Ford, will help both companies understand customers' perspectives around the future of food delivery with self-driving vehicles. Customers in the Ann Arbor, MI, USA area who agree to participate will be able to track the delivery vehicle through GPS via the Domino's Tracker app. As the self-driving vehicle approaches, they will receive text messages that explain how to retrieve their pizza using a unique code to unlock the Domino's Heatwave Compartment inside the vehicle.

While the testing is set to take place soon, the actual delivery vehicles will not be "driverless." Instead, the car will be manually driven by a Ford safety engineer and staffed with researchers, who will be gathering data primarily on the last 50 feet of the delivery process.

"We're interested to learn what people think about this type of delivery," said Russell Weiner, president of Domino's USA. "The majority of our questions are about the last 50 feet of the delivery experience. For instance, how will customers react to coming outside to get their food? We need to make sure the interface is clear and simple. We need to understand if a customer's experience is different if the car is parked in the driveway versus next to the curb. All of our testing research is focused on our goal to someday make deliveries with self-driving vehicles as seamless and customer-friendly as possible."

Ford notes in its press release that as the company builds out its business enabled by self-driving vehicles, research such as this will be key to ensuring that the technology is applied in ways that enhance the customer experience. With the aim of getting self-driving vehicles in production by 2021, Ford is "taking steps to design a business to meet the needs of both partner companies and their customers."

"As we increase our understanding of the business opportunity for self-driving vehicles to support the movement of people and goods, we're pleased to have Domino's join us in this important part of the development process," said Sherif Marakby, Ford vice president, Autonomous and Electric Vehicles. "As a company focused on the customer experience, Domino's shares our vision for a future enabled by smart vehicles in a smart environment that enhance people's lives."

Preliminary testing of the delivery process was completed using the vehicle in self-driving mode at Mcity, the simulated urban environment on the University of Michigan's (Ann Arbor, MI, USA; https://umich.edu) campus. The city of Ann Arbor also has been supportive of the testing process, according to Ford.

Thermal cameras capture 30 Seconds to Mars in live MTV Video Music Awards broadcast

Led by Academy Award-winning actor Jared Leto, rock band 30 Seconds to Mars had their performance at the 34th annual MTV Video Music Awards (VMA) filmed as part of a live thermal video broadcast.

Leading up to the event, Leto had been teasing a unique live performance, which took place at The Forum in Inglewood, CA, USA. Weeks prior to the VMAs, the band's production company approached systems integrator and distributor MoviTHERM (Irvine, CA, USA; www.movitherm.com) with the desire to use several FLIR (Wilsonville, OR, USA; www.flir.com) infrared cameras as part of the performance. MoviTHERM then assembled a team of thermal imaging engineers and began working with the creative production team, choreographers, dancers, choir, and band on the thermal video performance.

"This was history in the making," said Vatche Arabian, Senior Manager of Strategic Communications at FLIR. "This was the first time ever that four FLIR HD Thermal Cameras had been used in a live, world-wide TV broadcast!"

As part of the live broadcast, MoviTHERM used two FLIR A8300sc and two FLIR SC8300 HD thermal cameras with HD-SDI outputs, suitable for broadcasting in 720p, according to Markus Tarin, President & CEO at MoviTHERM. The A8300sc is a compact, low-noise mid-wave infrared camera featuring a 1280 x 720 Indium Antimonide (InSb) infrared detector with a 14 μm pixel pitch and a spectral range of 3 to 5 μm. The camera streams infrared data via a GigE, USB3, or CoaXPress interface and also offers a 14-bit dynamic range and a 60 fps frame rate.

The Science 8300 (SC8300) camera has a 1,344 x 784 InSb detector with a 14 μm pixel pitch and a spectral range of 3 - 5 μm or 1.5 - 5 μm. It offers both GigE and Camera Link Full interfaces, a 14-bit dynamic range, and a 132 fps frame rate.

"When it became apparent that temperature span adjustments had to be made on-the-fly during live broadcast, we decided to develop special thermal imaging and camera control software to accommodate that task," said Tarin. "The software enabled us to optimize the thermal contrast of the performers in real-time during the live performance and respond rapidly to cues of the video director."
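The temperature span adjustment Tarin describes can be pictured as a remapping of raw detector counts to display gray levels. The following Python sketch is illustrative only and is not MoviTHERM's software: the function name `apply_span`, the `level` and `span` parameters, and the sample counts are all assumptions, and it simply treats the cameras' 14-bit output as integers from 0 to 16383.

```python
def apply_span(raw_counts, level, span):
    """Map raw sensor counts to 8-bit gray levels (0-255).

    level: assumed center of the display window, in raw counts
    span:  assumed width of the display window, in raw counts
    Counts below the window clip to 0 (black); counts above clip to 255 (white).
    """
    low = level - span / 2.0
    gray = []
    for count in raw_counts:
        fraction = (count - low) / span          # position inside the window
        fraction = min(1.0, max(0.0, fraction))  # clip to the window edges
        gray.append(int(round(fraction * 255)))
    return gray


# Hypothetical raw counts for four pixels of one 14-bit frame.
pixels = [2000, 6000, 8000, 12000]

# Full 14-bit window: every pixel is visible, but contrast is low.
wide = apply_span(pixels, level=8192, span=16384)

# Narrow window around the subject: contrast is stretched, the rest clips.
narrow = apply_span(pixels, level=7000, span=2000)
```

Narrowing the span around the counts that correspond to the performers stretches their contrast, while anything colder or hotter clips to black or white. Changing `level` and `span` from frame to frame is one plausible way to respond to a director's cues during a live show.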

Billboard Magazine called the performance "mind-bending" in its coverage of the event.

Jared Leto spoke to FLIR about the performance, and why he wanted to utilize thermal imaging cameras for it.

"To be able to bring this idea to life, it took a village, a very large village," he said. "When you do a show like the MTV Awards you have an opportunity to explore and experiment, to revisit the past or to push towards the future. We knew we wanted to do something that hadn't been done before."

He continued, "Playing in total darkness, and capturing our performance via thermal signal and doing it live was complex, challenging and seemingly impossible but with the help of a very patient MTV and an enthusiastic team of creative dreamers, we accomplished our goal. Working with the FLIR team has been inspiring and an absolute pleasure."

View the performance from the VMAs here: http://bit.ly/VSD-30STM.

360-degree sports replay vision system from Intel now installed in 11 NFL stadiums

Intel's (Santa Clara, CA, USA; www.intel.com) freeD (free dimensional video) technology has been installed at the home stadiums of 11 NFL teams. The Arizona Cardinals, Baltimore Ravens, Carolina Panthers, Cleveland Browns, Houston Texans, Indianapolis Colts, Kansas City Chiefs, Minnesota Vikings, New England Patriots, San Francisco 49ers, and the Washington Redskins stadiums all feature the 360° vision system that uses 38 high-resolution industrial cameras.

For games at these freeD technology-enabled stadiums, fans can access 360° highlights via NFL.com, the NFL Mobile app, the NFL YouTube channel, and across NFL team digital offerings. The replay system uses 38 cameras installed around the stadium, along with proprietary algorithms, to enable views of gameplay action from every angle. Each camera in the stadium connects to Intel-based servers capable of processing up to 1 terabyte of data per 15- to 30-second clip. Each Intel freeD technology system requires more than 50 servers. Included are Gigabyte X-99 motherboards with an Intel Core i5 processor. The volumetric video capture travels over miles of fiber-optic cables and is fed to a special control room where a team of producers selects and packages the replays, according to Intel.

"By expanding freeD to more teams across the NFL, we're empowering fans to see every side of the play and relive the excitement of game-changing moments," said James Carwana, general manager of Intel Sports. "During Super Bowl LI, fans experienced a pivotal play from the quarterback's point of view. Seeing key plays up close and from new perspectives is redefining what it means to watch the game."

Intel's freeD captures true 3D scenes that can be tapped to produce any desired viewing angle. The system utilizes 36 Spark Series SP-20000 industrial cameras from JAI (San Jose, CA, USA; www.jai.com) that are mounted around the upper level of the stadium to continuously capture the action from every angle. SP-20000 cameras feature the 20 MPixel CMV20000 CMOS image sensor from CMOSIS (now ams Sensors Belgium; Antwerp, Belgium; www.cmosis.com), which features a 6.4 μm pixel size. The cameras also feature built-in high dynamic range mode, which is designed to handle the high contrast sun and shade conditions common in outdoor stadiums, golf courses, and other sports venues.

Synchronized feeds of high-resolution video are processed using algorithms to create a 3D database of voxels. After the freeD database is created, an interactive real-time rendering engine allows for the viewing of the captured scene from any desired angle (as long as it is within the coverage range of the original sensors).

Vishal Shah, SVP, Digital Media at the NFL, also commented on the technology. "We're thrilled to bring this innovative content to NFL fans both in stadium and at home with freeD technology," said Shah. "Partnering with Intel has enabled a new way for fans to experience the excitement of our game. The vision of this technology to place the viewer anywhere on the field has the potential to be impactful across multiple areas of the League."

In early 2016, Intel acquired the company that developed freeD Technology, Replay Technologies. In an interview with Vision Systems Design, Matteo Shapira, the chief technology officer and co-founder from Replay Technologies, said freeD "enables a new way of capturing reality which breaks us free from the constraints of where a physical camera with a particular lens had been placed, to allow a freedom of viewing which has endless possibilities."

Shapira also noted that the company developed the technology about a year and a half before the 2012 Olympic Games, when the three founders of the company gathered to come up with the idea of shooting reality from infinite angles.

Read the interview with Shapira, who is now the senior director of innovation and technology, immersive reality/sports, at Intel here: http://bit.ly/VSD-FreeD.

Deep learning software from IBM Research achieves record performance

To reduce the training times for large models with large data sets, IBM Research has developed deep learning software that achieves 95% scaling efficiency on the Caffe deep learning framework.

IBM Research says that its software does deep learning training fully synchronously with very low communication overhead. As a result, when scaled to a large cluster with hundreds of NVIDIA (Santa Clara, CA, USA; www.nvidia.com) GPUs, it yielded a record image recognition accuracy of 33.8% on 7.5 million images from the ImageNet-22K dataset, vs. the previous best published result of 29.8% by Microsoft. This distributed deep learning (DDL) approach enabled the researchers to train a ResNet-101 neural network model in just 7 hours by leveraging the power of tens of servers equipped with hundreds of NVIDIA GPUs. Microsoft took 10 days to train the same model.

To achieve this, IBM Research created the DDL code and algorithms to overcome issues inherent to scaling these otherwise powerful deep learning frameworks, according to the company.

"These results are on a benchmark designed to test deep learning algorithms and systems to the extreme, so while 33.8% might not sound like a lot, it's a result that is noticeably higher than prior publications," wrote Hillery Hunter, IBM Fellow. "Given any random image, this trained AI model will give its top choice object (Top-1 accuracy), amongst 22,000 options, with an accuracy of 33.8%. Our technology will enable other AI models trained for specific tasks, such as detecting cancer cells in medical images, to be much more accurate and trained in hours, re-trained in seconds."

With this approach, IBM Research (Cambridge, MA, USA; www.research.ibm.com) also beat the time previously set by Facebook (Menlo Park, CA, USA; www.facebook.com). Until then, the best scaling for 256 GPUs came from Facebook's AI Research (FAIR) team, which used a smaller deep learning model, ResNet-50, on a smaller dataset, ImageNet-1K, which has about 1.3 million images. With a large minibatch size of 8192, using 256 GPUs, Facebook researchers trained the ResNet-50 model in one hour while maintaining the same level of accuracy (around 89%) as a 256 minibatch baseline. This was accomplished by using a linear scaling rule for adjusting learning rates as a function of minibatch size and by developing a new warmup scheme that overcomes optimization challenges early in training by gradually ramping up the learning rate from a small to a large value and the batch size over time to help maintain accuracy.
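The linear scaling rule and gradual warmup mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Facebook's or IBM's code: the base learning rate of 0.1 for a 256-image minibatch, the five-epoch warmup length, and the function names `scaled_lr` and `warmup_lr` are all chosen here for illustration and do not come from this article.

```python
def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling rule: grow the learning rate in proportion to the minibatch."""
    return base_lr * batch / base_batch


def warmup_lr(step, warmup_steps, start_lr, target_lr):
    """Gradual warmup: ramp linearly from start_lr to target_lr, then hold."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


BASE_LR = 0.1        # assumed learning rate for the 256-image baseline
BASE_BATCH = 256     # baseline minibatch size cited in the article
BATCH = 8192         # large minibatch size cited in the article
IMAGES = 1_300_000   # "about 1.3 million images" in the smaller dataset
WARMUP_EPOCHS = 5    # assumed warmup length

target = scaled_lr(BASE_LR, BASE_BATCH, BATCH)   # 32x the baseline rate
steps_per_epoch = IMAGES // BATCH
warmup_steps = WARMUP_EPOCHS * steps_per_epoch

# Learning rate at the start, middle, and end of the warmup period.
for step in (0, warmup_steps // 2, warmup_steps):
    print(step, warmup_lr(step, warmup_steps, BASE_LR, target))
```

Starting at the small baseline rate and only reaching the scaled rate after the warmup period is what lets training stay stable during the first epochs, when large steps would otherwise cause optimization problems.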

For a ResNet-50 model and the same dataset as Facebook, the IBM Research DDL software achieved an efficiency of 95% using Caffe, running on a cluster of 64 "Minsky" Power S822LC systems, with four NVIDIA P100 GPUs each. IBM Research also did this in 50 minutes, compared to Facebook's previous record of one hour. For training the larger ResNet-101 model on 7.5 million images from the ImageNet-22K dataset, with an image batch of 5120, IBM Research achieved a scaling efficiency of 88%.
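As a rough illustration of what these efficiency figures mean, the sketch below assumes the common definition of scaling efficiency as achieved speedup divided by the number of GPUs. The article does not say exactly how IBM Research computed its figures, and the function names are hypothetical.

```python
def scaling_efficiency(speedup, n_gpus):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / n_gpus


def effective_speedup(efficiency, n_gpus):
    """Speedup over a single GPU implied by a given scaling efficiency."""
    return efficiency * n_gpus


SYSTEMS = 64           # "Minsky" Power S822LC systems in the cluster
GPUS_PER_SYSTEM = 4    # NVIDIA P100 GPUs per system
n_gpus = SYSTEMS * GPUS_PER_SYSTEM

# ResNet-50 run: 95% efficiency reported on this cluster.
speedup = effective_speedup(0.95, n_gpus)
print(f"{n_gpus} GPUs at 95% efficiency behave like about {speedup:.0f} ideal GPUs")
```

Under this definition, the gap between the achieved figure and 100% reflects time lost to communication and synchronization between GPUs rather than spent on computation.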

For developers and data scientists, the IBM Research DDL software presents an API that each of the deep learning frameworks can hook into in order to scale across multiple servers, according to the company. IBM Research, which has released a technical preview in version 4 of the PowerAI enterprise deep learning software offering, expects that by making these DDL features publicly available, it will see more high-accuracy runs as others leverage the power of clusters for AI model training.
