Academic researchers develop new activity recognition algorithm

May 14, 2014
Hamed Pirsiavash, a postdoc at MIT, and his former thesis advisor, Deva Ramanan of the University of California at Irvine, have developed a new activity recognition algorithm that uses techniques from natural language processing to enable computers to more efficiently search video for actions.

While previous algorithms that perform similar tasks have been developed, the new algorithm reportedly has a number of advantages over its predecessors. According to the MIT news release, these include:

  • Execution time. The new algorithm’s execution time scales linearly with the size of the video file it’s searching, meaning that if one file is 10 times larger than another, the algorithm will take 10 times as long to search it, not 1,000 times longer, as with earlier algorithms.
  • Predicting actions. The algorithm can observe a partially completed action and output a probability that it is the action being searched for. It may revise that estimate as the video continues, but it does not have to wait until the action is complete to assess it.
  • Fixed memory. Regardless of how many frames of video the algorithm has reviewed, the amount of memory it requires is fixed, meaning that, unlike many of its predecessors, it can handle video streams of any length or size. (A rough code sketch of these three properties follows the list.)
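To make those three properties concrete, here is a minimal sketch in Python. It is not the authors' published method; it stands in a simple two-state streaming filter (background vs. inside the action) whose model, parameters, and per-frame scores are all illustrative assumptions. It makes a single pass over the frames (linear time), carries only two numbers between frames (fixed memory), and reports a running probability that the target action is underway, revising it as frames arrive.

```python
import math

def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

class OnlineActionDetector:
    """Toy two-state forward filter: 'background' vs. 'inside the action'.

    Illustrative sketch only; a stand-in for the detector described above.
    """

    def __init__(self, p_enter=0.01, p_exit=0.05):
        # The log-probabilities of the two states are the detector's *entire*
        # persistent state, so memory stays fixed for any stream length.
        self.log_bg = math.log(1.0 - 1e-3)
        self.log_fg = math.log(1e-3)
        self.log_enter = math.log(p_enter)          # background -> action
        self.log_stay_bg = math.log(1.0 - p_enter)
        self.log_exit = math.log(p_exit)            # action -> background
        self.log_stay_fg = math.log(1.0 - p_exit)

    def update(self, log_like_fg, log_like_bg=0.0):
        """Fold in one frame; the log-likelihoods would come from some
        per-frame appearance model (assumed, not specified by the source)."""
        new_bg = logsumexp2(self.log_bg + self.log_stay_bg,
                            self.log_fg + self.log_exit) + log_like_bg
        new_fg = logsumexp2(self.log_fg + self.log_stay_fg,
                            self.log_bg + self.log_enter) + log_like_fg
        z = logsumexp2(new_bg, new_fg)              # renormalize each step
        self.log_bg, self.log_fg = new_bg - z, new_fg - z
        return math.exp(self.log_fg)  # P(action underway | frames so far)
```

One pass over the stream gives the linear scaling, and the probability can be read out mid-action rather than only at the end (`video_stream` and `score_frame()` below are hypothetical):

```python
detector = OnlineActionDetector()
for frame in video_stream:                    # one pass: linear in video length
    p = detector.update(score_frame(frame))   # score_frame(): assumed classifier
    # p is available immediately and is revised as more frames arrive
```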

Pirsiavash and Ramanan’s algorithm borrows from parsing algorithms used in natural language processing, the field of computer science concerned with the interactions between computers and human (natural) languages. In the MIT news release, Pirsiavash explains how such a parsing algorithm applies to activity prediction.

"One of the challenging problems they try to solve is, if you have a sentence, you want to basically parse the sentence, saying what is the subject, what is the verb, what is the adverb," Pirsiavash said. "We see an analogy here, which is, if you have a complex action — like making tea or making coffee — that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb."

About the Author

James Carroll

Former VSD Editor James Carroll joined the team in 2013. Carroll covered machine vision and imaging from numerous angles, including application stories, industry news, market updates, and new products. In addition to writing and editing articles, Carroll managed the Innovators Awards program and webcasts.
