Saturday, November 22, 2014

Latest From Science, Technology And Medicine [11.22.14]

Google /
Artificial Intelligence / 
Computer Vision



"Many efforts to construct computer-generated natural descriptions of images propose combining current state-of-the-art techniques in both computer vision and natural language processing to form a complete image description approach. But what if we instead merged recent computer vision and language models into a single jointly trained system, taking an image and directly producing a human readable sequence of words to describe it? 
This idea comes from recent advances in machine translation between languages, where a Recurrent Neural Network (RNN) transforms, say, a French sentence into a vector representation, and a second RNN uses that vector representation to generate a target sentence in German. 
Now, what if we replaced that first RNN and its input words with a deep Convolutional Neural Network(CNN) trained to classify objects in images? Normally, the CNN’s last layer is used in a final Softmaxamong known classes of objects, assigning a probability that each object might be in the image. But if we remove that final layer, we can instead feed the CNN’s rich encoding of the image into a RNN designed to produce phrases. We can then train the whole system directly on images and their captions, so it maximizes the likelihood that descriptions it produces best match the training descriptions for each image."

No comments:

Post a Comment