Design and Implementation of an Intelligent Reply System Based on an Image Question Answering Algorithm
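Abstract: Intelligent reply systems have been an active research area in recent years. Most existing intelligent reply systems can respond only to text-based questions and cannot handle questions about images. This paper therefore implements an image question answering algorithm and designs and implements an intelligent reply system app. The image question answering algorithm consists of three parts: question processing, image processing, and answer generation. A convolutional neural network extracts the image features, and the question is converted into word vectors; the word vectors and the image features are then fed together into a long short-term memory network, whose output is the answer. The algorithm is suited to simple questions, such as "What is" and "How many" questions. It reaches an accuracy of 52.84% on the VQA (v1) dataset. Building on the image question answering algorithm and the web interface of Turing Robot, this paper designs and implements an intelligent reply system. The system has two parts, a server and an Android client; the server is implemented with Python's Web.py framework. The system supports image question answering, casual chat, open-domain question answering, weather queries, joke telling, storytelling, and other functions.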
Keywords: image question answering; intelligent reply system; convolutional neural network; long short-term memory network
Design and Implementation of an Intelligent Reply System Based on an Image Answer Algorithm
Abstract: Chat robots have been an active research field in recent years. Most current chat robots can handle only text questions and cannot respond to questions about images. We therefore implement a visual question answering (VQA) algorithm based on the paper "Visual Question Answering", and then design and implement a chat robot application. The VQA algorithm is built on a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The CNN extracts the image features, and the natural-language question is converted into word vectors. The word vectors of the question and the image feature vector are then fed into the LSTM, whose output is the answer to the question. To train the model, we use pretrained COCO image features and the GloVe word vectors published by Stanford. The algorithm can answer simple questions, such as "What is" and "How many" questions, and it reaches an accuracy of 52.84% on the VQA (v1) dataset. When using the model, however, we must extract the image features ourselves, so we implement the VGG-16 network for image feature extraction. Based on this image question answering algorithm and the Turing Robot web interface, we design and implement a chat robot that consists of a server and an Android client. The server is implemented with Python's Web.py framework. We also use Baidu Translation to translate Chinese questions into English, so that the VQA model can answer questions written in Chinese. The main role of the server is to receive messages sent by the client and return the appropriate response. The client sends messages to the server and receives its replies; the Android app can also take photos and browse the system gallery.
The chat robot supports image question answering, chatting, open-domain question answering, weather queries, joke telling, storytelling, and other functions.
Keywords: Visual Question Answering; Chat Robot; CNN; LSTM
Contents