ECU Libraries Catalog
Librarian View
LEADER 03534cam 2200505Ia 4500
001
ocn810191424
003
OCoLC
005
20141212032702.0
006
m d
007
cr bn|||||||||
008
120918s2012 ncua ob 000 0 eng d
035
a| (Sirsi) o810191424
035
a| (OCoLC)810191424
040
a| ERE
c| ERE
d| OCLCO
d| ERE
d| UtOrBLW
049
a| EREE
090
a| TK8315
100
1
a| Adeli, Hossein.
?| UNAUTHORIZED
245
1
0
a| Modeling salient object-object interactions to generate textual descriptions for natural images /
c| by Hossein Adeli.
260
a| [Greenville, N.C.] :
b| East Carolina University,
c| 2012.
300
a| 51 pages :
b| illustrations (some color), digital, PDF file
336
a| text
2| rdacontent
337
a| computer
2| rdamedia
338
a| online resource
2| rdacarrier
538
a| System requirements: Adobe Reader.
538
a| Mode of access: World Wide Web.
502
b| M.S.
c| East Carolina University
d| 2012.
500
a| Presented to the faculty of the Department of Computer Science.
500
a| Advisor: M. H. Nassehzadeh Tabrizi.
500
a| Title from PDF t.p. (viewed Sept. 20, 2012).
520
3
a| In this thesis we consider the problem of automatically generating textual descriptions of images, a capability useful in many applications. For example, searching and retrieving visual data among the overwhelming number of images and videos available on the Internet requires a better understanding of multimedia content than user-annotated tags and metadata provide. While this task remains very challenging for machines, humans can easily generate concise descriptions of images: they omit whatever seems unnecessary or unrelated to the main point of the image and describe the objects, their actions and attributes, their interactions with each other, and the context in which it all happens. Our method generates the image description in two main steps. Using saliency maps and object detectors, it first determines the objects that are of interest to the observer and hence should appear in the description of the image. The pose (body-part configuration) of those objects/entities is then used to recognize their individual actions and the interactions between them. To generate the sentences, we use a syntactic model that first orders the nouns (objects) and then builds sub-trees around the detected objects using the predicted actions. The model combines those sub-trees using the recognized interactions and, finally, adds the context of the interactions, detected with a separate algorithm, to create a full sentence for the image. The results show the improved accuracy of the descriptions generated using our method.
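[Reading aid, not part of the catalog record: the abstract above outlines a pipeline of salient-object selection, pose-based action and interaction recognition, and syntactic sentence assembly. The thesis's own code is not included in this record; the minimal Python sketch below only illustrates that pipeline shape, and every function name and return value in it is a hypothetical placeholder, not the author's implementation.]

# Illustrative sketch only. All names (detect_salient_objects, recognize_actions,
# recognize_interaction, detect_context) are hypothetical stand-ins mirroring the
# stages described in the 520 abstract, with hard-coded dummy outputs.
from dataclasses import dataclass
from typing import List

@dataclass
class Entity:
    noun: str          # detected object label, e.g. "person"
    action: str = ""   # single action predicted from pose, e.g. "riding"

def detect_salient_objects(image) -> List[Entity]:
    # Stand-in for combining a saliency map with object detectors to keep
    # only the objects an observer would mention in a description.
    return [Entity("person"), Entity("horse")]

def recognize_actions(entities: List[Entity], image) -> None:
    # Stand-in for pose (body-part configuration) based action recognition.
    for e in entities:
        e.action = "riding" if e.noun == "person" else "walking"

def recognize_interaction(entities: List[Entity], image) -> str:
    # Stand-in for recognizing the interaction between a pair of entities.
    return "rides"

def detect_context(image) -> str:
    # Stand-in for the separate context-detection algorithm the abstract mentions.
    return "in a field"

def generate_sentence(image) -> str:
    # Syntactic model, per the abstract: order the nouns, build sub-trees around
    # each object from its predicted action, combine them via the recognized
    # interaction, and append the detected context to form a full sentence.
    entities = detect_salient_objects(image)
    recognize_actions(entities, image)
    interaction = recognize_interaction(entities, image)
    context = detect_context(image)
    subject, obj = entities[0], entities[1]
    return f"The {subject.noun} {interaction} the {obj.noun} {context}."

print(generate_sentence(image=None))  # -> "The person rides the horse in a field."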
504
a| Includes bibliographical references.
650
0
a| Imaging systems.
=| ^A85862
650
0
a| Image analysis.
=| ^A368011
650
0
a| Metadata.
=| ^A413621
653
a| Computer science
653
a| Artificial intelligence
700
1
a| Tabrizi, M. H. N.
?| UNAUTHORIZED
710
2
a| East Carolina University.
b| Department of Computer Science.
?| UNAUTHORIZED
856
4
0
z| Access via ScholarShip
u| http://hdl.handle.net/10342/3935
949
o| jgml
994
a| C0
b| ERE
596
a| 1 4
998
a| 3108445
999
a| CLICK ON WEB ADDRESS
w| ASIS
c| 1
i| 3108445-1001
l| JNET
m| JOYNER
r| Y
s| Y
t| JNE3ETD
u| 9/18/2012
x| ETD
z| JERESOURCE
999
a| CLICK ON WEB ADDRESS
w| ASIS
c| 1
i| 3108445-2001
l| HSLELEC
m| HSL
r| Y
s| Y
t| HEETD
u| 9/18/2012
x| ETD
z| HERESOURCE