Some of you may know about my passion for linguistics, and specifically for lexical semantics, the field in which I wrote my MA thesis. Here I am sharing some of the most widely used computational resources, which can be crucial for improving information retrieval results. To continue the discussion of resources related to argument structure that I started on this blog some time ago with a post on the TimeML specification language, I would now like to talk about the Proposition Bank, one of the most widespread approaches to argument structure recognition.
Proposition Bank (Palmer et al., 2005), hereafter PropBank, is an annotation scheme that aims to add argument structure and semantic roles to corpora that have already been syntactically parsed. The theoretical framework for this approach comes from Levin’s work (Levin, 1993), which describes classes of English verbs according to their syntactic behavior. The classification is based on diathesis alternations, that is, the capacity of certain predicates and their arguments to occur, or not, in certain syntactic constructions. PropBank and VerbNet (Kipper et al., 2000) share this theoretical framework, and the structure of PropBank is based on the VerbNet approach. The main difference between the two is how the arguments a predicate can take are encoded: VerbNet uses a set of thematic roles (e.g. Agent, Patient, Experiencer), whereas PropBank uses a set of numbered arguments (e.g. Arg0, Arg1).
Before annotation begins, a set of arguments has to be defined. These arguments are the basis for building PropBank framesets. Framesets are built from a set of numbered arguments derived from the VerbNet classification: PropBank assigns a numbered argument to each VerbNet thematic role, with numbering starting at 0. The arguments are defined in terms of syntactic constituents. PropBank distinguishes between numbered arguments, which are the ones a predicate must carry, and adjuncts (ArgMs), which are optional and may appear with almost any predicate.
PropBank annotation is based on framesets, which are built from the arguments defined beforehand. Every frameset lists its arguments and includes an example showing the correct annotation of the predicate. Framesets carry no explicit syntactic information, but because arguments are defined in terms of syntactic constituents, each frameset also carries some implicit syntactic information.
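To picture what a frameset holds, here is a minimal Python sketch of one as a small record. The `Frameset` class, its field names and the `allows` helper are my own illustrative assumptions, not the format of the actual PropBank frame files. The role descriptions follow the kick.01 entry in Palmer et al. (2005).

```python
from dataclasses import dataclass, field


@dataclass
class Frameset:
    """One sense of a predicate and its numbered arguments (hypothetical layout)."""
    lemma: str
    sense: str                                  # e.g. "01"
    gloss: str
    roles: dict = field(default_factory=dict)   # e.g. "Arg1" -> description

    @property
    def frameset_id(self) -> str:
        return f"{self.lemma}.{self.sense}"

    def allows(self, label: str) -> bool:
        """Numbered arguments must be declared in the frameset;
        ArgM adjuncts are accepted for any predicate."""
        return label.startswith("ArgM-") or label in self.roles


# Role descriptions as given for kick.01 in Palmer et al. (2005).
kick = Frameset(
    lemma="kick",
    sense="01",
    gloss="drive or impel with the foot",
    roles={
        "Arg0": "Kicker",
        "Arg1": "Thing kicked",
        "Arg2": "Instrument (defaults to foot)",
    },
)

print(kick.frameset_id)
for label in ("Arg1", "ArgM-TMP", "Arg4"):
    print(label, kick.allows(label))
```

The `allows` check mirrors the distinction made above: numbered arguments belong to a specific frameset, while adjuncts are not tied to any one predicate.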
Examples (10) and (11), from Palmer et al. (2005), show the framesets defined in PropBank for two predicates, each with its own arguments. PropBank framesets inherit the structure of VerbNet entries; this fixed structure allows a later mapping to VerbNet thematic roles:
(10) Frameset accept.01 “take willingly”
Arg0: Acceptor
Arg1: Thing accepted
Arg2: Accepted-from
Arg3: Attribute
Ex: [Arg0 He] [ArgM-MOD would] [ArgM-NEG n’t] accept [Arg1 anything of value] [Arg2 from those he was writing about]. (wsj 0186)
(11) Frameset kick.01 “drive or impel with the foot”
Arg0: Kicker
Arg1: Thing kicked
Arg2: Instrument (defaults to foot)
Ex1: [ArgM-DIS But] [Arg0 two big New York banks_i] seem [Arg0 *trace*_i] to have kicked [Arg1 those chances] [ArgM-DIR away], [ArgM-TMP for the moment], [Arg2 with the embarrassing failure of Citicorp and Chase Manhattan Corp. to deliver $7.2 billion in bank financing for a leveraged buy-out of United Airlines parent UAL Corp]. (wsj 1619)
Ex2: [Arg0 John_i] tried [Arg0 *trace*_i] to kick [Arg1 the football], but Mary pulled it away at the last moment.
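To make the bracketed notation more concrete, here is a minimal Python sketch of how an example string like the ones above could be read programmatically. The regular expression and the function name `parse_spans` are my own illustrative choices and not part of any PropBank tool. The sketch only handles flat, non-nested brackets.

```python
import re

# Matches one flat "[Label text]" span. Labels are either numbered
# arguments (Arg0, Arg1, ...) or adjuncts of the form ArgM-XXX.
SPAN = re.compile(r"\[(Arg(?:\d+|M-[A-Z]+))\s+([^\[\]]+)\]")


def parse_spans(annotated: str):
    """Return (label, text) pairs in the order they appear in the string."""
    return [(label, text.strip()) for label, text in SPAN.findall(annotated)]


example = (
    "[Arg0 He] [ArgM-MOD would] [ArgM-NEG n't] accept "
    "[Arg1 anything of value] [Arg2 from those he was writing about]."
)

for label, text in parse_spans(example):
    # Adjuncts carry the ArgM- prefix; everything else is a numbered argument.
    kind = "adjunct" if label.startswith("ArgM-") else "numbered"
    print(f"{label:9} {kind:9} {text}")
```

Words outside the brackets, such as the predicate "accept" itself, are simply skipped, so the function returns only the labeled constituents.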
PropBank’s design allows each predicate to be tagged independently, which sets this approach apart from traditional dependency parsing (Loper et al., 2007). A single syntactic constituent can therefore act as an argument of two different verbs. As a result, some verbs may share arguments, even though the shared constituent may be a different numbered argument for each verb.
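The idea of independent annotation per predicate can be sketched in a few lines of Python. The sentence, the token offsets and the labels below are illustrative assumptions of mine and are not taken from the PropBank corpus or its frame files. The point is only that each verb gets its own layer and that one constituent may appear in several layers under different labels.

```python
from collections import defaultdict

tokens = ["Mary", "persuaded", "John", "to", "kick", "the", "football"]

# One independent annotation layer per predicate. Spans are (start, end)
# token offsets with an exclusive end. All labels here are illustrative
# assumptions, not checked against the real frame files.
annotations = {
    "persuaded": {"Arg0": (0, 1), "Arg1": (2, 3), "Arg2": (3, 7)},
    "kick": {"Arg0": (2, 3), "Arg1": (5, 7)},
}


def shared_constituents(layers):
    """Map each span used by more than one predicate to its (predicate, label) uses."""
    users = defaultdict(list)
    for predicate, args in layers.items():
        for label, span in args.items():
            users[span].append((predicate, label))
    return {span: uses for span, uses in users.items() if len(uses) > 1}


for (start, end), uses in shared_constituents(annotations).items():
    print(" ".join(tokens[start:end]), "->", uses)
```

In this toy sentence, "John" is the only shared constituent: it is an argument of both verbs but carries a different numbered label in each layer.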
PropBank assumes that the corpus to be annotated has already been syntactically parsed, since the annotation attaches numbered arguments to syntactic nodes. In the Penn Treebank annotation project, the first criterion was to assign the PropBank tag to the NP node for the main numbered arguments (e.g. Arg0, Arg1) and to the PP node for adjunct arguments (ArgMs). However, to keep the annotation consistent, it was decided to tag numbered arguments at the PP level as well. Since this decision was made for the English annotation, every project applying PropBank to other languages has followed the same criterion. A great example of this international use is the AnCora corpora for Spanish and Catalan, which show how this resource can be used in other languages. This spread is good evidence that PropBank annotation can be really important for creating lexically annotated resources to train tools that improve information retrieval and, perhaps in the future, argument structure recognition.
I have presented only two resources in computational lexical semantics so far, but there are others that can also play a role in this process, and I plan to write about them in the future. In the meantime, it would be great if you could share any other resource you know of that may be useful for this purpose. For example, have you heard of FrameNet or WordNet? Do you think ontologies can also play a significant role in argument structure recognition? I am eager to hear your opinion on this topic!