MIT Press

Figure 1.

Our framework includes a reward estimator based on a generative adversarial network (GAN) to issue dynamic rewards with regard to the labels (actions) committed by event extractor (agent). The reward estimator is trained upon the difference between the labels from ground truth (expert) and extractor (agent). If the extractor repeatedly misses Execute label for “death“, the penalty (negative reward values) is strengthened; if the extractor makes surprising mistakes: label “death” as Person or label Person “Masih” as Place role in Sentence event, the penalty is also strong. For cases where extractor is correct, simpler cases such as Sentence on “death” will take a smaller gain while difficult cases Execute on “death” will be awarded with larger reward values.

This Feature Is Available To Subscribers Only