Our framework includes a reward estimator based on a generative adversarial network (GAN) that issues dynamic rewards for the labels (actions) committed by the event extractor (agent). The reward estimator is trained on the difference between the ground-truth (expert) labels and the extractor's (agent's) labels. If the extractor repeatedly misses the Execute label for “death”, the penalty (a negative reward value) is strengthened. If the extractor makes a surprising mistake, such as labeling “death” as Person or labeling the Person “Masih” with the Place role in a Sentence event, the penalty is also strong. When the extractor is correct, simpler cases such as Sentence on “death” receive a smaller gain, while difficult cases such as Execute on “death” are awarded larger reward values.
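The dynamic behavior described above can be illustrated with a toy sketch. It is not the GAN-based estimator itself: the neural discriminator is replaced by a lookup table of logits keyed by (token, label), and the names `ToyRewardEstimator`, `update`, and `reward`, the learning rate, the reward scaling to (-1, 1), and the no-event label "O" are all assumptions made for illustration. The table is pushed up for expert labels and down for agent labels, so repeated misses deepen the penalty, while labels on which agent and expert already agree stay near zero reward. Because the table has no notion of label similarity, it does not capture the stronger penalty for surprising mistakes.

```python
import math
from collections import defaultdict


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyRewardEstimator:
    """Tabular stand-in for a discriminator (illustrative assumption)."""

    def __init__(self, lr=1.0):
        self.lr = lr
        # (token, label) -> logit; 0.0 means "cannot tell expert from agent"
        self.logit = defaultdict(float)

    def update(self, token, expert_label, agent_label):
        """One gradient-ascent step on log D(expert) + log(1 - D(agent))."""
        e = (token, expert_label)
        a = (token, agent_label)
        # Compute both gradients first so the case e == a is handled correctly.
        grad_e = 1.0 - sigmoid(self.logit[e])
        grad_a = sigmoid(self.logit[a])
        self.logit[e] += self.lr * grad_e
        self.logit[a] -= self.lr * grad_a

    def reward(self, token, label):
        """Map the discriminator output to a reward in (-1, 1)."""
        return 2.0 * sigmoid(self.logit[(token, label)]) - 1.0


est = ToyRewardEstimator()

# The agent repeatedly misses Execute on "death" (it emits the hypothetical
# no-event label "O"), so the penalty for "O" grows with each miss.
est.update("death", "Execute", "O")
penalty_first = est.reward("death", "O")
est.update("death", "Execute", "O")
est.update("death", "Execute", "O")
penalty_third = est.reward("death", "O")

# An easy case the agent already gets right: expert and agent agree.
est.update("death", "Sentence", "Sentence")

gain_hard = est.reward("death", "Execute")   # label the agent kept missing
gain_easy = est.reward("death", "Sentence")  # label the agent already finds
```

In this sketch `penalty_third` is more negative than `penalty_first`, and `gain_hard` exceeds `gain_easy`, mirroring the qualitative pattern in the text: a correct Execute label is rewarded more because the extractor had been failing on it.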