Abstract
We present Plato, a probabilistic model for entity resolution that includes a novel approach for handling noisy or uninformative features, and supplements labeled training data derived from Wikipedia with a very large unlabeled text corpus. Training and inference in the proposed model can easily be distributed across many servers, allowing it to scale to over 10^7 entities. We evaluate Plato on three standard datasets for entity resolution. Our approach achieves the best results to date on TAC KBP 2011 and is highly competitive on both the CoNLL 2003 and TAC KBP 2012 datasets.
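To make the task concrete, the sketch below illustrates probabilistic entity resolution in general: scoring candidate knowledge-base entities for a mention by combining an entity prior with a unigram model over context words, Naive Bayes-style. It is a minimal, hypothetical example only; the entity names, counts, and smoothing constant are invented, and it does not reflect Plato's actual model, features, or training procedure.

```python
from collections import Counter
from math import log

# Hypothetical toy data: two candidate entities for the mention "Boston",
# with invented priors and context-word counts.
ENTITY_PRIOR = {"Boston_Celtics": 0.6, "Boston_Massachusetts": 0.4}
CONTEXT_COUNTS = {
    "Boston_Celtics": Counter({"game": 40, "coach": 25, "nba": 30}),
    "Boston_Massachusetts": Counter({"city": 50, "harbor": 20, "game": 5}),
}
ALPHA = 0.1  # add-alpha smoothing for unseen context words


def score(entity, context_words):
    """Return log p(entity) + sum over words of log p(word | entity)."""
    counts = CONTEXT_COUNTS[entity]
    vocab = {w for c in CONTEXT_COUNTS.values() for w in c}
    denom = sum(counts.values()) + ALPHA * len(vocab)
    s = log(ENTITY_PRIOR[entity])
    for w in context_words:
        s += log((counts[w] + ALPHA) / denom)
    return s


def resolve(mention_context):
    """Pick the candidate entity with the highest posterior score."""
    return max(ENTITY_PRIOR, key=lambda e: score(e, mention_context))


if __name__ == "__main__":
    # Context suggests the basketball team rather than the city.
    print(resolve(["the", "coach", "praised", "the", "game"]))
```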
©2015 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license (https://creativecommons.org/licenses/by/4.0/legalcode).