Abstract

Long queries often suffer from low recall in Web search due to conjunctive term matching. The chances of matching words in relevant documents can be increased by rewriting query terms into new terms with similar statistical properties. We present a comparison of approaches that deploy user query logs to learn rewrites of query terms into terms from the document space. We show that the best results are achieved by adopting the perspective of bridging the “lexical chasm” between queries and documents by translating from a source language of user queries into a target language of Web documents. We train a state-of-the-art statistical machine translation model on query-snippet pairs from user query logs, and extract expansion terms from the query rewrites produced by the monolingual translation system. We show in an extrinsic evaluation in a real-world Web search task that the combination of a query-to-snippet translation model with a query language model achieves improved contextual query expansion compared to a state-of-the-art query expansion model that is trained on the same query log data.

This content is only available as a PDF.

Author notes

*

Brandschenkestrasse 110, 8002 Zürich, Switzerland. E-mail: riezler@gmail.com.

**

1600 Amphitheatre Parkway, Mountain View, CA. E-mail: yliu@google.com.