In statistical machine translation (SMT), the optimization of the system parameters to maximize translation accuracy is now a fundamental part of virtually all modern systems. In this article, we survey 12 years of research on optimization for SMT, from the seminal work on discriminative models (Och and Ney 2002) and minimum error rate training (Och 2003), to the most recent advances. Starting with a brief introduction to the fundamentals of SMT systems, we follow by covering a wide variety of optimization algorithms for use in both batch and online optimization. Specifically, we discuss losses based on direct error minimization, maximum likelihood, maximum margin, risk minimization, ranking, and more, along with the appropriate methods for minimizing these losses. We also cover recent topics, including large-scale optimization, nonlinear models, domain-dependent optimization, and the effect of MT evaluation measures or search on optimization. Finally, we discuss the current state of affairs in MT optimization, and point out some unresolved problems that will likely be the target of further research in optimization for MT.