Conventional genetic programming (GP) can guarantee only that synthesized programs pass tests given by the provided input-output examples. The alternative to such a test-based approach is synthesizing programs by formal specification, typically realized with exact, nonheuristic algorithms. In this article, we build on our earlier study on Counterexample-Based Genetic Programming (CDGP), an evolutionary heuristic that synthesizes programs from formal specifications. The candidate programs in CDGP undergo formal verification with a Satisfiability Modulo Theory (SMT) solver, which results in counterexamples that are subsequently turned into tests and used to calculate fitness. The original CDGP is extended here with a fitness threshold parameter that decides which programs should be verified, a more rigorous mechanism for turning counterexamples into tests, and other conceptual and technical improvements. We apply it to 24 benchmarks representing two domains: the linear integer arithmetic (LIA) and the string manipulation (SLIA) problems, showing that CDGP can reliably synthesize provably correct programs in both domains. We also confront it with two state-of-the art exact program synthesis methods and demonstrate that CDGP effectively trades longer synthesis time for smaller program size.