Test results for all models on CLEVR and CLOSURE. † stands for models trained with gold programs, ‡ for oracle models evaluated using gold programs, and ∓ for deterministically executed models that depend on domain knowledge for execution (execution is not learned).