Prior knowledge constraints are imposed upon a learning problem in the form of distance measures. Prototypical 2D point sets and graphs are learned by clustering with point-matching and graph-matching distance measures. The point-matching distance measure is approximately invariant under affine transformations (translation, rotation, scale, and shear) and under permutations. It operates between noisy images with missing and spurious points. The graph-matching distance measure operates on weighted graphs and is invariant under permutations. Learning is formulated as an optimization problem. The resulting large objectives (on the order of a million variables) are minimized efficiently using a combination of optimization techniques: softassign, algebraic transformations, clocked objectives, and deterministic annealing.
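
To make the softassign and deterministic annealing ingredients concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a fixed, square benefit matrix (a plain linear assignment problem) with no slack rows or columns for missing or spurious points. In the matching objectives described above, the benefit matrix would instead be recomputed from the current match matrix at each step. The names `sinkhorn`, `softassign`, `beta0`, `beta_max`, and `rate` are hypothetical and chosen only for illustration.

```python
import math


def sinkhorn(m, iters=200, tol=1e-9):
    """Alternately normalize rows and columns of a positive square matrix
    until it is approximately doubly stochastic."""
    n = len(m)
    for _ in range(iters):
        # Row normalization: every row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization: every column sums to 1.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
        # Columns are now exact; stop once the rows are close enough too.
        if max(abs(sum(row) - 1.0) for row in m) < tol:
            break
    return m


def softassign(benefit, beta0=0.5, beta_max=50.0, rate=1.5):
    """Soft assignment with an annealed inverse temperature beta.

    Assumption: `benefit` is fixed (linear assignment), so each annealing
    step depends only on beta. In a matching objective the benefit would
    be re-derived from the current match matrix inside this loop.
    """
    top = max(max(row) for row in benefit)  # shift for numerical stability
    beta = beta0
    match = None
    while beta <= beta_max:
        # Exponentiation keeps every entry positive.
        expo = [[math.exp(beta * (q - top)) for q in row] for row in benefit]
        # Sinkhorn balancing enforces the row and column constraints.
        match = sinkhorn(expo)
        beta *= rate  # lower the temperature
    return match


benefit = [[1.0, 0.2, 0.1],
           [0.3, 0.9, 0.2],
           [0.1, 0.4, 0.8]]
match = softassign(benefit)
# Read off a hard assignment: the largest entry in each row.
assignment = [max(range(len(row)), key=lambda j: row[j]) for row in match]
```

At small `beta` the match matrix is nearly uniform. As `beta` grows it approaches a permutation matrix, which is how an annealed soft assignment can satisfy the permutation constraints without a discrete search.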