We present an algorithm that PAC learns any perceptron with binary weights and arbitrary threshold under the family of product distributions. The sample complexity of this algorithm is of O[(n/ε)4 ln(n/δ)] and its running time increases only linearly with the number of training examples. The algorithm does not try to find an hypothesis that agrees with all of the training examples; rather, it constructs a binary perceptron based on various probabilistic estimates obtained from the training examples. We show that, under the restricted case of the uniform distribution and zero threshold, the algorithm reduces to the well known clipped Hebb rule. We calculate exactly the average generalization rate (i.e., the learning curve) of the algorithm, under the uniform distribution, in the limit of an infinite number of dimensions. We find that the error rate decreases exponentially as a function of the number of training examples. Hence, the average case analysis gives a sample complexity of O[n ln(1/ε)], a large improvement over the PAC learning analysis. The analytical expression of the learning curve is in excellent agreement with the extensive numerical simulations. In addition, the algorithm is very robust with respect to classification noise.