The capacity of a model describes the complexity of the input-output relationships it can represent; that is, how large a set of functions is allowed in the hypothesis space of the model. For example, a linear regression model can be generalized to fit polynomials rather than only linear functions. This is done by supplying the powers x^2, ..., x^n as additional inputs alongside x while building the model; the model remains linear in its parameters but can now represent polynomial curves. The capacity of a neural network can likewise be increased by adding more hidden nonlinear layers. So, we can make the neural network model either wider or deeper, or both, to increase its capacity.
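The idea of raising capacity by adding polynomial features can be sketched as follows; this is a minimal illustration using NumPy least squares, with the toy data and the helper `polynomial_design_matrix` being assumptions for the example:

```python
import numpy as np

# Toy data for illustration: a noisy sine curve (hypothetical example)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

def polynomial_design_matrix(x, degree):
    """Stack x, x^2, ..., x^degree as feature columns.

    Raising `degree` enlarges the hypothesis space (model capacity)
    while the model stays linear in its parameters.
    """
    return np.column_stack([x ** p for p in range(1, degree + 1)])

# Degree-3 polynomial regression is still an ordinary linear fit
# over the expanded feature set:
X = np.column_stack([np.ones_like(x), polynomial_design_matrix(x, degree=3)])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
```

The same effect is often achieved with feature-expansion utilities in machine learning libraries; the point is that capacity grows with the number of polynomial terms, not with any change to the fitting procedure.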
However, there is a trade-off between model capacity and generalization error:
(Figure: right panel shows a polynomial of degree 9 fit to the data, which suffers from overfitting.)
Models with very high capacity can overfit the training set by learning patterns in the training data that do not generalize to unseen test sets; in particular, a high-capacity model can fit a small training set almost perfectly by memorizing it. On the other hand, models with low capacity may struggle to fit even the training set:
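The contrast above can be made concrete by comparing training errors of a low-degree and a high-degree polynomial on the same small dataset; this is a sketch using NumPy's `polyfit`, with the toy data being an assumption for the example:

```python
import numpy as np

# Toy data: 10 noisy samples of a sine curve (hypothetical example)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

def train_error(degree):
    """Mean squared error on the training set for a degree-`degree` polynomial fit."""
    coeffs = np.polyfit(x, y, degree)    # least-squares polynomial fit
    pred = np.polyval(coeffs, x)
    return np.mean((y - pred) ** 2)

# A degree-9 polynomial can pass through all 10 points, driving the
# training error toward zero (overfitting), while a degree-1 (linear)
# fit leaves a large training error (underfitting).
linear_err, high_capacity_err = train_error(1), train_error(9)
```

The near-zero training error of the degree-9 fit is exactly what the figure shows: the curve chases the noise in the training points, so it generalizes poorly even though it fits the training set almost perfectly.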