In stochastic mode, the method updates the weight coefficients immediately after calculating the network output for a single training sample.
The stochastic method is slower than the batch method. However, because it does not perform an exact gradient descent and instead follows a noisy gradient estimated from a single sample, it can escape local minima and produce better results. It is also easier to apply when working with large amounts of training data.
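
The difference between the two update modes can be illustrated with a minimal sketch. It is not taken from the article: it assumes a one-neuron linear model `y = w * x + b` with a squared-error loss, and the names `sgd_epoch`, `batch_epoch`, and `train`, the toy data, and the learning rate are all hypothetical choices made for illustration. Because this toy problem is convex, it only shows where the weight updates happen, not the escape from local minima.

```python
import random

def predict(w, b, x):
    # Hypothetical one-neuron linear model, used only for illustration.
    return w * x + b

def sgd_epoch(w, b, samples, lr):
    """Stochastic mode: correct the weights after every single sample."""
    for x, y in samples:
        err = predict(w, b, x) - y   # error on this one sample
        w -= lr * err * x            # gradient of 0.5 * err**2 w.r.t. w
        b -= lr * err                # gradient of 0.5 * err**2 w.r.t. b
    return w, b

def batch_epoch(w, b, samples, lr):
    """Batch mode: average the gradient over all samples, then update once."""
    n = len(samples)
    grad_w = sum((predict(w, b, x) - y) * x for x, y in samples) / n
    grad_b = sum(predict(w, b, x) - y for x, y in samples) / n
    return w - lr * grad_w, b - lr * grad_b

def train(epoch_fn, samples, lr=0.1, epochs=300, seed=0):
    rng = random.Random(seed)
    data = list(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)            # sample order matters only in stochastic mode
        w, b = epoch_fn(w, b, data, lr)
    return w, b

# Toy, noise-free data generated from y = 2x + 1 (an assumption for this sketch).
samples = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]

w_sgd, b_sgd = train(sgd_epoch, samples)
w_batch, b_batch = train(batch_epoch, samples)
print(f"stochastic: w={w_sgd:.4f}, b={b_sgd:.4f}")
print(f"batch:      w={w_batch:.4f}, b={b_batch:.4f}")
```

In this sketch the stochastic variant makes one weight correction per sample (21 per epoch here), while the batch variant makes a single correction per epoch from the averaged gradient. The stochastic variant also never needs the whole dataset in one computation, which is one reason it is easier to apply to large training sets.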