The family of autoregressive integrated moving average (ARIMA) models is worth mentioning because these models are traditionally used in time series forecasting. While I'm obviously a big fan of deep neural networks (in fact, I wrote a book about them), I suggest starting with ARIMA and progressing toward deep learning. In many cases, ARIMA will outperform the LSTM. This is especially true when data is sparse.
The ARIMA model is a combination of three parts. The AR, or autoregressive, part seeks to model the series based on its own autocorrelation. The MA portion attempts to model local surprises or shocks in the time series. The I portion accounts for differencing, which we've just covered. The ARIMA model typically takes three hyperparameters, p, d, and q, which correspond to the number of autoregressive lags modeled, the degree of differencing, and the order of the moving average portion of the model, respectively.
The ARIMAX model allows for the inclusion of one or more covariates in the time series model. What's a covariate in this case, you ask? It's an additional time series that is correlated with the dependent variable and can be used to further improve forecasting performance.
A common practice among traders is to attempt to predict the value of some commodity by using one or more lags from another commodity, as well as autoregressive lags of the commodity we're forecasting. This is a case where the ARIMAX model would be useful.
If you have many covariates with intricate higher-order interactions, you've landed in the sweet spot of the LSTM for time series prediction. At the beginning of the book, we talked about how a multilayer perceptron can model complicated interactions between input variables, giving us automatic feature engineering that provides lift over a linear or logistic regression. This property carries forward to using LSTMs for time series prediction with many input variables.
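Before any of that modeling power kicks in, the multivariate series has to be reshaped into the 3D tensor of shape (samples, timesteps, features) that an LSTM layer expects. That reshaping needs only NumPy, so here's a sketch; the window length, feature count, and the choice to forecast the first column are illustrative assumptions.

```python
# Shaping a multivariate time series into the (samples, timesteps, features)
# tensor an LSTM expects. Window length and feature count are illustrative;
# feed the result into whichever deep learning framework you're using.
import numpy as np

def make_windows(data, timesteps):
    """Slide a window of `timesteps` rows over `data` (shape [T, features]),
    pairing each window with the next row's first column as the target."""
    X, y = [], []
    for start in range(len(data) - timesteps):
        X.append(data[start:start + timesteps])
        y.append(data[start + timesteps, 0])  # forecast the first series
    return np.array(X), np.array(y)

rng = np.random.default_rng(1)
series = rng.normal(size=(100, 4))  # 100 time steps, 4 covariates
X, y = make_windows(series, timesteps=10)
print(X.shape, y.shape)  # (90, 10, 4) (90,)
```

Each sample is then a short multivariate history, and the LSTM learns the interactions among the covariates across those timesteps on its own.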