Coding the hidden layers for our example

For our example problem, I'll use five hidden layers because I suspect there are many interactions between features. This hunch is based primarily on domain knowledge: having read the data description, I know the data is a cross-sectional slice of a time series and may be autocorrelated.

I'll start with 128 neurons in the first layer (slightly fewer than my input size) and then narrow the network, roughly by halves, down to 16 neurons as we get toward the output. This isn't a rule of thumb; it's based only on my own experience. We will use the following code to define our hidden layers:

x = Dense(128, activation='relu', name="hidden1")(inputs)
x = Dense(64, activation='relu', name="hidden2")(x)
x = Dense(64, activation='relu', name="hidden3")(x)
x = Dense(32, activation='relu', name="hidden4")(x)
x = Dense(16, activation='relu', name="hidden5")(x)
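
To get a feel for how large this stack is, the number of trainable parameters in each fully connected layer can be worked out by hand: one weight per input-unit pair plus one bias per unit. The sketch below does that arithmetic in plain Python. The helper name dense_param_counts and the input size of 130 are assumptions made for illustration only; the text says just that the input is slightly larger than 128, so substitute the real feature count.

```python
def dense_param_counts(n_inputs, widths):
    """Return the trainable parameter count of each fully connected layer."""
    counts = []
    fan_in = n_inputs
    for units in widths:
        # One weight per (input, unit) pair, plus one bias per unit.
        counts.append(fan_in * units + units)
        fan_in = units  # this layer's output feeds the next layer
    return counts


# Widths of the five hidden layers defined above.
HIDDEN_WIDTHS = [128, 64, 64, 32, 16]

# Hypothetical input size, chosen only for illustration.
N_INPUTS = 130

counts = dense_param_counts(N_INPUTS, HIDDEN_WIDTHS)
for i, (units, n_params) in enumerate(zip(HIDDEN_WIDTHS, counts), start=1):
    print(f"hidden{i}: {units:>3} units, {n_params:>6} parameters")
print("total hidden parameters:", sum(counts))
```

Under that assumed input size, most of the parameters sit in the first layer, which is typical of a network that narrows toward the output.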

In each layer, I used the relu activation, as it's usually the best and safest choice. That said, the activation function is also a hyperparameter that can be experimented with.
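
One way to make that experiment easy is to describe the hidden layers as data rather than hard-coding each call. The following is a minimal sketch, not the approach used in this chapter: hidden_layer_specs is a hypothetical helper, and the list of candidate activations is an illustrative assumption. It only builds the keyword arguments, so it runs without Keras installed.

```python
def hidden_layer_specs(widths, activation="relu"):
    """Build keyword arguments for one Dense layer per entry in widths."""
    return [
        {"units": units, "activation": activation, "name": f"hidden{i}"}
        for i, units in enumerate(widths, start=1)
    ]


# Widths of the five hidden layers defined above.
HIDDEN_WIDTHS = [128, 64, 64, 32, 16]

# Illustrative candidates only; each is a valid Keras activation name.
CANDIDATE_ACTIVATIONS = ["relu", "elu", "tanh"]

for activation in CANDIDATE_ACTIVATIONS:
    specs = hidden_layer_specs(HIDDEN_WIDTHS, activation)
    # With Keras available, each spec could be applied as: x = Dense(**spec)(x)
    print(activation, [spec["name"] for spec in specs])
```

Each candidate would then be trained and compared on validation data, like any other hyperparameter.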
