Generating Death Metal Band names with Recurrent Neural Networks
For anyone familiar with Death Metal you know it’s a genre known for it’s extremes. The music, the names, the subject matter - it’s all extreme! I thought it would be interesting to see what a recurrent neural network would come up with asked to generate it’s own death metal band names.
The real reason that I did this was to code up a LSTM with multiplicative integration and see how well it did with sequence modeling compared to a vanilla LSTM.
But first, the fun part! A selection of some of the generated band names below. A word of caution, due to the nature of death metal some readers might find the generated names offensive.
Sadistic Stench Joyless Homicide Enthrallment Suns Torment of Torment Psychologist Old Scorched Words of Death Suburbia Ghost of Pain Gore Disease Fragile Riot Wormhorn Utopia Corpse Psychic Inc. Burn Me Binder Rest in Death Bloodgasm Jonas of Man The Antisocial Ransatary Youth Crushing Ancient Torment Human Terror Through Darkness You Morbid Defilement Parasitical Mandality Terrorable Life Zombification Human Stench Gate of Here Unknown Grinder Inverted Brain Zombie Strength Hate Slug Just Scream
LSTM with multiplicative intergration in PyTorch
The real reason that I did this other than to be funny is to see how easy it would be to code up this type of LSTM in PyTorch and see how it’s performance compared to a vanilla LSTM for sequence modeling. I’m happy to report that coding this in PyTorch was a breeze! However, I didn’t really see any improvement over a vanilla LSTM. Most likely this problem wasn’t complex enough to really test the capacity of the MI-LSTM.
Coding the MI-LSTM
In PyTorch we define a forward function that explicitly controls the computation of our network. It specifies the operations that take us from the input tensor to the output tensor, in our case it will be all of the operations that create a MI-LSTM.
Appendix A of "On Multiplicative Integration with Recurrent Neural Networks"
To turn these mathematical expressions into a PyTorch model we just have to set up the matricies and specify the computations. The resulting forward function pretty much looks identical to the mathematical notation.
def forward(self, inp, h_0, c_0): # encode the input characters inp = self.encoder(inp) # forget gate f_g = F.sigmoid(self.alpha_f * self.weight_fx(inp) * self.weight_fh(h_0) + (self.beta_f1 * self.weight_fx(inp)) + (self.beta_f2 * self.weight_fh(h_0))) # input gate i_g = F.sigmoid(self.alpha_i * self.weight_ix(inp) * self.weight_ih(h_0) + (self.beta_i1 * self.weight_ix(inp)) + (self.beta_i2 * self.weight_ih(h_0))) # output gate o_g = F.sigmoid(self.alpha_o * self.weight_ox(inp) * self.weight_oh(h_0) + (self.beta_o1 * self.weight_ox(inp)) + (self.beta_o2 * self.weight_oh(h_0))) # block input z_t = F.tanh(self.alpha_z * self.weight_zx(inp) * self.weight_zh(h_0) + (self.beta_z1 * self.weight_zx(inp)) + (self.beta_z2 * self.weight_zh(h_0))) # current cell state cx = f_g * c_0 + i_g * z_t # hidden state hx = o_g * F.tanh(cx) out = self.decoder(hx.view(1,-1)) return out, hx, cx
MI-LSTM vs LSTM
This quick experiment is based on the char-rnn from Andrej Karpathy’s blog post ‘The Unreasonable Effectiveness of Recurrent Neural Networks’. Most of the PyTorch code other than the mi-lstm was borrowed and adapted from the examples and tutorials created by Sean Robertson in his repo ‘Practical PyTorch’.
I took the list of death metal band names from the kaggle dataset. Then trained a MI-LSTM and LSTM with 1024 hidden units on the corpus ofband names. The network was trained with Adam with an initial learning rate of 0.0005 for 3 epochs. The learning rate was scaled down by 75% at the start of epoch 2 and 3.
Training loss curves for LSTM and MI-LSTM.
Both LSTM styles convereged to similar loss values. This task may have been too simple to test the benefits of MI-LSTM.
- Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan Salakhutdinov “On Multiplicative Integration with Recurrent Neural Networks”, link
- Andrej Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks”, link
- Diederik Kingma, Jimmy Ba, “Adam: A Method for Stochastic Optimization”, link
- Sean Robertson, “Practical PyTorch”, link