Thanks for the compilation. For number 1 you mention "stability and capacity" and say you did not find a source. I diagnose the problem as disjoint supports: the real data distribution and the generator's distribution do not overlap at any point, so it is a cinch for the discriminator to reach perfect performance. The gradients for the generator therefore fall to zero, and no training happens. The solution is to try a weaker discriminator, not to add weights to the generator or look online; adding noise to the inputs of the discriminator, which you mention later in the article, can work too.
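To make the vanishing-gradient point concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not anything from the article: the discriminator is a one-parameter logistic model D(x) = sigmoid(w * x), the generator uses the saturating loss log(1 - D(x_fake)), and the names `generator_grad` and `add_instance_noise` are made up for illustration. A larger `w` stands in for a stronger, more confident discriminator.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def generator_grad(x_fake, w):
    # Toy discriminator (an assumption): D(x) = sigmoid(w * x).
    # Saturating generator loss: log(1 - D(x_fake)).
    # Its derivative with respect to x_fake is -w * sigmoid(w * x_fake).
    return -w * sigmoid(w * x_fake)


def add_instance_noise(batch, sigma, rng):
    # Gaussian noise added to discriminator inputs (real and fake alike),
    # which spreads both distributions out so their supports overlap.
    return [x + rng.gauss(0.0, sigma) for x in batch]


if __name__ == "__main__":
    x_fake = -2.0  # a fake sample far from real data sitting around +2
    for w in (1.0, 5.0, 20.0, 50.0):
        print(f"w={w:5.1f}  grad={generator_grad(x_fake, w):.3e}")

    rng = random.Random(0)
    print(add_instance_noise([2.0, -2.0], sigma=1.0, rng=rng))
```

As `w` grows, the printed gradient magnitude shrinks toward zero: the sharper the discriminator separates the two disjoint clusters, the less signal the generator receives. In a real setup the noise would be added to image batches and its scale would probably need tuning or annealing; that part is an assumption here, not something the toy demonstrates.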
This comment comes a while after the post, but I figured that the diagnosis and tip would be useful to other readers.