Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study

Training deep graph neural networks (GNNs) is notoriously hard. Besides the standard plights in training deep architectures such as vanishing gradients and overfitting, it also uniquely suffers from over-smoothing, information squashing, and so on, which limits their potential power for encoding the high-order neighbor structure in large-scale graphs. Although numerous efforts are proposed to address these limitations, such as various forms of skip connections, graph normalization, and random dropping, it is difficult to disentangle the advantages brought by a deep GNN architecture from those “tricks” necessary to train such an architecture. Moreover, the lack of a standardized benchmark with fair and consistent experimental settings poses an almost insurmountable obstacle to gauge the effectiveness of new mechanisms. In view of those, we present the first fair and reproducible benchmark dedicated to assessing the “tricks” of training deep GNNs. We categorize existing approaches, investigate their hyperparameter sensitivity, and unify the basic configuration. Comprehensive evaluations are then conducted on tens of representative graph datasets including the recent large-scale Open Graph Benchmark, with diverse deep GNN backbones. We demonstrate that an organic combo of initial connection, identity mapping, group and batch normalization attains the new state-of-the-art results for deep GNNs on large datasets. Codes are available: https://github.com/VITA-Group/Deep GCN Benchmarking.