StellarGraph

Generator of graphs and symbolic inputs to graphs

Hi,

I am trying to replicate this paper (https://arxiv.org/pdf/1904.12577.pdf), and for the graph convolution I would like to use StellarGraph. After researching other GCN libraries I find this one the most advanced, and I prefer to use Keras within TensorFlow 2.2.

My training dataset consists of about 20k heterogeneous graphs - PDF documents whose words form the nodes of each graph. My task is to classify each word and tag it with its respective meaning, i.e. to retrieve specific financial metrics from written text. Since each word is a node in a graph, this should be a node classification problem.

Right now I use the RelationalGraphConvolution layer (I need multiple edge types) without a StellarGraph generator, as my input comes from my own fit generator, which provides features for each document with a fixed number of words (padded).

Another reason for not using a StellarGraph generator is the preprocessing of node features with a character embedding, so that I can use symbolic tensors as inputs to the RGCN.

Should I use some kind of custom RelationalFullBatchNodeGenerator and modify it as a “fit generator in a generator”? In that case, I do not yet know how to preprocess those inputs and learn the character embeddings.

Putting aside the fact that I currently cannot save the model (because of https://github.com/stellargraph/stellargraph/issues/1252), my network is performing poorly right now.

Any advice would be highly appreciated.

Thank you for this great Python library, and keep up the great work.

Thanks for getting in touch!

Awesome!

If you’ve got something working right now, I think that’s fine. It’s definitely true that the StellarGraph class isn’t optimal for end-to-end training of data preprocessing in addition to the graph model.

Do you have a bit more information about what you mean by “performing poorly”? It’s hard to give advice without some specifics.

Thank you for your quick reply!

Perhaps I have an error in the network construction, so let me first quickly introduce the model architecture.

At the moment the model has 7 inputs:

  • node text - a vector of single characters prepared for character convolution
  • node features - text features such as punctuation, special characters and whether the text is a number
  • padding mask - a mask identifying padded words
  • adjacency matrix linking each word to its adjacent words in the up direction
  • adjacency matrix linking each word to its adjacent words in the down direction
  • adjacency matrix linking each word to its adjacent words in the left direction
  • adjacency matrix linking each word to its adjacent words in the right direction

The node text is sent to two Conv2D(50, (1, 3)) layers and max-pooled. The result is then concatenated with the node features and sent to a RelationalGraphConvolution(128, num_relationships=4, num_bases=4) layer.
The adjacency matrices are simple, containing only 0s and 1s: 1s denote the edges, plus 1s on the diagonal.

After that, there is a MultiHeadAttention (8 heads) layer that takes the output of the RGCN layer and the padding mask as inputs.

Finally, there is a simple Dense layer with softmax activation. All other activations in the model are ReLUs.
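
To make the front end concrete, the character-convolution part looks roughly like this (a simplified sketch, not my actual code; the sizes are made up, and the RGCN, attention and softmax layers are only indicated in a comment):

import tensorflow as tf
from tensorflow.keras import layers

pad_size, char_len, n_chars, n_feats = 2000, 16, 64, 10   # illustrative sizes

node_text = layers.Input((pad_size, char_len, n_chars), name="node_text")
node_features = layers.Input((pad_size, n_feats), name="node_features")

# character convolution: two Conv2D(50, (1, 3)) layers, then max-pooling over the character axis
x = layers.Conv2D(50, (1, 3), activation="relu")(node_text)
x = layers.Conv2D(50, (1, 3), activation="relu")(x)
x = layers.Lambda(lambda t: tf.reduce_max(t, axis=2))(x)   # -> (batch, pad_size, 50)

# concatenated with the handcrafted node features; this is what feeds the
# RelationalGraphConvolution(128, num_relationships=4, num_bases=4) layer,
# the 8-head attention layer and the final softmax Dense layer (not shown)
h = layers.Concatenate()([x, node_features])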

There are 25 labels in total. The labels are highly unbalanced: each document contains a few thousand words, and every “not null” label occurs only once per document. To compensate, I use class weights (passed as sample weights in the fit generator) of 1 for labeled words and 1/pad_size for unlabeled words.
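
Concretely, the per-document sample weights are built roughly like this (a simplified sketch with toy numbers):

import numpy as np

pad_size = 2000                            # fixed (padded) number of words per document
labels = np.zeros(pad_size, dtype=int)     # 0 = unlabeled / "null"; toy example
labels[17] = 3                             # one word carries a "not null" label

# weight 1 for labeled words, 1/pad_size for everything else
sample_weights = np.where(labels > 0, 1.0, 1.0 / pad_size)
sample_weights = sample_weights[np.newaxis, :]   # batch dimension for sample_weight_mode="temporal"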

My compile function is:

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=0.001),
              sample_weight_mode="temporal",
              weighted_metrics=['accuracy', macro_f1, macro_soft_f1])

The best results I can get during training are:
loss: 0.0136 - accuracy: 0.3149 - macro_f1: 0.0278 - macro_soft_f1: 0.9709

Experiments so far:

  • decreasing learning rate after 5th epoch -> not better
  • ditching the attention layer -> not better

So, I guess there is something fundamentally wrong in my NN…

If you would rather see my raw messy source code, I am happy to share it :slight_smile:

Thanks for the info!

Many algorithms, including RGCN, normalise the adjacency matrices so that they don’t “inflate” the features. For RGCN, this means dividing by the degree: for each edge type t, D_{t}^{-1} A_{t}, where D_{t} is the “degree matrix” of A_{t} (the sum of each row).
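
In NumPy terms, the normalisation for a single edge type is something like this (a small sketch; A_t here stands for one of your binary directional adjacency matrices with self-loops):

import numpy as np

# one binary adjacency matrix (with 1s on the diagonal), e.g. the "up" direction
A_t = np.array([[1, 1, 0],
                [0, 1, 1],
                [0, 0, 1]], dtype=float)

# D_t: the degree of each node for this edge type, i.e. the sum of each row
degree = A_t.sum(axis=1)

# D_t^{-1} A_t: divide each row by its degree (guarding against zero-degree nodes)
A_t_normalised = A_t / np.maximum(degree, 1)[:, np.newaxis]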

Another option to try might be removing the RGCN layer as well, to see whether it is helping or hindering.

Thank you for the further explanation.
I tried RGCN and ClusterGCN (with and without those layers), and performance seems to be better with them.

I think the major problem in my case is the sample weights. Is it possible to tell the generator to use weights?

:tada:

Not at the moment: #604 is related.

Could you instead use oversampling or the class_weight parameter to fit (both covered by https://www.tensorflow.org/tutorials/structured_data/imbalanced_data)? For the latter, you may have to be careful to collapse out the size-1 batch dimension. (See #603.)
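
For the class_weight route, a rough (untested) sketch might look like this, where all_word_labels is a stand-in for the integer label of every word in your training data:

import numpy as np

num_classes = 25
# hypothetical: one integer label per word across the training documents
all_word_labels = np.array([0, 0, 0, 0, 3, 0, 0, 7, 0, 0])

counts = np.bincount(all_word_labels, minlength=num_classes)
# weight each class inversely to its frequency (avoiding division by zero for unseen classes)
class_weight = {i: len(all_word_labels) / (num_classes * max(int(c), 1))
                for i, c in enumerate(counts)}

# per #603, you may need to collapse the size-1 batch dimension before this applies:
# model.fit(train_gen, epochs=20, class_weight=class_weight)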