StellarGraph

HinSAGE link prediction and imbalanced classification

Dear StellarGraph Community,

I have been trying to implement a HinSAGE link prediction model in support of a food recommendation use case. The issue is that I have not been able to obtain a decent AUC.
Currently, I have the following bipartite graph with 2 sets of nodes - consumers and food items:

StellarGraph: Undirected multigraph
 Nodes: 127566, Edges: 156522

 Node types:
  consumer: [127379]
    Features: float32 vector, length 31
    Edge types: consumer-default->food_item
  food_item: [187]
    Features: float32 vector, length 146
    Edge types: food_item-default->consumer

 Edge types:
    consumer-default->food_item: [156522]
        Weights: range=[0, 1], mean=0.013308, std=0.114591
        Features: none

I have attempted to convert the frequency of interactions (i.e. the edge weights) to 1 or 0, since I intend to frame this as a binary classification problem. I have also come up with the following HinSAGE model:

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_3 (InputLayer)            [(None, 10, 146)]    0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            [(None, 50, 31)]     0                                            
__________________________________________________________________________________________________
input_6 (InputLayer)            [(None, 50, 146)]    0                                            
__________________________________________________________________________________________________
input_1 (InputLayer)            [(None, 1, 31)]      0                                            
__________________________________________________________________________________________________
reshape (Reshape)               (None, 1, 10, 146)   0           input_3[0][0]                    
__________________________________________________________________________________________________
reshape_2 (Reshape)             (None, 10, 5, 31)    0           input_5[0][0]                    
__________________________________________________________________________________________________
input_4 (InputLayer)            [(None, 10, 31)]     0                                            
__________________________________________________________________________________________________
reshape_3 (Reshape)             (None, 10, 5, 146)   0           input_6[0][0]                    
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 1, 31)        0           input_1[0][0]                    
__________________________________________________________________________________________________
dropout (Dropout)               (None, 1, 10, 146)   0           reshape[0][0]                    
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 10, 146)      0           input_3[0][0]                    
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 10, 5, 31)    0           reshape_2[0][0]                  
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 1, 146)]     0                                            
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 1, 10, 31)    0           input_4[0][0]                    
__________________________________________________________________________________________________
dropout_7 (Dropout)             (None, 10, 31)       0           input_4[0][0]                    
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, 10, 5, 146)   0           reshape_3[0][0]                  
__________________________________________________________________________________________________
mean_hin_aggregator (MeanHinAgg multiple             4475        dropout_1[0][0]                  
                                                                 dropout[0][0]                    
                                                                 dropout_7[0][0]                  
                                                                 dropout_6[0][0]                  
__________________________________________________________________________________________________
mean_hin_aggregator_1 (MeanHinA multiple             4475        dropout_3[0][0]                  
                                                                 dropout_2[0][0]                  
                                                                 dropout_5[0][0]                  
                                                                 dropout_4[0][0]                  
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 1, 146)       0           input_2[0][0]                    
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 1, 10, 31)    0           reshape_1[0][0]                  
__________________________________________________________________________________________________
reshape_4 (Reshape)             (None, 1, 10, 50)    0           mean_hin_aggregator_1[1][0]      
__________________________________________________________________________________________________
reshape_5 (Reshape)             (None, 1, 10, 50)    0           mean_hin_aggregator[1][0]        
__________________________________________________________________________________________________
dropout_9 (Dropout)             (None, 1, 50)        0           mean_hin_aggregator[0][0]        
__________________________________________________________________________________________________
dropout_8 (Dropout)             (None, 1, 10, 50)    0           reshape_4[0][0]                  
__________________________________________________________________________________________________
dropout_11 (Dropout)            (None, 1, 50)        0           mean_hin_aggregator_1[0][0]      
__________________________________________________________________________________________________
dropout_10 (Dropout)            (None, 1, 10, 50)    0           reshape_5[0][0]                  
__________________________________________________________________________________________________
mean_hin_aggregator_2 (MeanHinA (None, 1, 50)        2550        dropout_9[0][0]                  
                                                                 dropout_8[0][0]                  
__________________________________________________________________________________________________
mean_hin_aggregator_3 (MeanHinA (None, 1, 50)        2550        dropout_11[0][0]                 
                                                                 dropout_10[0][0]                 
__________________________________________________________________________________________________
reshape_6 (Reshape)             (None, 50)           0           mean_hin_aggregator_2[0][0]      
__________________________________________________________________________________________________
reshape_7 (Reshape)             (None, 50)           0           mean_hin_aggregator_3[0][0]      
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 50)           0           reshape_6[0][0]                  
                                                                 reshape_7[0][0]                  
__________________________________________________________________________________________________
link_embedding (LinkEmbedding)  (None, 100)          0           lambda[0][0]                     
                                                                 lambda[1][0]                     
__________________________________________________________________________________________________
dense (Dense)                   (None, 1)            101         link_embedding[0][0]             
__________________________________________________________________________________________________
reshape_8 (Reshape)             (None, 1)            0           dense[0][0]                      
==================================================================================================
Total params: 14,151
Trainable params: 14,151
Non-trainable params: 0

However, the AUC remains poor after 20 epochs.

I suspect that the model could perform better and that the problem is the highly imbalanced data, since the untrained model shows the following:

Train Set Metrics of the initial (untrained) model:
	loss: 0.7360
	tp: 25893.0000
	fp: 114971.0000
	tn: 4.0000
	fn: 1.0000
	accuracy: 0.1838
	precision: 0.1838
	recall: 1.0000
	auc: 0.4984

Test Set Metrics of the initial (untrained) model:
	loss: 0.7365
	tp: 2849.0000
	fp: 12804.0000
	tn: 0.0000
	fn: 0.0000
	accuracy: 0.1820
	precision: 0.1820
	recall: 1.0000
	auc: 0.4965

I could be wrong, though - there could be other reasons why the model is not improving its performance.

I look forward to any solutions or suggestions on this.

Hi, thanks for asking a question!

Are the edges with weight 0 representing “false” edges? As in, a consumer A -> food_item B edge with weight 0 represents that A didn’t interact with B? If so, this might be part of the cause: the model will be traversing them when learning, so the prediction of a link between a consumer and a food item will depend on false edges in addition to the true edges, and there won’t be anything to tell them apart. Until #357 is fixed, the best way to handle this is to create a StellarGraph with only the true edges (edge weight 1), but still use the false edges for training.

Alternatively, does edge weight = 0 represent a “dislikes” edge? If so, again this won’t be taken into account by the model (until #1329). One way to model this would be to have separate edge types, like “likes” and “dislikes”. The HinSAGE model built using two node types and two edge types might work better.

If neither of these works, it might indeed be some sort of imbalance problem, given only 1.3% of the edges have weight 1. One approach might be oversampling the training edges passed into your generator.flow(...) calls, e.g. duplicating each of the “true” training examples 10× or more so that the classes are a little more balanced.
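For instance, a minimal oversampling sketch (the edge IDs and labels here are toy placeholders standing in for whatever is normally passed to generator.flow(...)):

```python
import numpy as np

# toy edge IDs and binary labels standing in for the real training data
edge_ids = np.array([["c1", "f1"], ["c2", "f2"], ["c3", "f1"]])
labels = np.array([1, 0, 0])

# duplicate each positive example 9 extra times (10x total)
pos = labels == 1
oversampled_ids = np.concatenate([edge_ids, np.repeat(edge_ids[pos], 9, axis=0)])
oversampled_labels = np.concatenate([labels, np.repeat(labels[pos], 9)])

# shuffle so the duplicated positives aren't clustered at the end
perm = np.random.default_rng(0).permutation(len(oversampled_labels))
oversampled_ids, oversampled_labels = oversampled_ids[perm], oversampled_labels[perm]
```

The oversampled arrays would then be passed to generator.flow(oversampled_ids, oversampled_labels) in place of the originals.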

What do you think?

(Also, I edited your post to change ''' to ``` for separating the code blocks to make it easier to read. I hope that’s ok. Markdown can definitely be a bit tricky sometimes!)

Hi huon,

Thank you for the swift reply. Since the graph edges were formed from order transaction data, there is always at least 1 interaction between a connected consumer node and food item node. In this case, there are no false edges: a pair of nodes with 0 interactions never forms an edge.

Approach 1:
As per your advice, I have used oversampling so that the classification labels 1 and 0 are balanced, where 1 means that the number of interactions between nodes is greater than 1 and 0 means that the number of interactions is exactly 1. I obtained the following performance:

Test Evaluation:
	loss: 0.4780
	tp: 0.0000
	fp: 0.0000
	tn: 12776.0000
	fn: 2877.0000
	accuracy: 0.8162
	precision: 0.0000
	recall: 0.0000
	auc: 0.5001

Approach 2:
I have also run another experiment using classification labels of 1 and 0, where 1 means that the number of interactions between the nodes is greater than 0 (i.e. a true edge). By doing this, I am assuming that StellarGraph’s EdgeSplitter.train_test_split() can generate false edges where there are 0 interactions between nodes.

Please see below for the result:

Test Evaluation:
	loss: 0.0098
	tp: 15653.0000
	fp: 0.0000
	tn: 0.0000
	fn: 0.0000
	accuracy: 1.0000
	precision: 1.0000
	recall: 1.0000
	auc: 0.0000

It seems that both of my attempts still yield strange model performance. I am wondering if the performance might be affected by the data itself, since the number of edges is not significantly greater than the number of nodes. Is the treatment of true edges and false edges correct in Approach 1 or Approach 2?

(And thank you for pointing out on the Markdown. Helpful :slight_smile: )

(PS: new user can only post 1 image)

That’s a bit strange. I note that the model seems to be learning very little, since it’s just classifying everything as 0 (there are only true negatives and false negatives). This suggests that the model isn’t being set up correctly, that there’s some problem in the data (e.g. every data point’s features are identical), or that the training procedure is missing positive examples.

I’m not sure I can debug this effectively remotely, but I can offer some tips to help you debug:

  • simplify the HinSAGE model as much as possible, e.g. reduce to one hidden layer
  • consider using a completely different simple model, to validate that there’s some useful underlying signal. This can be done for the two components of the data separately:
    • Node features: create a link prediction model using only the features (one way to do this might be a HinSAGE model with num_samples=[], but that extreme case doesn’t work), something like:
      from tensorflow.keras import Model, Input, layers, backend as K
      from stellargraph.layer import link_classification
      
      food_inp = Input(shape=(146,))
      consumer_inp = Input(shape=(31,))
      
      food_hidden = layers.Dense(32)(food_inp)
      consumer_hidden = layers.Dense(32)(consumer_inp)
      
      output = link_classification()([food_hidden, consumer_hidden])
      
      model = Model([food_inp, consumer_inp], output)
      model.compile("adam", loss="binary_crossentropy")
      
      The link_classification call could even be simplified to something more basic like K.sigmoid(K.batch_dot(food_hidden, consumer_hidden)). The model can then be trained against arrays/DataFrames of the features for each end of the links.
    • Graph structure: use an algorithm that only looks at the graph structure, like Node2Vec.

This looks a bit closer to how I would frame a link prediction problem, where I’m predicting any true edge.

However, that test evaluation looks like it only has positive examples (there’s only true positives). For a meaningful evaluation of the model performance, there will need to be some negative examples in the test set. EdgeSplitter.train_test_split returns some negative examples in this manner, but they can also be computed manually by random sampling (in a large sparse graph like this, a randomly sampled edge is unlikely to be a true edge). For instance, if the food_df variable is a Pandas DataFrame containing the food data (and similarly for consumer_df), a random sample of 100 edges could be done like:

import numpy as np
import pandas as pd
food_ids = food_df.index.to_numpy()
consumer_ids = consumer_df.index.to_numpy()

negative_edges = pd.DataFrame(
    {
        "source": np.random.choice(food_ids, size=100),
        "target": np.random.choice(consumer_ids, size=100),
    }
)

This can be used to create an appropriate set of negative edges for training and testing.
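Strictly speaking, a randomly sampled pair can occasionally coincide with a true edge; if that matters, the accidental positives can be filtered out with a merge. A sketch with toy IDs (the real positive_edges DataFrame would hold the graph’s actual source/target pairs):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# toy stand-ins for the real data
positive_edges = pd.DataFrame({"source": ["f1", "f2"], "target": ["c1", "c2"]})
food_ids = np.array(["f1", "f2", "f3"])
consumer_ids = np.array(["c1", "c2", "c3"])

candidates = pd.DataFrame(
    {
        "source": rng.choice(food_ids, size=100),
        "target": rng.choice(consumer_ids, size=100),
    }
)

# keep only the candidates that are not true edges (anti-join via indicator)
merged = candidates.merge(
    positive_edges, on=["source", "target"], how="left", indicator=True
)
negative_edges = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```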

(Just to be clear, I think the first plot there is from Approach 2?)


Does any of this help?

(Oh, btw, another approach instead of oversampling is to use the class_weight argument to fit: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#class_weights )
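As a sketch, the inverse-frequency weighting from that tutorial looks like this (the positive/negative counts below are illustrative, based on roughly 1.3% of the ~156k edges being positive):

```python
# illustrative counts: ~1.3% positive edges out of ~156k
n_pos, n_neg = 2_088, 154_434
total = n_pos + n_neg

# weight each class inversely to its frequency, so both classes
# contribute equally to the loss overall
class_weight = {
    0: total / (2 * n_neg),
    1: total / (2 * n_pos),
}

# hypothetical usage (train_flow being your generator's flow):
# model.fit(train_flow, epochs=20, class_weight=class_weight)
```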

Hi huon,

I have continued to use Approach 2 coupled with EdgeSplitter.train_test_split. Fortunately, I am able to achieve AUCs between 0.8 and 0.9 for both train and test data, since the sampled data (from EdgeSplitter.train_test_split) has an equal proportion of positive and negative edges. Please see the improved result below:

Using class_weight would be an elegant solution to resolving the imbalanced classification problem.

However, there seems to be some issues with the way I convert the data to the following StellarGraph format:

StellarGraph: Undirected multigraph
 Nodes: 127566, Edges: 156522

 Node types:
  consumer: [127379]
    Features: float32 vector, length 31
    Edge types: consumer-link->food
  food: [187]
    Features: float32 vector, length 146
    Edge types: food-link->consumer

 Edge types:
    consumer-link->food: [156522]
        Weights: all 1 (default)
        Features: none

This format has a downstream impact on the way EdgeSplitter.train_test_split splits the data. It returns node-pair IDs in 2 formats: (i) ['consumer-301742911', 'food-0060'] or (ii) ['food-0060', 'consumer-301742911']. Format (ii) gives a ValueError: Node pair (food-0060, consumer-301742911) not of expected type (consumer, food) error when the train/test data is put into train_model_generator.flow.

Alternatively, I have tried converting to a directed StellarGraph instead, since a consumer can order food but not the other way round. This also avoids the situation where EdgeSplitter.train_test_split might return the train/test data in format (ii). This is how it looks:

StellarDiGraph: Directed multigraph
 Nodes: 127566, Edges: 156522

 Node types:
  consumer: [127379]
    Features: float32 vector, length 31
    Edge types: consumer-link->food
  food: [187]
    Features: float32 vector, length 146
    Edge types: none

 Edge types:
    consumer-link->food: [156522]
        Weights: all 1 (default)
        Features: none

Although this ensures that EdgeSplitter.train_test_split returns train/test data in format (i) only, it has a downstream impact on HinSAGE, which raises an IndexError: list index out of range error. Having checked the source code, I think the issue is caused by the HinSAGELinkGenerator class, possibly due to the way it uses the SampledHeterogeneousBreadthFirstWalk function.

The issue could be fixed by introducing a new DirectedHinSAGELinkGenerator class, similar to the DirectedGraphSAGELinkGenerator class, which uses the DirectedBreadthFirstNeighbours function.

Please see below for the source code I am referring to for the HinSAGELinkGenerator class and the DirectedGraphSAGELinkGenerator class:
https://stellargraph.readthedocs.io/en/stable/_modules/stellargraph/mapper/sampled_link_generators.html?highlight=f"Node%20pair%20({src}%2C%20{dst})%20not%20of%20expected%20type%20(%7Bexpected_src_type%7D%2C%20%7Bexpected_dst_t#

Thank you for the help thus far :slight_smile:

That looks much better! Out of curiosity, does the HinSAGE model outperform any other models you’ve tried?

Ah, I think the EdgeSplitter class doesn’t consider the source/target node types when computing edges. It looks like there are a couple of things that make this relatively easy to handle in this particular case:

  • this is a bipartite graph, with consumers only connecting to foods and vice versa
  • the IDs have a consistent ordering: consumer IDs are always before food IDs, because "consumer-..." < "food-..."

This allows doing something like

result_graph, sampled_edges, labels = edge_splitter.train_test_split(...)
sampled_edges.sort(axis=1)

and then the first column sampled_edges[:, 0] will be only consumer IDs, and sampled_edges[:, 1] will be only food IDs.
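A tiny demonstration of the trick, using the example IDs from your error message:

```python
import numpy as np

sampled_edges = np.array(
    [
        ["consumer-301742911", "food-0060"],
        ["food-0060", "consumer-301742911"],
    ]
)

# sort each row in place: lexicographically, "consumer-..." < "food-..."
sampled_edges.sort(axis=1)
# every row is now (consumer ID, food ID)
```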

The option of manually generating negative edges with np.random.choice might be the easiest way to be sure this works, though.

Because there’s never an edge food -> consumer, I don’t think it’s strictly necessary to model this as a directed graph. The information in the undirected edges is identical to the information in the directed ones.

Hi huon,

My team’s aim is to compare graph neural network models such as HinSAGE against traditional recommender system models. I can let you know the model performances when my team finishes the experiments. The sort function (i.e. np.sort) that you recommended was helpful. Thank you for explaining the inner workings of EdgeSplitter.

While fine-tuning the HinSAGE, I have the following questions:

Qn1: Generate train-test data without using EdgeSplitter.train_test_split

I understand that it will be easier to use np.random.choice to generate the negative edges than to use EdgeSplitter.train_test_split. As you demonstrated, I could generate the following dataframe of negative edges:

import numpy as np
import pandas as pd
food_ids = food_df.index.to_numpy()
consumer_ids = consumer_df.index.to_numpy()

negative_edges = pd.DataFrame(
    {
        "source": np.random.choice(food_ids, size=100),
        "target": np.random.choice(consumer_ids, size=100),
    }
)

My question is: how do I create a train graph once I have generated the positive and negative edges as numpy arrays or pandas dataframes (just as EdgeSplitter.train_test_split returns G_train, edge_ids, edge_labels)?

Qn2: How can I increase the p in edge_splitter_test.train_test_split?

Currently, I am generating the following samples with p = 0.0009:

Network has 156522 edges of type link
** Sampled 140 positive and 140 negative edges. **

I wanted to increase p so that I could use more samples to build the model. Using p = 0.009 caused an error: ValueError: Unable to sample 1408 negative edges. Consider using smaller value for p.

Could the error be caused by the fact the graph network is sparse?
What is the typical p value to be used for generating samples?

That’d be interesting :smile:

Ah, that’s true. I guess one approach is using EdgeSplitter and doing the post-processing to get it right.

Another option would be doing the sampling before creating the graphs. For example, given a DataFrame edges of the source/target links, one can split it up using https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html and create independent StellarGraphs:

from sklearn import model_selection

graph_edges, positive_edges = model_selection.train_test_split(edges, test_size=10000)

# create the graph with the 146522 "graph edges"
graph = StellarGraph(nodes, graph_edges)

# use the 10000 sampled edges along with random negative edges
negative_edges = ... # random sampling
test_edges = pd.concat([
    positive_edges.assign(label=1), 
    negative_edges.assign(label=0)
])
# test_edges has columns: 'source', 'target' and positive-or-negative 'label'

If you want to do even more model selection or other validation along the lines of https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/node2vec-link-prediction.html#Construct-splits-of-the-input-data , you could use additional train_test_split calls to create appropriate DataFrames.
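For example, one extra stratified split on a labelled edge DataFrame might look like this (toy data; the real test_edges would come from the concatenation above):

```python
import pandas as pd
from sklearn import model_selection

# toy labelled edges: 'source', 'target' and positive-or-negative 'label'
test_edges = pd.DataFrame(
    {
        "source": ["c1", "c2", "c3", "c4"],
        "target": ["f1", "f2", "f1", "f2"],
        "label": [1, 1, 0, 0],
    }
)

# split further into model-selection and hold-out sets, keeping the
# positive/negative ratio the same in both via stratification
model_sel, hold_out = model_selection.train_test_split(
    test_edges, test_size=0.5, stratify=test_edges["label"], random_state=0
)
```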

Does that help?

Hm, being sparse should allow a large p: the limit is the total number of possible edges (for a bipartite graph like this, num consumers * num foods) minus the number of positive edges. For this graph, that implies there’s space for about 23.7 million negative edges.
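The back-of-envelope arithmetic, using the node and edge counts from the graph summary above:

```python
# counts taken from the StellarGraph summary
n_consumers, n_foods, n_edges = 127_379, 187, 156_522

possible = n_consumers * n_foods        # 23,819,873 possible consumer-food pairs
negative_capacity = possible - n_edges  # 23,663,351 (~23.7 million) negatives
```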

That is to say, I don’t know what’s going on there.

Hi huon,

thanks for the help :slight_smile:

Yes, sampling before creating the graphs would help, especially since I need to create a validation/test dataset for comparing models - HinSAGE vs other non-StellarGraph models.

I guess another reason p has to be set low is that there are not many edges specifically of the “consumer -> food” type in the network.