StellarGraph

HinSage question

Hi all, thanks for taking the time to look through my question. Newbie question here: I have 2 node types, for example clients and servers. In this example, servers cannot interact with other servers, only with clients. Here is the output of info for the graph created with:

G = StellarDiGraph({"client": node_client, "server": node_server}, edges=edge, edge_type_column="type", edge_weight_column="weight")

Graph info

StellarDiGraph: Directed multigraph
 Nodes: 781029, Edges: 638688

 Node types:
  client: [780603]
    Features: float32 vector, length 13
    Edge types: client-upload->server, client-transfer->client
  server: [426]
    Features: float32 vector, length 8
    Edge types: none

 Edge types:
    client-upload->server: [623355]
        Weights: range=[1, 3.1007e+07], mean=84553.3, std=269460
    client-transfer->client: [15333]
        Weights: range=[10000, 7.3e+06], mean=91392.3, std=148739

Now I start to define the train and test sets,

train-test-split and generator

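from sklearn.model_selection import train_test_split

# Split the edge table into train and test sets
edges_train, edges_test = train_test_split(
    edge, train_size=train_size, test_size=test_size, random_state=7
)
# Edge weights are the labels; (source, target) pairs are the links
labels_train = edges_train["weight"]
edgelist_train = [tuple(x) for x in edges_train[["source", "target"]].values]
train_gen = generator.flow(edgelist_train, labels_train, shuffle=True)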

I get an error at train_gen saying “Node pair (client_id1, client_id2) not of expected type (client, server)”. I realize this pair is a client-to-client transfer rather than a client-to-server upload (client_id2 is not in the node_server dataframe). To work around this I placed the client and server info in the same dataframe, but that doesn’t make sense, as these are two different groups with different features. What’s the right way of defining the DiGraph for this problem?

Thank you again for your help! :smiley:

Hi munchmuch, thanks for your question!

Based on the details included, it seems like you’re looking to do link prediction using HinSAGE.

The digraph as shown by the graph info looks good to me, but one thing to keep in mind about the way HinSAGE operates on heterogeneous graphs is that you must define your task as predicting on one particular node/edge type. So in your case, you’d need to decide whether you want to predict links of type client-upload->server or client-transfer->client. Then, when calling generator.flow, make sure that every pair in the edgelist you pass in is of the type you want to predict.
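As a rough illustration of that last step, here is a minimal sketch that restricts the edge table to a single edge type before building the edgelist and labels. The toy data and the helper name links_of_type are made up for the example; the column names (source, target, type, weight) and the type values (upload, transfer) are taken from the question.

```python
import pandas as pd

# Toy stand-in for the `edge` DataFrame from the question (values are made up).
edge = pd.DataFrame(
    {
        "source": ["c1", "c2", "c1", "c3"],
        "target": ["s1", "s1", "c2", "s2"],
        "type": ["upload", "upload", "transfer", "upload"],
        "weight": [120.0, 45.0, 10000.0, 300.0],
    }
)


def links_of_type(edges: pd.DataFrame, edge_type: str):
    """Hypothetical helper: return (edgelist, labels) for one edge type only."""
    subset = edges[edges["type"] == edge_type]
    # Plain (source, target) tuples, in the same order as the labels
    edgelist = list(subset[["source", "target"]].itertuples(index=False, name=None))
    labels = subset["weight"].to_numpy()
    return edgelist, labels


# Keep only client-upload->server links; client-transfer->client rows are dropped.
upload_links, upload_weights = links_of_type(edge, "upload")
```

The train/test split would then be applied to the filtered rows rather than to the full edge table, so that every pair reaching generator.flow is a (client, server) pair. This assumes the generator was set up for the (client, server) pair type, which is what the error message in the question suggests.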

The reason for this is that HinSAGE generalises GraphSAGE by creating additional weight matrices to account for the different node types (and therefore different feature dimensions) in the graph, but the types being fed into the model must remain consistent. The HinSAGE documentation has a more thorough explanation (https://stellargraph.readthedocs.io/en/stable/hinsage.html).

Although HinSAGE allows you to aggregate information from different node types, the “target” nodes for supervised attribute inference must still be of a particular node type.

Similar to the way unsupervised learning was restricted to learning embeddings for a particular node type in a heterogeneous setting, the link prediction algorithm is also limited to learning links of a particular type.
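If it helps with debugging, the consistency requirement can be checked by hand before the edgelist goes anywhere near the generator. The snippet below is only a sketch of that idea and not StellarGraph's own code: the function mismatched_pairs and the node_type lookup are hypothetical names, and the node IDs are invented.

```python
def mismatched_pairs(edgelist, node_type, expected):
    """Return the pairs whose (source type, target type) differs from `expected`."""
    return [
        (src, dst)
        for src, dst in edgelist
        if (node_type[src], node_type[dst]) != expected
    ]


# Hypothetical lookup from node ID to node type
node_type = {"c1": "client", "c2": "client", "s1": "server"}

# A mixed edgelist: two uploads and one client-to-client transfer
edgelist = [("c1", "s1"), ("c1", "c2"), ("c2", "s1")]

bad = mismatched_pairs(edgelist, node_type, ("client", "server"))
print(bad)  # [('c1', 'c2')]
```

An empty result means every pair is of the expected type; any pair that shows up is one that would trigger the “not of expected type” error from the question.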

Hope that helps! And let us know if you think you’ve already defined your task in such a way and you’re still running into issues.

Hi Kevin, thank you! That perfectly answers my doubts :smiley:
