Designing a Network Schema (Edges vs. Node Features)

I’ve had some early success with GraphSAGE node embedding. “Success” in this case being clusters that seem to be well-separated.

As I’ve worked with Stellargraph and GraphSAGE more I think that my original network schema wasn’t optimal. Basically, my nodes have few features and I opted to create more nodes with edges to express the relations.

A more concrete example: I’m embedding linguistic data. My current approach would be to create a node for the word, and then separate nodes for each of the syntactic features. So “computers” would have edges to nodes like “category:noun” and “number:plural”. My thinking was then that all noun words would have a common neighbor node.

The other approach that occurs to me is that instead of having all these syntax nodes, each word node has numerical attributes that correspond to these syntactic categories. So “computers” would be like [NOUN, PLURAL, NONE] and “operate” would be [VERB, NONE, PRESENT].

So my question is that is one of these schemas (lots of nodes vs. more node attributes) preferrable? Are there general rules for how one might work compared to the other?

Hi cfalk, thanks for using StellarGraph, and that’s a very good question!

I’m not sure about a “general rule”, but it can be useful to think about what type of graph would be most suitable for the algorithm you’ve decided to use.

If you’re using an algorithm that uses node features (like GraphSAGE), I’d say it’s worth a try to encode more information as node features instead of having additional nodes. You could almost think about it as pre-processing a graph convolution step, where your nodes already have the information from their first-hop neighbours. However, you would miss out on the second-hop links (e.g. prediction on computer node will no longer be influenced by a path like computer <-> category:noun <-> desk if the information from category:noun was converted to node features) so it would also depend on what other links you have in the graph that are still connecting these nodes in a meaningful way.

For GraphSAGE, you’d also want to think about the num_samples parameter used with the model based on the graph. If the graph is defined in such a way that you want to aggregate information across many more hops over the neighbourhood, then num_samples should grow in length, whereas if you’ve “condensed” the graph, maybe just one or two hops is enough.

Also, You could consider going the opposite direction and encode everything as additional nodes / edge types, to create a “knowledge graph”. We have a few knowledge graph algorithms like ComplEx and DistMult - these demos are for link prediction but the node embeddings can be obtained via the embeddings method. With the linguistic data example, a knowledge graph could have edges on computers like: (computers)-[category]->(noun), (computers)-[number]->(plural), and so on for all attributes for computers.

Hope that gives you some ideas to try!



Thanks! There is a semantic component to my network (that I didn’t mention for the sake of simplicity). That certainly mirrors a knowledge graph closely. So I hadn’t considered ComplEx and DistMult.

Maybe I should explain my intended downstream application? That might make my requirements clearer. I’ve already mentioned my goal of representing words as nodes and then embedding those nodes. My downstream goal is to use those embeddings in an LSTM and train it on partially annotated data. The LSTM would generate values for out-of-vocabulary words (those without known good embeddings). Those OOV values would be used to generate new nodes for the graph. That intended application was what first drew me to GraphSAGE.

Hi cfalk,

Thanks for providing more details! Hmm I’m not super familiar with the particular domain, but I’m picturing something like:

  1. generate embeddings for your vocabulary words using the graph structure and node features
  2. train an LSTM model on sentences so that you can generate embeddings for the OOV words based on its context around other nodes that do have embeddings
  3. use these generated embeddings to create new nodes in the original graph

Does that sound roughly correct?

In terms of deciding which algorithm to use… I think it’s a bit more difficult to reason about which graph embedding algorithm is appropriate based on the downstream application, and instead I would tend to think about which is appropriate based on what the data looks like (such as things I mentioned in the previous response).

That being said, I do recall some data exploration we’ve done, where GraphSAGE embeddings seemed to be heavily influenced by node features, so I’d say it’s worth considering whether you think the node features intuitively seem like good predictors for the downstream task.

Is there a more specific problem with the results you’ve gotten so far that’s making you suspect that the approach is suboptimal? That might help me give you some more concrete suggestions!

What you’re picturing is correct.

So my data is structured as an ontology. There’s a mixture of semantic and syntactic information. The number of different pieces of syntactic information is fairly small and could be considered a finite set. Syntactic information could be input into an sklearn DictVectorizer and rendered as vector of numeric values. The semantic information is basically a knowledge graph. It’s similar to RDF triples. So the edge types are important.

I don’t have a specific problem with the results so far. Although I haven’t progressed so far as to try to generate new nodes.

Implementing a different attribute vector appears to have improved the clustering. I used sklearn’s DictVectorizer and ended up with a 35-dimensional vector.

Nice! The embeddings definitely seem like they’ve improved. Have you tried training the LSTM? It’d be interesting to see how the generated embeddings appear on that same plot.

In terms of creating new nodes based on the generated embeddings, I guess it depends what additional information you consider as a requirement in order to place a generated embedding as a new node in the graph.

If you just need links that connect the new nodes to other nodes in the graph, you could try running a link prediction task once you have the generated embeddings. The Node2Vec link prediction demo is probably the most relevant notebook to look at for this scenario - the first half of the notebook runs Node2Vec to generate node embeddings before proceeding with predicting links, so you could swap out the first half with your own workflow to generate embeddings via LSTM.

I’m annotating some samples to try with an LSTM. For a small set of relatively small samples I’m not sure how “profound” the initial results will be. But there has to be a first step sometime.

There is also a demo in the Github code for link prediction with GraphSAGE, correct? I’d also like to predict the numerical attribute vector for a node. And it would also be useful if the links predicted had a label associated with them. I think DistMult would do something like this? At least initially I’ll focus on what GraphSAGE can provide.

There is also a demo in the Github code for link prediction with GraphSAGE, correct?

Yes the GraphSAGE link prediction demo is available here.

I’d also like to predict the numerical attribute vector for a node.

And it would also be useful if the links predicted had a label associated with them. I think DistMult would do something like this?

Both of the above are possible with many different algorithms available in the library, and yes DistMult would probably be a good choice for the link prediction if you also want to predict the label on the link, since it predicts triples.

If you want to focus on GraphSAGE for now, predicting the numerical attribute vector for a node can probably be achieved by setting up the final layer(s) and loss of the model as a regression task on top of the GraphSAGE layers. Unfortunately all our demos for node-based tasks are for classification instead of regression so there’s no other demo I can point you to here, but let me know if you’d like more details on how to do this!

I previously pointed you to Node2Vec instead of GraphSAGE for the link prediction part since I figured you’d be predicting links based on embeddings rather than based on a “graph”. Usually if you’re starting with a graph then you’d want something like GraphSAGE/GCN/etc to perform graph convolution to obtain “embeddings” in the hidden layer of the network and feed them as input to link prediction. However if you’ve already obtained all node embeddings from a previous step, then there’s no need for a graph-based method to predict links, you can use whatever binary classification workflow you’re familiar with - the Node2Vec demo in this section uses LogisticRegressionCV from scikit-learn.

That’s an interesting idea about re-using/extending the Model to solve the attribute vector problem. I may have to come back to that after I’m satisfied that the solution will be useful.

I assume that I’ll want to plug Node2Vec embeddings into the link prediction classifier? Or can I mix and match with any time of embedding?

I assume that I’ll want to plug Node2Vec embeddings into the link prediction classifier? Or can I mix and match with any time of embedding?

You can just mix and match with any type of embedding, so you don’t need to run Node2Vec if you’ve already obtained embeddings using another method!

If you have node embeddings you’d need a binary operator that combines a pair of node embeddings into a “link embedding” - this can be any function you choose, but I’ve personally only tried the four binary operators that were used in the Node2Vec paper which are: Hadamard, L1, L2, Average (they’re included in the Node2Vec demo).

So once you have link embeddings, “link prediction” is more-or-less the same as any binary classification task, so you need positive and negative examples of links to feed into a binary classification model of your choice. We have a helper class called EdgeSplitter to help with sampling the positive and negative examples if you have a graph defined.

I did do a quick Node2Vec implementation of embeddings. With the gensim library I could do the vector operations and sometimes get useful results. Like “king” - “man” + “woman” = “queen”. But often the inverse would fail like “queen” - “woman” + “man” = “object”.

I do have a basic LSTM model training a regression to… something. The accuracy is very low, which I expected with a small data set. I may focus on just one example, annotate it 100%, and then do a cloze test by selective deleting one word to see if it can predict the expected result.