Background: I’m using HinSage to perform node classification on a heterogeneous graph based on graph structure and node features. I’m trying to reduce the need for manual feature engineering from graph patterns.
In my initial testing with 3 different node types, I set up a single dummy feature on each node (all values set to 1) to get me going. The intent was to add real features once I’d shown that HinSage could make good predictions from the structure of a simple graph. For my ‘simple test’, I produced a test graph where the class of a type A node was directly correlated with whether it was attached to a type C node (via a type B node). HinSage didn’t ‘find’ that pattern, and the loss was pretty flat over the epochs.
Rather than delve into the specifics of the HinSage implementation, I hypothesised that HinSage might not be aware of the node types during training. As a quick hack to test this, I generated an artificial random feature on each node, drawn from a Gaussian distribution with a different mean for each node type (different means because I am performing feature normalisation). The idea was that the differing means would help the algorithm learn the node types. This seemed to work: the loss dropped over the epochs and the results were good.
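For anyone wanting to reproduce the hack, here is a minimal sketch of the idea in NumPy. The node counts, the per-type means, the standard deviation and the helper name `make_type_features` are all illustrative assumptions, not the actual values or code used. It also assumes normalisation is applied across all nodes jointly; normalising each node type separately would remove the mean differences that the hack relies on.

```python
import numpy as np

def make_type_features(node_counts, type_means, std=1.0, seed=0):
    """Hypothetical helper: one random Gaussian feature per node,
    with a different mean for each node type."""
    rng = np.random.default_rng(seed)
    return {
        node_type: rng.normal(loc=type_means[node_type], scale=std, size=(n, 1))
        for node_type, n in node_counts.items()
    }

# Illustrative values only (assumptions, not from the real graph).
node_counts = {"A": 1000, "B": 1000, "C": 1000}
type_means = {"A": -5.0, "B": 0.0, "C": 5.0}

features = make_type_features(node_counts, type_means)

# Normalise over all nodes together so the per-type offsets survive.
all_values = np.concatenate(list(features.values()))
mu, sigma = all_values.mean(), all_values.std()
normalised = {t: (x - mu) / sigma for t, x in features.items()}

for node_type, x in normalised.items():
    print(node_type, x.shape, round(float(x.mean()), 2))
```

Each array has one row per node and a single column, so it can be used wherever a per-type node feature matrix is expected. After joint normalisation the three types still sit around clearly separated means, which is the only signal this feature carries.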
My hack doesn’t feel sustainable as features are added, so I’m curious to hear thoughts on what I’m observing and what I should expect from HinSage when nodes have few features. I know there are other algorithms that look at graph structure without node features, but I was hoping HinSage could do this and roll in features where available to boot.