I’ve had some early success with GraphSAGE node embedding, where “success” means clusters that appear to be well separated.
As I’ve worked more with StellarGraph and GraphSAGE, I’ve come to think that my original network schema wasn’t optimal. Basically, my nodes have few features of their own, so I opted to create additional nodes, with edges to them, to express the relations.
A more concrete example: I’m embedding linguistic data. My current approach would be to create a node for the word, and then separate nodes for each of the syntactic features. So “computers” would have edges to nodes like “category:noun” and “number:plural”. My thinking was then that all noun words would have a common neighbor node.
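To make the first schema concrete, here is a minimal sketch in plain Python (standard library only). The toy lexicon, the "name:value" naming of feature nodes, and the helper names are illustrative assumptions, not part of any library. It builds the word-to-feature-node edges and shows that all nouns end up sharing the "category:noun" neighbor.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> {syntactic feature name: value}.
LEXICON = {
    "computers": {"category": "noun", "number": "plural"},
    "keyboards": {"category": "noun", "number": "plural"},
    "operate": {"category": "verb", "tense": "present"},
}


def build_feature_node_edges(lexicon):
    """Return (word, feature_node) edges, with one feature node per 'name:value' pair."""
    edges = []
    for word, feats in lexicon.items():
        for name, value in feats.items():
            edges.append((word, f"{name}:{value}"))
    return edges


def words_by_feature_node(edges):
    """Group the word nodes by the feature node they are attached to."""
    groups = defaultdict(set)
    for word, feature_node in edges:
        groups[feature_node].add(word)
    return dict(groups)


edges = build_feature_node_edges(LEXICON)
groups = words_by_feature_node(edges)

# Every noun is a neighbor of the shared "category:noun" node.
print(sorted(groups["category:noun"]))  # ['computers', 'keyboards']
```

An edge list like this could be loaded into whatever graph structure the embedding library expects. Note that in this schema the feature nodes themselves still need some feature vector, since GraphSAGE aggregates node features from neighbors.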
The other approach that occurs to me is that instead of having all these syntax nodes, each word node has numerical attributes that correspond to these syntactic categories. So “computers” would be like [NOUN, PLURAL, NONE] and “operate” would be [VERB, NONE, PRESENT].
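A minimal sketch of the second schema, again in plain Python. The slot vocabularies and the function name are assumptions for illustration. It uses one-hot encoding rather than a single integer code per category, because integer codes would imply an ordering among values like NOUN and VERB that does not exist.

```python
# Hypothetical vocabularies for each syntactic slot; "none" marks an absent feature.
SLOTS = {
    "category": ["noun", "verb"],
    "number": ["none", "plural", "singular"],
    "tense": ["none", "past", "present"],
}


def one_hot_features(feats):
    """Concatenate one one-hot block per slot into a single feature vector."""
    vector = []
    for slot, values in SLOTS.items():
        value = feats.get(slot, "none")  # a missing slot falls back to "none"
        vector.extend(1.0 if value == v else 0.0 for v in values)
    return vector


computers = one_hot_features({"category": "noun", "number": "plural"})
operate = one_hot_features({"category": "verb", "tense": "present"})

print(computers)  # [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(operate)    # [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

Stacking these vectors row by row gives a node feature matrix with one row per word node, and the graph then only needs edges between the words themselves.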
So my question is: is one of these schemas (lots of nodes vs. more node attributes) preferable? Are there general rules for how one might perform compared to the other?