StellarGraph

Losing edges when importing a networkx graph to stellargraph

Hello everybody,

First of all, thank you for putting out such an easy-to-understand graph machine learning library! It has made the whole topic a lot easier for me, but the import functions sometimes raise questions.

I use the following code to import a graph into a stellargraph object:

import os

import networkx as nx
from stellargraph import StellarGraph

def loadGraphData(path="path/to/example.graphml"):

    dataset_location = os.path.expanduser(path)

    g_nx = nx.read_graphml(dataset_location)
    print("Number of nodes {} and number of edges {} in graph.".format(g_nx.number_of_nodes(), g_nx.number_of_edges()))

    stellar_g_nx = StellarGraph(g_nx)
    print("Number of nodes {} and number of edges {} in graph.".format(stellar_g_nx.number_of_nodes(), stellar_g_nx.number_of_edges()))

    del g_nx

    return stellar_g_nx

As you can see, since there is no read_graphml() for StellarGraph, I take the extra step of importing the GraphML file via NetworkX and then passing the result to StellarGraph.

The behaviour that confused me was the following output:

Number of nodes 47031 and number of edges 2249619 in graph.
Number of nodes 47031 and number of edges 2247051 in graph.

So, by transferring the object from NetworkX to StellarGraph, I somehow lost ~2500 edges.
Do you have any idea where that behaviour might come from?

Thanks in advance for any help!

(I use stellargraph version 0.6.1 and networkx version 2.3)

Hi, thanks for trying out StellarGraph, we are glad it’s making graph machine learning more approachable for you!

The issue you raise is difficult to diagnose without knowing more about the graph you are loading. However, we have seen this before when loading directed graphs with NetworkX. The most likely cause is that a directed graph is converted to undirected when it is used to create a StellarGraph object (which is an undirected NetworkX MultiGraph).

Converting a directed graph to an undirected graph can lose edges, as directed edges in both directions will be converted to a single undirected edge. To see if this is the cause, look at the type of the g_nx object. The NetworkX read_graphml will create either a directed DiGraph or an undirected Graph for homogeneous graphs or a directed MultiDiGraph or an undirected MultiGraph for heterogeneous graphs.
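As a rough illustration of that collapse, here is a minimal sketch in plain Python rather than NetworkX (the edge list and node names are made up). It treats each undirected edge as an unordered pair, so a -> b and b -> a end up as the same edge. It models simple graphs only and says nothing about how NetworkX handles multigraph edge keys.

```python
# Directed edges as (source, target) tuples; the node names are invented.
directed_edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]

# An undirected edge has no orientation, so store it as an unordered pair.
undirected_edges = {frozenset(edge) for edge in directed_edges}

# The reciprocal pair a->b / b->a collapses into one undirected edge.
lost = len(directed_edges) - len(undirected_edges)
print("directed: {}, undirected: {}, lost: {}".format(
    len(directed_edges), len(undirected_edges), lost))
```

Every reciprocal pair costs one edge in this model, so the expected loss equals the number of node pairs that are connected in both directions.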

If the g_nx object is a DiGraph or MultiDiGraph try converting it directly to an undirected graph using NetworkX and see if the number of edges is consistent with the StellarGraph object:

 g_nx = nx.read_graphml(dataset_location)
 print("Number of nodes {} and number of edges {} in graph.".format(g_nx.number_of_nodes(), g_nx.number_of_edges()))

 g_nx_undirected = g_nx.to_undirected()
 print("Number of nodes {} and number of edges {} in undirected graph.".format(g_nx_undirected.number_of_nodes(), g_nx_undirected.number_of_edges()))

Currently, StellarGraph does not support directed graphs (we are working to extend our algorithms to support them) so for now the solution is to convert the graph to undirected.

Please let us know if this doesn’t fix your problem!

Best regards,
Andrew

Hi!
Thanks for your swift reply. That could very well be the case: the graph I use is indeed a NetworkX MultiDiGraph. With the code you posted, I get the following results:

Number of nodes 47031 and number of edges 2249619 in graph.
Number of nodes 47031 and number of edges 2249619 in undirected graph.

So we don't have edge loss here, even though the graph is now a NetworkX MultiGraph.

Then I continued and transformed both into StellarGraph objects with the following code:

stellar_g_nx = StellarGraph(g_nx)
print("Number of nodes {} and number of edges {} in previously directed StellarGraph.".format(stellar_g_nx.number_of_nodes(), stellar_g_nx.number_of_edges()))

undi_stellar_g_nx = StellarGraph(g_nx_undirected)
print("Number of nodes {} and number of edges {} in previously undirected StellarGraph.".format(undi_stellar_g_nx.number_of_nodes(), undi_stellar_g_nx.number_of_edges()))

And got the following result:

Number of nodes 47031 and number of edges 2247051 in previously directed StellarGraph.
Number of nodes 47031 and number of edges 2249619 in previously undirected StellarGraph.

So the edge loss only occurs when we convert an nx MultiDiGraph to a StellarGraph (which is undirected, as you stated), but not if we take the extra step via an nx MultiGraph.

So, looking further into this, the Neo4j query to count the number of directed edges in both directions is:

MATCH (c)-->()-->(c) RETURN count(c)

and returns 5156, which is roughly twice the edge loss we are seeing (2249619 - 2247051 = 2568), so that is most likely the cause. Unfortunately the numbers don’t match exactly, but these are two different versions of the database I built, and I can't take that one offline and load the version that’s in the GraphML file just now.
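To spell out why the query result should be about twice the loss: the pattern matches once per starting node, so each reciprocal pair is counted from both ends. Below is a minimal sketch in plain Python that mimics this counting on a made-up edge list. The function name is hypothetical, and it assumes no parallel edges (which the real query would count multiple times). The last lines just redo the arithmetic with the numbers from this thread.

```python
def count_two_cycles(edges):
    """Count ordered pairs (c, x) with c -> x and x -> c, one per starting node c."""
    edge_set = set(edges)
    return sum(1 for (u, v) in edge_set if u != v and (v, u) in edge_set)

# Invented example: two reciprocal pairs (a<->b, c<->d) and one one-way edge.
edges = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("a", "c")]
two_cycles = count_two_cycles(edges)   # each pair is seen from both ends
reciprocal_pairs = two_cycles // 2

# Numbers reported in this thread.
edge_loss = 2249619 - 2247051          # edges lost in the conversion
pairs_from_query = 5156 // 2           # reciprocal pairs implied by the query
print(two_cycles, reciprocal_pairs, edge_loss, pairs_from_query)
```

With the thread's numbers this gives a loss of 2568 edges against 2578 implied pairs, a gap of 10 that would be consistent with the two database versions differing slightly.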

For my use case that's most likely irrelevant since I am using Metapath2Vec anyway, but it's always a red flag if your data shrinks without explanation. So thank you very much, it is much clearer now. Maybe a few more import format options are something you could look into in the future, to make this reformatting a bit less complicated. :slight_smile:

Thanks a lot,
Florin

PS: I am using (a slightly altered version of) the hetionet database which you can get from het.io if you want to reproduce the problem.

Hi Florin,

I’m glad that converting to an undirected graph first prevented this edge loss. However, I agree that it is definitely a red flag that this is occurring. I suspect that this is a NetworkX issue, as we have seen similar issues before. To confirm that, you could run the following test, which converts to a MultiGraph directly:

g_nx_2 = nx.MultiGraph(g_nx)
print("Number of nodes {} and number of edges {} in previously directed nx MultiGraph.".format(g_nx_2.number_of_nodes(), g_nx_2.number_of_edges()))

g_nx_undi_2 = nx.MultiGraph(g_nx_undirected)
print("Number of nodes {} and number of edges {} in previously undirected nx MultiGraph.".format(g_nx_undi_2.number_of_nodes(), g_nx_undi_2.number_of_edges()))

Thanks for pointing us to the dataset that you are using. We are definitely looking for new and interesting use cases to apply our library to. Please let us know if you get any interesting results!

Best regards,
Andrew

Hi @fratajcz.
As a matter of fact, I experienced the same problem: when I convert an initially directed NetworkX MultiDiGraph to an undirected MultiGraph, I lose some of the edges. Quite annoying :). The solution I found is to filter the edges in the pandas DataFrame first (if you have your data in a pandas DataFrame) and then pass it directly to a NetworkX MultiGraph. Here is the code that does that:

import networkx as nx

def pandas_directed_to_undirected(data):
    # Note: this modifies `data` in place.
    data.dropna(inplace=True)  # drop rows with NaNs
    # A DataFrame has no `name` attribute unless one was set, so fall back to a default.
    name = getattr(data, "name", "edges")
    print("In directed graph '{}' number of edges is {}".format(name, data.shape[0]))
    # Sort each (Source, Target) pair so that a->b and b->a get the same key.
    data["sorted_row"] = [sorted([a, b]) for a, b in zip(data.Source, data.Target)]
    data["sorted_row"] = data["sorted_row"].astype(str)
    data.drop_duplicates(subset=["sorted_row"], inplace=True)
    data.drop(["sorted_row"], axis=1, inplace=True)
    print("In undirected graph '{}' number of edges is {}".format(name, data.shape[0]))
    return data

and then:

edges = pandas_directed_to_undirected(pandas_edges)
Gnx = nx.from_pandas_edgelist(edges, source="Source", target="Target", edge_attr="etype", create_using=nx.MultiGraph())
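One caveat: the function above keys only on the node pair, so edges of different types between the same two nodes are collapsed as well. If that is not wanted, the key can include the edge type. Here is a minimal sketch of that idea in plain Python rather than pandas. The function name and the example rows are made up, and the tuple layout (Source, Target, etype) is an assumption based on the column names used above.

```python
def undirected_unique(rows):
    """Keep the first row for each (unordered node pair, edge type)."""
    seen = set()
    kept = []
    for source, target, etype in rows:
        # frozenset ignores direction; etype keeps different edge types apart.
        key = (frozenset((source, target)), etype)
        if key not in seen:
            seen.add(key)
            kept.append((source, target, etype))
    return kept

# Invented rows in (Source, Target, etype) order.
rows = [
    ("g1", "d1", "treats"),
    ("d1", "g1", "treats"),   # reverse duplicate of the first row, dropped
    ("d1", "g1", "causes"),   # same pair but a different type, kept
]
kept = undirected_unique(rows)
print(kept)
```

In pandas, the same effect could probably be had by including the edge type column in the subset passed to drop_duplicates, but that variant is untested here.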

Hope that helps :slight_smile:

Thanks all for your help!

I tested the two lines that Andrew suggested, and

g_nx_2 = nx.MultiGraph(g_nx)
print("Number of nodes {} and number of edges {} in previously directed nx Graph.".format(g_nx_2.number_of_nodes(), g_nx_2.number_of_edges()))

g_nx_undi_2 = nx.MultiGraph(g_nx_undirected)
print("Number of nodes {} and number of edges {} in previously undirected nx Graph.".format(g_nx_undi_2.number_of_nodes(), g_nx_undi_2.number_of_edges()))

returns

Number of nodes 47031 and number of edges 2247051 in previously directed nx Graph.
Number of nodes 47031 and number of edges 2249619 in previously undirected nx Graph.

So the loss actually happens when converting a directed nx graph to an undirected one by just applying nx.MultiGraph(directed_graph) to it. Since directed_graph.to_undirected() does not show the edge loss, that method should be preferred, I guess.

Maybe you could keep that in mind for your StellarGraph() constructor: it could check whether the graph is directed and, if it is, first apply to_undirected() to it. Just a thought :slightly_smiling_face:
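Until something like that exists, the check can be done on the caller's side. The following is only a sketch of the idea, not StellarGraph code. The helper name as_undirected is made up, and it relies only on is_directed(), to_undirected() and number_of_edges(), which NetworkX graphs provide. The StubGraph class is a stand-in so the example runs without NetworkX installed.

```python
def as_undirected(graph):
    """Return an undirected version of `graph` and report any edge-count change."""
    if not graph.is_directed():
        return graph
    before = graph.number_of_edges()
    undirected = graph.to_undirected()
    after = undirected.number_of_edges()
    if after != before:
        print("Warning: edge count changed from {} to {}".format(before, after))
    return undirected


class StubGraph:
    """Stand-in exposing the three methods used above (not a real graph class)."""

    def __init__(self, edges, directed):
        self.edges = list(edges)
        self.directed = directed

    def is_directed(self):
        return self.directed

    def number_of_edges(self):
        return len(self.edges)

    def to_undirected(self):
        # The stub keeps every edge and only flips the directed flag.
        return StubGraph(self.edges, directed=False)


g = StubGraph([("a", "b"), ("b", "a")], directed=True)
u = as_undirected(g)
print(u.is_directed(), u.number_of_edges())
```

With real data the call would look like StellarGraph(as_undirected(g_nx)), so a changed edge count shows up as a warning instead of going unnoticed.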

@anna.leontjeva I hadn't thought about creating the graph via a pandas DataFrame, but since I am already juggling a few formats, I’d prefer not to. But for everybody starting out from a pandas DataFrame, this might be worth a look!

Thanks,
Florin