StellarGraph

Trying to model a computer network

Hi, I am trying to model a directed graph for node classification and anomaly detection on a computer network. I am not sure how to include all the edge features, since the examples with multiple features only apply them to nodes. Any link to a similar dataset or notebook would help!

import pandas as pd
from stellargraph import StellarDiGraph

graphedges = pd.DataFrame(
    {
        "Time": [100, 200, 300, 400, 500],
        "SrcVM": ["a", "b", "c", "d", "a"],
        "DestVM": ["b", "c", "d", "a", "c"],
        # "packets": [4000, 500, 2000, 300, 700],  # can't accept this feature for edges
    }
)
graphedges

node_data = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [80, 80, 1000, 25]}, index=["a", "b", "c", "d"]
)
node_data

graph_dir = StellarDiGraph(
    {"corner": node_data},
    {"line": graphedges},
    source_column="SrcVM",
    target_column="DestVM",
    edge_weight_column="Time",
)
print(graph_dir.info())

StellarDiGraph: Directed multigraph
Nodes: 4, Edges: 5

 Node types:
  corner: [4]
    Features: float32 vector, length 2
    Edge types: corner-line->corner

 Edge types:
    corner-line->corner: [5]
        Weights: range=[100, 500], mean=300, std=158.114

Hi! Thanks for reaching out.

As you noticed, we currently don’t support edge features, but we’re actively working on it: https://github.com/stellargraph/stellargraph/pull/1574 and https://github.com/stellargraph/stellargraph/pull/1581

One potential workaround is to partition the graph by time. For example, with your data:


graphedges = pd.DataFrame(
    {
        "Time": [100,200,300,400,500],
        "SrcVM": ["a", "b", "c", "d", "a"],
        "DestVM": ["b", "c", "d", "a", "c"],
        "packets":[4000, 500, 2000, 300, 700]
    }
)

# split edges by time step, and use the packets as the edge weight
# (the column is named "Time", so the attribute lookup is case-sensitive)
first_graphedges = graphedges[graphedges.Time < 250][["SrcVM", "DestVM", "packets"]]
second_graphedges = graphedges[graphedges.Time >= 250][["SrcVM", "DestVM", "packets"]]
# more time splits if needed

node_data = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [80, 80, 1000, 25]}, index=["a", "b", "c", "d"]
)

first_graph_dir = StellarDiGraph(
    {"corner": node_data},
    {"line": first_graphedges},
    source_column="SrcVM",
    target_column="DestVM",
    edge_weight_column="packets",  # "Time" is not a column of the split frames
)

second_graph_dir = StellarDiGraph(
    {"corner": node_data},
    {"line": second_graphedges},
    source_column="SrcVM",
    target_column="DestVM",
    edge_weight_column="packets",  # "Time" is not a column of the split frames
)

You can then train and run one of our graph ML algorithms on each partitioned graph. Does this help with your use case?

Just to elaborate a bit more: my suggestion above will work if the anomalous packet transfer patterns you’re looking for occur close together in time. In that case, partitioning the graph by time preserves the patterns you’re interested in and also simplifies the graph, meaning your model has less to learn!

In practice, you’ll probably want to choose a time scale large enough to capture any suspicious behaviour and use it to partition the graph:

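time_scale = 200  # your time scale here
use_cols = ["SrcVM", "DestVM", "packets"]
# one edge frame per window [t, t + time_scale); the + 1 keeps the latest
# edge when Time.max() is an exact multiple of time_scale
time_graph_edges = [
    graphedges[(graphedges.Time >= t) & (graphedges.Time < t + time_scale)][use_cols]
    for t in range(0, graphedges.Time.max() + 1, time_scale)
]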
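
If some windows contain no traffic, the list above will include empty frames. As a rough alternative, the same fixed-width windows can be sketched with integer division and a pandas groupby, which only yields windows that contain at least one edge. The names `window` and `edges_by_window` are made up for this illustration:

```python
import pandas as pd

graphedges = pd.DataFrame(
    {
        "Time": [100, 200, 300, 400, 500],
        "SrcVM": ["a", "b", "c", "d", "a"],
        "DestVM": ["b", "c", "d", "a", "c"],
        "packets": [4000, 500, 2000, 300, 700],
    }
)

time_scale = 200  # your time scale here
use_cols = ["SrcVM", "DestVM", "packets"]

# Integer division maps each timestamp to the index of its window:
# [0, 200) -> 0, [200, 400) -> 1, [400, 600) -> 2, ...
window = graphedges["Time"] // time_scale

# groupby only produces groups that contain at least one edge,
# so windows with no traffic are skipped.
edges_by_window = {
    int(w) * time_scale: frame[use_cols]
    for w, frame in graphedges.groupby(window)
}

for start, frame in edges_by_window.items():
    print(f"window starting at {start}: {len(frame)} edge(s)")
```

Each frame in `edges_by_window` can then be passed to `StellarDiGraph` in the same way as the split edge frames earlier, with "packets" as the edge weight column.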

Alternatively, if you can ignore the packet information, you could try Temporal Random Walks to capture purely the temporal structure of the graph: https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/ctdne-link-prediction.html
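
To give a feel for what "temporal structure" means here, below is a small pure-Python sketch of a time-respecting walk over your example edges. It is only an illustration of the idea and is not StellarGraph's implementation. `temporal_walk` is a hypothetical helper, and requiring strictly increasing timestamps is a simplifying assumption of this sketch. For real use, follow the linked demo.

```python
import random

# (time, source, target) triples from the example edge list
edges = [
    (100, "a", "b"),
    (200, "b", "c"),
    (300, "c", "d"),
    (400, "d", "a"),
    (500, "a", "c"),
]


def temporal_walk(edges, start_edge, max_length, rng):
    """Follow directed edges so that timestamps strictly increase."""
    walk = [start_edge]
    while len(walk) < max_length:
        last_time, _, node = walk[-1]
        # Candidate next hops: edges leaving the current node at a later time
        candidates = [e for e in edges if e[1] == node and e[0] > last_time]
        if not candidates:
            break  # no later traffic from this node, so the walk ends
        walk.append(rng.choice(candidates))
    return walk


rng = random.Random(0)
walk = temporal_walk(edges, edges[0], max_length=4, rng=rng)
print(walk)
```

In this tiny example every step has exactly one valid continuation, so the walk simply follows a -> b -> c -> d -> a in time order. On real traffic data there would usually be several candidates per step, and the random choice is what samples different temporal contexts.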