28 February 2018

In the Node2Vec paper, the authors propose that an embedding for every node in a graph can be learned by maximizing the dot product between the feature representation of a node and those of its neighborhood, where the neighborhood is sampled in a BFS-like fashion (useful for structural equivalence) or a DFS-like fashion (useful for learning homophily, i.e. embedding highly interconnected nodes close together).
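Concretely, the objective in the paper is a skip-gram over sampled neighborhoods: for a mapping \(f\) from nodes to \(\mathbb{R}^d\) and a sampled neighborhood \(N_S(u)\) of each node \(u\),

\[ \max_f \sum_{u \in V} \log \Pr\left(N_S(u) \mid f(u)\right), \qquad \Pr\left(n_i \mid f(u)\right) = \frac{\exp\left(f(n_i) \cdot f(u)\right)}{\sum_{v \in V} \exp\left(f(v) \cdot f(u)\right)}, \]

where the probability of the neighborhood factorizes over its members \(n_i \in N_S(u)\) under a conditional independence assumption.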

I think that the strategy for homophily makes sense, but the BFS strategy doesn’t. Imagine a simple graph with two nodes \(A\) and \(B\), each surrounded by 5 neighbors, where the neighbors are connected only to \(A\) and \(B\) respectively (two disjoint stars). In that case, if you wanted to learn structural equivalence, then \(A\) and \(B\) should have similar embeddings, while the leaf nodes should all have similar embeddings.
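Here is a minimal sketch of that example (node labels and walk length are arbitrary choices of mine, and I use a plain unbiased walk rather than node2vec’s biased one, since the reachability argument is the same):

```python
import random
import networkx as nx

# Two disjoint stars: hub "A" with leaves a1..a5, hub "B" with leaves
# b1..b5, and no edges between the two stars.
G = nx.Graph()
G.add_edges_from(("A", f"a{i}") for i in range(1, 6))
G.add_edges_from(("B", f"b{i}") for i in range(1, 6))

def random_walk(G, start, length=10):
    """Unbiased random walk of the given length starting at `start`."""
    walk = [start]
    for _ in range(length):
        walk.append(random.choice(list(G.neighbors(walk[-1]))))
    return walk

# Every walk from A stays inside A's star, so the sampled neighborhoods
# of A and of B never overlap, and nothing in the objective ever relates
# their embeddings.
print(random_walk(G, "A"))  # e.g. ['A', 'a3', 'A', 'a1', ...]
```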

However, the objective function that the authors propose won’t achieve this. Instead, it will push \(A\) toward its own neighbors’ embeddings and \(B\) toward its own neighbors’ embeddings, and since the two stars are disconnected, no sampled walk ever relates \(A\) to \(B\) at all. Could this be ameliorated by giving each node a “left” and a “right” embedding vector, and changing the objective to maximize the dot product between the left vector of a node and the right vector of each of its neighbors?
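For what it’s worth, this split is essentially the input/output embedding trick from word2vec. A minimal sketch of what the modified objective might look like, with arbitrary dimensions and a sigmoid per edge in place of the full softmax for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 12, 4  # two 5-leaf stars: hub A = node 0, hub B = node 6
L = rng.normal(scale=0.1, size=(num_nodes, dim))  # "left" vectors (as source)
R = rng.normal(scale=0.1, size=(num_nodes, dim))  # "right" vectors (as neighbor)

# Edges of the two disjoint stars.
edges = [(0, i) for i in range(1, 6)] + [(6, i) for i in range(7, 12)]

def objective(L, R, edges):
    """Sum of log sigmoid(L[u] . R[v]) over each edge, in both directions,
    so every node is scored against each of its neighbors."""
    total = 0.0
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            total += np.log(1.0 / (1.0 + np.exp(-L[a] @ R[b])))
    return total

print(objective(L, R, edges))
```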

Written on February 28, 2018