Nonparametric Latent Feature Model for Link Prediction

Posted: 2015-07-22 in paper of the day, research
Tags: , , ,

Kurt T. Miller, Thomas L. Griffiths and Michael I. Jordan:
Nonparametric Latent Feature Model for Link Prediction
NIPS 2010

The objective of this paper is to predict links in social networks. The working assumption is that links depend on relational features between entities. The objective of the paper is to simultaneously infer the number of these features and learn which entities have each feature.

Therefore, it is important to extract latent structure representing the properties of individual entities from observed data. Unlike class-based methods (e.g., stochastic block model) the paper focuses on feature-based methods: Each entity has binary-valued latent features that influences its relations with other entities.

For $K$ features $Z$ represents the $N\times K$ binary matrix where each row corresponds to an entity and each column to a feature, such that $Z(i,k)=1$ if entity $i$ possesses feature $k$. The $K\times K$-matrix $W$ contains weights that affect the probability of a link between entities $i,j$. This matrix is independent of the entities, i.e., it does not naturally account for different types of entities (e.g., celebrities, brands, “normal” people). With the assumption that links $y_{ij}$ between entities are independent given $Z,W$, we arrive at the  likelihood model

$p(Y|Z,W) = \prod_{i,j} p(y_{ij}|Z_i, Z_j, W),$
$p(y_{ij}|Z_i, Z_j, W) = \sigma(Z_iWZ_j^T),$

where the product runs over all $n^2$ pairs of entities and $\sigma$ is a sigmoidal function. Note that all-zero columns of $Z$ do not affect the likelihood.

The objective is to infer the posterior $p(Z,W|Y)$ of the feature matrix $Z$ and the weights $W$. Miller et al. use a Gaussian prior $p(W)$ and an Indian Buffet Process (IPB) prior $p(Z)$ for the binary feature matrices $Z$. Due to exchangeability, a Gibbs sampler is used for inference. The generative model with the IPB prior for the latent feature relational model is:

$Z\sim IBP(\alpha)\\ W(k, k^\prime)\sim \mathcal{N}(0,\sigma^2_w)\\ y_{ij} \sim \sigma(Z_i WZ_j^T)$

The model allows to incorporate known features $X$ (e.g., locations) by linearly adding them to the likelihood model, such that

$p(y_{ij}|Z_i, Z_j, W) = \sigma(Z_iWZ_j^T + \beta^TX_{ij}).$

(The model in the paper is a bit more expressive and flexible, see Eq. (3) in the paper.)

For sampling the following is considered:

$p(z_{ik}=1|Z_{-ik}, W, Y) \propto m_k p(Y|z_{ik}=1, Z_{-ik}, W)$,

where $m_k$ is the number of non-zero entries in column $k$, excluding row $i$.

To sample each weight of $W$ that corresponds to non-zero features, the auxiliary sampling trick (Albert & Chib, 1993) is used with the Probit sigmoidal function in the likelihood for Gibbs sampling. This doesn’t work with the logistic, but a Metropolis-Hastings-based scheme can be used instead.

The authors  mention that the intialization of the model is crucial. Therefore, it is initialized with the Stochastic Block Model, which is a special case of the method proposed in this paper with a single feature per entity.

Miller et al. also propose an extension to multiple (independent) relations, such that the likelihood factorizes over the relations. Features $Z$ are shared, but all other parameters are set per relation (e.g., $W, X, \beta$).