distribution for highly skewed discrete edge weights
dear graph-tool mailing list,
do you have any recommendations for modelling highly skewed distributions of
discrete edge weights?
my network is a multigraph, which i collapse to a simple graph with
edge weights representing the number of edges in the multigraph between two
nodes.
in my data, the modal edge weight is equal to 1, but the max is above 2000.
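for concreteness, a minimal sketch of the collapse step in plain python. the edge list, the node labels and the choice to treat the graph as undirected are all assumptions made up for illustration, not my actual data:

```python
from collections import Counter

# made-up multigraph edge list: each tuple is one parallel edge between two nodes
multi_edges = [("a", "b"), ("b", "a"), ("a", "b"), ("a", "c"), ("c", "d")]

# assumption: the graph is undirected, so sort the endpoints to make
# (a, b) and (b, a) count towards the same pair
weights = Counter(tuple(sorted(edge)) for edge in multi_edges)

# one line per edge of the collapsed simple graph: endpoints and edge weight
for (u, v), w in sorted(weights.items()):
    print(u, v, w)
```

the counts in `weights` play the role of the discrete edge weights discussed here.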
if i fit a degree-corrected Poisson SBM to the multigraph, every pair of
firms with a large number of edges between them is placed in its own
block. this makes sense, since the Poisson model assigns very low
probability to such heavy edges for any value of the Poisson parameter that
can rationalize the otherwise sparse rate of edge formation.
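to put a rough number on this, here is a small sketch in plain python that evaluates the Poisson log-probability of a few edge counts under a single rate. the rate `lam = 1.5` and the helper name `poisson_log_pmf` are made up for illustration; only the weights 1 and 2000 come from the description above, and a real SBM has one rate per block pair (with degree corrections) rather than one global rate:

```python
import math

def poisson_log_pmf(k, lam):
    """Natural log of the Poisson probability of observing count k at rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

lam = 1.5  # hypothetical rate that fits a network where most weights are 1

# compare a typical weight with increasingly heavy ones
for k in (1, 10, 100, 2000):
    print(k, round(poisson_log_pmf(k, lam), 1))
```

with a rate this small the log-probability of a weight of 2000 is many thousands of nats below that of a weight of 1, so giving the heavy pair its own block (and hence its own rate) is cheap by comparison.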
while this is not necessarily a problem per se, the large number of blocks
that this creates complicates my analysis considerably, and it would be
useful to use edge-covariates with a distribution that can account for the
skewness to get a smaller number of blocks.
wondering if Tiago or anyone else on the list can suggest any
transformation-distribution combination that might help. i tried (without
thinking too deeply) the transformation weight = log(weight) + 1 with
real-geometric weights, but minimize_blockmodel_dl() was taking an unusually
long time to fit so i escaped.
the other option that came to my mind was to use a hierarchical SBM and
choose a higher level where the blocks are merged. i haven't read the papers
on hierarchical SBM or used them in graph-tool yet.
Re: distribution for highly skewed discrete edge weights
On 25.08.20 at 00:28, sam wrote:
> wondering if Tiago or anyone else on the list can suggest any
> transformation-distribution combination that might help. i tried (without
> thinking too deeply) the transformation weight = log(weight) + 1 with
> real-geometric weights, but minimize_blockmodel_dl() was taking an unusually
> long time to fit so i escaped.
It's difficult to say much without looking at the data. But I would try
to keep the nature of the covariates the same, i.e. if they are discrete
before the transform, they should also be discrete afterwards.
One option to reduce the variance may be to rank the values encountered,
and take the rank index as the transformed covariate. YMMV.
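A minimal sketch of one reading of the rank suggestion, in plain Python: each weight is replaced by the 1-based rank of its value among the distinct values encountered, so the transformed covariate stays discrete. The function name `rank_transform` and the example weights are made up for illustration, and other ranking conventions (e.g. ranking with ties counted) would be equally valid readings.

```python
def rank_transform(weights):
    """Replace each weight by the 1-based rank of its value among the distinct values."""
    distinct = sorted(set(weights))
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [rank_of[w] for w in weights]

weights = [1, 1, 2, 1, 7, 2, 2150, 1, 40]  # made-up skewed edge weights
ranks = rank_transform(weights)
print(ranks)  # [1, 1, 2, 1, 3, 2, 5, 1, 4]
```

The largest transformed value equals the number of distinct weights rather than the largest weight, which is where the variance reduction comes from. In graph-tool the ranks could then be stored in an integer edge property map and passed to `minimize_blockmodel_dl()` via `state_args=dict(recs=[...], rec_types=[...])`; which discrete `rec_types` entry suits the data is left open here.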