Please advise on interpreting SBM results


santirdnd
Hi everybody,

I’m not an expert on graph theory, so forgive me if I’m misunderstanding
something. I have a dataset (V=2.5k; E=55k) representing biological entities,
with edges linking them based on a similarity measure. This dataset is very
heterogeneous: there is a giant component of just under 2k nodes and, at the
same time, about 200 singletons. To simplify the process, I’ve filtered out the
connected components with fewer than 4 nodes, leaving only 2.2k nodes. Upon
inspection, the graph seems to contain many quasi-cliques, even within the
giant component. Some of these “putative clusters” are mostly isolated, while
others have many outward links, but each one usually has some unique
biological properties.
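The filtering step above (dropping connected components with fewer than 4 nodes) can be sketched in plain Python. This is a minimal sketch, assuming the graph is given as a node count `n` and an edge list over integer ids `0..n-1`; the function names are hypothetical and not part of graph-tool. Components are labelled with union-find, and only nodes in components of at least `min_size` nodes are kept, together with their edges.

```python
from collections import Counter


def component_labels(n, edges):
    """Label nodes 0..n-1 with a component id using union-find."""
    parent = list(range(n))

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return [find(x) for x in range(n)]


def drop_small_components(n, edges, min_size=4):
    """Keep only nodes (and edges) in components with >= min_size nodes."""
    labels = component_labels(n, edges)
    sizes = Counter(labels)
    kept = {v for v in range(n) if sizes[labels[v]] >= min_size}
    # Both endpoints of an edge share a component, so checking u suffices.
    kept_edges = [(u, v) for u, v in edges if u in kept]
    return kept, kept_edges


# Toy graph: a 4-node path, a 2-node pair, a singleton (6) and a 3-node path.
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (7, 8), (8, 9)]
kept, kept_edges = drop_small_components(10, edges, min_size=4)
print(sorted(kept))  # [0, 1, 2, 3]
```

Within graph-tool, `label_components` combined with a vertex-filtered `GraphView` should achieve the same result without leaving the library.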

My goal is to apply a more disciplined approach and, ideally, to define
the different communities present. The big communities can be found easily
with any algorithm, but graph-tool has proven really useful, as it also
detected a community of hub nodes that are instances wrongly entered into the
dataset. However, some blocks give mixed results: they are formed by mostly
unconnected “sub-communities”, some of them even coming from different
components of the original graph, with nothing in common except their
connectivity pattern. As these sub-communities have very few members (around
a dozen nodes at most), I’m assuming that I’m hitting the resolution limit
even for the nested SBM (nSBM). Is that correct? If so, is there some way to
improve the analysis?
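One way to identify the “mixed” blocks described above is to cross-tabulate the inferred block labels against the connected-component labels of the original graph. The following is a minimal sketch. It assumes both labelings are available as per-node integer lists, for example block labels read from the lowest level of a fitted nested SBM and component labels computed on the input graph. The function names and toy labels are hypothetical.

```python
from collections import defaultdict


def components_per_block(blocks, components):
    """Map each inferred block to the set of original components it draws from.

    blocks[v]     -- block label of node v (e.g. from the lowest SBM level)
    components[v] -- connected-component label of node v in the input graph
    """
    spread = defaultdict(set)
    for b, c in zip(blocks, components):
        spread[b].add(c)
    return dict(spread)


def mixed_blocks(blocks, components):
    """Return only the blocks whose nodes come from more than one component."""
    return {b: cs for b, cs in components_per_block(blocks, components).items()
            if len(cs) > 1}


# Toy labels (made up): block 1 merges nodes from components 1 and 2.
blocks = [0, 0, 0, 1, 1, 1, 1, 1, 1]
components = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(mixed_blocks(blocks, components))  # {1: {1, 2}}
```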

Best,




_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

Re: Please advise on interpreting SBM results

Tiago Peixoto
Administrator
On 19.03.20 at 11:51, santirdnd wrote:

It’s wrong to think that nodes in different components should always
belong to different groups.

Think of a completely random Erdős–Rényi graph with an average degree
close to one, such that the network is formed by many components. The
correct SBM inference in this case is a model with a single group,
despite the many components. The reason is that the division into
components happens by chance, and the nodes that end up together
have no special affinity. If the generative process is run again, the
same nodes will not necessarily belong to the same component.
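The component part of this argument can be checked numerically. The sketch below performs no SBM inference. It samples two Erdős–Rényi graphs G(n, p) with identical parameters and measures how many node pairs that share a component in the first sample also share one in the second. The values n = 300 and mean degree c = 1 (so p = c/(n-1)) are arbitrary choices for illustration, and the function names are hypothetical. It uses plain Python, with no graph-tool required.

```python
import random
from itertools import combinations


def er_component_labels(n, mean_degree, seed):
    """Sample G(n, p) with p = mean_degree / (n - 1); return component labels."""
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each of the n(n-1)/2 possible edges is present independently with prob. p.
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    return [find(x) for x in range(n)]


def same_component_pairs(labels):
    """All unordered node pairs (u < v) that lie in the same component."""
    groups = {}
    for v, c in enumerate(labels):
        groups.setdefault(c, []).append(v)
    return {pair for members in groups.values()
            for pair in combinations(members, 2)}


n, c = 300, 1.0
run_a = er_component_labels(n, c, seed=1)
run_b = er_component_labels(n, c, seed=2)
pairs_a = same_component_pairs(run_a)
pairs_b = same_component_pairs(run_b)

print("components in run A:", len(set(run_a)))
print("components in run B:", len(set(run_b)))
# Fraction of "together" pairs from run A that are still together in run B.
print("overlap:", len(pairs_a & pairs_b) / len(pairs_a))
```

Both runs should produce many components, yet most pairs grouped together in one run end up apart in the other. This illustrates why component membership alone carries no evidence of group structure here.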

You should view your results in the same way: nodes end up being grouped
together unless there is clear evidence to the contrary.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>

