get_edges_prob for every possible nonedge?


get_edges_prob for every possible nonedge?

deklanw
I'm attempting to use get_edges_prob to find the most likely missing edges out of every possible non-edge. I know the number of possible edges is O(n^2).

Currently I'm sampling like this:

    non_edges_probs = [[] for _ in range(len(non_edges))]

    def collect_edge_probs(s):
        s = s.levels[0]

        for i, non_edge in enumerate(non_edges):
            p = s.get_edges_prob([non_edge], [],
                                 entropy_args=dict(partition_dl=False))
            non_edges_probs[i].append(p)

    gt.mcmc_equilibrate(nested_state,
                        force_niter=100,
                        mcmc_args=dict(niter=10),
                        callback=collect_edge_probs,
                        verbose=True)

Is there a way to speed this up at all? If not, is there a heuristic I can use to reduce the number of possibilities?

Currently I'm using vertex similarity measures to cut the possibilities, but I'm wondering if there is a heuristic involving just the NestedState.

Any help appreciated!


Re: get_edges_prob for every possible nonedge?

Tiago Peixoto
Administrator
On 19.03.20 at 18:33, Deklan Webster wrote:

> I'm attempting to use get_edges_prob to find the most likely missing
> edges out of every possible non-edge. I know the number of possible edges is O(n^2).
>
> Currently I'm sampling like this:
>
>     non_edges_probs = [[] for _ in range(len(non_edges))]
>
>     def collect_edge_probs(s):
>         s = s.levels[0]
>
>         for i, non_edge in enumerate(non_edges):
>             p = s.get_edges_prob([non_edge], [],
>                                  entropy_args=dict(partition_dl=False))
>             non_edges_probs[i].append(p)
>
>     gt.mcmc_equilibrate(nested_state,
>                         force_niter=100,
>                         mcmc_args=dict(niter=10),
>                         callback=collect_edge_probs,
>                         verbose=True)
>
> Is there a way to speed this up at all? If not, is there a heuristic I
> can use to reduce the number of possibilities?
There is no way to avoid looking at all possibilities, but you could
pass the actual list at once, instead of iterating through it and
passing lists of size 1. The reason get_edges_prob() exists and accepts
lists is precisely to speed things up in this case.
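
For example, something along these lines (an untested sketch, reusing your nested_state and non_edges):

    probs = []

    def collect_edge_probs(s):
        # One call per sweep, with the whole list of non-edges at once.
        p = s.levels[0].get_edges_prob(non_edges, [],
                                       entropy_args=dict(partition_dl=False))
        probs.append(p)

    gt.mcmc_equilibrate(nested_state,
                        force_niter=100,
                        mcmc_args=dict(niter=10),
                        callback=collect_edge_probs)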

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>



Re: get_edges_prob for every possible nonedge?

deklanw
I was under the impression that passing a list corresponds to getting the probability that *all* the edges are missing. Indeed, when I try it out I get back a scalar, not a np array. I want to collect the probability that each individual edge is missing.

Also, with respect to the heuristics I mentioned, I just saw the paper "Evaluating Overfit and Underfit in Models of Network Community Structure" use the score s_ij = θ_i * θ_j * λ_{g_i,g_j}

If sampling is not computationally feasible, this is what I had in mind.

1) Is there a way built into graph-tool to compute this similarity function efficiently? (i.e., without Python slowing me down)
2) Is there a hierarchical analog, like just summing this similarity at each level?
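
Concretely, this is roughly what I have in mind (a rough numpy sketch, assuming an undirected graph g, the lowest-level state s = nested_state.levels[0], non_edges as an array of (i, j) vertex index pairs, and taking λ_{rs} to be the block edge-count matrix and θ_i to be vertex i's share of its group's total degree):

    import numpy as np

    b = s.get_blocks().a                        # block label of each vertex
    lam = np.asarray(s.get_matrix().todense())  # lam[r, s]: edges between blocks r and s
    k = g.get_total_degrees(g.get_vertices())   # vertex degrees

    # theta_i = k_i / (total degree of i's block)
    block_deg = np.bincount(b, weights=k, minlength=lam.shape[0])
    theta = k / np.maximum(block_deg[b], 1)

    ne = np.asarray(non_edges)
    i, j = ne[:, 0], ne[:, 1]
    scores = theta[i] * theta[j] * lam[b[i], b[j]]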

Thanks as always


Re: get_edges_prob for every possible nonedge?

Tiago Peixoto
Administrator
On 01.04.20 at 02:40, Deklan Webster wrote:
> I was under the impression that passing a list corresponds to getting
> the probability that *all* the edges are missing.  Indeed, when I try it
> out I get back a scalar not a np array. I want to collect the
> probability that each individual edge is missing.

Yes, this is true.

> Also, with respect to the heuristics I mentioned, I just saw the paper
> "Evaluating Overfit and Underfit in Models of Network Community
> Structure" use the score s_ij = θ_i * θ_j * λ_{g_i,g_j}

This is not substantially faster than what is actually computed in
graph-tool; it is just less accurate.

> If sampling is not computationally feasible, this is what I had in mind.
>
> 1) Is there a way built into graph-tool to compute this similarity
> function efficiently? (i.e., without Python slowing me down)

You should switch to using

    MeasuredBlockState.get_edge_prob()

which is implemented entirely in C++.

The whole approach described in
https://graph-tool.skewed.de/static/doc/demos/inference/inference.html#measured-networks
should be preferred over the older binary classification scheme, as it
completely subsumes it.
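
For example (a sketch following that HOWTO, assuming your observed graph g, that every observed edge was measured once and seen once, and that non_edges is your list of candidate pairs):

    n = g.new_ep("int", 1)   # how many times each observed edge was measured
    x = g.new_ep("int", 1)   # how many times it was actually seen
    state = gt.MeasuredBlockState(g, n=n, x=x, n_default=1, x_default=0)

    # Sample the joint posterior over partitions and the underlying network.
    gt.mcmc_equilibrate(state, force_niter=100, mcmc_args=dict(niter=10))

    # Conditional posterior log-probability of individual pairs, computed in C++.
    probs = [state.get_edge_prob(i, j) for i, j in non_edges]

    # Alternatively, collect marginal posterior probabilities for every edge.
    u = None

    def collect_marginals(s):
        global u
        u = s.collect_marginal(u)

    gt.mcmc_equilibrate(state, force_niter=100, mcmc_args=dict(niter=10),
                        callback=collect_marginals)

    eprob = u.ep.eprob   # posterior probability of each edge in the marginal graph u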

> 2) Is there a hierarchical analog, like just summing this similarity at
> each level?

Yes, any of the above approaches work in exactly the same way with the
hierarchical model.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>

