efficient random sampling of paths between two nodes

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

efficient random sampling of paths between two nodes

Franco Peschiera

Hello again Tiago,

First of all thanks for the resources and help. As you may expect, I somewhat lack a specialized background in graph theory and so I sometimes miss the correct terminology.

As a reminder, this is a DAG. I haven’t, yet, implemented the random unbiased generator version of all_paths you kindly shared in those links, although I have it listed as a possible future step.

As a temporary workaround, I’ve been doing the following each time I want to randomly sample paths between two nodes:

  1. I calculate the min and max length of paths between the two nodes (the min using topology.shortest_distance, a good aproximation of the max is trivial).
  2. I sample a number between those min and max`.
  3. I execute topology.all_paths with that number as cutoff argument to obtain the paths generator.
  4. I then execute some sampling from that generator, with an iteration limit.

This is, of course, just a very crude and biased way of sampling, but at least it returns a different set of paths at each time it is executed. Until now, I’ve been assuming each edge has a weight of 1.

I now would like to test giving edges a weight in order for that cutoff argument to use it. I know the shortest_distance function accepts weights of edges in order to do Dijkstra. Could the same be done with all_paths such that the search is stopped if an accumulated weighted-distance is reached? Is there any (alternative) way of controlling which paths I get from the all_paths besides that cutoff argument? Or whatever specialized logic regarding path sampling / filtering I would have to implemented myself (just like the examples you shared)? Would this be something you would consider adding to graph-tool?

For example, I’ve been even wondering if I could just create a temporal view from the graph, by a randomly filtering edges or nodes before samplinh the paths, so as to, again, reduce the graph I will be sampling at each iteration.

as always thanks for your time and help!

Franco Peschiera

Message: 2
Date: Mon, 23 Mar 2020 21:37:52 +0000
From: Tiago de Paula Peixoto <[hidden email]>
To: Main discussion list for the graph-tool project
        <[hidden email]>
Subject: Re: [graph-tool] efficient random sampling of paths between
        two nodes
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=utf-8

Am 23.03.20 um 22:14 schrieb Franco Peschiera:
> Hello Tiago,
>
> First of all, thanks for your time.
>
> I see what you mean by having a biased logic that would prefer shorter
> paths to longer ones, I had not thought about that.
>
> Regarding the self-reference part, I think it would not be a problem
> because of the structure of my particular (directed) graph. In fact,
> each node represents an assignment *at some given time period* and the
> outward neighbors of a node represent assignments *in the future*. In
> this way, a path can never visit a previously visited node since there
> are no possible cycles. In fact I can easily calculate the shortest and
> longest possible path between two nodes (shortest: using graphql's
> `shortest_distance` method, longest= number of periods in between the
> two nodes).

Well, for DAG (directed acyclic graphs) the situation is quite
different, you should have said so in the beginning.

> So the paths I want to create (or sample) are just the different ways
> one can go from a node N1 (in period P1) to node N2 (in period P2 > P1).?
> I think that in my graph I could just sample neighbors with a weight
> that depends on how far they are (in number of periods) from the node:
> the farthest neighbor will have the least probability of being chosen.
> This way, I'd compensate the fact that shorter paths take less hops.
>
> What do you think?

Why do I get the impression I'm using google more than you to answer
your question?

Here is an approach using rejection sampling:

https://math.stackexchange.com/questions/2673132/a-procedure-for-sampling-paths-in-a-directed-acyclic-graph

Another approach is to count the number of paths that go through each
node (this is feasible for DAGs) and use this to sample directly, see:

https://pdfs.semanticscholar.org/0d74/e82c41124f83c842d5432abcb914ed1f410f.pdf

Best,
Tiago


--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficient random sampling of paths between two nodes

Tiago Peixoto
Administrator
Am 15.04.20 um 10:42 schrieb Franco Peschiera:
> I now would like to test giving edges a weight in order for that
> |cutoff| argument to use it. I know the |shortest_distance| function
> accepts weights of edges in order to do Dijkstra. Could the same be done
> with |all_paths| such that the search is stopped if an accumulated
> weighted-distance is reached?

I'm afraid not.

> Is there any (alternative) way of
> controlling which paths I get from the |all_paths| besides that |cutoff|
> argument?

There is no other argument to the function, which is deterministic.

You could randomize the order of the edges in the graph (by removing and
adding them again in a random order), which will change which paths are
shown first, but would not give you a proper unbiased sampling.

> Or whatever specialized logic regarding path sampling /
> filtering I would have to implemented myself (just like the examples you
> shared)? Would this be something you would consider adding to graph-tool?

I think adding an algorithm to perform uniform sampling of paths in DAGs
would be useful, but I can't promise I will find the time anytime soon
to implement one.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>
Reply | Threaded
Open this post in threaded view
|

Re: efficient random sampling of paths between two nodes

Franco Peschiera
In reply to this post by Franco Peschiera

Hello Tiago,

Thanks again for the quick answers and feedback.

In fact, the more I think about how to take the most advantage of the graph, the more I think that in my case what I would really like is to control the way the paths are being sampled. So, the process would become a biased sampling where I’m the one giving the bias in the form of the probability of choosing each edge when traversing from the origin to the destination. This probability would depend on information of the node and additional information that will depend on each iteration of my algorithm. It could also be an edge property, I guess.

One caveat is that DFS will not be possible since I want to make each sampling independent from the previous one and so getting several paths would be less efficient. Although I’d assume that this is not too much of a problem given that I’d need a smaller sample if I know it’s probably going to be a better one.

I was thinking on doing something on the lines of the following (I’ve only tested it with a small graph):


def sample_path_from_nodes(node1, node2, all_weights, cutoff):
    # all_weights is a dictionary on the graph edges
    current = node1
    path = []
    visited = set()
    while True:
        if current == node2:
            # we finished!
            return path + [current]
        visited.add(current)  # maybe we should add (current, len(path))
        neighbors = set(current.out_neighbors()) - visited
        if len(path) >= cutoff or not len(neighbors):
            # this path does not reach node2
            # we backtrack
            # and we will never visit this node again
            current = path.pop()
            continue
        if not len(path) and not len(neighbors):
            # there is no path to node2
            # this should not be possible
            return None
        # we haven't reached node2
        # but there is still hope
        path.append(current)
        weights = [all_weights.get((current, n), 1) for n in neighbors]
        _sum = sum(weights)
        weights = [w/_sum for w in weights]
        current = np.random.choice(a=list(neighbors), p=weights)

I fear this will not be as efficient as just getting a lot of paths via all_paths. Do you think this makes sense for sampling with preferences on edges? And if so, do you think it will perform a lot better if I did it as a graph-tool C++ extension? I guess I can always test the python version and see how it goes…

thanks!

Franco


Date: Wed, 15 Apr 2020 10:59:51 +0200
From: Tiago de Paula Peixoto <[hidden email]>
To: [hidden email]
Subject: Re: [graph-tool] efficient random sampling of paths between
        two nodes
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"

Am 15.04.20 um 10:42 schrieb Franco Peschiera:
> I now would like to test giving edges a weight in order for that
> |cutoff| argument to use it. I know the |shortest_distance| function
> accepts weights of edges in order to do Dijkstra. Could the same be done
> with |all_paths| such that the search is stopped if an accumulated
> weighted-distance is reached?

I'm afraid not.

> Is there any (alternative) way of
> controlling which paths I get from the |all_paths| besides that |cutoff|
> argument?

There is no other argument to the function, which is deterministic.

You could randomize the order of the edges in the graph (by removing and
adding them again in a random order), which will change which paths are
shown first, but would not give you a proper unbiased sampling.

> Or whatever specialized logic regarding path sampling /
> filtering I would have to implemented myself (just like the examples you
> shared)? Would this be something you would consider adding to graph-tool?

I think adding an algorithm to perform uniform sampling of paths in DAGs
would be useful, but I can't promise I will find the time anytime soon
to implement one.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>

------------------------------

Subject: Digest Footer

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool


------------------------------

End of graph-tool Digest, Vol 147, Issue 24
*******************************************


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool
Reply | Threaded
Open this post in threaded view
|

Re: efficient random sampling of paths between two nodes

Tiago Peixoto
Administrator
Am 20.04.20 um 19:30 schrieb Franco Peschiera:
> I fear this will not be as efficient as just getting a lot of paths via
> |all_paths|. Do you think this makes sense for sampling with preferences
> on edges?

I'll refrain from judging whether it makes sense. It's up to you to
decide for yourself what you want to do.

> And if so, do you think it will perform a lot better if I did
> it as a graph-tool C++ extension?

In general C++ will speed up any equivalent algorithm when compared to
pure python, often by factors of up to 200x. But a good advice is to
focus first on defining your problem well, and writing a correct
implementation first, and worrying about optimization later.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

signature.asc (849 bytes) Download Attachment
--
Tiago de Paula Peixoto <tiago@skewed.de>