efficient random sampling of paths between two nodes

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

efficient random sampling of paths between two nodes

Franco Peschiera

Good day to all,

(repeated because chose wrong email address).

I wrote an initial message (through the issues site) asking about the possibility of using the all_paths to random sample paths between two nodes in a graph.

It was pointed out that the: 1. The algorithm is deterministic and that 2. It’s not possible to adapt it to iterate in a unbiased random way.

Now I have a couple more questions, namely searching for advice on how to achieve what I want.


Context:

I’ve modeled certain parts of combinatorial optimization problem as a set of graphs that I’m exploiting to efficiently generate variables on the fly during the optimization routine. A variable in my problem is represented by a path between two nodes in one of those graphs.


Specifically, what I want:

  1. I have a big graph.
  2. During each iteration I choose two nodes on the graph (based on some logic irrelevant to the question).
  3. I get all the paths between those two nodes (or I randomly sample them if there are too many).

For the sampling in the third step I’m currently doing the following (assuming a sample of size X and a population of paths of Y):

  1. I get the generator of paths using the function all_paths.
  2. I iterate over the generator using Reservoir sampling and I stop if 1) I explored all paths or 2) I reach a multiple of X (e.g., 3X).

Many times, the sample X is really small compared to the population Y). X can be 500-2000, and Y can be 100.000+.


My problem is:

Since I could potentially do many iterations (and samplings), I would like to have an unbiased sampling method for the Y paths without having to enumerate them all.

One option is to do the sampling during the construction of the paths. I have in mind to just use the get_out_neighbors method on the start node to do my own depth-first search, randomly choosing a neighbor at each moment until I get to the cutoff or the end node. This, I could do for each path until I get to my limit of paths to sample. I’m not sure, though, if this is 1) similar in logic to the all_paths method and 2) if this is close in efficiency (due to using python to iterate and sample instead of C++).

Another option, although a lot more far-fetched, is to create my own all_paths method in C++ and re-compile my own version of graph-tool. Is this realistic?

Are there better options to doing this?

Thanks,

Franco Peschiera


_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool