efficient random sampling of paths between two nodes
Good day to all,
(repeated because chose wrong email address).
I wrote an initial message (through the issues site) asking about the possibility of using the all_paths to random sample paths between two nodes in a graph.
It was pointed out that the: 1. The algorithm is deterministic and that 2. It’s not possible to adapt it to iterate in a unbiased random way.
Now I have a couple more questions, namely searching for advice on how to achieve what I want.
I’ve modeled certain parts of combinatorial optimization problem as a set of graphs that I’m exploiting to efficiently generate variables on the fly during the optimization routine. A variable in my problem is represented by a path between two nodes in one of those graphs.
Specifically, what I want:
I have a big graph.
During each iteration I choose two nodes on the graph (based on some logic irrelevant to the question).
I get all the paths between those two nodes (or I randomly sample them if there are too many).
For the sampling in the third step I’m currently doing the following (assuming a sample of size X and a population of paths of Y):
I get the generator of paths using the function all_paths.
I iterate over the generator using Reservoir sampling and I stop if 1) I explored all paths or 2) I reach a multiple of X (e.g., 3X).
Many times, the sample X is really small compared to the population Y). X can be 500-2000, and Y can be 100.000+.
My problem is:
Since I could potentially do many iterations (and samplings), I would like to have an unbiased sampling method for the Y paths without having to enumerate them all.
One option is to do the sampling during the construction of the paths. I have in mind to just use the get_out_neighbors method on the start node to do my own depth-first search, randomly choosing a neighbor at each moment until I get to the cutoff or the end node. This, I could do for each path until I get to my limit of paths to sample. I’m not sure, though, if this is 1) similar in logic to the all_paths method and 2) if this is close in efficiency (due to using python to iterate and sample instead of C++).
Another option, although a lot more far-fetched, is to create my own all_paths method in C++ and re-compile my own version of graph-tool. Is this realistic?