Hi,
I am computing a weight for each edge, in a graph of some million nodes and about 50 million edges. Initially, it was doing 25000 edges per minute, so it should have finished processing in about 30+ hours. Now I checked the server for results after more than 72 hours, and the process is still running, but very slowly, doing fewer than 1000 edges per minute. Is there anything I am doing wrong, or what could be the reason for such a slowdown? I list the code here:

    def compute_exclusivities(g):
        count = 0
        excl = g.new_edge_property("double")
        g.edge_properties["Exclusivity"] = excl
        edges = g.get_edges([g.edge_properties["label_id"]])
        for edge in edges:
            edges_of_head = g.get_out_edges(edge[0], [g.edge_properties["label_id"]])
            count_edges_ofhead_sametype = len(np.where(np.where(edges_of_head == edge[2])[1] == 2)[0])
            edges_of_tail = g.get_in_edges(edge[1], [g.edge_properties["label_id"]])
            count_edges_oftail_sametype = len(np.where(np.where(edges_of_tail == edge[2])[1] == 2)[0])
            excl[g.edge(edge[0], edge[1])] = 1 / (count_edges_ofhead_sametype + count_edges_oftail_sametype - 1)
            count = count + 1
            if count % 1000 == 0:
                print(count, "exclusivities computed")

Thanks,
Ioana

Sent from: http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/
_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

On 16.03.20 at 11:48, ioana wrote:
> Hi,
> I am computing a weight for each edge, in a graph of some million nodes and
> about 50 mil edges. Initially, it was doing 25000 edges per minute, so it
> should have finished processing in about 30+ hours. Now I check the server
> for results after more than 72 hours, and the process is still running, but
> very slowly, doing less than 1000 edges per minute.
> Is there anything I am doing wrong, or what could be the reason for such a
> slowdown?

It's hard to get to the bottom of things, since you are not even trying to explain what you want to do. But looking very briefly at your code, I can see the following problems:

1. You should not do a property map lookup inside the inner loop; it's better to look it up outside and keep it around:

       elabel = g.edge_properties["label_id"]

2. Avoid looking up edge descriptors like this:

       g.edge(edge[0], edge[1])

   This takes time O(k), where k is the degree of an endpoint. Since you are using array-based iteration, you should put the property map you want in the first call to g.get_edges().

These are just some things that pop out, but it might be that there is an altogether better way to achieve whatever it is you want to do.

Best,
Tiago

--
Tiago de Paula Peixoto <[hidden email]>
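As a side note on the counting itself: the nested np.where construction only ever inspects the label column (index 2) of the array returned by g.get_out_edges(), so it can be replaced by a single boolean sum. A small sketch with made-up data (the array contents are illustrative, not from the real graph):

```python
import numpy as np

# edges_of_head as returned by g.get_out_edges(v, [label_map]):
# rows of [source, target, label] (made-up data for illustration)
edges_of_head = np.array([[0, 1, 7],
                          [0, 2, 7],
                          [0, 3, 9]])
label = 7

# the original nested-np.where count: find all positions equal to `label`,
# then keep only those whose column index is 2 (the label column)
count_nested = len(np.where(np.where(edges_of_head == label)[1] == 2)[0])

# equivalent and simpler: compare the label column directly
count_simple = int((edges_of_head[:, 2] == label).sum())

assert count_nested == count_simple == 2
```

Both expressions count the out-edges of the node whose type matches `label`; the second avoids building the intermediate index arrays.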

Tiago de Paula Peixoto <tiago@skewed.de> wrote:
I am trying to compute a weight for each relation, that I call "Exclusivity".
Given that the graph has typed edges (with the type of each edge stored as an int in the property map label_id), the Exclusivity of an edge is computed as

    1 / (number of all edges of the same type outgoing from the source node
         + number of all edges of the same type incoming to the target node - 1)

I understand your suggestions, but regarding the second point, of not looking up the edge descriptors with g.edge(edge[0], edge[1]): how else can that be achieved? As far as I understand, g.get_edges() will return the value of the property map for each edge, but how can I set the property map values?

Thanks a lot,
Ioana
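To make the definition concrete, a tiny worked example with made-up counts (reading the -1 as a double-counting correction is an assumption, not something stated in the thread):

```python
# Toy numbers, not from the real graph:
out_same_type = 3  # edges of this type leaving the source (includes the edge itself)
in_same_type = 2   # edges of this type entering the target (includes the edge itself)

# The edge appears in both counts, so the -1 (assumption: this is what the
# correction is for) leaves it counted exactly once in the denominator:
exclusivity = 1 / (out_same_type + in_same_type - 1)
assert exclusivity == 0.25
```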
Hi,
I am still fighting with the same problem:

Ioana Hulpus wrote
> I am trying to compute a weight for each relation, that I call
> "Exclusivity".
> Given that the graph has typed edges (with the type of edges stored as an
> int in the property map label_id), the Exclusivity of an edge is computed as
> 1 / (number of all edges of same type outgoing from the source node +
> number of all edges of the same type incoming to the same target node - 1).

As advised by Tiago, I rewrote the code to:
1. not do the property map lookup inside the loop
2. not look up edge descriptors with g.edge(source, target)

The new code:

    def compute_exclusivities(g):
        count = 0
        excl = g.new_edge_property("double")
        g.edge_properties["Exclusivity"] = excl
        lbl_id = g.edge_properties["label_id"]
        edges = g.get_edges([lbl_id, excl])
        for edge in edges:
            edges_of_head = g.get_out_edges(edge[0], [lbl_id])
            count_edges_of_head_same_type = len(np.where(np.where(edges_of_head == edge[2])[1] == 2)[0])
            edges_of_tail = g.get_in_edges(edge[1], [lbl_id])
            count_edges_of_tail_same_type = len(np.where(np.where(edges_of_tail == edge[2])[1] == 2)[0])
            edge[3] = 1 / (count_edges_of_head_same_type + count_edges_of_tail_same_type - 1)
            count = count + 1
            if count % 1000 == 0:
                print(count, "exclusivities computed")
        excl.get_array()[:] = edges[:, 3]

Now, I am not sure that excl.get_array() is the best solution; is there any other way to set the property map without providing an edge descriptor? Or an O(1) method for retrieving an edge descriptor? In any case, the code has now been running for a week and has only done 17 mil edges out of 50 mil, and at the moment it processes 2000 edges per minute. Can anyone help me find a faster solution? In my graph processing pipeline, this is just an intermediary stage: I will have to use the computed exclusivities to compute transfer probabilities for Personalised PageRank, and I cannot afford such long computing times.

Thanks a lot!
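One way to avoid the per-edge Python loop altogether is to group-count the (source, label) and (target, label) pairs in one shot with np.unique. A sketch operating on the plain edge array, assuming rows of [source, target, label] as returned by g.get_edges([lbl_id]) (the function name and toy data are made up for illustration):

```python
import numpy as np

def exclusivities_vectorized(edges):
    # edges: (E, 3) int array with rows [source, target, label]
    src, tgt, lbl = edges[:, 0], edges[:, 1], edges[:, 2]
    # for every edge, count the edges sharing its (source, label) pair ...
    _, inv_out, cnt_out = np.unique(np.stack([src, lbl], axis=1), axis=0,
                                    return_inverse=True, return_counts=True)
    # ... and the edges sharing its (target, label) pair
    _, inv_in, cnt_in = np.unique(np.stack([tgt, lbl], axis=1), axis=0,
                                  return_inverse=True, return_counts=True)
    # each edge appears in both counts, hence the -1 from the definition
    return 1.0 / (cnt_out[inv_out] + cnt_in[inv_in] - 1)

# toy graph: two type-5 edges leave node 0, two type-5 edges enter node 1
edges = np.array([[0, 1, 5],
                  [0, 2, 5],
                  [3, 1, 5]])
vals = exclusivities_vectorized(edges)
# edge (0, 1, 5): 1 / (2 + 2 - 1) = 1/3
```

The result is in the same row order as the input array, so it could be written back to the property map the same way as above, via excl.get_array()[:] (assuming, as the code above does, that this array order matches the order g.get_edges() returns, which is worth verifying on a small graph first).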
Ioana