Correlation Histogram - Runs out of RAM

P-M
I am trying to obtain the correlation histogram for a graph of mine following the example given in the manual. I run:

import graph_tool.all as gt

g = gt.load_graph('graph.gt')
gt.remove_parallel_edges(g)
h = gt.corr_hist(g, 'out', 'out')

My graph is relatively large, with 12,238,931 vertices and 24,884,365 edges. The problem is that as soon as I start the code it runs on 20 processes and happily chomps through 252 GB of RAM before spilling over into swap, which makes my machine incredibly slow.

I presume the RAM usage is linked to the parallel processing, so it could perhaps be tackled by running with fewer processes. Is there any way of reducing the RAM usage, or would I need to implement the routine manually to achieve this?

Best wishes,

Philipp

Re: Correlation Histogram

Alexandre Hannud Abdo
Ni! Hi Philipp,

That's a feature of OpenMP controlled by an environment variable:

OMP_NUM_THREADS

So you can, for example

export OMP_NUM_THREADS=4

before running your code.
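The `export` applies to the shell and to every program started from it afterwards. As a minimal sketch of the same idea driven from Python (an illustration, not something from the thread), a modified environment can be passed to a child process. The child command below is a hypothetical stand-in for a script that would import graph-tool; it only echoes the value it inherited.

```python
import os
import subprocess
import sys

# Copy the current environment and cap OpenMP at 4 threads, like
# `export OMP_NUM_THREADS=4`, but only for the child process started below.
env = dict(os.environ, OMP_NUM_THREADS="4")

# Hypothetical stand-in for the real analysis script: it only reports the
# value it inherited. A real script would import graph-tool at this point.
child_code = "import os; print(os.environ.get('OMP_NUM_THREADS'))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The parent's own environment is left untouched, which is the same scoping you get from running `OMP_NUM_THREADS=4 python script.py` in the shell.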

.~´

_______________________________________________
graph-tool mailing list
[hidden email]
https://lists.skewed.de/mailman/listinfo/graph-tool

Re: Correlation Histogram

P-M
Thanks! I presume this won't affect processes that are already running, and that the setting only lasts as long as my PuTTY session, reverting to normal afterwards?

Re: Correlation Histogram

Alexandre Hannud Abdo
Well, yes, though you can configure your shell to make it permanent. Software Carpentry has some good tutorials on using the shell, for example:

http://swcarpentry.github.io/shell-extras/08-environment-variables.html

You can also modify the environment from within Python, using the "os.environ" dictionary. You'll just have to set 'OMP_NUM_THREADS' before importing graph-tool, because OpenMP reads the value at import time.
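A minimal sketch of that advice, reusing the value 4 from the earlier example. The try/except around the import is only there so the snippet also runs where graph-tool is not installed; in a real script the import would simply follow the assignment.

```python
import os

# Set the thread limit before graph-tool (and with it the OpenMP runtime) is
# loaded; per the advice above, OpenMP takes the value at import time.
os.environ["OMP_NUM_THREADS"] = "4"

try:
    import graph_tool.all as gt  # noqa: E402  (must come after the line above)
except ImportError:
    gt = None  # graph-tool is not installed in this environment

print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

Because the assignment happens inside the Python process, it only affects that process and anything it launches, not the surrounding shell.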

Regards,


Re: Correlation Histogram

Tiago Peixoto
Administrator
On 09.02.2017 16:37, Alexandre Hannud Abdo wrote:
> You can also modify the environment from within Python, using the
> "os.environ" dictionary. You'll just have to set the value for
> 'OMP_NUM_THREADS' before importing graph-tool, because openmp will consider
> the value at the time of importing.

graph-tool also provides some convenience functions for doing this from
inside Python, independently of the environment variables:

        graph_tool.openmp_get_num_threads()
        graph_tool.openmp_set_num_threads()

        graph_tool.openmp_get_schedule()
        graph_tool.openmp_set_schedule()
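As a sketch of how these might be combined, the hypothetical helper `omp_threads` below lowers the thread count for a single call and restores it afterwards. It assumes `openmp_get_num_threads()` reports the current limit and uses `graph_tool.openmp_enabled()` to skip builds compiled without OpenMP; it becomes a no-op when graph-tool is not installed.

```python
from contextlib import contextmanager

try:
    import graph_tool
except ImportError:
    graph_tool = None  # graph-tool not installed; the helper becomes a no-op


@contextmanager
def omp_threads(n):
    """Temporarily cap graph-tool's OpenMP thread count at n, then restore it.

    Yields the previous thread count, or None if there was nothing to limit.
    """
    if graph_tool is None or not graph_tool.openmp_enabled():
        yield None
        return
    previous = graph_tool.openmp_get_num_threads()
    graph_tool.openmp_set_num_threads(n)
    try:
        yield previous
    finally:
        # Restore the old limit even if the wrapped call raises.
        graph_tool.openmp_set_num_threads(previous)
```

Usage would look like `with omp_threads(1): h = gt.corr_hist(g, 'out', 'out')`, leaving later calls in the same session at the original thread count.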

--
Tiago de Paula Peixoto <[hidden email]>

Re: Correlation Histogram

P-M
Those functions were very useful (as was the tip about the environment variable). I couldn't find them anywhere in the documentation though. Would it be possible to add them?

Thank you for the help. Alas, in this case even limiting the threads to 1 didn't work: the routine still uses up 252 GB of RAM and then carries on into swap. I suppose the network is simply too large...

Best,

Philipp

Re: Correlation Histogram

Tiago Peixoto
Administrator
On 10.02.2017 15:14, P-M wrote:
> Those functions were very useful (as was the tip about the environment
> variable). I couldn't find them anywhere in the documentation though. Would
> it be possible to add them?

It's now in git.

> Thank you for the help, alas, in this case even limiting the threads to 1
> didn't work. The routine still uses up 252 GB of RAM and then carries on
> into the swap. I suppose the network is simply too large...

I don't think this is necessarily true. The histogram constructs a D×D
matrix, where D is the largest degree in the network, so its memory grows
with the square of D. This is probably why you are running out of memory.
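To put numbers on the quadratic growth, here is a back-of-the-envelope estimate for a single dense D×D histogram. The 8 bytes per bin is an assumption (for example a 64-bit counter); the storage type graph-tool actually uses may differ, and temporary or per-thread copies would come on top of it.

```python
def dense_hist_bytes(max_degree, bytes_per_bin=8):
    """Approximate size in bytes of one dense D x D histogram (D = max_degree).

    bytes_per_bin is an assumption (8 bytes, e.g. a 64-bit counter); the
    storage type graph-tool actually uses may differ, and temporary or
    per-thread copies would add to this figure.
    """
    return max_degree * max_degree * bytes_per_bin


for d in (10**3, 10**4, 10**5):
    print(f"D = {d:>7,}: {dense_hist_bytes(d) / 1e9:,.3f} GB")
```

For a maximum degree of order 10^5 this gives 8 × 10^10 bytes, roughly 80 GB for one copy under the 8-byte assumption, independent of how many vertices or edges the graph has.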


Re: Correlation Histogram

P-M
Yup, that would explain it. D is of the order of 10^5 in this case. Would I be able to write a more memory-efficient script manually with sparse matrices, or is the underlying code already fairly optimised in this regard?

Best,

Philipp

Re: Correlation Histogram

Tiago Peixoto
Administrator
On 12.02.2017 12:53, P-M wrote:
> Yup, that would explain it. D is in the order of 10^5 in this case. Would I
> be able to write a more memory-efficient script manually with sparse
> matrices or is the underlying code already fairly optimised in this regard?

Of course, you could do a lot better with a sparse representation...
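A minimal sketch of such a sparse approach, assuming a directed graph whose edges are available as an integer array with (source, target) in the first two columns (graph-tool's `g.get_edges()` returns such an array). Instead of a D×D matrix it stores only the (source out-degree, target out-degree) combinations that actually occur, so memory scales with the number of edges. The function name `sparse_corr_hist` is hypothetical, and it only mirrors the counting behind `corr_hist(g, 'out', 'out')`; it does not reproduce graph-tool's bins or return format, and an undirected graph would need each edge counted in both directions.

```python
import numpy as np


def sparse_corr_hist(edges, num_vertices):
    """Sparse (out-degree, out-degree) correlation histogram of a directed graph.

    edges: integer array with one row per edge; only the first two columns,
    (source, target), are used. Returns (pairs, counts), where pairs[i] is a
    (k_source, k_target) combination that occurs and counts[i] is how many
    edges have it.
    """
    edges = np.asarray(edges, dtype=np.int64)
    src, tgt = edges[:, 0], edges[:, 1]

    # Out-degree of every vertex = number of times it appears as a source.
    out_deg = np.bincount(src, minlength=num_vertices)

    # Degree combination seen along each edge.
    k_src = out_deg[src]
    k_tgt = out_deg[tgt]

    # Encode each pair as one integer key (fits comfortably in int64 for
    # degrees of order 10**5), then count identical keys.
    base = int(out_deg.max()) + 1
    keys = k_src * base + k_tgt
    uniq, counts = np.unique(keys, return_counts=True)

    # Decode the keys back into (k_source, k_target) pairs.
    pairs = np.column_stack((uniq // base, uniq % base))
    return pairs, counts


# Toy example: edges 0->1, 0->2, 1->2, 2->0
pairs, counts = sparse_corr_hist([[0, 1], [0, 2], [1, 2], [2, 0]], num_vertices=3)
print(pairs.tolist(), counts.tolist())  # [[1, 1], [1, 2], [2, 1]] [1, 1, 2]
```

If a SciPy sparse matrix is preferred, the result can be turned into one with `scipy.sparse.coo_matrix((counts, (pairs[:, 0], pairs[:, 1])))`.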
