Performance Measurement with Score-P and Vampir
Vampir and Score-P together provide a performance tool framework with a special focus on highly parallel applications. Score-P collects performance data from multi-process (MPI, SHMEM), thread-parallel (OpenMP, Pthreads), and accelerator-based paradigms (CUDA, OpenCL, OpenACC), and Vampir visualises it in a scalable GUI.
About the tools

Score-P
The Score-P measurement infrastructure is a highly scalable and easy-to-use tool suite for profiling, event tracing, and online analysis of HPC applications.
It was created in the German BMBF project SILC and the US DOE project PRIMA and is maintained and enhanced in a number of follow-up projects such as LMAC, Score-E, and HOPSA. Score-P is open source, released under the BSD 3-Clause ("New BSD") License, and follows a meritocratic governance model.
Score-P is designed to serve several analysis tools from a single measurement setup. Currently, it works with Scalasca, Vampir, and TAU and is open to other tools.

Vampir
Vampir provides an easy-to-use framework that enables developers to quickly display and analyze arbitrary program behavior at any level of detail. The tool suite implements optimized event analysis algorithms and customizable displays that enable fast and interactive rendering of very complex performance monitoring data.
Vampir handles and visualizes both instrumented and sampled event traces generated by Score-P in a combined view, which supports detailed performance analysis of highly parallel applications. Current developments also include the analysis of memory and I/O behavior, which often affects an application's performance.