Overview

LWP is AMD's Lightweight Profiling extension to the x86_64 architecture, which enables the collection of statistical performance data from user-mode code. Before LWP, the traditional ways to collect performance data about a particular process were to inject collection code intrusively, or to query the on-chip performance counters through a device driver. Because LWP operates entirely in user space and on a per-thread basis, it incurs much lower overhead than other statistical performance collection methods, which makes it preferable whenever profiling overhead matters.

Notes:
- Intrusive instrumentation: collection code is injected at various granularities and at various injection points (i.e., where in the toolchain the injection happens).
- Statistical sampling:
  - Time-based: sample at timer intervals and collect the number of events that have occurred since the previous sample. If the interval is long enough, counters can roll over unless they are 64 bits wide (few, if any, are).
  - Event-based: count up to a limit, or down to zero; when the threshold is reached, generate an event record.
  - Instruction-based: when the threshold for a particular instruction action (instructions retired, branches retired, etc.) is reached, generate an event record.
- Skid: early PMU events were imprecise and could attribute an event to the wrong instruction. Precise events have largely eliminated this problem.
- Limited, shared resources: the PMU counters are a small, bounded, system-wide resource. If user A programs them, A's sampling can conflict with user B's and vice versa, and one user can easily pre-empt the other's counters.
- LWP and system-wide sampling can coexist: because LWP is a per-thread, user-space mechanism, it can operate at the same time as the system-wide PMUs are in use.

Implementation approach
- The lwpcore files provide a core set of routines for accessing an LWPCB (LWP Control Block) in a policy-neutral way. Higher layers of software are generally expected to handle storage management and to consume the entries in the ring buffer.
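To make the event-based sampling model from the notes concrete, the sketch below simulates a single countdown counter in software: each occurrence of the monitored event decrements the counter, and when it reaches zero a record carrying the current instruction pointer is emitted and the counter is re-armed. This is an illustration only, under stated assumptions: `SampleRecord` and `CountdownSampler` are hypothetical names, the record is far simpler than a real LWP event record, and with LWP the counting and record writing are done by the hardware rather than by code like this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified sample record (NOT the LWP event record format).
struct SampleRecord {
    std::uint8_t  event_id;             // which event triggered the sample
    std::uint64_t instruction_pointer;  // address attributed to the sample
};

// Software model of one event-based counter that counts down to zero.
class CountdownSampler {
public:
    CountdownSampler(std::uint8_t event_id, std::uint32_t interval)
        : event_id_(event_id), interval_(interval), counter_(interval) {
        assert(interval > 0 && "interval must be at least 1");
    }

    // Call once per occurrence of the monitored event (for example, one
    // retired instruction at address `ip`). When the countdown reaches zero,
    // appends a record to `out` and re-arms the counter; returns true if a
    // record was emitted.
    bool on_event(std::uint64_t ip, std::vector<SampleRecord>& out) {
        if (--counter_ != 0) {
            return false;
        }
        out.push_back(SampleRecord{event_id_, ip});
        counter_ = interval_;  // start the next interval
        return true;
    }

private:
    std::uint8_t  event_id_;
    std::uint32_t interval_;
    std::uint32_t counter_;
};
```

For example, with an interval of 4, feeding ten consecutive events into `on_event` produces two records, for the 4th and 8th events.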
For this particular implementation, we use the following conventions:
- C++ is used.
- The shared memory has a header region, followed by the ring buffer itself.
- The header holds:
  - miscellaneous information such as the thread ID, thread creation/exit time, etc.
  - the LWPCB
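The shared-memory convention (a header holding thread information and the LWPCB, followed by the ring buffer) can be sketched as follows. Everything here is an assumption for illustration: `SharedHeader`, `LwpcbStub`, `EventRecord`, `init_region`, `produce`, and `drain` are hypothetical names; `LwpcbStub` models only ring-buffer bookkeeping (size, head offset, tail offset) and is not the architectural LWPCB layout defined by AMD; and the 32-byte record is a placeholder format. `produce` stands in for the hardware writer so the layout can be exercised, while `drain` plays the role of the higher-layer consumer that the lwpcore routines leave to other software. The model is single-threaded; a consumer running concurrently with a real producer would need appropriate atomics or memory fences around the head and tail offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <new>

// Hypothetical 32-byte record; the real LWP record format is defined by AMD.
struct EventRecord {
    std::uint8_t  event_id;
    std::uint8_t  reserved[7];
    std::uint64_t instruction_pointer;
    std::uint64_t data;
    std::uint64_t timestamp;
};
static_assert(sizeof(EventRecord) == 32, "sketch assumes 32-byte records");

// Stand-in for the LWPCB: only ring-buffer bookkeeping is modelled.
struct LwpcbStub {
    std::uint32_t buffer_size;  // usable ring size in bytes, a multiple of the record size
    std::uint32_t head_offset;  // next byte offset the producer writes
    std::uint32_t tail_offset;  // oldest record not yet consumed
};

// Header at the start of the shared region (hypothetical field names).
struct SharedHeader {
    std::uint64_t thread_id;
    std::uint64_t create_time_ns;
    std::uint64_t exit_time_ns;  // 0 while the thread is still running
    LwpcbStub     lwpcb;
};

// The ring buffer starts immediately after the header.
constexpr std::size_t kRingOffset = sizeof(SharedHeader);

std::uint8_t* ring_bytes(SharedHeader* hdr) {
    return reinterpret_cast<std::uint8_t*>(hdr) + kRingOffset;
}

// Lays out a header followed by a ring buffer inside `region`, which must be
// suitably aligned for SharedHeader.
SharedHeader* init_region(void* region, std::size_t region_bytes, std::uint64_t tid) {
    assert(region_bytes >= kRingOffset + 2 * sizeof(EventRecord));
    auto* hdr = new (region) SharedHeader{};  // zero-initialises all fields
    hdr->thread_id = tid;
    const std::size_t ring = region_bytes - kRingOffset;
    hdr->lwpcb.buffer_size = static_cast<std::uint32_t>(ring - ring % sizeof(EventRecord));
    return hdr;
}

// Test-only producer standing in for the hardware. One slot is kept empty so
// that head == tail always means "empty"; returns false when the ring is full.
bool produce(SharedHeader* hdr, const EventRecord& rec) {
    LwpcbStub& cb = hdr->lwpcb;
    const auto next = static_cast<std::uint32_t>(
        (cb.head_offset + sizeof(EventRecord)) % cb.buffer_size);
    if (next == cb.tail_offset) {
        return false;
    }
    std::memcpy(ring_bytes(hdr) + cb.head_offset, &rec, sizeof rec);
    cb.head_offset = next;
    return true;
}

// Higher-layer consumer: hands every record between tail and head to `sink`
// and advances the tail. Returns the number of records consumed.
std::size_t drain(SharedHeader* hdr, const std::function<void(const EventRecord&)>& sink) {
    LwpcbStub& cb = hdr->lwpcb;
    std::size_t consumed = 0;
    while (cb.tail_offset != cb.head_offset) {
        EventRecord rec;
        std::memcpy(&rec, ring_bytes(hdr) + cb.tail_offset, sizeof rec);
        sink(rec);
        cb.tail_offset = static_cast<std::uint32_t>(
            (cb.tail_offset + sizeof(EventRecord)) % cb.buffer_size);
        ++consumed;
    }
    return consumed;
}
```

A typical use is to set up one shared region per thread with `init_region`, then periodically call `drain` with a sink that copies records to their final storage. Keeping one slot empty lets `head_offset == tail_offset` unambiguously mean "empty" without a separate count field.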