Context

One of the quickest ways to understand bottlenecks in PyTorch workloads is to analyze PyTorch Profiler traces. A common tool for viewing trace files is Chrome Tracing. For traces collected from large models, however, users experience a considerable slowdown in Chrome Tracing, which creates the need for a different tool. I recommend using Perfetto for viewing large trace files.

Small to medium-sized trace files can be loaded directly into the Perfetto UI. For large files, the in-memory representation of the trace is too big and hits the browser tab's memory limit. To work around this problem, Perfetto provides the trace_processor_shell executable, which can be used as an accelerator for the UI. In this post, I explain how to set up Perfetto to view large traces. I also list some benefits of using Perfetto, along with some of the issues I encountered while getting it working and ways to mitigate them.

Setting up Perfetto

Follow the steps below to set up Perfetto. These have been tested on macOS 12.6 and Ubuntu 20.04.

# Clone the repo and change into it (the remaining commands run from the repo root)
git clone https://github.com/google/perfetto.git
cd perfetto

# Pull dependent libraries and toolchains
tools/install-build-deps --ui

# Generate build files (gn opens an editor for the build arguments; save and exit to continue)
tools/gn args out/android

# Build targets
tools/ninja -C out/android

# Build and run the Perfetto UI server (takes 2-3 minutes on a developer edition MacBook)
ui/run-dev-server

The steps above create an executable called trace_processor_shell in the out/android folder of the Perfetto repo. Open a browser and navigate to http://127.0.0.1:10000. If the build succeeded, the Perfetto UI will load.

Visualizing Traces with Perfetto

  1. Load the trace file using the following command:

     /path/to/trace_processor_shell -D /path/to/trace/file.json.gz
    
  2. After the trace file is loaded, the executable prints the trace loading time and the address of an RPC server. The default address for the server is http://127.0.0.1:9001.

  3. Refresh the Perfetto UI running on http://127.0.0.1:10000 and accept the option “YES, use loaded trace.”

The trace_processor_shell executable acts as the backend: it holds the parsed trace in native memory and serves it over RPC on port 9001. The Perfetto UI, served on port 10000, is the front end that connects to it from the browser. Loading a multi-gigabyte trace file takes a few minutes.
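Before refreshing the UI, it can help to confirm that both ports are actually accepting connections. The snippet below is a minimal sketch of such a check using only the Python standard library. The helper name port_is_open is hypothetical, and the port numbers are simply the defaults mentioned above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports from the steps above; adjust if your setup differs.
for label, port in (("trace_processor_shell RPC", 9001), ("Perfetto UI", 10000)):
    state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
    print(f"{label} on port {port}: {state}")
```

If port 9001 is reported as not reachable, the trace is most likely still being loaded or trace_processor_shell exited with an error.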

Benefits of using Perfetto

  1. Run remotely, view locally - The trace_processor_shell executable and the UI can be run on a remote server. By setting up local port forwarding for ports 9001 and 10000, the local browser can be used to visualize the traces. This eliminates the intermediate step of transferring trace files to the local machine, which is required when using Chrome Tracing.

  2. Play with Perfetto in Python - Perfetto's Python package can be installed quickly via pip (pip install perfetto), which allows the user to analyze traces using Python. The API can load multiple trace files simultaneously, which is useful for a distributed training job with traces from each rank or for comparing different runs of the same model. The user needs to specify the path to the trace files and to the trace_processor_shell binary.

  3. Interactive queries - Perfetto allows interactive queries within the UI. The key to exploiting the UI is understanding how the data in the trace is mapped into SQL tables. For traces collected from the PyTorch Profiler, the two relevant tables are slice and args. Example SQL queries are available in the left panel of the Perfetto UI.

  4. View large traces - With trace_processor_shell as the backend, Perfetto can handle multi-gigabyte trace files.

  5. Platform independent - Perfetto is platform independent. It works on Linux, macOS, and Windows.
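As an illustration of items 2 and 3, here is a minimal sketch that runs a SQL query against the slice table through the Python package. It assumes the perfetto package is installed and that you have a trace file and a trace_processor_shell binary. The function names top_slices_query and top_slices and the choice of query are my own and hypothetical. The dur column is reported in the trace processor's internal time unit (nanoseconds, as far as I know), so treat the totals as relative unless you verify the unit.

```python
def top_slices_query(limit: int = 10) -> str:
    """Build a SQL query listing the slice names with the largest total duration."""
    return (
        "SELECT name, COUNT(*) AS calls, SUM(dur) AS total_dur "
        "FROM slice GROUP BY name ORDER BY total_dur DESC "
        f"LIMIT {int(limit)}"
    )


def top_slices(trace_path: str, bin_path: str, limit: int = 10):
    """Load one trace and return (name, calls, total_dur) tuples for the heaviest slices.

    trace_path and bin_path are placeholders for your trace file and your
    trace_processor_shell binary.
    """
    # Imported lazily so the query helper above works without the package installed.
    from perfetto.trace_processor import TraceProcessor, TraceProcessorConfig

    tp = TraceProcessor(trace=trace_path, config=TraceProcessorConfig(bin_path=bin_path))
    try:
        return [(row.name, row.calls, row.total_dur) for row in tp.query(top_slices_query(limit))]
    finally:
        tp.close()  # shuts down the trace processor instance started for this trace
```

Calling top_slices("/path/to/trace/file.json.gz", "/path/to/trace_processor_shell") returns the ten slice names that account for the most time. The same SQL can be pasted into the query box of the Perfetto UI.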

Errors and mitigations

Building the Perfetto binaries is not for the faint-hearted. A fair amount of finagling is required to get things working on macOS. Here are some gotchas to avoid:

  1. The build fails with macOS SDK version 13.0. To work around this issue, use an older version; SDK version 12.3 worked without any problems. If multiple SDK versions are installed on the system, temporarily disable SDK 13.0 either by modifying the system path or by moving it to a different location.

  2. If the trace file contains any key before the traceEvents key in the JSON file, the Perfetto UI will not render anything. Here’s a script to convert a trace file into a Perfetto-compatible format. This script strips all keys and their respective values from the JSON file except the traceEvents key.
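For reference, the conversion described in item 2 can be sketched in a few lines of standard-library Python. This is not the original script, only a minimal version written under two assumptions: the trace is either a JSON object with a traceEvents key or a bare list of events, and the whole file fits in memory (which may not hold for multi-gigabyte traces). The function name keep_trace_events_only is hypothetical.

```python
import gzip
import json


def _open_text(path: str, mode: str):
    """Open a plain or gzip-compressed file in text mode, chosen by the .gz suffix."""
    if path.endswith(".gz"):
        return gzip.open(path, mode, encoding="utf-8")
    return open(path, mode, encoding="utf-8")


def keep_trace_events_only(src: str, dst: str) -> int:
    """Write a copy of `src` to `dst` that contains only the traceEvents key.

    Returns the number of events written.
    """
    with _open_text(src, "rt") as f:
        data = json.load(f)  # loads the entire trace into memory

    # A bare list is already just the events; otherwise pick out traceEvents.
    events = data if isinstance(data, list) else data["traceEvents"]

    with _open_text(dst, "wt") as f:
        json.dump({"traceEvents": events}, f)
    return len(events)
```

For example, keep_trace_events_only("trace.json.gz", "trace_perfetto.json.gz") writes a stripped copy next to the original, which can then be passed to trace_processor_shell as described earlier.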