Julia Profiling Cheat Sheet
I've been using Julia for all my research code in the last few months, and I love it. It has many of the same features that make Python a pleasure to use: concise code, high-level abstractions, a powerful REPL, and a notebook interface. On top of that, Julia has some features that I missed in Python: a powerful type system, and the ability to run at C-like speeds if you write your code carefully.
Good profiling tools are essential for optimizing code, and I wrote in the past about profiling in Python. In this post I will summarize Julia's profiling tools, and a few tricks and gotchas that I learned along the way. I assume familiarity with the sections on profiling and performance tips in Julia's manual.
Cheat sheet: Timing macros
@time
prints timing and allocation info, and returns the expression's value:
julia> @time "the answer is $(2*3*7)"
elapsed time: 1.0246e-5 seconds (480 bytes allocated)
"the answer is 42"
@elapsed
returns the seconds elapsed, and discards the expression's value:
julia> @elapsed "the answer is $(2*3*7)"
1.0404e-5
@allocated
returns the bytes allocated, and discards the expression's value:
julia> @allocated "the answer is $(2*3*7)"
400
@timed
returns a tuple of (expr_val, sec_elapsed, bytes_alloc, sec_in_gc)
.
This macro is undocumented, so it might disappear in future versions of Julia.
julia> @timed "the answer is $(2*3*7)"
("the answer is 42",2.6062e-5,480,0.0)
Cheat sheet: Profiling
For long runs, reduce the profiler's sampling frequency to ensure that we have enough space to store all the backtraces:
Profile.init(delay=0.01)
Run the code once to force compilation, then clear any previous profiling results, and run the code again with profiling:
code_to_profile()
Profile.clear()
@profile code_to_profile()
Visualize results:
using ProfileView
ProfileView.view()
Cheat sheet: Tracking heap allocations
Start Julia with --track-allocation=user
to track mallocs in your code, or
--track-allocation=all
to track mallocs in your code as well as Julia's
libraries.
Run the code once to force compilation, clear previous info about memory allocations, then run the code again:
code_to_profile()
clear_malloc_data()
code_to_profile()
Then quit Julia and inspect the *.mem
files to see which lines cause heap
allocations. (The *.mem
files will not be written out until you quit Julia.)
Trick: Saving profile results to disk
Sometimes you want to store profiling results for later comparison. Other times matplotlib interacts poorly with ProfileView, and you are forced to visualize profiling results from another instance of Julia.
To save profiling results, we grab them from the Profile module and dump them to a file:
r = Profile.retrieve();
f = open("profile.bin", "w")
serialize(f, r)
close(f)
Or as a one-liner using a do-block:
open("profile.bin", "w") do f serialize(f, Profile.retrieve()) end
To load profiling results and visualize them:
using ProfileView
f = open("profile.bin")
r = deserialize(f);
ProfileView.view(r[1], lidict=r[2])
Or as a one-liner:
r = deserialize(open("profile.bin")); ProfileView.view(r[1], lidict=r[2]);
Gotcha: The JIT compiler causes memory allocations
Julia code is just-in-time compiled, so the first call to a function does a lot more work than subsequent calls. You probably don't want to measure the time and memory used by the compiler, so run your code once to force compilation, and then run it again to profile it. Example:
julia> function foo() sum(ones(5)) end
foo (generic function with 1 method)
julia> @time foo()
elapsed time: 0.004021931 seconds (23432 bytes allocated)
5.0
julia> @time foo()
elapsed time: 5.696e-6 seconds (192 bytes allocated)
5.0
For the same reason, it is best to avoid include()
calls in the code that you
are profiling. Instead, include everything you need beforehand.
Gotcha: The REPL causes memory allocations
Code written in the REPL is evaluated in the global scope, and we already know that globals can cause slowness. Additionally, code executed in the REPL causes memory allocations in the REPL itself, which you probably don't want to count. Example:
julia> function foo() 1 + 2 + 3 end
foo (generic function with 1 method)
julia> foo(); @time foo()
elapsed time: 4.217e-6 seconds (80 bytes allocated)
6
julia> function benchmark() @time foo() end
benchmark (generic function with 1 method)
julia> benchmark()
elapsed time: 2.97e-7 seconds (0 bytes allocated)
6
Gotcha: Anonymous functions are not properly optimized
I ran into this issue when writing FactCheck tests. The standard
syntax uses do-blocks, which create anonymous functions.
Unfortunately, anonymous functions are not optimized as well as regular
functions in current versions of Julia, so they may run more
slowly and allocate more memory. Example test file example.jl
:
using FactCheck
# Define an immutable type, which Julia allocates on the stack.
immutable Point
x::Float64
y::Float64
end
function Point(arr::Array{Float64,1})
@assert length(arr) == 2
Point(arr[1], arr[2])
end
# Ensure that the Point() constructor is compiled.
let
Point([3.0, 4.0])
println("compiled")
end
facts("test with anon func -- will fail") do
arr = [3.0, 4.0]
bytes = @allocated Point(arr)
@fact bytes => 0
end
function test_func()
arr = [3.0, 4.0]
bytes = @allocated Point(arr)
@fact bytes => 0
end
facts(test_func, "test with regular func -- will pass")
Now from the REPL:
julia> include("example.jl");
compiled
test with anon func -- will fail
Failure :: (line:-1) :: got 24
bytes => 0
Out of 1 total fact:
Failed: 1
test with regular func -- will pass
1 fact verified.
Bonus: My all-in-one recipe for profiling
Putting it all together, my profiling code follows the basic recipe below. It
prints timing and allocation info to stdout, dumps the profiling data to
profile.bin
, and also produces *.mem
files if Julia was started with
--track-allocation=...
.
function benchmark()
# Any setup code goes here.
# Run once, to force compilation.
println("======================= First run:")
srand(666)
@time code_to_profile()
# Run a second time, with profiling.
println("\n\n======================= Second run:")
srand(666)
Profile.init(delay=0.01)
Profile.clear()
clear_malloc_data()
@profile @time code_to_profile()
# Write profile results to profile.bin.
r = Profile.retrieve()
f = open("profile.bin", "w")
serialize(f, r)
close(f)
end