Building hsdis in 2022
Note: this post is a not-very-serious guide to building hsdis using a custom cmake build script, in order to avoid having to set up an environment for the openjdk build system on Windows (though I’ve also tested that the script works under WSL Linux). I like to play with cmake from time to time, and this post is the result of a little project to build hsdis with cmake, so don’t treat it as authoritative, though it should be good enough to get a working hsdis library. The official instructions for building hsdis can be found in the openjdk repo here: https://github.com/openjdk/jdk/tree/master/src/utils/hsdis.
What is hsdis?
hsdis is a disassembler plugin for the OpenJDK/HotSpot JVM. It can be used in conjunction with the PrintAssembly option (as well as other options) to disassemble and print out code generated by HotSpot’s JIT compilers. It is a separate shared library that can be installed on the PATH or in the JDK directory next to (lib)jvm.(dll/so/dylib). The VM will dynamically load this library and call the function it exposes to disassemble dynamically generated code.
If you’re interested in the code generated by the VM, you will need the hsdis plugin to make it visible in a human-readable format (well, if that human happens to know how to read assembly). Without the plugin, the PrintAssembly option will just output the bytes of the instructions instead.
Building hsdis
Not too long ago, the hsdis plugin required binutils as a dependency. Fairly recently, however, two more flavours of hsdis were added: one based on llvm, and one based on the capstone disassembler library. It is this latter flavour that makes it significantly easier to build hsdis.
The official way to build hsdis is through the openjdk build system. If you’re interested in that, the instructions can be found in the openjdk repo linked above.
There is, however, an easier way to build it that, crucially for Windows users, doesn’t require setting up cygwin or WSL and using autoconf and make to run the openjdk build system.
Users also need to provide the capstone library for the build process; capstone is a project that uses cmake as its build system (well, ‘build system generator’).
With the method I’m about to show, we just need to have cmake and a C compiler installed, and then we can use a simple cmake file to build both capstone and hsdis in one shot (with capstone statically linked into hsdis). The script will even automatically download and patch the hsdis sources from the JDK repo in order to build them with cmake.
Create a new directory for the build, and in that directory, create a new file called CMakeLists.txt (a cmake build file) with the following contents:
cmake_minimum_required(VERSION 3.15)
project(hsdis)

# options for users
set(HSDIS_JDK_REF 3eb661bbe7151f3a7e949b6518f57896c2bd4136
  CACHE STRING "git ref to download hsdis sources from")
set(HSDIS_CAPSTONE_REF 000561b4f74dc15bda9af9544fe714efda7a6e13
  CACHE STRING "git ref to fetch capstone from")
set(HSDIS_ARCH X64
  CACHE STRING "hsdis target architecture")

# internal settings
set(CMAKE_POSITION_INDEPENDENT_CODE ON) # needed for linux
# turn off architecture support by default, to get a smaller capstone library
set(CAPSTONE_ARCHITECTURE_DEFAULT OFF)

# set architecture specific options. Only x64 for now
if(${HSDIS_ARCH} STREQUAL X64)
  set(CAPSTONE_X86_SUPPORT ON)
  set(HSDIS_CAPSTONE_ARCH CS_ARCH_X86)
  set(HSDIS_CAPSTONE_MODE CS_MODE_64)
  set(HSDIS_LIB_SUFFIX amd64)
else()
  message(FATAL_ERROR "Unknown architecture: ${HSDIS_ARCH}")
endif()

# fetch and build capstone
include(FetchContent)
message(STATUS "Fetching capstone (ref=${HSDIS_CAPSTONE_REF})...")
FetchContent_Declare(
  capstone
  GIT_REPOSITORY https://github.com/capstone-engine/capstone
  GIT_TAG ${HSDIS_CAPSTONE_REF})
FetchContent_MakeAvailable(capstone)

# build hsdis
# 1. download source files
set(HSDIS_SOURCE_ROOT_URL
  https://raw.githubusercontent.com/openjdk/jdk/${HSDIS_JDK_REF}/src/utils/hsdis)
file(DOWNLOAD
  ${HSDIS_SOURCE_ROOT_URL}/capstone/hsdis-capstone.c
  ${CMAKE_SOURCE_DIR}/src/hsdis-capstone.c)
file(DOWNLOAD
  ${HSDIS_SOURCE_ROOT_URL}/hsdis.h
  ${CMAKE_SOURCE_DIR}/src/hsdis.h)

# 2. fixup capstone.h include
file(READ src/hsdis-capstone.c FILE_CONTENTS)
string(REPLACE "#include <capstone.h>" "#include <capstone/capstone.h>"
  FILE_CONTENTS "${FILE_CONTENTS}")
file(WRITE src/hsdis-capstone.c "${FILE_CONTENTS}")

# 3. add hsdis shared library target
add_library(hsdis SHARED src/hsdis-capstone.c)

# 4. configure target
target_link_libraries(hsdis PRIVATE capstone::capstone)
target_include_directories(hsdis PUBLIC src)
target_compile_definitions(hsdis
  PRIVATE
    CAPSTONE_ARCH=${HSDIS_CAPSTONE_ARCH}
    CAPSTONE_MODE=${HSDIS_CAPSTONE_MODE})
set_target_properties(hsdis
  PROPERTIES
    OUTPUT_NAME hsdis-${HSDIS_LIB_SUFFIX}
    PREFIX "")

# 5. generate install target
install(TARGETS hsdis)
At this point it’s important to note that this build script is for building hsdis for the x86_64 architecture. I’ve added a flag to set the architecture, but it will currently fail when set to anything other than X64, which is the default.
After initializing some variables, the script will first fetch the capstone source repository (from the latest hash on the next branch at the time of writing) from github and build it, using the FetchContent functions.
The script will then download the 2 needed source files, hsdis-capstone.c and hsdis.h, from github. The ref that’s used is defined by HSDIS_JDK_REF. I’ve used the latest hash at the time of writing, which works. You could try changing this to master to get the latest source code as well (but I can’t guarantee it will work).
The script also patches the hsdis-capstone.c source file, since it uses a non-standard way of including capstone.h which is not compatible with the configuration in the capstone cmake package we’re about to build.
We link against capstone with target_link_libraries.
We also include the src directory as an include directory with target_include_directories, since it contains the hsdis.h header file.
Lastly, we set the CAPSTONE_ARCH and CAPSTONE_MODE preprocessor defines with target_compile_definitions. These are the names of capstone enum constants which end up being passed to capstone at runtime.
Then, to build hsdis I use the following commands:
cmake -B build <extra cmake config flags>
This command will create a build directory for the build files. Since I’m using Visual Studio, I pass the extra flags -A x64 and -T host=x64 to select the x64 architecture and toolchain. On Linux with gcc or cc no extra flags are needed.
cmake --build build --config Release
This builds the library.
cmake --install build --prefix install
This installs the library in the install directory.
If everything went well, this will have created the hsdis-amd64.dll file under install/bin. This file can now be copied into the bin/server directory in a JDK to enable disassembling. Though, what I’ve done is create an hsdis folder somewhere on my PC, plopped the .dll file in there, and put that folder on my PATH. HotSpot will be able to pick it up from there.
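If you go the PATH route, a quick sanity check can save some head-scratching. The following is a minimal sketch that only mirrors the "is the file in one of the PATH directories" idea described above; it does not reproduce HotSpot’s actual lookup logic. The class name HsdisLocator and the helper findOnPath are made up for this example, and the default file name is simply the one produced by the build in this post.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK or of hsdis.
public class HsdisLocator {

    // Returns the first file named fileName found in the directories listed
    // in pathValue (entries separated by the platform's path separator).
    public static Optional<Path> findOnPath(String pathValue, String fileName) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String entry : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            try {
                Path candidate = Paths.get(entry).resolve(fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // ignore PATH entries that are not valid paths on this platform
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: the library name from this post; pass another name as the first argument.
        String fileName = args.length > 0 ? args[0] : "hsdis-amd64.dll";
        Optional<Path> hit = findOnPath(System.getenv("PATH"), fileName);
        System.out.println(hit.map(p -> "found: " + p)
                .orElse("not found on PATH: " + fileName));
    }
}
```

Since Java 11 this can be run directly with java HsdisLocator.java, without a separate javac step.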
Testing it out
To test out the library we built, I created a simple Java class that JIT compiles a payload method for which we want to see the assembly.
public class Main {
    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) {
            add(42, 42);
        }
    }

    private static int add(int a, int b) {
        return a + b;
    }
}
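The loop in Main throws away the result of add. As a variation, here is a sketch that keeps the result observable by accumulating it and printing the total, which is one common way to make sure a payload’s work is actually used. The class name MainWithSink and the run helper are made up for this example, and I’m not claiming this is needed for the flags used in this post; it’s just an alternative payload shape.

```java
// Hypothetical variation of the Main class from this post.
public class MainWithSink {

    // Calls the payload 'iterations' times and returns the accumulated result,
    // so every call contributes to a value that is used afterwards.
    public static long run(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += add(i, 42);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Same iteration count as the original example.
        System.out.println(run(20_000));
    }

    // Payload method whose assembly we want to look at.
    private static int add(int a, int b) {
        return a + b;
    }
}
```

If you use this instead of Main, the method pattern in the flags becomes MainWithSink::add* rather than Main::add*.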
Then I run the following commands to print the assembly for the add method:
javac Main.java
java -Xbatch '-XX:-TieredCompilation' '-XX:CompileCommand=dontinline,Main::add*' '-XX:CompileCommand=PrintAssembly,Main::add*' Main
- -Xbatch blocks execution until the JIT finishes, so we can get our assembly before the program exits.
- '-XX:-TieredCompilation' disables the C1 JIT compiler, so we get a somewhat reduced output. The C2 output is usually what’s interesting, as that is the most optimized.
- '-XX:CompileCommand=dontinline,Main::add*' disables inlining of the add method, so that we get a clean compilation of that method without it being inlined into the loop, and also so that the JIT doesn’t know that the return value is not actually being used.
- '-XX:CompileCommand=PrintAssembly,Main::add*' prints out the assembly for the add method.
- Note that I’ve also used quotes (') so that PowerShell doesn’t try to interpret the arguments as script syntax.
(Note that the CompileCommand option doesn’t support PrintAssembly on JDK 8. In that case you’ll have to use the top-level -XX:+PrintAssembly flag, which will print out assembly for all compiled methods instead of just the add method.)
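As an aside, if shell quoting gets in the way, the same flags can be passed from a small Java launcher, where each argument is a separate list element and no shell is involved. This is a minimal sketch: the class name PrintAssemblyLauncher and the helper buildCommand are made up for this example, and it assumes a java executable is on the PATH and that Main.class is in the current directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher, not part of the JDK or of hsdis.
public class PrintAssemblyLauncher {

    // Builds the java command line from this post for the given main class
    // and CompileCommand method pattern (e.g. "Main::add*").
    public static List<String> buildCommand(String javaExe, String mainClass, String methodPattern) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        cmd.add("-Xbatch");
        cmd.add("-XX:-TieredCompilation");
        cmd.add("-XX:CompileCommand=dontinline," + methodPattern);
        cmd.add("-XX:CompileCommand=PrintAssembly," + methodPattern);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "java" resolves to the JDK you want to inspect.
        List<String> cmd = buildCommand("java", "Main", "Main::add*");
        Process process = new ProcessBuilder(cmd)
                .inheritIO() // forward the child's output (the assembly) to this console
                .start();
        System.exit(process.waitFor());
    }
}
```

Because ProcessBuilder hands each list element to the child process as one argument, the * in the method pattern and the leading dashes need no quoting.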
And BOOM, assembly:
============================= C2-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------
Compiled method (c2) 58 13 Main::add (4 bytes)
total in heap [0x000001f699c13610,0x000001f699c13810] = 512
relocation [0x000001f699c13768,0x000001f699c13778] = 16
main code [0x000001f699c13780,0x000001f699c137c0] = 64
stub code [0x000001f699c137c0,0x000001f699c137d8] = 24
oops [0x000001f699c137d8,0x000001f699c137e0] = 8
scopes data [0x000001f699c137e0,0x000001f699c137e8] = 8
scopes pcs [0x000001f699c137e8,0x000001f699c13808] = 32
dependencies [0x000001f699c13808,0x000001f699c13810] = 8
[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]
--------------------------------------------------------------------------------
[Verified Entry Point]
# {method} {0x000001f6a94002d8} 'add' '(II)I' in 'Main'
# parm0: rdx = int
# parm1: r8 = int
# [sp+0x20] (sp of caller)
0x000001f699c13780: subq $0x18, %rsp
0x000001f699c13787: movq %rbp, 0x10(%rsp)
0x000001f699c1378c: movl %edx, %eax
0x000001f699c1378e: addl %r8d, %eax
0x000001f699c13791: addq $0x10, %rsp
0x000001f699c13795: popq %rbp
0x000001f699c13796: cmpq 0x338(%r15), %rsp ; {poll_return}
0x000001f699c1379d: ja 0x1f699c137a4
0x000001f699c137a3: retq
0x000001f699c137a4: movabsq $0x1f699c13796, %r10; {internal_word}
0x000001f699c137ae: movq %r10, 0x350(%r15)
0x000001f699c137b5: jmp 0x1f699bf3400 ; {runtime_call SafepointBlob}
0x000001f699c137ba: hlt
0x000001f699c137bb: hlt
0x000001f699c137bc: hlt
0x000001f699c137bd: hlt
0x000001f699c137be: hlt
0x000001f699c137bf: hlt
[Exception Handler]
0x000001f699c137c0: jmp 0x1f699c09580 ; {no_reloc}
[Deopt Handler Code]
0x000001f699c137c5: callq 0x1f699c137ca
0x000001f699c137ca: subq $5, (%rsp)
0x000001f699c137cf: jmp 0x1f699bf26a0 ; {runtime_call DeoptimizationBlob}
0x000001f699c137d4: hlt
0x000001f699c137d5: hlt
0x000001f699c137d6: hlt
0x000001f699c137d7: hlt
--------------------------------------------------------------------------------
[/Disassembly]
Now, all that’s left is learning to interpret this ;)