Building hsdis in 2022

Note: this post is a not-very-serious guide to building hsdis using a custom cmake build script, which avoids having to set up an environment for the openjdk build system on Windows (though I've also tested that the script works under WSL Linux). I like to play with cmake from time to time, and this post describes the result of a little project to build hsdis with it. It's not an official method, but it should be good enough to get a working hsdis library. The official instructions for building hsdis can be found in the openjdk repo here: https://github.com/openjdk/jdk/tree/master/src/utils/hsdis.

What is hsdis?

hsdis is a disassembler plugin for the OpenJDK/HotSpot JVM. It can be used in conjunction with the PrintAssembly option (as well as other options) to disassemble and print out code generated by HotSpot’s JIT compilers. It is a separate shared library that can be installed on the PATH or in the JDK directory next to (lib)jvm.(dll/so/dylib). The VM will dynamically load this library and call the function it exposes to disassemble dynamically generated code.
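As a rough illustration of the naming involved, here is a small Java sketch that derives the file name HotSpot would look for on the current platform. It assumes the hsdis-<arch><extension> pattern seen in the hsdis-amd64.dll file built later in this post; the class name HsdisLibraryName and the aarch64 mapping are my own assumptions, not taken from HotSpot's source.

```java
import java.util.Locale;

// Hypothetical helper: computes the expected hsdis file name, assuming the
// "hsdis-<arch><extension>" pattern (e.g. hsdis-amd64.dll built in this post).
public class HsdisLibraryName {

    // Maps the os.arch system property to the arch suffix used in the file name.
    // Only x86_64 (the target of this post) and aarch64 are handled here.
    static String archSuffix(String osArch) {
        switch (osArch.toLowerCase(Locale.ROOT)) {
            case "amd64":
            case "x86_64":
                return "amd64";
            case "aarch64":
                return "aarch64";
            default:
                throw new IllegalArgumentException("arch not covered by this sketch: " + osArch);
        }
    }

    // Chooses the shared library extension from the os.name system property.
    static String libExtension(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return ".dll";
        }
        if (os.startsWith("mac")) {
            return ".dylib";
        }
        return ".so"; // Linux and other Unix-like systems
    }

    static String libraryFileName(String osName, String osArch) {
        return "hsdis-" + archSuffix(osArch) + libExtension(osName);
    }

    public static void main(String[] args) {
        // On x64 Windows this prints hsdis-amd64.dll
        System.out.println(libraryFileName(System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```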

If you’re interested in the code generated by the VM, you will need the hsdis plugin to make it visible in a human-readable format (well, if that human happens to know how to read assembly). Without the plugin, the PrintAssembly option will just output the bytes of the instructions instead.

Building hsdis

Not too long ago, the hsdis plugin required binutils as a dependency. Fairly recently, however, two more flavours of hsdis were added: one based on LLVM, and one based on the capstone disassembler library. It is this latter flavour that makes hsdis significantly easier to build.

The official way to build hsdis is through the openjdk build system. If you're interested in that, the instructions can be found in the openjdk repo linked at the top of this post.

There is, however, an easier way to build it that, crucially for Windows users, doesn't require setting up Cygwin or WSL and using autoconf and make to run the openjdk build system.

With the official build, users also need to provide the capstone library themselves. Capstone is a separate project that uses cmake as its build system (well, 'build system generator').

With the method I’m about to show, we just need to have cmake and a C compiler installed, and then we can use a simple cmake file to build both capstone and hsdis in one shot (with capstone statically linked into hsdis). The script will even automatically download and patch the hsdis sources from the JDK repo in order to build them with cmake.

Create a new directory for the build, and in that directory, create a new file called CMakeLists.txt (a cmake build file) with the following contents:

cmake_minimum_required(VERSION 3.15)
project(hsdis)

# options for users
set(HSDIS_JDK_REF 3eb661bbe7151f3a7e949b6518f57896c2bd4136
    CACHE STRING "git ref to download hsdis sources from")
set(HSDIS_CAPSTONE_REF 000561b4f74dc15bda9af9544fe714efda7a6e13
    CACHE STRING "git ref to fetch capstone from")
set(HSDIS_ARCH X64
    CACHE STRING "hsdis target architecture")

# internal settings
set(CMAKE_POSITION_INDEPENDENT_CODE ON) # needed for linux
# turn off architecture support by default, to get a smaller capstone library
set(CAPSTONE_ARCHITECTURE_DEFAULT OFF)

# set architecture specific options. Only x64 for now
if(HSDIS_ARCH STREQUAL "X64")
    set(CAPSTONE_X86_SUPPORT ON)
    set(HSDIS_CAPSTONE_ARCH CS_ARCH_X86)
    set(HSDIS_CAPSTONE_MODE CS_MODE_64)
    set(HSDIS_LIB_SUFFIX amd64)
else()
    message(FATAL_ERROR "Unknown architecture: ${HSDIS_ARCH}")
endif()

# fetch and build capstone
include(FetchContent)
message(STATUS "Fetching capstone (ref=${HSDIS_CAPSTONE_REF})...")
FetchContent_Declare(
    capstone
    GIT_REPOSITORY https://github.com/capstone-engine/capstone
    GIT_TAG ${HSDIS_CAPSTONE_REF})
FetchContent_MakeAvailable(capstone)

# build hsdis
# 1. download source files
set(HSDIS_SOURCE_ROOT_URL
    https://raw.githubusercontent.com/openjdk/jdk/${HSDIS_JDK_REF}/src/utils/hsdis)
file(DOWNLOAD
    ${HSDIS_SOURCE_ROOT_URL}/capstone/hsdis-capstone.c
    ${CMAKE_SOURCE_DIR}/src/hsdis-capstone.c)
file(DOWNLOAD
    ${HSDIS_SOURCE_ROOT_URL}/hsdis.h
    ${CMAKE_SOURCE_DIR}/src/hsdis.h)

# 2. fixup capstone.h include
file(READ src/hsdis-capstone.c FILE_CONTENTS)
string(REPLACE "#include <capstone.h>" "#include <capstone/capstone.h>"
       FILE_CONTENTS "${FILE_CONTENTS}")
file(WRITE src/hsdis-capstone.c "${FILE_CONTENTS}")

# 3. add hsdis shared library target
add_library(hsdis SHARED src/hsdis-capstone.c)

# 4. configure target
target_link_libraries(hsdis PRIVATE capstone::capstone)
target_include_directories(hsdis PUBLIC src)
target_compile_definitions(hsdis
  PRIVATE 
    CAPSTONE_ARCH=${HSDIS_CAPSTONE_ARCH}
    CAPSTONE_MODE=${HSDIS_CAPSTONE_MODE})
set_target_properties(hsdis
  PROPERTIES
    OUTPUT_NAME hsdis-${HSDIS_LIB_SUFFIX}
    PREFIX "")

# 5. generate install target
install(TARGETS hsdis)

At this point it's important to note that this build script is for building hsdis for the x86_64 architecture. I've added an option (HSDIS_ARCH) to set the architecture, but the configuration will currently fail when it is set to anything other than X64, which is the default.

After initializing some variables, the script first fetches the capstone source repository from GitHub (at the latest hash on the next branch at the time of writing, as set by HSDIS_CAPSTONE_REF) and builds it, using the FetchContent functions.

The script then downloads the 2 needed source files, hsdis-capstone.c and hsdis.h, from GitHub. The ref that's used is defined by HSDIS_JDK_REF. I've used the latest hash at the time of writing, which works. You could also try changing this to master to get the latest source code, for example by passing -DHSDIS_JDK_REF=master to the first cmake command below (but I can't guarantee it will work).

The script also patches the hsdis-capstone.c source file, since it includes capstone.h in a non-standard way (#include <capstone.h>) that doesn't match the include paths provided by the capstone cmake target we're about to build, which expects #include <capstone/capstone.h>.

We link against capstone with target_link_libraries.

We also include the src directory as an include directory with target_include_directories, since it contains the hsdis.h header file.

Lastly, we set the CAPSTONE_ARCH and CAPSTONE_MODE preprocessor defines with target_compile_definitions. These are the names of capstone enum constants which end up being passed to capstone at runtime.

Then, to build hsdis I use the following commands:

cmake -B build <extra cmake config flags>

This command will create a build directory for the build files. Since I’m using Visual Studio I pass the extra flags -A x64 and -T host=x64 to select the x64 architecture and toolchain. On Linux with gcc or cc no extra flags are needed.

cmake --build build --config Release

This builds the library.

cmake --install build --prefix install

This installs the library in the install directory.

If everything went well, this will have created the hsdis-amd64.dll file under install/bin (on Linux, the library is called hsdis-amd64.so and typically ends up under install/lib instead). This file can now be copied into the bin/server directory of a JDK (lib/server on Linux) to enable disassembling. What I did instead, though, was create an hsdis folder somewhere on my PC, drop the .dll file in there, and put that folder on my PATH. HotSpot will be able to pick it up from there.
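To double-check where the library ended up, a small Java sketch like the following can list the two kinds of locations mentioned above (the JDK's server directory and the folders on PATH) and report which ones contain the file. The class FindHsdis is hypothetical, it assumes a JDK 9+ directory layout, and it only covers the locations described in this post, not necessarily every place HotSpot searches. The PATH check mainly makes sense on Windows, which is how I installed it; on Linux the dynamic loader does not use PATH to find libraries.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: lists the locations described in this post and reports
// which of them contain the hsdis library.
public class FindHsdis {

    static List<Path> candidates(Path javaHome, boolean windows, String libFileName,
                                 String pathEnv, String pathSeparator) {
        List<Path> result = new ArrayList<>();
        // The jvm library lives in bin/server on Windows and lib/server on Linux/macOS (JDK 9+).
        result.add(javaHome.resolve(windows ? "bin" : "lib").resolve("server").resolve(libFileName));
        if (pathEnv != null) {
            for (String dir : pathEnv.split(Pattern.quote(pathSeparator))) {
                if (dir.isEmpty()) {
                    continue;
                }
                try {
                    result.add(Paths.get(dir).resolve(libFileName));
                } catch (InvalidPathException e) {
                    // skip PATH entries that are not valid paths (e.g. stray quotes)
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name").toLowerCase(Locale.ROOT);
        boolean windows = os.startsWith("windows");
        String ext = windows ? ".dll" : os.startsWith("mac") ? ".dylib" : ".so";
        // Defaults to the x64 library built in this post; pass another file name as args[0].
        String libFileName = args.length > 0 ? args[0] : "hsdis-amd64" + ext;
        Path javaHome = Paths.get(System.getProperty("java.home"));
        for (Path p : candidates(javaHome, windows, libFileName, System.getenv("PATH"), File.pathSeparator)) {
            System.out.println((Files.isRegularFile(p) ? "[found]   " : "[missing] ") + p);
        }
    }
}
```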

Testing it out

To test out the library we built, I created a simple Java class that calls a payload method often enough for the JIT to compile it, so that we can look at its assembly.

public class Main {
    public static void main(String[] args) {
        for (int i = 0; i < 20_000; i++) {
            add(42, 42);
        }
    }

    private static int add(int a, int b) {
        return a + b;
    }
}

Then I run the following commands to print the assembly for the add method:

javac Main.java
java -Xbatch '-XX:-TieredCompilation' '-XX:CompileCommand=dontinline,Main::add*' '-XX:CompileCommand=PrintAssembly,Main::add*' Main

-Xbatch makes the JIT compile in the foreground, blocking execution until compilation finishes, so we get our assembly before the program exits.
'-XX:-TieredCompilation' disables tiered compilation, so the C1 JIT compiler isn't used and we get a somewhat reduced output. The C2 output is usually what's interesting, as that is the most optimized.
'-XX:CompileCommand=dontinline,Main::add*' disables inlining of the add method, so that we get a clean compilation of that method without it being inlined into the loop, and also so that the JIT doesn't know that the return value is not actually being used.
'-XX:CompileCommand=PrintAssembly,Main::add*' prints out the assembly for the add method.
Note that I've also wrapped the arguments in single quotes (') so that PowerShell doesn't try to interpret them as script syntax.

(Note that the CompileCommand option doesn't support PrintAssembly on JDK 8. In that case you'll have to use the top-level -XX:+PrintAssembly flag instead, which will print out assembly for all compiled methods rather than just the add method. Since PrintAssembly is a diagnostic flag, it also has to be preceded by -XX:+UnlockDiagnosticVMOptions.)
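If shell quoting gets in the way, the same command can also be assembled from Java. This hypothetical PrintAssemblyLauncher sketch builds the argument list described above (including the JDK 8 fallback, where PrintAssembly is a diagnostic flag that needs -XX:+UnlockDiagnosticVMOptions first) and starts it with ProcessBuilder, which hands each argument to the child JVM as a separate string, so PowerShell never gets to reinterpret it. It assumes Main.class is in the current directory.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: builds the java command from this post as a list of
// arguments and runs it without going through a shell.
public class PrintAssemblyLauncher {

    static List<String> command(String javaExe, String mainClass, String methodPattern, boolean jdk8) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        cmd.add("-Xbatch");                // compile in the foreground
        cmd.add("-XX:-TieredCompilation"); // C2 only, no C1
        cmd.add("-XX:CompileCommand=dontinline," + methodPattern);
        if (jdk8) {
            // JDK 8: no PrintAssembly CompileCommand, so use the global diagnostic flag.
            // The unlock flag must come before the diagnostic flag it unlocks.
            cmd.add("-XX:+UnlockDiagnosticVMOptions");
            cmd.add("-XX:+PrintAssembly");
        } else {
            cmd.add("-XX:CompileCommand=PrintAssembly," + methodPattern);
        }
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Use the java executable of the JDK running this launcher.
        String javaExe = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        List<String> cmd = command(javaExe, "Main", "Main::add*", false);
        // Forward the child's stdout/stderr to this console and exit with its status.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```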

And BOOM, assembly:

============================= C2-compiled nmethod ==============================
----------------------------------- Assembly -----------------------------------

Compiled method (c2)      58   13             Main::add (4 bytes)
 total in heap  [0x000001f699c13610,0x000001f699c13810] = 512
 relocation     [0x000001f699c13768,0x000001f699c13778] = 16
 main code      [0x000001f699c13780,0x000001f699c137c0] = 64
 stub code      [0x000001f699c137c0,0x000001f699c137d8] = 24
 oops           [0x000001f699c137d8,0x000001f699c137e0] = 8
 scopes data    [0x000001f699c137e0,0x000001f699c137e8] = 8
 scopes pcs     [0x000001f699c137e8,0x000001f699c13808] = 32
 dependencies   [0x000001f699c13808,0x000001f699c13810] = 8

[Disassembly]
--------------------------------------------------------------------------------
[Constant Pool (empty)]

--------------------------------------------------------------------------------

[Verified Entry Point]
  # {method} {0x000001f6a94002d8} 'add' '(II)I' in 'Main'
  # parm0:    rdx       = int
  # parm1:    r8        = int
  #           [sp+0x20]  (sp of caller)
  0x000001f699c13780:   subq            $0x18, %rsp
  0x000001f699c13787:   movq            %rbp, 0x10(%rsp)
  0x000001f699c1378c:   movl            %edx, %eax
  0x000001f699c1378e:   addl            %r8d, %eax
  0x000001f699c13791:   addq            $0x10, %rsp
  0x000001f699c13795:   popq            %rbp
  0x000001f699c13796:   cmpq            0x338(%r15), %rsp   ;   {poll_return}
  0x000001f699c1379d:   ja              0x1f699c137a4
  0x000001f699c137a3:   retq
  0x000001f699c137a4:   movabsq         $0x1f699c13796, %r10;   {internal_word}
  0x000001f699c137ae:   movq            %r10, 0x350(%r15)
  0x000001f699c137b5:   jmp             0x1f699bf3400       ;   {runtime_call SafepointBlob}
  0x000001f699c137ba:   hlt
  0x000001f699c137bb:   hlt
  0x000001f699c137bc:   hlt
  0x000001f699c137bd:   hlt
  0x000001f699c137be:   hlt
  0x000001f699c137bf:   hlt
[Exception Handler]
  0x000001f699c137c0:   jmp             0x1f699c09580       ;   {no_reloc}
[Deopt Handler Code]
  0x000001f699c137c5:   callq           0x1f699c137ca
  0x000001f699c137ca:   subq            $5, (%rsp)
  0x000001f699c137cf:   jmp             0x1f699bf26a0       ;   {runtime_call DeoptimizationBlob}
  0x000001f699c137d4:   hlt
  0x000001f699c137d5:   hlt
  0x000001f699c137d6:   hlt
  0x000001f699c137d7:   hlt
--------------------------------------------------------------------------------
[/Disassembly]
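When starting to read output like this, it can help to pull the instruction lines apart. Below is a minimal Java sketch (the AsmLine class is hypothetical) that splits a line of the listing above into address, mnemonic, operands and the trailing ; annotation, and ignores everything else. Its regular expression is only written against the AT&T-syntax format shown in this post, so other output formats may need adjustments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one instruction line of PrintAssembly output
// (AT&T syntax, as in the listing above) into its parts.
public class AsmLine {

    // address ":" mnemonic operands [";" annotation]
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(0x[0-9a-fA-F]+):\\s+(\\S+)\\s*([^;]*?)\\s*(?:;\\s*(.*))?$");

    final String address;
    final String mnemonic;
    final String operands;   // empty for instructions like retq or hlt
    final String annotation; // e.g. "{poll_return}", empty if absent

    private AsmLine(String address, String mnemonic, String operands, String annotation) {
        this.address = address;
        this.mnemonic = mnemonic;
        this.operands = operands;
        this.annotation = annotation;
    }

    // Returns empty for headers, "#" comment lines and section labels like [Exception Handler].
    static Optional<AsmLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        String annotation = m.group(4) == null ? "" : m.group(4).trim();
        return Optional.of(new AsmLine(m.group(1), m.group(2), m.group(3), annotation));
    }

    // Reads PrintAssembly output from stdin and prints only the instructions, tab-separated.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            parse(line).ifPresent(a ->
                System.out.println(a.address + "\t" + a.mnemonic + "\t" + a.operands));
        }
    }
}
```

For example, piping the output of the java command above into java AsmLine.java (single-file source launch, JDK 11 or later) should print just the instruction lines.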

Now, all that’s left is learning to interpret this ;)

Thanks for reading