====== rJava: when compilers get too smart ====== In the midst of installing the thousands of packages available in the CRAN library, a subset of the packages displayed a curious behavior: when R attempted to load the package to verify installation, the R subprocess would hang indefinitely. With timeouts in place, the installation of the package eventually fails. An initial check of the hung R subprocess showed it to be stuck in a function named ''RinitJVM_with_padding()'': #0 0x00007f2954334740 in RinitJVM_with_padding () from /opt/shared/r/add-ons/r4.1.3/cran/20220517/rJava/libs/rJava.so #1 0x00007f295433491d in RinitJVM_jsw () from /opt/shared/r/add-ons/r4.1.3/cran/20220517/rJava/libs/rJava.so #2 0x00007f2961899364 in do_External () from /opt/shared/r/4.1.3/lib64/R/lib/libR.so #3 0x00007f29618d66bd in bcEval () from /opt/shared/r/4.1.3/lib64/R/lib/libR.so #4 0x00007f29618f0340 in Rf_eval.localalias () from /opt/shared/r/4.1.3/lib64/R/lib/libR.so : The goal of the function is to grow the stack by some distance ''JSW_PADDING'' (defaults to 2 MiB) before starting the Java interpreter — guard pages to better-insulate the Java VM from the C code in front of it. The ''RinitJVM_jsw()'' function calls ''RinitJVM_with_padding()'' and then ''RinitJVM_with_padding()'' calls itself recursively until its local variable is at least ''JSW_PADDING'' bytes away from where ''RinitJVM_jsw()'' was placing its local variables. The correct execution of the ''RinitJVM_with_padding()'' chain relies on the stack's predictably growing on each invocation. There are several comments surrounding the C source code that mention the need to try to fool the C compiler so it does not optimize-out those local variables; this includes mention of their having tried other stack-based allocation schemes (e.g. via ''alloca()'') to no avail. The odd part about the GDB stack trace was that it never went any deeper into a ''RinitJVM_with_padding()'' recursion, execution just stayed in the first invocation. The assembly dump for the region through which execution was looping indefinitely looked like this: 0x7f2954334730 sub %r10,%rax 0x7f2954334733 imul %r8,%rax 0x7f2954334737 sub %rax,%rcx 0x7f295433473a mov %rcx,%rdx 0x7f295433473d test %rcx,%rcx 0x7f2954334740 jle 0x7f2954334770 0x7f2954334742 sub %r11,%rdx 0x7f2954334745 test %rdx,%rdx 0x7f2954334748 jle 0x7f2954334770 0x7f295433474a sub %rsi,%rdx 0x7f295433474d test %rdx,%rdx 0x7f2954334750 jle 0x7f2954334770 0x7f2954334752 sub %rdi,%rdx 0x7f2954334755 test %rdx,%rdx 0x7f2954334758 jle 0x7f2954334775 0x7f295433475a mov %rdx,%rcx 0x7f295433475d mov %rbx,%rax 0x7f2954334760 sub %rbp,%rcx 0x7f2954334763 test %rcx,%rcx 0x7f2954334766 jg 0x7f2954334730 First and foremost note that nothing is ever being pushed onto the stack in this segment. Working-out what each register holds: ^Register^Purpose^Initial Value^ |r10|Base stack pointer|X| |rax|Current stack pointer|X + 4| |r8|Stack growth direction, ±1|+1| |rcx|Bytes remaining to be allocated|C| |rdx|Temporary byte count| | |r11|Pointer decrement|-1| |rsi|Pointer decrement|-1| |rdi|Pointer decrement|-1| |rdp|Pointer decrement|-1| |rbx|Saved starting value for current stack pointer|X + 4| For C > 4, the code determines (rax - r10) to be 4 and calculates C = C - 4. It then does in seuquence subtraction of registers r11, rsi, rdi, and rdp off C, exiting the function iff C drops to zero or negative. But subtraction of a -1 means we are doing four C = C + 1 operations, so by the end of the segment of code C has been restored to its initial value (which is still C > 4) and the loop continues with rax being restored to X + 4. The compiler has apparently transformed the original source code into a loop that plays with the same base pointer over and over again because the stack is never being grown. This seems to imply that the compiler's recursion analysis and elimination logic does not consider the implicit stack growth side effect that makes the local dummy variable in each invocation unique. The following test code was used to explore the issue. It is a modified variant on the ''RinitJVM_with_padding()'' function that is still in use in rJava: #include #include #include #include #define JSW_PADDING 2*1024*1024 #define C_Stack_Direction 1 extern void * __libc_stack_end; #if defined(NO_OPT) #pragma GCC push_options #pragma GCC optimize ("O0") #endif #if defined(NO_RETURN) # define RinitJVM_with_padding_RETURN(RVAL) { RVAL; } void #else # define RinitJVM_with_padding_RETURN(RVAL) { return RVAL; } intptr_t #endif RinitJVM_with_padding( char *dumdum, intptr_t padding, char *last ) { intptr_t dp; char dummy[1]; dummy[0] = (char)(uintptr_t)&dummy; padding -= (dp = (last - dummy) * C_Stack_Direction); #if !defined(NO_PRINTF) printf("%p %p: padding = %p ; dp = %p\n", __builtin_frame_address(0), dummy, padding, dp); #endif if ( padding > 0 ) RinitJVM_with_padding_RETURN( RinitJVM_with_padding(dumdum, padding, dummy) ) else RinitJVM_with_padding_RETURN( (uintptr_t)dummy ); } #if defined(NO_OPT) #pragma GCC pop_options #endif int main() { char dummy[1], dummy2[45]; uintptr_t R_CStackStart = (uintptr_t) __libc_stack_end; intptr_t R_CStackLimit = 8192 * 1024; intptr_t padding = 0; { int dummy; intptr_t usage = (R_CStackStart - (uintptr_t)&dummy); usage += JSW_PADDING + 512; /* 512 is a buffer for C recursive calls */ if(R_CStackLimit == -1 || usage < ((intptr_t) R_CStackLimit)) padding = JSW_PADDING; } /* reduce the risk that dummy will be optimized out */ dummy[0] = (char) (uintptr_t) &dummy; printf("padding = %p\n", padding); #if defined(NO_RETURN) RinitJVM_with_padding(dummy, padding, dummy); #else padding = RinitJVM_with_padding(dummy, padding, dummy); printf("FINAL: %p - %p = %p\n", dummy, padding, (intptr_t)dummy - padding); #endif return 0; } Various GCC versions (4.8, 7.3, 10.1, 11.2) all transformed ''RinitJVM_with_padding()'' into a form that yielded an infinite loop (''gcc -DNO_PRINTF -DNO_RETURN -o jsw jsw.c -g -O2''). For GCC 4.8 and 7.3, inserting a ''printf()'' statement into the function to show the stack pointer, padding remaining, and padding change on each invocation yielded a working function (external dependency on the local variable disallows its being optimized-out). The address of the local variable moved monotonically downward from the base stack pointer from when ''RinitJVM_with_padding()'' was first called, by 48 bytes for each ''printf()'' (''gcc -DNO_RETURN -o jsw jsw.c -g -O2''). padding = 0x200000 0x7ffd59ed1cc0 0x7ffd59ed1c9f: padding = 0x1fffc3 ; dp = 0x3d 0x7ffd59ed1c80 0x7ffd59ed1c5f: padding = 0x1fff83 ; dp = 0x40 0x7ffd59ed1c40 0x7ffd59ed1c1f: padding = 0x1fff43 ; dp = 0x40 0x7ffd59ed1c00 0x7ffd59ed1bdf: padding = 0x1fff03 ; dp = 0x40 0x7ffd59ed1bc0 0x7ffd59ed1b9f: padding = 0x1ffec3 ; dp = 0x40 0x7ffd59ed1b80 0x7ffd59ed1b5f: padding = 0x1ffe83 ; dp = 0x40 : 0x7ffd59cd1e00 0x7ffd59cd1ddf: padding = 0x103 ; dp = 0x40 0x7ffd59cd1dc0 0x7ffd59cd1d9f: padding = 0xc3 ; dp = 0x40 0x7ffd59cd1d80 0x7ffd59cd1d5f: padding = 0x83 ; dp = 0x40 0x7ffd59cd1d40 0x7ffd59cd1d1f: padding = 0x43 ; dp = 0x40 0x7ffd59cd1d00 0x7ffd59cd1cdf: padding = 0x3 ; dp = 0x40 0x7ffd59cd1cc0 0x7ffd59cd1c9f: padding = 0xffffffffffffffc3 ; dp = 0x40 For GCC 10.1 and 11.2, the ''printf()'' dependency also produced a working function. However, the stack pointer shifted both upwards and downwards as the recursion progressed: the stack pointer would drop by 64 bytes, then through two subsequent ''printf()'' calls rise by 1 byte. In each case, the function's stack pointer stayed the same. padding = 0x200000 0x7ffdd834d1d0 0x7ffdd834d1ad: padding = 0x1fffc1 ; dp = 0x3f 0x7ffdd834d1d0 0x7ffdd834d1ae: padding = 0x1fffc2 ; dp = 0xffffffffffffffff 0x7ffdd834d1d0 0x7ffdd834d1af: padding = 0x1fffc3 ; dp = 0xffffffffffffffff 0x7ffdd834d190 0x7ffdd834d16d: padding = 0x1fff81 ; dp = 0x42 0x7ffdd834d190 0x7ffdd834d16e: padding = 0x1fff82 ; dp = 0xffffffffffffffff 0x7ffdd834d190 0x7ffdd834d16f: padding = 0x1fff83 ; dp = 0xffffffffffffffff 0x7ffdd834d150 0x7ffdd834d12d: padding = 0x1fff41 ; dp = 0x42 0x7ffdd834d150 0x7ffdd834d12e: padding = 0x1fff42 ; dp = 0xffffffffffffffff 0x7ffdd834d150 0x7ffdd834d12f: padding = 0x1fff43 ; dp = 0xffffffffffffffff : 0x7ffdf339f290 0x7ffdf339f26d: padding = 0x41 ; dp = 0x42 0x7ffdf339f290 0x7ffdf339f26e: padding = 0x42 ; dp = 0xffffffffffffffff 0x7ffdf339f290 0x7ffdf339f26f: padding = 0x43 ; dp = 0xffffffffffffffff 0x7ffdf339f250 0x7ffdf339f22d: padding = 0x1 ; dp = 0x42 0x7ffdf339f250 0x7ffdf339f22e: padding = 0x2 ; dp = 0xffffffffffffffff 0x7ffdf339f250 0x7ffdf339f22f: padding = 0x3 ; dp = 0xffffffffffffffff 0x7ffdf339f210 0x7ffdf339f1ed: padding = 0xffffffffffffffc1 ; dp = 0x42 Appearances are that the ''printf()'' dependency prevented full flattening of the recursion into a single contained loop, with a recursion of loops being the result. Each invocation extends the stack by 64 bytes, generating multiple local variables (versus just one) that the code loops over. Because the stack pointer appears to drop AND rise, the padding value is not accurately altered: the actual delta between the final stack location and the first is 0x1B05201D bytes, about 216 times the desired 0x200000 bytes (for 4.8 and 7.3, the delta was 0x200021 bytes). Having a ''printf()'' present to force a dependency on the local variable is not a viable solution; likewise, introducing calls to some other dummy function are not a great solution to the problem. Since GCC 4.4, C source code can affect the compiler behavior to enforce, for example, no optimization of segments of code. The code delimited by the ''NO_OPT'' macro in jsw.c (above) informs GCC that the ''RinitJVM_with_padding()'' function should not be subject to compiler optimizations. A subsequent compilation (''gcc -DNO_OPT -DNO_RETURN -o jsw jsw.c -g -O2'') produces the following behavior in each version of GCC: padding = 0x200000 0x7ffd8a647550 0x7ffd8a647547: padding = 0x1fffdb ; dp = 0x25 0x7ffd8a647510 0x7ffd8a647507: padding = 0x1fff9b ; dp = 0x40 0x7ffd8a6474d0 0x7ffd8a6474c7: padding = 0x1fff5b ; dp = 0x40 0x7ffd8a647490 0x7ffd8a647487: padding = 0x1fff1b ; dp = 0x40 0x7ffd8a647450 0x7ffd8a647447: padding = 0x1ffedb ; dp = 0x40 0x7ffd8a647410 0x7ffd8a647407: padding = 0x1ffe9b ; dp = 0x40 : 0x7ffd8a447690 0x7ffd8a447687: padding = 0x11b ; dp = 0x40 0x7ffd8a447650 0x7ffd8a447647: padding = 0xdb ; dp = 0x40 0x7ffd8a447610 0x7ffd8a447607: padding = 0x9b ; dp = 0x40 0x7ffd8a4475d0 0x7ffd8a4475c7: padding = 0x5b ; dp = 0x40 0x7ffd8a447590 0x7ffd8a447587: padding = 0x1b ; dp = 0x40 0x7ffd8a447550 0x7ffd8a447547: padding = 0xffffffffffffffdb ; dp = 0x40 The final test is to remove the ''printf()'' dependency (''gcc -DNO_OPT -DNO_PRINTF -o jsw jsw.c -g -O2''): padding = 0x200000 FINAL: 0x7ffe31fa0d3c - 0x7ffe31da0d17 = 0x200025 The rJava package source was downloaded and unpacked, and the ''src/init.c'' file was patched to place the GCC pragmas: #pragma GCC push_options #pragma GCC optimize ("O0") and #pragma GCC pop_options around the ''RinitJVM_with_padding()'' function. The ''install.packages()'' R function was invoked on the patched source directory and installed without issue. Subsequent installation of rJava-dependent packages no longer hung when attempting to load the Java VM.