ExecJS crashes when shelling out to nodejs
I am running into a strange problem with JavaScript assets not being loaded in gitlab-rails, which crashes the app. It appears to affect only GCK users, perhaps only on Linux. I verified with a co-worker that they do not have this issue with the GDK on Linux. I run Fedora with kernel 6.6.8:
Linux carbon-x1 6.6.8-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 21 04:01:49 UTC 2023 x86_64 GNU/Linux
The symptom looks something like this:
RuntimeError at /users/sign_in
Your nodejs binary failed to load autoprefixer script file,
please check if you're running a supported version (10, 12, 14+)
ENV["PATH"] = /data/cache/bundle-3.1.4/ruby/3.1.0/bin:/data/cache/bundle-3.1.4/bin:/data/cache/go/bin:/usr/local/go/bin:/scripts/ccache:/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin
binary = node
I've been debugging this all afternoon and traced it down to a dependency of the autoprefixer gem, which itself uses a gem called execjs that abstracts away JS runtimes. When this gem tries to shell out to node from a Puma process to load the vendor/autoprefixer.js file, node crashes with an OOM error.
I found that this is unrelated to the specific script that's being loaded. In fact, any attempt to shell out to the node binary will crash, as long as sufficiently long output is produced. For example:
rbtrace -p $(pgrep -f 'worker 1') -e 'IO.popen(["node", "-v"], :err=>[:child, :out]).read'
*** run `sudo sysctl kernel.msgmnb=1048576` to prevent losing events (currently: 16384 bytes)
*** attached to process 100
>> IO.popen(["node", "-v"], :err=>[:child, :out]).read
=> "v18.17.1\n"
*** detached from process 100
The call above works, but this one does not:
bundle exec rbtrace -p $(pgrep -f 'worker 1') -e 'IO.popen(["node", "-h"], :err=>[:child, :out]).read'
*** run `sudo sysctl kernel.msgmnb=1048576` to prevent losing events (currently: 16384 bytes)
*** attached to process 100
>> IO.popen(["node", "-h"], :err=>[:child, :out]).read
=> "\n<--- Last few GCs --->\n\n\n<--- JS stacktrace --->\n\n\n#\n# Fatal javascript OOM in MemoryChunk allocation failed during deserialization.\n#\n\n"
*** detached from process 100
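To tell a crashed child apart from one that merely printed something unexpected, it may help to capture how the child terminated alongside its output. The following is a minimal sketch, not part of execjs or rbtrace; the helper name `run_and_report` is hypothetical. It shells out the same way as the snippets above (stderr merged into stdout) and reports the output size, whether the child exited cleanly, and the terminating signal, if any.

```ruby
# Hypothetical helper (not part of execjs or rbtrace): run a command with
# stderr merged into stdout, as in the rbtrace snippets above, and also
# report how the child process terminated.
def run_and_report(cmd)
  output = nil
  IO.popen(cmd, err: [:child, :out]) do |io|
    output = io.read
  end
  # With the block form, the pipe is closed when the block returns and
  # $? holds the child's Process::Status.
  status = $?
  {
    output_bytes: output.bytesize,
    exited_cleanly: status.success? == true,
    signal: status.signaled? ? status.termsig : nil
  }
end
```

Calling `run_and_report(["node", "-v"])` and `run_and_report(["node", "-h"])` from inside the affected process should then show whether only the second call ends with a signal rather than a normal exit.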
You can strace this call, and I found that an mprotect call fails shortly before the crash:
web_1 | execve("/usr/bin/node", ["node", "-h"], 0x7ffd446cfc38 /* 115 vars */) = 0
web_1 | brk(NULL) = 0x5732000
web_1 | openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2", O_RDONLY|O_CLOEXEC) = 3
...
web_1 | mprotect(0x53c3000, 245760, PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EACCES (Permission denied)
web_1 | mprotect(0x53c0000, 8192, PROT_NONE) = 0
web_1 | madvise(0x53c0000, 8192, MADV_DONTNEED) = 0
web_1 | write(2, "\n<--- Last few GCs --->\n\n", 25
web_1 | <--- Last few GCs --->
web_1 |
web_1 | ) = 25
web_1 | write(2, "\n<--- JS stacktrace --->\n\n", 26
web_1 | <--- JS stacktrace --->
web_1 |
web_1 | ) = 26
web_1 | write(2, "\n#\n# Fatal javascript OOM in Mem"..., 86
web_1 | #
web_1 | # Fatal javascript OOM in MemoryChunk allocation failed during deserialization.
web_1 | #
web_1 |
web_1 | ) = 86
web_1 | --- SIGTRAP {si_signo=SIGTRAP, si_code=SI_KERNEL} ---
web_1 | +++ killed by SIGTRAP (core dumped) +++
web_1 | #<Thread:0x00007fc576bcded0 eval:6 run> terminated with exception (report_on_exception is true):
web_1 | (eval):1:in `system': Command failed with SIGTRAP (signal 5) (core dumped): strace (RuntimeError)
web_1 | from (eval):1:in `eval_context'
web_1 | from eval:6:in `eval'
web_1 | from eval:6:in `block in eval_and_inspect'
However, I'm not sure whether this is just another red herring.
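To check whether the failed mprotect shows up consistently across runs, the strace output can be filtered for failed calls that request PROT_EXEC. This is only a rough sketch under the assumption that the log lines look like the ones above; the constant and method names are hypothetical.

```ruby
# Matches strace lines such as:
#   mprotect(0x53c3000, 245760, PROT_READ|PROT_WRITE|PROT_EXEC) = -1 EACCES (Permission denied)
# Any prefix such as "web_1 | " is ignored because the pattern is unanchored.
FAILED_MPROTECT = /mprotect\((0x[0-9a-f]+), (\d+), ([A-Z_|]+)\)\s+=\s+-1\s+(E[A-Z]+)/

# Hypothetical helper: return one hash per failed mprotect call that
# asked for executable permissions.
def failed_exec_mprotects(lines)
  lines.filter_map do |line|
    m = FAILED_MPROTECT.match(line)
    next unless m && m[3].split("|").include?("PROT_EXEC")

    { addr: m[1], length: m[2].to_i, prot: m[3], errno: m[4] }
  end
end
```

For example, `failed_exec_mprotects(File.readlines("strace.log"))` would list the matching calls, where `strace.log` is a placeholder for wherever the trace was saved.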