Recently I’ve struggled with the subject of this post. One of the customers of the company I work for has run into several “java.lang.OutOfMemoryError: unable to create new native thread” errors in a clustered WebLogic Server 9 environment. Here I’ll describe why this can happen and list possible workarounds for the error.
JVM memory allocations
Generally speaking, the “java.lang.OutOfMemoryError: unable to create new native thread” error means that the Java process has run out of virtual memory: the process tried to create a new thread, the thread had to reserve memory for its stack and could not, hence the error. Let’s consider a 32-bit Sun HotSpot Server JVM 5. The following memory regions live in its virtual address space:
- Heap. Usually it is the biggest consumer of the process’s address space. Most objects used in Java code are allocated here. Heap size is controlled with the -Xms/-Xmx startup parameters.
- Permanent generation. This is a special heap where special objects live: class, method, and intern()-ed String objects. The reason for keeping these objects in a separate heap is clear: ordinary objects tend to die quickly during minor garbage collection cycles, whereas class/method objects are expected to live much longer. intern()-ed Strings are also considered long-lived objects, since the JVM uses them for pooling String literals and constant expressions. Permanent generation size is controlled with -XX:PermSize/-XX:MaxPermSize.
- Thread stacks. By default each stack is 1MB; the size can be changed via -Xss. A stack size can also be passed to Thread’s constructor. There are special JVM threads which can ignore this setting and use a different stack size.
- Code cache. This is the memory area where the JIT compiler stores compiled code. Its size is set via -XX:ReservedCodeCacheSize.
- Shared libraries. They are shared at the OS level, but when several processes share a library it is still accounted to each process’s address space. Their total size is not fixed and may vary during the life of the JVM.
- Other native resources, for example file descriptors or memory mapped files.
- JVM internal memory structures.
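The per-thread stack size mentioned in the list above can be illustrated with a short sketch. It uses the four-argument java.lang.Thread constructor; the class name, method name and the 256KB value are made up for the example, and the stack size is only a hint that the JVM is free to round or ignore.

```java
// Sketch: ask for a smaller stack for one particular thread.
// The stackSize argument is a hint; the JVM may round it up or ignore it.
class SmallStackThreadDemo {

    // Runs the task on a thread that requests the given stack size (in bytes).
    // Returns true if the thread finished, false if the wait was interrupted.
    static boolean runWithStackSize(Runnable task, long stackSizeBytes) {
        Thread worker = new Thread(null, task, "small-stack-worker", stackSizeBytes);
        worker.start();
        try {
            worker.join();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean finished = runWithStackSize(new Runnable() {
            public void run() {
                System.out.println("running in " + Thread.currentThread().getName());
            }
        }, 256 * 1024L); // request 256KB instead of the -Xss default
        System.out.println("finished: " + finished);
    }
}
```

Threads that never recurse deeply are the usual candidates for a smaller stack; every kilobyte saved per thread is address space left for other threads.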
Suppose that a JVM uses the following startup parameters:
-Xms2048m -Xmx2048m -XX:PermSize=256m -XX:MaxPermSize=256m -Xss1024k -XX:ReservedCodeCacheSize=128m
That is a total of 2432MB of memory for the heap, the permanent generation and the code cache. A 32-bit process can address at most 4096MB, so if these were the only memory regions the JVM process allocated, about 1664MB would be left for thread stacks, and with 1MB stacks it would be possible to start and use approximately 1600 threads in the JVM. But many allocations are native resources, and they may be big enough to reduce the available address space to a value much smaller than 1600MB, for example to 200-500MB, which means the JVM could run only about 200-500 threads.
WebLogic 9 uses a single execute queue, that is, a single thread pool shared by different applications and parts of an application. Depending on the Work Manager settings, WLS schedules work onto this execute queue. However, this thread pool is not the only place where threads are created. Here are two problems I can confirm:
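The arithmetic behind these numbers can be sketched in a few lines. This is only a back-of-the-envelope model: it assumes the whole 4096MB range of a 32-bit process is usable (the operating system normally keeps part of it, so the real ceiling is lower), and the class and method names are invented for the example.

```java
// Rough address-space budget for a 32-bit JVM process.
class AddressSpaceBudget {

    // Assumption: the full 32-bit range is available to the process.
    static final long ADDRESS_SPACE_MB = 4096;

    // Address space left after the fixed regions and native allocations.
    static long remainingMb(long heapMb, long permMb, long codeCacheMb, long nativeMb) {
        return ADDRESS_SPACE_MB - heapMb - permMb - codeCacheMb - nativeMb;
    }

    // Number of thread stacks of the given size that fit into the remainder.
    static long maxThreads(long remainingMb, long stackKb) {
        return remainingMb * 1024 / stackKb;
    }

    public static void main(String[] args) {
        // -Xmx2048m -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=128m, no native usage
        long free = remainingMb(2048, 256, 128, 0);
        System.out.println(free + "MB left");                    // prints "1664MB left"
        System.out.println(maxThreads(free, 1024) + " threads"); // prints "1664 threads"
    }
}
```

Passing a non-zero nativeMb shows how quickly the thread budget shrinks once shared libraries, mapped files and JVM internals are taken into account.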
- Javelin (the WLS compilation framework) starts (availableProcessors+1) threads upon JSP compilation. This could pass unnoticed, but there seems to be a bug in exception handling: if Javelin hits an exception while starting its threads, the threads that have already started are left hanging. Here is an excerpt of a JVM thread dump after several such failures (the number of hanging threads may become significant: I’ve seen up to 180 of them):
  "Javelin Worker-0" daemon prio=1 tid=0x0191a390 nid=0x19e8 in Object.wait() [0x3fc7f000..0x3fc7faf0]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at javelin.client.ThreadPool$WorkerThread.run(ThreadPool.java:202)
        - locked <0x9de3dc20> (a java.lang.Object)

  "Javelin Worker-0" daemon prio=1 tid=0x0a40e470 nid=0x19e5 in Object.wait() [0x3ff7f000..0x3ff7f970]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at javelin.client.ThreadPool$WorkerThread.run(ThreadPool.java:202)
        - locked <0x887d3f08> (a java.lang.Object)

  "Javelin Worker-0" daemon prio=1 tid=0x088fab68 nid=0x19e4 in Object.wait() [0x4007f000..0x4007f8f0]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at javelin.client.ThreadPool$WorkerThread.run(ThreadPool.java:202)
        - locked <0x88763108> (a java.lang.Object)

  "Javelin Worker-1" daemon prio=1 tid=0x00d41690 nid=0x19d2 in Object.wait() [0x3737f000..0x3737f9f0]
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:474)
        at javelin.client.ThreadPool$WorkerThread.run(ThreadPool.java:202)
        - locked <0x88404d88> (a java.lang.Object)
- By default the number of native muxer threads is (availableProcessors+1). This is odd, since it is usually not necessary to have, let’s say, 64 threads all reading sockets. 3-5 such threads should be enough. Luckily, the number of native muxers is configurable: the <socket-readers>N</socket-readers> element right under the <server> element of config.xml sets it.
Both issues increase the probability of running out of virtual memory in WLS. AFAIK things didn’t change in WLS 10.
Possible solutions to the “OOM: unable to create new native thread” error aim at reducing the total amount of virtual memory allocated by the JVM, reducing the number of running threads, or increasing the JVM’s address space:
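Leaks like the hanging Javelin workers are easier to spot when live threads are counted by name. Below is a minimal sketch built on Thread.getAllStackTraces(), available since Java 5. The class name is invented, the code has to run inside the JVM being inspected (for example from a diagnostic servlet), and stripping a trailing "-<number>" is just a guess at a common naming scheme.

```java
import java.util.Map;
import java.util.TreeMap;

// Counts live threads grouped by name, ignoring a trailing "-<number>" suffix,
// so that "Javelin Worker-0" and "Javelin Worker-1" land in the same bucket.
class ThreadNameCensus {

    static Map<String, Integer> countByBaseName() {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String base = t.getName().replaceAll("-\\d+$", "");
            Integer old = counts.get(base);
            counts.put(base, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One line per thread family, sorted by name.
        for (Map.Entry<String, Integer> e : countByBaseName().entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

A family whose count keeps growing between two snapshots taken some minutes apart is a good candidate for a closer look in a full thread dump.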
- changing heap, permanent generation, code cache size, default thread stack size
- better control over the number of running threads – via configuration, code change, etc
- using 64-bit JVM
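As an illustration of the “code change” option in the list above: application code that spawns its own threads can be moved to a bounded pool, so the thread count has a hard ceiling. This is a generic java.util.concurrent sketch, not anything WLS-specific (inside WLS, Work Managers would normally play this role), and the class name, pool size and sleep time are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Runs many short tasks on a fixed-size pool instead of one thread per task.
class BoundedPoolDemo {

    // Executes taskCount tasks on at most poolSize threads and returns the
    // highest number of tasks observed running at the same moment.
    static int peakConcurrency(int poolSize, int taskCount) {
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    int now = running.incrementAndGet();
                    // Record a new maximum if this task raised it.
                    int seen = peak.get();
                    while (now > seen && !peak.compareAndSet(seen, now)) {
                        seen = peak.get();
                    }
                    try {
                        Thread.sleep(5); // stand-in for real work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent tasks: " + peakConcurrency(4, 40));
    }
}
```

However many tasks are submitted, the pool never holds more than poolSize worker threads, so its stack usage stays predictable.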
PS. While googling I found this. Hilton HHonors uses WLS, and the Google crawler broke their JVM some time in the past because of a failed Javelin compilation :)