Monday, January 24, 2011

Too many open files

I've had my share of dreaded 'Too many open files' moments. When you see 'java.io.IOException: Too many open files' (or the same message on a java.net.SocketException), it can be somewhat tricky to find out what exactly is causing it. It is much less tricky once you know what needs to be done to get to the root cause.

First up though, what's this exception really telling us?
When a file or socket is accessed on Linux, a file descriptor is created for the process performing the operation. The descriptors a process currently holds are listed under /proc/[process_id]/fd. The number of file descriptors a process may hold is limited, however. If a file or socket is accessed and the stream used to access it is not closed properly, the descriptor stays open and we run the danger of exhausting the limit of open files. This is when we see 'Too many open files' cropping up in our logs.
However, the fix will vary depending on what's been uncovered. It's easy if the error is in your own code base (simply because you can fix your own code easily, at least in theory) and harder if it's in a third-party library or, worse, the JDK (not quite as bad if it's documented, though).
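To get a feel for how close a running JVM is to the limit, something like the following can help. It is only a rough sketch: it assumes a HotSpot/OpenJDK-style JVM on a Unix-like system, where the platform bean is a com.sun.management.UnixOperatingSystemMXBean, and the class name FdUsage is made up for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, not part of any library.
final class FdUsage {

    /**
     * Returns {open, max} file descriptor counts for this JVM, or null
     * when the Unix-specific bean is not available (assumption: it is
     * available on HotSpot/OpenJDK running on Linux).
     */
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = openAndMax();
        if (usage == null) {
            System.out.println("File descriptor counts are not exposed on this JVM");
        } else {
            System.out.println("open=" + usage[0] + " max=" + usage[1]);
        }
    }
}
```

Logging these two numbers periodically is a cheap way to spot a leak early: the open count climbs steadily towards the maximum instead of levelling off.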

So what do we do when this is upon us?
As with any other problem, find the root cause and cure it.
To find the root cause of this exception, the first thing to establish is which files are open and how many times each of them is open. The 'lsof' command is your friend here.
shell>lsof -p [process_id]
(To state the obvious, the process id is the pid of your Java process.)
The output of the above can be piped through 'grep' to find out which files appear repeatedly and whose count keeps increasing as the application runs.
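If lsof is not at hand, roughly the same picture can be had from Java by reading the symlinks under /proc/[process_id]/fd and counting how often each target shows up. This is a minimal sketch and Linux-only; the class and method names (FdTargets, countByTarget) are invented for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, Linux-only: relies on the /proc file system.
final class FdTargets {

    /** Counts how many descriptors in fdDir point at each target. */
    static Map<String, Integer> countByTarget(Path fdDir) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    // Each entry is a symlink to the open file, pipe or socket.
                    String target = Files.readSymbolicLink(fd).toString();
                    counts.merge(target, 1, Integer::sum);
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading; skip it.
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass a pid as the first argument, or inspect this JVM by default.
        String pid = args.length > 0 ? args[0] : "self";
        countByTarget(Paths.get("/proc", pid, "fd")).forEach((target, n) -> {
            if (n > 1) {
                System.out.println(n + "  " + target);
            }
        });
    }
}
```

A file that is listed with a growing count between two runs is a likely leak. Sockets appear with a distinct inode each (for example socket:[12345]), so they will not group this way and are better judged by their total number.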

Once we know the file(s), and if that file or socket is accessed within our own code, it's a no-brainer: go fix the code. It could be something as simple as not closing a stream properly, like so:
public void doFoo() {
    Properties props = new Properties();
    props.load(Thread.currentThread().getContextClassLoader().getResourceAsStream("file.txt"));
}
The stream above is never closed. This results in an open file handle against file.txt.
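One way to plug a leak like this is to make sure the stream is closed whatever happens while loading. The sketch below uses try-with-resources, so it assumes Java 7 or later (on older versions a try/finally block does the same job). PropertiesLoader, loadFrom and loadResource are illustrative names, not the original code.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper showing the leak-free variant of doFoo().
final class PropertiesLoader {

    /** Loads properties from the stream and always closes it afterwards. */
    static Properties loadFrom(InputStream source) throws IOException {
        try (InputStream in = source) {
            Properties props = new Properties();
            props.load(in); // load() itself leaves the stream open
            return props;
        } // in.close() runs here, even if load() throws
    }

    /** Looks the resource up on the context class loader, as doFoo() does. */
    static Properties loadResource(String name) throws IOException {
        InputStream in = Thread.currentThread()
                .getContextClassLoader()
                .getResourceAsStream(name);
        if (in == null) {
            // getResourceAsStream returns null when the resource is missing.
            throw new FileNotFoundException("Resource not found: " + name);
        }
        return loadFrom(in);
    }
}
```

With this in place doFoo() boils down to a call such as PropertiesLoader.loadResource("file.txt"), and the descriptor is released as soon as the properties are read.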
If it's third-party code and you have access to the source, you have to put yourself through the somewhat painful process of finding the buggy code in order to work out what can be done about it.
In some situations third-party software requires increasing the hard and soft limits on the number of file descriptors. This can be done in /etc/security/limits.conf by changing the figures for the hard and soft values, like so:
*   hard   nofile   1024
*   soft   nofile   1024
Rather than '*', it is best to name the specific user (or group, written as @group) the change applies to.
Some helpful information about lsof
