Are you using PostgreSQL and experiencing
PSQLException: This connection has been closed. sometimes? Did you, by chance, configure a
socketTimeout for your database connection? Read on to find out more and how to fix this.
Here at ImmobilienScout24 we’re just starting to use PostgreSQL; so we conscientiously worked through the loads of connection parameters one can specify. One parameter we chose to set was
socketTimeout, using 16s. Given that we’re aiming for an API responding sub 100ms, that seemed to be very liberal while still providing some safety concerning thread congestion.
Good; connection established, fast forward a couple of weeks, right into the dark launch of our new product…
This connection has been closed
We were running some weeks in a production environment, congratulating ourselves for choosing PostgreSQL over MySQL when we suddenly experienced bursts of database connectivity issues, all reporting
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
Hm. We noted that most of the time such issues occurred, there was an autovacuum either running or finished shortly before, we involved a bunch of people, logged a ton, configured our connection pool, and, to cut a long story short – figured out that the
socketTimeout parameter initially caused our problem.
As so often, the funny thing is that all makes sense and works as expected:
The timeout value used for socket read operations. If reading from the server takes longer than this value, the connection is closed. This can be used as both a brute force global query timeout and a method of detecting network problems. The timeout is specified in seconds and a value of zero means that it is disabled.1
So we did have some long-running query that ran into timeout, causing that connection to be closed.
Connection pooling & validation
Now we’re making use of the Tomcat JDBC Connection Pool, so we had a broken connection in our pool that was screwing with us. And as this connection was pretty quickly returned to the pool (using it fails quickly, because the connection is physically closed on the client-side), it had a noticeable blast radius.
But… now you may think “For sure you’re validating connections, don’t you? How can this be an issue?”
Yes, we validate connections. We validate them on connect and while idle (every 16 seconds); validating them on borrow and on return comes with a latency impact as this would be done synchronously, so we initially refrained from doing that. But even then we had the issue – although the application recovered after ~10-15 seconds when enabling either
The fallacy was that we did not set
validationInterval, thus the default of 30s was used. In effect, that delayed validating the connection up to 14s (the remainder of the 30s after the 16s timeout).
So in order to stick with the
socketTimeout, you have to enable
testOnReturn, and set
validationInterval to a value less or equal than
The alternative: using queryTimeout
As an alternative, one can set a query timeout instead. This keeps the connection alive when failing with
Caused by: org.postgresql.util.PSQLException: ERROR: canceling statement due to user request
Setting the query timeout can unfortunately not be done on a global level; or rather it depends on the abstraction layer you’re using. E.g. for JPA, you can use the configuration parameter
An additional advantage of using the query timeout is that you can override it easily. Overriding the socket timeout however is only possible on a connection level, making it a bit awkward in conjunction with connection pooling.
Jump on over to GitHub to see a test scenario illustrating the behavior of the two parameters.