Wednesday, February 13, 2013

Intermittently seeing EADDRNOTAVAIL when calling connect

This is a follow-on to my previous post.

I have been debugging Linux systems for many years at Vertica, and very often I have been helped by a lucid description of a problem that someone else has written and posted. In this post I hope to pay back some of that help that I have received over the years.

Geek Alert: the rest of the post will delve into socket geekery. If that doesn't get you excited, there are plenty of other ways to spend your time on the internet.

Problem:

In some cases on more recent Red Hat based kernels (released at the end of 2012 and in early 2013; I don't know about other distributions), when you connect a file descriptor that was previously bound to a specific port with bind, the kernel will intermittently return EADDRNOTAVAIL.

Specifically, I have observed this behavior change going from

Linux 2.6.18-308.13.1.el5 #1 SMP Tue Aug 21 17:10:18 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

to:

Linux 2.6.18-348.1.1.el5 #1 SMP Tue Jan 22 16:19:19 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

In my (admittedly abusive) test program, about 150 out of 3000 rapidly opened and then closed sockets (with at most 10 outstanding at any time) fail with the call to connect returning -1 and setting errno to EADDRNOTAVAIL.

Solution:

Don't bind the socket first. Call connect directly.

There it is -- in a nutshell -- my contribution to global knowledge. Hopefully it makes more than zero positive karma.

Discussion:

The problem doesn't seem to manifest itself if bind is not called on the socket before calling connect. I have no idea why this particular implementation quirk exists, nor why the change in behavior got backported by Red Hat into RHEL5. A comment on the apache.org trafficserver-dev mailing list suggests the problem might be due to the fact that the Linux kernel has two different code paths for assigning ephemeral ports, depending on whether bind has been called or not.

Background:

For those of you reading along who are not familiar with BSD/POSIX style socket calls, I will try to provide some background while I have ambition and this information is still paged into my head.

The canonical pattern to receive incoming connections is:

// create a socket
int fd = socket(....);

// bind the socket to where we want it to listen
bind(fd, <protocol, interface, port>);

// tell the network stack to queue incoming connection requests
listen(fd);

// get a file descriptor for a particular client connection
int cfd = accept(fd);

Note that in the accepting sequence above, the call to bind associates a socket with a particular address and port, so that the network stack knows on which interface and port it should accept client connections and to which file descriptor it should route them.
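To make the accepting sequence concrete, here is a minimal C sketch of it. The helper name listen_on_loopback and the choice of IPv4 on 127.0.0.1 are assumptions made for illustration and are not taken from Vertica's code. Passing port 0 asks the kernel to pick a free port.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening fd bound to 127.0.0.1:port,
 * or -1 with errno set. Port 0 lets the kernel choose a free port. */
int listen_on_loopback(uint16_t port, int backlog)
{
    assert(backlog > 0);

    /* create a socket */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* bind the socket to where we want it to listen */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* bind, then tell the network stack to queue incoming requests */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would then loop on accept(fd, NULL, NULL) to get one file descriptor per client connection.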

The canonical pattern to establish an outgoing client connection to a server is:

// create a socket
int fd = socket(....);

// connect the socket to the server's address
connect(fd, remote_address);

Note that in this pattern, the client does not explicitly supply a local address and port for its end of the connection. Rather, the TCP stack assigns it an 'ephemeral' port for the duration of the connection.
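As a rough illustration of the client pattern, the C sketch below connects without calling bind and then asks the kernel which local port it picked. The helper names connect_without_bind and local_port are hypothetical, and IPv4 is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to remote without calling bind first.
 * Returns a connected fd, or -1 with errno set. */
int connect_without_bind(const struct sockaddr_in *remote)
{
    assert(remote != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, (const struct sockaddr *)remote, sizeof *remote) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Returns the local port of fd in host byte order, or -1 on error. */
int local_port(int fd)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;
    if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
        return -1;
    return ntohs(local.sin_port);
}
```

After a successful connect, local_port(fd) reports the ephemeral port the TCP stack assigned, even though the program never asked for one.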

For reasons of code symmetry, Vertica's network code happened to use the following pattern:

// create a socket
int fd = socket(....)

// Bind the socket to a specific local address
bind(fd, <protocol, interface, port>)

// connect the socket to the remote server
connect(fd, remote_address);

Of course the call to bind above is not really required, but it hadn't caused any problems in the last 5 years of production deployments of the Vertica Analytic Database. Actually, in this case it doesn't even really seem to do anything (to my knowledge), because the TCP stack was assigning the newly created connection an ephemeral port anyway (perhaps due to some options we had set on the socket via setsockopt).
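For comparison, here is a hedged C sketch of the bind-then-connect pattern. The exact local address, port, and socket options used in Vertica are not given here, so this assumes the caller binds to a local IPv4 address with port 0 (leaving the port choice to the kernel) and sets no options. The name connect_with_bind is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind to a local address, then connect to remote.
 * Returns a connected fd, or -1 with errno set (EADDRNOTAVAIL is the
 * intermittent failure discussed in this post). */
int connect_with_bind(const struct sockaddr_in *local,
                      const struct sockaddr_in *remote)
{
    assert(local != NULL && remote != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* the extra bind that a plain client does not need */
    if (bind(fd, (const struct sockaddr *)local, sizeof *local) < 0 ||
        connect(fd, (const struct sockaddr *)remote, sizeof *remote) < 0) {
        int saved = errno;   /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Dropping the bind call from this function gives the plain client pattern, which is the change that made the failures go away for me.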

Anyhow, when I changed the networking layer to avoid calling bind on the outgoing socket in this case, the intermittent EADDRNOTAVAIL failures went away.

If anyone can explain the above behavior better, or why Red Hat backported something that caused it to start failing / behaving differently after more than 5 years of happiness, I would love to hear from you.

p.s.

Before you say "of course you are running out of ephemeral ports" (which is the most common reason to get the EADDRNOTAVAIL error), it is not true: I have roughly 28K available (see the range below), and I can reliably get the problem to occur ~150 times out of 3000 with only 10 sockets concurrently open at a time:

[06:33:58][alamb@tldr:~]$  cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000
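The arithmetic behind that port count can be checked with a small C sketch. The function names are hypothetical, and the count assumes both ends of the range are usable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: given the text of ip_local_port_range
 * ("low high"), return how many ports the inclusive range holds,
 * or -1 if the text is malformed. */
int port_range_size(const char *text)
{
    assert(text != NULL);

    int low, high;
    if (sscanf(text, "%d %d", &low, &high) != 2 || low > high)
        return -1;
    return high - low + 1;
}

/* Reads the live setting on Linux; returns -1 if it cannot be read. */
int ephemeral_port_count(void)
{
    char line[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? port_range_size(line) : -1;
}
```

For the range shown above, port_range_size("32768 61000") returns 28233, far more than the 10 sockets open at any one time.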

A Linux and Vertica Opera: EADDRNOTAVAIL returned from connect

This short post is required to set up the more in-depth follow-up.

In Fall 2012, I worked on a problem here at Vertica that we saw at several customer sites. Specifically, the pathology was that when a lot of TCP connections get opened and closed in a short period of time (the Vertica Analytic Database happens to do this to run some queries), certain Linux kernels (unfortunately the stock RHEL6 ones included) will, occasionally, return EADDRNOTAVAIL when the program tries to connect a socket to a remote node.

This was causing some outbound connections to intermittently fail which was causing queries to fail which was causing an unhappy situation for all involved. 

I am fairly sure this is a kernel bug, at least from our point of view. I can use the more polite phrase of 'bad interaction between kernel and Vertica', but it doesn't really matter: at the end of the day the queries were failing and our customers were in pain, even though upgrading or downgrading the kernel made the problem go away.

Amusingly, the workaround I came up with was this: if the kernel refuses to open a connection, simply reissue the connect a few times (that is, retry when an error was going to happen anyway). This approach was actually far more effective than I would have imagined -- the symptoms just go away.

I like to think of the workaround as the following (abbreviated) opera. You need to sing it in your head with a deep operatic voice:

 
Vertica: please open the connection
Kernel: No! (address is not available)!
Vertica: please open the connection
Kernel: No! (address is not available)!
Vertica: please open the connection
Kernel: Ok, fine