keepalive connection to fastcgi backend hangs

N
  • 19 Dec '21
I've created a server setup where nginx acts as
a proxy server for a FastCGI application.
That application runs on a different server, on port 9000,
and is spawned with spawn-fcgi.
Recently I found out that nginx
closes the connection after every request.

In order to make nginx keep the TCP connections alive,
I've added the following settings:

* proxy_socket_keepalive on;
* proxy_http_version 1.1;
* proxy_set_header Connection "";
* fastcgi_keep_conn on;
* added an upstream "fcgi":

upstream fcgi {
    keepalive 10;
    server myhost:9000;
}

* added a location block (only snippet given):

location /fcgi {
   fastcgi_pass_request_headers on;
   fastcgi_pass fcgi;
   fastcgi_keep_conn on;
}
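
For context, the fastcgi-related pieces combined look roughly like this
(a sketch; the surrounding server block and the fastcgi_params include
are assumptions, not my literal config):

http {
    upstream fcgi {
        server myhost:9000;
        keepalive 10;
    }

    server {
        listen 80;

        location /fcgi {
            fastcgi_pass_request_headers on;
            fastcgi_pass fcgi;
            fastcgi_keep_conn on;
            include fastcgi_params;
        }
    }
}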

What I see: after a couple of requests nginx "hangs" when I visit path "/fcgi".

This disappears when

* I remove the "keepalive" setting from the upstream (but that disables keepalive altogether)
* I bind the fcgi application to a unix socket, and let nginx connect to that. But that requires nginx and the fcgi application to be on the same server.
* I reduce the number of nginx workers to exactly 1. Not sure why that works.
* I spawn the application with "supervisord" (a process manager written in Python)

Does anyone know what is happening here?
FastCGI has little documentation on the web...

Example of an application: fcgi_example.cpp

#include <iostream>
#include <fcgio.h>
#include <fcgiapp.h>

// Write a CGI-style response to the request's output stream.
void handle_request(FCGX_Request& request){
    fcgi_streambuf cout_fcgi_streambuf(request.out);
    std::ostream os{&cout_fcgi_streambuf};

    // A FastCGI responder sends CGI response headers ("Status:"),
    // not a raw HTTP status line.
    os << "Status: 200 OK\r\n"
       << "Content-type: text/plain\r\n\r\n"
       << "Hello!\r\n";
}

int main(){
    FCGX_Request request;
    FCGX_Init();
    // Socket 0: spawn-fcgi hands the listening socket to the
    // process as file descriptor 0.
    FCGX_InitRequest(&request, 0, 0);
    // One request at a time: the process-per-connection model.
    while (FCGX_Accept_r(&request) == 0) {
        handle_request(request);
    }
}

Build: g++ -std=c++11 -o fcgi_example fcgi_example.cpp -lfcgi -lfcgi++
(libraries after the source file, so the linker can resolve them)

Spawn: spawn-fcgi -f /path/to/fcgi_example -p 9000
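
To reproduce the hang, I just request the path a number of times
(the hostname is a placeholder):

$ for i in $(seq 1 20); do curl -s http://mynginxhost/fcgi; done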

M
  • 20 Dec '21
Hello!

On Sun, Dec 19, 2021 at 07:56:51PM +0000, Nicolas Franck wrote:

> In order to make nginx keep the tcp connections alive,
> I've added the following settings:
> 
> * proxy_socket_keepalive on
> * proxy_http_version 1.1;
> * proxy_set_header Connection "";

Just a side note: you don't need any of these to keep FastCGI 
connections alive.

> * fastcgi_keep_conn on;
> * added an upstream "fcgi":
> 
> upstream fcgi {
>     keepalive 10;
>     server myhost:9000;
> }
> 
> * added a location block (only snippet given):
> 
> location /fcgi {
>    fastcgi_pass_request_headers on;
>    fastcgi_pass fcgi;
>    fastcgi_keep_conn on;
> }
> 
> What I see: after a couple of requests nginx "hangs" when I 
> visit path "/fcgi".
> 
> This disappears when
> 
> * I remove the setting "keepalive" from the upstream (but that 
> disables keepalive altogether)
> * bind the fcgi application to a unix socket, and let nginx bind 
> to that. But that requires nginx and the fcgi to be on the same 
> server.
> * reduce the number of nginx workers to exactly 1. Not sure why 
> that works.
> * I spawn the application with tool "supervisord" (a fcgi 
> process manager written in python)
> 
> Does anyone know what is happening here?
> Fcgi has little documentation on the web..

[...]

> Spawn: spawn-fcgi -f /path/to/fcgi_example -p 9000

spawn-fcgi defaults to one child process, and each child process 
can handle just one connection.  On the other hand, your 
configuration instructs nginx to cache up to 10 connections per 
nginx worker process.

As long as the only connection your upstream server can handle 
is cached in another nginx worker process, nginx won't be able to 
reach the upstream server, and will return a 504 Gateway Timeout 
error once fastcgi_connect_timeout expires (60s by default).  
Most likely this is what you see as "hangs".

The obvious fix would be to add more fastcgi processes.  Given 
"keepalive 10;" in the nginx configuration, you'll need at least 10 * 
<number of nginx worker processes>.  Something like:

spawn-fcgi -F 20 -f /path/to/fcgi -p 9000

should fix things for up to 2 nginx worker processes.

Just in case, that's exactly the problem upstream keepalive 
documentation warns about (http://nginx.org/r/keepalive): "The 
connections parameter should be set to a number small enough to 
let upstream servers process new incoming connections as well".
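
Alternatively, size the connection cache to the backend instead; a 
sketch, assuming one fastcgi process per nginx worker:

upstream fcgi {
    server myhost:9000;
    # At most one cached idle connection per nginx worker, so a
    # backend with <worker_processes> processes is never starved.
    keepalive 1;
}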

-- 
Maxim Dounin
http://mdounin.ru/
N
  • 20 Dec '21
Interesting!

It looks like there is nothing managing the incoming connections
for the fcgi workers. Every fcgi worker needs to do this on its own, right?
So if there are more clients (i.e. nginx workers) than fcgi workers,
then it becomes unresponsive after a few requests, because all
the fcgi workers are holding on to a connection to an nginx worker,
and there seems to be no queue handling this.

Is this correct? Just guessing here.

M
  • 20 Dec '21
Hello!

On Mon, Dec 20, 2021 at 04:00:59PM +0000, Nicolas Franck wrote:

> Interesting!
> 
> It looks like there is nothing managing the incoming connections
> for the fcgi workers. Every fcgi worker needs to do this on its own, right?
> So if there are more clients (i.e. nginx workers) than fcgi workers,
> then it becomes unresponsive after a few requests, because all
> the fcgi workers are holding on to a connection to an nginx worker,
> and there seems to be no queue handling this.
> 
> Is this correct? Just guessing here.

More or less.  The FastCGI code in your example implies very 
simple connection management, based on the process-per-connection 
model.  As long as all FastCGI processes are busy, additional 
connections will be queued in the listen queue of the FastCGI 
listening socket (until a connection is closed).  Certainly that's 
not the only model possible with FastCGI, but it is the easiest to use.

The process-per-connection model doesn't combine well with 
keepalive connections, since each keepalive connection occupies 
the whole process.  And you have to create enough processes to 
handle all keepalive connections you want to be able to keep 
alive.  In case of nginx as a client, this means at least (<number 
of connections in the keepalive directive> * <number of worker 
processes>) processes.

Alternatively, you can avoid using keepalive connections.  These 
are not really needed for local upstream servers, since connection 
establishment costs are usually negligible compared to the total 
request processing costs.  And this is what nginx does by default.
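
For illustration, here is a minimal multi-connection variant of the 
example above, using one thread per keepalive connection.  This is an 
untested sketch along the lines of libfcgi's threaded.c example; the 
thread count of 20 assumes "keepalive 10;" and 2 nginx workers:

#include <mutex>
#include <thread>
#include <vector>
#include <fcgiapp.h>

static std::mutex accept_mutex;

static void worker()
{
    FCGX_Request request;
    FCGX_InitRequest(&request, 0, 0);
    for (;;) {
        int rc;
        {
            // Serialize accepts, as libfcgi's threaded.c does: not
            // all platforms allow concurrent accepts on one socket.
            std::lock_guard<std::mutex> lock(accept_mutex);
            rc = FCGX_Accept_r(&request);
        }
        if (rc != 0)
            break;
        FCGX_FPrintF(request.out,
                     "Status: 200 OK\r\n"
                     "Content-type: text/plain\r\n"
                     "\r\n"
                     "Hello!\r\n");
        FCGX_Finish_r(&request);
    }
}

int main()
{
    FCGX_Init();
    // One thread per connection nginx may keep alive:
    // keepalive 10 * 2 worker processes = 20.
    std::vector<std::thread> threads;
    for (int i = 0; i < 20; ++i)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
}

Build with -std=c++11 -pthread in addition to -lfcgi.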

-- 
Maxim Dounin
http://mdounin.ru/
N
  • 20 Dec '21
I kind of agree: keepalive connections are not strictly necessary in this scenario.

But there is a reason why I started looking into this: I started noticing a lot
of closed TCP connections in the TIME_WAIT state. That happens when you
close the connection on your end: the OS keeps it around for a while
(60 seconds on Linux), to make sure the other end of the connection has
seen the final part of the TCP teardown. During that time the client port
for that connection cannot be reused:

$ netstat -an | grep :9000

(if the fcgi app is listening on port 9000)

If you receive a lot of requests in quick succession, and TIME_WAIT
connections accumulate faster than the OS can free them, then you'll run
out of ephemeral client ports, and see seemingly unrelated errors like
"dns lookup failure". This happens even when the response of the upstream
server is fast, as it takes a "long" time before the TIME_WAIT connections
are freed.
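
A quick way to watch this happening (assuming a Linux machine):

$ netstat -an | grep -c TIME_WAIT
$ cat /proc/sys/net/ipv4/ip_local_port_range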

Reusing TCP connections is one way to tackle this problem.
Playing around with sysctl is another:

$ sysctl -w net.ipv4.tcp_tw_recycle=1

But I am not well versed in this, and I do not know a lot
about the possible side effects.

cf. https://web3us.com/drupal6/how-guides/what-timewait-state
cf. https://onlinehelp.opswat.com/centralmgmt/What_you_need_to_do_if_you_see_too_many_TIME_WAIT_sockets.html

