Nginx prematurely closing connections when reloaded

S
  • 25 Mar '24
Hello

I have an issue with nginx prematurely closing connections when a reload is
performed.

I have some nginx servers configured to proxy_pass requests to an upstream
group. This group is itself composed of several servers, which are
themselves nginx, and is configured to use keepalive connections.

When I trigger a reload (-s reload) on the nginx of one of the servers
targeted by the upstream, I see in the error logs of all the servers in
front that the connection was reset by the nginx which was reloaded.

Here is the configuration of the upstream group (IPs are hidden, replaced by IP_X):

--- BEGIN ---

upstream data_api {
    random;

    server IP_1:80 max_fails=3 fail_timeout=30s;
    server IP_2:80 max_fails=3 fail_timeout=30s;
    server IP_3:80 max_fails=3 fail_timeout=30s;
    server IP_4:80 max_fails=3 fail_timeout=30s;
    server IP_5:80 max_fails=3 fail_timeout=30s;
    server IP_6:80 max_fails=3 fail_timeout=30s;
    server IP_7:80 max_fails=3 fail_timeout=30s;
    server IP_8:80 max_fails=3 fail_timeout=30s;
    server IP_9:80 max_fails=3 fail_timeout=30s;
    server IP_10:80 max_fails=3 fail_timeout=30s;

    keepalive 20;
}

--- END ---

Here is the configuration of the location using this upstream:

--- BEGIN ---

location / {
    proxy_pass http://data_api;

    proxy_http_version 1.1;
    proxy_set_header Connection "";

    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $real_ip;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    proxy_connect_timeout 2s;
    proxy_send_timeout 6s;
    proxy_read_timeout 10s;

    proxy_next_upstream error timeout http_502 http_504;
}

--- END ---

And here is the kind of error message I get when I reload the nginx at "IP_1":

--- BEGIN ---

2024/03/25 11:24:25 [error] 3758170#0: *1795895162 recv() failed (104:
Connection reset by peer) while reading response header from upstream,
client: CLIENT_IP_HIDDEN, server: SERVER_HIDDEN, request: "POST
/REQUEST_LOCATION_HIDDEN HTTP/2.0", upstream:
"http://IP_1:80/REQUEST_LOCATION_HIDDEN", host: "HOST_HIDDEN", referrer:
"REFERRER_HIDDEN"

--- END ---

I thought -s reload performed a graceful shutdown of connections. Is it
because nginx cannot handle that when using keepalive connections? Is it a
bug?

I am using nginx 1.24.0 everywhere, nothing particular otherwise.

Thank you for any help.

Sébastien
C
  • 25 Mar '24
What language is your upstream API written in, and are you hosting in
Amazon? Even if you aren't hosting in Amazon, your values seem quite low
for connect, read, and send.

If you are running NodeJS for the upstream, one thing I have found is that
certain values need to be offset to avoid 502 errors:

1. nginx:
proxy_read_timeout and proxy_send_timeout should be set higher, if not
equal (60s).

2. NodeJS:
server.keepAliveTimeout = 70 * 1000; // keep idle sockets open longer than the nginx timeouts
server.headersTimeout = 75 * 1000;   // must slightly exceed keepAliveTimeout

keepAliveTimeout for the NodeJS app needs to be longer than the nginx
timeouts, and headersTimeout needs to be slightly longer than
keepAliveTimeout.
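
As a minimal sketch of the nginx side of this advice (the 60s values are
the suggestion above, not the original configuration):

location / {
    proxy_pass http://data_api;
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # raised from the original 10s and 6s, per the suggestion above
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
}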

Not sure if this applies to other languages but that was an issue we ran
into a few years ago.

Craig

I
  • 25 Mar '24
Sébastien,

Nginx should keep active connections open and wait for a request to 
complete before closing.
A reload starts a new set of workers, while the old workers wait for the 
old connections to finish before shutting down.
The only exception I'm aware of is having worker_shutdown_timeout 
configured: in this case a worker will wait until this timeout and then 
forcibly close a connection. By default there is no timeout.
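
For illustration, a sketch of that directive in the main context (the 10s
value here is hypothetical, not from this thread):

# main context; after a reload, old workers would forcibly close
# any connection still open once this timeout expires
worker_shutdown_timeout 10s;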

It would be interesting to see the error log of the nginx at IP_1 (the 
reloaded one) while the reload happens. It may explain the reason for the 
connection resets.

Kind regards,
Igor.

S
  • 26 Mar '24
Hi Igor

There are no special logs on the IP_1 (the reloaded one) side, only one log
line, which is expected:
--- BEGIN ---
2024/03/26 13:37:55 [notice] 3928855#0: signal process started
--- END ---

I did not configure worker_shutdown_timeout, so it is unlimited.

Sébastien.

I
  • 28 Mar '24
Sébastien,

The message about the signal process is only the beginning. You are
looking for messages like the following:

> 2024/03/26 13:36:36 [notice] 723#723: signal 1 (SIGHUP) received from 
> 69064, reconfiguring
> 2024/03/26 13:36:36 [notice] 723#723: reconfiguring
> 2024/03/26 13:36:36 [notice] 723#723: using the "epoll" event method
> 2024/03/26 13:36:36 [notice] 723#723: start worker processes
> 2024/03/26 13:36:36 [notice] 723#723: start worker process 69065
> 2024/03/26 13:36:36 [notice] 723#723: start worker process 69066
> 2024/03/26 13:36:36 [notice] 723#723: start cache manager process 69067
> 2024/03/26 13:36:36 [notice] 61903#61903: gracefully shutting down
> 2024/03/26 13:36:36 [notice] 61905#61905: exiting
> 2024/03/26 13:36:36 [notice] 61903#61903: exiting
> 2024/03/26 13:36:36 [notice] 61904#61904: gracefully shutting down
> 2024/03/26 13:36:36 [notice] 61904#61904: exiting
> 2024/03/26 13:36:36 [notice] 61903#61903: exit

Note the 'gracefully shutting down' and 'exiting' messages from the 
workers, and the 'start' and 'reconfiguring' messages from the master 
process. There should be a similar sequence somewhere in your logs.
Having these logs may help explain what happens on a reload.

Kind regards,
Igor.

S
  • 28 Mar '24
Hi Igor,

Thanks for the answer.

I really do get the message 'signal process started' every time I do
'nginx -s reload', and this is the only log line I have; I don't have the
other lines you mentioned. Is there anything to do to enable those logs?

Sébastien

I
  • 28 Mar '24
Sébastien,

Is it possible that the messages go to another log file? They go to the 
main error log, defined in the root context.
Another common pitfall is a log level above notice. Try setting the error 
log to a more verbose level, maybe?
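
For example, a minimal sketch of such a directive in the main (root)
context (the path is illustrative):

# main context, outside http {}; notice level or lower is needed
# to see the reload/shutdown messages from master and workers
error_log /var/log/nginx/error.log notice;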

Regards,
Igor.

S
  • 29 Mar '24
Hi Igor,

I did not have an error_log directive in the main context, so it took the
default configuration, which seems to be why I only saw one log line. I
added the directive and now I have more logs when I do nginx -s reload:
2024/03/29 09:04:20 [notice] 1064394#0: signal process started
2024/03/29 09:04:20 [notice] 3718160#0: signal 1 (SIGHUP) received from
1064394, reconfiguring
2024/03/29 09:04:20 [notice] 3718160#0: reconfiguring
2024/03/29 09:04:20 [notice] 3718160#0: using the "epoll" event method
2024/03/29 09:04:20 [notice] 3718160#0: start worker processes
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064395
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064396
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064397
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064398
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064399
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064400
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064401
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064402
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064403
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064404
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064405
2024/03/29 09:04:20 [notice] 3718160#0: start worker process 1064406
2024/03/29 09:04:20 [notice] 1063598#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063599#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063600#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063601#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063602#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063603#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063604#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063607#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063608#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063597#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063605#0: gracefully shutting down
2024/03/29 09:04:20 [notice] 1063609#0: gracefully shutting down
2024/03/29 09:04:23 [notice] 3718160#0: signal 17 (SIGCHLD) received from
3989432
2024/03/29 09:04:23 [notice] 3718160#0: worker process 3989432 exited with
code 0
2024/03/29 09:04:23 [notice] 3718160#0: signal 29 (SIGIO) received
2024/03/29 09:04:26 [notice] 1060347#0: exiting
2024/03/29 09:04:26 [notice] 1060347#0: exit
2024/03/29 09:04:26 [notice] 3718160#0: signal 17 (SIGCHLD) received from
1060347
2024/03/29 09:04:26 [notice] 3718160#0: worker process 1060347 exited with
code 0
2024/03/29 09:04:26 [notice] 3718160#0: signal 29 (SIGIO) received
2024/03/29 09:04:29 [notice] 3718160#0: signal 17 (SIGCHLD) received from
3989423
2024/03/29 09:04:29 [notice] 3718160#0: worker process 3989423 exited with
code 0
2024/03/29 09:04:29 [notice] 3718160#0: signal 29 (SIGIO) received
... etc ...

Could the problem I encounter be linked to that discussion?
https://mailman.nginx.org/pipermail/nginx-devel/2024-January/YSJATQMPXDIBETCDS46OTKUZNOJK6Q22.html
It seems to be a race condition between a client that has started to send
a new request at the same time as the server has decided to close the idle
connection. Is there a plan to add a bugfix in nginx to handle that
properly?
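
If that race is indeed the cause, one mitigation sometimes suggested (an
assumption here, not advice given in this thread) is to let nginx retry
non-idempotent requests on another upstream server, since POSTs are not
retried by default:

location / {
    proxy_pass http://data_api;
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # non_idempotent allows nginx to retry POST requests on another
    # server; only safe if the API tolerates duplicate requests
    proxy_next_upstream error timeout http_502 http_504 non_idempotent;
}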

Thanks,

Sébastien
