nginx serving wrong proxy content, static assets not affected

3 Jan '23

I have several servers hosting multiple Rails sites, using nginx as a reverse proxy. All sites have unique host names and, at least at first, nginx returns the content for each site correctly (dynamic content from Rails as well as static assets such as images, javascript and CSS).

Both Rails and nginx are running in Docker containers. I am using nginx 1.23.1 running in a Docker image built from the official Debian Docker image (I only added certbot for TLS certificate processing). nginx connects to each proxy via HTTP using the internal service name defined in the Docker network by each docker-compose.yml file.

For some reason, some time after nginx starts, it gets confused and begins serving content from the wrong proxy. For instance, requesting a page from aaa.com returns the Rails content for bbb.com; requesting bbb.com returns content from ccc.com; and requesting ccc.com returns content from aaa.com. This problem only affects the proxied content; the static assets for aaa.com are returned as expected (and so on for all the other sites).

This is not a problem of nginx simply returning content from the wrong site. If you request a page from aaa.com, since the dynamic content (HTML markup) comes from bbb.com, the markup will contain references to assets from bbb.com, but since the URLs are relative they will be requested from aaa.com; those assets return 404 Not found because they do not exist in aaa.com. If you change the URL manually and request them from bbb.com, they are returned with no problem, so it’s not that nginx can’t resolve the host name, just that the proxy content is being routed incorrectly.

Also, I don’t believe this is a problem with Docker. When nginx gets confused, I can run a shell in any Docker container and connect directly to Rails (by running a cURL command pointing to the proxy URL configured for each site in nginx), and I get the correct content every time. It is only when I request it through nginx that the content comes from a different site than the one requested.

I have no idea what triggers this behavior. Once it happens, the only thing that can be done to correct it is to restart nginx. After a restart, the server functions as expected again for a while (minutes, hours, or days) until the problem recurs. Since I am using this setup on several production servers, at first I created a cron job to restart nginx every day, then every hour, and finally I decided to poll the sites on each server every five minutes, so that if the responses don’t look right I can restart nginx without having users experience a lengthy interruption.

I have confirmed that this problem occurs on more than one server (although on one server I have only observed it once). I have also set up a staging server that is as close to one of the production servers as possible, but so far the problem has not occurred there (since this staging server does not get any traffic, the problem may never surface if it is triggered by a particular kind of incoming request).

I have asked a question at Server Fault (https://serverfault.com/questions/1117412/nginx-serving-content-from-wrong-proxy), where I have posted sanitized versions of the nginx configuration for two sample sites. I can post here the same or any other configuration that might help diagnose this. I have never seen nginx behave like this before (I have used it to host multiple Rails sites for years without any problems—only not in combination with Docker).

Any suggestions as to what to look for or what to try would be most appreciated.

—
Eduardo Kortright
EWTN Online Services
(205) 271-2900
(205) 332-4835 (cell)

Payam

4 Jan '23

It sounds like you are seeing either hash collisions or incorrect DNS
resolution.

- Are you caching? If so, set a larger key size.
- Have you looked at DNS and host resolution for the impacted requests?

—
Payam

-- 
Payam Tarverdyan Chychi

Ángel

6 Jan '23

On 2023-01-03 at 20:54 +0000, Eduardo Kortright wrote:
> I have no idea what triggers this behavior. Once it happens, the only
> thing that can be done to correct it is to restart nginx. After that
> (could be minutes, hours, or days), the server will function as
> expected once again. Since I am using this setup in several
> production servers, at first I created a cron job to restart nginx
> every day, then every hour, and finally I decided to poll the sites
> on each server every five minutes, so that if the responses don’t
> look right I can restart nginx without having users experience a
> lengthy interruption.

Did you try a reload instead of a restart? That is usually enough to
get nginx to update its upstream addresses, and it is transparent to
your users.

As for the actual problem, as I understand it you have four Docker
containers:
- aaa.com (Rails app)
- bbb.com (Rails app)
- ccc.com (Rails app)
- proxy (nginx, with the static assets for the 3 sites)

Do the IP addresses for the Rails sites change over time?
Mind that nginx will query the hostname only once (at startup/reload),
*and use that same IP from then on*.
If the other containers switched IPs, that would produce the exact
behavior that you are seeing.

You can force nginx to requery DNS by using a variable;
see https://forum.nginx.org/read.php?2,215830,215832#msg-215832
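
As a rough illustration of the variable approach, here is a minimal sketch. It assumes nginx sits on a user-defined Docker network, where Docker's embedded DNS server listens on 127.0.0.11, and it uses a hypothetical Compose service name (aaa-app) and port (3000) that are not taken from the original configuration:

```nginx
server {
    listen 80;
    server_name aaa.com;

    # Docker's embedded DNS server on user-defined networks (assumption:
    # the containers are not on the default bridge network).
    # valid=10s overrides the record TTL so cached answers expire quickly.
    resolver 127.0.0.11 valid=10s;

    location / {
        # Because proxy_pass contains a variable, nginx resolves the name
        # at request time through the resolver above, instead of only once
        # at startup/reload.
        # "aaa-app" and port 3000 are hypothetical placeholders.
        set $aaa_upstream http://aaa-app:3000;
        proxy_pass $aaa_upstream;
        proxy_set_header Host $host;
    }
}
```

With something like this in place, a container that comes back with a different IP address should be picked up once the cached DNS answer expires, without a reload. The 10-second validity is only an example value.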

Eduardo

6 Jan '23

I'll bet that's it!  There is nothing in my configuration that makes the IP addresses of the containers in the Docker network stay fixed.  I would not be surprised if, when two or more containers are restarted (as they probably are every once in a while when logrotate runs), some or all of them exchange IP addresses.

I will try to duplicate this so I can post the results here, but in any case I will find out how to assign specific IP addresses to the containers in the Docker configuration and do that from now on.  Your observation that nginx looks up the IP once and assumes it will not change would explain what is going on.

I can't thank you enough, as this was driving me crazy.

Thank you also for your other very helpful suggestions (reloading nginx instead of restarting, forcing DNS lookups).


Eduardo

6 Jan '23

On my staging server, I stopped container aaa (with IP address x.x.x.5); I then restarted container bbb (with IP address x.x.x.6); finally, I started aaa again.  As expected, when bbb restarted it claimed aaa’s old IP address, since it was the lowest available address.  When aaa started back up, it took bbb’s old IP, so they ended up swapping IP addresses, but nginx thinks they’re at their original locations.

Once again, thank you for your help with this.  As I mentioned, I’m probably just going to make Docker assign fixed addresses to each container so that nginx can look up the names once.

If you are interested in that sort of thing, please leave an answer at https://serverfault.com/questions/1117412/nginx-serving-content-from-wrong-proxy and I’ll be happy to mark it correct.


Eduardo

6 Jan '23

Hi Payam,

I’m not doing any caching.  It does look like a DNS problem: the Docker containers occasionally change their IP addresses when they are restarted, and, as Ángel pointed out, nginx does not resolve the name on each request but only when it loads the configuration.  Thank you for your help.
