Question about proxy

S
  • 29 Jan '23
On my website, I proxied
https://perplexity.ai
through a domain of mine,
but when I get redirected, the browser's address bar shows the original domain, not my own.
In other cases, I see my own domain in the address bar.
What causes each case, i.e., what do I need to do so that the domain
shown is never the original domain being proxied, but always my
own domain (https://disney.ibm.com)?

This is the configuration in question:

server {
    default_type application/octet-stream;
    set $template_root /usr/local/openresty/nginx/html/templates;
    listen 0.0.0.0:443 ssl;
    # reuseport;
    error_log logs/error.log warn;
    access_log logs/access.log;
    server_name disney.ibm.com;
    ssl_certificate /etc/letsencrypt/live/disney.ibm.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/disney.ibm.com/privkey.pem;

    location / {
        # proxy_cookie_domain takes bare domain names, not URLs
        proxy_cookie_domain perplexity.ai disney.ibm.com;
        proxy_buffering on;
        resolver 127.0.0.1 ipv6=off;
        proxy_http_version 1.1;
        proxy_buffer_size 128k;
        proxy_busy_buffers_size 256k;
        proxy_buffers 4 256k;
        proxy_set_header User-Agent $http_user_agent;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $http_connection;
        proxy_set_header Accept-Encoding "";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_ssl_server_name on;
        proxy_ssl_name $proxy_host;
        proxy_set_header Host perplexity.ai;
        proxy_pass https://perplexity.ai;
        proxy_redirect https://perplexity.ai https://disney.ibm.com;
        subs_filter_types text/css text/javascript application/javascript;
        subs_filter "https://cdn*.perplexity.ai/(.*)" "https://disney.ibm.com/cdn*/$1" gi;
        subs_filter "https://perplexity.ai/(.*)" "https://disney.ibm.com/$1" gi;
        subs_filter "https://(.*).perplexity.ai/(.*)" "https://disney.ibm.com/$1/$2" gi;
        subs_filter "https://www.perplexity.ai" "https://disney.ibm.com" gi;
        subs_filter "https://perplexity.ai" "https://disney.ibm.com" gi;
        subs_filter "perplexity.ai" "disney.ibm.com" gi;
    }
}
F
  • 31 Jan '23
On Sun, Jan 29, 2023 at 03:17:15PM -0500, Saint Michael wrote:

Hi there,

> What causes each case, i.e., what do I need to do so always the
> https://domain.com is NOT the original domain being proxied, but my
> own domain (https://disney.ibm.com).

You seem to be using the module at
https://github.com/yaoweibin/ngx_http_substitutions_filter_module.

You probably want subs_filter_types to include text/html, and you probably
want "r" on the subs_filter patterns that are regular expressions rather
than fixed strings.

Generally, you proxy_pass to a server you control, so it may be easier
to adjust the upstream so that subs_filter is not needed. But basically:
you want any string in the response that the browser will interpret as
a url, to be on your server not on the upstream one.

So in this case, you can test the output of things like "curl -i
https://disney.ibm.com/something", and see that it does not contain any
unexpected mention of perplexity.ai.
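
The Location header of a redirect is one place where such a string can
reach the browser without going through subs_filter at all, since
subs_filter only touches response bodies. As a minimal sketch, assuming
the upstream sometimes redirects to its www host (an assumption; the
actual Location values are not shown in this thread), proxy_redirect
can carry one rule per upstream host name:

```nginx
location / {
    proxy_pass https://perplexity.ai;

    # Rewrite Location/Refresh headers that point at the bare upstream host.
    proxy_redirect https://perplexity.ai/ https://disney.ibm.com/;

    # Assumption: the upstream may also redirect to its www host.
    # Without a matching rule, that Location header is passed through
    # unchanged and the browser ends up on the upstream domain.
    proxy_redirect https://www.perplexity.ai/ https://disney.ibm.com/;
}
```

The "Location:" line in the "curl -i" output shows which host names
actually need a rule.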

> subs_filter_types text/css text/javascript application/javascript;
> subs_filter "https://cdn*.perplexity.ai/(.*)"
> "https://disney.ibm.com/cdn*/$1" gi
> subs_filter "https://perplexity.ai/(.*)" "https://disney.ibm.com/$1" gi;
> subs_filter "https://(.*).perplexity.ai/(.*)" "https://disney.ibm.com/$1/$2" gi;
> subs_filter "https://www.perplexity.ai" "https://disney.ibm.com" gi;
> subs_filter "https://perplexity.ai" "https://disney.ibm.com" gi;
> subs_filter "perplexity.ai" "disney.ibm.com" gi;

If you do see an unexpected mention, you can try to see why it is there
-- especially the first subs_filter above, I'm not certain what it
is trying to do; and the second one probably does not need the regex
parts at all -- the fifth and sixth ones probably both do the same
thing as it. The third and fourth seem to have different ideas of how
"https://www.perplexity.ai/something" should be substituted; maybe you
have a test case which shows why both are needed.
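
As a rough sketch of that reading, and assuming no test case turns out
to need the overlapping rules, the fixed-string substitutions might
collapse to something like the following. The host names are the ones
from the original post; mapping the www host to the site root is an
assumption.

```nginx
# Fixed-string rules: the pattern is matched literally, so no "r" flag.
# "g" replaces every occurrence, "i" ignores case.
subs_filter "https://www.perplexity.ai" "https://disney.ibm.com" gi;
subs_filter "https://perplexity.ai"     "https://disney.ibm.com" gi;
```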

Good luck with it,

    f
-- 
Francis Daly        francis at daoine.org
S
  • 31 Jan '23
Can you please elaborate on this:
"You probably want subs_filter_types to include text/html, and you probably
want "r" on the subs_filter patterns that are regular expressions rather
than fixed strings"
One example will suffice.

F
  • 31 Jan '23
On Mon, Jan 30, 2023 at 10:39:52PM -0500, Saint Michael wrote:

Hi there,

> Can you please elaborate on this:
> "You probably want subs_filter_types to include text/html, and you probably
> want "r" on the subs_filter patterns that are regular expressions rather
> than fixed strings"
> one example will suffice.

https://github.com/yaoweibin/ngx_http_substitutions_filter_module includes:

"""
  Example
    location / {

        subs_filter_types text/html text/css text/xml;
        subs_filter st(\d*).example.com $1.example.com ir;
        subs_filter a.example.com s.example.com;
        subs_filter http://$host https://$host;
    }
"""

along with explanations of each directive.

(If that's *not* the module that you are using, then the documentation
for your module should show something similar.)

Although I do see that some later text suggests that text/html content is
always searched, so maybe being explicit about that in subs_filter_types
is not necessary.
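
Applied to the configuration from the original post, a pattern that
relies on a capture group would carry the "r" flag, roughly as in the
sketch below. The idea of mapping a subdomain to a path prefix is taken
from the original rules; the character class used for the subdomain is
an assumption, and the proxy would still need locations that serve
those path prefixes.

```nginx
location / {
    # text/html is listed explicitly here; per the module's own text it
    # may be searched regardless of this list.
    subs_filter_types text/html text/css text/javascript application/javascript;

    # "r" marks the pattern as a regular expression, so $1 refers to the
    # captured subdomain; the dots are escaped to match literally.
    subs_filter "https://([a-z0-9-]+)\.perplexity\.ai/" "https://disney.ibm.com/$1/" gir;
}
```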

Cheers,

    f
-- 
Francis Daly        francis at daoine.org