Regex match the middle of a URL and also the ending?

Jore • 3 Jun '23
Hi all,

I have an app on a domain that a developer has set up to be proxied at
certain URLs:

    example.com/browser/123foo0/stuff.js

for example, where 123foo0 is some random key. The key may also change
length in the future.

That’s all fine.

But I’d like to intercept specific requests and not proxy them: I don’t
want to proxy anything under the /welcome path that follows the key.
For example, none of these should be proxied:

    example.com/browser/123foo0/welcome/welcome.html
    example.com/browser/foo456b/welcome/welcome.css
    example.com/browser/bar123f/welcome/welcome.js
    example.com/browser/456foob/welcome/other.stuff
    example.com/browser/foo789b/welcome/

So I tried something simple first, like:

    location ^~ /browser/.*/welcome/welcome.html {...}

but couldn’t even get that working, so I never got as far as capturing
groups for the CSS files, scripts and so on.

I also tried putting the regex in quotes, but that didn’t seem to work either.

What am I doing wrong?

Here’s a truncated version of the conf, with the location blocks only:

location ^~ "/browser/.*/welcome/welcome.html" {
    return 200 'Not proxied.\n';
    add_header Content-Type text/plain;
}

location ^~ /browser {
    proxy_pass http://127.0.0.1:1234;
    proxy_set_header Host $http_host;
}

# landing page
location / {
    root /var/www/foobar;
    index index.html;
    try_files $uri $uri/ /index.html;
}

Thanks,
Jore
Maxim Dounin • 3 Jun '23
Hello!

On Sun, Jun 04, 2023 at 12:26:55AM +1000, Jore wrote:

> Here’s a truncated version of the conf, with the location blocks only:
> 
> location ^~ "/browser/.*/welcome/welcome.html" {
>     return 200 'Not proxied.\n';
>     add_header Content-Type text/plain;
> }
>
> location ^~ /browser {
>     proxy_pass http://127.0.0.1:1234;
>     proxy_set_header Host $http_host;
> }
>
> # landing page
> location / {
>     root /var/www/foobar;
>     index index.html;
>     try_files $uri $uri/ /index.html;
> }

The "^~" location modifier is for prefix-match locations to 
prevent further checking of regular expressions, see 
http://nginx.org/r/location for details.  If you want to use a 
regular expression, you have to use the "~" modifier instead.

That is, proper configuration will look like:

location ~ ^/browser/.*/welcome/welcome\.html$ {
    # URI matches given regular expression
    ...
}

location /browser/ {
    # URI starts with /browser/
    ...
}

location / {
    # anything else
    ...
}
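Putting that structure together with the directives from the original configuration gives something like the sketch below. It is untested; the backend address, Host header and document root are copied from the question, and the regular expression ^/browser/[^/]+/welcome/ is an assumption that widens the match to everything under /welcome/ (welcome.html, .css, .js and the directory itself), with [^/]+ standing for a key of any length. The empty types block plus default_type is used instead of add_header Content-Type, on the assumption that add_header would append a second Content-Type header rather than replace the one nginx derives from the file extension.

```nginx
# Regex location: consulted because no prefix location below uses "^~".
# [^/]+ matches exactly one path segment (the random key), any length.
location ~ ^/browser/[^/]+/welcome/ {
    # Assumption: force text/plain for the stub response. The empty
    # "types" block disables the extension lookup (.html, .css, .js),
    # so default_type is what ends up in Content-Type.
    types { }
    default_type text/plain;
    return 200 'Not proxied.\n';
}

# Plain prefix location (no "^~"): everything else under /browser/
# is proxied.
location /browser/ {
    proxy_pass http://127.0.0.1:1234;
    proxy_set_header Host $http_host;
}

# Landing page.
location / {
    root /var/www/foobar;
    index index.html;
    try_files $uri $uri/ /index.html;
}
```

With this layout, /browser/123foo0/stuff.js should still be proxied, while /browser/foo456b/welcome/welcome.css and /browser/foo789b/welcome/ should get the plain-text stub.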

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/
Jore • 3 Jun '23
Hi there,

Thanks for getting back.

On 4/6/23 3:16 am, Maxim Dounin wrote:

> Hello!

[…]

> The "^~" location modifier is for prefix-match locations to prevent 
> further checking of regular expressions, see 
> http://nginx.org/r/location for details. If you want to use a regular 
> expression, you have to use the "~" modifier instead.

Thank you for that. Apologies, I should’ve mentioned that I did review 
that documentation on how nginx selects a location. Unfortunately I 
didn’t find it particularly clear or helpful.

I especially thought this rule in question would match and take 
precedence over the latter /browser rule, because of this line on that page:

    "If the longest matching prefix location has the “^~” modifier then
    regular expressions are not checked."

i.e. because this rule in question comes first and it is longer than the 
latter /browser rule, a match would occur here and not later (because 
processing stops here)?

And because I couldn’t find much on how nginx handles regex, I ended up 
checking this question/answer 
<https://stackoverflow.com/questions/59846238> on Stackoverflow. It 
cleared things up a little, but still made me wonder why my approach 
didn’t work.

Nevertheless, your suggestion to remove the priority prefix ^~ from
the second rule fixed the problem, but I’m still curious why my approach
didn’t work. ;)

Speaking of Stackoverflow, I ended up asking the question there also 
<https://stackoverflow.com/questions/76396334>. Not to take this 
conversation away from this list, but since your answer was helpful, 
feel free to chime in there too if you’re looking for some upvotes :)

Thanks,
Jore
Maxim Dounin • 4 Jun '23
Hello!

On Sun, Jun 04, 2023 at 07:30:40AM +1000, Jore wrote:

> Thank you for that. Apologies, I should’ve mentioned that I did review 
> that documentation on how nginx selects a location. Unfortunately I 
> didn’t find it particularly clear or helpful.
> 
> I especially thought this rule in question would match and take 
> precedence over the latter /browser rule, because of this line on that page:
> 
>     "If the longest matching prefix location has the “^~” modifier then
>     regular expressions are not checked."
> 
> i.e. because this rule in question comes first and it is longer than the 
> latter /browser rule, a match would occur here and not later (because 
> processing stops here)?

The most important part is in the following paragraph:

  A location can either be defined by a prefix string, or by a 
  regular expression. Regular expressions are specified with the 
  preceding “~*” modifier (for case-insensitive matching), or the 
  “~” modifier (for case-sensitive matching). To find location 
  matching a given request, nginx first checks locations defined 
  using the prefix strings (prefix locations). Among them, the 
  location with the longest matching prefix is selected and 
  remembered. Then regular expressions are checked, in the order of 
  their appearance in the configuration file. The search of regular 
  expressions terminates on the first match, and the corresponding 
  configuration is used. If no match with a regular expression is 
  found then the configuration of the prefix location remembered 
  earlier is used.

In other words:

- Regular expressions are locations with the "~*" and "~" modifiers.
  Everything else is a prefix string.

- For prefix strings, the longest matching prefix is used (note that
  the order of prefix locations is not important).

- If the longest prefix found does not disable regular expression 
  matching (with the "^~" modifier, as per the quote you've 
  provided), regular expressions are checked in order.

As soon as a regular expression is matched, nginx will use the
corresponding location.  If no regular expressions matched, nginx
will use the longest matching prefix location.
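To make those rules concrete, here is a hypothetical set of locations (the /browser/static/ and /browser/assets/ paths and the return texts are invented for the example). The comments at the end trace, by the rules above, which location each sample URI should select; they are reasoning, not captured server output.

```nginx
# Prefix locations: the longest match wins, regardless of file order.
location /browser/ {
    return 200 'prefix browser\n';
}

location /browser/static/ {
    return 200 'prefix browser static\n';
}

# If this is the longest matching prefix, regexes are not checked.
location ^~ /browser/assets/ {
    return 200 'prefix browser assets\n';
}

# Regex: checked (in file order) only when the longest matching
# prefix location does not carry "^~".
location ~ \.js$ {
    return 200 'regex js\n';
}

# /browser/foo/app.js     -> regex js (longest prefix /browser/ allows regexes)
# /browser/static/app.js  -> regex js (a longer prefix alone does not stop regexes)
# /browser/static/a.css   -> prefix browser static (no regex matched)
# /browser/assets/app.js  -> prefix browser assets ("^~" skips regexes)
```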

The "location" directive description additionally provides some 
examples explaining how this all works.  Reading the 
https://nginx.org/en/docs/http/request_processing.html article 
might also be helpful.

> And because I couldn’t find much on how nginx handles regex, I ended up 
> checking this question/answer 
> <https://stackoverflow.com/questions/59846238> on Stackoverflow. It 
> cleared things up a little, but still made me wonder why my approach 
> didn’t work.
> 
> Nevertheless, your suggestions to remove the priority prefix ^~ for
> the second rule fixed the problem, but I still wonder why my approach 
> didn’t work. ;)

In your configuration,

location ^~ "/browser/.*/welcome/welcome.html" { ... }

is a location defined by a prefix string.

It will work for requests with the given prefix, such as 
"/browser/.*/welcome/welcome.html" or 
"/browser/.*/welcome/welcome.html.foobar".  But since it is a 
prefix string, and not a regular expression, the ".*" characters 
do not have any special meaning and are matched literally.  That
is, this location won't match requests to resources like 
"/browser/foo123/welcome/welcome.html", since these use a 
different prefix.
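The same point with sample request URIs annotated (the location is the one from the question; the URIs are illustrative):

```nginx
# Prefix string: "." and "*" are ordinary characters here, not regex syntax.
location ^~ "/browser/.*/welcome/welcome.html" {
    return 200 'Not proxied.\n';
}

# /browser/.*/welcome/welcome.html         -> matches (identical prefix)
# /browser/.*/welcome/welcome.html.foobar  -> matches (starts with the prefix)
# /browser/foo123/welcome/welcome.html     -> no match ("foo123" is not ".*")
```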

To make it match requests to 
"/browser/foo123/welcome/welcome.html", you have to change the 
location to a location defined by a regular expression.  That is, you
have to change the "^~" modifier to "~" modifier (and it is also a 
good idea to change the regular expression to a slightly more 
explicit one, see my initial response).  But that alone is not
enough; see below.

Similarly,

location ^~ /browser { ... }

is also a location defined by a prefix string.  Further, due to 
the "^~" modifier, it disables matching of regular expressions, so 
any request which starts with "/browser" won't be checked against 
regular expressions.  So you have to remove the "^~" modifier if 
you want nginx to check regular expressions, notably the one in 
the first location (assuming "^~" is changed to "~").
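A sketch of why changing the first location alone is not enough: the location below is already a proper regular expression (with an assumed [^/]+ standing for the key), yet the "^~" on the /browser prefix still prevents it from being checked. The proxy settings are copied from the question.

```nginx
# Correct regex location, but never reached for /browser... URIs
# while the prefix location below keeps its "^~" modifier.
location ~ ^/browser/[^/]+/welcome/welcome\.html$ {
    return 200 'Not proxied.\n';
}

# "^~" on the longest matching prefix: regex checking is skipped,
# so /browser/foo123/welcome/welcome.html is still proxied.
location ^~ /browser {
    proxy_pass http://127.0.0.1:1234;
    proxy_set_header Host $http_host;
}

# Fix: drop the modifier, i.e. "location /browser { ... }", so the
# regex location gets checked after the prefix search.
```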

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/
Jore • 4 Jun '23
Hi there,

Yes, all that was /very/ helpful, thank you!!!

Much appreciated,

Jore
