OCSP checks fail only on first site hit; OK afterwards?

P · 9 Nov '22
I run nginx/1.23.2 on Linux.

After a clean reboot, on the first access to my site's front page, I see this in the log:

    ==> /var/log/nginx/example.com.443.error.log <==
    2022/11/09 12:38:15 [info] 1460#1460: *2 SSL_do_handshake() failed (SSL: error:0A000412:SSL routines::sslv3 alert bad certificate:SSL alert number 42) while SSL handshaking, client: 2601:...:xxx1, server: [2600:...:xxx6]:443

If I immediately reload the page in the browser, the problem is gone: the page renders OK, SSL checks out, and all site navigation is fine.

Subsequent hits to the front page are also OK.

I use Let's Encrypt certs.

Digging around, I found this from 2013:

    Can't get OCSP stapling to work, despite openssl working fine
     https://success.qualys.com/discussions/s/question/0D52L00004TnuFdSAJ/cant-get-ocsp-stapling-to-work-despite-openssl-working-fine

My config includes:

    ssl_stapling on;
    ssl_stapling_verify on;
    ssl_stapling_responder http://r3.o.lencr.org/;
    server {
        ssl_trusted_certificate ...;
    }

Checking after a cold reboot, the first connection returns no OCSP response:

    echo | openssl s_client -connect example.com:443 -servername example.com -tls1_3  -tlsextdebug -status
        CONNECTED(00000003)
        ...
        depth=0 CN = example.com
        verify return:1
        OCSP response: no response sent     <== !!
        ...
        ---
        SSL handshake has read 4384 bytes and written 318 bytes
        Verification: OK
        ---
        New, TLSv1.3, Cipher is TLS_CHACHA20_POLY1305_SHA256
        Server public key is 384 bit
        Secure Renegotiation IS NOT supported
        Compression: NONE
        Expansion: NONE
        No ALPN negotiated
        Early data was not sent
        Verify return code: 0 (ok)
        ---
        DONE

But an immediate second try does return a response:

    echo | openssl s_client -connect example.com:443 -servername example.com -tls1_3  -tlsextdebug -status
        CONNECTED(00000003)
        ...
        verify return:1
        OCSP response:
        ======================================
        OCSP Response Data:
            OCSP Response Status: successful (0x0)
            Response Type: Basic OCSP Response
            Version: 1 (0x0)
            Responder Id: C = US, O = Let's Encrypt, CN = R3
            Produced At: Nov  9 17:09:00 2022 GMT
            Responses:
            Certificate ID:
              Hash Algorithm: sha1
              Issuer Name Hash: 48D...3D1
              Issuer Key Hash: 142...2BC
              Serial Number: 022...84E
            Cert Status: good
            This Update: Nov  9 17:00:00 2022 GMT
            Next Update: Nov 16 16:59:58 2022 GMT

            Signature Algorithm: sha256WithRSAEncryption
            Signature Value:
                09:...:cf
        ======================================
        ...
        ---
        SSL handshake has read 4894 bytes and written 318 bytes
        Verification: OK
        ---
        New, TLSv1.3, Cipher is TLS_CHACHA20_POLY1305_SHA256
        Server public key is 384 bit
        Secure Renegotiation IS NOT supported
        Compression: NONE
        Expansion: NONE
        No ALPN negotiated
        Early data was not sent
        Verify return code: 0 (ok)
        ---
        DONE

So far, this is 100% reproducible for me: it happens always, and only, on the first load after boot.
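A small bash sketch for reproducing this, based on the two openssl s_client -status outputs above. `ocsp_status` classifies s_client output using the two markers shown there ("OCSP response: no response sent" vs. "OCSP Response Status: successful"), and `check_ocsp` does a single handshake. Both function names are made up for illustration; -tls1_3 and -tlsextdebug are left out since they aren't needed to see whether a response was stapled.

```shell
#!/usr/bin/env bash
# Sketch: detect whether a TLS server stapled an OCSP response.
# Function names are illustrative (hypothetical), not nginx/OpenSSL tools.

# ocsp_status: read `openssl s_client -status` output on stdin and print
#   stapled  - an OCSP response with status "successful" was stapled
#   missing  - s_client reported "no response sent"
#   unknown  - neither marker found (e.g. the connection failed)
ocsp_status() {
  local out
  out=$(cat)
  if grep -q 'OCSP Response Status: successful' <<<"$out"; then
    echo stapled
  elif grep -q 'OCSP response: no response sent' <<<"$out"; then
    echo missing
  else
    echo unknown
  fi
}

# check_ocsp HOST [PORT]: one handshake with SNI, requesting OCSP status,
# then classify the result with ocsp_status.
check_ocsp() {
  local host=$1 port=${2:-443}
  echo | openssl s_client -connect "${host}:${port}" -servername "$host" \
    -status 2>/dev/null | ocsp_status
}
```

Running `check_ocsp example.com` twice right after a cold reboot should, going by the two outputs above, print `missing` and then `stapled`.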

This 'feels' like a timeout before the OCSP response is cached, with no issues after that, but I'm not sure.

Reading up at

    https://nginx.org/en/docs/http/ngx_http_ssl_module.html

I see

    ssl_stapling_responder

        "Overrides the URL of the OCSP responder specified in the “Authority Information Access” certificate extension."

which I use, but also

    ssl_ocsp_responder

        "Overrides the URL of the OCSP responder specified in the “Authority Information Access” certificate extension for validation of client certificates. "

which I don't currently.

What's the difference in function/usage between those two?

As for caching, I also see

    ssl_ocsp_cache

which I haven't defined, so it's at the default:

    ssl_ocsp_cache off

Any clues as to what's missing or misconfigured and responsible for the first-time-only failures I see?
P · 9 Nov '22
An old 2015 post from the Caddy web server's author,

    OCSP Stapling Robustness in Apache and nginx
     https://gist.github.com/mholt/3b4910c802b2ed7e92294e26a1ae8551

comments:

    "...
    nginx's logic is a lot more robust than Apache's in this regard. Good OCSP responses are cached for an hour, but are not replaced until a successful new response has been received, meaning nginx can weather temporary OCSP responder outages. Unfortunately, nginx's logic is drastically worse in a different way: nginx kicks off OCSP queries on-demand, during the TLS handshake, but continues the handshake without waiting for the OCSP response to return. And since the OCSP response caches are unique per worker process, the first TLS connection handled by any given worker process never has a response stapled! (By the way, this makes testing whether you've properly enabled OCSP stapling rather annoying and confusing if you don't know about this.) This behavior also means that if a worker process sites idle for a long time, it doesn't refresh its OCSP responses and could staple an expired OCSP response on the next request it handles. [Update: the expired response issue is fixed in nginx 1.9.2. Now, if the cached OCSP response is expired, no response at all is stapled. A query to the OCSP responder is still initiated in the background, so subsequent handshakes should have a fresh stapled response.]
    ..."

That suggests an 'updated' behavior (back then, as of v1.9.2 and later): no OCSP response on the first try, but a response queried in the background, cached, and stapled on subsequent handshakes.

Which sounds like what I'm seeing.

Is that (still?) the current mode of operation in nginx's OCSP logic?
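One rough way to test whether the per-worker behavior described in the quoted post still holds on 1.23.x: right after a restart, make a series of handshakes and count how many come back without a staple. If each worker primes its own cache, misses should stop once every worker has fetched a response, instead of continuing indefinitely. The kernel decides which worker accepts each connection, and handshakes made before a worker's background query finishes may also miss, so this is only a rough signal. A bash sketch; `count_unstapled`, the default of 20 handshakes and the 1-second pause are arbitrary, hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: count handshakes that came back without a stapled OCSP response.
# count_unstapled is a hypothetical helper, not an nginx/OpenSSL tool.

# count_unstapled HOST [N] [PAUSE]
#   HOST  - server name, used for both -connect and SNI
#   N     - number of handshakes to attempt (default 20, arbitrary)
#   PAUSE - seconds to sleep between handshakes (default 1), giving a
#           worker's background OCSP query time to complete
# Prints the number of handshakes where s_client reported
# "no response sent". Handshakes that fail outright are not counted.
count_unstapled() {
  local host=$1 n=${2:-20} pause=${3:-1}
  local i out missing=0
  for ((i = 1; i <= n; i++)); do
    out=$(echo | openssl s_client -connect "${host}:443" \
      -servername "$host" -status 2>/dev/null)
    # Same marker s_client printed in the first output above.
    if grep -q 'OCSP response: no response sent' <<<"$out"; then
      missing=$((missing + 1))
    fi
    sleep "$pause"
  done
  echo "$missing"
}
```

Comparing the printed count against the `worker_processes` setting is a hedged way to see whether the misses line up with the number of workers.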
P · 9 Nov '22
This 2012 post

    Priming the OCSP cache in Nginx
     https://unmitigatedrisk.com/?p=241

comments:

    "...
    in Nginx 1.3.7, unfortunately architectural restrictions made it impractical to make it so that pre-fetching the OCSP response on server start-up so instead the first connection to the server primes the cache that is used for later connections.

    This is a fine compromise but what if you really want the first connection to have the benefit too? Well there are two approaches you can take:
    ..."

OCSP pre-fetching is also a challenge that Cloudflare took on in 2017, outside of the nginx it was using at the time:

    High-reliability OCSP stapling and why it matters
     https://blog.cloudflare.com/high-reliability-ocsp-stapling/

Adding to

    edit /etc/systemd/system/nginx.service

    +   ExecStartPost=/bin/bash /etc/nginx/scripts/ocsp_prefetch.sh

where

    cat /etc/nginx/scripts/ocsp_prefetch.sh

iterates over served domains,

    echo QUIT | openssl s_client -connect ${_thisDom}:443 -servername ${_thisDom} -tls1_3  -tlsextdebug -status 2> /dev/null

Does the trick. After a cold reboot, the first hits to the site(s) no longer fail in the browser, and openssl s_client queries no longer come back without an OCSP response.
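The actual ocsp_prefetch.sh isn't shown, so here is a minimal sketch of what such a script might look like, assuming the served domains are passed as arguments. The function name `prime_ocsp`, the `ROUNDS` knob and the 10-second `timeout` are hypothetical choices. Because the quoted 2015 post says the OCSP response cache is per worker process, the sketch makes several handshakes per domain in the hope of reaching more than one worker. Which worker accepts a given connection is up to the kernel, so this is a heuristic, not a guarantee. It always exits 0, because a failing ExecStartPost= command makes systemd mark the unit as failed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of ocsp_prefetch.sh (the real script is not shown).
# Usage: ocsp_prefetch.sh example.com www.example.com

# ROUNDS (hypothetical): handshakes per domain. Per the quoted post, nginx
# caches stapled responses per worker, so extra rounds give more workers a
# chance to start their own background OCSP query.
ROUNDS=${ROUNDS:-4}

prime_ocsp() {
  local dom i
  for dom in "$@"; do
    for ((i = 0; i < ROUNDS; i++)); do
      # One handshake requesting OCSP status; output is discarded.
      # timeout keeps a stuck connection from stalling service startup.
      echo QUIT | timeout 10 openssl s_client -connect "${dom}:443" \
        -servername "$dom" -status >/dev/null 2>&1
    done
  done
  # Always succeed: a failing ExecStartPost= command marks the unit failed.
  return 0
}

# Run only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  prime_ocsp "$@"
fi
```

With the ExecStartPost line above, the domains would be appended as arguments, e.g. ExecStartPost=/bin/bash /etc/nginx/scripts/ocsp_prefetch.sh example.com www.example.com. Alternatively, systemd's own "-" prefix (ExecStartPost=-/bin/bash ...) tells it to ignore a non-zero exit status.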

Is there a native nginx prefetch mechanism available in the current version?

I found this 7-year-old enhancement request,

    Fetch OCSP responses on startup, and store across restarts
     https://trac.nginx.org/nginx/ticket/812

which, AFAICT, wasn't resolved.