If I'm reading that right, this could still read a couple of bytes past the wanted memory area.
Yeah, this is possible.
For example, imagine a 65-byte area whose start is unaligned by more than 2 bytes: the first loop consumes those bytes, so fewer than 64 remain. You'd want to check the remaining size after the first loop, not the initial one.
I'd be OK with a quick byte loop for the less-than-64-byte case, rather than spreading around more checks that depend on sizeof(size_t), like Bertrand is suggesting.
I'm OK with that too. Maybe we are trying to optimize too early.
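To make the overrun concern concrete, here is a rough sketch of the shape discussed above: a plain byte loop for short inputs, and for longer ones a word-wise loop whose bound is taken from the bytes remaining after the alignment loop, not from the initial length. The function name `memory_is_all_zeros_sketch` is made up for illustration, the 64-byte cutoff is just the figure mentioned in this thread, and the `memcpy` load is my own choice to keep the example free of aliasing questions; none of this is meant as the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns true if all len bytes at ptr are zero. */
static bool
memory_is_all_zeros_sketch(const void *ptr, size_t len)
{
    const unsigned char *p = (const unsigned char *) ptr;
    const unsigned char *end = p + len;

    /* Short inputs: quick byte loop, no alignment bookkeeping. */
    if (len < 64)
    {
        while (p < end)
        {
            if (*p++ != 0)
                return false;
        }
        return true;
    }

    /*
     * Byte-wise until p sits on a size_t boundary.  This consumes fewer
     * than sizeof(size_t) bytes and len >= 64, so it cannot pass end.
     */
    while (((uintptr_t) p % sizeof(size_t)) != 0)
    {
        if (*p++ != 0)
            return false;
    }

    /*
     * Word-wise, bounded by what is left *now* (end - p), not by the
     * initial len.  memcpy is used for the load to sidestep aliasing
     * concerns; compilers typically turn it into a single load.
     */
    while ((size_t) (end - p) >= sizeof(size_t))
    {
        size_t word;

        memcpy(&word, p, sizeof(word));
        if (word != 0)
            return false;
        p += sizeof(size_t);
    }

    /* Whatever is left is shorter than one word. */
    while (p < end)
    {
        if (*p++ != 0)
            return false;
    }
    return true;
}
```

With the 65-byte, unaligned case from above, the word loop stops as soon as fewer than sizeof(size_t) bytes remain and the tail loop handles the rest, so nothing past the area is read.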