I know it's an absolutely wonky workaround, but you could use a second phone and enable Google speech input, or a FOSS alternative: FUTO Voice Input (a local model that works pretty great, better than Google imo. It's better at finding the correct words and also at putting in logical punctuation, as in when a comma or period should appear.)
Then you enable speech input on one phone and play back the dude's voice message on the other. Now you've got all the text.
Not shitting ya here. Your comment actually made me think about what useless luxuries I'm chasing, what are just gimmicks, and what are goals that might actually make me happier in the long run.