I wonder if something like this would help. We could create a function that accepts two names, does a fair bit of normalization of each of them, and then computes a similarity score.
Then on import, we could offer every name in the wiki that scores above a certain threshold as a normalization candidate, which the user can accept or ignore. I coded a very naïve implementation, which gives us values like this:
"Simone de Beauvoir" is a 100% match with "Beauvoir, Simone de"
"Simone de Beauvoir" is a 100% match with "de Beauvoir, Simone"
"Márquez, Gabriel García" is a 100% match with "Gabriel Garcia Marquez"
"Mills, Charles W." is a 90.25% match with "C Wright Mills"
"Nussbaum, Martha C." is a 90% match with "Nussbaum, Martha"
"Martha Nussbaum" is a 95% match with "M Nussbaum"
"Martha Nussbaum" is a 85.5% match with "M C Nussbaum"
"Martha C. Nussbaum" is a 95% match with "Martha Craven Nussbaum"
"Martha Nussbaum" is a 90% match with "Martha H Nussbaum"
"Friedrich Nietzsche" is a 90% match with "Nietzsche"
"T. S. Eliot" is a 90.25% match with "Eliot, Thomas Stearns"
"Martha C. Nussbaum" is a 50% match with "Martha H Nussbaum"
"Martha C. Nussbaum" is a 47.5% match with "M.H. Nussbaum"
"J.R.R. Tolkein" is a 25% match with "G.R.R. Martin"
"Prince" is a 40.5% match with "Prince Rogers Nelson"
"Beyoncé Giselle Knowles-Carter" is a 36.45% match with "Beyonce"
This is written in JS and might not be easy to port to wikitext, but since I think it would make the most sense for this to end up as an operator, that doesn’t bother me much.
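For illustration only, here is a minimal sketch of the normalize-then-score idea. It is not the implementation that produced the numbers above and will not reproduce those percentages. The names `normalizeName`, `tokenMatch`, and `nameSimilarity`, the 0.9 credit for an initial, the half credit for a missing given name, and the 3:2 surname weighting are all assumptions made up for this example.

```javascript
// Hypothetical sketch; every name and weight here is an assumption.

// Flip "Last, First" to "First Last", strip diacritics and punctuation,
// lowercase, and return the remaining tokens.
function normalizeName(name) {
  const flipped = name.includes(',')
    ? name.split(',').map(part => part.trim()).reverse().join(' ')
    : name;
  return flipped
    .normalize('NFD')                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, ' ')    // punctuation such as "." becomes a space
    .split(/\s+/)
    .filter(Boolean);
}

// 1 for identical tokens, 0.9 when one side is an initial of the other, else 0.
function tokenMatch(a, b) {
  if (a === b) return 1;
  if ((a.length === 1 || b.length === 1) && a[0] === b[0]) return 0.9;
  return 0;
}

// Returns a score between 0 and 1. Assumes the last token is the surname
// and weights it more heavily than the given names.
function nameSimilarity(nameA, nameB) {
  const a = normalizeName(nameA);
  const b = normalizeName(nameB);
  if (a.length === 0 || b.length === 0) return 0;

  const surname = tokenMatch(a[a.length - 1], b[b.length - 1]);
  const givenA = a.slice(0, -1);
  const givenB = b.slice(0, -1);
  const longer = Math.max(givenA.length, givenB.length);
  if (longer === 0) return surname;

  let total = 0;
  for (let i = 0; i < longer; i++) {
    total += (i < givenA.length && i < givenB.length)
      ? tokenMatch(givenA[i], givenB[i])
      : 0.5;                          // a missing given name gets half credit
  }
  return (3 * surname + 2 * (total / longer)) / 5;
}

console.log(nameSimilarity('Simone de Beauvoir', 'Beauvoir, Simone de')); // 1
```

Because the surname carries most of the weight in this sketch, "Nietzsche" scores higher against "Friedrich Nietzsche" than "Prince" does against "Prince Rogers Nelson".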
The approach will likely need many adjustments to hit your 98% target, but I think it could get close with a threshold of a 50% match.
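The import step could then be sketched roughly as follows. This is only an assumption about how the pieces might fit together: `suggestMatches` is a hypothetical helper, `score` stands for whatever similarity function is used (returning a value between 0 and 1), and the default threshold of 0.5 mirrors the 50% figure above.

```javascript
// Hypothetical helper: list existing names whose score against the
// incoming name meets the threshold, best match first.
function suggestMatches(incoming, existingNames, score, threshold = 0.5) {
  return existingNames
    .map(name => ({ name, score: score(incoming, name) }))
    .filter(candidate => candidate.score >= threshold)
    .sort((x, y) => y.score - x.score);
}

// Toy scorer for demonstration only: 1 for a case-insensitive exact match, else 0.
const toyScore = (a, b) => (a.toLowerCase() === b.toLowerCase() ? 1 : 0);

console.log(suggestMatches('martha nussbaum', ['Martha Nussbaum', 'Elise Springer'], toyScore));
// [ { name: 'Martha Nussbaum', score: 1 } ]
```

The user would then accept one of the suggestions or ignore them and keep the name as imported.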
It is biased toward last names, so “Nietzsche” is closer to “Friedrich Nietzsche” than “Prince” is to “Prince Rogers Nelson”. There are some potential problems. Right now “Van Halen” and “Eddie Van Halen” are only a 45% match, which seems too low. And totally unrelated names like “Scott Sauyet” and “Elise Springer” are a 25% match, which seems too high. But neither seems a show-stopper.
If this seems like a promising approach, I can explain the implementation more completely.