Of course “only an expert” (that is, one who knows the actual person plus various technical details, one recent “technical” threshold being how human hands actually look) is good at distinguishing synthetic photo imagery from the real thing.
Finding and checking with those who can tell the difference has never been “quick,” and the problem is not in principle new.
Only an expert in a particular painter, plus various technical details of how paint works, is good at distinguishing a forgery…
Only someone who has read and interpreted Kant closely can tell that Eichmann (or the character on The Good Place) is paraphrasing Kant in a misleading way…
Only an expert in what undergraduate writing looks like, including the writing of the particular undergraduate submitting an assignment, has a good eye for LLM counterfeit…
The real difficulty is still not whether and how the best-positioned person can generally tell the difference. The difficulty is in the increased bandwidth noise, resource costs, and attention burdens caused by needing to have such conversations so frequently, in a world where so many interactions lack the dense social familiarity that helps us track trustworthiness and credibility.