Google: auto-translated content not indexed
In a recent Office Hours stream, Google’s John Mueller set out how translating content automatically, especially at scale, can lead to manual action by Google’s Web Spam Team, such as de-indexing machine-translated (MT) content.
He confirms that translated content is not counted as duplicate content, then goes into further detail on best practices for translation, namely avoiding blind use of Google’s own Translate tool.
This is a good sign for search in general, showing a willingness to clean up the SERPs, just as Google’s frequent algorithm updates have managed to de-index swathes of content mills doing all kinds of programmatic “creation”.
Here’s how he puts it:
“So just in general, translated content is unique content. It’s different words, different letters on the page, so it’s different content. Depending on how you translate it, that would be more of a quality issue.
So if you use an automatic translating tool and you just translate your whole website automatically into a different language then probably we would see that as a lower quality website because often the translations are not that great.
But if you take a translation tool and then you rework it with maybe translators who know the language and you create a better version of that content, then that’s perfectly fine.”
The future of MT content?
Bearing in mind these tools have been around for over a decade now, and in their slightly more accurate neural form for nearly 5 years, how does he see the future of this creation method panning out?
“And I imagine, over time, the translation tools will get better so that it works a little bit better. But at least for the moment, if you just automatically translate it, from a quality point of view, that would be problematic.
And even a step further, if that’s something that is done at scale, then the web spam team might step in and say, this is automatically generated content, we don’t want to index it.”
He finishes with some comments on the copy needing to be natively written, for the sake of the users:
“Like if you’re searching in your language and you find a page and you read it, and it’s like… ‘I don’t know who wrote this. This doesn’t make much sense.’ Then you wouldn’t trust that page, right?
Essentially it’s the same thing. You’re creating content for German users and if they look at it and say, ‘Oh, this doesn’t make much sense’, then they’re going to go somewhere else.”
Where does that leave us?
What he doesn’t say is how Google will detect that content is automatically translated, or is a lightly edited translation. Their NMT model doesn’t necessarily produce the same output for the same input every time, and I doubt they’ll be running comparisons on every piece of content.
This leaves the door wide open for the method to still be practised. The natural deterrents of low engagement and high bounce rates may well keep its usage relatively low.
He also doesn’t say how the spam teams would deal with a website whose entire translation, navigation included, is machine-generated. If the navigation experience is poor, such sites should also lose out to competitors over time, but that process might be sped up if Google were clearer on this aspect.
At least if it becomes widely known that MT content at scale will generally not be indexed, it might raise the bar all around. The awareness that good translation, copy, design or infrastructure can’t just be copied and pasted from one project to another should highlight the value offered by the professional creative industries.
But then, the most successful companies tend to already know that! If you’d like to get up to speed, take a look at our intro to Translation SEO for some inspiration.
If your company has a great content strategy in place and would like to gain a foothold in new markets through website and content translation, book a Free Consultation with Language Ops and we’ll be glad to talk through your options.
In brief: Does Google allow machine-translated content to rank?
At any kind of scale, no. It risks being treated as automatically generated content and left out of the index. Done well, or heavily modified? It appears so. The best practice as of 2021 is to take your existing, best-performing content and get it professionally translated or re-written. This saves you doing your research twice, and accounts for the localisation issues that an automated or cheap crowdsourced service would not pick up on.