WDF*IDF is a formula popular in SEO circles for testing whether a text uses a sufficiently broad range of relevant terms without neglecting the actual keyword or, conversely, overusing that focus keyword.
Whether Google applies it (and if so, in what form and to what extent) is unclear. Nevertheless, writers and SEOs can draw useful conclusions from the WDF*IDF values of their texts. In particular, the formula helps show which terms other texts on the same topic frequently use on the web. In the simplest case, it suggests which additional terms you could use to improve a text.
WDF*IDF as a challenge to keyword density
For a long time, SEO circles treated the WDF*IDF formula as a magic formula. Although hardly anyone understood how to calculate it (or really wanted to), it quickly made a career in SEO. Above all, the formula promised to make the success of a text calculable. Before WDF*IDF entered the SEO stage, texts were evaluated almost exclusively by keyword density: the word or term you wanted to rank for should occur frequently, but not too often. The ideal ratio to the total number of words was considered to be 2 to 3 percent; some considered up to 5 percent acceptable.
The wide range of recommended percentages stems from the greatly simplified (not to say primitive) calculation. The count is based on all words in the text, so it also includes stop words such as “the, a, an, and, or …”, which occur in almost every text. Even before the introduction of the Hummingbird algorithm and Google's shift toward semantic, holistic search, merely counting keywords was not enough to capture a text's true value for users and for the search engine.
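The classic keyword-density calculation fits in a few lines of Python. This is a minimal sketch, assuming a single-word keyword and a naive regex tokenizer; the function name `keyword_density` and the small `STOP_WORDS` set are hypothetical choices for illustration. The optional stop-word filter shows how strongly the denominator, and therefore the percentage, depends on whether words like “the” are counted.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "or", "is"}

def keyword_density(text: str, keyword: str, ignore_stop_words: bool = False) -> float:
    """Share of words in `text` equal to the single-word `keyword`, in percent."""
    # Naive tokenizer: lowercase letters, digits and apostrophes form a word.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if ignore_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)

sample = "The coffee grinder grinds the coffee beans and the grinder is quiet"
print(round(keyword_density(sample, "coffee"), 1))                          # 16.7 (2 of 12 words)
print(round(keyword_density(sample, "coffee", ignore_stop_words=True), 1))  # 28.6 (2 of 7 words)
```

The same two occurrences of “coffee” yield very different percentages depending only on what is counted as a word, which is one reason the recommended ranges varied so widely.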
Keyword density can still serve as a first check of whether a text addresses the keyword at all. Yet there are countless texts on the web with a keyword density of only 1% (especially longer ones). If you check these texts with a WDF*IDF tool, you quickly see what sets them apart: they use relevant, related terms for the chosen keyword. They cover their topic more comprehensively and are therefore probably more interesting for users. WDF*IDF proponents assume that Google may see it the same way.
What are WDF and IDF?
WDF = Within Document Frequency
WDF is a formula that starts from how often a keyword occurs but dampens that frequency with logarithms, so a term that is used very frequently is not over-weighted. WDF is calculated not just for one keyword but for every meaningful word in the text.
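The form of WDF most commonly cited in SEO literature is WDF(i) = log2(freq(i, j) + 1) / log2(L), where freq(i, j) is how often term i occurs in document j and L is the total number of words in j. Individual tools may use variants, so treat this exact form as an assumption. The sketch below (function names are hypothetical) computes it for one term and for every non-stop-word term of a text, and shows the logarithmic damping.

```python
import math
import re
from collections import Counter

def wdf(freq: int, doc_length: int) -> float:
    """Commonly cited WDF: log2(freq + 1) / log2(doc_length)."""
    if doc_length < 2:
        raise ValueError("doc_length must be at least 2 (log2(1) == 0)")
    return math.log2(freq + 1) / math.log2(doc_length)

def wdf_scores(text: str, stop_words: frozenset[str] = frozenset()) -> dict[str, float]:
    """WDF for every term in `text` that is not in `stop_words`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # L is the total word count of the document, stop words included (assumption).
    counts = Counter(w for w in words if w not in stop_words)
    return {term: wdf(freq, len(words)) for term, freq in counts.items()}

# Logarithmic damping in a 1,024-word text:
# 15 occurrences weigh only four times as much as a single one.
for freq in (1, 3, 7, 15):
    print(freq, round(wdf(freq, 1024), 2))
# 1 0.1
# 3 0.2
# 7 0.3
# 15 0.4
```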
IDF = Inverse Document Frequency
Once the terms of a text have been weighted by WDF, the obvious next step is to compare this weighting with that of other texts, above all texts on the web that target the same keyword. IDF makes this comparison possible: it weights terms not on the basis of a single text but on the basis of all indexed texts on the web. Terms that appear in only a few documents receive a high weight, while terms that appear almost everywhere receive a weight close to zero.
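A frequently cited form of IDF is IDF(i) = log10(ND / f(i)), where ND is the number of documents and f(i) the number of documents containing term i. No one outside a search engine has access to “all indexed texts on the web”, so the sketch below assumes a tiny hypothetical corpus as a stand-in; the function names are likewise illustrative.

```python
import math
import re

def tokenize(text: str) -> set[str]:
    """Distinct lowercase words of a text (naive regex tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def idf(term: str, documents: list[str]) -> float:
    """Commonly cited IDF: log10(number of documents / documents containing term)."""
    doc_freq = sum(1 for doc in documents if term.lower() in tokenize(doc))
    if doc_freq == 0:
        raise ValueError(f"{term!r} does not occur in any document")
    return math.log10(len(documents) / doc_freq)

corpus = [  # hypothetical stand-in for "all indexed texts on the web"
    "the espresso machine needs a grinder",
    "the best coffee beans for espresso",
    "the history of the coffee house",
    "the tour de france route",
]
for term in ("the", "espresso", "grinder"):
    print(term, round(idf(term, corpus), 3))
# the 0.0
# espresso 0.301
# grinder 0.602
```

A stop word such as “the”, present in every document, drops to zero, which is exactly the weakness of plain keyword counting that IDF corrects.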
WDF*IDF – the mysterious combination
Multiplying WDF by IDF shows which terms carry the most weight in successfully ranking texts. The formula's recommendation is to use these terms in your own text at roughly the same relative frequency as the other texts: not less often than their average, but not more often either. Nearly every WDF*IDF analysis surfaces terms that the text does not yet contain, and SEO writers can then decide whether adding them would make the text more relevant.
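The comparison described above can be sketched as follows. This is a simplified model under several assumptions: the competitor texts are tiny hypothetical examples, IDF is computed over those texts plus your own (real tools typically analyze top-ranking pages and use a far larger reference corpus), and “the average of the texts” is taken as the mean WDF*IDF score per term. The names `wdf_idf` and `term_report` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

def wdf_idf(doc: str, corpus: list[str]) -> dict[str, float]:
    """WDF*IDF for each term in `doc`; IDF is computed over `corpus`."""
    words = tokenize(doc)
    if len(words) < 2:
        return {}  # WDF is undefined for texts shorter than two words
    doc_sets = [set(tokenize(d)) for d in corpus]
    scores = {}
    for term, freq in Counter(words).items():
        doc_freq = sum(1 for s in doc_sets if term in s)
        if doc_freq == 0:
            continue  # term absent from the reference corpus: IDF undefined, skip it
        wdf = math.log2(freq + 1) / math.log2(len(words))
        scores[term] = wdf * math.log10(len(corpus) / doc_freq)
    return scores

def term_report(own: str, competitors: list[str]) -> dict[str, tuple[float, float]]:
    """Map each term to (own WDF*IDF, average WDF*IDF across competitors)."""
    corpus = competitors + [own]
    profiles = [wdf_idf(c, corpus) for c in competitors]
    own_scores = wdf_idf(own, corpus)
    report = {}
    for term in set(own_scores).union(*profiles):
        avg = sum(p.get(term, 0.0) for p in profiles) / len(profiles)
        report[term] = (own_scores.get(term, 0.0), avg)
    return report

competitors = [  # hypothetical top-ranking texts for "espresso machine"
    "espresso machine pressure bar crema espresso beans grinder",
    "espresso machine guide pressure crema milk frother",
    "best espresso machine crema pressure temperature",
]
own = "espresso machine review espresso machine price"

report = term_report(own, competitors)
# Terms the competitors use but our text lacks, strongest first.
missing = sorted((t for t, (mine, avg) in report.items() if mine == 0.0 and avg > 0.0),
                 key=lambda t: report[t][1], reverse=True)
# Terms we weight more heavily than the competitor average.
above_avg = sorted(t for t, (mine, avg) in report.items() if mine > avg)
print("missing:", missing)
print("above competitor average:", above_avg)
```

Note that “espresso” and “machine” score zero everywhere because every text contains them; the analysis instead points to terms such as “crema” and “pressure”, which the competitors share and the own text lacks. Whether to add them remains an editorial decision.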
Helpful but no substitute for thinking!
The formula cannot be applied to every text. Product descriptions in an online shop follow different rules than an informative guide in an online magazine and must be adapted individually to their information density. In editorial content in particular, the keywords suggested by WDF*IDF tools often cannot be used: they frequently include competitors' brand names that have no place in your own text. In addition, the formula does not recognize the focus of individual texts.
The WDF*IDF formula therefore provides clues about which terms a text might be missing. In each case, however, you must decide whether these terms are actually relevant to the text's focus; simply inserting them somehow is certainly not a good idea. This is all the more true because Google apparently now relies less on term weightings (such as those calculated by WDF*IDF) and more on term vectors. Vectors make it possible to establish connections between related terms. For a text that mentions “Stockholm”, “Paris” and “Berlin”, a term vector can point to “capital” and “Europe”, even if those terms do not appear in the text itself.
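To make the idea of term vectors concrete, here is a toy sketch of the Stockholm/Paris/Berlin example. The vectors and their five dimensions are invented by hand purely for illustration; real systems learn embeddings with hundreds of dimensions from large amounts of text, and how Google or RankBrain represents terms internally is not public. The sketch averages the vectors of the terms a text mentions and ranks all other terms by cosine similarity to that average, a common way of measuring relatedness between vectors.

```python
import math

# Invented toy vectors, for illustration only (NOT real embeddings).
# Invented dimensions: place, Europe, government, food, music.
VECTORS = {
    "stockholm": [0.9, 0.8, 0.7, 0.0, 0.1],
    "paris":     [0.9, 0.9, 0.7, 0.1, 0.1],
    "berlin":    [0.9, 0.8, 0.8, 0.0, 0.2],
    "capital":   [0.6, 0.3, 0.9, 0.0, 0.0],
    "europe":    [0.7, 1.0, 0.3, 0.0, 0.0],
    "banana":    [0.0, 0.0, 0.0, 1.0, 0.0],
    "guitar":    [0.0, 0.0, 0.0, 0.0, 1.0],
}

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_terms(text_terms: list[str], top_n: int = 2) -> list[str]:
    """Rank terms NOT in the text by similarity to the text's mean vector."""
    dims = len(next(iter(VECTORS.values())))
    centroid = [sum(VECTORS[t][i] for t in text_terms) / len(text_terms)
                for i in range(dims)]
    candidates = [t for t in VECTORS if t not in text_terms]
    candidates.sort(key=lambda t: cosine(centroid, VECTORS[t]), reverse=True)
    return candidates[:top_n]

print(related_terms(["stockholm", "paris", "berlin"]))  # ['europe', 'capital'] with these toy numbers
```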
This vector-based approach is anchored in RankBrain, the self-learning system Google has used since 2015, which has since become one of its most important ranking factors. The further Google develops toward a semantic, holistic search engine, the less value a formula like WDF*IDF has for SEOs. As a source of inspiration for important, relevant terms, however, WDF*IDF tools remain useful.