We electronically control revisions for this topic and deploy it directly online. Any copies you generate from the latest revision are uncontrolled. Ensure you refer to the latest revision online when possible.
Latest revision: /docs-as-code//2020/12/03/readability-analysis.html
Quickly cleaning text and generating a readability score
We can use Python to generate a readability score for our documents. Running it across all our documentation from time to time helps us decide which topics to review first.
The example below gives us a single Automated Readability Index (ARI) score for all the content in a Sphinx project, which Sphinx first builds into one HTML page. We could also apply it recursively to every document individually, or add it to our DevOps for documentation pipeline.
```python
# Prerequisites: Sphinx and pandoc, plus the Python packages
# py-readability-metrics, beautifulsoup4 and html5lib.
# py-readability-metrics also needs the NLTK "punkt" tokenizer data.
import subprocess

from bs4 import BeautifulSoup
from readability import Readability

# Run Sphinx and build a single HTML file containing all content.
subprocess.run(["sphinx-build", "-b", "singlehtml", ".", "_build/singlehtml"], check=True)

# Open the output file with UTF-8 encoding and parse the HTML.
with open("_build/singlehtml/index.html", mode="r", encoding="utf8") as file:
    soup = BeautifulSoup(file, "html5lib")

# Remove all h1 elements; remove other elements as needed.
for h1 in soup.find_all("h1"):
    h1.decompose()

r = Readability(soup.get_text())  # Analyze the cleaned text
f = r.ari()  # Run the ARI algorithm
print(f.score)  # Print the ARI score
```
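To score every page separately and feed the results into a pipeline, something like the following could work. It is a sketch that assumes the HTML output lives under one directory (for example, a regular `sphinx-build -b html` output folder) and that any function taking plain text and returning a number is an acceptable score. It swaps BeautifulSoup for the standard library's `html.parser` so it has no third-party dependencies. `extract_text`, `rank_documents`, `main`, `SKIPPED_TAGS` and the threshold are hypothetical names and choices, not part of Sphinx or any readability library.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path
from typing import Callable

# Elements whose text should not count towards readability (assumption: adjust as needed).
SKIPPED_TAGS = {"h1", "script", "style"}


class _TextExtractor(HTMLParser):
    """Collects text content, ignoring anything inside SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html: str) -> str:
    """Hypothetical helper: return the text of an HTML page with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def rank_documents(root: Path, score: Callable[[str], float]) -> list[tuple[Path, float]]:
    """Score every .html file under root (recursively), highest score first."""
    results = []
    for path in sorted(root.rglob("*.html")):
        text = extract_text(path.read_text(encoding="utf8"))
        if text:  # Skip pages with no remaining text.
            results.append((path, score(text)))
    return sorted(results, key=lambda item: item[1], reverse=True)


def main(root: str, threshold: float, score: Callable[[str], float]) -> int:
    """Print a review queue; return 1 (fail the build) if any page exceeds threshold."""
    ranked = rank_documents(Path(root), score)
    for path, value in ranked:
        print(f"{value:6.2f}  {path}")
    return 1 if any(value > threshold for _, value in ranked) else 0
```

In a pipeline step, this might be called as `sys.exit(main("_build/html", threshold=12.0, score=lambda text: Readability(text).ari().score))`, reusing the `Readability` class from the script above; the threshold of 12.0 is only a placeholder. Note that py-readability-metrics raises an exception for texts under 100 words, so very short pages may need to be filtered out or wrapped in a `try` block.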