We electronically control revisions for this topic and deploy it directly online. Any copies you generate from the latest revision are uncontrolled. Ensure you refer to the latest revision online when possible.
Latest revision: /docs-as-code//2020/12/03/readability-analysis.html
Quickly cleaning text and generating a readability score
We can use Python to generate a readability score for our documents. Running this across all our documentation from time to time helps us prioritize reviews.
The example below produces a single ARI (Automated Readability Index) score for a Sphinx documentation set built as one HTML page. We could also apply it recursively to all our documents at once, or add it to our DevOps pipeline for documentation.
# Prerequisites: Sphinx, pandoc, and the Python libraries imported below
# (plus html5lib, the parser passed to BeautifulSoup)
import subprocess
from bs4 import BeautifulSoup
from readability import Readability

# Run Sphinx and build a single HTML file of all content; check=True raises if the build fails
subprocess.run(["sphinx-build", "-b", "singlehtml", ".", "_build/singlehtml"], check=True)

# Open the build output with UTF-8 encoding and parse the HTML
with open("_build/singlehtml/index.html", mode="r", encoding="utf8") as file:
    soup = BeautifulSoup(file, "html5lib")

for h1 in soup("h1"):  # Remove all h1 elements; add other elements as needed
    h1.decompose()

r = Readability(soup.text)  # Prepare the readability analysis on the cleaned text (needs at least 100 words)
ari = r.ari()  # Run the ARI algorithm
print(ari.score)  # Print the ARI score
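The recursive variant mentioned earlier could be sketched as follows. This is a minimal illustration, not the method used above: to stay self-contained it swaps BeautifulSoup for the standard library's `html.parser` and computes ARI directly from its published formula, 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43, instead of calling the `readability` library. The helper names `html_to_text`, `ari_score`, and `rank_documents` are hypothetical, and the word and sentence counting is deliberately crude, so scores will differ somewhat from the library's.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside script, style, or h1."""

    SKIP = {"script", "style", "h1"}  # assumption: extend with other elements as needed

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text of an HTML string with skipped elements removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def ari_score(text):
    """Approximate ARI; returns None when the text has no words."""
    # Words: whitespace-separated tokens containing at least one letter or digit.
    words = [w for w in text.split() if any(c.isalnum() for c in w)]
    if not words:
        return None
    # Characters: letters and digits only.
    characters = sum(c.isalnum() for c in text)
    # Sentences: segments ending in ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


def rank_documents(build_dir):
    """Score every .html file under build_dir; highest (hardest) score first."""
    results = []
    for path in Path(build_dir).rglob("*.html"):
        score = ari_score(html_to_text(path.read_text(encoding="utf8")))
        if score is not None:
            results.append((score, str(path)))
    return sorted(results, reverse=True)
```

Calling `rank_documents("_build/html")` after a regular `sphinx-build -b html` run (the output directory here is an assumption) returns a list of `(score, path)` pairs. Reviewing from the top of that list is one way to prioritize the pages that are hardest to read.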