Readability analysis


We electronically control revisions for this topic and deploy it directly online. Any copies you generate from the latest revision are uncontrolled. Ensure you refer to the latest revision online when possible.

Latest revision: /docs-as-code//2020/12/03/readability-analysis.html

We can use Python to generate a readability score for our documents. Running this across all our documentation from time to time helps us prioritize reviews.

The example below produces a single [ARI](https://en.wikipedia.org/wiki/Automated_readability_index) (Automated Readability Index) score for a Sphinx document. We could also apply it recursively to all our documents at once, or add it to our DevOps pipeline for documentation.
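To make the score less of a black box, here is a minimal sketch of the published ARI formula, `4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43`. The function name `ari_score` and the simple whitespace and punctuation splitting are assumptions made for illustration. A readability library tokenizes text more carefully, so its scores will likely differ a little from this sketch.

```python
import re


def ari_score(text):
    """Approximate the Automated Readability Index of a plain-text string."""
    # Words: whitespace-separated tokens (a simplification).
    words = text.split()
    # Sentences: non-empty chunks between ., ! or ? (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text needs at least one word and one sentence")
    # Characters: letters and digits only; punctuation and spaces are ignored.
    characters = sum(ch.isalnum() for ch in text)
    return (
        4.71 * (characters / len(words))
        + 0.5 * (len(words) / len(sentences))
        - 21.43
    )
```

The result roughly tracks a US school grade level, so longer words and longer sentences both push the score up. Documents with the highest scores are the natural candidates to review first.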

Example

  # Prerequisites: Sphinx, pandoc, and the Python libraries imported below
  import subprocess

  from bs4 import BeautifulSoup
  from readability import Readability

  # Run Sphinx and build a single HTML file of all content; stop if the build fails.
  subprocess.run(["sphinx-build", "-b", "singlehtml", ".", "_build/singlehtml"], check=True)

  # Open the output file with UTF-8 encoding and parse the HTML.
  with open("_build/singlehtml/index.html", mode="r", encoding="utf8") as file:
      soup = BeautifulSoup(file, "html5lib")

  for h1 in soup("h1"):  # Remove all h1 elements; add other elements as needed
      h1.decompose()

  r = Readability(soup.get_text())  # Prepare the cleaned text for readability analysis
  f = r.ari()  # Run the ARI algorithm on the text
  print(f.score)  # Print the ARI score
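The idea of applying the score recursively to all our documents could be sketched as follows. This is an illustration under stated assumptions, not a finished tool. The names `TextExtractor`, `html_to_text`, and `rank_by_score` are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup so the sketch runs without extra dependencies. The scoring function is passed in as a parameter, so any readability metric can be plugged in.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect text content, skipping the contents of selected tags."""

    def __init__(self, skip_tags=("h1", "script", "style")):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def rank_by_score(build_dir, score_fn):
    """Score every .html file under build_dir, highest score first."""
    results = []
    for path in sorted(Path(build_dir).rglob("*.html")):
        text = html_to_text(path.read_text(encoding="utf8"))
        results.append((path, score_fn(text)))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

With the library from the example above, the call might look like `rank_by_score("_build/html", lambda text: Readability(text).ari().score)`, where `_build/html` is an assumed output directory for a regular multi-page Sphinx build. Very short pages may need to be filtered out first, since a readability library may reject texts with too few words.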
