
Readability analysis

Quickly cleaning text and generating a readability score

We can use Python to generate a readability score for our documents. Running this across all of our documentation from time to time helps us decide which pages to review first.

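In the Example section below, we build a Sphinx project into a single HTML page and compute one Automated Readability Index (ARI, https://en.wikipedia.org/wiki/Automated_readability_index) score for all of its content. We could also score each of our documents separately, or add this step to our DevOps for documentation pipeline.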
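ARI is computed from character, word and sentence counts: 4.71 × (characters / words) + 0.5 × (words / sentences) - 21.43, and the result roughly corresponds to a US school grade level. The sketch below computes it with the standard library only. Its tokenization (whitespace-separated words, letters and digits counted as characters, sentences ending in '.', '!' or '?') is our own simplifying assumption, so its scores will differ somewhat from those of the readability library used in the example.

```python
import re


def ari_score(text: str) -> float:
    """Approximate Automated Readability Index for plain text.

    Assumed, simplified tokenization (not the readability library's own):
    words are whitespace-separated tokens, characters are letters and digits,
    and a sentence is any non-empty run of text ending in '.', '!' or '?'.
    """
    words = text.split()
    characters = sum(ch.isalnum() for ch in text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return (
        4.71 * (characters / len(words))
        + 0.5 * (len(words) / len(sentences))
        - 21.43
    )


print(round(ari_score("The cat sat on the mat. It was warm."), 2))  # -5.57
```

For the sample sentence there are 26 characters, 9 words and 2 sentences, so ARI = 4.71 × 26/9 + 0.5 × 9/2 - 21.43 ≈ -5.57. Very simple text can score below zero; longer words and longer sentences push the score up.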

Example
  # Prerequisites: Sphinx, pandoc, and the Python packages py-readability-metrics,
  # beautifulsoup4 and html5lib. py-readability-metrics also needs NLTK's punkt
  # data (python -m nltk.downloader punkt).
  import subprocess

  from bs4 import BeautifulSoup
  from readability import Readability

  # Run Sphinx and build a single HTML file of all content
  subprocess.run(["sphinx-build", "-b", "singlehtml", ".", "_build/singlehtml"], check=True)

  # Open the output file with UTF-8 encoding and parse the HTML
  with open("_build/singlehtml/index.html", mode="r", encoding="utf8") as file:
      soup = BeautifulSoup(file, "html5lib")

  # Remove h1 headings plus script and style elements; add other elements as needed
  for element in soup(["h1", "script", "style"]):
      element.decompose()

  # Join text from separate elements with spaces so words do not run together
  text = soup.get_text(separator=" ")
  r = Readability(text)  # Needs at least 100 words of text
  f = r.ari()  # Runs the ARI algorithm on the cleaned text
  print(f.score)  # Prints the ARI score

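To score each document separately, for example to decide which pages to review first, one option is Sphinx's html builder (sphinx-build -b html . _build/html), which writes one HTML file per source document, and then walking the output directory recursively. The sketch below does this with the standard library's html.parser instead of BeautifulSoup and takes the scoring function as a parameter. The names rank_pages and html_to_text and the default min_words cut-off of 100 are our own hypothetical choices; the cut-off is there because, as far as we know, py-readability-metrics refuses texts shorter than 100 words.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import Callable

# Mirrors the h1 removal in the example above, plus non-prose elements
SKIP_TAGS = {"h1", "script", "style"}


class TextExtractor(HTMLParser):
    """Collects text content, ignoring anything inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page text with whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def rank_pages(
    build_dir, score_fn: Callable[[str], float], min_words: int = 100
) -> list[tuple[float, str]]:
    """Score every HTML page under build_dir (recursively), highest score first."""
    root = Path(build_dir)
    results = []
    for path in sorted(root.rglob("*.html")):
        text = html_to_text(path.read_text(encoding="utf8"))
        if len(text.split()) < min_words:
            continue  # Too short to score; skip rather than fail
        results.append((score_fn(text), path.relative_to(root).as_posix()))
    # Higher ARI means harder to read, so those pages come first
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

With py-readability-metrics installed, a call such as rank_pages("_build/html", lambda text: Readability(text).ari().score) would list pages from hardest to easiest to read. The html builder also writes pages such as genindex.html and search.html, so you may want to filter those out before reviewing the list.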
See also
DevOps for documentation
