2 minute read | Concept

Source data to customer content flow

Using Python to extract data from engineering systems, clean it, and parse it into user documentation.

I recently finished a project where we had CSV tables of 30,000+ lines that we needed to put into a searchable, readable format for customers. After a few trials, I found the best method was to present the information as non-indexable HTML that relied on the web browser's text search. Generating 30,000+ indexable locations in each file placed a huge burden both on document generation and on actually accessing the files.

We used the Python pandas and texttable libraries to generate the 30,000+ readable text tables from the content. We also created a single overview table that held the key searchable content and a link to each relevant individual table.
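The overview step works roughly like the sketch below. The file names, column mapping and link column here are illustrative assumptions rather than the project's real schema; the point is one row of key searchable content per entry plus the path of its generated table.

```python
# generateMemoryMapOverview.py -- rough sketch only; file names, column
# mapping and the link column are illustrative assumptions.
import pandas as pd

df = pd.read_csv("memory_map.csv")  # cleaned source CSV from engineering (hypothetical name)

# One overview row per entry: the key searchable columns plus the path of
# the individual generated text table that holds the full field breakdown.
overview = pd.DataFrame({
    "Name": df["name"],
    "Range": df["range"],
    "Description": df["help"],
    "Details": "_static/fields_" + df["name"].astype(str),
})
overview.to_csv("docs/tables/memory_map.csv_overview", index=False)
```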

In brief, after importing the source CSV files from engineering, we:

  1. Applied regex cleaning to the data fields (see the sketch after this list).
  2. Created a single text file for each line, containing a description and a sub-table of key content, as shown in the example further below.
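For step 1, here is a minimal sketch of the kind of regex cleaning involved, using the `help` and `map` columns that appear in the register snippet below. The patterns are illustrative, not the project's actual cleaning rules, which naturally depend on what the engineering export contains.

```python
# Regex cleaning sketch -- patterns are illustrative, not the actual rules.
import pandas as pd

df = pd.read_csv("memory_map.csv")  # hypothetical source file name

# Normalise whitespace and strip stray markup from the free-text fields
# before they are rendered into the per-register tables.
for column in ("help", "map"):
    df[column] = (
        df[column]
        .astype(str)
        .str.replace(r"<[^>]+>", "", regex=True)  # drop embedded HTML tags
        .str.replace(r"\s+", " ", regex=True)     # collapse runs of whitespace
        .str.strip()
    )
```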

The generated text files were built on the fly by our pipeline. This let us import and parse the source data as often as we needed to, without adding a huge amount of generated data to the actual document repository.

Example
```python
# generateMemoryMapRegisters.py snippet
import texttable  # table generation: https://pypi.org/project/texttable/

# df is the cleaned pandas DataFrame built from the source CSV;
# index selects the current register's field rows.
tableObj = texttable.Texttable(max_width=118)

# Set columns
tableObj.header(["Range", "Name", "Type", "Reset", "Description"])

# Iterate the register's individual fields, adding one table row per field
for i, row in df.loc[[index]].iterrows():
    description = str(row['help']) + "\r\n\r\n" + str(row['map'])
    tableObj.add_row(
        [str(row['range']), str(row['name']), str(row['type']), str(row['reset']), description]
    )

# Display table
print(tableObj.draw())
```
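Each drawn table then needs to be written somewhere the pipeline can collect it. A minimal follow-on sketch, assuming the per-register files land under `_static/` as the artifact paths in the pipeline example suggest (the exact file naming is an assumption):

```python
# Hypothetical follow-on to the snippet above: write the drawn table to a
# per-register text file so the CI job can collect it from _static/.
with open(f"_static/fields_{index}", "w", encoding="utf-8") as f:
    f.write(tableObj.draw())
```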
```yaml
# .gitlab-ci.yml pipeline example
fields:
  stage: build
  script:
    - python generateMemoryMapOverview.py
    - python generateMemoryMapRegisters.py
  artifacts:
    paths:
      - _static/fields_*
      - docs/tables/*.csv_overview
```