2 minute read | Concept

Source data to customer content flow

Using Python to extract data from engineering systems, clean it, and parse it into user documentation.

I recently finished a project where we had CSV tables of 30,000+ lines that we needed to put into a searchable, readable format for customers. After a few trials, I found the best method was to present the information as non-indexable HTML that relied on the web browser's text search. Generating 30,000+ indexable locations in each file placed a huge burden both on document generation and on actually accessing the files.

We used the Python pandas and texttable libraries to generate the 30,000+ readable text tables from the content. We also created a single overview table that held the key searchable content and a link to each relevant individual table.
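The overview step works roughly like the sketch below. The file names, column mapping and link column here are illustrative assumptions rather than the project's real schema; the point is one row of key searchable content per entry plus the path of its generated table.

```python
# generateMemoryMapOverview.py -- rough sketch only; file names, column
# mapping and the link column are illustrative assumptions.
import pandas as pd

df = pd.read_csv("memory_map.csv")  # cleaned source CSV from engineering (hypothetical name)

# One overview row per entry: the key searchable columns plus the path of
# the individual generated text table that holds the full field breakdown.
overview = pd.DataFrame({
    "Name": df["name"],
    "Range": df["range"],
    "Description": df["help"],
    "Details": "_static/fields_" + df["name"].astype(str),
})
overview.to_csv("docs/tables/memory_map.csv_overview", index=False)
```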

In brief, after importing the source CSV files from engineering, we:

  1. Applied regex cleaning to the data fields (see the sketch after this list).
  2. Created a single text file for each line, containing a description and a sub-table of key content, as shown in the example further below.
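For step 1, here is a minimal sketch of the kind of regex cleaning involved, using the `help` and `map` columns that appear in the register snippet below. The patterns are illustrative, not the project's actual cleaning rules, which naturally depend on what the engineering export contains.

```python
# Regex cleaning sketch -- patterns are illustrative, not the actual rules.
import pandas as pd

df = pd.read_csv("memory_map.csv")  # hypothetical source file name

# Normalise whitespace and strip stray markup from the free-text fields
# before they are rendered into the per-register tables.
for column in ("help", "map"):
    df[column] = (
        df[column]
        .astype(str)
        .str.replace(r"<[^>]+>", "", regex=True)  # drop embedded HTML tags
        .str.replace(r"\s+", " ", regex=True)     # collapse runs of whitespace
        .str.strip()
    )
```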

The generated text files were built on the fly by our pipeline. This let us import and parse the source data as often as we needed to, without adding a huge amount of generated data to the actual document repository.

Example
```python
# generateMemoryMapRegisters.py snippet
import texttable  # table generation: https://pypi.org/project/texttable/

# df is the cleaned pandas DataFrame built from the source CSV;
# index selects the current register's field rows.
tableObj = texttable.Texttable(max_width=118)

# Set columns
tableObj.header(["Range", "Name", "Type", "Reset", "Description"])

# Iterate the register's individual fields, adding one table row per field
for i, row in df.loc[[index]].iterrows():
    description = str(row['help']) + "\r\n\r\n" + str(row['map'])
    tableObj.add_row(
        [str(row['range']), str(row['name']), str(row['type']), str(row['reset']), description]
    )

# Display table
print(tableObj.draw())
```
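Each drawn table then needs to be written somewhere the pipeline can collect it. A minimal follow-on sketch, assuming the per-register files land under `_static/` as the artifact paths in the pipeline example suggest (the exact file naming is an assumption):

```python
# Hypothetical follow-on to the snippet above: write the drawn table to a
# per-register text file so the CI job can collect it from _static/.
with open(f"_static/fields_{index}", "w", encoding="utf-8") as f:
    f.write(tableObj.draw())
```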
```yaml
# .gitlab-ci.yml pipeline example
fields:
  stage: build
  script:
    - python generateMemoryMapOverview.py
    - python generateMemoryMapRegisters.py
  artifacts:
    paths:
      - _static/fields_*
      - docs/tables/*.csv_overview
```