Data Science and the Digital Thread | Part 1

In the past year, Data Science has been getting a lot of play in the Systems Engineering world. Digital Transformation offers an enormous opportunity in the development of complex systems, but only if we can handle the enormous datasets that accompany real-world projects. My objective in this blog series is to demonstrate some practical approaches that are already available, using two open-source Data Science tools for analyzing the Digital Thread.

Digital Thread is one of those widely used terms that mean different things to different people. I will use it to mean a federation of models and data resident in multiple repositories and databases, as illustrated in Figure 1. Each individual repository contains artifacts and the relations between them, which I will call intra-model connections; both artifacts and connections can have attributes or properties. All this data is typically created and managed by specialized software tools through interfaces used by domain engineers, MEs, SEs, and so forth. The Digital Thread is realized by a fine-grained network of inter-model connections, which link artifacts across model and repository boundaries. Together, the artifacts, the intra-model and inter-model connections, and their attributes make up the Digital Thread.


Figure 1  Structure of the Digital Thread
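
The structure described above can be sketched as a small graph in Python. This is only an illustration under stated assumptions: the class names (Artifact, Connection, DigitalThread), the repository labels, and the sample artifacts are all hypothetical and do not come from any particular tool. The one idea taken from the text is that a connection is inter-model when its two ends live in different repositories.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    """A model element owned by one repository (hypothetical structure)."""
    repo: str   # owning repository, e.g. "SysML" or "PLM" (illustrative labels)
    key: str    # identifier unique within that repository
    # Attributes are excluded from equality and hashing so artifacts can live in a set.
    attributes: dict = field(default_factory=dict, compare=False)


@dataclass
class Connection:
    """A relation between two artifacts, with its own attributes."""
    source: Artifact
    target: Artifact
    kind: str
    attributes: dict = field(default_factory=dict)

    @property
    def is_inter_model(self) -> bool:
        # Inter-model connections cross a repository boundary.
        return self.source.repo != self.target.repo


@dataclass
class DigitalThread:
    """Artifacts plus intra-model and inter-model connections."""
    artifacts: set = field(default_factory=set)
    connections: list = field(default_factory=list)

    def connect(self, source, target, kind, **attributes):
        self.artifacts.update({source, target})
        conn = Connection(source, target, kind, attributes)
        self.connections.append(conn)
        return conn

    def inter_model(self):
        return [c for c in self.connections if c.is_inter_model]


# Made-up sample data for illustration only.
thread = DigitalThread()
req = Artifact("SysML", "REQ-1", {"text": "Mass under 10 kg"})
block = Artifact("SysML", "Pump")
part = Artifact("PLM", "P-100", {"mass_kg": 9.2})

thread.connect(req, block, "satisfiedBy")                      # intra-model
thread.connect(block, part, "realizedBy", created_by="jdoe")   # inter-model

for conn in thread.inter_model():
    print(conn.source.repo, conn.source.key, "->", conn.target.repo, conn.target.key)
```

Keeping the repository name on each artifact means the intra-model versus inter-model distinction never has to be stored separately; it falls out of comparing the two ends of a connection.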

We can propose some desirable characteristics for the Digital Thread:

  • It should be open to incorporate multiple disciplines, organizations, and software vendors
  • All the data should be accessible to users, preferably in real time, to help them with their individual tasks and to let them monitor, review, and document the total system model
  • At the same time, it should be secure against unauthorized access
  • It should be dynamic and multi-branched, meaning it can evolve over time while remembering its history and can support multiple variant configurations simultaneously
  • It should be scalable, which brings us to Data Science.

Data Science has been called the offspring of statistics and computer science. At root, it is a systematic approach to extracting knowledge from data. That approach has multiple phases, starting from the bottom with collecting the data and potentially proceeding to creative generation of new ideas and new products (Figure 2).

The Data Scientist starts by gathering the data. Depending on the situation, the data can be static or streaming, structured or unstructured, noisy or well-behaved. Next come the questions of how and where the data will be stored and how it will get there. The data must then be examined for gaps, errors, and outliers. Only then can he or she begin the statistical analysis with basic metrics, aggregations, and dimensionality reduction. Once the data is reorganized, the Data Scientist can start to ask and answer practical questions. The cutting edge of Data Science is deep learning, where the algorithms can start to answer questions we hadn’t even thought to ask.
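
As a small illustration of the preparation and basic-metrics steps, the sketch below uses only Python's standard library. The readings are made up, None stands in for a missing value, and the outlier rule (distance from the median measured in multiples of the median absolute deviation, with a cutoff k of 3.5) is one common choice assumed here, not a method prescribed by this series.

```python
import statistics

# Hypothetical test measurements; None marks a gap in the data.
readings = [9.8, 10.1, None, 10.0, 9.9, 42.0, 10.2, None, 10.0]


def prepare(values, k=3.5):
    """Return (clean values, number of gaps, outliers).

    A value is treated as an outlier when it lies more than k median
    absolute deviations (MAD) from the median. If the MAD is zero,
    nothing is flagged.
    """
    present = [v for v in values if v is not None]
    gaps = len(values) - len(present)

    median = statistics.median(present)
    mad = statistics.median([abs(v - median) for v in present])
    limit = k * mad

    clean = [v for v in present if mad == 0 or abs(v - median) <= limit]
    outliers = [v for v in present if mad != 0 and abs(v - median) > limit]
    return clean, gaps, outliers


clean, gaps, outliers = prepare(readings)

# Basic metrics are computed only after gaps and outliers are set aside.
summary = {
    "count": len(clean),
    "mean": statistics.mean(clean),
    "stdev": statistics.stdev(clean),
    "gaps": gaps,
    "outliers": outliers,
}
print(summary)
```

The median-based rule is used because a single extreme reading shifts the mean and standard deviation enough to hide itself, while the median and MAD stay close to the bulk of the data.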


Figure 2 Correlation between Data Science and Systems Engineering

How does this map to SE? We start by collecting the domain models and data sets such as test and simulation results. We manage the data in specialized repositories like PLM and ALM. In many cases, we also need to flow data and transform models between repositories. Verification and validation can be thought of as the preparation phase. Are the models properly formed? Are versions consistently matched?
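
The two questions above can be turned into a simple automated check. The sketch below assumes a hypothetical data layout: each inter-model connection records the version of each artifact at the time the link was made, and a separate lookup holds the current version in each repository. The repository names, identifiers, and field names are invented for illustration.

```python
# Hypothetical current versions, keyed by (repository, artifact id).
current_versions = {
    ("ALM", "REQ-12"): 3,
    ("PLM", "PART-7"): 5,
    ("SysML", "Pump"): 2,
}

# Hypothetical inter-model connections with the versions seen at link time.
connections = [
    {"source": ("SysML", "Pump"), "source_version": 2,
     "target": ("PLM", "PART-7"), "target_version": 5},
    {"source": ("ALM", "REQ-12"), "source_version": 2,
     "target": ("SysML", "Pump"), "target_version": 2},
    {"source": ("SysML", "Pump"), "source_version": 2,
     "target": ("PLM", "PART-9"), "target_version": 1},
]


def check_connections(connections, current_versions):
    """Return a list of (connection index, end, problem) findings."""
    findings = []
    for i, conn in enumerate(connections):
        for end in ("source", "target"):
            ref = conn[end]
            linked = conn[f"{end}_version"]
            actual = current_versions.get(ref)
            if actual is None:
                # Not properly formed: the link points at nothing.
                findings.append((i, end, "missing artifact"))
            elif actual != linked:
                # Versions not consistently matched: the artifact has moved on.
                findings.append((i, end, f"linked v{linked}, current v{actual}"))
    return findings


for finding in check_connections(connections, current_versions):
    print(finding)
```

A real repository would supply the version lookup through its own interface; the point of the sketch is only that both checks reduce to comparing what a connection recorded against what the repositories currently hold.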

Now we can begin to use the data: to query, search, and visualize it to find the information we need to do our jobs. We can apply tools like trade studies, optimization, and product line engineering to evaluate different candidate configurations. The last stage, extracting deep knowledge about the systems we build, is the future of systems engineering.

The remainder of this series will explore some of these ideas in practice. In Part 2 (forthcoming), we will look at the pieces required to apply Data Science to the Digital Thread.


Dirk Zwemer

Dr. Dirk Zwemer is President of Intercax LLC (Atlanta, GA), a supplier of MBE engineering software platforms like Syndeia and ParaMagic. He is an active teacher and consultant in the field and holds Level 4 Model Builder-Advanced certification as an OMG System Modeling Professional.