Dr. Dennis K. J. Lin shares the mystery of ghost data

2020-03-10

Guest Introduction

Dr. Dennis K. J. Lin is a university distinguished professor of supply chain and statistics at Penn State University. He currently serves or has served as associate editor for more than 10 professional journals and was co-editor for Applied Stochastic Models for Business and Industry. Dr. Lin is an elected fellow of ASA, IMS and ASQ, an elected member of ISI, a lifetime member of ICSA, and a fellow of RSS. He is an honorary chair professor for various universities, including a Chang-Jiang Scholar at Renmin University of China, Fudan University, and National Chengchi University (Taiwan).

Dr. Dennis K. J. Lin

Lecture Detail

Ghost data is defined as data that we cannot observe directly. It is as natural as real data, and it is everywhere. It comes in several types, such as virtual data, missing data, pretend data, simulation data and highly sparse data. For example, in the movie Sherlock Holmes, Holmes identifies the murderer from the fact that the dog did not bark. In this case, the dog's silence is ghost data. Absence of evidence is not evidence of absence, and ghost data can sometimes carry a great deal of information. In this lecture, Prof. Lin gave a detailed discussion of the importance and applications of ghost data. He showed several examples, ranging from school exams to patterns of missing data in medicine. Moreover, he outlined a process for dealing with ghost data problems: first, we need to find the ghost data; then, if we have a specific aim, we need to figure out a way to speed up the machine learning process. Finally, he showed several ways to deal with ghost data problems.
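
The idea that a pattern of missing values can itself carry information can be illustrated with a small sketch. The records, field layout and numbers below are invented for illustration and are not taken from Prof. Lin's lecture; the sketch assumes a toy medical setting in which a lab test is simply not ordered for some patients.

```python
# Hypothetical toy records (invented for illustration):
# each tuple is (lab_value, is_sick); lab_value is None when the test was never ordered.
records = [
    (7.1, True), (6.8, True), (None, False), (None, False),
    (5.0, False), (7.4, True), (None, False), (None, True),
]


def sick_rate(rows):
    """Return the fraction of rows whose outcome flag is True."""
    return sum(1 for _, sick in rows if sick) / len(rows)


# Split the records by whether the lab value was observed or is missing.
observed = [r for r in records if r[0] is not None]
missing = [r for r in records if r[0] is None]

rate_observed = sick_rate(observed)
rate_missing = sick_rate(missing)

# If the two rates differ clearly, the fact that a value is missing
# tells us something about the outcome, even though the value itself is unseen.
print(f"sick rate when test was ordered: {rate_observed:.2f}")
print(f"sick rate when test is missing:  {rate_missing:.2f}")
```

In this invented example the outcome rate differs between the two groups, so dropping the rows with missing values would discard information that the missingness itself provides.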

In the lecture

In addition, Prof. Lin noted that big data has four aspects, summarized as the four "V"s: Volume (Data at Rest), Velocity (Data in Motion), Variety (Data in Many Forms) and Veracity (Data in Doubt). Simply working with a very large data set is not meaningful in itself, and in that setting statistical methods often perform poorly. It is therefore important to study the structure of the data set. In his opinion, statistics should be used not as a tool but as a strategy.

In conclusion, Prof. Lin not only presented a systematic analysis of ghost data but also shared his insightful views on the potential of such data and the future development of data science in general. The lecture was informative and inspiring, all the more so thanks to Dr. Lin's accessible manner, and the discussion that followed was fruitful.

Q&A Session

In the Q&A session, students and professors asked questions about the relationship between statistics and data science, and Prof. Lin answered each one in detail. The discussion was lively.

A student in the audience asking a question

Active discussion

Finally, to express our gratitude, Prof. Shao presented an honorary certificate to Prof. Lin on behalf of the College of Science.

Prof. Qiman Shao and Prof. Dennis K. J. Lin