Dr. Dennis K. J. Lin shares the mystery of ghost data
Dr. Dennis K. J. Lin is a university distinguished professor of supply chain and statistics at Penn State University. He serves or has served as associate editor for more than 10 professional journals and was co-editor of Applied Stochastic Models for Business and Industry. Dr. Lin is an elected fellow of ASA, IMS and ASQ, an elected member of ISI, a lifetime member of ICSA, and a fellow of RSS. He is an honorary chair professor at various universities, including a Chang-Jiang Scholar at Renmin University of China, Fudan University, and National Chengchi University (Taiwan).
Dr. Dennis K. J. Lin
Ghost data is defined as data that we cannot observe directly. It is as natural as real data, and it is everywhere. It comes in several types, such as virtual data, missing data, pretend data, simulation data, and highly sparse data. For example, in the movie Sherlock Holmes, Holmes identifies the murderer from the fact that the dog did not bark. In this case, the dog's silence is ghost data. Absence of evidence is not evidence of absence, and ghost data can sometimes carry a great deal of information. In this lecture, Prof. Lin discussed the importance and applications of ghost data in detail, with examples ranging from school exams to patterns of missing data in medicine. He also outlined a process for dealing with ghost data problems: first, find the ghost data; then, given a specific aim, find a way to speed up the machine learning process. Finally, he presented several ways to handle ghost data problems.
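The point that missing values can themselves be informative can be illustrated with a small sketch. This is not an example from the lecture: the records, the field names, and the `recovery_rate` helper below are hypothetical, chosen only to show how comparing outcomes between rows with and without a recorded value might reveal a pattern in the missingness.

```python
# Hypothetical records: (test_result, recovered).
# None marks a test result that was never recorded (the "ghost" part of the data).
records = [
    (5.1, True), (4.8, True), (None, False), (5.3, True),
    (None, False), (4.9, True), (None, True), (5.0, False),
]


def recovery_rate(rows):
    """Return the share of rows whose outcome (second field) is True."""
    return sum(1 for _, recovered in rows if recovered) / len(rows)


# Split the records by whether the test result is missing.
missing = [r for r in records if r[0] is None]
observed = [r for r in records if r[0] is not None]

rate_missing = recovery_rate(missing)
rate_observed = recovery_rate(observed)

print(f"recovery rate when the test is missing:  {rate_missing:.2f}")
print(f"recovery rate when the test is observed: {rate_observed:.2f}")
```

In this made-up data the two rates differ, which would suggest that whether a value is missing is related to the outcome. With real data, such a gap would only be a starting point for asking why the values are missing.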
In the lecture
In addition, Prof. Lin noted that big data has four aspects, summarized as the four "V"s: Volume (Data at Rest), Velocity (Data in Motion), Variety (Data in Many Forms), and Veracity (Data in Doubt). Simply considering a very large data set is not meaningful in itself, because statistical methods often perform poorly in that setting. It is therefore important to study the structure of the data set. In his opinion, statistics should be used not as a tool but as a strategy.
In conclusion, Prof. Lin not only presented a systematic analysis of ghost data but also shared his insightful views on the potential of such data and on the future development of data science in general. The lecture was informative and inspiring, especially given Dr. Lin's accessible manner, and the discussion that followed was fruitful.
In the Q&A session, students and professors asked questions about the relationship between statistics and data science, and Prof. Lin answered each one in detail. The discussion was lively.
A student in the audience asking a question
Finally, to express our gratitude, Prof. Shao presented an honorary certificate to Prof. Lin on behalf of the College of Science.
Prof. Qiman Shao and Prof. Dennis K. J. Lin