Keynote speakers
Title: Broad Learning on Big Data via Fusion of Heterogeneous Information
Philip S. Yu, UIC Distinguished Professor and Wexler Chair in Information Technology, Department of Computer Science, University of Illinois at Chicago.
Philip S. Yu’s main research interests include data mining, privacy-preserving publishing and mining, data streams, database systems, Internet applications and technologies, multimedia systems, parallel and distributed processing, and performance modeling.
He is a Professor in the Department of Computer Science at the University of Illinois at Chicago and also holds the Wexler Chair in Information Technology. He was manager of the Software Tools and Techniques group at the IBM Thomas J. Watson Research Center. Dr. Yu has published more than 500 papers in refereed journals and conferences. He holds or has applied for more than 300 US patents.
Dr. Yu is a Fellow of the ACM and of the IEEE. He is an associate editor of ACM Transactions on Internet Technology and ACM Transactions on Knowledge Discovery from Data.
He is on the steering committee of the IEEE Conference on Data Mining and was a member of the IEEE Data Engineering steering committee. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001-2004), and has also served as an editor, advisory board member and guest co-editor of the special issue on mining of databases.
He has also served as an associate editor of Knowledge and Information Systems. In addition to serving as a program committee member for various conferences, he was the program chair or co-chair of the IEEE Workshop on Scalable Stream Processing Systems (SSPS’07), the IEEE Workshop on Mining Evolving and Streaming Data (2006), the 2006 joint conferences of the 8th IEEE Conference on E-Commerce Technology (CEC’06) and the 3rd IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE’06), the 11th IEEE Intl. Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the 2nd IEEE Intl. Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the 2nd IEEE Intl. Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair or co-chair of the 2006 ACM Conference on Information and Knowledge Management, the 14th IEEE Intl. Conference on Data Engineering, and the 2nd IEEE Intl. Conference on Data Mining. He has received several IBM honors, including 2 IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, 2 Research Division Awards and the 93rd plateau of Invention Achievement Awards, and he was an IBM Master Inventor. Dr. Yu received a Research Contributions Award from the IEEE Intl. Conference on Data Mining in 2003 and an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999.
Dr. Yu received the B.S. degree in E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University.
In the era of big data, there is an abundance of data available across many different data sources in various formats. “Broad Learning” is a new type of learning task that focuses on fusing multiple large-scale information sources of diverse varieties and carrying out synergistic data mining tasks across these fused sources in one unified analytic framework. Great challenges exist in “Broad Learning” for the effective fusion of relevant knowledge across different data sources, which depends not only on the relatedness of these data sources but also on the target application problem. In this talk we examine how to fuse heterogeneous information to improve mining effectiveness in various applications, including social networks, recommendation, mobile health (m-health) and question answering (QA).
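As a concrete, minimal sketch of the fusion idea described above: the snippet below combines two hypothetical feature sources (social-network embeddings and behavioural counts) for one prediction task by aligning them on a shared user index and concatenating them. The data, feature names and simple early-fusion strategy are illustrative assumptions, not the specific “Broad Learning” methods presented in the talk.

```python
# Illustrative early fusion of two heterogeneous feature sources (assumed data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users = 1000

# Source A: social-network features (e.g., node-embedding dimensions).
social = rng.normal(size=(n_users, 16))
# Source B: behavioural features (e.g., activity counts per category).
behaviour = rng.poisson(lam=2.0, size=(n_users, 8)).astype(float)

# Synthetic target that depends on both sources, so fusion should help.
y = ((social[:, 0] + 0.5 * behaviour[:, 0]) > 0.8).astype(int)

# Early fusion: align the sources on the shared user index and concatenate.
fused = np.hstack([social, behaviour])

X_tr, X_te, y_tr, y_te = train_test_split(fused, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy with fused sources:", model.score(X_te, y_te))
```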
Title: Big Data for Better Life
Yike Guo is a Professor of Computing Science in the Department of Computing at Imperial College London. He is the founding Director of the Data Science Institute at Imperial College. He is a Fellow of the Royal Academy of Engineering (FREng), a Member of Academia Europaea (MAE), a Fellow of the British Computer Society and a Trustee of The Royal Institution of Great Britain.
Professor Guo received a first-class honours degree in Computing Science from Tsinghua University, China, in 1985 and received his PhD in Computational Logic from Imperial College in 1993 under the supervision of Professor John Darlington. He founded InforSense, a software company specialized in big data analysis for life science and medicine, and served as CEO for several years before the company’s merger with IDBS, a global advanced R&D software provider, in 2009. He was then the Chief Innovation Officer of IDBS until 2018. He also served as the Chief Technical Officer of the tranSMART Foundation, a global alliance building an open-source big data platform for translational medicine research.
He has been working on technology and platforms for scientific data analysis since the mid-1990s; his research focuses on data mining, machine learning and large-scale data management. He has contributed to numerous major research projects, including the UK EPSRC platform project Discovery Net, the Wellcome Trust-funded Biological Atlas of Insulin Resistance (BAIR), and the European Commission U-BIOPRED project. He was the Principal Investigator of the European Innovative Medicines Initiative (IMI) eTRIKS project, a €23M project building a cloud-based informatics platform in which tranSMART is a core component for clinico-genomic medical research, and co-Investigator of Digital City Exchange, a £5.9M research programme exploring ways to digitally link utilities and services within smart cities.
Professor Guo has published over 250 articles, papers and reports. Projects he has contributed to have been internationally recognised, including winning the “Most Innovative Data Intensive Application Award” at the Supercomputing 2002 conference for Discovery Net, the Bio-IT World “Best Practices Award” for U-BIOPRED in 2014 and the “Best Open Source Software Award” from ACM SIGMM in 2017.
Life science is now data-driven. Data science provides the core technology for biomedical research, healthcare and well-being. This trend presents great challenges as well as opportunities for big data research.
The primary opportunity from the integration of big data into clinical practice is better treatment for patients, stemming from improved diagnosis and decision support, personalisation of predictions, and improved quality of care.
More expansive, interlinked health records should make it easier to find suitable participants for clinical trials; likewise, finding another patient with similar symptoms should also be easier. Further opportunities come from better treatment through personalised medicine. Moreover, wearable sensor technology is bringing a revolutionary change to health monitoring. In this talk, I will present our research on applying big data to healthcare and biomedical research, focusing in particular on machine learning technology for various medical applications.
Title: The cross-roads of algorithmic fairness, accountability and transparency in predictive analytics
Mykola Pechenizkiy is Professor of Data Mining at the Department of Mathematics and Computer Science, TU Eindhoven. His core expertise and research interests are in predictive analytics and its application to real-world problems in industry, medicine and education. At the Data Science Center Eindhoven (DSC/e) he leads the Responsible Data Science interdisciplinary research program, which aims at developing techniques for informed, accountable and transparent analytics. As principal investigator of several data science projects, he aims at developing foundations for next-generation predictive analytics and demonstrating their ecological validity in practice. Over the past decade he has co-authored more than 100 peer-reviewed publications and served on the program committees of the leading data mining and AI conferences.
Modern machine learning techniques contribute to the massive automation of data-driven decision making and decision support. It is becoming better understood and accepted, in particular due to the new General Data Protection Regulation (GDPR), that deployed predictive models may need to be audited. Regardless of whether we deal with so-called black-box models (e.g. deep learning) or more interpretable models (e.g. decision trees), answering even basic questions like “why is this model giving this answer?” and “how do particular features affect the model output?” is nontrivial. In reality, auditors need tools not just to explain the decision logic of an algorithm, but also to uncover and characterize undesired or unlawful biases in predictive model performance; for example, by law hiring decisions cannot be influenced by race or gender. In this talk I will give a brief overview of the different facets of comprehensibility of predictive analytics and reflect on the current state of the art and the further research needed to gain a deeper understanding of what it means for predictive analytics to be truly transparent and accountable. I will also reflect on the necessity of studying the utility of methods for interpretable predictive analytics.
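One minimal way to approach the auditing question “how do particular features affect the model output?” is permutation importance, sketched below. The dataset and model are illustrative assumptions, not the speaker’s tooling; the sketch only shows the kind of answer such an audit produces.

```python
# Illustrative audit of feature influence via permutation importance (assumed setup).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle one feature at a time and measure how much held-out accuracy drops:
# a large drop means the model's decisions rely heavily on that feature.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
top = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, imp in top:
    print(f"{name}: {imp:.3f}")
```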
Title: Machine-learning discrimination: bias in, bias out.
Toon Calders received his PhD in Computer Science in 2003 from the University of Antwerp.
In 2006 he joined the Eindhoven University of Technology as an assistant professor, a position he left in 2012 to become an associate professor at the Université libre de Bruxelles.
In 2016 Toon Calders rejoined the University of Antwerp as a full professor.
The research interests of Toon Calders are situated in machine learning, data mining, and artificial intelligence.
More specifically, he carried out research projects on integrating pattern mining in database systems, on fairness in machine learning, on stream mining, and on dynamic network analysis.
He has published over 80 papers at data mining and machine learning conferences such as ACM SIGKDD, ECML/PKDD, ACM PODS, IEEE ICDM, ACM WSDM, pVLDB and SIAM SDM, and in journals including ACM Transactions on Database Systems, Machine Learning, and Data Mining and Knowledge Discovery.
Artificial intelligence is increasingly responsible for decisions that have a huge impact on our lives. But predictions made with data mining and algorithms can affect population subgroups differently. Academic researchers and journalists have shown that decisions taken by predictive algorithms sometimes lead to biased outcomes, reproducing inequalities already present in society. Is it possible to make the data mining process fairness-aware? Are algorithms biased because people are too? Or is it inherent in how machine learning works at the most fundamental level?
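The “bias in, bias out” effect can be illustrated with a minimal sketch: a classifier trained on historically biased labels reproduces the bias even when the sensitive attribute is withheld, because a correlated proxy remains. The group sizes, proxy feature and parity metric below are illustrative assumptions, not the speaker’s experiments.

```python
# Illustrative "bias in, bias out" demo with synthetic hiring data (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
gender = rng.integers(0, 2, size=n)            # sensitive attribute (0/1)
skill = rng.normal(size=n)                     # legitimate qualification signal
# Historical labels: partly skill, partly a biased preference for group 1.
hired = (skill + 0.8 * gender + rng.normal(scale=0.5, size=n) > 0.9).astype(int)

# Train without the sensitive attribute; a correlated proxy feature remains.
proxy = gender + rng.normal(scale=0.3, size=n)  # e.g., a gendered keyword signal
X = np.column_stack([skill, proxy])
pred = LogisticRegression().fit(X, hired).predict(X)

# Demographic parity check: compare positive-prediction rates per group.
rate0 = pred[gender == 0].mean()
rate1 = pred[gender == 1].mean()
print(f"selection rate group 0: {rate0:.2f}, group 1: {rate1:.2f}")
print(f"disparate impact ratio: {rate0 / rate1:.2f}")  # well below 1 signals bias
```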
Title: Exascale and the Convergence of High-Performance Computing, Big Data, AI and IoT
Tarek El-Ghazawi is a Professor in the Department of Electrical and Computer Engineering at The George Washington University, where he leads the university-wide Strategic Academic Program in High-Performance Computing. He is the founding director of The GW Institute for Massively Parallel Applications and Computing Technologies (IMPACT) and was a founding Co-Director of the NSF Industry/University Center for High-Performance Reconfigurable Computing (CHREC), established with funding from NSF, government and industry. El-Ghazawi’s research interests include high-performance computing, computer architectures, reconfigurable and embedded computing, nano-photonic based computing, and computer vision and remote sensing. He is one of the principal co-authors of the UPC parallel programming language and the first author of the UPC book from John Wiley and Sons. El-Ghazawi is also one of the pioneers of the area of High-Performance Reconfigurable Computing (HPRC).
Dr. El-Ghazawi was also one of the early researchers in cluster computing and built the first GW cluster in 1995. At present he is leading efforts for rebooting computing based on new paradigms, including analog, nano-photonic and neuromorphic computing. He has served on many boards and as a consultant for organizations such as CESDIS and RIACS at NASA GSFC and NASA ARC, IBM and ARSC. He received his Ph.D. degree in Electrical and Computer Engineering from New Mexico State University in 1988. El-Ghazawi has published over 250 refereed research publications, and his research has been funded extensively by government organizations such as DARPA, NSF, AFOSR, NASA and DoD, and by industrial organizations such as Intel, AMD, HP and SGI. Dr. El-Ghazawi has served in many editorial roles, including as an Associate Editor of the IEEE Transactions on Parallel and Distributed Systems and the IEEE Transactions on Computers. He has chaired and co-chaired many IEEE international conferences and symposia, including IEEE PGAS 2015, IEEE/ACM CCGrid 2018 and IEEE HPCC/SmartCity/DSS 2017, to name a few. Professor El-Ghazawi is a Fellow of the IEEE and was selected as a Research Faculty Fellow of the IBM Center for Advanced Studies, Toronto. He was also awarded the Alexander von Humboldt Research Award from the Humboldt Foundation in Germany (given yearly to 100 scientists across all areas from around the world), the Alexander Schwarzkopf Prize for Technical Innovation, and the GW SEAS Distinguished Researcher Award. El-Ghazawi has served as a senior U.S. Fulbright Scholar.
The field of high-performance computing (HPC), or supercomputing, refers to building and using computing systems that are orders of magnitude faster than our common systems.
The top supercomputer, Summit, can perform 148,600 trillion calculations in one second (148.6 PFLOPS on the LINPACK benchmark). The top two supercomputers are now in the USA, followed by two Chinese supercomputers.
Many countries are racing to break the record and build an ExaFLOP supercomputer that can perform more than one million trillion (quintillion) calculations per second.
In fact, the USA is planning two such supercomputers for 2021, one of which (Frontier) will perform at 1.5 EF when fully operational.
Meanwhile, data volumes due to social media and the Internet of Things (IoT) have been exploding, and AI, with advances in deep learning, has been a successful technique for leveraging those large volumes of data.
These concurrent developments have resulted in what is seen as the convergence of big data and HPC, as processing massive amounts of data becomes impractical without HPC. In this talk we examine the progress in HPC and the potential applications and capabilities of such convergence as the basis for a future smart world.
We also briefly take a bird’s-eye peek at the race to create post-Moore’s-law processors to address existing challenges beyond exascale.
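To make the scale of the figures quoted above concrete, here is a small back-of-the-envelope sketch; the 1 GFLOP/s baseline used for comparison is an illustrative assumption.

```python
# Quick arithmetic behind the FLOPS figures in the abstract (illustrative only).
PETA, EXA = 1e15, 1e18

summit_flops = 148.6 * PETA        # 148,600 trillion calculations per second
frontier_flops = 1.5 * EXA         # planned ~1.5 EF when fully operational

print(f"Summit:   {summit_flops:.3e} FLOP/s")
print(f"Frontier: {frontier_flops:.3e} FLOP/s "
      f"(~{frontier_flops / summit_flops:.1f}x Summit)")

# What one exaFLOP-second means for a (hypothetical) 1 GFLOP/s core:
seconds = 1e18 / 1e9
print(f"1 EF second on a 1 GFLOP/s core: {seconds / 3600 / 24 / 365:.1f} years")
```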
Title: Future Generation Education Technological Model