Description
While our primary job as researchers is to generate new knowledge, it is equally important for us to understand what is already out there. In this talk we present two examples in which Python was the enabling tool that allowed us to crunch that information in a time-efficient way. In the first example, our goal was to map a field of scientific research through the analysis of bibliographic data, to understand who, where, when, and what is being published. In particular, we focused on the field of Atomic Layer Deposition, a materials synthesis technique that, among other things, has become a key component of semiconductor manufacturing and is a key area of expertise for one of the authors (AY). The second example focuses on mapping the evolution of Wikipedia's content around a particular scientific discipline.
In both cases we followed a similar philosophy: we took the simplest possible approach that minimized development and learning time, prioritizing lightweight, native code and standard libraries over raw performance. In this talk we will begin by describing the methodology we followed, addressing questions such as how to transform bibliographic data into dataframes and graphs, and the approach we took to parse Wikipedia. We will then provide an overview of the results to illustrate Python's capabilities for this kind of data analysis.
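To give a flavor of the kind of lightweight approach we mean, the sketch below shows one possible way to turn bibliographic records into a pandas DataFrame and a coauthorship graph with networkx. It is illustrative only, not the exact code used in this work; the record fields and sample data are hypothetical.

```python
import itertools

import pandas as pd
import networkx as nx

# Hypothetical bibliographic records; in practice these would be parsed
# from an export of a bibliographic database.
records = [
    {"title": "Paper A", "year": 2019, "authors": ["Smith, J.", "Lee, K."]},
    {"title": "Paper B", "year": 2021, "authors": ["Lee, K.", "Garcia, M."]},
]

# Tabular view: one row per publication, ready for grouping by year, etc.
df = pd.DataFrame(records)
papers_per_year = df.groupby("year").size()

# Graph view: authors are nodes; coauthorship on a paper adds or
# strengthens an edge between them.
G = nx.Graph()
for rec in records:
    for a, b in itertools.combinations(rec["authors"], 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

print(papers_per_year)
print(G.number_of_nodes(), "authors,", G.number_of_edges(), "coauthor links")
```

From a structure like this, standard pandas grouping and networkx graph metrics already answer many of the who/where/when questions without any specialized tooling.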