We live in a digital world where having information at our fingertips doesn’t make it easier to communicate… it makes it harder. Every bundle of data conceals many narratives and conclusions. Some stories will be unremarkable and others highly charged, but all remain out of sight without the skills required to reveal them. Uncovering the stories hidden in data and communicating them requires curiosity, determination and skill. Applying the basic concepts, tools and know-how of data science is key to deciphering and revealing the headline facts. Telling stories with data is a skill that is becoming ever more important as the volume of data grows, along with the desire for informed, data-driven decision making.
The software of choice for reporting modern data analysis is R: because the data and code behind an analysis can be made available, the analysis is reproducible and therefore useful to others. R is the lingua franca of quantitative research and, as such, an influential and indispensable part of any data scrutiny process. Using the grammar of graphics in R together with Shiny, a web application framework for R, it is possible to develop interactive applications for graphical data visualisation. Presenting findings through interactive and accessible visualisations is an effective way of reaching audiences that might not otherwise realise the value of the topic or data at hand.
This course will provide an overview of key concepts for creating an effective data science project and will introduce tools and techniques for data wrangling, visualisation and dynamic reproducible reporting using R, a free, open source language for data analysis. The R language provides a rich and flexible environment for working with data, especially data to be used for statistical modelling or graphics.
The R system has an extensive library of packages that offer state-of-the-art abilities. Many of the analyses they offer are not available in any of the standard statistical software packages. R enables you to escape from the restrictive environments and sterile analyses offered by commonly used statistical software. It makes experimentation and exploration easy, which improves data analysis. Sharing what you discover through data analysis is necessary to make it useful, and R is a tool for reporting modern data analyses in a reproducible manner. It makes an analysis more useful to others because the data and code that actually produced it can be made available and easily shared. Accordingly, this course will emphasise packages that help you with data analysis, visualisation and communication with a wider audience.
The course will start by introducing the fundamental concepts of R: basic use of the R console through the RStudio IDE, inputting and importing data, record keeping and general good practice for an R project workflow. It will then progress to basic statistical concepts, which can seem complex in theory and are often communicated more effectively through visualisation. The formal, abstract nature of statistics can be demystified by visualising it in an applied context, which is why the focus is on building an appropriate visualisation for a given data analysis problem. Towards the end of the course, once you have developed an understanding of your data, you will use R’s reproducible and interactive approach to knit it into a tight and concise narrative, and present your story by creating reproducible RMarkdown documents and Shiny web apps. The course will finish with you creating a website for blog posts about your data science narratives using Hugo and blogdown.
Version control has become an essential tool for keeping track of changes when working on data science projects, as well as for collaborating. RStudio supports working with Git, an open source distributed version control system, which is easy to use when combined with GitHub, a web-based Git repository hosting service. We will introduce you to GitHub, and you will become acquainted with good practice for incorporating Git into your R project workflow.
Governments and civic groups are calling for open and transparent data sources as a means to improve the lives of citizens. Together we will investigate the importance of open data and identify where it can be readily found across the Internet. You will work on case studies inspired by real problems and based on open data.
To learn how to access and prepare data for the analysis
To introduce the basic principles behind effective data visualisation
To learn essential exploratory techniques for summarising data
To produce explanatory data visualisations providing insight into what could be found within the data
To utilise R’s library of tools to visualise geospatial problems
To design reproducible reports by automating the reporting process
To share the results of analysis as interactive, eye-catching web apps that are friendly to non-programmers
To become familiar with R/RStudio’s data handling facilities, which expand the range of data analysis problems that can be tackled effectively
The material is structured as three daily modules. Each module is a three-and-a-half-hour session: 2½ hours of hands-on, interactive student/teacher workshops, followed by an hour reserved for questions and discussion.
The course will be taught by Tatjana Kecojevic and Dusko Medic and will cover various related topics through appropriate case studies, presentations and readings. The conceptual models come to life in the hands-on taught sessions, where they are put into practice through the application of R. Students are then expected to use their own time to practise and hone the data handling expertise acquired during the taught sessions.
Students are expected to participate fully in all of these delivery modes and, in particular, to have attempted any pre-set work and to come fully prepared to discuss any problems encountered and debate the ideas and issues raised.
We recommend you complete each of the following before the end of each module:
This course is designed for anyone who needs to communicate information using data. It will benefit anyone who has the curiosity and desire to enter the realm of data exploration. We will seek to make sense of the world of data and learn effective and attractive ways to visually analyse and communicate related information. With the knowledge gained on this course, you will be ready to undertake your very first exploratory data analysis.
Data Science is not simply fashionable jargon, but rather a discipline with a set of tools that empower data-enriched living, so whatever industry you’re in, this is relevant to you!
Prior experience is not required.
The course will be delivered in English and Serbian!
© 2020 Sister Analyst