Please join us on Wednesdays during the 2019 LSA Institute for

 

Data Summer Camp:

A Wednesday Minicourse on Reproducible Linguistic Research

This series of four 2-hour workshops will provide a gentle introduction to reproducible research in linguistics. Each workshop will provide participants with some basic knowledge and skills in data science that can not only help your linguistic career, but also enrich the quality of scholarship across the discipline.

Participation is free to anyone affiliated with the Institute.

Participants at all levels of experience are welcome, especially those who are not yet familiar with tools or processes for reproducible research. Computers are not supplied, so please bring your own laptop.

Feel free to drop in to as many workshops as you like.

Seating will be limited, so arrive early!

 


Workshop 1: The where?, how?, and why-me? of data archiving and sharing

Date: June 26, 2019
Time: 10AM-12PM
Location: TBA

Instructor: Dr. Helene N. Andreassen (UiT The Arctic University of Norway)

Data collection is essential to many research projects, but sadly many of us fail to seek advice on how to archive our research data properly. This can lead to poor structuring and unsafe storage, which in turn can leave our valuable data forgotten, impossible to reuse, or lost. In this session, we discuss how to go about preparing your data for archiving, and how to evaluate and select a repository that ensures optimal conditions for data preservation, retrieval, and visibility. We will also discuss why proper archiving is important and how it contributes to the overall quality of the scientific linguistic enterprise.

In recent years, open access to research data and citation requirements have become important factors in making research more transparent, and thereby easier for the scientific community to trust. There are in fact many benefits to sharing research data that are not subject to ethical, legal, or security restrictions. Sharing not only allows others to reproduce your research, it also leads to more citations and new collaborations. In addition, requirements to share research data openly are rapidly becoming more widespread among scientific journals, organizations, and funding agencies.

This session will walk you through three main components of research data management: Structuring and documentation, evaluation of repositories, and citation of data in publications. The session is relevant for all types of data, regardless of the subfield.

 

 


Workshop 2: Putting it all together with Knitr, LaTeX, and R Markdown


Date: July 10, 2019
Time: 10AM-12PM
Location: TBA

Instructors: Dr. Bradley McDonnell & Dr. Bradley Rentz (University of Hawaiʻi at Mānoa)

Reproducible research has seen recent developments in software packages that facilitate the publishing of the results and/or code alongside prose in a single document that is linked directly to the data. This has obvious advantages for conducting research in linguistics where the writing of an article can be directly linked to the analysis and data.

This session is a gentle introduction to software packages for the R programming language (R Core Team 2015) that allow linguists to embed code within a LaTeX or Markdown document that is directly linked to the data using the R package knitr (Xie 2015). Among a number of practical advantages, these software packages allow the results and the data to remain separate, but linked.

Participants will be required to bring laptop computers to the workshop running macOS (Mac) or Windows (mobile systems such as iPads, Android tablets, and Chromebooks are not suitable for the workshop). Basic knowledge of R will be beneficial but is not required. Participants will not be asked to write their own code; exercises will contain the necessary R code.

 

 


Workshop 3: The Big Payoff — How reproducible research can help your career

Date: July 10, 2019
Time: 2PM-4PM   
Location: TBA

Instructor: Dr. Lauren B. Collister (University of Pittsburgh)

The benefits of reproducible research to the scientific endeavor are obvious. However, reproducible research can also benefit your personal career track. In this session, we will go over ways to increase your H-index, explore other metrics of citation, and discuss Open Access, Copyright, and data citation issues. The session will also cover Open Researcher and Contributor IDs (ORCID), how to get one, and why it is professionally beneficial to have one. By the end of the session, in addition to learning practical ways to improve your citation metrics, you will better know how to advocate for your data work in external evaluations like hiring, tenure, and promotion.

 

 


Workshop 4: Being the best version of yourself with git and GitHub


Date: July 17, 2019
Time: 10AM-12PM
Location: TBA

Instructors: Dr. Na-Rae Han (University of Pittsburgh) & Dr. Bradley McDonnell (University of Hawaiʻi at Mānoa)

Versioning (or version control) – managing and tracking changes over the lifespan of documents, code, or data – has grown far beyond its roots in computer science, and versioning systems such as git are becoming increasingly important for linguists in their data management workflows. These systems are made even more powerful when used with web-based platforms for collaboration like GitHub, which is seeing increasing adoption by academics as a venue for hosting and showcasing collaborative projects.

This session teaches the conceptual underpinnings of versioning and its practical applications using git, a free, open-source version control tool, and GitHub, a free web-based collaborative platform. The course will teach versioning using both the command line (e.g., Terminal) and popular Integrated Development Environments (e.g., RStudio for R, Jupyter Notebooks for Python), working with documents, data, and code.

Participants will be required to bring laptop computers to the workshop running macOS (Mac) or Windows (mobile systems such as iPads, Android tablets, and Chromebooks are not suitable for the workshop). Basic knowledge of either R or Python will be beneficial but is not required. Participants will not be asked to write their own code; exercises will contain the necessary Python and R code.

 

 

 

 

Data Summer Camp is organized by Andrea Berez-Kroeker, Bradley McDonnell, and Eve Koller (University of Hawaiʻi at Mānoa), and funding is provided by the National Science Foundation under Grant No. SMA-1745249. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.