Please join us on Wednesdays during the 2019 LSA Institute for

Data Summer Camp:

A Wednesday Minicourse on Reproducible Linguistic Research

This series of four 2-hour workshops will provide a gentle introduction to reproducible research in linguistics. Each workshop will provide participants with basic knowledge and skills in data science that can not only help their careers in linguistics but also enrich the quality of scholarship across the discipline.

Participation is free to anyone affiliated with the Institute.

Participants at all levels of experience are welcome, especially those who are not yet familiar with tools or processes for reproducible research. Computers are not supplied, so please bring your laptop.

Feel free to drop in to as many workshops as you like.

Seating will be limited, so arrive early!

Workshop 1: The where?, how?, and why-me? of data archiving and sharing

Cancelled.

Workshop 2: Putting it all together with Knitr, LaTeX, and R Markdown

		Date: July 10, 2019 Time: 10AM-12PM Location: TBA Instructors: Dr. Bradley McDonnell & Dr. Bradley Rentz (University of Hawaiʻi at Mānoa)

Reproducible research has seen recent developments in software packages that facilitate the publishing of the results and/or code alongside prose in a single document that is linked directly to the data. This has obvious advantages for conducting research in linguistics where the writing of an article can be directly linked to the analysis and data.

This session is a gentle introduction to software packages for the R programming language (R Core Team 2015) that allow linguists to embed code within a LaTeX or Markdown document that is directly linked to the data, using the R package knitr (Xie 2015). Among a number of practical advantages, these software packages allow the results and the data to remain separate, but linked.

Participants will be required to bring laptop computers to the workshop running OS-X (Mac) or Windows (mobile systems such as iPads, Android tablets, and Chromebooks are not suitable for the workshop). Basic knowledge of R will be beneficial but is not required. Participants will not be asked to write their own code; exercises will contain the necessary R code.

Workshop 3: The Big Payoff — How reproducible research can help your career

	Date: July 10, 2019 Time: 2PM-4PM Location: TBA Instructor: Dr. Lauren B. Collister (University of Pittsburgh)

The benefits of reproducible research to the scientific endeavor are obvious. However, reproducible research can also benefit your personal career track. In this session, we will go over ways to increase your h-index, explore other metrics of citation, and discuss Open Access, copyright, and data citation issues. The session will also cover the Open Researcher and Contributor ID (ORCID): how to get one, and why it is professionally beneficial to have one. By the end of the session, in addition to learning practical ways to improve your citation metrics, you will know better how to advocate for your data work in external evaluations such as hiring, tenure, and promotion.

Workshop 4: Being the best version of yourself with git and GitHub

		Date: July 17, 2019 Time: 10AM-12PM Location: TBA Instructor: Dr. Na-Rae Han (University of Pittsburgh)

Versioning (or version control) – managing and tracking changes over the lifespan of documents, code, or data – has grown far beyond its roots in computer science, and versioning systems such as git are becoming increasingly important for linguists in their data management workflows. These systems are made even more powerful when used with web-based platforms for collaboration like GitHub, which is seeing increasing adoption by academics as a venue for hosting and showcasing collaborative projects.

This session teaches the conceptual underpinnings of versioning and its practical applications using git, a free, open-source version control tool, and GitHub, a free web-based collaborative platform. The course will teach versioning using both the command line (e.g., Terminal) and popular Integrated Development Environments (e.g., RStudio for R, Jupyter Notebooks for Python), working with documents, data, and code.

Participants will be required to bring laptop computers to the workshop running OS-X (Mac) or Windows (mobile systems such as iPads, Android tablets, and Chromebooks are not suitable for the workshop). Basic knowledge of either R or Python will be beneficial but is not required. Participants will not be asked to write their own code; exercises will contain the necessary Python and R code.

Data Summer Camp is organized by Andrea Berez-Kroeker, Bradley McDonnell, and Eve Koller (University of Hawaiʻi at Mānoa), and funding is provided by the National Science Foundation under Grant No. SMA-1745249. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.