Coding Setup

VS Code, Git, Quarto … the stuff you need if you want to work with data

Setup

VS Code, Git, Quarto, R

You don’t need any other software!*

*ok maybe a shell, a browser, Python, pixi and packages

VS Code

  • Modern, cross-platform IDE
  • Extensibility
    • Languages and frameworks
    • Syntax highlighting and linting
    • Data viewers
    • AI agents (e.g. Cline)
  • Alternatives:

Git

  • As a version control system
    • Collaborate
    • Track changes
    • Undo & recover
    • Branch & merge
  • As a hosted Git platform (e.g. GitHub, Gitea, …)
    • Remote repository hosting
    • Software project management tool
    • Resource for open-source software
    • Issue and bug reporting for users

Example usage via CLI

# in an existing repo after making some changes
git add .

# commit with an informative message
git commit -m "describe changes here"

# push
git push

# later: pull before starting new work
git pull

Quarto

Finding the right data science framework

Quarto

  • Open-source publishing system / coding notebook
  • Author with Markdown, HTML, LaTeX and YAML
  • Code in R, Python, Julia and (Observable) JavaScript
  • Layouts and styling with CSS
  • Render to HTML or PDF and create articles, websites, dashboards, presentations, …
  • Interactive, data-driven framework

Example usage via CLI

# render a single file
quarto render index.qmd

# preview on localhost
quarto preview

Markup Languages I

YAML – metadata and configuration

title: "Template"
author: "Lukas Schmoigl"
project:
  type: website
  output-dir: website
  render: 
    - index.qmd
  preview:
    port: 1111
    browser: true
    watch-inputs: true
format:
  html:
    theme: [style.scss]
    page-layout: article

Markdown – basic formatting and layout

*italic text* and **bold text**

[Link to WU](https://www.wu.ac.at){.example-link}

`inline code` and code blocks

![Images](img/wifo.svg){width=60 fig-align="left"}

italic text and bold text

Link to WU

inline code and code blocks

Images

Markup Languages II

HTML – advanced formatting and layout

<details>
  <summary>Click to see some special information</summary>
  <p>
    <span style="color:#9C6B91">Quarto</span> is the thing!
  </p>
</details>

Click to see some special information

Quarto is the thing!

TeX/LaTeX – mathematical expressions and citations

$$ \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}} $$

\[ \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y}_{i})^{2}} \]
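The expression above is the root mean squared error (RMSE) of predictions ŷ against observations y. As a minimal sketch, it could be computed in plain JavaScript like this; the function name `rmse` and the sample values are made up for illustration and are not part of the slides.

```javascript
// Root mean squared error: sqrt( (1/n) * sum_i (y_i - yHat_i)^2 )
function rmse(y, yHat) {
  if (y.length === 0 || y.length !== yHat.length) {
    throw new Error("y and yHat must be non-empty and of equal length");
  }
  // Sum of squared differences between observed and predicted values
  const sumSquaredErrors = y.reduce(
    (sum, yi, i) => sum + (yi - yHat[i]) ** 2,
    0
  );
  return Math.sqrt(sumSquaredErrors / y.length);
}

// Hypothetical observed and predicted values, for illustration only
const observed = [3, 5, 7];
const predicted = [2, 5, 9];
console.log(rmse(observed, predicted)); // sqrt(5 / 3) ≈ 1.291
```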

Intensive training programs are among the most effective 
measures for the job market integration 
of the long-term unemployed [@eppel2024].

Intensive training programs are among the most effective measures for the job market integration of the long-term unemployed (Eppel et al. 2024).

Code Chunks – R

R – data wrangling and statistical modelling

library(tidyverse)
library(kableExtra)

data <- read_csv("data/cases.csv")

plot_data <- data %>%
  separate(case_nr, c("court", "case_nr"), ", ") %>%
  group_by(court, month) %>%
  count(name = "cases") %>%
  group_by(court) %>%
  mutate(sum_cases = sum(cases, na.rm = TRUE)) %>%
  filter(sum_cases > 100) %>%
  ungroup()

write_csv(plot_data, "data/plot_data.csv")

plot_data %>%
  filter(row_number() <= 15) %>%
  kable("html") 

court         month       cases  sum_cases
BG Favoriten  2007-01-01     10        624
BG Favoriten  2007-02-01     11        624
BG Favoriten  2007-03-01      9        624
BG Favoriten  2007-04-01     11        624
BG Favoriten  2007-05-01      8        624
BG Favoriten  2007-06-01      8        624
BG Favoriten  2007-07-01      7        624
BG Favoriten  2007-08-01      8        624
BG Favoriten  2007-09-01      4        624
BG Favoriten  2007-10-01      4        624
BG Favoriten  2007-11-01      9        624
BG Favoriten  2007-12-01     11        624
BG Favoriten  2008-01-01      6        624
BG Favoriten  2008-02-01      8        624
BG Favoriten  2008-03-01     11        624
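For readers coming from the JavaScript side, here is a rough sketch of what the R pipeline above computes: split the court off `case_nr`, count cases per court and month, attach each court's total, and keep courts above a threshold. The function name `summariseCases`, the `minTotal` parameter and the sample records are hypothetical. Unlike `count()` in R, rows come out in order of first appearance rather than sorted by group.

```javascript
// Hypothetical raw records: case_nr holds "court, case number" in one string
const rawCases = [
  { case_nr: "Court A, 1/07", month: "2007-01-01" },
  { case_nr: "Court B, 1/07", month: "2007-01-01" },
  { case_nr: "Court A, 2/07", month: "2007-01-01" },
  { case_nr: "Court A, 3/07", month: "2007-02-01" },
];

function summariseCases(records, minTotal) {
  const perCourtMonth = new Map(); // JSON [court, month] -> number of cases
  const perCourt = new Map();      // court -> total number of cases
  for (const { case_nr, month } of records) {
    const court = case_nr.split(", ")[0];                       // separate()
    const key = JSON.stringify([court, month]);
    perCourtMonth.set(key, (perCourtMonth.get(key) ?? 0) + 1);  // count()
    perCourt.set(court, (perCourt.get(court) ?? 0) + 1);        // sum(cases)
  }
  return [...perCourtMonth]
    .map(([key, cases]) => {
      const [court, month] = JSON.parse(key);
      return { court, month, cases, sum_cases: perCourt.get(court) };
    })
    .filter((d) => d.sum_cases > minTotal);                     // filter()
}

// Keep only courts with more than 2 cases in total
console.log(summariseCases(rawCases, 2));
```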

Code Chunks – OJS

Observable JavaScript – interactive data viz

import { vl } from "@vega/vega-lite-api-v5"

plot_data = FileAttachment("data/plot_data.csv")
  .csv({typed: true})

selectLegend = vl.selectPoint("selection")
  .fields("court")
  .toggle("event")
  .bind("legend")

vl.markLine()
  .encode(
    vl.x().fieldT("month").axis({ grid: false }),
    vl.y().fieldQ("cases"),
    vl.color()
      .fieldN("court")
      .legend({ orient: "top", title: null }),
    vl.opacity()
      .if(selectLegend, vl.value(1))
      .value(0),
  )
  .params(selectLegend)
  .data(plot_data)
  // `config` is a Vega-Lite config object, assumed to be defined in another cell
  .config(config)
  .render({ renderer: "svg" })

Interactivity I

It gets funky with Quarto and inline code cells

In markdown we can write inline R

The data has `r nrow(plot_data)` rows and `r ncol(plot_data)` columns.  
The total number of cases is `r sum(plot_data$cases)` cases.

The data has 280 rows and 4 columns. The total number of cases is 1177 cases.

Same with OJS code

The data has `{ojs} plot_data.length` rows and `{ojs} Object.keys(plot_data[0]).length` columns.
The total number of cases is `{ojs} plot_data.reduce((sum, d) => sum + d.cases, 0)` cases.

The data has 280 rows and 4 columns. The total number of cases is 1177 cases.
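To see what the three inline OJS expressions evaluate to, here is a small plain-JavaScript sketch. It uses only the first three rows of the table shown earlier, so the numbers differ from the full data set; the variable names `sampleRows`, `nRows`, `nCols` and `totalCases` are chosen for illustration.

```javascript
// First three rows of the plot_data table, as an array of objects
const sampleRows = [
  { court: "BG Favoriten", month: "2007-01-01", cases: 10, sum_cases: 624 },
  { court: "BG Favoriten", month: "2007-02-01", cases: 11, sum_cases: 624 },
  { court: "BG Favoriten", month: "2007-03-01", cases: 9, sum_cases: 624 },
];

// Number of rows: the array length
const nRows = sampleRows.length;
// Number of columns: the number of keys on the first row
const nCols = Object.keys(sampleRows[0]).length;
// Total cases: fold the `cases` field into a running sum, starting at 0
const totalCases = sampleRows.reduce((sum, d) => sum + d.cases, 0);

console.log(`The data has ${nRows} rows and ${nCols} columns.`);
console.log(`The total number of cases is ${totalCases} cases.`);
```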

Interactivity II

It gets funkier with Quarto, the Observable runtime and its standard library

viewof text = Inputs.text({
  label: "Special information: ", 
  placeholder: "Type in info!"
  })
viewof color = Inputs.radio(
  ["#336699", "#e52320", "#9C6B91"], 
  {label: "Color"}
  )

htl.html`
<p>
  Here is some special information: 
  <span style="color:${color}">
    ${text}
  </span>
</p>
`

Interactivity III

It gets even funkier with Observable and some data viz

viewof court = Inputs.select(
  [... new Set(plot_data.map(d => d.court))], 
  {label: "Select a court: "}
  )

vl.markLine()
  .encode(
    vl.x()
      .fieldT("month")
      .axis({ grid: false }),
    vl.y()
      .fieldQ("cases"),
  )
  .data(plot_data.filter(d => d.court === court))
  // `config` is a Vega-Lite config object, assumed to be defined in another cell
  .config(config)
  .render({ renderer: "svg" })
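The two data steps in this example, building the dropdown options and filtering the rows for the selected court, also work outside the Observable runtime. A minimal sketch in plain JavaScript follows; the helper names `uniqueCourts` and `rowsForCourt` are made up, and "Court B" with its value is a hypothetical second court added only so the filter has something to exclude.

```javascript
// Small sample in the shape of plot_data; "Court B" is hypothetical
const courtRows = [
  { court: "BG Favoriten", month: "2007-01-01", cases: 10 },
  { court: "Court B", month: "2007-01-01", cases: 3 },
  { court: "BG Favoriten", month: "2007-02-01", cases: 11 },
];

// Options for the select input: each court once, in order of first appearance
function uniqueCourts(data) {
  return [...new Set(data.map((d) => d.court))];
}

// Rows passed to the chart: only those matching the selected court
function rowsForCourt(data, court) {
  return data.filter((d) => d.court === court);
}

console.log(uniqueCourts(courtRows));
console.log(rowsForCourt(courtRows, "BG Favoriten"));
```

In the slide, `viewof court` makes the selected value reactive, so the chart cell reruns whenever the selection changes; the sketch only covers the data handling.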

References

Eppel, Rainer, Ulrike Huemer, Helmut Mahringer, and Lukas Schmoigl. 2024. The B.E. Journal of Economic Analysis & Policy 24 (1): 141–85. https://doi.org/10.1515/bejeap-2023-0079.