This function computes Krippendorff's Alpha as described in https://repository.upenn.edu/cgi/viewcontent.cgi?article=1043&context=asc_papers

replicability(
  coders,
  unit_from = "unit",
  measurement_from = "measurement",
  frequency_from = NULL,
  return_by_unit = FALSE
)

Arguments

coders

`data.table` containing the reliability data in long format.

unit_from

Name of the column containing the unit ID

measurement_from

Name of the column containing the measurements

frequency_from

(Optional) Name of the column containing the frequencies, *if* the data is in the "aggregated" form described in Details.

return_by_unit

(default FALSE) If TRUE, also return a data.table of per-unit results (see `by_unit` under Value).

Value

alpha

Krippendorff's Alpha reliability index

De

Expected disagreement

Do

Overall observed disagreement across all units

by_unit

Data frame with one row per unit and the following columns:

unit

Unit

mu

Number of observations in that unit

Do

Observed disagreement within this unit
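
For orientation, Krippendorff's definition ties the three returned quantities together as alpha = 1 - Do / De. The following is a minimal base-R sketch of that relationship for nominal data, computed from aggregated vote counts. `alpha_nominal_sketch` is a hypothetical helper written for this illustration only. It is not this package's implementation, which may differ in details such as how units with a single observation are handled.

```r
# Hypothetical helper (not part of the package): nominal Krippendorff's Alpha
# from aggregated counts, using the standard textbook formulas.
alpha_nominal_sketch <- function(unit, measurement, frequency) {
  # Units-by-values matrix of vote counts; absent combinations become 0
  counts <- tapply(frequency, list(unit, measurement), sum)
  counts[is.na(counts)] <- 0

  # Only units with at least two observations contain pairable values
  counts <- counts[rowSums(counts) >= 2, , drop = FALSE]
  m_u <- rowSums(counts)   # observations per unit
  n_c <- colSums(counts)   # observations per nominal value
  n   <- sum(m_u)          # total pairable observations

  # Observed disagreement: mismatching pairs within units
  Do <- sum((m_u^2 - rowSums(counts^2)) / (m_u - 1)) / n
  # Expected disagreement: mismatching pairs if values were paired by chance
  De <- (n^2 - sum(n_c^2)) / (n * (n - 1))

  list(alpha = 1 - Do / De, De = De, Do = Do)
}

# Three units: u1 = {a, a, a}, u2 = {a, b}, u3 = {b, b}
res <- alpha_nominal_sketch(
  unit        = c("u1", "u2", "u2", "u3"),
  measurement = c("a",  "a",  "b",  "b"),
  frequency   = c(3,    1,    1,    2)
)
res$alpha  # 0.5 (Do = 2/7, De = 4/7)
```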

Details

It is designed to be space efficient for sparse observations, and thus takes as input a long-format data.table rather than a reliability matrix.

Supports nominal and binary data.

If a tibble or any other non-data.table data frame is passed as input, this function will still work: it silently creates a data.table copy of the input.

There are two possible types of input:

- The "extended" form: a tidy table of individual votes, where each row is one measurement of one unit by one coder. This is typically the raw data coming out of an annotation database. The user just needs to specify which column contains the unit ID, and which column contains the measurement. Each measurement takes one of the possible values of the nominal variable.
- The "aggregated" form: a tidy table of frequencies of votes per unit, where each row is the number of measurements for one unit for one nominal value. The user needs to specify which column contains the unit ID, which column contains the measurement value (one of the possible values of the nominal variable), and which column contains the frequency of coders having assigned this measurement to this unit.
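
The two input forms can be illustrated with a small base-R sketch. The data, the column name `n`, and the use of plain data frames are assumptions made for this example. The calls to `replicability()` are shown as comments because they require this package to be loaded.

```r
# Extended form: one row per individual vote.
votes <- data.frame(
  unit        = c("u1", "u1", "u1", "u2", "u2", "u3", "u3"),
  measurement = c("a",  "a",  "a",  "a",  "b",  "b",  "b"),
  stringsAsFactors = FALSE
)

# Aggregated form: one row per (unit, value) pair with its vote count.
# The frequency column name `n` is an arbitrary choice for this example.
agg <- as.data.frame(
  table(unit = votes$unit, measurement = votes$measurement),
  responseName = "n",
  stringsAsFactors = FALSE
)
agg <- agg[agg$n > 0, ]  # drop (unit, value) pairs that received no votes

# With the package loaded, the corresponding calls would look like:
# replicability(votes, unit_from = "unit", measurement_from = "measurement")
# replicability(agg, unit_from = "unit", measurement_from = "measurement",
#               frequency_from = "n")
```

Both tables describe the same votes, so the aggregated frequencies sum to the number of rows in the extended table.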

WARNING: this function *will* modify the data.table passed as `coders`. To avoid this, call it on a copy of the table (for example, one made with `data.table::copy()`).