R/replicability.R
replicability.Rd
This function computes Krippendorff's Alpha as per https://repository.upenn.edu/cgi/viewcontent.cgi?article=1043&context=asc_papers
replicability(coders, unit_from = "unit", measurement_from = "measurement", frequency_from = NULL, return_by_unit = FALSE)
Argument | Description |
---|---|
coders | `data.table` containing the reliability data in long format. |
unit_from | Name of the column containing the unit ID |
measurement_from | Name of the column containing the measurements |
frequency_from | (Optional) Name of the column containing the frequencies, *if* the data is in the "aggregated" form described below. |
return_by_unit | (default FALSE) If TRUE, return a data.table of |
- Krippendorff's Alpha reliability index
- Expected disagreement
- Overall observed disagreement across all units
- Data frame with one row per unit and the following columns:
  - Unit
  - Number of observations in that unit
  - Observed disagreement within this unit
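For reference, the three summary quantities listed above are tied together by the standard definition of Krippendorff's Alpha. The nominal-data expressions below follow the usual textbook form and are only a sketch: the symbols $n_{uc}$, $m_u$, $n_c$ and $n$ are notation introduced here, and the function's internal computation or its per-unit normalization may differ.

$$
\alpha = 1 - \frac{D_o}{D_e}, \qquad
D_o = \frac{1}{n} \sum_{u} \frac{m_u^2 - \sum_{c} n_{uc}^2}{m_u - 1}, \qquad
D_e = \frac{n^2 - \sum_{c} n_c^2}{n\,(n - 1)}
$$

Here $n_{uc}$ is the number of measurements with value $c$ in unit $u$, $m_u = \sum_c n_{uc}$ is the number of measurements in unit $u$ (only units with $m_u \ge 2$ contribute), $n_c = \sum_u n_{uc}$, and $n = \sum_u m_u$.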
The function is designed to be space-efficient for sparse observations, and therefore takes a long-format data.table as input rather than a reliability matrix.
Supports nominal and binary data.
If a tibble or a non-data.table dataframe is passed as input, this function will still work! It will silently create a data.table copy of your dataframe (or tibble).
There are two possible types of input:
- The "extended" form: a tidy table of individual votes, where each row is one measurement of one unit by one coder. This is typically the raw data coming out of an annotation database. The user just needs to specify which column contains the unit ID, and which column contains the measurement. Each measurement takes one of the possible values of the nominal variable.
- The "aggregated" form: a tidy table of frequencies of votes per unit, where each row is the number of measurements for one unit for one nominal value. The user needs to specify which column contains the unit ID, which column contains the measurement value (one of the possible values of the nominal variable), and which column contains the frequency of coders having assigned this measurement to this unit.
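As an illustration of the two input forms, here is a minimal sketch that builds a small "extended" table and derives the "aggregated" form from it with data.table. The data values, the `coder` column, and the `frequency` column name are made up for this example. The two `replicability()` calls are assumptions based on the documented signature, and are left commented out because the package providing the function must be loaded first.

```r
library(data.table)

# "Extended" form: one row per vote (one coder measuring one unit).
# Column names match the documented defaults; the coder column is
# illustrative only, since no argument refers to it.
extended <- data.table(
  unit        = c("u1", "u1", "u2", "u2", "u3", "u3"),
  coder       = c("c1", "c2", "c1", "c2", "c1", "c2"),
  measurement = c("a",  "a",  "a",  "b",  "b",  "b")
)

# "Aggregated" form: one row per (unit, value) pair with the vote count.
aggregated <- extended[, .(frequency = .N), by = .(unit, measurement)]
print(aggregated)

# Assumed usage, given the documented signature (not executed here):
# replicability(extended)
# replicability(aggregated, frequency_from = "frequency")
```

The aggregated table carries the same information as the extended one as long as coder identity is not needed, and it is usually much smaller when many coders vote on each unit.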
WARNING: this function *will* modify the data.table passed as `coders`. To avoid this, call it on a copy of the table.
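A minimal sketch of protecting the original table with `data.table::copy()`. Which changes the function makes to its input is not documented here, so the `:=` line below is only a stand-in for a by-reference modification. The table `votes` and the column `tmp` are hypothetical names.

```r
library(data.table)

votes <- data.table(unit = c("u1", "u1"), measurement = c("a", "b"))

# copy() makes a deep copy, so by-reference changes to `scratch`
# do not reach `votes`.
scratch <- copy(votes)
scratch[, tmp := 1L]  # stand-in for whatever the function may change

print(names(votes))    # the original keeps its two columns
print(names(scratch))  # the copy carries the extra column

# Assumed usage with the documented signature (not executed here):
# result <- replicability(copy(votes))
```

A plain assignment such as `scratch <- votes` would not be enough, because both names would then refer to the same data.table.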