Cross Validation

The cross validation module evaluates the predictive accuracy of an interpolation model by systematically withholding subsets of the input data, re-running estimation on the remaining samples, and comparing the predicted values against the withheld actuals. Two methods are supported: K-Fold, which divides the data into K subsets that each take a turn as the validation set, and Leave One Out, which validates each individual sample or boring one at a time.
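The withhold-and-predict loop described above can be sketched as follows. This is an illustrative outline, not the module's internal code: `k_fold_indices`, `cross_validate`, and the `estimate` callback are hypothetical names, and the mean-of-training estimator stands in for the actual interpolation model.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Partition sample indices into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(samples, k, estimate, seed=0):
    """For each fold, withhold it, re-estimate from the remaining
    samples, and predict the withheld values."""
    results = []
    for fold in k_fold_indices(len(samples), k, seed):
        held_out = set(fold)
        training = [s for i, s in enumerate(samples) if i not in held_out]
        for i in fold:
            # (actual, predicted) pairs feed the summary statistics
            results.append((samples[i], estimate(training, samples[i])))
    return results
```

Leave One Out is simply the case `k == len(samples)`: every fold contains exactly one unit.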

The module accepts interpolation options from an upstream estimation module and outputs a field containing the actual values, predicted values, and their differences. Summary statistics are displayed in the Properties panel after the run completes.

Ports

| Direction | Name | Type | Description |
|-----------|------|------|-------------|
| Input | Interpolation Options Input | Interpolation Options | The interpolation options used for the cross-validation. |
| Output | Output Field | Field | The field containing the results of the cross-validation. |

Properties

| Property | Type | Description |
|----------|------|-------------|
| Allow Run | Boolean | Controls whether the module runs when applications are loaded or data changes. When on, the module runs automatically on load or when data changes. When off, the module runs only when Execute is pressed. |
| Execute | Button | Forces the module to run even if Allow Run is turned off, allowing multiple changes to be made before updating. |
| Cross Validation Method | Choice: K-Fold, Leave One Out | Specifies the method used for cross-validation. K-Fold divides the data into K subsets, training on K-1 folds and validating on the remaining fold. Leave One Out is the special case of K-Fold in which each unit is withheld and validated individually. |
| Fold Unit | Choice: Sample, Boring | Specifies how samples are grouped into folds. Sample treats each individual measurement point as its own unit. Boring groups all measurements from the same borehole together, so that no boring is split across folds. |
| Number Of Folds | Integer | The number of folds to use in K-Fold cross-validation. Only applicable when the K-Fold method is selected. Common choices are 5 or 10 folds, but the optimal number depends on the size and characteristics of the dataset. |
| Random Seed | Integer | The random seed used to make the partitioning of data into folds reproducible during K-Fold cross-validation. Setting a specific seed guarantees the same data splits each run. A value of -1 seeds from the current system time, producing different splits each run. |
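The interaction between Random Seed and Fold Unit can be illustrated with a small sketch. The function names `fold_assignments` and `group_by_boring` are hypothetical, not part of this module's API; the sketch only shows how a fixed seed yields repeatable splits and how boring-level grouping keeps a borehole's measurements together.

```python
import random
from collections import defaultdict

def fold_assignments(units, n_folds, seed):
    """Reproducibly assign units (samples or borings) to folds.
    seed == -1 mimics a time-based seed: different splits each run."""
    rng = random.Random(None if seed == -1 else seed)
    ids = list(units)
    rng.shuffle(ids)
    return {u: i % n_folds for i, u in enumerate(ids)}

def group_by_boring(samples):
    """Fold Unit = Boring: all measurements from one borehole
    form a single unit, so a boring is never split across folds."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["boring"]].append(s)
    return groups
```

With a fixed seed the same assignment is produced on every run, which is what makes cross-validation results comparable between sessions.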

Export Options

| Property | Type | Description |
|----------|------|-------------|
| Write As APDV | File | Exports the cross-validation results as an APDV file containing the actual value, predicted value, and difference for each sample. |
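The per-sample record layout of the export (actual, predicted, difference) can be sketched as follows. The APDV format itself is not specified here, so this hypothetical example writes the same three columns to CSV purely to illustrate the record structure; the `results` values are made up.

```python
import csv

# Hypothetical results mirroring the exported columns.
results = [
    {"actual": 4.2, "predicted": 3.9},
    {"actual": 1.1, "predicted": 1.4},
]

with open("cross_validation_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["actual", "predicted", "difference"])
    writer.writeheader()
    for r in results:
        # difference = predicted - actual for each withheld sample
        writer.writerow({**r, "difference": r["predicted"] - r["actual"]})
```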

Statistics

All statistics in this section are read-only display fields.

| Property | Type | Description |
|----------|------|-------------|
| Analyte | String | The name of the analyte being evaluated. |
| Samples | Integer | The number of samples used in the cross-validation. |
| Method | String | The cross-validation method that was applied. |
| Folds | Integer | The number of folds used in the cross-validation run. |
| Root Mean Square Error | Double | The root mean square error between predicted and actual values. |
| Mean Absolute Error | Double | The mean absolute error between predicted and actual values. |
| Median Absolute Error | Double | The median absolute error between predicted and actual values. |
| Pearson Correlation | Double | The Pearson correlation coefficient between predicted and actual values. |
| R Squared | Double | The R-squared (coefficient of determination) value for the cross-validation results. |
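The error and correlation statistics above are all standard functions of the paired actual and predicted values. The following standalone sketch shows how each would be computed; it is not the module's internal implementation.

```python
import math

def summary_stats(actual, predicted):
    """Compute RMSE, MAE, median AE, Pearson correlation, and R-squared
    from paired actual/predicted values."""
    n = len(actual)
    diffs = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    abs_sorted = sorted(abs(d) for d in diffs)
    mid = n // 2
    med_ae = abs_sorted[mid] if n % 2 else (abs_sorted[mid - 1] + abs_sorted[mid]) / 2
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pearson = cov / math.sqrt(var_a * var_p)
    # R-squared: 1 minus residual sum of squares over total sum of squares
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    r2 = 1 - ss_res / var_a
    return {"rmse": rmse, "mae": mae, "median_ae": med_ae,
            "pearson": pearson, "r2": r2}
```

A perfect model yields RMSE and MAE of 0, and Pearson correlation and R-squared of 1; large gaps between RMSE and MAE indicate a few large prediction errors.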