Cross Validation
The cross validation module evaluates the predictive accuracy of an interpolation model by systematically withholding subsets of the input data, re-running estimation on the remaining samples, and comparing the predicted values against the withheld actuals. Two methods are supported: K-Fold, which divides the data into K subsets that each take a turn as the validation set, and Leave One Out, which withholds and validates each individual sample (or boring) one at a time.
The module accepts interpolation options from an upstream estimation module and outputs a field containing the actual values, predicted values, and their differences. Summary statistics are displayed in the Properties panel after the run completes.
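The partition-and-hold-out loop described above can be sketched as follows. This is an illustrative sketch, not the module's actual implementation: `predict` is a hypothetical stand-in for re-running the upstream interpolation on the retained samples, and `k_fold_indices` mirrors the K-Fold split in simplified form.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Partition sample indices into k roughly equal folds.

    Hypothetical helper mirroring the module's K-Fold split; the
    module's real partitioning logic is internal.
    """
    indices = list(range(n_samples))
    rng = random.Random(seed)
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, predict, k, seed=0):
    """For each fold, predict the withheld samples from the rest.

    `predict(train, held_out_sample)` stands in for re-running
    estimation on the retained samples. Returns a list of
    (actual, predicted) pairs, one per withheld sample.
    """
    results = []
    for fold in k_fold_indices(len(samples), k, seed):
        held = set(fold)
        train = [s for i, s in enumerate(samples) if i not in held]
        for i in fold:
            results.append((samples[i], predict(train, samples[i])))
    return results
```

Setting `k` equal to the number of units reduces this to Leave One Out: every fold holds exactly one sample.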
Ports
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | Interpolation Options Input | Interpolation Options | The interpolation options used for the cross-validation. |
| Output | Output Field | Field | The field containing the results of the cross-validation. |
Properties
| Property | Type | Description |
|---|---|---|
| Allow Run | Boolean | Controls whether the module runs automatically when applications are loaded or data changes. When on, the module re-runs on load and whenever its inputs change. When off, the module will not run unless Execute is pressed. |
| Execute | Button | Forces the module to run even if Allow Run has been turned off, allowing multiple changes to be made before updating. |
| Cross Validation Method | Choice: K-Fold, Leave One Out | Specifies the method used for cross-validation. K-Fold divides the data into K subsets, training on K-1 folds and validating on the remaining fold. Leave One Out is a special case of K-Fold in which the number of folds equals the number of units, so each sample or boring is withheld and validated individually. |
| Fold Unit | Choice: Sample, Boring | Specifies how samples are grouped into folds. Sample treats each individual measurement point as its own unit. Boring groups all measurements from the same borehole together, ensuring entire borings are divided into folds. |
| Number Of Folds | Integer | The number of folds to use in K-Fold Cross-Validation. Only applicable when the K-Fold method is selected. A common choice is 5 or 10 folds, but the optimal number depends on the size and characteristics of the dataset. |
| Random Seed | Integer | The random seed used to ensure reproducibility in the random partitioning of data into folds during K-Fold Cross-Validation. Setting a specific seed guarantees the same data splits each run. A value of -1 uses a random seed based on the current system time, producing different splits each run. |
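The interaction of Fold Unit and Random Seed can be illustrated with a small sketch (the helper name and data below are hypothetical, not the module's API). With Fold Unit set to Boring, fold membership is assigned per borehole so that all of a boring's samples stay together, and a fixed seed makes the split repeatable:

```python
import random

def fold_assignments(units, k, seed):
    """Assign fold units (samples or borings) to k folds.

    A seed of -1 seeds from system time (non-repeatable splits),
    mirroring the Random Seed property; any other value makes the
    partition reproducible.
    """
    rng = random.Random(None if seed == -1 else seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {u: i % k for i, u in enumerate(shuffled)}

# With Fold Unit = Boring, group sample rows by borehole id first so
# every measurement from a boring lands in the same fold.
samples = [("B-01", 1.2), ("B-01", 3.4), ("B-02", 0.8), ("B-03", 2.1)]
borings = sorted({b for b, _ in samples})
folds = fold_assignments(borings, k=2, seed=7)
grouped = [[v for b, v in samples if folds[b] == f] for f in range(2)]
```

Grouping by boring avoids the optimistic bias that arises when closely spaced measurements from the same borehole appear in both the training and validation sets.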
Export Options
| Property | Type | Description |
|---|---|---|
| Write As APDV | File | Export the cross-validation results as an APDV file containing the actual values, predicted values, and differences for each sample. |
Statistics
| Property | Type | Description |
|---|---|---|
| Analyte | String | The name of the analyte being evaluated. Read-only display field. |
| Samples | Integer | The number of samples used in the cross-validation. Read-only display field. |
| Method | String | The cross-validation method that was applied. Read-only display field. |
| Folds | Integer | The number of folds used in the cross-validation run. Read-only display field. |
| Root Mean Square Error | Double | The root mean square error between predicted and actual values. Read-only display field. |
| Mean Absolute Error | Double | The mean absolute error between predicted and actual values. Read-only display field. |
| Median Absolute Error | Double | The median absolute error between predicted and actual values. Read-only display field. |
| Pearson Correlation | Double | The Pearson correlation coefficient between predicted and actual values. Read-only display field. |
| R Squared | Double | The R-squared (coefficient of determination) value for the cross-validation results. Read-only display field. |
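The read-only statistics above follow their standard definitions. A minimal sketch of how they could be computed from paired actual/predicted values (assuming at least two samples and nonzero variance in the actuals):

```python
import math
from statistics import mean, median

def cv_statistics(actual, predicted):
    """Compute the summary statistics shown in the Properties panel.

    Standard formulas; assumes nonzero variance in both series for
    the Pearson correlation and R-squared terms.
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = mean(abs(e) for e in errors)
    med_ae = median(abs(e) for e in errors)
    a_bar, p_bar = mean(actual), mean(predicted)
    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    var_a = sum((a - a_bar) ** 2 for a in actual)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    pearson = cov / math.sqrt(var_a * var_p)
    # R-squared compares squared prediction error to the variance of
    # the actuals; it can be negative for a poor model.
    r_squared = 1 - sum(e * e for e in errors) / var_a
    return {"rmse": rmse, "mae": mae, "median_ae": med_ae,
            "pearson": pearson, "r_squared": r_squared}
```

Note that Pearson correlation and R-squared are not redundant: a model whose predictions are offset or scaled can correlate perfectly with the actuals while still producing a low, or even negative, R-squared.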