Cross Validation

The cross validation module evaluates the predictive accuracy of an interpolation model by systematically withholding subsets of the input data, re-running estimation on the remaining samples, and comparing the predicted values against the withheld actuals. Two methods are supported: K-Fold, which divides the data into K subsets that each take a turn as the validation set, and Leave One Out, which validates each individual sample or boring one at a time.
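The withhold-and-predict loop described above can be sketched as follows. This is an illustrative outline, not the module's internal code: `k_fold_indices`, `cross_validate`, and the `estimate` callback are hypothetical names, and the mean-of-training estimator stands in for the actual interpolation model.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Partition sample indices into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(samples, k, estimate, seed=0):
    """For each fold, withhold it, re-estimate from the remaining
    samples, and predict the withheld values."""
    results = []
    for fold in k_fold_indices(len(samples), k, seed):
        held_out = set(fold)
        training = [s for i, s in enumerate(samples) if i not in held_out]
        for i in fold:
            # (actual, predicted) pairs feed the summary statistics
            results.append((samples[i], estimate(training, samples[i])))
    return results
```

Leave One Out is simply the case `k == len(samples)`: every fold contains exactly one unit.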

The module accepts interpolation options from an upstream estimation module and outputs a field containing the actual values, predicted values, and their differences. Summary statistics are displayed in the Properties panel after the run completes.

Ports

| Direction | Name | Type | Description |
|-----------|------|------|-------------|
| Input | Interpolation Options Input | Interpolation Options | The interpolation options used for the cross-validation. |
| Output | Output Field | Field | The field containing the results of the cross-validation. |

Properties

| Property | Type | Description |
|----------|------|-------------|
| Allow Run | Boolean | Controls whether the module runs when applications are loaded or data changes. When on, the module runs automatically on load or when data changes. When off, the module runs only when Execute is pressed. |
| Execute | Button | Forces the module to run even if Allow Run is turned off, allowing multiple changes to be made before updating. |
| Cross Validation Method | Choice: K-Fold, Leave One Out | Specifies the method used for cross-validation. K-Fold divides the data into K subsets, training on K-1 folds and validating on the remaining fold. Leave One Out is the special case of K-Fold in which each unit is withheld and validated individually. |
| Fold Unit | Choice: Sample, Boring | Specifies how samples are grouped into folds. Sample treats each individual measurement point as its own unit. Boring groups all measurements from the same borehole together, so that no boring is split across folds. |
| Number Of Folds | Integer | The number of folds to use in K-Fold cross-validation. Only applicable when the K-Fold method is selected. Common choices are 5 or 10 folds, but the optimal number depends on the size and characteristics of the dataset. |
| Random Seed | Integer | The random seed used to make the partitioning of data into folds reproducible during K-Fold cross-validation. Setting a specific seed guarantees the same data splits each run. A value of -1 seeds from the current system time, producing different splits each run. |
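The interaction between Random Seed and Fold Unit can be illustrated with a small sketch. The function names `fold_assignments` and `group_by_boring` are hypothetical, not part of this module's API; the sketch only shows how a fixed seed yields repeatable splits and how boring-level grouping keeps a borehole's measurements together.

```python
import random
from collections import defaultdict

def fold_assignments(units, n_folds, seed):
    """Reproducibly assign units (samples or borings) to folds.
    seed == -1 mimics a time-based seed: different splits each run."""
    rng = random.Random(None if seed == -1 else seed)
    ids = list(units)
    rng.shuffle(ids)
    return {u: i % n_folds for i, u in enumerate(ids)}

def group_by_boring(samples):
    """Fold Unit = Boring: all measurements from one borehole
    form a single unit, so a boring is never split across folds."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["boring"]].append(s)
    return groups
```

With a fixed seed the same assignment is produced on every run, which is what makes cross-validation results comparable between sessions.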

Export Options

| Property | Type | Description |
|----------|------|-------------|
| Write As APDV | File | Exports the cross-validation results as an APDV file containing the actual value, predicted value, and difference for each sample. |
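The per-sample record layout of the export (actual, predicted, difference) can be sketched as follows. The APDV format itself is not specified here, so this hypothetical example writes the same three columns to CSV purely to illustrate the record structure; the `results` values are made up.

```python
import csv

# Hypothetical results mirroring the exported columns.
results = [
    {"actual": 4.2, "predicted": 3.9},
    {"actual": 1.1, "predicted": 1.4},
]

with open("cross_validation_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["actual", "predicted", "difference"])
    writer.writeheader()
    for r in results:
        # difference = predicted - actual for each withheld sample
        writer.writerow({**r, "difference": r["predicted"] - r["actual"]})
```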

Statistics

All statistics in this section are read-only display fields.

| Property | Type | Description |
|----------|------|-------------|
| Analyte | String | The name of the analyte being evaluated. |
| Samples | Integer | The number of samples used in the cross-validation. |
| Method | String | The cross-validation method that was applied. |
| Folds | Integer | The number of folds used in the cross-validation run. |
| Root Mean Square Error | Double | The root mean square error between predicted and actual values. |
| Mean Absolute Error | Double | The mean absolute error between predicted and actual values. |
| Median Absolute Error | Double | The median absolute error between predicted and actual values. |
| Pearson Correlation | Double | The Pearson correlation coefficient between predicted and actual values. |
| R Squared | Double | The R-squared (coefficient of determination) value for the cross-validation results. |
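The error and correlation statistics above are all standard functions of the paired actual and predicted values. The following standalone sketch shows how each would be computed; it is not the module's internal implementation.

```python
import math

def summary_stats(actual, predicted):
    """Compute RMSE, MAE, median AE, Pearson correlation, and R-squared
    from paired actual/predicted values."""
    n = len(actual)
    diffs = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    abs_sorted = sorted(abs(d) for d in diffs)
    mid = n // 2
    med_ae = abs_sorted[mid] if n % 2 else (abs_sorted[mid - 1] + abs_sorted[mid]) / 2
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    pearson = cov / math.sqrt(var_a * var_p)
    # R-squared: 1 minus residual sum of squares over total sum of squares
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    r2 = 1 - ss_res / var_a
    return {"rmse": rmse, "mae": mae, "median_ae": med_ae,
            "pearson": pearson, "r2": r2}
```

A perfect model yields RMSE and MAE of 0, and Pearson correlation and R-squared of 1; large gaps between RMSE and MAE indicate a few large prediction errors.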