file statistics


The file statistics module reads an analytical, lithology, or geologic file (.apdv, .aidv, .pgf, .geo, .gmf, .lpdv, .lsdv, .dg) and reports summary statistics and a frequency distribution for a selected data component. During execution, the module reads the file, displays an error message if the file contains errors in format or numeric values, and then displays the statistical results in the EVS Information window. The sample data and a glyphed renderable can be passed downstream from the output ports for visualization.

Ports

| Direction | Name | Type | Description |
| --- | --- | --- | --- |
| Input | Input Filename | String | File used to display data. |
| Output | Output Filename | String | File used to display data. |
| Output | Sample Data | Field | A field containing the sample point data. |
| Output | Sample Object | Renderable | A renderable object displaying the sample data. |
| Output | Numeric Output 1 | Number | A numeric value computed from Numeric Expression 1. |
| Output | Numeric Output 2 | Number | A numeric value computed from Numeric Expression 2. |
| Output | String Output | String | A string value computed from String Expression. |

Properties

| Property | Type | Description |
| --- | --- | --- |
| Allow Run | Boolean | This toggle can prevent the module from running, allowing the user to make changes to large data sets without waiting for updates. |
| Execute | Button | This button will force the module to run even if the Allow Run toggle has been turned off. This allows the user to make a number of changes before updating. |
| Filename | String | The file name to process for display. |
| Use Application Origin | Boolean | When true, the module will apply the Application Origin. When false, data will be left in internal model space. Turn off when loading data intended for use as a glyph or similar. |

Data Processing

| Property | Type | Description |
| --- | --- | --- |
| Component Or Layer | Integer | Selects which data component in the file to process for display. |
| Data Processing | Choice: Linear Processing, Log Processing | Data Processing allows the module to be run in either linear or log space. |
| Z Scale | Double | The Z Scale is the vertical exaggeration to be applied to the output object. |
| Log Post Processing Clip Min | Double | Log Post Processing Clip Min replaces, after data processing in log space, any sample property value that is less than the specified number with that number. |
| Linear Post Processing Clip Min | Double | Linear Post Processing Clip Min replaces, after data processing in linear space, any sample property value that is less than the specified number with that number. |
| Detection Limit | Double | The Detection Limit value affects any file values set with the ‘ND’ or other non-detect flags (for a list of these flags open the help for the APDV file format). When the module encounters this flag in the file it will insert a value equal to (Detection Limit * Less Than Multiplier). |
| Less Than Multiplier | Double | The Less Than Multiplier is the factor applied to any sample with the ‘<’ less than flag. |
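
The non-detect and clip rules above can be sketched in Python. This is a minimal illustration, not the module's implementation: the helper names `resolve_value` and `clip_min` are hypothetical, and the sketch assumes that a value flagged with ‘<’ is multiplied by the Less Than Multiplier and that clipped values are raised to the clip minimum.

```python
def resolve_value(raw, detection_limit, less_than_multiplier):
    """Turn one raw file entry into a number (hypothetical helper)."""
    text = raw.strip()
    if text.upper() == "ND":
        # Non-detect flag: substitute Detection Limit * Less Than Multiplier.
        return detection_limit * less_than_multiplier
    if text.startswith("<"):
        # Assumed behavior: scale the reported limit by the multiplier.
        return float(text[1:]) * less_than_multiplier
    return float(text)


def clip_min(value, clip):
    """Post-processing clip: values below `clip` are raised to `clip` (assumed)."""
    return max(value, clip)


# Detection Limit = 0.01, Less Than Multiplier = 0.5
values = [resolve_value(v, 0.01, 0.5) for v in ["ND", "<0.2", "1.5"]]
clipped = [clip_min(v, 0.05) for v in values]
```

Only the ‘ND’ flag is handled here; the full list of non-detect flags is in the APDV file format help.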

Statistic Settings

| Property | Type | Description |
| --- | --- | --- |
| Minimum Data Level | Double | The Minimum Data Level is used to set the lower limit on the data bins for statistical analysis. The default value is the minimum in the selected data component. If the statistical distribution should focus on only a portion of the data, this value can be changed to reflect only that desired range of data. |
| Maximum Data Level | Double | The Maximum Data Level is used to set the upper limit on the data bins for statistical analysis. The default value is the maximum in the selected data component. If the statistical distribution should focus on only a portion of the data, this value can be changed to reflect only that desired range of data. |
| Number Of Bins | Integer | The Number of Bins is used to set the number of distribution bins to be used in the analysis. The default is 10 and the range is from 2 to 255. |
| Summary Statistics | Statistics | A compound control that opens the full statistics view (histogram, percentiles, summary table) and exposes the computed values to the read-only fields below. |
| Data Min | Double | The minimum value in the selected data component. Read-only display field. |
| Data Max | Double | The maximum value in the selected data component. Read-only display field. |
| Processing | String | The data processing mode (Linear or Log). Read-only display field. |
| Mean | Double | The mean of the selected data component. Read-only display field. |
| Median | Double | The median of the selected data component. Read-only display field. |
| Std Dev | Double | The standard deviation of the selected data component. Read-only display field. |
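
To make the roles of Minimum Data Level, Maximum Data Level, and Number Of Bins concrete, the following Python sketch computes summary statistics and an equal-width frequency distribution. It is an illustration only: the function name `summarize`, the use of the sample standard deviation, and the convention that the upper limit falls in the last bin are assumptions, not documented module behavior.

```python
import math
import statistics


def summarize(values, n_bins=10, level_min=None, level_max=None, log=False):
    """Summary statistics plus a binned frequency distribution (illustrative)."""
    if not 2 <= n_bins <= 255:
        raise ValueError("Number Of Bins must be between 2 and 255")
    # Log Processing: work on log10 of the data (values must be positive).
    data = [math.log10(v) for v in values] if log else list(values)
    lo = min(data) if level_min is None else level_min
    hi = max(data) if level_max is None else level_max
    if hi <= lo:
        raise ValueError("Maximum Data Level must exceed Minimum Data Level")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        if lo <= v <= hi:
            # Assumed convention: the upper limit falls in the last bin.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return {
        "DataMin": min(data),
        "DataMax": max(data),
        "Mean": statistics.mean(data),
        "Median": statistics.median(data),
        "StdDev": statistics.stdev(data),  # sample std dev; an assumption
        "Counts": counts,
    }


result = summarize([1, 2, 3, 4, 10], n_bins=3)
focused = summarize([1, 2, 3, 4, 10], n_bins=2, level_min=1, level_max=4)
```

In the second call the data levels narrow the distribution to the range 1 to 4, so the value 10 is left out of the bin counts while the summary statistics still cover all the data.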

Time Settings

| Property | Type | Description |
| --- | --- | --- |
| Chem File Is Time Domain | Boolean | The Chem File Is Time Domain toggle turns on date interpolation for time-domain analyte (e.g. chemistry) files. |
| Specify Date By Component | Boolean | The Specify Date By Component toggle causes the Date field to be ignored and the date to be selected using the Data Component. |
| Date For Interpolation | Date | The Date For Interpolation field is the date being interpolated to. For example, if there is an analyte value of 2 on 1/01/05 and a value of 4 on 1/03/05 and the date is set to 1/02/05 with Direct Interpolation, the value would be 3. The Date can be either set here or passed in via the Date port. |
| Analyte Name | String | The Analyte Name field is used for AIDV and APDV time files, where the dates take up the spots in these files usually reserved for analyte names. |
| Interpolation Type | Choice: Direct Interpolation Only, Interpolate Only, Interpolate and Extrapolate Beyond, Interpolate and Extrapolate | The interpolation method defines how to interpolate analyte values to the chosen date. See Interpolation Methods below for a complete description of each option. |
| Use Nearest Measured Data | Boolean | The Use Nearest Measured Data toggle causes the sample at the interpolated date to take the value of the nearest measured date instead of an interpolated value. |
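
The Date For Interpolation example can be reproduced with a few lines of Python. The sketch assumes that interpolation is linear in time and that the dates are written month/day/year (January 1, 2, and 3 of 2005); the function name `interpolate_by_date` is hypothetical.

```python
from datetime import date


def interpolate_by_date(d0, v0, d1, v1, target):
    """Linearly interpolate between two dated measurements (illustrative)."""
    span_days = (d1 - d0).days
    fraction = (target - d0).days / span_days
    return v0 + fraction * (v1 - v0)


# A value of 2 on 1/01/05 and 4 on 1/03/05, interpolated to 1/02/05.
result = interpolate_by_date(
    date(2005, 1, 1), 2.0, date(2005, 1, 3), 4.0, date(2005, 1, 2)
)
```

Here `result` is 3.0, matching the value given in the property description.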

Interpolation Methods

Each interpolation method defines how values are interpolated when the file contains Missing values. Non-Detect values are equal to either the Detection Limit or the Pre Clip Min. If the Date is set to the same time as a Non-Detect in the file, the sample will be a Non-Detect and not an interpolated value.

  1. Direct Interpolation Only: The most basic interpolation method, and the most accurate in terms of representing the data as it has been entered. Looks at the two dates surrounding the input Date. If either date, or both dates, have Missing as values, the value for that sample will be Missing and no interpolation will occur.
  2. Interpolate Only: Looks through the date columns both before and after the set Date for values that are not Missing, then interpolates between those numbers. If it fails to find a non-Missing value before and after the set date, it sets the data to Missing. Useful for files with a small number of Missing values.
  3. Interpolate and Extrapolate Beyond: Looks through the date columns both before and after the set Date for the first instance of a value that is not Missing. If it does not find a valid non-Missing date after the input Date, it extrapolates beyond the last usable date to the input Date.
  4. Interpolate and Extrapolate: Looks through the date columns both before and after the set Date for the first instance of a value that is not Missing. If it fails to find one, it extrapolates the first value backwards to the input Date. If the date after the input Date is Missing, it looks forward through the time columns until it finds a date that is not Missing. It also extrapolates beyond the last valid date in the file.
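
The difference between the first two methods can be sketched in Python. This is a simplified illustration under stated assumptions, not the module's code: times are sorted day offsets rather than dates, Missing is represented by `None`, interpolation is linear, and a date outside the measured range yields Missing. The two extrapolating methods are not shown.

```python
MISSING = None


def _lerp(t0, v0, t1, v1, t):
    """Linear interpolation between (t0, v0) and (t1, v1)."""
    return v0 + (t - t0) / (t1 - t0) * (v1 - v0)


def direct_interpolation_only(times, values, t):
    """Use only the two samples that bracket t; Missing on either side gives Missing."""
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            a, b = values[i], values[i + 1]
            if a is MISSING or b is MISSING:
                return MISSING
            return _lerp(times[i], a, times[i + 1], b, t)
    return MISSING  # assumed: t outside the measured range gives Missing


def interpolate_only(times, values, t):
    """Search outward for the nearest non-Missing values on both sides of t."""
    pairs = [(ti, v) for ti, v in zip(times, values) if v is not MISSING]
    before = [p for p in pairs if p[0] <= t]
    after = [p for p in pairs if p[0] >= t]
    if not before or not after:
        return MISSING
    (t0, v0), (t1, v1) = before[-1], after[0]
    return v0 if t0 == t1 else _lerp(t0, v0, t1, v1, t)


times = [0, 2, 4]
values = [2.0, MISSING, 6.0]
direct = direct_interpolation_only(times, values, 1)
searched = interpolate_only(times, values, 1)
```

With a Missing value at the middle date, Direct Interpolation Only returns Missing, while Interpolate Only skips the gap and interpolates between the first and last dates.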

Glyph Settings

| Property | Type | Description |
| --- | --- | --- |
| Points As Glyphs | Boolean | The Points As Glyphs toggle causes the points to be displayed as a user-selected glyph. |
| Point Width | Integer | The Point Width sets the size of the rendered pixels. The default is 0, which is equivalent to 1. |
| Glyph Size | Double | The Glyph Size value is used to scale the glyphs in all directions. The default is automatically computed based on the input data. |
| Priority | Choice: Maximum, Minimum | The Priority determines which sample values are drawn largest. Setting it to Minimum reverses the scaling so that the smallest sample values have the largest size. |
| Minimum Scale Factor | Double | The Minimum Scale Factor scales the sample values with the least Priority. |
| Maximum Scale Factor | Double | The Maximum Scale Factor scales the sample values with the greatest Priority. |
| Use Log Data | Boolean | The Use Log Data toggle forces the size of the glyph to be based on the log10 of the selected data. |
| Generated Glyph | Choice: Sphere, Cube, Cone, Cylinder, Polygon, Disk | The Generated Glyph choice selects the type of glyph that is automatically generated. |
| Sphere Subdivisions | Integer | The Sphere Subdivisions value defines how finely the sample spheres are rendered. Higher values mean smoother spheres but at a higher memory cost. |
| Glyph Resolution | Integer | The resolution for generated cone, polygon, cylinder, and disk glyphs. |
| Primary Axis Factor | Double | The scale factor for the primary axis of the glyph. |
| Secondary Axis Factor | Double | The scale factor for the secondary axis of the glyph. |
| Heading Dip | Heading/Dip | The Heading and Dip values are used to align the glyphs to a constant orientation. |
| Roll | Double | The roll of the glyph along its primary axis. |
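
One plausible reading of how Priority, the two scale factors, and Use Log Data interact is sketched below in Python. The linear mapping between the scale factors and the function name `glyph_scale` are assumptions made for illustration; the module's actual sizing formula is not documented here.

```python
import math


def glyph_scale(value, vmin, vmax, min_factor, max_factor,
                priority="Maximum", use_log=False):
    """Map a sample value to a glyph scale factor (hypothetical formula)."""
    if use_log:
        # Use Log Data: size is based on log10 of the data (positive values only).
        value, vmin, vmax = (math.log10(x) for x in (value, vmin, vmax))
    # Position of the value within the data range, from 0 to 1.
    t = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    if priority == "Minimum":
        # Reversed scaling: the smallest values get the largest glyphs.
        t = 1.0 - t
    return min_factor + t * (max_factor - min_factor)


largest = glyph_scale(10.0, 0.0, 10.0, 0.5, 2.0)
reversed_largest = glyph_scale(0.0, 0.0, 10.0, 0.5, 2.0, priority="Minimum")
log_mid = glyph_scale(10.0, 1.0, 100.0, 1.0, 3.0, use_log=True)
```

Under this reading, a value of 10 in data spanning 1 to 100 sits at the midpoint of the log10 range, so it receives a scale factor halfway between the minimum and maximum.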

Output Port Settings

| Property | Type | Description |
| --- | --- | --- |
| Numeric Expression 1 | String | Python expression for the first numeric output port. Use variable names like Mean, Median, DataMin, DataMax, StdDev, Variance, FirstQuartile, ThirdQuartile, IQR, NumberOfPoints, NumberOfCells to reference statistics values. |
| Numeric Expression 2 | String | Python expression for the second numeric output port. |
| String Expression | String | Python f-string expression for the string output port. Use {VariableName} placeholders such as {Processing}, {Units}, {Report} to reference statistics values. |
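
The following Python sketch shows what such expressions might look like and what they would produce. The statistics values are made up for illustration, and evaluating with `eval` and `str.format` only mimics the idea outside EVS; how the module evaluates expressions internally is not described here.

```python
# Illustrative statistics values (not taken from any real file).
stats = {
    "Mean": 4.0,
    "Median": 3.0,
    "DataMin": 1.0,
    "DataMax": 10.0,
    "StdDev": 3.5,
    "Processing": "Linear",
    "Units": "mg/kg",
}

# Text as it might be typed into Numeric Expression 1 and String Expression.
numeric_expression = "(DataMax - DataMin) / 2"
string_expression = "{Processing} mean = {Mean:.1f} {Units}"

# Evaluate with the statistics names as the only visible variables.
numeric_output = eval(numeric_expression, {"__builtins__": {}}, stats)
string_output = string_expression.format(**stats)
```

With these values, `numeric_output` is 4.5 and `string_output` is `Linear mean = 4.0 mg/kg`.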