file statistics
The file statistics module reads an analytical, lithology, or geologic file (.apdv, .aidv, .pgf, .geo, .gmf, .lpdv, .lsdv, .dg) and reports summary statistics and a frequency distribution for a selected data component. During execution, the module reads the file, displays an error message if the file contains errors in format or numeric values, and then displays the statistical results in the EVS Information window. The sample data and a glyphed renderable can be passed downstream from the output ports for visualization.
Ports
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | Input Filename | String | File used to display data. |
| Output | Output Filename | String | File used to display data. |
| Output | Sample Data | Field | A field containing the sample point data. |
| Output | Sample Object | Renderable | A renderable object displaying the sample data. |
| Output | Numeric Output 1 | Number | The numeric value computed from Numeric Expression 1. |
| Output | Numeric Output 2 | Number | The numeric value computed from Numeric Expression 2. |
| Output | String Output | String | The string value computed from String Expression. |
Properties
| Property | Type | Description |
|---|---|---|
| Allow Run | Boolean | This toggle can prevent the module from running, allowing the user to make changes to large data sets without waiting for updates. |
| Execute | Button | This button will force the module to run even if the Allow Run toggle has been turned off. This allows the user to make a number of changes before updating. |
| Filename | String | The file name to process for display. |
| Use Application Origin | Boolean | When true, the module applies the Application Origin. When false, data is left in internal model space. Turn this off when loading data intended for use as a glyph or similar. |
Data Processing
| Property | Type | Description |
|---|---|---|
| Component Or Layer | Integer | Selects which data component (or layer) in the file to process for display. |
| Data Processing | Choice: Linear Processing, Log Processing | Data Processing allows the module to be run in either linear or log space. |
| Z Scale | Double | The Z Scale is the vertical exaggeration to be applied to the output object. |
| Log Post Processing Clip Min | Double | Log Post Processing Clip Min replaces, after data processing, any sample property value that is less than the specified number in log space. |
| Linear Post Processing Clip Min | Double | Linear Post Processing Clip Min replaces, after data processing, any sample property value that is less than the specified number in linear space. |
| Detection Limit | Double | The Detection Limit value affects any file values set with the ‘ND’ or other non-detect flags (for a list of these flags, open the help for the APDV file format). When the module encounters one of these flags in the file, it inserts a value equal to (Detection Limit * Less Than Multiplier). |
| Less Than Multiplier | Double | The Less Than Multiplier is the value applied to any sample with the ‘<’ less than flag. |
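
As a rough illustration of the substitution rules in the table above, the following Python sketch converts one raw file entry into a number. The function name `substitute_flagged_value` is hypothetical, only the `ND` flag is handled (the full flag list is in the APDV format help), and multiplying a `<` value by the Less Than Multiplier is an assumption about what "applied" means; this is not the module's actual code.

```python
def substitute_flagged_value(raw: str, detection_limit: float,
                             lt_multiplier: float) -> float:
    """Return the number used for one raw entry (illustrative sketch only)."""
    text = raw.strip()
    if text.upper() == "ND":
        # Non-detect flag: insert Detection Limit * Less Than Multiplier.
        return detection_limit * lt_multiplier
    if text.startswith("<"):
        # Assumption: the reported number is scaled by the multiplier.
        return float(text[1:]) * lt_multiplier
    # Ordinary measured value: used as-is.
    return float(text)


if __name__ == "__main__":
    for entry in ("ND", "<0.2", "1.7"):
        print(entry, "->", substitute_flagged_value(entry, 0.01, 0.5))
```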
Statistic Settings
| Property | Type | Description |
|---|---|---|
| Minimum Data Level | Double | The Minimum Data Level is used to set the lower limit on the data bins for statistical analysis. The default value is the minimum in the selected data component. If the statistical distribution should focus on only a portion of the data, this value can be changed to reflect only that desired range of data. |
| Maximum Data Level | Double | The Maximum Data Level is used to set the upper limit on the data bins for statistical analysis. The default value is the maximum in the selected data component. If the statistical distribution should focus on only a portion of the data, this value can be changed to reflect only that desired range of data. |
| Number Of Bins | Integer | The Number of Bins is used to set the number of distribution bins to be used in the analysis. The default is 10 and the range is from 2 to 255. |
| Summary Statistics | Statistics | A compound control that opens the full statistics view (histogram, percentiles, summary table) and exposes the computed values to the read-only fields below. |
| Data Min | Double | The minimum value in the selected data component. Read-only display field. |
| Data Max | Double | The maximum value in the selected data component. Read-only display field. |
| Processing | String | The data processing mode (Linear or Log). Read-only display field. |
| Mean | Double | The mean of the selected data component. Read-only display field. |
| Median | Double | The median of the selected data component. Read-only display field. |
| Std Dev | Double | The standard deviation of the selected data component. Read-only display field. |
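
The way Minimum Data Level, Maximum Data Level, and Number Of Bins interact can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: bins are taken to be equal-width, values outside the chosen range are skipped, and a value equal to the maximum falls in the last bin. The function name `frequency_distribution` is hypothetical and is not part of the module.

```python
def frequency_distribution(values, minimum=None, maximum=None, bins=10):
    """Count values per equal-width bin between minimum and maximum."""
    if not 2 <= bins <= 255:
        raise ValueError("Number Of Bins must be between 2 and 255")
    # Defaults mirror the documented behavior: data min and data max.
    lo = min(values) if minimum is None else minimum
    hi = max(values) if maximum is None else maximum
    if hi <= lo:
        raise ValueError("maximum must be greater than minimum")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if v < lo or v > hi:
            continue  # assumption: out-of-range values are ignored
        # Clamp so that v == hi lands in the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    data = list(range(11))  # 0, 1, ..., 10
    print(frequency_distribution(data, bins=5))
    print(frequency_distribution(data, minimum=2, maximum=6, bins=2))
```

Narrowing the minimum and maximum, as in the second call, focuses the distribution on a sub-range of the data without changing the data itself.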
Time Settings
| Property | Type | Description |
|---|---|---|
| Chem File Is Time Domain | Boolean | The Chem File Is Time Domain toggle turns on date interpolation for time-domain analyte (e.g. chemistry) files. |
| Specify Date By Component | Boolean | The Specify Date By Component toggle causes the Date field to be ignored and the date to be selected using the Data Component. |
| Date For Interpolation | Date | The date to which analyte values are interpolated. For example, if an analyte has a value of 2 on 1/01/05 and a value of 4 on 1/03/05, and the date is set to 1/02/05 with Direct Interpolation, the resulting value is 3. The date can be set here or passed in via the Date port. |
| Analyte Name | String | The Analyte Name field is used for AIDV and APDV time files, where the dates take up the spots in these files usually reserved for analyte names. |
| Interpolation Type | Choice: Direct Interpolation Only, Interpolate Only, Interpolate and Extrapolate Beyond, Interpolate and Extrapolate | The interpolation method defines how to interpolate analyte values to the chosen date. See Interpolation Methods below for a complete description of each option. |
| Use Nearest Measured Data | Boolean | The Use Nearest Measured Data toggle causes the sample at the interpolated date to take the value of the nearest measured date instead of an interpolated value. |
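
The worked example in the Date For Interpolation row is consistent with straight-line interpolation between the two surrounding dates. The sketch below reproduces it in Python, assuming the dates are month/day/year (January 1, 2, and 3, 2005) and that interpolation is linear in elapsed days. The function name `interpolate_to_date` is hypothetical.

```python
from datetime import date


def interpolate_to_date(d0: date, v0: float, d1: date, v1: float,
                        target: date) -> float:
    """Linearly interpolate between (d0, v0) and (d1, v1) at target."""
    span_days = (d1 - d0).days
    if span_days == 0:
        return v0  # both samples fall on the same date
    fraction = (target - d0).days / span_days
    return v0 + fraction * (v1 - v0)


if __name__ == "__main__":
    value = interpolate_to_date(date(2005, 1, 1), 2.0,
                                date(2005, 1, 3), 4.0,
                                date(2005, 1, 2))
    print(value)  # 3.0
```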
Interpolation Methods
Each interpolation method defines how to interpolate when given Missing values in a file. Non-Detect values are equal to either the Detection Limit or the Pre Clip Min. If the Date is set to the same time as a Non-Detect in the file, the sample will be a Non-Detect and not an interpolated value.
- Direct Interpolation Only: The most basic interpolation method, and the most accurate in terms of representing the data as it has been entered. Looks at the two dates surrounding the input Date. If either date, or both dates, have Missing as values, the value for that sample will be Missing and no interpolation will occur.
- Interpolate Only: Looks through the date columns both before and after the set Date for values that are not Missing, then interpolates between those numbers. If it fails to find a non-Missing value before and after the set date, it sets the data to Missing. Useful for files with a small amount of Missing values.
- Interpolate and Extrapolate Beyond: Looks through the date columns both before and after the set Date for the first instance of a value that is not Missing. If it does not find a valid non-Missing date after the input Date, it extrapolates beyond the last useable date to the input Date.
- Interpolate and Extrapolate: Looks through the date columns both before and after the set Date for the first instance of a value that is not Missing. If it fails to find one, it extrapolates the first value backwards to the input Date. If the date after the input Date is Missing, it looks forward through the time columns until it finds a date that is not Missing. It also extrapolates beyond the last valid date in the file.
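
The differences between the four methods are easiest to see side by side. The following Python sketch is one possible reading of the descriptions above, not the module's implementation. Its assumptions: `None` stands in for Missing, the series is sorted by date, interpolation is linear in days, and "extrapolation" holds the nearest valid value constant. The method keys (`"direct"`, `"interpolate"`, `"extrapolate_beyond"`, `"extrapolate"`) are hypothetical names.

```python
from datetime import date
from typing import List, Optional, Tuple

Series = List[Tuple[date, Optional[float]]]  # None stands in for Missing


def _lerp(d0, v0, d1, v1, target):
    """Linear interpolation in elapsed days."""
    return v0 + (target - d0).days / (d1 - d0).days * (v1 - v0)


def value_at(series: Series, target: date, method: str) -> Optional[float]:
    earlier = [(d, v) for d, v in series if d <= target]
    later = [(d, v) for d, v in series if d >= target]
    if method == "direct":
        # Only the two immediately surrounding dates are considered.
        if not earlier or not later:
            return None
        (d0, v0), (d1, v1) = earlier[-1], later[0]
        if v0 is None or v1 is None:
            return None
        return v0 if d0 == d1 else _lerp(d0, v0, d1, v1, target)
    # The other methods skip over Missing entries in both directions.
    prev = next(((d, v) for d, v in reversed(earlier) if v is not None), None)
    nxt = next(((d, v) for d, v in later if v is not None), None)
    if prev and nxt:
        return prev[1] if prev[0] == nxt[0] else _lerp(*prev, *nxt, target)
    if prev and method in ("extrapolate_beyond", "extrapolate"):
        return prev[1]  # assumption: hold the last valid value
    if nxt and method == "extrapolate":
        return nxt[1]  # assumption: hold the first valid value
    return None


if __name__ == "__main__":
    series = [(date(2005, 1, 1), 2.0), (date(2005, 1, 3), None),
              (date(2005, 1, 5), 6.0), (date(2005, 1, 7), None)]
    for m in ("direct", "interpolate", "extrapolate_beyond", "extrapolate"):
        print(m, value_at(series, date(2005, 1, 2), m),
              value_at(series, date(2005, 1, 8), m),
              value_at(series, date(2004, 12, 31), m))
```

In this sketch a request between a valid and a Missing date returns Missing only for the direct method, a request after the last valid date is filled by both extrapolating methods, and a request before the first valid date is filled only by the full extrapolate method.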
Glyph Settings
| Property | Type | Description |
|---|---|---|
| Points As Glyphs | Boolean | The Points As Glyphs toggle causes the points to be displayed as a user-selected glyph. |
| Point Width | Integer | The Point Width sets the size of the rendered pixels. The default is 0, which is equivalent to 1. |
| Glyph Size | Double | The Glyph Size value is used to scale the glyphs in all directions. The default is automatically computed based on the input data. |
| Priority | Choice: Maximum, Minimum | The Priority of the glyph reverses the scaling so that the smallest sample values have the largest size. |
| Minimum Scale Factor | Double | The Minimum Scale Factor scales the sample values with the least Priority. |
| Maximum Scale Factor | Double | The Maximum Scale Factor scales the sample values with the greatest Priority. |
| Use Log Data | Boolean | The Use Log Data toggle forces the size of the glyph to be based on the log10 of the selected data. |
| Generated Glyph | Choice: Sphere, Cube, Cone, Cylinder, Polygon, Disk | The Generated Glyph choice selects the type of glyph that is automatically generated. |
| Sphere Subdivisions | Integer | The Sphere Subdivisions value defines how finely the sample spheres are rendered. Higher values mean smoother spheres but at a higher memory cost. |
| Glyph Resolution | Integer | The resolution for generated cone, polygon, cylinder, and disk glyphs. |
| Primary Axis Factor | Double | The scale factor for the primary axis of the glyph. |
| Secondary Axis Factor | Double | The scale factor for the secondary axis of the glyph. |
| Heading Dip | Heading/Dip | The Heading and Dip values are used to align the glyphs to a constant orientation. |
| Roll | Double | The roll of the glyph along its primary axis. |
Output Port Settings
| Property | Type | Description |
|---|---|---|
| Numeric Expression 1 | String | Python expression for the first numeric output port. Use variable names like Mean, Median, DataMin, DataMax, StdDev, Variance, FirstQuartile, ThirdQuartile, IQR, NumberOfPoints, NumberOfCells to reference statistics values. |
| Numeric Expression 2 | String | Python expression for the second numeric output port. |
| String Expression | String | Python f-string expression for the string output port. Use {VariableName} placeholders such as {Processing}, {Units}, {Report} to reference statistics values. |
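
To make the expression properties more concrete, the sketch below shows example expressions and mimics how the statistic names might resolve to values. Everything here is illustrative: the statistic values and the `mg/kg` units are made up, the helper functions `evaluate_numeric` and `evaluate_string` are hypothetical, and `eval` and `str.format` are stand-ins for whatever evaluator the module actually uses.

```python
# Made-up statistics, keyed by the documented variable names.
stats = {
    "Mean": 12.5, "Median": 10.0, "DataMin": 1.0, "DataMax": 40.0,
    "StdDev": 8.0, "Processing": "Linear", "Units": "mg/kg",
}

# Example property values a user might enter.
numeric_expression_1 = "DataMax - DataMin"   # data range
numeric_expression_2 = "StdDev / Mean"       # coefficient of variation
string_expression = "Mean = {Mean} {Units} ({Processing})"


def evaluate_numeric(expression: str, variables: dict) -> float:
    """Evaluate an arithmetic expression with only the given names visible."""
    return float(eval(expression, {"__builtins__": {}}, dict(variables)))


def evaluate_string(template: str, variables: dict) -> str:
    """Fill {VariableName} placeholders from the given names."""
    return template.format(**variables)


if __name__ == "__main__":
    print(evaluate_numeric(numeric_expression_1, stats))  # 39.0
    print(evaluate_numeric(numeric_expression_2, stats))
    print(evaluate_string(string_expression, stats))
```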