org.checkita.dqf.core.metrics.df.regular.GroupingDFMetrics
SequenceCompletenessDFMetricCalculator
case class SequenceCompletenessDFMetricCalculator(metricId: String, columns: Seq[String], increment: Long) extends GroupingDFMetricCalculator with Product with Serializable
Calculates completeness of incremental integer (long) sequence, i.e. checks if sequence does not have missing elements.
Works for single column only!
- metricId
Id of the metric.
- columns
Sequence of columns which are used for metric calculation
- Note
If exact result is not mandatory, then it's better to use HyperLogLog-based metric calculator called "APPROXIMATE_SEQUENCE_COMPLETENESS".
- Alphabetic
- By Inheritance
- SequenceCompletenessDFMetricCalculator
- Serializable
- Serializable
- Product
- Equals
- GroupingDFMetricCalculator
- DFMetricCalculator
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Instance Constructors
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
val
columns: Seq[String]
- Definition Classes
- SequenceCompletenessDFMetricCalculator → DFMetricCalculator
-
val
emptyValue: Column
Value which is returned when metric result is null.
Value which is returned when metric result is null. As all grouping metric calculators use summation during aggregation of intermediate per-group results then they should return zero when applied to empty sequence.
- Attributes
- protected
- Definition Classes
- GroupingDFMetricCalculator → DFMetricCalculator
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
errorConditionExpr(implicit colTypes: Map[String, DataType]): Column
Collect error data for groups where at least on of the column values is null.
Collect error data for groups where at least on of the column values is null.
- colTypes
Map of column names to their datatype.
- returns
Spark row-level expression yielding boolean result.
- Attributes
- protected
- Definition Classes
- SequenceCompletenessDFMetricCalculator → DFMetricCalculator
-
def
errorExpr(rowData: Column)(implicit colTypes: Map[String, DataType]): Column
Grouping calculators will have different API for collecting per-group arrays with error data.
Grouping calculators will have different API for collecting per-group arrays with error data. Therefore, an exception will be throw if this method is called.
- rowData
Array of row data from columns related to this metric calculator (source keyFields + metric columns + window start time column for streaming applications)
- colTypes
Map of column names to their datatype.
- returns
Spark expression that will yield row data in case of metric error.
- Attributes
- protected
- Definition Classes
- GroupingDFMetricCalculator → DFMetricCalculator
-
def
errorMessage: String
Error message that will be returned when metric increment fails.
Error message that will be returned when metric increment fails.
The error data is collected for rows where some of the columns values cannot be cast to number (Long).
- returns
Metric increment failure message.
- Definition Classes
- SequenceCompletenessDFMetricCalculator → DFMetricCalculator
-
def
errors(implicit errorDumpSize: Int, keyFields: Seq[String], colTypes: Map[String, DataType]): Column
Final metric errors aggregation expression.
Final metric errors aggregation expression. Merges all per-group metric errors into final array of error data. The size of array is limited by maximum allowed error dump size parameter.
- errorDumpSize
Maximum allowed number of errors to be collected per single metric.
- keyFields
Sequence of source/stream key fields.
- colTypes
Map of column names to their datatype.
- returns
Spark expression that will yield array of metric errors.
- Definition Classes
- GroupingDFMetricCalculator → DFMetricCalculator
-
val
errorsCol: String
Name of the column that will store metric errors
Name of the column that will store metric errors
- Definition Classes
- DFMetricCalculator
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
val
groupAggregationFunction: (Column) ⇒ Column
Function that aggregates metric increments into intermediate per-group metric results.
Function that aggregates metric increments into intermediate per-group metric results. Accepts spark expression
groupResultExpr
as input and returns another spark expression that will yield aggregated double metric result per each group.We will consider only those groups whose value is a natural number of Long type. Thus for other groups (where casting to Long type yields null) the zero value is assigned.
- Attributes
- protected
- Definition Classes
- SequenceCompletenessDFMetricCalculator → GroupingDFMetricCalculator
-
def
groupErrorExpr(rowData: Column, errorDumpSize: Int): Column
Per-group error collection expression: collects row data for ENTIRE group in case of metric error.
Per-group error collection expression: collects row data for ENTIRE group in case of metric error. The size of array is limited by maximum allowed error dump size parameter.
- rowData
Array of row data from columns related to this metric calculator for current group. (source keyFields + metric columns + window start time column for streaming applications)
- errorDumpSize
Maximum allowed number of errors to be collected per single metric.
- returns
Spark expression that will yield array of row data per group in case of metric error.
- Attributes
- protected
- Definition Classes
- SequenceCompletenessDFMetricCalculator → GroupingDFMetricCalculator
-
def
groupErrors(implicit errorDumpSize: Int, keyFields: Seq[String], colTypes: Map[String, DataType]): Column
Per-group metric errors aggregation expression.
Per-group metric errors aggregation expression. Collects all metric errors into an array column pear each group. The size of array is limited by maximum allowed error dump size parameter.
The main difference from one-pass DF-calculators is that error condition depends on the intermediate group aggregation result rather than on individual row result.
- errorDumpSize
Maximum allowed number of errors to be collected per single metric.
- keyFields
Sequence of source/stream key fields.
- colTypes
Map of column names to their datatype.
- returns
Spark expression that will yield array of metric errors.
- Definition Classes
- GroupingDFMetricCalculator
- Note
For streaming applications, we need to collect metric errors in per-window basis. Therefore, error row data has to contain window start time (first element of array).
-
val
groupErrorsCol: String
Name of the column that will store intermediate metric errors per each group.
Name of the column that will store intermediate metric errors per each group.
- Definition Classes
- GroupingDFMetricCalculator
-
def
groupResult(implicit colTypes: Map[String, DataType]): Column
Per-group intermediate metric aggregation expression that MUST yield double value.
Per-group intermediate metric aggregation expression that MUST yield double value.
- colTypes
Map of column names to their datatype.
- returns
Spark expression that will yield double metric calculator result
- Definition Classes
- GroupingDFMetricCalculator
-
val
groupResultCol: String
Name of the column that will store intermediate metric results per each group.
Name of the column that will store intermediate metric results per each group.
- Definition Classes
- GroupingDFMetricCalculator
- val increment: Long
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
metricId: String
Unlike RDD calculators, DF calculators are not groped by its type.
Unlike RDD calculators, DF calculators are not groped by its type. For each metric defined in DQ job, there will be created its own instance of DF calculator. Thus, DF metric calculators can be linked to metric definitions by metricId.
- Definition Classes
- SequenceCompletenessDFMetricCalculator → DFMetricCalculator
-
val
metricName: MetricName
- Definition Classes
- SequenceCompletenessDFMetricCalculator → DFMetricCalculator
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
result(implicit colTypes: Map[String, DataType]): Column
Final metric aggregation expression that MUST yield double value.
Final metric aggregation expression that MUST yield double value.
- colTypes
Map of column names to their datatype.
- returns
Spark expression that will yield double metric calculator result
- Definition Classes
- SequenceCompletenessDFMetricCalculator → GroupingDFMetricCalculator → DFMetricCalculator
-
val
resultAggregateFunction: (Column) ⇒ Column
Function that aggregates intermediate metric per-group results into final metric value.
Function that aggregates intermediate metric per-group results into final metric value. Default aggregation for grouping metrics it is just a summation.
- Attributes
- protected
- Definition Classes
- GroupingDFMetricCalculator → DFMetricCalculator
-
val
resultCol: String
Name of the column that will store metric result
Name of the column that will store metric result
- Definition Classes
- DFMetricCalculator
-
def
resultExpr(implicit colTypes: Map[String, DataType]): Column
Spark expression yielding numeric result for each row being processed per each group.
Spark expression yielding numeric result for each row being processed per each group. Metric will be incremented with this result per each group using associated per-group aggregation function.
Thus, for the purpose of finding number completeness of numerical sequence, we need to cast each value to a Long type as sequence completeness can be determined only for a sequence of natural numbers.
- colTypes
Map of column names to their datatype.
- returns
Spark row-level expression yielding numeric result.
- Attributes
- protected
- Definition Classes
- SequenceCompletenessDFMetricCalculator → DFMetricCalculator
- Note
Spark expression MUST process single row but not aggregate multiple rows.
-
def
rowDataExpr(keyFields: Seq[String]): Column
Row data collection expression: collects values of selected columns to array for row where metric error occurred.
Row data collection expression: collects values of selected columns to array for row where metric error occurred.
- keyFields
Sequence of source/stream key fields.
- returns
Spark expression that will yield array of row data for column related to this metric calculator.
- Attributes
- protected
- Definition Classes
- DFMetricCalculator
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()