Packages

c

org.checkita.dqf.core.metrics.df.regular.GroupingDFMetrics

SequenceCompletenessDFMetricCalculator

case class SequenceCompletenessDFMetricCalculator(metricId: String, columns: Seq[String], increment: Long) extends GroupingDFMetricCalculator with Product with Serializable

Calculates completeness of incremental integer (long) sequence, i.e. checks if sequence does not have missing elements.

Works for single column only!

metricId

Id of the metric.

columns

Sequence of columns which are used for metric calculation

Note

If exact result is not mandatory, then it's better to use HyperLogLog-based metric calculator called "APPROXIMATE_SEQUENCE_COMPLETENESS".

Linear Supertypes
Serializable, Serializable, Product, Equals, GroupingDFMetricCalculator, DFMetricCalculator, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. SequenceCompletenessDFMetricCalculator
  2. Serializable
  3. Serializable
  4. Product
  5. Equals
  6. GroupingDFMetricCalculator
  7. DFMetricCalculator
  8. AnyRef
  9. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new SequenceCompletenessDFMetricCalculator(metricId: String, columns: Seq[String], increment: Long)

    metricId

    Id of the metric.

    columns

    Sequence of columns which are used for metric calculation

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. val columns: Seq[String]
  7. val emptyValue: Column

    Value which is returned when metric result is null.

    Value which is returned when metric result is null. As all grouping metric calculators use summation during aggregation of intermediate per-group results then they should return zero when applied to empty sequence.

    Attributes
    protected
    Definition Classes
    GroupingDFMetricCalculatorDFMetricCalculator
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def errorConditionExpr(implicit colTypes: Map[String, DataType]): Column

    Collect error data for groups where at least on of the column values is null.

    Collect error data for groups where at least on of the column values is null.

    colTypes

    Map of column names to their datatype.

    returns

    Spark row-level expression yielding boolean result.

    Attributes
    protected
    Definition Classes
    SequenceCompletenessDFMetricCalculatorDFMetricCalculator
  10. def errorExpr(rowData: Column)(implicit colTypes: Map[String, DataType]): Column

    Grouping calculators will have different API for collecting per-group arrays with error data.

    Grouping calculators will have different API for collecting per-group arrays with error data. Therefore, an exception will be throw if this method is called.

    rowData

    Array of row data from columns related to this metric calculator (source keyFields + metric columns + window start time column for streaming applications)

    colTypes

    Map of column names to their datatype.

    returns

    Spark expression that will yield row data in case of metric error.

    Attributes
    protected
    Definition Classes
    GroupingDFMetricCalculatorDFMetricCalculator
  11. def errorMessage: String

    Error message that will be returned when metric increment fails.

    Error message that will be returned when metric increment fails.

    The error data is collected for rows where some of the columns values cannot be cast to number (Long).

    returns

    Metric increment failure message.

    Definition Classes
    SequenceCompletenessDFMetricCalculatorDFMetricCalculator
  12. def errors(implicit errorDumpSize: Int, keyFields: Seq[String], colTypes: Map[String, DataType]): Column

    Final metric errors aggregation expression.

    Final metric errors aggregation expression. Merges all per-group metric errors into final array of error data. The size of array is limited by maximum allowed error dump size parameter.

    errorDumpSize

    Maximum allowed number of errors to be collected per single metric.

    keyFields

    Sequence of source/stream key fields.

    colTypes

    Map of column names to their datatype.

    returns

    Spark expression that will yield array of metric errors.

    Definition Classes
    GroupingDFMetricCalculatorDFMetricCalculator
  13. val errorsCol: String

    Name of the column that will store metric errors

    Name of the column that will store metric errors

    Definition Classes
    DFMetricCalculator
  14. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. val groupAggregationFunction: (Column) ⇒ Column

    Function that aggregates metric increments into intermediate per-group metric results.

    Function that aggregates metric increments into intermediate per-group metric results. Accepts spark expression groupResultExpr as input and returns another spark expression that will yield aggregated double metric result per each group.

    We will consider only those groups whose value is a natural number of Long type. Thus for other groups (where casting to Long type yields null) the zero value is assigned.

    Attributes
    protected
    Definition Classes
    SequenceCompletenessDFMetricCalculatorGroupingDFMetricCalculator
  17. def groupErrorExpr(rowData: Column, errorDumpSize: Int): Column

    Per-group error collection expression: collects row data for ENTIRE group in case of metric error.

    Per-group error collection expression: collects row data for ENTIRE group in case of metric error. The size of array is limited by maximum allowed error dump size parameter.

    rowData

    Array of row data from columns related to this metric calculator for current group. (source keyFields + metric columns + window start time column for streaming applications)

    errorDumpSize

    Maximum allowed number of errors to be collected per single metric.

    returns

    Spark expression that will yield array of row data per group in case of metric error.

    Attributes
    protected
    Definition Classes
    SequenceCompletenessDFMetricCalculatorGroupingDFMetricCalculator
  18. def groupErrors(implicit errorDumpSize: Int, keyFields: Seq[String], colTypes: Map[String, DataType]): Column

    Per-group metric errors aggregation expression.

    Per-group metric errors aggregation expression. Collects all metric errors into an array column pear each group. The size of array is limited by maximum allowed error dump size parameter.

    The main difference from one-pass DF-calculators is that error condition depends on the intermediate group aggregation result rather than on individual row result.

    errorDumpSize

    Maximum allowed number of errors to be collected per single metric.

    keyFields

    Sequence of source/stream key fields.

    colTypes

    Map of column names to their datatype.

    returns

    Spark expression that will yield array of metric errors.

    Definition Classes
    GroupingDFMetricCalculator
    Note

    For streaming applications, we need to collect metric errors in per-window basis. Therefore, error row data has to contain window start time (first element of array).

  19. val groupErrorsCol: String

    Name of the column that will store intermediate metric errors per each group.

    Name of the column that will store intermediate metric errors per each group.

    Definition Classes
    GroupingDFMetricCalculator
  20. def groupResult(implicit colTypes: Map[String, DataType]): Column

    Per-group intermediate metric aggregation expression that MUST yield double value.

    Per-group intermediate metric aggregation expression that MUST yield double value.

    colTypes

    Map of column names to their datatype.

    returns

    Spark expression that will yield double metric calculator result

    Definition Classes
    GroupingDFMetricCalculator
  21. val groupResultCol: String

    Name of the column that will store intermediate metric results per each group.

    Name of the column that will store intermediate metric results per each group.

    Definition Classes
    GroupingDFMetricCalculator
  22. val increment: Long
  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. val metricId: String

    Unlike RDD calculators, DF calculators are not groped by its type.

    Unlike RDD calculators, DF calculators are not groped by its type. For each metric defined in DQ job, there will be created its own instance of DF calculator. Thus, DF metric calculators can be linked to metric definitions by metricId.

    Definition Classes
    SequenceCompletenessDFMetricCalculatorDFMetricCalculator
  25. val metricName: MetricName
  26. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  27. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  29. def result(implicit colTypes: Map[String, DataType]): Column

    Final metric aggregation expression that MUST yield double value.

    Final metric aggregation expression that MUST yield double value.

    colTypes

    Map of column names to their datatype.

    returns

    Spark expression that will yield double metric calculator result

    Definition Classes
    SequenceCompletenessDFMetricCalculatorGroupingDFMetricCalculatorDFMetricCalculator
  30. val resultAggregateFunction: (Column) ⇒ Column

    Function that aggregates intermediate metric per-group results into final metric value.

    Function that aggregates intermediate metric per-group results into final metric value. Default aggregation for grouping metrics it is just a summation.

    Attributes
    protected
    Definition Classes
    GroupingDFMetricCalculatorDFMetricCalculator
  31. val resultCol: String

    Name of the column that will store metric result

    Name of the column that will store metric result

    Definition Classes
    DFMetricCalculator
  32. def resultExpr(implicit colTypes: Map[String, DataType]): Column

    Spark expression yielding numeric result for each row being processed per each group.

    Spark expression yielding numeric result for each row being processed per each group. Metric will be incremented with this result per each group using associated per-group aggregation function.

    Thus, for the purpose of finding number completeness of numerical sequence, we need to cast each value to a Long type as sequence completeness can be determined only for a sequence of natural numbers.

    colTypes

    Map of column names to their datatype.

    returns

    Spark row-level expression yielding numeric result.

    Attributes
    protected
    Definition Classes
    SequenceCompletenessDFMetricCalculatorDFMetricCalculator
    Note

    Spark expression MUST process single row but not aggregate multiple rows.

  33. def rowDataExpr(keyFields: Seq[String]): Column

    Row data collection expression: collects values of selected columns to array for row where metric error occurred.

    Row data collection expression: collects values of selected columns to array for row where metric error occurred.

    keyFields

    Sequence of source/stream key fields.

    returns

    Spark expression that will yield array of row data for column related to this metric calculator.

    Attributes
    protected
    Definition Classes
    DFMetricCalculator
  34. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  35. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()

Inherited from Serializable

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from DFMetricCalculator

Inherited from AnyRef

Inherited from Any

Ungrouped