Packages

c

org.checkita.dqf.context

DQStreamWindowJob

final case class DQStreamWindowJob(jobConfig: JobConfig, settings: AppSettings, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, spark: SparkSession, fs: FileSystem, buffer: ProcessorBuffer, dumpSize: Int) extends Thread with DQJob with Product with Serializable

Data Quality Window Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for windows of processed streams: runs in a separate thread and monitors streaming processor buffer for windows that are ready to be processed. Once such windows appear, processes them one by one in time-order. This job is started from within a data quality stream job.

settings

Application settings object. Settings are passed explicitly as they will be updated for each window with actual execution and reference datetime.

sources

Sequence of sources to process

metrics

Sequence of metrics to calculate

composedMetrics

Sequence of composed metrics to calculate

trendMetrics

Sequence of trend metrics to calculate

checks

Sequence of checks to perform

loadChecks

Sequence of load checks to perform

targets

Sequence of targets to send

schemas

Map of user-defined schemas (used for load checks evaluation)

connections

Map of connections to external systems (used to send targets)

storageManager

Data Quality Storage manager (used to save results)

jobId

Implicit job ID

spark

Implicit spark session object

fs

Implicit hadoop file system object

dumpSize

Implicit value of maximum number of metric failures (or errors) to be collected (per metric). Used to prevent OOM errors.

Linear Supertypes
Serializable, Serializable, Product, Equals, DQJob, Logging, Thread, Runnable, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. DQStreamWindowJob
  2. Serializable
  3. Serializable
  4. Product
  5. Equals
  6. DQJob
  7. Logging
  8. Thread
  9. Runnable
  10. AnyRef
  11. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new DQStreamWindowJob(jobConfig: JobConfig, settings: AppSettings, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, spark: SparkSession, fs: FileSystem, buffer: ProcessorBuffer, dumpSize: Int)

    settings

    Application settings object. Settings are passed explicitly as they will be updated for each window with actual execution and reference datetime.

    sources

    Sequence of sources to process

    metrics

    Sequence of metrics to calculate

    composedMetrics

    Sequence of composed metrics to calculate

    trendMetrics

    Sequence of trend metrics to calculate

    checks

    Sequence of checks to perform

    loadChecks

    Sequence of load checks to perform

    targets

    Sequence of targets to send

    schemas

    Map of user-defined schemas (used for load checks evaluation)

    connections

    Map of connections to external systems (used to send targets)

    storageManager

    Data Quality Storage manager (used to save results)

    jobId

    Implicit job ID

    spark

    Implicit spark session object

    fs

    Implicit hadoop file system object

    dumpSize

    Implicit value of maximum number of metric failures (or errors) to be collected (per metric). Used to prevent OOM errors.

Type Members

  1. trait RegularMetricsProcessor extends AnyRef

    Metrics processed differently for batch and streaming job.

    Metrics processed differently for batch and streaming job. Therefore, to generalize metric processing API, we introduce this trait that common method to process all regular metrics.

    Attributes
    protected
    Definition Classes
    DQJob

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. implicit val buffer: ProcessorBuffer
  6. def calculateComposedMetrics(stage: String, regularMetricResults: Result[MetricResults]): Result[MetricResults]

    Calculates all composed metrics provided with regular metric results.

    Calculates all composed metrics provided with regular metric results.

    stage

    Stage indication used for logging.

    regularMetricResults

    Map of regular metric results

    returns

    Either a map of composed metric results or a list of calculation errors.

    Attributes
    protected
    Definition Classes
    DQJob
  7. def calculateTrendMetrics(stage: String)(implicit jobId: String, manager: Option[DqStorageManager], settings: AppSettings): Result[MetricResults]

    Calculates all trend metrics provided with regular and composed metric results.

    Calculates all trend metrics provided with regular and composed metric results.

    stage

    Stage indication used for logging.

    jobId

    Implicit current job ID.

    manager

    Implicit storage manager used to load historical results.

    settings

    Implicit application settings object.

    returns

    Either a map of trend metric results or a list of calculation errors.

    Attributes
    protected
    Definition Classes
    DQJob
  8. final def checkAccess(): Unit
    Definition Classes
    Thread
  9. val checks: Seq[CheckConfig]
    Definition Classes
    DQStreamWindowJobDQJob
  10. val checksStage: String
    Attributes
    protected
    Definition Classes
    DQJob
  11. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    Thread → AnyRef
    Annotations
    @throws( ... )
  12. def combineResults(stage: String, loadCheckResults: Seq[ResultCheckLoad], checkResults: Result[Seq[ResultCheck]], jobState: Result[JobState], regularMetricResults: Result[Seq[ResultMetricRegular]], composedMetricResults: Result[Seq[ResultMetricComposed]], trendMetricResults: Result[Seq[ResultMetricTrend]], metricErrors: Result[Seq[ResultMetricError]])(implicit settings: AppSettings): Result[ResultSet]

    Combines all results into a final result set.

    Combines all results into a final result set.

    stage

    Stage indication used for logging.

    loadCheckResults

    Sequence of load check results

    checkResults

    Sequence of check results (wrapped into Either)

    regularMetricResults

    Sequence of regular metric results (wrapped into Either)

    composedMetricResults

    Sequence of composed metric results (wrapped into Either)

    trendMetricResults

    Sequence of trend metrics results (wrapped into Either)

    metricErrors

    Sequence of metric errors (wrapped into Either)

    settings

    Implicit application settings object

    returns

    Combined results in form of ResultSet

    Attributes
    protected
    Definition Classes
    DQJob
  13. val composedMetrics: Seq[ComposedMetricConfig]
    Definition Classes
    DQStreamWindowJobDQJob
  14. val composedMetricsMap: Map[String, ComposedMetricConfig]
    Attributes
    protected
    Definition Classes
    DQJob
  15. val connections: Map[String, DQConnection]
    Definition Classes
    DQStreamWindowJobDQJob
  16. implicit val dumpSize: Int
  17. def encryptMetricErrors(errors: Seq[ResultMetricError])(implicit settings: AppSettings): Result[Seq[ResultMetricError]]

    Encrypts rowData field in metric errors if requested per application configuration.

    Encrypts rowData field in metric errors if requested per application configuration. Row data field in metric errors contains excerpt from data source and, therefore, these data can contain some sensitive information. In order to protect it, users can configure encryption chapter in application configuration and store encrypted rowData in DQ storage.

    errors

    Sequence of metric errors to encrypt

    settings

    Implicit application settings object

    returns

    Sequence of metric errors with encrypted rowData field (if requested per configuration) or a list of encryption errors.

    Attributes
    protected
    Definition Classes
    DQJob
    Note

    When metric errors are send via targets rowData field is never encrypted.

  18. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  19. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. def finalizeComposedMetrics(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricComposed]]

    Finalizes composed metric results: selects only composed metrics results and converts them to final composed metric results representation ready for writing into storage DB or sending via targets.

    Finalizes composed metric results: selects only composed metrics results and converts them to final composed metric results representation ready for writing into storage DB or sending via targets.

    stage

    Stage indication used for logging.

    metricResults

    Map with all metric results

    settings

    Implicit application settings object

    returns

    Either a finalized sequence of composed metric results or a list of conversion errors.

    Attributes
    protected
    Definition Classes
    DQJob
  21. def finalizeJobState(jobConfig: JobConfig)(implicit settings: AppSettings, jobId: String): Result[JobState]

    Finalizes job state: select whether to encrypt or not and converts to the final representation ready for writing into storage DB or sending via targets.

    Finalizes job state: select whether to encrypt or not and converts to the final representation ready for writing into storage DB or sending via targets.

    jobConfig

    Parsed Data Quality job configuration

    settings

    Implicit application settings object

    jobId

    Current Job ID

    returns

    Either a finalized of job state.

    Attributes
    protected
    Definition Classes
    DQJob
  22. def finalizeMetricErrors(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricError]]

    Finalizes metric errors: retrieves metrics errors from results and converts them to final metric errors representation ready for writing into storage DB or sending via targets.

    Finalizes metric errors: retrieves metrics errors from results and converts them to final metric errors representation ready for writing into storage DB or sending via targets.

    stage

    Stage indication used for logging.

    metricResults

    Map with all metric results

    settings

    Implicit application settings object

    returns

    Either a finalized sequence of metric errors or a list of conversion errors.

    Attributes
    protected
    Definition Classes
    DQJob
    Note

    There could be a situations when metric errors hold the same error data. We are not interested in sending repeating data neither to storage database nor to Targets. Therefore, sequence of metric errors is deduplicated by unique constraint.

  23. def finalizeRegularMetrics(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricRegular]]

    Finalizes regular metric results: selects only regular metrics results and converts them to final regular metric results representation ready for writing into storage DB or sending via targets.

    Finalizes regular metric results: selects only regular metrics results and converts them to final regular metric results representation ready for writing into storage DB or sending via targets.

    stage

    Stage indication used for logging.

    metricResults

    Map with all metric results

    settings

    Implicit application settings object

    returns

    Either a finalized sequence of regular metric results or a list of conversion errors.

    Attributes
    protected
    Definition Classes
    DQJob
  24. def finalizeTrendMetrics(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricTrend]]

    Finalizes trend metric results: selects only trend metrics results and converts them to final trend metric results representation ready for writing into storage DB or sending via targets.

    Finalizes trend metric results: selects only trend metrics results and converts them to final trend metric results representation ready for writing into storage DB or sending via targets.

    stage

    Stage indication used for logging.

    metricResults

    Map with all metric results

    settings

    Implicit application settings object

    returns

    Either a finalized sequence of trend metric results or a list of conversion errors.

    Attributes
    protected
    Definition Classes
    DQJob
  25. implicit val fs: FileSystem
    Definition Classes
    DQStreamWindowJobDQJob
  26. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  27. def getContextClassLoader(): ClassLoader
    Definition Classes
    Thread
    Annotations
    @CallerSensitive()
  28. def getId(): Long
    Definition Classes
    Thread
  29. final def getName(): String
    Definition Classes
    Thread
  30. final def getPriority(): Int
    Definition Classes
    Thread
  31. def getStackTrace(): Array[StackTraceElement]
    Definition Classes
    Thread
  32. def getState(): State
    Definition Classes
    Thread
  33. final def getThreadGroup(): ThreadGroup
    Definition Classes
    Thread
  34. def getUncaughtExceptionHandler(): UncaughtExceptionHandler
    Definition Classes
    Thread
  35. def initLogger(lvl: Level): Unit

    Initialises logger:

    Initialises logger:

    • gets log4j properties with following priority: resources directory -> working directory -> default settings
    • updates root logger level with verbosity level defined at application start
    • reconfigures Logger
    lvl

    Root logger level defined at application start

    Definition Classes
    Logging
  36. def interrupt(): Unit
    Definition Classes
    Thread
  37. final def isAlive(): Boolean
    Definition Classes
    Thread
    Annotations
    @native()
  38. final def isDaemon(): Boolean
    Definition Classes
    Thread
  39. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  40. def isInterrupted(): Boolean
    Definition Classes
    Thread
  41. val jobConfig: JobConfig
    Definition Classes
    DQStreamWindowJobDQJob
  42. implicit val jobId: String
    Definition Classes
    DQStreamWindowJobDQJob
  43. final def join(): Unit
    Definition Classes
    Thread
    Annotations
    @throws( ... )
  44. final def join(arg0: Long, arg1: Int): Unit
    Definition Classes
    Thread
    Annotations
    @throws( ... )
  45. final def join(arg0: Long): Unit
    Definition Classes
    Thread
    Annotations
    @throws( ... )
  46. val loadCheckStage: String
    Attributes
    protected
    Definition Classes
    DQJob
  47. val loadChecks: Seq[LoadCheckConfig]
    Definition Classes
    DQStreamWindowJobDQJob
  48. val loadChecksBySources: Map[String, Seq[LoadCheckConfig]]
    Attributes
    protected
    Definition Classes
    DQJob
  49. lazy val log: Logger
    Definition Classes
    Logging
    Annotations
    @transient()
  50. def logMetricResults(stage: String, metType: String, mr: MetricResults): Unit

    Logs metric calculation results.

    Logs metric calculation results. Used during metric processing, to immediately log their calculation status.

    stage

    Stage indication used for logging.

    metType

    Type of the metrics being calculated (either regular or composed)

    mr

    Map with metric results

    Attributes
    protected
    Definition Classes
    DQJob
  51. implicit val manager: Option[DqStorageManager]
    Definition Classes
    DQJob
  52. val metricStage: String
    Attributes
    protected
    Definition Classes
    DQJob
  53. val metrics: Seq[RegularMetricConfig]
    Definition Classes
    DQStreamWindowJobDQJob
  54. val metricsBySources: Map[String, Seq[RegularMetricConfig]]
    Attributes
    protected
    Definition Classes
    DQJob
  55. val metricsMap: Map[String, RegularMetricConfig]
    Attributes
    protected
    Definition Classes
    DQJob
  56. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  57. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  58. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  59. def performChecks(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultCheck]]

    Performs all checks

    Performs all checks

    stage

    Stage indication used for logging.

    metricResults

    Map with metric results (both regular and composed)

    settings

    Implicit application settings object

    returns

    Either sequence of check results or a list of check evaluation errors.

    Attributes
    protected
    Definition Classes
    DQJob
  60. def performLoadChecks(stage: String)(implicit settings: AppSettings): Seq[ResultCheckLoad]

    Performs all load checks

    Performs all load checks

    stage

    Stage indication used for logging.

    settings

    Implicit application settings object

    returns

    Sequence of load check results.

    Attributes
    protected
    Definition Classes
    DQJob
  61. def processAll(regularMetricsProcessor: RegularMetricsProcessor, stagePrefix: Option[String] = None)(implicit settings: AppSettings): Result[ResultSet]

    Top-level processing function: aggregates and runs all processing stages in required order.

    Top-level processing function: aggregates and runs all processing stages in required order.

    regularMetricsProcessor

    Regular metric processor used to calculate regular metric results.

    stagePrefix

    Prefix to stage names. Used for logging in streaming applications to indicate window for which results are processed.

    settings

    Implicit application settings object

    returns

    Either final results set or a list of processing errors

    Attributes
    protected
    Definition Classes
    DQJob
  62. def processTargets(stage: String, resultSet: Result[ResultSet])(implicit settings: AppSettings): Result[Unit]

    Processes all targets

    Processes all targets

    stage

    Stage indication used for logging.

    resultSet

    Final results set

    settings

    Implicit application settings object

    returns

    Either unit or a list of target processing errors.

    Attributes
    protected
    Definition Classes
    DQJob
  63. def run(): Unit

    Runs windows processing job in a separate thread

    Runs windows processing job in a separate thread

    Definition Classes
    DQStreamWindowJob → Thread → Runnable
  64. def runStorageMigration(stage: String)(implicit settings: AppSettings): Result[String]

    Runs database migration provided with storage manager.

    Runs database migration provided with storage manager.

    stage

    Stage indication used for logging.

    settings

    Implicit application settings object

    returns

    Nothing in case of successful migration or a list of migration errors.

    Attributes
    protected
    Definition Classes
    DQJob
  65. def saveResults(stage: String, resultSet: Result[ResultSet])(implicit settings: AppSettings): Result[String]

    Saves results into Data Quality storage

    Saves results into Data Quality storage

    stage

    Stage indication used for logging.

    resultSet

    Final results set

    returns

    Either a status string or a list of saving errors.

    Attributes
    protected
    Definition Classes
    DQJob
  66. val schemas: Map[String, SourceSchema]
    Definition Classes
    DQStreamWindowJobDQJob
  67. def setContextClassLoader(arg0: ClassLoader): Unit
    Definition Classes
    Thread
  68. final def setDaemon(arg0: Boolean): Unit
    Definition Classes
    Thread
  69. final def setName(arg0: String): Unit
    Definition Classes
    Thread
  70. final def setPriority(arg0: Int): Unit
    Definition Classes
    Thread
  71. def setUncaughtExceptionHandler(arg0: UncaughtExceptionHandler): Unit
    Definition Classes
    Thread
  72. val settings: AppSettings
  73. val sources: Seq[Source]
    Definition Classes
    DQStreamWindowJobDQJob
  74. implicit val spark: SparkSession
    Definition Classes
    DQStreamWindowJobDQJob
  75. def start(): Unit
    Definition Classes
    Thread
  76. val storageManager: Option[DqStorageManager]
    Definition Classes
    DQStreamWindowJobDQJob
  77. val storageStage: String
    Attributes
    protected
    Definition Classes
    DQJob
  78. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  79. val targets: Seq[TargetConfig]
    Definition Classes
    DQStreamWindowJobDQJob
  80. val targetsStage: String
    Attributes
    protected
    Definition Classes
    DQJob
  81. def toString(): String
    Definition Classes
    Thread → AnyRef → Any
  82. val trendMetrics: Seq[TrendMetricConfig]
    Definition Classes
    DQStreamWindowJobDQJob
  83. val trendMetricsMap: Map[String, TrendMetricConfig]
    Attributes
    protected
    Definition Classes
    DQJob
  84. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  85. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  86. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()

Deprecated Value Members

  1. def countStackFrames(): Int
    Definition Classes
    Thread
    Annotations
    @native() @Deprecated
    Deprecated
  2. def destroy(): Unit
    Definition Classes
    Thread
    Annotations
    @Deprecated
    Deprecated
  3. final def resume(): Unit
    Definition Classes
    Thread
    Annotations
    @Deprecated
    Deprecated
  4. final def stop(arg0: Throwable): Unit
    Definition Classes
    Thread
    Annotations
    @Deprecated
    Deprecated
  5. final def stop(): Unit
    Definition Classes
    Thread
    Annotations
    @Deprecated
    Deprecated
  6. final def suspend(): Unit
    Definition Classes
    Thread
    Annotations
    @Deprecated
    Deprecated

Inherited from Serializable

Inherited from Serializable

Inherited from Product

Inherited from Equals

Inherited from DQJob

Inherited from Logging

Inherited from Thread

Inherited from Runnable

Inherited from AnyRef

Inherited from Any

Ungrouped