Packages

org.checkita.dqf

context

package context

Type Members

  1. final case class DQBatchJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem) extends DQJob with Product with Serializable

    Data Quality Batch Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for static data sources.

    sources: Sequence of sources to process
    metrics: Sequence of metrics to calculate
    composedMetrics: Sequence of composed metrics to calculate
    trendMetrics: Sequence of trend metrics to calculate
    checks: Sequence of checks to perform
    loadChecks: Sequence of load checks to perform
    targets: Sequence of targets to send
    schemas: Map of user-defined schemas (used for load checks evaluation)
    connections: Map of connections to external systems (used to send targets)
    storageManager: Data Quality Storage manager (used to save results)
    jobId: Implicit job ID
    settings: Implicit application settings object
    spark: Implicit Spark session object
    fs: Implicit Hadoop file system object

  2. class DQContext extends Logging

    Checkita Data Quality context. The main purpose of this context is to unify the Data Quality job building and running API. Thus, various context builders are available depending on the use case. Given a valid Data Quality context, a job can in turn be built using various builders, depending on user needs (a usage sketch follows the type members list below).

  3. trait DQJob extends Logging

    Base trait defining the basic functionality of a Data Quality job.

  4. final case class DQStreamJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None, bufferCheckpoint: Option[ProcessorBuffer] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem) extends Logging with Product with Serializable

    Data Quality Streaming Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for streaming data sources.

    sources: Sequence of sources to process
    metrics: Sequence of metrics to calculate
    composedMetrics: Sequence of composed metrics to calculate
    trendMetrics: Sequence of trend metrics to calculate
    loadChecks: Sequence of load checks to perform
    checks: Sequence of checks to perform
    targets: Sequence of targets to send
    schemas: Map of user-defined schemas (used for load checks evaluation)
    connections: Map of connections to external systems (used to send targets)
    storageManager: Data Quality Storage manager (used to save results)
    bufferCheckpoint: Buffer state from the last checkpoint for this job
    jobId: Implicit job ID
    settings: Implicit application settings object
    spark: Implicit Spark session object
    fs: Implicit Hadoop file system object

  5. final case class DQStreamWindowJob(jobConfig: JobConfig, settings: AppSettings, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, spark: SparkSession, fs: FileSystem, buffer: ProcessorBuffer, dumpSize: Int) extends Thread with DQJob with Product with Serializable

    Data Quality Window Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for windows of processed streams. It runs in a separate thread and monitors the streaming processor buffer for windows that are ready to be processed. Once such windows appear, it processes them one by one in time order. This job is started from within a Data Quality stream job.

    settings: Application settings object. Settings are passed explicitly, as they are updated for each window with the actual execution and reference datetime.
    sources: Sequence of sources to process
    metrics: Sequence of metrics to calculate
    composedMetrics: Sequence of composed metrics to calculate
    trendMetrics: Sequence of trend metrics to calculate
    checks: Sequence of checks to perform
    loadChecks: Sequence of load checks to perform
    targets: Sequence of targets to send
    schemas: Map of user-defined schemas (used for load checks evaluation)
    connections: Map of connections to external systems (used to send targets)
    storageManager: Data Quality Storage manager (used to save results)
    jobId: Implicit job ID
    spark: Implicit Spark session object
    fs: Implicit Hadoop file system object
    dumpSize: Implicit value for the maximum number of metric failures (or errors) to be collected per metric. Used to prevent OOM errors.

  6. sealed abstract class RunStage extends EnumEntry

    Application execution stages. Used in log messages (the underlying enumeration pattern is sketched at the end of this page).
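
Taken together, the types above describe the intended flow: obtain a DQContext, use it to assemble a job (batch, stream, or stream-window), and then run that job. The sketch below illustrates this flow in outline only; the builder calls are deliberately left as placeholders, since the concrete builder and run methods live on DQContext, the job classes and their companions and are not reproduced on this page.

    import org.checkita.dqf.context.{DQBatchJob, DQContext}

    // Minimal sketch of the batch flow. The `???` placeholders stand in for
    // concrete builder calls on DQContext and its companion object; no method
    // names are assumed here beyond the classes documented above.
    object BatchFlowSketch extends App {

      // 1. Obtain a Data Quality context. DQContext offers several builders
      //    depending on how application settings and the Spark session are supplied.
      val context: DQContext = ???

      // 2. Use the context to assemble a batch job from a job configuration.
      //    The context wires up the sources, metrics, composed and trend metrics,
      //    checks, load checks, targets, schemas, connections and storage manager
      //    that the DQBatchJob case class expects.
      val batchJob: DQBatchJob = ???

      // 3. Running the job calculates metrics, performs checks, saves results
      //    and sends targets for the configured static sources; the run entry
      //    point is defined on the job classes and is not shown here.
    }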

Value Members

  1. object DQContext
  2. object RunStage extends Enum[RunStage]
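
RunStage and its companion (object RunStage extends Enum[RunStage] above) follow the enumeratum pattern: a sealed abstract class extending EnumEntry plus a companion object extending Enum. The sketch below shows the general shape of that pattern; DemoStage and its entry names are purely illustrative and are not the actual RunStage members.

    import enumeratum.{Enum, EnumEntry}

    // General shape of the enumeratum pattern used by RunStage.
    // DemoStage and its entries are illustrative names only.
    sealed abstract class DemoStage extends EnumEntry

    object DemoStage extends Enum[DemoStage] {
      // findValues is an enumeratum macro that collects all case objects below.
      val values = findValues

      case object ReadSources extends DemoStage
      case object RunChecks   extends DemoStage
    }

    object DemoStageApp extends App {
      // Entries can be enumerated and resolved by name, which is convenient
      // when tagging log messages with the current execution stage.
      DemoStage.values.foreach(println)        // ReadSources, RunChecks
      println(DemoStage.withName("RunChecks")) // look up an entry by its name
    }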
