package context
Type Members
-
final
case class
DQBatchJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem) extends DQJob with Product with Serializable
Data Quality Batch Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for static data sources. A construction sketch follows the parameter list below.
- sources
Sequence of sources to process
- metrics
Sequence of metrics to calculate
- composedMetrics
Sequence of composed metrics to calculate
- trendMetrics
Sequence of trend metrics to calculate
- checks
Sequence of checks to perform
- loadChecks
Sequence of load checks to perform
- targets
Sequence of targets to send
- schemas
Map of user-defined schemas (used for load checks evaluation)
- connections
Map of connections to external systems (used to send targets)
- storageManager
Data Quality Storage manager (used to save results)
- jobId
Implicit job ID
- settings
Implicit application settings object
- spark
Implicit Spark session object
- fs
Implicit Hadoop file system object
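To make the constructor shape concrete, a construction sketch is given below. Only the DQBatchJob signature above is taken from this page; every ??? placeholder stands for a value that would normally be produced from parsed configuration (typically via DQContext builders), and library-specific imports for the configuration types are omitted.

  import org.apache.hadoop.fs.FileSystem
  import org.apache.spark.sql.SparkSession

  // Implicits required by the DQBatchJob constructor:
  implicit val jobId: String         = "example_batch_job"
  implicit val settings: AppSettings = ???  // application settings, parsed elsewhere
  implicit val spark: SparkSession   = SparkSession.builder().getOrCreate()
  implicit val fs: FileSystem        = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  // All parameters not listed here fall back to their documented defaults:
  val batchJob = DQBatchJob(
    jobConfig      = ???,  // parsed job configuration
    sources        = ???,  // static sources to process
    metrics        = ???,  // regular metrics to calculate
    checks         = ???,  // checks to perform
    storageManager = None  // None disables saving results to DQ storage
  )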
-
class
DQContext extends Logging
Checkita Data Quality context. The main purpose of this context is to unify the Data Quality job building and running API. Thus, various context builders are available depending on the use case. Given a valid Data Quality context, a job can in turn be built using various builders, depending on user needs.
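As an illustration of this flow, consider the minimal sketch below. The method names build, buildBatchJob and run are assumptions made for this sketch only, not the verified API; consult the builders actually exposed by DQContext and the resulting job classes.

  // Hypothetical flow; all method names here are illustrative assumptions.
  val context  = DQContext.build(appConfig)        // build a context for this application run
  val batchJob = context.buildBatchJob(jobConfig)  // build a batch job from its configuration
  batchJob.run()                                   // run the job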
-
trait
DQJob extends Logging
Base trait defining the basic functionality of a Data Quality job
-
final
case class
DQStreamJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None, bufferCheckpoint: Option[ProcessorBuffer] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem) extends Logging with Product with Serializable
Data Quality Streaming Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for streaming data sources. A construction sketch follows the parameter list below.
- sources
Sequence of sources to process
- metrics
Sequence of metrics to calculate
- composedMetrics
Sequence of composed metrics to calculate
- trendMetrics
Sequence of trend metrics to calculate
- loadChecks
Sequence of load checks to perform
- checks
Sequence of checks to perform
- targets
Sequence of targets to send
- schemas
Map of user-defined schemas (used for load checks evaluation)
- connections
Map of connections to external systems (used to send targets)
- storageManager
Data Quality Storage manager (used to save results)
- bufferCheckpoint
Buffer state from last checkpoint for this job.
- jobId
Implicit job ID
- settings
Implicit application settings object
- spark
Implicit Spark session object
- fs
Implicit Hadoop file system object
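A construction sketch, analogous to the DQBatchJob example above, is shown below. All ??? placeholders stand for values produced from parsed configuration; the bufferCheckpoint argument shows how a previously checkpointed processor buffer would be supplied to resume stream processing.

  import org.apache.hadoop.fs.FileSystem
  import org.apache.spark.sql.SparkSession

  implicit val jobId: String         = "example_stream_job"
  implicit val settings: AppSettings = ???  // application settings, parsed elsewhere
  implicit val spark: SparkSession   = SparkSession.builder().getOrCreate()
  implicit val fs: FileSystem        = FileSystem.get(spark.sparkContext.hadoopConfiguration)

  val restoredBuffer: Option[ProcessorBuffer] = ???  // buffer state from the last checkpoint, if any

  val streamJob = DQStreamJob(
    jobConfig        = ???,            // parsed job configuration
    sources          = ???,            // streaming sources to process
    metrics          = ???,            // regular metrics to calculate
    bufferCheckpoint = restoredBuffer  // resume from the checkpointed buffer state
  )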
-
final
case class
DQStreamWindowJob(jobConfig: JobConfig, settings: AppSettings, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, spark: SparkSession, fs: FileSystem, buffer: ProcessorBuffer, dumpSize: Int) extends Thread with DQJob with Product with Serializable
Data Quality Window Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for windows of processed streams. It runs in a separate thread and monitors the streaming processor buffer for windows that are ready to be processed. Once such windows appear, they are processed one by one in time order. This job is started from within a Data Quality stream job. A construction sketch follows the parameter list below.
- settings
Application settings object. Settings are passed explicitly, as they are updated for each window with the actual execution and reference datetimes.
- sources
Sequence of sources to process
- metrics
Sequence of metrics to calculate
- composedMetrics
Sequence of composed metrics to calculate
- trendMetrics
Sequence of trend metrics to calculate
- checks
Sequence of checks to perform
- loadChecks
Sequence of load checks to perform
- targets
Sequence of targets to send
- schemas
Map of user-defined schemas (used for load checks evaluation)
- connections
Map of connections to external systems (used to send targets)
- storageManager
Data Quality Storage manager (used to save results)
- jobId
Implicit job ID
- spark
Implicit Spark session object
- fs
Implicit Hadoop file system object
- buffer
Implicit processor buffer that this job monitors for windows ready to be processed
- dumpSize
Implicit value defining the maximum number of metric failures (or errors) to collect per metric. Used to prevent OOM errors.
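Because DQStreamWindowJob extends Thread, launching it amounts to calling start() on a constructed instance, after which it monitors the buffer and processes ready windows in time order. The sketch below illustrates this under the same placeholder assumptions as the previous examples; in practice the job is started from within a Data Quality stream job rather than by hand.

  import org.apache.hadoop.fs.FileSystem
  import org.apache.spark.sql.SparkSession

  implicit val jobId: String           = "example_window_job"
  implicit val spark: SparkSession     = SparkSession.builder().getOrCreate()
  implicit val fs: FileSystem          = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  implicit val buffer: ProcessorBuffer = ???  // shared buffer monitored for ready windows
  implicit val dumpSize: Int           = 100  // cap on collected metric errors per metric

  val windowJob = DQStreamWindowJob(
    jobConfig = ???,  // parsed job configuration
    settings  = ???,  // passed explicitly: updated per window with actual datetimes
    sources   = ???,  // streaming sources whose windows are processed
    metrics   = ???   // regular metrics to calculate
  )

  windowJob.start()  // runs in a separate thread alongside the stream job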
-
sealed abstract
class
RunStage extends EnumEntry
Application execution stages. Used in log messages.
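The EnumEntry parent indicates an enumeratum-based enumeration. Below is a sketch of how an enumeration of this shape is typically defined; the Stage name and its members are invented for illustration, as the actual stages are defined by the library.

  import enumeratum._

  // Illustrative shape only: the real RunStage members are library-defined.
  sealed abstract class Stage(override val entryName: String) extends EnumEntry

  object Stage extends Enum[Stage] {
    case object ReadSources   extends Stage("READ SOURCES")
    case object RunMetrics    extends Stage("RUN METRICS")
    case object PerformChecks extends Stage("PERFORM CHECKS")
    case object SaveResults   extends Stage("SAVE RESULTS")

    val values: IndexedSeq[Stage] = findValues
  }

  // Entries can then be interpolated into log messages, e.g.:
  // log.info(s"[${Stage.RunMetrics.entryName}] Starting metric calculation.")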