
class DQContext extends Logging

Checkita Data Quality context. The main purpose of this context is to unify the Data Quality job building and running API. To that end, various context builders are available depending on the use case. Given a valid Data Quality context, a job can be built, again using various builders depending on user needs.

Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new DQContext(settings: AppSettings, spark: SparkSession, fs: FileSystem)

    settings

    Application settings object.

    spark

    Spark session object.

    fs

    Hadoop filesystem object.
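
    A minimal construction sketch. The AppSettings value and the Checkita import paths are assumptions: how settings are built is not covered on this page.

      import org.apache.hadoop.fs.FileSystem
      import org.apache.spark.sql.SparkSession

      // Checkita imports (DQContext, AppSettings) omitted: package paths
      // depend on the Checkita version in use.
      val settings: AppSettings = ??? // assumed to be built elsewhere
      val spark: SparkSession = SparkSession.builder().appName("dq-demo").getOrCreate()
      val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)

      val context = new DQContext(settings, spark, fs)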

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def buildBatchJob(jobConfigs: Seq[String]): Result[DQBatchJob]

    Build Data Quality batch job provided with a sequence of paths to job configuration HOCON files.

    jobConfigs

    Paths to job-level configuration files (HOCON).

    returns

    Data Quality job instance or a list of building errors.

    Note

    The HOCON format supports configuration merging. Thus, it is also possible to define different parts of a job configuration in separate files and merge them prior to parsing. This allows, for example, avoiding duplication of sections with common connection configurations or other sections shared among several jobs.
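
    For example (a hedged sketch: the file paths are hypothetical, and handling of the returned Result[_] is omitted since its API is not shown on this page):

      // Merge a shared connections file with a job-specific file (hypothetical paths).
      val batchJob: Result[DQBatchJob] = context.buildBatchJob(Seq(
        "conf/common-connections.conf", // shared connection definitions
        "conf/sales-dq-job.conf"        // job-specific sources, metrics and checks
      ))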

  6. def buildBatchJob(jobConfig: JobConfig): Result[DQBatchJob]

    Builds a Data Quality batch job provided with a job configuration.

    jobConfig

    Job configuration

    returns

    Data Quality job instance or a list of building errors.

    Note

    Data Quality job creation out of a job configuration involves the following steps:

    • all configured connections are established;
    • all schemas defined in job configuration are read;
    • all sources (both regular and virtual ones) defined in job configuration are read;
    • storage manager is initialized (if configured).
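
    A minimal sketch, assuming a JobConfig instance has already been parsed or assembled elsewhere:

      val jobConfig: JobConfig = ??? // assumed to be parsed/assembled elsewhere
      val job: Result[DQBatchJob] = context.buildBatchJob(jobConfig)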
  7. def buildBatchJob(jobConfig: JobConfig, jobId: String, sources: Seq[Source], metrics: Seq[RegularMetricConfig] = Seq.empty, composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty): Result[DQBatchJob]

    Fundamental Data Quality batch job builder: builds a batch job provided with all job components.

    jobId

    Job ID

    sources

    Sequence of sources to check

    metrics

    Sequence of regular metrics to calculate

    composedMetrics

    Sequence of composed metrics to calculate on top of regular metrics results.

    trendMetrics

    Sequence of trend metrics to calculate

    checks

    Sequence of checks to perform based on metrics results.

    loadChecks

    Sequence of load checks to perform directly over the sources (validate source metadata).

    targets

    Sequence of targets to be sent/saved (alternative channels to communicate DQ job results).

    schemas

    Map of user-defined schemas used primarily to perform loadChecks (i.e. source schema validation).

    connections

    Map of user-defined connections to external data systems (RDBMS, Kafka, etc.). Connections are used primarily to send targets.

    returns

    Data Quality batch job instance wrapped into Result[_].
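
    A hedged sketch of the fundamental builder; all components are assumed to be pre-built elsewhere, the job ID is hypothetical, and unused optional components fall back to their empty defaults:

      val jobConfig: JobConfig              = ??? // assumed pre-built
      val sources: Seq[Source]              = ???
      val metrics: Seq[RegularMetricConfig] = ???
      val checks: Seq[CheckConfig]          = ???

      val fullJob: Result[DQBatchJob] = context.buildBatchJob(
        jobConfig = jobConfig,
        jobId     = "sales_dq_job", // hypothetical job ID
        sources   = sources,
        metrics   = metrics,
        checks    = checks
      )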

  8. def buildStreamJob(jobConfigs: Seq[String]): Result[DQStreamJob]

    Build Data Quality stream job provided with a sequence of paths to job configuration HOCON files.

    jobConfigs

    Paths to job-level configuration files (HOCON).

    returns

    Data Quality job instance or a list of building errors.

    Note

    The HOCON format supports configuration merging. Thus, it is also possible to define different parts of a job configuration in separate files and merge them prior to parsing. This allows, for example, avoiding duplication of sections with common connection configurations or other sections shared among several jobs.

  9. def buildStreamJob(jobConfig: JobConfig): Result[DQStreamJob]

    Builds a Data Quality stream job provided with a job configuration.

    jobConfig

    Job configuration

    returns

    Data Quality job instance or a list of building errors.

    Note

    Data Quality job creation out of a job configuration involves the following steps:

    • all configured connections are established;
    • all schemas defined in job configuration are read;
    • all sources (both regular and virtual ones) defined in job configuration are read;
    • storage manager is initialized (if configured).
  10. def buildStreamJob(jobConfig: JobConfig, jobId: String, sources: Seq[Source], metrics: Seq[RegularMetricConfig] = Seq.empty, composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty): Result[DQStreamJob]

    Fundamental Data Quality stream job builder: builds a stream job provided with all job components.

    jobId

    Job ID

    sources

    Sequence of sources to check (streamable sources)

    metrics

    Sequence of regular metrics to calculate

    composedMetrics

    Sequence of composed metrics to calculate on top of regular metrics results.

    trendMetrics

    Sequence of trend metrics to calculate

    checks

    Sequence of checks to perform based on metrics results.

    targets

    Sequence of targets to be sent/saved (alternative channels to communicate DQ job results).

    schemas

    Map of user-defined schemas used primarily for source schema validation.

    connections

    Map of user-defined connections to external data systems (RDBMS, Kafka, etc.). Connections are used primarily to send targets.

    returns

    Data Quality stream job instance wrapped into Result[_].
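
    A hedged sketch mirroring the batch builder above; note this overload takes no loadChecks, and the sources are assumed to be streamable:

      val jobConfig: JobConfig       = ??? // assumed pre-built
      val streamSources: Seq[Source] = ??? // streamable (e.g. Kafka-backed) sources

      val streamJob: Result[DQStreamJob] = context.buildStreamJob(
        jobConfig = jobConfig,
        jobId     = "sales_dq_stream_job", // hypothetical job ID
        sources   = streamSources
      )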

  11. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. def initLogger(lvl: Level): Unit

    Initialises the logger:

    • gets log4j properties with the following priority: resources directory -> working directory -> default settings
    • updates the root logger level with the verbosity level defined at application start
    • reconfigures the Logger
    lvl

    Root logger level defined at application start
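
    A minimal usage sketch; Level is assumed to be the log4j Level type, consistent with the log4j properties mentioned above:

      import org.apache.log4j.Level

      // The level would normally come from application startup arguments.
      context.initLogger(Level.DEBUG)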

    Definition Classes
    Logging
  18. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  19. lazy val log: Logger
    Definition Classes
    Logging
    Annotations
    @transient()
  20. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  22. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  23. def stop(): Result[Unit]

    Stops this DQ context by stopping the Spark session if needed.

    returns

    Unit wrapped into Result[_].
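
    Typically called once all jobs have run, to release resources held by the context:

      // Stops the underlying Spark session if needed.
      val stopped: Result[Unit] = context.stop()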

  24. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  25. def toString(): String
    Definition Classes
    AnyRef → Any
  26. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
