class DQContext extends Logging
Checkita Data Quality context. The main purpose of this context is to unify the Data Quality job building and running API. Various context builders are therefore available, depending on the use case. Given a valid Data Quality context, a job can then be built, again using various builders depending on user needs.
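A minimal usage sketch, assuming appSettings, sparkSession and hadoopFs have already been initialized by the caller (only the constructor and builder shown here are documented on this page):

    import org.apache.hadoop.fs.FileSystem
    import org.apache.spark.sql.SparkSession

    // Construct the context directly via its documented constructor.
    val context = new DQContext(appSettings, sparkSession, hadoopFs)

    // Build a batch job from a job-level HOCON configuration file.
    val job = context.buildBatchJob(Seq("path/to/job-config.conf"))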
Linear Supertypes
- Logging
- AnyRef
- Any
Instance Constructors
-
new
DQContext(settings: AppSettings, spark: SparkSession, fs: FileSystem)
- settings
Application settings object.
- spark
Spark session object.
- fs
Hadoop filesystem object.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
buildBatchJob(jobConfigs: Seq[String]): Result[DQBatchJob]
Builds a Data Quality batch job from a sequence of paths to job configuration HOCON files.
- jobConfigs
Paths to job-level configuration files (HOCON).
- returns
Data Quality job instance or a list of building errors.
- Note
The HOCON format supports configuration merging. It is therefore possible to define different parts of a job configuration in separate files and merge them prior to parsing. This allows, for example, avoiding duplication of sections with common connection configurations or other sections shared among several jobs (see the sketch below).
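For illustration, a hedged sketch of configuration merging (file names are hypothetical):

    // Shared connection definitions live in one file and are merged with the
    // job-specific configuration before parsing.
    val job = context.buildBatchJob(Seq(
      "configs/common-connections.conf",
      "configs/sales-quality-job.conf"
    ))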
-
def
buildBatchJob(jobConfig: JobConfig): Result[DQBatchJob]
Builds a Data Quality batch job from a job configuration.
- jobConfig
Job configuration.
- returns
Data Quality job instance or a list of building errors.
- Note
Building a Data Quality job from a job configuration involves the following steps (a usage sketch follows the list):
- all configured connections are established;
- all schemas defined in the job configuration are read;
- all sources (both regular and virtual ones) defined in the job configuration are read;
- the storage manager is initialized (if configured).
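A minimal usage sketch, assuming jobConfig has already been parsed by the caller:

    // Building from a parsed configuration performs the steps listed above:
    // connections, schemas, sources, storage manager.
    val batchJob: Result[DQBatchJob] = context.buildBatchJob(jobConfig)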
-
def
buildBatchJob(jobConfig: JobConfig, jobId: String, sources: Seq[Source], metrics: Seq[RegularMetricConfig] = Seq.empty, composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty): Result[DQBatchJob]
Fundamental Data Quality batch job builder: builds a batch job from all job components supplied explicitly (a usage sketch follows the parameter list).
- jobId
Job ID
- sources
Sequence of sources to check
- metrics
Sequence of regular metrics to calculate
- composedMetrics
Sequence of composed metrics to calculate on top of regular metrics results.
- trendMetrics
Sequence of trend metrics to calculate
- checks
Sequence of checks to perform based on metrics results.
- loadChecks
Sequence of load checks to perform directly over the sources (validating source metadata).
- targets
Sequence of targets to be sent/saved (alternative channels to communicate DQ job results).
- schemas
Map of user-defined schemas, used primarily to perform load checks (i.e. source schema validation).
- connections
Map of user-defined connections to external data systems (RDBMS, Kafka, etc.). Connections are used primarily to send targets.
- returns
Data Quality batch job instance wrapped into Result[_].
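A hedged sketch of the fundamental builder; jobConfig, mySources and rowCountMetric are illustrative names assumed to be prepared by the caller, and all omitted components fall back to their empty defaults:

    // Supply job components programmatically instead of reading them from files.
    val job = context.buildBatchJob(
      jobConfig = jobConfig,
      jobId = "orders_dq_job",
      sources = mySources,
      metrics = Seq(rowCountMetric)
    )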
-
def
buildStreamJob(jobConfigs: Seq[String]): Result[DQStreamJob]
Builds a Data Quality stream job from a sequence of paths to job configuration HOCON files.
- jobConfigs
Paths to job-level configuration files (HOCON).
- returns
Data Quality job instance or a list of building errors.
- Note
The HOCON format supports configuration merging. It is therefore possible to define different parts of a job configuration in separate files and merge them prior to parsing. This allows, for example, avoiding duplication of sections with common connection configurations or other sections shared among several jobs.
-
def
buildStreamJob(jobConfig: JobConfig): Result[DQStreamJob]
Builds a Data Quality stream job from a job configuration.
- jobConfig
Job configuration.
- returns
Data Quality job instance or a list of building errors.
- Note
Building a Data Quality job from a job configuration involves the following steps:
- all configured connections are established;
- all schemas defined in the job configuration are read;
- all sources (both regular and virtual ones) defined in the job configuration are read;
- the storage manager is initialized (if configured).
-
def
buildStreamJob(jobConfig: JobConfig, jobId: String, sources: Seq[Source], metrics: Seq[RegularMetricConfig] = Seq.empty, composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty): Result[DQStreamJob]
Fundamental Data Quality stream job builder: builds a stream job from all job components supplied explicitly (a usage sketch follows the parameter list).
- jobId
Job ID
- sources
Sequence of sources to check (streamable sources)
- metrics
Sequence of regular metrics to calculate
- composedMetrics
Sequence of composed metrics to calculate on top of regular metrics results.
- trendMetrics
Sequence of trend metrics to calculate
- checks
Sequence of checks to perform based on metrics results.
- targets
Sequence of targets to be sent/saved (alternative channels to communicate DQ job results).
- schemas
Map of user-defined schemas, used primarily to perform load checks (i.e. source schema validation).
- connections
Map of user-defined connections to external data systems (RDBMS, Kafka, etc.). Connections are used primarily to send targets.
- returns
Data Quality stream job instance wrapped into Result[_].
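A hedged sketch mirroring the batch builder; jobConfig and myStreamSources (streamable sources) are illustrative names assumed to be prepared by the caller:

    // Note that, unlike the batch builder, the stream builder's signature
    // does not accept load checks.
    val streamJob: Result[DQStreamJob] = context.buildStreamJob(
      jobConfig = jobConfig,
      jobId = "orders_dq_stream_job",
      sources = myStreamSources
    )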
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initLogger(lvl: Level): Unit
Initialises the logger:
- gets log4j properties with the following priority: resources directory -> working directory -> default settings;
- updates the root logger level with the verbosity level defined at application start;
- reconfigures the logger.
- lvl
Root logger level defined at application start
- Definition Classes
- Logging
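A minimal sketch, assuming the Logging trait is backed by log4j (this page mentions log4j properties):

    import org.apache.log4j.Level

    // Raise the root logger verbosity before building and running jobs.
    context.initLogger(Level.DEBUG)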
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
lazy val
log: Logger
- Definition Classes
- Logging
- Annotations
- @transient()
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
stop(): Result[Unit]
Stops this DQ context by stopping the Spark session if needed.
- returns
Nothing (Unit wrapped into Result[_]).
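A usage sketch for tearing down the context once all jobs have completed:

    // stop() shuts down the underlying Spark session if the context manages it.
    val stopped: Result[Unit] = context.stop()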
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()