
class DQContext extends Logging

Checkita Data Quality context. The main purpose of this context is to unify the Data Quality job building and running API. To that end, various context builders are available depending on the use case. Given a valid Data Quality context, a job can be built, again using various builders depending on user needs.

Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new DQContext(settings: AppSettings, spark: SparkSession, fs: FileSystem)

    settings

    Application settings object.

    spark

    Spark session object.

    fs

    Hadoop filesystem object.
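
    A minimal construction sketch. The AppSettings value and the Checkita import paths are assumptions: how settings are built is not covered on this page.

      import org.apache.hadoop.fs.FileSystem
      import org.apache.spark.sql.SparkSession

      // Checkita imports (DQContext, AppSettings) omitted: package paths
      // depend on the Checkita version in use.
      val settings: AppSettings = ??? // assumed to be built elsewhere
      val spark: SparkSession = SparkSession.builder().appName("dq-demo").getOrCreate()
      val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)

      val context = new DQContext(settings, spark, fs)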

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def buildBatchJob(jobConfigs: Seq[String]): Result[DQBatchJob]

    Build Data Quality batch job provided with a sequence of paths to job configuration HOCON files.

    jobConfigs

    Paths to job-level configuration files (HOCON).

    returns

    Data Quality job instance or a list of building errors.

    Note

    The HOCON format supports configuration merging. Thus, it is also possible to define different parts of a job configuration in separate files and merge them prior to parsing. This allows, for example, avoiding duplication of sections with common connection configurations or other sections shared among several jobs.
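
    For example (a hedged sketch: the file paths are hypothetical, and handling of the returned Result[_] is omitted since its API is not shown on this page):

      // Merge a shared connections file with a job-specific file (hypothetical paths).
      val batchJob: Result[DQBatchJob] = context.buildBatchJob(Seq(
        "conf/common-connections.conf", // shared connection definitions
        "conf/sales-dq-job.conf"        // job-specific sources, metrics and checks
      ))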

  6. def buildBatchJob(jobConfig: JobConfig): Result[DQBatchJob]

    Builds a Data Quality batch job provided with a job configuration.

    jobConfig

    Job configuration

    returns

    Data Quality job instance or a list of building errors.

    Note

    Data Quality job creation out of a job configuration involves the following steps:

    • all configured connections are established;
    • all schemas defined in job configuration are read;
    • all sources (both regular and virtual ones) defined in job configuration are read;
    • storage manager is initialized (if configured).
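
    A minimal sketch, assuming a JobConfig instance has already been parsed or assembled elsewhere:

      val jobConfig: JobConfig = ??? // assumed to be parsed/assembled elsewhere
      val job: Result[DQBatchJob] = context.buildBatchJob(jobConfig)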
  7. def buildBatchJob(jobConfig: JobConfig, jobId: String, sources: Seq[Source], metrics: Seq[RegularMetricConfig] = Seq.empty, composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty): Result[DQBatchJob]

    Fundamental Data Quality batch job builder: builds a batch job provided with all job components.

    jobId

    Job ID

    sources

    Sequence of sources to check

    metrics

    Sequence of regular metrics to calculate

    composedMetrics

    Sequence of composed metrics to calculate on top of regular metrics results.

    trendMetrics

    Sequence of trend metrics to calculate

    checks

    Sequence of checks to perform based on metrics results.

    loadChecks

    Sequence of load checks to perform directly over the sources (validate source metadata).

    targets

    Sequence of targets to be sent/saved (alternative channels to communicate DQ job results).

    schemas

    Map of user-defined schemas used primarily to perform loadChecks (i.e. source schema validation).

    connections

    Map of user-defined connections to external data systems (RDBMS, Kafka, etc.). Connections are used primarily to send targets.

    returns

    Data Quality batch job instance wrapped into Result[_].
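
    A hedged sketch of the fundamental builder; all components are assumed to be pre-built elsewhere, the job ID is hypothetical, and unused optional components fall back to their empty defaults:

      val jobConfig: JobConfig              = ??? // assumed pre-built
      val sources: Seq[Source]              = ???
      val metrics: Seq[RegularMetricConfig] = ???
      val checks: Seq[CheckConfig]          = ???

      val fullJob: Result[DQBatchJob] = context.buildBatchJob(
        jobConfig = jobConfig,
        jobId     = "sales_dq_job", // hypothetical job ID
        sources   = sources,
        metrics   = metrics,
        checks    = checks
      )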

  8. def buildStreamJob(jobConfigs: Seq[String]): Result[DQStreamJob]

    Build Data Quality stream job provided with a sequence of paths to job configuration HOCON files.

    jobConfigs

    Paths to job-level configuration files (HOCON).

    returns

    Data Quality job instance or a list of building errors.

    Note

    The HOCON format supports configuration merging. Thus, it is also possible to define different parts of a job configuration in separate files and merge them prior to parsing. This allows, for example, avoiding duplication of sections with common connection configurations or other sections shared among several jobs.

  9. def buildStreamJob(jobConfig: JobConfig): Result[DQStreamJob]

    Builds a Data Quality stream job provided with a job configuration.

    jobConfig

    Job configuration

    returns

    Data Quality job instance or a list of building errors.

    Note

    Data Quality job creation out of a job configuration involves the following steps:

    • all configured connections are established;
    • all schemas defined in job configuration are read;
    • all sources (both regular and virtual ones) defined in job configuration are read;
    • storage manager is initialized (if configured).
  10. def buildStreamJob(jobConfig: JobConfig, jobId: String, sources: Seq[Source], metrics: Seq[RegularMetricConfig] = Seq.empty, composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty): Result[DQStreamJob]

    Fundamental Data Quality stream job builder: builds a stream job provided with all job components.

    jobId

    Job ID

    sources

    Sequence of sources to check (streamable sources)

    metrics

    Sequence of regular metrics to calculate

    composedMetrics

    Sequence of composed metrics to calculate on top of regular metrics results.

    trendMetrics

    Sequence of trend metrics to calculate

    checks

    Sequence of checks to perform based on metrics results.

    targets

    Sequence of targets to be sent/saved (alternative channels to communicate DQ job results).

    schemas

    Map of user-defined schemas used primarily for source schema validation.

    connections

    Map of user-defined connections to external data systems (RDBMS, Kafka, etc.). Connections are used primarily to send targets.

    returns

    Data Quality stream job instance wrapped into Result[_].
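
    A hedged sketch mirroring the batch builder above; note this overload takes no loadChecks, and the sources are assumed to be streamable:

      val jobConfig: JobConfig       = ??? // assumed pre-built
      val streamSources: Seq[Source] = ??? // streamable (e.g. Kafka-backed) sources

      val streamJob: Result[DQStreamJob] = context.buildStreamJob(
        jobConfig = jobConfig,
        jobId     = "sales_dq_stream_job", // hypothetical job ID
        sources   = streamSources
      )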

  11. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  14. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  17. def initLogger(lvl: Level): Unit

    Initialises the logger:

    • gets log4j properties with the following priority: resources directory -> working directory -> default settings
    • updates the root logger level with the verbosity level defined at application start
    • reconfigures the Logger
    lvl

    Root logger level defined at application start
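
    A minimal usage sketch; Level is assumed to be the log4j Level type, consistent with the log4j properties mentioned above:

      import org.apache.log4j.Level

      // The level would normally come from application startup arguments.
      context.initLogger(Level.DEBUG)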

    Definition Classes
    Logging
  18. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  19. lazy val log: Logger
    Definition Classes
    Logging
    Annotations
    @transient()
  20. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  22. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  23. def stop(): Result[Unit]

    Stops this DQ context by stopping the Spark session if needed.

    returns

    Unit wrapped into Result[_].
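
    Typically called once all jobs have run, to release resources held by the context:

      // Stops the underlying Spark session if needed.
      val stopped: Result[Unit] = context.stop()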

  24. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  25. def toString(): String
    Definition Classes
    AnyRef → Any
  26. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  27. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
