final case class DQStreamJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None, bufferCheckpoint: Option[ProcessorBuffer] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem) extends Logging with Product with Serializable
Data Quality Streaming Job: provides all functionality required to calculate quality metrics, perform checks, save results, and send targets for streaming data sources. A usage sketch follows the parameter list below.
- sources: Sequence of sources to process
- metrics: Sequence of metrics to calculate
- composedMetrics: Sequence of composed metrics to calculate
- trendMetrics: Sequence of trend metrics to calculate
- loadChecks: Sequence of load checks to perform
- checks: Sequence of checks to perform
- targets: Sequence of targets to send
- schemas: Map of user-defined schemas (used for load checks evaluation)
- connections: Map of connections to external systems (used to send targets)
- storageManager: Data Quality Storage manager (used to save results)
- bufferCheckpoint: Buffer state from the last checkpoint for this job
- jobId: Implicit job ID
- settings: Implicit application settings object
- spark: Implicit Spark session object
- fs: Implicit Hadoop file system object
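A minimal construction-and-run sketch, assuming the required Checkita imports are in scope; jobConfig, streamingSources, regularMetrics and appSettings are hypothetical values built elsewhere (e.g. parsed from the job configuration):

  import org.apache.spark.sql.SparkSession
  import org.apache.hadoop.fs.FileSystem

  // Implicit context required by the constructor.
  implicit val spark: SparkSession = SparkSession.builder()
    .appName("dq-stream-job")
    .getOrCreate()
  implicit val fs: FileSystem =
    FileSystem.get(spark.sparkContext.hadoopConfiguration)
  implicit val jobId: String = "orders_stream_dq"  // hypothetical job ID
  implicit val settings: AppSettings = appSettings // assumed to be built elsewhere

  // Only the mandatory arguments are passed: composedMetrics, trendMetrics,
  // loadChecks, checks, targets, schemas and connections keep their empty
  // defaults, and storageManager/bufferCheckpoint stay None.
  val job = DQStreamJob(
    jobConfig = jobConfig,        // hypothetical JobConfig
    sources   = streamingSources, // hypothetical Seq[Source]
    metrics   = regularMetrics    // hypothetical Seq[RegularMetricConfig]
  )

  // run() yields Result[Unit]: success, or an accumulated list of errors.
  val outcome = job.run()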
Linear Supertypes
- Serializable (scala and java.io), Product, Equals, Logging, AnyRef, Any
Instance Constructors
- new DQStreamJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None, bufferCheckpoint: Option[ProcessorBuffer] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem)
  Parameters are as documented for the class above.
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- val bufferCheckpoint: Option[ProcessorBuffer]
- implicit val caseSensitive: Boolean
- val checks: Seq[CheckConfig]
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- val composedMetrics: Seq[ComposedMetricConfig]
- val connections: Map[String, DQConnection]
- implicit val dumpSize: Int
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def initLogger(lvl: Level): Unit
  Initialises the logger:
  - gets log4j properties with the following priority: resources directory -> working directory -> default settings
  - updates the root logger level with the verbosity level defined at application start
  - reconfigures the logger
  - lvl: Root logger level defined at application start
  - Definition Classes: Logging
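A minimal call sketch, assuming Level here is org.apache.log4j.Level (consistent with the log4j properties lookup described above) and that job is an existing DQStreamJob instance:

  import org.apache.log4j.Level

  // Reconfigure the root logger before the job emits its first messages;
  // DEBUG stands in for the verbosity level chosen at application start.
  job.initLogger(Level.DEBUG)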
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- val jobConfig: JobConfig
- val loadChecks: Seq[LoadCheckConfig]
- lazy val log: Logger
  - Definition Classes: Logging
  - Annotations: @transient()
- def logPreMsg(): Unit
  - Attributes: protected
- val metrics: Seq[RegularMetricConfig]
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- def run(): Result[Unit]
  Runs the Data Quality job (see the call sketch below). The job includes the following stages, some of which may be omitted depending on the configuration:
  - processing load checks
  - calculating regular metrics
  - calculating composed metrics
  - processing checks
  - saving results
  - sending/saving targets
  - returns: Either a set of job results or a list of errors that occurred during the job run.
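A sketch of driving the job. The concrete combinators available on Result[Unit] are not shown on this page, so only the call and its intended interpretation appear here:

  // Executes the configured stages in the order listed above.
  val outcome: Result[Unit] = job.run()
  // On success, metric results have been saved via the storage manager (if
  // configured) and targets have been sent; on failure, outcome carries the
  // list of errors accumulated during the run.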
- val schemas: Map[String, SourceSchema]
- val sources: Seq[Source]
- val storageManager: Option[DqStorageManager]
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- val targets: Seq[TargetConfig]
- val trendMetrics: Seq[TrendMetricConfig]
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()