final case class DQBatchJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem) extends DQJob with Product with Serializable
Data Quality Batch Job: provides all required functionality to calculate quality metrics, perform checks, save results and send targets for static data sources.
- sources
Sequence of sources to process
- metrics
Sequence of metrics to calculate
- composedMetrics
Sequence of composed metrics to calculate
- trendMetrics
Sequence of trend metrics to calculate
- checks
Sequence of checks to perform
- loadChecks
Sequence of load checks to perform
- targets
Sequence of targets to send
- schemas
Map of user-defined schemas (used for load checks evaluation)
- connections
Map of connections to external systems (used to send targets)
- storageManager
Data Quality Storage manager (used to save results)
- jobId
Implicit job ID
- settings
Implicit application settings object
- spark
Implicit spark session object
- fs
Implicit hadoop file system object
- Alphabetic
- By Inheritance
- DQBatchJob
- Serializable
- Serializable
- Product
- Equals
- DQJob
- Logging
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Instance Constructors
-
new
DQBatchJob(jobConfig: JobConfig, sources: Seq[Source], metrics: Seq[RegularMetricConfig], composedMetrics: Seq[ComposedMetricConfig] = Seq.empty, trendMetrics: Seq[TrendMetricConfig] = Seq.empty, checks: Seq[CheckConfig] = Seq.empty, loadChecks: Seq[LoadCheckConfig] = Seq.empty, targets: Seq[TargetConfig] = Seq.empty, schemas: Map[String, SourceSchema] = Map.empty, connections: Map[String, DQConnection] = Map.empty, storageManager: Option[DqStorageManager] = None)(implicit jobId: String, settings: AppSettings, spark: SparkSession, fs: FileSystem)
- sources
Sequence of sources to process
- metrics
Sequence of metrics to calculate
- composedMetrics
Sequence of composed metrics to calculate
- trendMetrics
Sequence of trend metrics to calculate
- checks
Sequence of checks to perform
- loadChecks
Sequence of load checks to perform
- targets
Sequence of targets to send
- schemas
Map of user-defined schemas (used for load checks evaluation)
- connections
Map of connections to external systems (used to send targets)
- storageManager
Data Quality Storage manager (used to save results)
- jobId
Implicit job ID
- settings
Implicit application settings object
- spark
Implicit spark session object
- fs
Implicit hadoop file system object
Type Members
-
trait
RegularMetricsProcessor extends AnyRef
Metrics processed differently for batch and streaming job.
Metrics processed differently for batch and streaming job. Therefore, to generalize metric processing API, we introduce this trait that common method to process all regular metrics.
- Attributes
- protected
- Definition Classes
- DQJob
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
calculateComposedMetrics(stage: String, regularMetricResults: Result[MetricResults]): Result[MetricResults]
Calculates all composed metrics provided with regular metric results.
Calculates all composed metrics provided with regular metric results.
- stage
Stage indication used for logging.
- regularMetricResults
Map of regular metric results
- returns
Either a map of composed metric results or a list of calculation errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
calculateTrendMetrics(stage: String)(implicit jobId: String, manager: Option[DqStorageManager], settings: AppSettings): Result[MetricResults]
Calculates all trend metrics provided with regular and composed metric results.
Calculates all trend metrics provided with regular and composed metric results.
- stage
Stage indication used for logging.
- jobId
Implicit current job ID.
- manager
Implicit storage manager used to load historical results.
- settings
Implicit application settings object.
- returns
Either a map of trend metric results or a list of calculation errors.
- Attributes
- protected
- Definition Classes
- DQJob
- implicit val caseSensitive: Boolean
-
val
checks: Seq[CheckConfig]
- Definition Classes
- DQBatchJob → DQJob
-
val
checksStage: String
- Attributes
- protected
- Definition Classes
- DQJob
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
combineResults(stage: String, loadCheckResults: Seq[ResultCheckLoad], checkResults: Result[Seq[ResultCheck]], jobState: Result[JobState], regularMetricResults: Result[Seq[ResultMetricRegular]], composedMetricResults: Result[Seq[ResultMetricComposed]], trendMetricResults: Result[Seq[ResultMetricTrend]], metricErrors: Result[Seq[ResultMetricError]])(implicit settings: AppSettings): Result[ResultSet]
Combines all results into a final result set.
Combines all results into a final result set.
- stage
Stage indication used for logging.
- loadCheckResults
Sequence of load check results
- checkResults
Sequence of check results (wrapped into Either)
- regularMetricResults
Sequence of regular metric results (wrapped into Either)
- composedMetricResults
Sequence of composed metric results (wrapped into Either)
- trendMetricResults
Sequence of trend metrics results (wrapped into Either)
- metricErrors
Sequence of metric errors (wrapped into Either)
- settings
Implicit application settings object
- returns
Combined results in form of ResultSet
- Attributes
- protected
- Definition Classes
- DQJob
-
val
composedMetrics: Seq[ComposedMetricConfig]
- Definition Classes
- DQBatchJob → DQJob
-
val
composedMetricsMap: Map[String, ComposedMetricConfig]
- Attributes
- protected
- Definition Classes
- DQJob
-
val
connections: Map[String, DQConnection]
- Definition Classes
- DQBatchJob → DQJob
- implicit val dumpSize: Int
-
def
encryptMetricErrors(errors: Seq[ResultMetricError])(implicit settings: AppSettings): Result[Seq[ResultMetricError]]
Encrypts rowData field in metric errors if requested per application configuration.
Encrypts rowData field in metric errors if requested per application configuration. Row data field in metric errors contains excerpt from data source and, therefore, these data can contain some sensitive information. In order to protect it, users can configure encryption chapter in application configuration and store encrypted rowData in DQ storage.
- errors
Sequence of metric errors to encrypt
- settings
Implicit application settings object
- returns
Sequence of metric errors with encrypted rowData field (if requested per configuration) or a list of encryption errors.
- Attributes
- protected
- Definition Classes
- DQJob
- Note
When metric errors are send via targets rowData field is never encrypted.
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
finalizeComposedMetrics(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricComposed]]
Finalizes composed metric results: selects only composed metrics results and converts them to final composed metric results representation ready for writing into storage DB or sending via targets.
Finalizes composed metric results: selects only composed metrics results and converts them to final composed metric results representation ready for writing into storage DB or sending via targets.
- stage
Stage indication used for logging.
- metricResults
Map with all metric results
- settings
Implicit application settings object
- returns
Either a finalized sequence of composed metric results or a list of conversion errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
finalizeJobState(jobConfig: JobConfig)(implicit settings: AppSettings, jobId: String): Result[JobState]
Finalizes job state: select whether to encrypt or not and converts to the final representation ready for writing into storage DB or sending via targets.
Finalizes job state: select whether to encrypt or not and converts to the final representation ready for writing into storage DB or sending via targets.
- jobConfig
Parsed Data Quality job configuration
- settings
Implicit application settings object
- jobId
Current Job ID
- returns
Either a finalized of job state.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
finalizeMetricErrors(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricError]]
Finalizes metric errors: retrieves metrics errors from results and converts them to final metric errors representation ready for writing into storage DB or sending via targets.
Finalizes metric errors: retrieves metrics errors from results and converts them to final metric errors representation ready for writing into storage DB or sending via targets.
- stage
Stage indication used for logging.
- metricResults
Map with all metric results
- settings
Implicit application settings object
- returns
Either a finalized sequence of metric errors or a list of conversion errors.
- Attributes
- protected
- Definition Classes
- DQJob
- Note
There could be a situations when metric errors hold the same error data. We are not interested in sending repeating data neither to storage database nor to Targets. Therefore, sequence of metric errors is deduplicated by unique constraint.
-
def
finalizeRegularMetrics(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricRegular]]
Finalizes regular metric results: selects only regular metrics results and converts them to final regular metric results representation ready for writing into storage DB or sending via targets.
Finalizes regular metric results: selects only regular metrics results and converts them to final regular metric results representation ready for writing into storage DB or sending via targets.
- stage
Stage indication used for logging.
- metricResults
Map with all metric results
- settings
Implicit application settings object
- returns
Either a finalized sequence of regular metric results or a list of conversion errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
finalizeTrendMetrics(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultMetricTrend]]
Finalizes trend metric results: selects only trend metrics results and converts them to final trend metric results representation ready for writing into storage DB or sending via targets.
Finalizes trend metric results: selects only trend metrics results and converts them to final trend metric results representation ready for writing into storage DB or sending via targets.
- stage
Stage indication used for logging.
- metricResults
Map with all metric results
- settings
Implicit application settings object
- returns
Either a finalized sequence of trend metric results or a list of conversion errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
implicit
val
fs: FileSystem
- Definition Classes
- DQBatchJob → DQJob
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initLogger(lvl: Level): Unit
Initialises logger:
Initialises logger:
- gets log4j properties with following priority: resources directory -> working directory -> default settings
- updates root logger level with verbosity level defined at application start
- reconfigures Logger
- lvl
Root logger level defined at application start
- Definition Classes
- Logging
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
jobConfig: JobConfig
- Definition Classes
- DQBatchJob → DQJob
-
implicit
val
jobId: String
- Definition Classes
- DQBatchJob → DQJob
-
val
loadCheckStage: String
- Attributes
- protected
- Definition Classes
- DQJob
-
val
loadChecks: Seq[LoadCheckConfig]
- Definition Classes
- DQBatchJob → DQJob
-
val
loadChecksBySources: Map[String, Seq[LoadCheckConfig]]
- Attributes
- protected
- Definition Classes
- DQJob
-
lazy val
log: Logger
- Definition Classes
- Logging
- Annotations
- @transient()
-
def
logMetricResults(stage: String, metType: String, mr: MetricResults): Unit
Logs metric calculation results.
Logs metric calculation results. Used during metric processing, to immediately log their calculation status.
- stage
Stage indication used for logging.
- metType
Type of the metrics being calculated (either regular or composed)
- mr
Map with metric results
- Attributes
- protected
- Definition Classes
- DQJob
-
def
logPostMsg(): Unit
- Attributes
- protected
-
def
logPreMsg(): Unit
- Attributes
- protected
-
implicit
val
manager: Option[DqStorageManager]
- Definition Classes
- DQJob
-
val
metricStage: String
- Attributes
- protected
- Definition Classes
- DQJob
-
val
metrics: Seq[RegularMetricConfig]
- Definition Classes
- DQBatchJob → DQJob
-
val
metricsBySources: Map[String, Seq[RegularMetricConfig]]
- Attributes
- protected
- Definition Classes
- DQJob
-
val
metricsMap: Map[String, RegularMetricConfig]
- Attributes
- protected
- Definition Classes
- DQJob
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
performChecks(stage: String, metricResults: Result[MetricResults])(implicit settings: AppSettings): Result[Seq[ResultCheck]]
Performs all checks
Performs all checks
- stage
Stage indication used for logging.
- metricResults
Map with metric results (both regular and composed)
- settings
Implicit application settings object
- returns
Either sequence of check results or a list of check evaluation errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
performLoadChecks(stage: String)(implicit settings: AppSettings): Seq[ResultCheckLoad]
Performs all load checks
Performs all load checks
- stage
Stage indication used for logging.
- settings
Implicit application settings object
- returns
Sequence of load check results.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
processAll(regularMetricsProcessor: RegularMetricsProcessor, stagePrefix: Option[String] = None)(implicit settings: AppSettings): Result[ResultSet]
Top-level processing function: aggregates and runs all processing stages in required order.
Top-level processing function: aggregates and runs all processing stages in required order.
- regularMetricsProcessor
Regular metric processor used to calculate regular metric results.
- stagePrefix
Prefix to stage names. Used for logging in streaming applications to indicate window for which results are processed.
- settings
Implicit application settings object
- returns
Either final results set or a list of processing errors
- Attributes
- protected
- Definition Classes
- DQJob
-
def
processTargets(stage: String, resultSet: Result[ResultSet])(implicit settings: AppSettings): Result[Unit]
Processes all targets
Processes all targets
- stage
Stage indication used for logging.
- resultSet
Final results set
- settings
Implicit application settings object
- returns
Either unit or a list of target processing errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
run: Result[ResultSet]
Runs Data Quality job.
Runs Data Quality job. Job include following stages (some of them can be omitted depending on the configuration):
- processing load checks
- calculating regular metrics
- calculating composed composed metrics
- processing checks
- saving results
- sending/saving targets
- returns
Either a set of job results or a list of errors that occurred during job run.
-
def
runStorageMigration(stage: String)(implicit settings: AppSettings): Result[String]
Runs database migration provided with storage manager.
Runs database migration provided with storage manager.
- stage
Stage indication used for logging.
- settings
Implicit application settings object
- returns
Nothing in case of successful migration or a list of migration errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
def
saveResults(stage: String, resultSet: Result[ResultSet])(implicit settings: AppSettings): Result[String]
Saves results into Data Quality storage
Saves results into Data Quality storage
- stage
Stage indication used for logging.
- resultSet
Final results set
- returns
Either a status string or a list of saving errors.
- Attributes
- protected
- Definition Classes
- DQJob
-
val
schemas: Map[String, SourceSchema]
- Definition Classes
- DQBatchJob → DQJob
- implicit val settings: AppSettings
-
val
sources: Seq[Source]
- Definition Classes
- DQBatchJob → DQJob
-
implicit
val
spark: SparkSession
- Definition Classes
- DQBatchJob → DQJob
-
val
storageManager: Option[DqStorageManager]
- Definition Classes
- DQBatchJob → DQJob
-
val
storageStage: String
- Attributes
- protected
- Definition Classes
- DQJob
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
val
targets: Seq[TargetConfig]
- Definition Classes
- DQBatchJob → DQJob
-
val
targetsStage: String
- Attributes
- protected
- Definition Classes
- DQJob
-
val
trendMetrics: Seq[TrendMetricConfig]
- Definition Classes
- DQBatchJob → DQJob
-
val
trendMetricsMap: Map[String, TrendMetricConfig]
- Attributes
- protected
- Definition Classes
- DQJob
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()