Changelog
2.2.0 (2024-12-27)
Bug Fixes
- refactor late records processing (#73) (bdf8f00)
- resolve API server build (#67) (d1321e8)
- resolve bugs related to minimum watermark search and kafka initial offsets (#71) (3e3261a)
- update regex pattern in windowBy (#68) (5cb41e6)
Features
- add timeout and retry settings for schema registry connection (#72) (3297b90)
- added options params for some sources (#70) (2bacc35)
- added support for case-when and if-else in composed metrics (#69) (cfd6f07)
2.1.0 (2024-11-18)
Bug Fixes
- added checkpoint initialization and validation for kafka streams (#62) (1af4105)
- added logger name in props (#59) (1a1d5a8)
- expression checks calculation for streaming jobs (#66) (13217c9)
- hive and file storage (#60) (bdedfad)
- update expressions parsing (#65) (886d29c)
Features
- added job config generator for api server (#54) (51050a8)
- added persist for regular sources (#63) (8070fc1)
- adding arima and linear regression trend metrics (#64) (54231d8)
- fail flag for checks (#61) (cf32cbd)
2.0.0 (2024-08-07)
Bug Fixes
Features
- add support of * for selection of all source columns in metric configuration (#49) (ca09fe4)
- change project domain (#52) (6020463)
- Checkita 2.0 release (#45) (e747659)
BREAKING CHANGES
- move project from ru.raiffeisen domain to org.checkita domain.
Other changes include:
- fix backticks issue in GroupingDFMetricCalculator
- refactor DFMetricCalculator API to implicitly pass column types.
- major updates to Checkita Core that enable new functionality and enhance existing features.
- new metrics engine based on Spark DF-API: improves stability and performance of regular metrics computation. Supported in batch jobs only.
- checkpointing for streaming applications: restart your application from the same point where it stopped (or crashed).
- Checkita API Server - experimental MVP service that provides basic functionality to work with configurations and DQ Storage.
- regular metrics refactoring to comply with SQL standards.
- new type of metrics: TREND metrics. Enables computing various statistics over historical metric results.
- new type of checks: EXPRESSION. Allows defining a check pass condition using an arbitrary boolean expression.
- enhanced formulas (for both composed metrics and expression checks): formulas now support basic mathematical functions.
- new Swagger documentation covering both Checkita Configurations and API methods (still in development, will be completed shortly)
- support for reading schemas from Confluent Schema Registry.
- minor bug fixes.
1.7.2 (2024-06-21)
Bug Fixes
- filter NaN values from historical metric results in trend checks to avoid errors during computation of average metric value. (#44) (d6da0f1)
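The effect of this fix can be illustrated with a minimal sketch. It assumes that historical metric results arrive as a sequence of doubles; `TrendAverage` and `averageOfValid` are hypothetical names used for illustration only, not Checkita's actual code.

```scala
// Hypothetical helper illustrating the idea; not the actual Checkita implementation.
object TrendAverage {
  /** Averages historical metric results, ignoring NaN entries.
    * Returns None when no valid values remain. */
  def averageOfValid(history: Seq[Double]): Option[Double] = {
    // A single NaN would otherwise turn the whole sum (and the average) into NaN.
    val valid = history.filterNot(_.isNaN)
    if (valid.isEmpty) None else Some(valid.sum / valid.size)
  }
}
```

Returning an `Option` here is a design choice of the sketch: it forces the caller to decide what a trend check should do when every historical value is NaN.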
1.7.1 (2024-05-27)
Bug Fixes
1.7.0 (2024-05-13)
Bug Fixes
- modify unique key constraint for regular metrics results table (#38) (6a67b2f)
- update covariance metric logic (#39) (0951ddc)
Features
1.6.0 (2024-04-16)
Features
1.5.0 (2024-03-29)
Bug Fixes
- implement custom Spark collection accumulator to limit total number of collected errors (#35) (f36e09f)
- update JobConfig and added test for config encryptor (#34) (bc8bfd8)
Features
- added functionality to reverse error collection logic for regular metrics with conditional criteria (#36) (9244377)
1.4.2 (2024-03-20)
Bug Fixes
1.4.1 (2024-03-14)
Bug Fixes
1.4.0 (2024-03-13)
Features
1.3.2 (2024-03-11)
Bug Fixes
1.3.1 (2024-03-04)
Bug Fixes
- update prepare-release scripts in order to fix auto-versioning and changelog formatting. (#29) (9a30d59)
1.3.0 (2024-03-04)
Features
- job configuration enhancements: added description and metadata fields and refactored hive source partition filtering (#28) (7cf24db)
1.2.1 (2024-02-26)
Bug Fixes
1.2.0 (2024-02-16)
Bug Fixes
- docs update (describe new type of conn) and minor fixes (#23) (c6abb81)
- minor code fixes (#25) (688929d)
Features
- added ClickHouse connection (#22) (d8add79)
- added Greenplum connection (#21) (5756eff)
- new entity in storage db - job config state (#24) (fbe650c)
1.1.1 (2023-12-20)
Bug Fixes
1.1.0 (2023-12-08)
Features
- added new type connections (#17) (205cbc8)
- adding functionality to run quality checks over streaming sources (#19) (0895fed)
- adding streaming sources and stream readers (#16) (b5398fa)
1.0.1 (2023-11-13)
Bug Fixes
1.0.0 (2023-11-01)
Bug Fixes
Features
- added template support for email subject (#9) (9246f7f), closes #6
- adding duplicateValues metric (#8) (da77daa), closes #5
- Incorporating latest updates (1fee747)
- Publishing major update of Checkita DQ (09af3a2), closes #3
0.3.5 (2023-09-19)
Bug Fixes
- Ensure that the JDBC connection is alive prior to saving results. When the application runs for a long time, the DB server can close the idle connection due to inactivity. In such cases the connection must be reopened before saving results.
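A minimal sketch of this kind of liveness check, assuming a caller-supplied factory that opens new connections. `ReconnectingConnection` and its parameters are hypothetical and do not reflect Checkita's internals; `Connection.isClosed` and `Connection.isValid` are standard JDBC calls.

```scala
import java.sql.Connection

// Hypothetical wrapper; names are illustrative, not Checkita's API.
final class ReconnectingConnection(open: () => Connection, validationTimeoutSec: Int = 5) {
  private var current: Connection = open()

  /** Returns a live connection, reopening it if the server dropped the idle one. */
  def get: Connection = synchronized {
    // isValid returns false for closed or broken connections,
    // waiting at most validationTimeoutSec seconds for the check.
    if (current.isClosed || !current.isValid(validationTimeoutSec)) {
      current = open()
    }
    current
  }
}
```

Calling `get` immediately before writing results keeps the check close to the point where a stale connection would otherwise fail.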
0.3.4 (2023-09-14)
Features
- Added support of customisable email subject templates.
Bug Fixes
- Fixed duplicateValues metric: some duplicate values can only be determined during metric calculator merge.
- Fix sender name for check alerts. It was hardcoded but should refer to the name set in the application configuration file.
- Fixed email encoding: changed to UTF-8
0.3.3 (2023-09-04)
Features
- Added new metric: duplicateValues.
- Enhanced joinSql virtual source: it now accepts an arbitrary number of parent sources.
- Enhanced table source: it now accepts a query to execute on the DB side, so that only the query results are read.
- Docs update
0.3.2 (2023-08-25)
Features
- Added email sender name customization
- Added functionality to provide html and markdown templates to build body of check alerts and summary report messages when sending them to either email or Mattermost.
- Added custom source
- Docs update
Bug Fixes
- Pass both source and virtual sources to target processors
- Fix json serialization bugs.
0.3.1 (2023-08-17)
Bug Fixes
- Fix params json when sending results to kafka
- Add sourceId and sourceKeyFields to errorCollection reports (or kafka messages)
- Fix CSV headers for checkAlert attachments
- Fix column metric result representation when saving to DB (sourceId and metric description were mixed)
- Add MD5 hash to message key when sending results to Kafka in order to ensure idempotent message consumption.
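The idea behind the MD5-based message key can be sketched as follows. The key layout (`baseKey:hash`) and the names `MessageKeys`, `md5Hex`, and `keyFor` are assumptions made for illustration; the actual key format used by the application is not described here.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical helper; the key layout is an assumption, not the application's format.
object MessageKeys {
  /** Hex-encoded MD5 digest of the message payload. */
  def md5Hex(payload: String): String =
    MessageDigest.getInstance("MD5")
      .digest(payload.getBytes(StandardCharsets.UTF_8))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  /** Builds a key from a base identifier and the payload hash. */
  def keyFor(baseKey: String, payload: String): String =
    baseKey + ":" + md5Hex(payload)
}
```

Because the same payload always yields the same key, a consumer can detect and skip a message it has already processed by remembering the keys it has seen.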
0.3.0 (2023-07-31)
Features
- Change DB model:
  - Added referenceDateTime and executionDateTime
  - Type in DB - timestamp with tz
  - Render format can be set up in application.conf
  - Job-conf variables are changed to referenceDateTime and executionDateTime
  - Changed init sql script and also added alter sql script
- Added option to send aggregated messages to Kafka: one per target type
- Added option to run DQ in Shared Spark Context
- Added new types of history DB: Hive and File (both managed by spark without extra services)
Bug Fixes
- Fixed SQL checks
- Made DQ case-insensitive in terms of column names
- Docs updates
0.2.0 (2023-06-21)
Features
- Adding support of Spark 2.4+ and Spark 3+
- Project is rebuilt for Scala 2.12.18
- Added test for HdfsReader
Bug Fixes
- Fixed HdfsReader in terms of loading fixed-width files: added casting to requested column types
- HBase source is temporarily turned off (the connector is incompatible with newer versions of Scala and Spark)
0.1.10 (2023-06-15)
Features
- Adding Kafka support:
  - New section in run configuration to describe connection to Kafka Brokers
  - New type of source to read from Kafka topic
  - Output of all targets to Kafka topic
- Adding Mattermost notifications:
  - New section added to application configuration to describe connection to Mattermost API.
  - CheckAlerts and summary reports can be sent to Mattermost.
  - Notifications can be sent to both channels and user direct messages.
- Added new DQ application argument -v to pass extra variables to be prepended to application configuration file. Can be used to pass secrets for email, mattermost and storage DB on startup.
- Documentation is updated according to new features.
Bug Fixes
- Fixed email configuration for cases when the SMTP server supports anonymous connections (user and password are undefined)
0.1.9 (2023-05-17)
Features
- Enable mailing notifications
- Added summary section to targets to set up summary reports sending via email
- Added checkAlerts section to targets to set up critical check alerts via email
- Added errorCollection section to targets to set up error collection (stored to HDFS only for now)
- Refactored metric error collection: single file with unified format will be written
- Modified config model for virtualSources to allow setting the save options directly when declaring a virtual source.
- Update application config files with default production settings for mailing
- Update documentation
Bug Fixes
- Update spark accumulator for metric error collection
- Add TLS Support to mailer
- Change mail attachments to ByteArrayDataSource in order to provide attachment as a byte stream instead of file.
- Minor documentation fixes
- Prevent empty file creation when errors accumulator is empty (this is also the case when keyFields are not set)
0.1.8 (2023-04-20)
Features
- Enable metric error collection.
- Add keyFields to source for purpose of metric error collection.
- Update metrics to make them Statusable whenever possible.
- Enhance DQ command line arguments to allow passing an arbitrary number of variables on startup; these are added to the DQ configuration file at runtime and may be referenced within it.
- Updated documentation to reflect aforementioned changes.
Bug Fixes
- Updated metric tests to cover Statusable calculation.
- Refactored configuration file parsing: now the variables are prepended to configuration file by means of streams and no temporary file is created.
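Prepending variables through streams, without a temporary file, can be sketched with the standard `java.io.SequenceInputStream`. This is an illustration under assumptions: `ConfigStreams` and `prependVariables` are hypothetical names, and the `key = "value"` line format is assumed rather than taken from the application.

```scala
import java.io.{ByteArrayInputStream, InputStream, SequenceInputStream}
import java.nio.charset.StandardCharsets

// Hypothetical helper; the variable line format is an assumption.
object ConfigStreams {
  /** Returns a stream that yields the variable definitions first,
    * followed by the original configuration content. */
  def prependVariables(vars: Map[String, String], config: InputStream): InputStream = {
    // One line per variable, each terminated by a newline.
    val header = vars.map { case (k, v) => k + " = \"" + v + "\"\n" }.mkString
    new SequenceInputStream(
      new ByteArrayInputStream(header.getBytes(StandardCharsets.UTF_8)),
      config
    )
  }
}
```

The combined stream can be handed to any parser that accepts an `InputStream` or a reader built on top of it, so the variables never touch the disk.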
0.1.7 (2023-04-04)
Features
- Added numberNotBetween metric
- Added new results view to database init sql
- Added jobId to metric results model
Bug Fixes
- Fixed composed metric calculator: increased power operation priority
- Fixed database config reading
- Fixed RDBMS source reading
- Fixed HistoryDBManager to additionally filter query results by jobId
0.1.6 (2022-08-23)
Features
- Added unit tests for column and file metric calculators
- Refactored date-related metrics to make them work with timestamp column type correctly
Bug Fixes
- Fix build.sbt for production environment
- Add hive meta schema initialisation (script for schema initialisation in DQ database, not part of DQ application)
- Fix schema creation in db-init.sql (script for schema initialisation in DQ database, not part of DQ application)
- Other minor bug fixes.
0.1.5 (2022-07-01)
Features
- Added documentation
- SBT build is updated in terms of assembling uber-jars.
0.1.4 (2022-06-22)
Features
- New ConfigReader is added
- Format of metrics configuration file should be set within application.conf (0.x for the old format, 1.x for the new format)
- Documentation on how to fill various sections of job configuration file is added.