org.checkita.dqf.core.metrics.rdd.regular

BasicStringRDDMetrics

object BasicStringRDDMetrics

Basic metrics that can be applied to string (or string-like) elements.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class AvgStringRDDMetricCalculator(sum: Double, cnt: Long, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable

    Calculates the average length of processed elements.

    sum

    Current sum of lengths

    cnt

    Current count of elements

    returns

    result map with keys: "AVG_STRING"

    Note

    Null values are omitted. For example, for the values "foo", "bar-buz" and null, the metric result would be (3 + 7) / 2 = 5.
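
    The null-skipping average described in the note can be sketched as follows. This is a minimal standalone illustration, not the calculator's actual code: the `AvgStringSketch` object, the `Acc` case class and the `avgLength` helper are hypothetical names, and returning `None` when no non-null value was seen is an assumption.

```scala
object AvgStringSketch {
  // Hypothetical accumulator mirroring the documented `sum` and `cnt` fields.
  final case class Acc(sum: Double, cnt: Long)

  // Folds over the values; nulls change neither the sum nor the count.
  def avgLength(values: Seq[String]): Option[Double] = {
    val acc = values.foldLeft(Acc(0.0, 0L)) { (a, v) =>
      if (v == null) a else Acc(a.sum + v.length, a.cnt + 1)
    }
    // Assumption: no result is produced when every value was null.
    if (acc.cnt == 0) None else Some(acc.sum / acc.cnt)
  }
}
```

    For the values from the note, `AvgStringSketch.avgLength(Seq("foo", "bar-buz", null))` yields `Some(5.0)`.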

  2. case class CompletenessRDDMetricCalculator(nullCnt: Long, cellCnt: Long, includeEmptyStrings: Boolean, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the completeness of values in the specified columns.

    nullCnt

    Current number of null values.

    cellCnt

    Current number of cells.

    includeEmptyStrings

    Flag which sets whether empty strings are considered in addition to null values.

    reversed

    Boolean flag indicating whether error collection logic should be direct or reversed.

    returns

    result map with keys: "COMPLETENESS"

  3. case class DistinctValuesRDDMetricCalculator(uniqueValues: Set[String] = Set.empty[String], failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable

    Calculates the count of distinct values in processed elements. WARNING: Uses a set without any kind of trimming or hashing, and returns the exact count. If a highly diverse set of elements needs to be processed and an exact result is not mandatory, it is better to use the HyperLogLog version called "APPROXIMATE_DISTINCT_VALUES".

    uniqueValues

    Set of processed values

    returns

    result map with keys: "DISTINCT_VALUES"
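
    The set-based exact count can be illustrated with a short sketch. It is a standalone approximation of the idea under stated assumptions, not the calculator itself: `DistinctValuesSketch` and `distinctCount` are hypothetical names.

```scala
object DistinctValuesSketch {
  // Accumulates every value as-is into a set, then reports the set size.
  // Memory grows with the number of distinct values, which is the reason
  // for the warning above.
  def distinctCount(values: Seq[String]): Int =
    values.foldLeft(Set.empty[String])((seen, v) => seen + v).size
}
```

    Because values are stored without trimming, "a" and "a " count as two distinct values in this sketch.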

  4. case class DuplicateValuesRDDMetricCalculator(numDuplicates: Long, uniqueValues: Set[String], failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable

    Calculates the number of duplicate values for a given column or tuple of columns. WARNING: In order to find duplicates, the processed unique values are stored in a set without any kind of trimming or hashing. If a highly diverse set of elements needs to be processed, there is a risk of an OOM error when executors have insufficient memory.

    numDuplicates

    Number of found duplicates

    uniqueValues

    Set of unique values obtained from already processed rows
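
    A minimal sketch of how the two documented fields could evolve while values are processed. The names `DuplicateValuesSketch`, `State`, `update` and `countDuplicates` are hypothetical, and counting every repeated occurrence as one duplicate is an assumption about the semantics.

```scala
object DuplicateValuesSketch {
  // Hypothetical state mirroring the documented `numDuplicates` and `uniqueValues`.
  final case class State(numDuplicates: Long, uniqueValues: Set[String])

  // A value already in the set is a duplicate; otherwise it is remembered.
  def update(s: State, v: String): State =
    if (s.uniqueValues.contains(v)) s.copy(numDuplicates = s.numDuplicates + 1)
    else s.copy(uniqueValues = s.uniqueValues + v)

  def countDuplicates(values: Seq[String]): Long =
    values.foldLeft(State(0L, Set.empty[String]))(update).numDuplicates
}
```

    Under that assumption, `DuplicateValuesSketch.countDuplicates(Seq("a", "b", "a", "a"))` gives 2.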

  5. case class EmptinessRDDMetricCalculator(nullCnt: Long, cellCnt: Long, includeEmptyStrings: Boolean, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the emptiness of values in the specified columns, i.e. the percentage of null values or empty values (if configured to account for empty values).

    nullCnt

    Current number of null values.

    cellCnt

    Current number of cells.

    includeEmptyStrings

    Flag which sets whether empty strings are considered in addition to null values.

    reversed

    Boolean flag indicating whether error collection logic should be direct or reversed.

    returns

    result map with keys: "EMPTINESS"
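
    The relationship between `nullCnt`, `cellCnt` and `includeEmptyStrings` can be sketched as below. This is an illustration under assumptions rather than the calculator's code: `EmptinessSketch`, `Counts`, `count`, `emptiness` and `completeness` are hypothetical names, results are expressed as a fraction between 0 and 1 (the actual scale is not stated here), and completeness is assumed to be the complement of emptiness.

```scala
object EmptinessSketch {
  // Hypothetical counters mirroring the documented `nullCnt` and `cellCnt`.
  final case class Counts(nullCnt: Long, cellCnt: Long)

  // Counts every cell; a cell is "missing" if it is null, or if it is an
  // empty string and includeEmptyStrings is set.
  def count(cells: Seq[String], includeEmptyStrings: Boolean): Counts =
    cells.foldLeft(Counts(0L, 0L)) { (c, v) =>
      val missing = v == null || (includeEmptyStrings && v.isEmpty)
      Counts(c.nullCnt + (if (missing) 1 else 0), c.cellCnt + 1)
    }

  // Both ratios are NaN when no cells were processed.
  def emptiness(c: Counts): Double = c.nullCnt.toDouble / c.cellCnt

  // Assumption: completeness is the share of cells that are not missing.
  def completeness(c: Counts): Double = (c.cellCnt - c.nullCnt).toDouble / c.cellCnt
}
```

    For the cells "a", "", null, "b", this sketch gives an emptiness of 0.25 when empty strings are ignored and 0.5 when they are included.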

  6. case class EmptyValuesRDDMetricCalculator(cnt: Long, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of empty strings in processed elements.

    cnt

    Current number of empty strings.

    reversed

    Boolean flag indicating whether error collection logic should be direct or reversed.

    returns

    result map with keys: "EMPTY_VALUES"

  7. case class FormattedDateRDDMetricCalculator(cnt: Long, dateFormat: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of strings that match the provided date format.

    cnt

    Current count of filtered elements

    dateFormat

    Requested date format

    returns

    result map with keys: "FORMATTED_DATE"
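
    One way to check a string against a date format is to try parsing it, as in this sketch. It assumes `java.time.format.DateTimeFormatter` pattern syntax with its default resolver settings; which date library and pattern dialect the calculator actually uses is not stated here. `FormattedDateSketch` and `countFormatted` are hypothetical names, and skipping nulls is an assumption.

```scala
import java.time.format.DateTimeFormatter
import scala.util.Try

object FormattedDateSketch {
  // Counts the non-null values that parse successfully with the given pattern.
  def countFormatted(values: Seq[String], dateFormat: String): Long = {
    val fmt = DateTimeFormatter.ofPattern(dateFormat)
    // fmt.parse throws DateTimeParseException on mismatch; Try turns that into a failure.
    values.count(v => v != null && Try(fmt.parse(v)).isSuccess).toLong
  }
}
```

    With the pattern "yyyy-MM-dd", the values "2024-01-15" and "1999-12-31" are counted, while "15/01/2024" and "not a date" are not.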

  8. case class MaxStringRDDMetricCalculator(strl: Int, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable

    Calculates the maximum length of processed elements.

    strl

    Current maximal string length

    returns

    result map with keys: "MAX_STRING"

  9. case class MinStringRDDMetricCalculator(strl: Int, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable

    Calculates the minimum length of processed elements.

    strl

    Current minimal string length

    returns

    result map with keys: "MIN_STRING"
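
    The maximum and minimum string length metrics can be sketched together. `StringLengthBoundsSketch`, `maxLength` and `minLength` are hypothetical names; skipping nulls and returning `None` for empty input are assumptions, since the behaviour in those cases is not described here.

```scala
object StringLengthBoundsSketch {
  // Lengths of all non-null values (assumption: nulls are skipped).
  private def lengths(values: Seq[String]): Seq[Int] =
    values.filter(_ != null).map(_.length)

  def maxLength(values: Seq[String]): Option[Int] = {
    val ls = lengths(values)
    if (ls.isEmpty) None else Some(ls.max)
  }

  def minLength(values: Seq[String]): Option[Int] = {
    val ls = lengths(values)
    if (ls.isEmpty) None else Some(ls.min)
  }
}
```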

  10. case class NullValuesRDDMetricCalculator(cnt: Long, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of null values in processed elements.

    cnt

    Current number of null values

    reversed

    Boolean flag indicating whether error collection logic should be direct or reversed.

    returns

    result map with keys: "NULL_VALUES"

  11. case class RegexMatchRDDMetricCalculator(cnt: Long, regex: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of values that match the provided regular expression.

    cnt

    Current counter

    regex

    Regex pattern

    returns

    result map with keys: "REGEX_MATCH"

  12. case class RegexMismatchRDDMetricCalculator(cnt: Long, regex: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of rows that do not match the provided regular expression.

    cnt

    Current counter

    regex

    Regex pattern

    returns

    result map with keys: "REGEX_MISMATCH"
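
    The match and mismatch counters are two sides of the same test, as this sketch shows. It assumes that a value must match the pattern in full (`Matcher.matches`) and that nulls are counted by neither metric; both points are assumptions, and `RegexCountSketch`, `countMatches` and `countMismatches` are hypothetical names.

```scala
import java.util.regex.Pattern

object RegexCountSketch {
  // Counts non-null values that match the whole pattern.
  def countMatches(values: Seq[String], regex: String): Long = {
    val p = Pattern.compile(regex)
    values.count(v => v != null && p.matcher(v).matches()).toLong
  }

  // Counts non-null values that do not match the whole pattern.
  def countMismatches(values: Seq[String], regex: String): Long = {
    val p = Pattern.compile(regex)
    values.count(v => v != null && !p.matcher(v).matches()).toLong
  }
}
```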

  13. case class StringInDomainRDDMetricCalculator(cnt: Long, domain: Set[String], reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of strings that belong to the provided domain.

    cnt

    Current count of filtered elements

    domain

    Set of strings that represents the requested domain

    returns

    result map with keys: "STRING_IN_DOMAIN"

  14. case class StringLengthRDDMetricCalculator(cnt: Long, length: Int, compareRule: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of strings whose length satisfies the requested comparison rule.

    cnt

    Current count of filtered elements

    length

    Requested length

    compareRule

    Comparison rule. One of:

    • "eq" - equal to,
    • "lt" - less than,
    • "lte" - less than or equal to,
    • "gt" - greater than,
    • "gte" - greater than or equal to.

    returns

    result map with keys: "STRING_LENGTH"
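
    The five comparison rules can be mapped to predicates as in the sketch below. `StringLengthRuleSketch`, `comparator` and `countByLength` are hypothetical names; comparing the string length on the left against the requested length on the right, skipping nulls, and rejecting unknown rules with an exception are all assumptions.

```scala
object StringLengthRuleSketch {
  // Maps a rule name to a predicate (actualLength, requestedLength) => Boolean.
  def comparator(rule: String): (Int, Int) => Boolean = rule match {
    case "eq"  => _ == _
    case "lt"  => _ < _
    case "lte" => _ <= _
    case "gt"  => _ > _
    case "gte" => _ >= _
    case other => throw new IllegalArgumentException(s"Unknown compare rule: $other")
  }

  // Counts non-null values whose length satisfies the rule.
  def countByLength(values: Seq[String], length: Int, compareRule: String): Long = {
    val cmp = comparator(compareRule)
    values.count(v => v != null && cmp(v.length, length)).toLong
  }
}
```

    For the values "a", "bb", "ccc", "dddd" and a requested length of 2, the rule "lte" counts 2 values and "gte" counts 3.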

  15. case class StringOutDomainRDDMetricCalculator(cnt: Long, domain: Set[String], reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Calculates the number of strings that fall outside the provided domain.

    cnt

    Current count of filtered elements

    domain

    Set of strings that represents the requested domain

    returns

    result map with keys: "STRING_OUT_DOMAIN"
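
    The in-domain and out-of-domain counters differ only in the sign of the membership test. The sketch below illustrates this; `DomainSketch`, `countInDomain` and `countOutDomain` are hypothetical names, and leaving nulls out of both counts is an assumption.

```scala
object DomainSketch {
  // Counts non-null values that are members of the domain set.
  def countInDomain(values: Seq[String], domain: Set[String]): Long =
    values.count(v => v != null && domain.contains(v)).toLong

  // Counts non-null values that are not members of the domain set.
  def countOutDomain(values: Seq[String], domain: Set[String]): Long =
    values.count(v => v != null && !domain.contains(v)).toLong
}
```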

  16. case class StringValuesRDDMetricCalculator(cnt: Long, compareValue: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable

    Counts the number of appearances of the requested string in processed elements.

    cnt

    Current number of appearances

    compareValue

    Requested string to find

    returns

    result map with keys: "STRING_VALUES"

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  15. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  16. def toString(): String
    Definition Classes
    AnyRef → Any
  17. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
