object BasicStringRDDMetrics
Basic metrics that can be applied to string (or string-like) elements.
Type Members
-
case class
AvgStringRDDMetricCalculator(sum: Double, cnt: Long, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable
Calculates average length of processed elements
- sum
Current sum of lengths
- cnt
Current count of elements
- returns
result map with keys: "AVG_STRING"
- Note
Null values are omitted. For the values "foo", "bar-buz" and null, the metric result is (3 + 7) / 2 = 5.
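The note above can be illustrated with a minimal sketch in plain Scala. It does not call the calculator's own methods, which this page does not list; `AvgStringSketch` and `avgLength` are hypothetical names, and the fold merely mirrors the roles of the `sum` and `cnt` fields.

```scala
object AvgStringSketch {
  // Hypothetical helper, not part of the documented API.
  // Accumulates (sum of lengths, count of elements), skipping nulls.
  def avgLength(values: Seq[String]): Option[Double] = {
    val (sum, cnt) = values.foldLeft((0.0, 0L)) {
      case ((s, c), v) if v != null => (s + v.length, c + 1)
      case (acc, _)                 => acc
    }
    // No non-null elements means there is no average to report.
    if (cnt == 0) None else Some(sum / cnt)
  }
}
```

With this sketch, `AvgStringSketch.avgLength(Seq("foo", "bar-buz", null))` evaluates to `Some(5.0)`, matching the arithmetic in the note.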
-
case class
CompletenessRDDMetricCalculator(nullCnt: Long, cellCnt: Long, includeEmptyStrings: Boolean, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates completeness of values in the specified columns
- nullCnt
Current number of null values.
- cellCnt
Current number of cells.
- includeEmptyStrings
Flag which sets whether empty strings are considered in addition to null values.
- reversed
Boolean flag indicating whether error collection logic should be direct or reversed.
- returns
result map with keys: "COMPLETENESS"
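How `nullCnt`, `cellCnt` and `includeEmptyStrings` might combine can be sketched as follows. This is an illustration only: the formula (cellCnt - nullCnt) / cellCnt is an assumption, since this page does not state it, and `CompletenessSketch` is a hypothetical name.

```scala
object CompletenessSketch {
  // Hypothetical helper, not part of the documented API.
  // Assumes completeness = (cellCnt - nullCnt) / cellCnt, expressed as a fraction.
  def completeness(cells: Seq[String], includeEmptyStrings: Boolean): Double = {
    val cellCnt = cells.size.toLong
    // Empty strings are counted together with nulls only when the flag is set.
    val nullCnt =
      cells.count(v => v == null || (includeEmptyStrings && v.isEmpty)).toLong
    if (cellCnt == 0) Double.NaN else (cellCnt - nullCnt).toDouble / cellCnt
  }
}
```

Under these assumptions, the cells "a", "", null, "b" give 0.75 when empty strings are ignored and 0.5 when they are counted as missing.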
-
case class
DistinctValuesRDDMetricCalculator(uniqueValues: Set[String] = Set.empty[String], failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable
Calculates the exact count of distinct values in processed elements. WARNING: Values are stored in a set without any kind of trimming or hashing. If a large variety of elements needs to be processed and an exact result is not mandatory, it is better to use the HyperLogLog-based version called "APPROXIMATE_DISTINCT_VALUES".
- uniqueValues
Set of processed values
- returns
result map with keys: "DISTINCT_VALUES"
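The warning about missing trimming can be made concrete with a small sketch in plain Scala. `DistinctValuesSketch` is a hypothetical name, and the input is assumed to contain no nulls, since this page does not describe null handling for this metric.

```scala
object DistinctValuesSketch {
  // Hypothetical helper, not part of the documented API.
  // Collects raw values into an immutable Set, in the spirit of the
  // uniqueValues field, and reports the set size.
  def distinctCount(values: Seq[String]): Int =
    values.foldLeft(Set.empty[String])(_ + _).size
}
```

Because values are compared as-is, "a", "a " (with a trailing space) and "A" count as three different values. The set also grows with every new value, which is the memory cost the warning refers to.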
-
case class
DuplicateValuesRDDMetricCalculator(numDuplicates: Long, uniqueValues: Set[String], failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable
Calculates the number of duplicate values for a given column or tuple of columns. WARNING: In order to find duplicates, the processed unique values are stored in a set without any kind of trimming or hashing. If a large variety of elements needs to be processed, there is a risk of an OOM error when executors have insufficient memory allocated.
- numDuplicates
Number of found duplicates
- uniqueValues
Set of unique values obtained from already processed rows
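The interplay of `numDuplicates` and `uniqueValues` can be sketched as a fold over a single column. This is an assumption-laden illustration: it counts every repeated occurrence after the first as one duplicate, which this page does not confirm, and `DuplicateValuesSketch` is a hypothetical name.

```scala
object DuplicateValuesSketch {
  // Hypothetical helper, not part of the documented API.
  // State is (number of duplicates found, set of values already seen).
  def countDuplicates(values: Seq[String]): Long = {
    val (numDuplicates, _) = values.foldLeft((0L, Set.empty[String])) {
      case ((n, seen), v) =>
        if (seen.contains(v)) (n + 1, seen) // repeated value: count it
        else (n, seen + v)                  // first occurrence: remember it
    }
    numDuplicates
  }
}
```

Under this counting rule, the values "a", "b", "a", "a", "c" yield 2 duplicates.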
-
case class
EmptinessRDDMetricCalculator(nullCnt: Long, cellCnt: Long, includeEmptyStrings: Boolean, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates emptiness of values in the specified columns, i.e. the percentage of null values, or of null and empty values if configured to account for empty strings.
- nullCnt
Current number of null values.
- cellCnt
Current number of cells.
- includeEmptyStrings
Flag which sets whether empty strings are considered in addition to null values.
- reversed
Boolean flag indicating whether error collection logic should be direct or reversed.
- returns
result map with keys: "EMPTINESS"
-
case class
EmptyValuesRDDMetricCalculator(cnt: Long, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of empty strings in processed elements.
- cnt
Current number of empty strings.
- reversed
Boolean flag indicating whether error collection logic should be direct or reversed.
- returns
result map with keys: "EMPTY_VALUES"
-
case class
FormattedDateRDDMetricCalculator(cnt: Long, dateFormat: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of strings that match the provided date format.
- cnt
Current count of filtered elements
- dateFormat
Requested date format
- returns
result map with keys: "FORMATTED_DATE"
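A date-format check of this kind can be sketched as below. The sketch assumes that `dateFormat` is a `java.time.format.DateTimeFormatter` pattern and that nulls are not counted; this page confirms neither, and `FormattedDateSketch` is a hypothetical name.

```scala
import java.time.format.DateTimeFormatter
import scala.util.Try

object FormattedDateSketch {
  // Hypothetical helper, not part of the documented API.
  // Counts the values that parse successfully with the given pattern.
  def countFormatted(values: Seq[String], dateFormat: String): Long = {
    val formatter = DateTimeFormatter.ofPattern(dateFormat)
    values.count(v => v != null && Try(formatter.parse(v)).isSuccess).toLong
  }
}
```

For example, with the pattern "yyyy-MM-dd" only the first of "2024-01-31", "31.01.2024" and "not a date" is counted.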
-
case class
MaxStringRDDMetricCalculator(strl: Int, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable
Calculates maximal length of processed elements
- strl
Current maximal string length
- returns
result map with keys: "MAX_STRING"
-
case class
MinStringRDDMetricCalculator(strl: Int, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with Product with Serializable
Calculates minimal length of processed elements
- strl
Current minimal string length
- returns
result map with keys: "MIN_STRING"
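The maximal and minimal length metrics can be sketched together in plain Scala. Skipping nulls is an assumption here, and `MinMaxStringSketch` is a hypothetical name rather than part of the documented API.

```scala
object MinMaxStringSketch {
  // Hypothetical helpers, not part of the documented API.
  // Nulls are skipped as an assumption.
  private def lengths(values: Seq[String]): Seq[Int] =
    values.filter(_ != null).map(_.length)

  // Longest string length, or None if there are no non-null values.
  def maxLength(values: Seq[String]): Option[Int] =
    lengths(values).reduceOption(_ max _)

  // Shortest string length, or None if there are no non-null values.
  def minLength(values: Seq[String]): Option[Int] =
    lengths(values).reduceOption(_ min _)
}
```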
-
case class
NullValuesRDDMetricCalculator(cnt: Long, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of null values in processed elements.
- cnt
Current number of null values.
- reversed
Boolean flag indicating whether error collection logic should be direct or reversed.
- returns
result map with keys: "NULL_VALUES"
-
case class
RegexMatchRDDMetricCalculator(cnt: Long, regex: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of values that match the provided regular expression.
- cnt
Current counter
- regex
Regex pattern
- returns
result map with keys: "REGEX_MATCH"
-
case class
RegexMismatchRDDMetricCalculator(cnt: Long, regex: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of rows that do not match the provided regular expression.
- cnt
Current counter
- regex
Regex pattern
- returns
result map with keys: "REGEX_MISMATCH"
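The two regex metrics are complementary, which a short sketch can show. It assumes that a value must match the pattern in full (`String.matches`) and that nulls are not counted on either side; this page specifies neither, and `RegexSketch` is a hypothetical name.

```scala
object RegexSketch {
  // Hypothetical helpers, not part of the documented API.
  // Assumes full-string matching; nulls are skipped as an assumption.
  def countMatches(values: Seq[String], regex: String): Long =
    values.count(v => v != null && v.matches(regex)).toLong

  def countMismatches(values: Seq[String], regex: String): Long =
    values.count(v => v != null && !v.matches(regex)).toLong
}
```

With the pattern "[a-z]+", the values "abc", "123" and "a1" give one match and two mismatches under these assumptions.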
-
case class
StringInDomainRDDMetricCalculator(cnt: Long, domain: Set[String], reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of strings that belong to the provided domain.
- cnt
Current count of filtered elements
- domain
Set of strings that represents the requested domain
- returns
result map with keys: "STRING_IN_DOMAIN"
-
case class
StringLengthRDDMetricCalculator(cnt: Long, length: Int, compareRule: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of strings whose length satisfies the requested comparison rule.
- cnt
Current count of filtered elements
- length
Requested length
- compareRule
Comparison rule. One of:
- "eq" - equal to,
- "lt" - less than,
- "lte" - less than or equal to,
- "gt" - greater than,
- "gte" - greater than or equal to.
- returns
result map with keys: "STRING_LENGTH"
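The comparison rules listed above can be sketched as a simple pattern match. `StringLengthSketch` is a hypothetical name; rejecting unknown rules with an exception and skipping nulls are assumptions, not documented behaviour.

```scala
object StringLengthSketch {
  // Hypothetical helper, not part of the documented API.
  // Maps each documented rule name to the corresponding comparison.
  def satisfies(actual: Int, length: Int, compareRule: String): Boolean =
    compareRule match {
      case "eq"  => actual == length
      case "lt"  => actual < length
      case "lte" => actual <= length
      case "gt"  => actual > length
      case "gte" => actual >= length
      case other =>
        throw new IllegalArgumentException(s"Unknown compare rule: $other")
    }

  // Counts non-null strings whose length satisfies the rule.
  def countByLength(values: Seq[String], length: Int, compareRule: String): Long =
    values.count(v => v != null && satisfies(v.length, length, compareRule)).toLong
}
```

For the values "a", "bb", "ccc", "dddd" and a requested length of 2, this sketch counts 1 for "eq", 1 for "lt", 2 for "lte", 2 for "gt" and 3 for "gte".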
-
case class
StringOutDomainRDDMetricCalculator(cnt: Long, domain: Set[String], reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Calculates the number of strings that fall outside the provided domain.
- cnt
Current count of filtered elements
- domain
Set of strings that represents the requested domain
- returns
result map with keys: "STRING_OUT_DOMAIN"
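This metric mirrors StringInDomainRDDMetricCalculator, and the pair can be sketched as two set-membership counts. `DomainSketch` is a hypothetical name, and skipping nulls in both counts is an assumption that this page does not confirm.

```scala
object DomainSketch {
  // Hypothetical helpers, not part of the documented API.
  // Nulls are skipped in both counts as an assumption.
  def countInDomain(values: Seq[String], domain: Set[String]): Long =
    values.count(v => v != null && domain.contains(v)).toLong

  def countOutOfDomain(values: Seq[String], domain: Set[String]): Long =
    values.count(v => v != null && !domain.contains(v)).toLong
}
```

With the domain Set("RU", "US"), the values "RU", "US", "DE", null, "US" give 3 in-domain and 1 out-of-domain strings under these assumptions.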
-
case class
StringValuesRDDMetricCalculator(cnt: Long, compareValue: String, reversed: Boolean, failCount: Long = 0, status: CalculatorStatus = CalculatorStatus.Success, failMsg: String = "OK") extends RDDMetricCalculator with ReversibleRDDCalculator with Product with Serializable
Counts the number of appearances of the requested string in processed elements.
- cnt
Current number of appearances
- compareValue
Requested string to find
- returns
result map with keys: "STRING_VALUES"
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()