object Casting

Helpers used to convert values of type Any to desirable type.

The intent of these helpers is to convert values obtained from a Spark Row to the type required by the metric calculators.

Because a Spark Row can store elements of various types, we need to guess the type of each value (by pattern matching) in order to choose an appropriate conversion method.

For that purpose we follow the Spark SQL type to Java type mapping:

  • BooleanType -> java.lang.Boolean
  • ByteType -> java.lang.Byte
  • ShortType -> java.lang.Short
  • IntegerType -> java.lang.Integer
  • LongType -> java.lang.Long
  • FloatType -> java.lang.Float
  • DoubleType -> java.lang.Double
  • StringType -> String
  • DecimalType -> java.math.BigDecimal
  • DateType -> java.sql.Date if spark.sql.datetime.java8API.enabled is false
  • DateType -> java.time.LocalDate if spark.sql.datetime.java8API.enabled is true
  • TimestampType -> java.sql.Timestamp if spark.sql.datetime.java8API.enabled is false
  • TimestampType -> java.time.Instant if spark.sql.datetime.java8API.enabled is true
  • BinaryType -> byte array
  • ArrayType -> scala.collection.Seq (use getList for java.util.List)
  • MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
  • StructType -> org.apache.spark.sql.Row
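
The pattern-matching idea behind this mapping can be sketched as follows. The helper guessSparkType is hypothetical and not part of Casting; it only names the Spark SQL type a value most likely came from, based on the Java-side classes listed above. StructType is left out so that the snippet does not depend on Spark.

```scala
object TypeGuessSketch {
  // Hypothetical helper: maps a runtime value back to the name of the
  // Spark SQL type it most likely originated from.
  def guessSparkType(value: Any): String = value match {
    case null                          => "null"
    case _: java.lang.Boolean          => "BooleanType"
    case _: java.lang.Byte             => "ByteType"
    case _: java.lang.Short            => "ShortType"
    case _: java.lang.Integer          => "IntegerType"
    case _: java.lang.Long             => "LongType"
    case _: java.lang.Float            => "FloatType"
    case _: java.lang.Double           => "DoubleType"
    case _: String                     => "StringType"
    case _: java.math.BigDecimal       => "DecimalType"
    case _: java.sql.Date              => "DateType"
    case _: java.time.LocalDate        => "DateType"
    case _: java.sql.Timestamp         => "TimestampType"
    case _: java.time.Instant          => "TimestampType"
    case _: Array[Byte]                => "BinaryType"
    case _: scala.collection.Seq[_]    => "ArrayType"
    case _: scala.collection.Map[_, _] => "MapType"
    case other                         => s"unsupported: ${other.getClass.getName}"
  }
}
```

Values typed as Any are boxed at runtime, which is why the match is written against the java.lang wrapper classes rather than Scala primitives.
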
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def getDoubleFromBytes(b: Array[Byte]): Option[Double]

    Tries to convert an array of bytes to a double. The conversion strategy depends on the length of the array:

    • in case of empty array return None
    • in case of single byte return this byte converted to double
    • in case of two bytes retrieve short number and convert it to double
    • in case of four bytes retrieve integer number and convert it to double
    • in case of eight bytes retrieve double itself
    • for other lengths try to convert array to string and string to double.
    • None if none of the above was successful.
    b

    Byte array to convert to double

    returns

    Some double if conversion was successful or None
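
    The length-based rules above can be sketched as shown below. This is not the actual implementation: big-endian byte order (the java.nio.ByteBuffer default) and UTF-8 decoding for the string fallback are assumptions made for the sketch.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import scala.util.Try

object BytesSketch {
  // Sketch only: byte order and charset are assumptions, not documented facts.
  def getDoubleFromBytes(b: Array[Byte]): Option[Double] = b.length match {
    case 0 => None                                          // empty array
    case 1 => Some(b.head.toDouble)                         // single byte
    case 2 => Some(ByteBuffer.wrap(b).getShort().toDouble)  // short
    case 4 => Some(ByteBuffer.wrap(b).getInt().toDouble)    // integer
    case 8 => Some(ByteBuffer.wrap(b).getDouble())          // double itself
    case _ =>
      // Other lengths: decode as text and parse; any failure becomes None.
      Try(new String(b, StandardCharsets.UTF_8).trim.toDouble).toOption
  }
}
```

    Under these rules a numeric string that happens to be 1, 2, 4 or 8 bytes long is read as a binary number rather than as text, so "3.14" encoded as four bytes would go down the integer branch.
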

  11. def getLongFromBytes(b: Array[Byte]): Option[Long]

    Tries to convert an array of bytes to a long. The conversion strategy depends on the length of the array:

    • in case of empty array return None
    • in case of single byte return this byte converted to long
    • in case of two bytes retrieve short number and convert it to long
    • in case of four bytes retrieve integer number and convert it to long
    • in case of eight bytes retrieve long itself
    • for other lengths try to convert array to string and string to long.
    • None if none of the above was successful.
    b

    Byte array to convert to long

    returns

    Some long if conversion was successful or None

  12. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  14. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  16. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. def primitiveValToDouble(value: Any): Option[Double]

    Converts primitive value to Double.

    value

    Value to convert to double

    returns

    Some double value if conversion was successful or None

    Note

    Date and time related types are converted to Epoch and then to Double

  18. def primitiveValToLong(value: Any): Option[Long]

    Converts primitive value to Long.

    value

    Value to convert to long

    returns

    Some long value if conversion was successful or None

    Note

    Date and time related types are converted to Epoch long

  19. def primitiveValToString(value: Any, dtAsLong: Boolean = false): String

    Converts value of primitive to string.

    value

    value to convert to string

    dtAsLong

    Boolean flag indicating whether date and time related types should be converted to Epoch before converting to string.

    returns

    String representation of a value
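
    The effect of the dtAsLong flag can be illustrated with the sketch below. It is not the actual implementation: the helper toEpochSeconds is hypothetical, and both the epoch unit (seconds) and the UTC start-of-day rule for LocalDate are assumptions.

```scala
import java.sql.{Date, Timestamp}
import java.time.{Instant, LocalDate, ZoneOffset}

object ToStringSketch {
  // Hypothetical helper: epoch seconds for date and time related values.
  def toEpochSeconds(value: Any): Option[Long] = value match {
    case ts: Timestamp => Some(Math.floorDiv(ts.getTime, 1000L))
    case d: Date       => Some(Math.floorDiv(d.getTime, 1000L))
    case i: Instant    => Some(i.getEpochSecond)
    case ld: LocalDate => Some(ld.atStartOfDay(ZoneOffset.UTC).toEpochSecond)
    case _             => None
  }

  // With dtAsLong = true, temporal values are rendered as their epoch value;
  // everything else (and every value when the flag is false) uses toString.
  def primitiveValToString(value: Any, dtAsLong: Boolean = false): String =
    if (dtAsLong) toEpochSeconds(value).map(_.toString).getOrElse(s"$value")
    else s"$value"
}
```

    For instance, Instant.ofEpochSecond(10) is rendered as "1970-01-01T00:00:10Z" by default and as "10" when dtAsLong is true.
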

  20. def seqToString(seq: Seq[_], acc: String = ""): String

    Recursive function to convert a sequence of values (which may contain nested traversable structures) to a string. The string representation is just a concatenation of all primitive values converted to strings.

    seq

    Sequence to convert to string

    acc

    String accumulator used to store already converted elements.

    returns

    String representation of a sequence

    Annotations
    @tailrec()
    Note

    This kind of conversion is used in the distinctValues and duplicateValues metric calculators, which use a Set to store all unique column tuples. Therefore, these tuples need to be serialized as a single string to be properly put into the Set. Alternatively, we could have serialized them to a byte array, but benchmarking showed that string serialization works better (mostly because of a lower GC workload).

    Maps and Sets do not guarantee the order in which elements are traversed; therefore, concatenating the string representations of their elements could yield different results for collections with the same elements. On the other hand, Scala guarantees that sets or maps with the same elements yield the same hash code. Thus, maps and sets are represented in the string by their hash codes. This is sufficient for the purpose of finding unique column values.
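
    A minimal sketch of such a tail-recursive conversion is given below. It follows the description above (nested sequences are flattened, maps and sets contribute their hash code, everything else is concatenated), but it is an illustration under those assumptions and not the actual implementation.

```scala
import scala.annotation.tailrec

object SeqToStringSketch {
  @tailrec
  def seqToString(seq: Seq[_], acc: String = ""): String = seq match {
    case Seq() => acc
    case head +: tail =>
      head match {
        // Nested sequence: push its elements in front of the remaining ones.
        case nested: Seq[_] => seqToString(nested ++ tail, acc)
        // Unordered collections: use the order-independent hash code.
        case m: scala.collection.Map[_, _] => seqToString(tail, acc + m.hashCode)
        case s: scala.collection.Set[_]    => seqToString(tail, acc + s.hashCode)
        // Anything else is appended via its string representation.
        case other => seqToString(tail, acc + other)
      }
  }
}
```

    With this sketch, Seq(1, "a", Seq(2, Seq(3))) becomes "1a23", and two sets holding the same elements produce the same string regardless of insertion order.
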

  21. def stringToLocalDateTime(str: String, dateFormat: String): Option[LocalDateTime]

    Converts string to LocalDateTime object provided with format string.

    str

    String to convert to LocalDateTime

    dateFormat

    Format string

    returns

    Some LocalDateTime instance if conversion was successful or None
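
    One way to obtain this behaviour with java.time is sketched below. The fallback from a full date-time to a date-only value at start of day is an assumption of the sketch; the actual implementation may treat date-only formats differently.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

object LocalDateTimeSketch {
  def stringToLocalDateTime(str: String, dateFormat: String): Option[LocalDateTime] =
    // An invalid pattern makes ofPattern throw, which is mapped to None.
    Try(DateTimeFormatter.ofPattern(dateFormat)).toOption.flatMap { fmt =>
      Try(LocalDateTime.parse(str, fmt))
        // Assumed fallback: date-only formats resolve to midnight of that day.
        .orElse(Try(LocalDate.parse(str, fmt).atStartOfDay()))
        .toOption
    }
}
```

    Wrapping both the formatter construction and the parsing in Try keeps the Option contract: malformed input or a malformed format string yields None instead of an exception.
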

  22. def stringToTimestamp(str: String, dateFormat: String): Option[Long]

    Converts string to Timestamp using Spark TimestampFormatter provided with format string.

    str

    String to convert to Timestamp

    dateFormat

    Format string

    returns

    Some long value if conversion was successful or None.

  23. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  24. def toString(): String
    Definition Classes
    AnyRef → Any
  25. def tryToDate(value: Any, dateFormat: String): Option[LocalDateTime]

    Tries to cast primitive value to LocalDateTime object for use in date-related metrics calculators.

    value

    Value to cast

    dateFormat

    Date format used for casting

    returns

    Optional LocalDateTime object (None if casting wasn't successful)

    Note

    Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays can be converted to LocalDateTime. An attempt to convert a complex data type such as Map or StructType will return None.

  26. def tryToDouble(value: Any): Option[Double]

    Tries to cast primitive value to Double. Used in metric calculators.

    value

    value to cast

    returns

    Optional Double value (None if casting wasn't successful)

    Note

    Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays are converted to double. An attempt to convert a complex data type such as Map or StructType will return None.
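
    The contract described in this note can be illustrated with the following sketch. It is not the actual implementation: the byte-array branch is a simplified stand-in that only decodes UTF-8 text, and the treatment of types not listed here (for example Boolean) is left as None by assumption.

```scala
import java.nio.charset.StandardCharsets
import scala.util.Try

object TryToDoubleSketch {
  def tryToDouble(value: Any): Option[Double] = value match {
    case null                => None
    // Covers Byte, Short, Integer, Long, Float, Double and BigDecimal.
    case n: java.lang.Number => Some(n.doubleValue())
    case s: String           => Try(s.trim.toDouble).toOption
    // Simplified stand-in for the byte-array conversion (text only).
    case b: Array[Byte]      => Try(new String(b, StandardCharsets.UTF_8).trim.toDouble).toOption
    // Complex types (Seq, Map, Row, ...) and anything else are not converted.
    case _                   => None
  }
}
```

    Returning None instead of throwing lets a caller skip values that cannot be converted without extra error handling.
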

  27. def tryToLong(value: Any): Option[Long]

    Tries to cast any value to Long. Used in metric calculators.

    value

    value to cast

    returns

    Optional Long value (None if casting wasn't successful)

    Note

    Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays are converted to long. An attempt to convert a complex data type such as Map or StructType will return None.

  28. def tryToString(value: Any): Option[String]

    Tries to cast primitive value to String. Used in metric calculators.

    value

    value to cast

    returns

    Optional String value (None if casting wasn't successful)

    Note

    Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays are converted to string. An attempt to convert a complex data type such as Map or StructType will return None.

  29. def tryToTimestamp(value: Any, dateFormat: String): Option[Long]

    Tries to cast primitive value to Timestamp long. Used in the formattedDate metric to support checking not only a full date or time string but also a part of a date or time, e.g. parsing the year only with the format string yyyy.

    value

    Value to cast

    dateFormat

    Date format used for casting

    returns

    Optional timestamp long or None if casting wasn't successful.

    Note

    Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays can be converted to Timestamp. An attempt to convert a complex data type such as Map or StructType will return None.

  30. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
