object Casting
Helpers used to convert values of type Any to the desired type.
These helpers are intended to convert values obtained from a Spark Row to the type required by the metric calculators.
Since a Spark Row can store elements of various types, we need to pattern match on the value to choose an appropriate conversion method.
For that purpose we follow the Spark SQL type to Java type mapping:
- BooleanType -> java.lang.Boolean
- ByteType -> java.lang.Byte
- ShortType -> java.lang.Short
- IntegerType -> java.lang.Integer
- LongType -> java.lang.Long
- FloatType -> java.lang.Float
- DoubleType -> java.lang.Double
- StringType -> String
- DecimalType -> java.math.BigDecimal
- DateType -> java.sql.Date if spark.sql.datetime.java8API.enabled is false
- DateType -> java.time.LocalDate if spark.sql.datetime.java8API.enabled is true
- TimestampType -> java.sql.Timestamp if spark.sql.datetime.java8API.enabled is false
- TimestampType -> java.time.Instant if spark.sql.datetime.java8API.enabled is true
- BinaryType -> byte array
- ArrayType -> scala.collection.Seq (use getList for java.util.List)
- MapType -> scala.collection.Map (use getJavaMap for java.util.Map)
- StructType -> org.apache.spark.sql.Row
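The mapping above can be illustrated with a small pattern match over the runtime class of a value. This is only a sketch: the object `RowValueKindSketch` and the function `describe` are hypothetical names that are not part of Casting, and Spark itself is not required to run it.

```scala
import java.sql.{Date, Timestamp}
import java.time.{Instant, LocalDate}

object RowValueKindSketch {
  // Hypothetical helper: names the type family of a value as it would come out of a Row.
  def describe(value: Any): String = value match {
    case null                                  => "null"
    case _: Boolean                            => "boolean"
    case _: Byte | _: Short | _: Int | _: Long => "integral"
    case _: Float | _: Double                  => "floating"
    case _: java.math.BigDecimal               => "decimal"
    case _: String                             => "string"
    case _: Date | _: LocalDate                => "date"
    case _: Timestamp | _: Instant             => "timestamp"
    case _: Array[Byte]                        => "binary"
    case _: scala.collection.Seq[_]            => "array"
    case _: scala.collection.Map[_, _]         => "map"
    case _                                     => "other" // e.g. a nested Row
  }
}
```

Both date and timestamp representations are matched, so the sketch does not depend on the value of spark.sql.datetime.java8API.enabled.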
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
getDoubleFromBytes(b: Array[Byte]): Option[Double]
Tries to convert an array of bytes to a double. The conversion approach depends on the size of the array:
- in case of empty array return None
- in case of single byte return this byte converted to double
- in case of two bytes retrieve short number and convert it to double
- in case of four bytes retrieve integer number and convert it to double
- in case of eight bytes retrieve double itself
- for other lengths try to convert array to string and string to double.
- None if none of the above was successful.
- b
Byte array to convert to double
- returns
Some double if conversion was successful or None
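The size-based dispatch described above could look roughly like the following. This is a sketch, not the actual implementation: `bytesToDouble` is a hypothetical name, and both the big-endian byte order (the `ByteBuffer` default) and the UTF-8 charset used for the string fallback are assumptions that the documentation does not confirm.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import scala.util.Try

object BytesToDoubleSketch {
  // Hypothetical stand-in for getDoubleFromBytes; byte order and charset are assumed.
  def bytesToDouble(b: Array[Byte]): Option[Double] = b.length match {
    case 0 => None
    case 1 => Some(b(0).toDouble)
    case 2 => Some(ByteBuffer.wrap(b).getShort.toDouble) // two bytes read as a short
    case 4 => Some(ByteBuffer.wrap(b).getInt.toDouble)   // four bytes read as an int
    case 8 => Some(ByteBuffer.wrap(b).getDouble)         // eight bytes read as a double
    case _ => // any other length: decode as text, then parse the text as a number
      Try(new String(b, StandardCharsets.UTF_8).trim.toDouble).toOption
  }
}
```

A Long variant would follow the same shape, reading eight bytes with `getLong` and parsing the fallback string with `toLong`.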
-
def
getLongFromBytes(b: Array[Byte]): Option[Long]
Tries to convert an array of bytes to a long. The conversion approach depends on the size of the array:
- in case of empty array return None
- in case of single byte return this byte converted to long
- in case of two bytes retrieve short number and convert it to long
- in case of four bytes retrieve integer number and convert it to long
- in case of eight bytes retrieve long itself
- for other lengths try to convert array to string and string to long.
- None if none of the above was successful.
- b
Byte array to convert to long
- returns
Some long if conversion was successful or None
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
primitiveValToDouble(value: Any): Option[Double]
Converts a primitive value to Double.
- value
Value to convert to double
- returns
Some double value if conversion was successful or None
- Note
Date and time related types are converted to Epoch and then to Double
-
def
primitiveValToLong(value: Any): Option[Long]
Converts a primitive value to Long.
- value
Value to convert to long
- returns
Some long value if conversion was successful or None
- Note
Date and time related types are converted to Epoch long
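As an illustration of the note above, date and time values can be reduced to an Epoch long with a pattern match. The sketch below assumes Epoch seconds and the UTC time zone; neither the unit nor the zone is stated in this documentation, and `toEpochSeconds` is a hypothetical name.

```scala
import java.sql.{Date, Timestamp}
import java.time.{Instant, LocalDate, ZoneOffset}

object EpochSketch {
  // Hypothetical helper; Epoch seconds and UTC are assumptions, not documented behaviour.
  def toEpochSeconds(value: Any): Option[Long] = value match {
    case i: Instant   => Some(i.getEpochSecond)
    case t: Timestamp => Some(t.toInstant.getEpochSecond)
    case d: LocalDate => Some(d.atStartOfDay(ZoneOffset.UTC).toEpochSecond)
    // java.sql.Date does not support toInstant, so go through LocalDate instead
    case d: Date      => Some(d.toLocalDate.atStartOfDay(ZoneOffset.UTC).toEpochSecond)
    case _            => None
  }
}
```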
-
def
primitiveValToString(value: Any, dtAsLong: Boolean = false): String
Converts a primitive value to a string.
- value
value to convert to string
- dtAsLong
Boolean flag indicating whether date and time related types should be converted to Epoch before converting to string.
- returns
String representation of a value
-
def
seqToString(seq: Seq[_], acc: String = ""): String
Recursive function that converts a sequence of values (which may contain nested traversable structures) to a string. The string representation is simply a concatenation of all primitive values converted to strings.
- seq
Sequence to convert to string
- acc
String accumulator used to store already converted elements.
- returns
String representation of a sequence
- Annotations
- @tailrec()
- Note
This kind of conversion is used in the distinctValues and duplicateValues metric calculators, which use a Set to store all unique column tuples. Therefore, these tuples need to be serialized as a single string to be stored in the Set properly. Alternatively, we could have serialized them to a byte array, but benchmarking showed that string serialization works better (mostly because of the lower GC workload).
Maps and Sets do not guarantee the traversal order of their elements; therefore, concatenating the string representations of their elements could yield different results for collections with the same elements. On the other hand, Scala guarantees that sets or maps with the same elements yield the same hash code. Thus, maps and sets are represented in the string by their hash code. This is sufficient for the purpose of finding unique column values.
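The accumulator-based recursion and the hash code treatment of maps and sets can be sketched as follows. `flattenToString` is a hypothetical name, and the real seqToString may order or format elements differently; the sketch only shows the technique.

```scala
import scala.annotation.tailrec

object SeqToStringSketch {
  // Hypothetical stand-in for seqToString: flattens nested sequences into one string.
  @tailrec
  def flattenToString(seq: Seq[Any], acc: String = ""): String =
    if (seq.isEmpty) acc
    else seq.head match {
      // nested sequence: splice its elements in front of the remaining ones
      case nested: Seq[_] => flattenToString(nested ++ seq.tail, acc)
      // maps and sets: traversal order is not guaranteed, so append the hash code
      case m: scala.collection.Map[_, _] => flattenToString(seq.tail, acc + m.hashCode)
      case s: scala.collection.Set[_]    => flattenToString(seq.tail, acc + s.hashCode)
      // anything else is treated as a primitive and appended as text
      case other => flattenToString(seq.tail, s"$acc$other")
    }
}
```

Because equal sets produce equal hash codes, `Seq(Set(1, 2, 3))` and `Seq(Set(3, 2, 1))` serialize to the same string in this sketch.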
-
def
stringToLocalDateTime(str: String, dateFormat: String): Option[LocalDateTime]
Converts a string to a LocalDateTime object using the provided format string.
- str
String to convert to LocalDateTime
- dateFormat
Format string
- returns
Some LocalDateTime instance if conversion was successful or None
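A conversion with this contract can be sketched with java.time and scala.util.Try. `parseLocalDateTime` is a hypothetical name, and using `DateTimeFormatter.ofPattern` is an assumption about how the format string is interpreted.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object LocalDateTimeSketch {
  // Hypothetical stand-in for stringToLocalDateTime.
  // Try turns both a malformed pattern and an unparsable string into None.
  def parseLocalDateTime(str: String, dateFormat: String): Option[LocalDateTime] =
    Try(LocalDateTime.parse(str, DateTimeFormatter.ofPattern(dateFormat))).toOption
}
```

In this sketch a date-only input such as "2023-05-01" with the format "yyyy-MM-dd" yields None, because a LocalDateTime also needs a time component.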
-
def
stringToTimestamp(str: String, dateFormat: String): Option[Long]
Converts a string to a Timestamp using the Spark TimestampFormatter with the provided format string.
- str
String to convert to Timestamp
- dateFormat
Format string
- returns
Some long value if conversion was successful or None.
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
tryToDate(value: Any, dateFormat: String): Option[LocalDateTime]
Tries to cast a primitive value to a LocalDateTime object for use in date-related metric calculators.
- value
Value to cast
- dateFormat
Date format used for casting
- returns
Optional LocalDateTime object (None if casting wasn't successful)
- Note
Metric calculators are not intended to work with complex data types. Therefore, only primitive types can be converted to LocalDateTime as well as byte arrays. Attempt to convert complex data type such as Map or StructType will return None.
-
def
tryToDouble(value: Any): Option[Double]
Tries to cast a primitive value to Double. Used in metric calculators.
- value
value to cast
- returns
Optional Double value (None if casting wasn't successful)
- Note
Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays are converted to double. An attempt to convert a complex data type such as Map or StructType will return None.
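The "primitive in, Option out" contract can be sketched as below. `toDoubleOption` is a hypothetical name; the byte array branch is left out for brevity, and how tryToDouble treats booleans or date types is not covered here.

```scala
import scala.util.Try

object TryToDoubleSketch {
  // Hypothetical stand-in for tryToDouble, limited to numbers and strings.
  def toDoubleOption(value: Any): Option[Double] = value match {
    // boxed Byte, Short, Int, Long, Float, Double and BigDecimal all extend Number
    case n: java.lang.Number => Some(n.doubleValue)
    case s: String           => Try(s.trim.toDouble).toOption
    // null and complex types such as Map, Seq or Row fall through to None
    case _                   => None
  }
}
```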
-
def
tryToLong(value: Any): Option[Long]
Tries to cast any value to Long. Used in metric calculators.
- value
value to cast
- returns
Optional Long value (None if casting wasn't successful)
- Note
Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays are converted to long. An attempt to convert a complex data type such as Map or StructType will return None.
-
def
tryToString(value: Any): Option[String]
Tries to cast a primitive value to String. Used in metric calculators.
- value
value to cast
- returns
Optional String value (None if casting wasn't successful)
- Note
Metric calculators are not intended to work with complex data types. Therefore, only primitive types and byte arrays are converted to string. An attempt to convert a complex data type such as Map or StructType will return None.
-
def
tryToTimestamp(value: Any, dateFormat: String): Option[Long]
Tries to cast a primitive value to a Timestamp long. Used in the formattedDate metric to support checking not only a full date or time string but also a part of a date or time, e.g. parsing the year only with the format string yyyy.
- value
Value to cast
- dateFormat
Date format used for casting
- returns
Optional timestamp long or None if casting wasn't successful.
- Note
Metric calculators are not intended to work with complex data types. Therefore, only primitive types can be converted to Timestamp as well as byte arrays. Attempt to convert complex data type such as Map or StructType will return None.
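Parsing a partial date such as a bare year can be illustrated with plain java.time. Note that this is a swapped-in technique: the library uses the Spark TimestampFormatter, whereas the sketch below relies on `DateTimeFormatterBuilder.parseDefaulting` to fill in the missing fields. The name `parseEpochSeconds`, the Epoch seconds unit and the UTC zone are all assumptions.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField
import scala.util.Try

object PartialTimestampSketch {
  // Hypothetical helper: parses full or partial date strings to Epoch seconds (UTC assumed).
  def parseEpochSeconds(str: String, dateFormat: String): Option[Long] = Try {
    val formatter = new DateTimeFormatterBuilder()
      .appendPattern(dateFormat)
      // fields missing from the pattern default to the start of the period
      .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
      .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
      .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
      .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
      .parseDefaulting(ChronoField.SECOND_OF_MINUTE, 0)
      .toFormatter
    LocalDateTime.parse(str, formatter).toEpochSecond(ZoneOffset.UTC)
  }.toOption
}
```

The sketch targets year, month, day and 24-hour time patterns; patterns built on other fields (for example a 12-hour clock with an AM/PM marker) may conflict with the defaults and return None.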
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()