object Sources
Type Members
- final case class AggregateVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, groupBy: NonEmptyStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Aggregate virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence containing exactly one source.
- groupBy
Non-empty sequence of columns by which to perform grouping
- expr
Non-empty sequence of Spark SQL expressions used to compute aggregated columns; one expression per resultant column (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
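To illustrate how groupBy and expr interact, here is a minimal sketch in plain Spark (not this framework's own code); the dataframe, column, and expression names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object AggregateSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical parent source with columns "shop" and "amount".
  val parent = Seq(("a", 10.0), ("a", 5.0), ("b", 7.0)).toDF("shop", "amount")

  // groupBy: non-empty sequence of grouping columns.
  // expr: one Spark SQL expression per resultant column.
  val aggregated = parent
    .groupBy("shop")
    .agg(
      expr("sum(amount) as total_amount"),
      expr("count(*) as row_cnt")
    )

  aggregated.show()
  spark.stop()
}
```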
- final case class AvroFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with AvroFileConfig with Product with Serializable
Avro file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row (see the sketch after this entry). Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
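A minimal sketch of what the two windowing modes correspond to in plain Spark Structured Streaming, not this library's implementation; the rate source is Spark's built-in testing source, and the window duration is illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, window}

object WindowBySketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("windowBy-sketch").getOrCreate()

  // The built-in rate source emits a 'timestamp' column of TimestampType.
  val stream = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

  // eventTime: windows are built from the 'timestamp' column carried by the data.
  val byEventTime = stream.groupBy(window(col("timestamp"), "10 seconds")).count()

  // processingTime: windows are built from the moment Spark processes the row.
  val byProcessingTime = stream
    .withColumn("proc_ts", current_timestamp())
    .groupBy(window(col("proc_ts"), "10 seconds"))
    .count()
}
```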
- final case class CustomSource(id: ID, description: Option[NonEmptyString], format: NonEmptyString, path: Option[URI], schema: Option[ID], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Custom source configuration: used to read from source types that are not supported explicitly.
- id
Source ID
- description
Source description
- format
Source format to set in the Spark reader.
- path
Path to load the source from (if required)
- schema
Explicit schema applied to source data (if required)
- persist
Spark storage level used to persist the dataframe during job execution.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class DelimitedFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], delimiter: NonEmptyString = ",", quote: NonEmptyString = "\"", escape: NonEmptyString = "\\", header: Boolean = false, windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with DelimitedFileConfig with Product with Serializable
Delimited file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID (only if header = false)
- persist
Spark storage level used to persist the dataframe during job execution.
- delimiter
Column delimiter (default: ,); see the sketch after this entry for how these reader parameters map onto Spark CSV options.
- quote
Quotation symbol (default: ")
- escape
Escape symbol (default: \)
- header
Boolean flag indicating whether the schema should be read from the file header (default: false)
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
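In plain Spark terms, delimiter, quote, escape, and header correspond to standard CSV reader options; a hedged sketch (the option names are Spark's standard CSV options, and the path is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

object DelimitedReadSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("csv-sketch").getOrCreate()

  // Mirrors the defaults above: delimiter ",", quote "\"", escape "\\", header = false.
  val df = spark.read
    .option("sep", ",")        // column delimiter
    .option("quote", "\"")     // quotation symbol
    .option("escape", "\\")    // escape symbol
    .option("header", "false") // whether to read the schema from the header row
    .csv("/path/to/data.csv")  // placeholder path

  df.printSchema()
}
```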
- sealed abstract class FileSourceConfig extends SourceConfig
Base class for file source configurations. All file sources are streamable and therefore must contain a windowBy parameter, which defines the source of timestamp used to build stream windows.
- final case class FilterVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], windowBy: Option[StreamWindowing], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Filter virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence containing exactly one source.
- expr
Non-empty sequence of Spark SQL expressions used to filter the source. All expressions must return a boolean. The source is filtered using the logical conjunction of all provided expressions (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
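A minimal sketch of the conjunction semantics in plain Spark (the expressions and data are hypothetical):

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.expr

object FilterConjunctionSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("filter-sketch").getOrCreate()
  import spark.implicits._

  val parent = Seq((1, "ok"), (-1, "ok"), (1, "bad")).toDF("amount", "status")

  // Each expression must return a boolean; the source is filtered by their conjunction.
  val exprs: Seq[Column] = Seq(expr("amount > 0"), expr("status = 'ok'"))
  val filtered = parent.filter(exprs.reduce(_ && _))

  filtered.show() // keeps only rows satisfying ALL expressions
  spark.stop()
}
```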
- final case class FixedFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with FixedFileConfig with Product with Serializable
Fixed-width file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID (must be either fixedFull or fixedShort schema)
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class GreenplumSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, table: Option[NonEmptyString], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Greenplum table source configuration
- id
Source ID
- description
Source description
- connection
Connection ID (must be a pivotal connection)
- table
Table to read
- persist
Spark storage level used to persist the dataframe during job execution.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class HivePartition(name: NonEmptyString, expr: Option[Column], values: Seq[NonEmptyString] = Seq.empty) extends Product with Serializable
Configuration for Hive Table partition values to read.
- name
Name of partition column
- expr
SQL expression used to filter partitions to read (see the sketch after this entry)
- values
Sequence of partition values to read
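A minimal sketch of the two partition-selection styles in plain Spark (the partition column and values are hypothetical stand-ins for a real partitioned Hive table):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

object HivePartitionSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("partition-sketch").getOrCreate()
  import spark.implicits._

  // Stand-in for a Hive table partitioned by 'load_date'.
  val table = Seq(("2024-01-01", 1), ("2024-01-02", 2)).toDF("load_date", "value")

  // values: explicit partition values to read.
  val byValues = table.where(col("load_date").isin("2024-01-01", "2024-01-02"))

  // expr: an SQL expression selecting partitions instead of listing values.
  val byExpr = table.where(expr("load_date >= '2024-01-02'"))

  byValues.show(); byExpr.show()
  spark.stop()
}
```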
- final case class HiveSourceConfig(id: ID, description: Option[NonEmptyString], schema: NonEmptyString, table: NonEmptyString, persist: Option[StorageLevel], partitions: Seq[HivePartition] = Seq.empty, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Hive table source configuration
- id
Source ID
- description
Source description
- schema
Hive schema
- table
Hive table
- persist
Spark storage level used to persist the dataframe during job execution.
- partitions
Sequence of partitions to read. The order of partition columns should correspond to the order in which they are defined in the Hive table DDL.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class JoinVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: DoubleElemStringSeq, joinBy: NonEmptyStringSeq, joinType: SparkJoinType, persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Join virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence of exactly two parent sources.
- joinBy
Non-empty sequence of columns to join by (see the sketch after this entry).
- joinType
Spark join type.
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
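A minimal sketch of the corresponding plain Spark join (the two dataframes, join column, and join type are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object JoinSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
  import spark.implicits._

  // Exactly two parent sources, as required by parentSources.
  val left  = Seq((1, "a"), (2, "b")).toDF("id", "l_val")
  val right = Seq((1, "x"), (3, "y")).toDF("id", "r_val")

  // joinBy: columns to join on; joinType: a Spark join type such as "inner" or "left".
  val joined = left.join(right, Seq("id"), "left")

  joined.show()
  spark.stop()
}
```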
- final case class KafkaSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, topics: Seq[NonEmptyString] = Seq.empty, topicPattern: Option[NonEmptyString], startingOffsets: Option[NonEmptyString], endingOffsets: Option[NonEmptyString], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, keyFormat: KafkaTopicFormat = KafkaTopicFormat.String, valueFormat: KafkaTopicFormat = KafkaTopicFormat.String, keySchema: Option[ID] = None, valueSchema: Option[ID] = None, subtractSchemaId: Boolean = false, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Kafka source configuration
- id
Source ID
- description
Source description
- connection
Connection ID (must be a Kafka Connection)
- topics
Sequence of topics to read
- topicPattern
Pattern that defines topics to read
- startingOffsets
JSON string defining starting offsets. If none is set, then "earliest" is used in batch jobs and "latest" is used in streaming jobs.
- endingOffsets
JSON string defining ending offsets. Applicable only to batch jobs. If none is set, then "latest" is used.
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the Kafka message creation timestamp.
- customTime(columnName) - uses an arbitrary user-defined column from the Kafka message (the column must be of TimestampType).
- keyFormat
Message key format. Default: string.
- valueFormat
Message value format. Default: string.
- keySchema
Schema ID used to parse the message key. Ignored when keyFormat is string; mandatory for other formats.
- valueSchema
Schema ID used to parse the message value. Ignored when valueFormat is string; mandatory for other formats.
- subtractSchemaId
Boolean flag indicating whether the Kafka message has a schema ID encoded into its value, i.e. [1 Magic Byte] + [4 Schema ID Bytes] + [Message Value Binary Data]. If set to true, then the first five bytes are subtracted before value parsing (see the sketch after this entry). Default: false
- options
Sequence of additional Kafka options
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
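The wire layout described for subtractSchemaId matches the common Confluent-style framing; a minimal sketch of what stripping those five bytes looks like (a hypothetical helper, not this library's code):

```scala
import java.nio.ByteBuffer

object SubtractSchemaIdSketch extends App {
  // Hypothetical helper: strip [1 magic byte] + [4 schema ID bytes] from a message value.
  def stripSchemaId(value: Array[Byte]): (Int, Array[Byte]) = {
    require(value.length >= 5, "value too short to contain a schema ID header")
    val buf = ByteBuffer.wrap(value)
    buf.get()                 // magic byte, conventionally 0x0
    val schemaId = buf.getInt // 4-byte schema ID
    (schemaId, value.drop(5)) // remaining bytes are the payload to parse
  }

  val raw = Array[Byte](0, 0, 0, 0, 42) ++ "payload".getBytes("UTF-8")
  val (id, payload) = stripSchemaId(raw)
  println(s"schemaId=$id payload=${new String(payload, "UTF-8")}")
}
```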
- final case class OrcFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with OrcFileConfig with Product with Serializable
ORC file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class ParquetFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with ParquetFileConfig with Product with Serializable
Parquet file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class SelectVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], windowBy: Option[StreamWindowing], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Select virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence containing exactly one source.
- expr
Non-empty sequence of Spark SQL expressions used to select columns from the parent source; one expression per resultant column (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
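A minimal sketch of per-column select expressions in plain Spark (the expressions and data are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

object SelectSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("select-sketch").getOrCreate()
  import spark.implicits._

  val parent = Seq((1, 2.0), (3, 4.0)).toDF("a", "b")

  // One Spark SQL expression per resultant column.
  val selected = parent.selectExpr("a + 1 as a_inc", "round(b, 0) as b_rounded")

  selected.show()
  spark.stop()
}
```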
- sealed abstract class SourceConfig extends JobConfigEntity
Base class for all source configurations. All sources are described as DQ entities that may have an optional sequence of keyFields, which uniquely identify a data row in error collection reports. Each source also indicates whether it is streamable or not. Additionally, the persist field specifies an optional storage level used to persist the source during job execution (see the sketch below).
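The persist field corresponds to Spark's standard storage levels; a minimal sketch of what persisting a source dataframe looks like in plain Spark (the data is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("persist-sketch").getOrCreate()
  import spark.implicits._

  val source = Seq(1, 2, 3).toDF("value")

  // Persist the source so repeated checks within the same job do not re-read it.
  source.persist(StorageLevel.MEMORY_AND_DISK)
  source.count() // materializes the cache

  source.unpersist()
  spark.stop()
}
```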
- final case class SourcesConfig(table: Seq[TableSourceConfig] = Seq.empty, hive: Seq[HiveSourceConfig] = Seq.empty, kafka: Seq[KafkaSourceConfig] = Seq.empty, greenplum: Seq[GreenplumSourceConfig] = Seq.empty, file: Seq[FileSourceConfig] = Seq.empty, custom: Seq[CustomSource] = Seq.empty) extends Product with Serializable
Data Quality job configuration section describing sources
- table
Sequence of table sources (read from JDBC connections)
- hive
Sequence of Hive table sources
- kafka
Sequence of sources based on Kafka topics
- greenplum
Sequence of Greenplum sources (read from pivotal connections)
- file
Sequence of file sources
- custom
Sequence of custom sources
- final case class SqlVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: NonEmptyStringSeq, query: NonEmptyString, persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
SQL virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Non-empty sequence of parent sources
- query
SQL query used to build the virtual source from parent sources (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
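One plausible reading of how a SQL virtual source combines its parents, sketched with plain Spark temp views (the view names, data, and query are hypothetical, not this library's implementation):

```scala
import org.apache.spark.sql.SparkSession

object SqlVirtualSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("sql-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical parent sources registered under their source IDs.
  Seq((1, "a")).toDF("id", "v").createOrReplaceTempView("orders")
  Seq((1, 10.0)).toDF("id", "amt").createOrReplaceTempView("payments")

  // query: an SQL statement referring to parent sources by name.
  val virtual = spark.sql(
    "SELECT o.id, o.v, p.amt FROM orders o JOIN payments p ON o.id = p.id"
  )

  virtual.show()
  spark.stop()
}
```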
- final case class StreamSourcesConfig(kafka: Seq[KafkaSourceConfig] = Seq.empty, file: Seq[FileSourceConfig] = Seq.empty) extends Product with Serializable
Data Quality job configuration section describing streams
- kafka
Sequence of streams based on Kafka topics
- file
Sequence of streams based on file sources
- final case class TableSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, table: Option[NonEmptyString], query: Option[NonEmptyString], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
JDBC table source configuration
- id
Source ID
- description
Source description
- connection
Connection ID (must be a JDBC connection)
- table
Table to read
- query
Query to execute
- persist
Spark storage level used to persist the dataframe during job execution.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- Note
Either table to read or query to execute must be defined, but not both (see the sketch below).
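In plain Spark JDBC terms, the mutually exclusive table/query pair corresponds to the reader's dbtable and query options; a hedged sketch (connection details are placeholders, and running it requires a reachable database with the JDBC driver on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object JdbcTableOrQuerySketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("jdbc-sketch").getOrCreate()

  // def gives a fresh reader per call, so the two variants stay independent.
  def reader = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://host:5432/db") // placeholder connection URL
    .option("user", "user")
    .option("password", "password")

  // Either a table to read...
  val byTable = reader.option("dbtable", "public.orders").load()

  // ...or a query to execute, but not both on the same reader.
  val byQuery = reader
    .option("query", "SELECT id, amount FROM public.orders WHERE amount > 0")
    .load()
}
```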
- sealed abstract class VirtualSourceConfig extends SourceConfig
Base class for all virtual source configurations. In addition to the basic source configuration, virtual sources may have the following optional parameter:
- save: configuration used to save the virtual source as a file.