object Sources
Type Members
- final case class AggregateVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, groupBy: NonEmptyStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Aggregate virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence containing exactly one source.
- groupBy
Non-empty sequence of columns by which to perform grouping
- expr
Non-empty sequence of Spark SQL expressions used to compute aggregated columns; one expression per resultant column (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
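To illustrate how groupBy and expr interact, here is a minimal sketch in plain Spark (not this framework's own code); the dataframe, column, and expression names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object AggregateSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("agg-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical parent source with columns "shop" and "amount".
  val parent = Seq(("a", 10.0), ("a", 5.0), ("b", 7.0)).toDF("shop", "amount")

  // groupBy: non-empty sequence of grouping columns.
  // expr: one Spark SQL expression per resultant column.
  val aggregated = parent
    .groupBy("shop")
    .agg(
      expr("sum(amount) as total_amount"),
      expr("count(*) as row_cnt")
    )

  aggregated.show()
  spark.stop()
}
```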
- final case class AvroFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with AvroFileConfig with Product with Serializable
Avro file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row (see the sketch after this entry). Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
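A minimal sketch of what the two windowing modes correspond to in plain Spark Structured Streaming, not this library's implementation; the rate source is Spark's built-in testing source, and the window duration is illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_timestamp, window}

object WindowBySketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("windowBy-sketch").getOrCreate()

  // The built-in rate source emits a 'timestamp' column of TimestampType.
  val stream = spark.readStream.format("rate").option("rowsPerSecond", "5").load()

  // eventTime: windows are built from the 'timestamp' column carried by the data.
  val byEventTime = stream.groupBy(window(col("timestamp"), "10 seconds")).count()

  // processingTime: windows are built from the moment Spark processes the row.
  val byProcessingTime = stream
    .withColumn("proc_ts", current_timestamp())
    .groupBy(window(col("proc_ts"), "10 seconds"))
    .count()
}
```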
- final case class CustomSource(id: ID, description: Option[NonEmptyString], format: NonEmptyString, path: Option[URI], schema: Option[ID], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Custom source configuration: used to read from source types that are not supported explicitly.
- id
Source ID
- description
Source description
- format
Source format to set in the Spark reader.
- path
Path to load the source from (if required)
- schema
Explicit schema applied to source data (if required)
- persist
Spark storage level used to persist the dataframe during job execution.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class DelimitedFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], delimiter: NonEmptyString = ",", quote: NonEmptyString = "\"", escape: NonEmptyString = "\\", header: Boolean = false, windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with DelimitedFileConfig with Product with Serializable
Delimited file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID (only if header = false)
- persist
Spark storage level used to persist the dataframe during job execution.
- delimiter
Column delimiter (default: ,); see the sketch after this entry for how these reader parameters map onto Spark CSV options.
- quote
Quotation symbol (default: ")
- escape
Escape symbol (default: \)
- header
Boolean flag indicating whether the schema should be read from the file header (default: false)
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
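In plain Spark terms, delimiter, quote, escape, and header correspond to standard CSV reader options; a hedged sketch (the option names are Spark's standard CSV options, and the path is a placeholder):

```scala
import org.apache.spark.sql.SparkSession

object DelimitedReadSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("csv-sketch").getOrCreate()

  // Mirrors the defaults above: delimiter ",", quote "\"", escape "\\", header = false.
  val df = spark.read
    .option("sep", ",")        // column delimiter
    .option("quote", "\"")     // quotation symbol
    .option("escape", "\\")    // escape symbol
    .option("header", "false") // whether to read the schema from the header row
    .csv("/path/to/data.csv")  // placeholder path

  df.printSchema()
}
```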
- sealed abstract class FileSourceConfig extends SourceConfig
Base class for file source configurations. All file sources are streamable and therefore must contain a windowBy parameter, which defines the source of timestamp used to build stream windows.
- final case class FilterVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], windowBy: Option[StreamWindowing], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Filter virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence containing exactly one source.
- expr
Non-empty sequence of Spark SQL expressions used to filter the source. All expressions must return a boolean. The source is filtered using the logical conjunction of all provided expressions (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
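A minimal sketch of the conjunction semantics in plain Spark (the expressions and data are hypothetical):

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.expr

object FilterConjunctionSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("filter-sketch").getOrCreate()
  import spark.implicits._

  val parent = Seq((1, "ok"), (-1, "ok"), (1, "bad")).toDF("amount", "status")

  // Each expression must return a boolean; the source is filtered by their conjunction.
  val exprs: Seq[Column] = Seq(expr("amount > 0"), expr("status = 'ok'"))
  val filtered = parent.filter(exprs.reduce(_ && _))

  filtered.show() // keeps only rows satisfying ALL expressions
  spark.stop()
}
```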
- final case class FixedFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with FixedFileConfig with Product with Serializable
Fixed-width file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID (must be either fixedFull or fixedShort schema)
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class GreenplumSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, table: Option[NonEmptyString], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Greenplum table source configuration
- id
Source ID
- description
Source description
- connection
Connection ID (must be a pivotal connection)
- table
Table to read
- persist
Spark storage level used to persist the dataframe during job execution.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class HivePartition(name: NonEmptyString, expr: Option[Column], values: Seq[NonEmptyString] = Seq.empty) extends Product with Serializable
Configuration for Hive Table partition values to read.
- name
Name of partition column
- expr
SQL expression used to filter partitions to read (see the sketch after this entry)
- values
Sequence of partition values to read
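A minimal sketch of the two partition-selection styles in plain Spark (the partition column and values are hypothetical stand-ins for a real partitioned Hive table):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

object HivePartitionSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("partition-sketch").getOrCreate()
  import spark.implicits._

  // Stand-in for a Hive table partitioned by 'load_date'.
  val table = Seq(("2024-01-01", 1), ("2024-01-02", 2)).toDF("load_date", "value")

  // values: explicit partition values to read.
  val byValues = table.where(col("load_date").isin("2024-01-01", "2024-01-02"))

  // expr: an SQL expression selecting partitions instead of listing values.
  val byExpr = table.where(expr("load_date >= '2024-01-02'"))

  byValues.show(); byExpr.show()
  spark.stop()
}
```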
- final case class HiveSourceConfig(id: ID, description: Option[NonEmptyString], schema: NonEmptyString, table: NonEmptyString, persist: Option[StorageLevel], partitions: Seq[HivePartition] = Seq.empty, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Hive table source configuration
- id
Source ID
- description
Source description
- schema
Hive schema
- table
Hive table
- persist
Spark storage level used to persist the dataframe during job execution.
- partitions
Sequence of partitions to read. The order of partition columns should correspond to the order in which they are defined in the Hive table DDL.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class JoinVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: DoubleElemStringSeq, joinBy: NonEmptyStringSeq, joinType: SparkJoinType, persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Join virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence of exactly two parent sources.
- joinBy
Non-empty sequence of columns to join by (see the sketch after this entry).
- joinType
Spark join type.
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
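A minimal sketch of the corresponding plain Spark join (the two dataframes, join column, and join type are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object JoinSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("join-sketch").getOrCreate()
  import spark.implicits._

  // Exactly two parent sources, as required by parentSources.
  val left  = Seq((1, "a"), (2, "b")).toDF("id", "l_val")
  val right = Seq((1, "x"), (3, "y")).toDF("id", "r_val")

  // joinBy: columns to join on; joinType: a Spark join type such as "inner" or "left".
  val joined = left.join(right, Seq("id"), "left")

  joined.show()
  spark.stop()
}
```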
- final case class KafkaSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, topics: Seq[NonEmptyString] = Seq.empty, topicPattern: Option[NonEmptyString], startingOffsets: Option[NonEmptyString], endingOffsets: Option[NonEmptyString], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, keyFormat: KafkaTopicFormat = KafkaTopicFormat.String, valueFormat: KafkaTopicFormat = KafkaTopicFormat.String, keySchema: Option[ID] = None, valueSchema: Option[ID] = None, subtractSchemaId: Boolean = false, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
Kafka source configuration
- id
Source ID
- description
Source description
- connection
Connection ID (must be a Kafka Connection)
- topics
Sequence of topics to read
- topicPattern
Pattern that defines topics to read
- startingOffsets
JSON string defining starting offsets. If none is set, then "earliest" is used in batch jobs and "latest" is used in streaming jobs.
- endingOffsets
JSON string defining ending offsets. Applicable only to batch jobs. If none is set, then "latest" is used.
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the Kafka message creation timestamp.
- customTime(columnName) - uses an arbitrary user-defined column from the Kafka message (the column must be of TimestampType).
- keyFormat
Message key format. Default: string.
- valueFormat
Message value format. Default: string.
- keySchema
Schema ID used to parse the message key. Ignored when keyFormat is string; mandatory for other formats.
- valueSchema
Schema ID used to parse the message value. Ignored when valueFormat is string; mandatory for other formats.
- subtractSchemaId
Boolean flag indicating whether the Kafka message has a schema ID encoded into its value, i.e. [1 Magic Byte] + [4 Schema ID Bytes] + [Message Value Binary Data]. If set to true, then the first five bytes are subtracted before value parsing (see the sketch after this entry). Default: false
- options
Sequence of additional Kafka options
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
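The wire layout described for subtractSchemaId matches the common Confluent-style framing; a minimal sketch of what stripping those five bytes looks like (a hypothetical helper, not this library's code):

```scala
import java.nio.ByteBuffer

object SubtractSchemaIdSketch extends App {
  // Hypothetical helper: strip [1 magic byte] + [4 schema ID bytes] from a message value.
  def stripSchemaId(value: Array[Byte]): (Int, Array[Byte]) = {
    require(value.length >= 5, "value too short to contain a schema ID header")
    val buf = ByteBuffer.wrap(value)
    buf.get()                 // magic byte, conventionally 0x0
    val schemaId = buf.getInt // 4-byte schema ID
    (schemaId, value.drop(5)) // remaining bytes are the payload to parse
  }

  val raw = Array[Byte](0, 0, 0, 0, 42) ++ "payload".getBytes("UTF-8")
  val (id, payload) = stripSchemaId(raw)
  println(s"schemaId=$id payload=${new String(payload, "UTF-8")}")
}
```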
- final case class OrcFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with OrcFileConfig with Product with Serializable
ORC file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class ParquetFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with ParquetFileConfig with Product with Serializable
Parquet file source configuration
- id
Source ID
- description
Source description
- path
Path to file
- schema
Schema ID
- persist
Spark storage level used to persist the dataframe during job execution.
- windowBy
Source of timestamp used to build windows. Applicable only to streaming jobs! Default: processingTime - uses the current timestamp at the moment Spark processes the row. Other options are:
- eventTime - uses the column named 'timestamp' (the column must be of TimestampType).
- customTime(columnName) - uses an arbitrary user-defined column (the column must be of TimestampType).
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- final case class SelectVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], windowBy: Option[StreamWindowing], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
Select virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Sequence containing exactly one source.
- expr
Non-empty sequence of Spark SQL expressions used to select columns from the parent source; one expression per resultant column (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
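A minimal sketch of per-column select expressions in plain Spark (the expressions and data are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

object SelectSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("select-sketch").getOrCreate()
  import spark.implicits._

  val parent = Seq((1, 2.0), (3, 4.0)).toDF("a", "b")

  // One Spark SQL expression per resultant column.
  val selected = parent.selectExpr("a + 1 as a_inc", "round(b, 0) as b_rounded")

  selected.show()
  spark.stop()
}
```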
- sealed abstract class SourceConfig extends JobConfigEntity
Base class for all source configurations. All sources are described as DQ entities that may have an optional sequence of keyFields, which uniquely identify a data row in error collection reports. Each source also indicates whether it is streamable or not. Additionally, the persist field specifies an optional storage level used to persist the source during job execution (see the sketch below).
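The persist field corresponds to Spark's standard storage levels; a minimal sketch of what persisting a source dataframe looks like in plain Spark (the data is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("persist-sketch").getOrCreate()
  import spark.implicits._

  val source = Seq(1, 2, 3).toDF("value")

  // Persist the source so repeated checks within the same job do not re-read it.
  source.persist(StorageLevel.MEMORY_AND_DISK)
  source.count() // materializes the cache

  source.unpersist()
  spark.stop()
}
```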
- final case class SourcesConfig(table: Seq[TableSourceConfig] = Seq.empty, hive: Seq[HiveSourceConfig] = Seq.empty, kafka: Seq[KafkaSourceConfig] = Seq.empty, greenplum: Seq[GreenplumSourceConfig] = Seq.empty, file: Seq[FileSourceConfig] = Seq.empty, custom: Seq[CustomSource] = Seq.empty) extends Product with Serializable
Data Quality job configuration section describing sources
- table
Sequence of table sources (read from JDBC connections)
- hive
Sequence of Hive table sources
- kafka
Sequence of sources based on Kafka topics
- greenplum
Sequence of Greenplum sources (read from pivotal connections)
- file
Sequence of file sources
- custom
Sequence of custom sources
- final case class SqlVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: NonEmptyStringSeq, query: NonEmptyString, persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable
SQL virtual source configuration
- id
Virtual source ID
- description
Source description
- parentSources
Non-empty sequence of parent sources
- query
SQL query used to build the virtual source from parent sources (see the sketch after this entry).
- persist
Spark storage level used to persist the dataframe during job execution.
- save
Configuration to save virtual source as a file.
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
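One plausible reading of how a SQL virtual source combines its parents, sketched with plain Spark temp views (the view names, data, and query are hypothetical, not this library's implementation):

```scala
import org.apache.spark.sql.SparkSession

object SqlVirtualSketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("sql-sketch").getOrCreate()
  import spark.implicits._

  // Hypothetical parent sources registered under their source IDs.
  Seq((1, "a")).toDF("id", "v").createOrReplaceTempView("orders")
  Seq((1, 10.0)).toDF("id", "amt").createOrReplaceTempView("payments")

  // query: an SQL statement referring to parent sources by name.
  val virtual = spark.sql(
    "SELECT o.id, o.v, p.amt FROM orders o JOIN payments p ON o.id = p.id"
  )

  virtual.show()
  spark.stop()
}
```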
- final case class StreamSourcesConfig(kafka: Seq[KafkaSourceConfig] = Seq.empty, file: Seq[FileSourceConfig] = Seq.empty) extends Product with Serializable
Data Quality job configuration section describing streams
- kafka
Sequence of streams based on Kafka topics
- file
Sequence of streams based on file sources
- final case class TableSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, table: Option[NonEmptyString], query: Option[NonEmptyString], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable
JDBC table source configuration
- id
Source ID
- description
Source description
- connection
Connection ID (must be a JDBC connection)
- table
Table to read
- query
Query to execute
- persist
Spark storage level used to persist the dataframe during job execution.
- options
List of additional Spark options required to read the source (if any)
- keyFields
Sequence of key fields (columns that identify data row)
- metadata
List of metadata parameters specific to this source
- Note
Either table to read or query to execute must be defined, but not both (see the sketch below).
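In plain Spark JDBC terms, the mutually exclusive table/query pair corresponds to the reader's dbtable and query options; a hedged sketch (connection details are placeholders, and running it requires a reachable database with the JDBC driver on the classpath):

```scala
import org.apache.spark.sql.SparkSession

object JdbcTableOrQuerySketch extends App {
  val spark = SparkSession.builder().master("local[*]").appName("jdbc-sketch").getOrCreate()

  // def gives a fresh reader per call, so the two variants stay independent.
  def reader = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://host:5432/db") // placeholder connection URL
    .option("user", "user")
    .option("password", "password")

  // Either a table to read...
  val byTable = reader.option("dbtable", "public.orders").load()

  // ...or a query to execute, but not both on the same reader.
  val byQuery = reader
    .option("query", "SELECT id, amount FROM public.orders WHERE amount > 0")
    .load()
}
```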
- sealed abstract class VirtualSourceConfig extends SourceConfig
Base class for all virtual source configurations. In addition to the basic source configuration, virtual sources may have the following optional parameter:
- save: configuration used to save the virtual source as a file.