object Sources

Linear Supertypes
AnyRef, Any

Type Members

  1. final case class AggregateVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, groupBy: NonEmptyStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable

    Aggregate virtual source configuration

    id

    Virtual source ID

    description

    Source description

    parentSources

    Sequence containing exactly one source.

    groupBy

    Non-empty sequence of columns by which to perform grouping

    expr

    Non-empty sequence of Spark SQL expressions used to compute aggregated columns. One expression per resultant column.

    persist

    Spark storage level used to persist the dataframe during job execution.

    save

    Configuration to save virtual source as a file.

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
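
    As an illustration only, the sketch below shows how groupBy and expr could map onto a plain Spark aggregation: group by the listed columns and compute one output column per aggregate expression. The helper name and the use of raw Spark types are assumptions, not the framework's internal code.

      import org.apache.spark.sql.{Column, DataFrame}

      // Hypothetical helper: group by the configured columns and apply the
      // aggregate expressions (expr is documented as non-empty, so head/tail is safe).
      def aggregateParent(parent: DataFrame, groupBy: Seq[String], exprs: Seq[Column]): DataFrame =
        parent.groupBy(groupBy.map(parent.col): _*).agg(exprs.head, exprs.tail: _*)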

  2. final case class AvroFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with AvroFileConfig with Product with Serializable

    Avro file source configuration

    id

    Source ID

    description

    Source description

    path

    Path to file

    schema

    Schema ID

    persist

    Spark storage level used to persist the dataframe during job execution.

    windowBy

    Source of timestamp used to build windows. Applicable only for streaming jobs! Default: processingTime - uses the current timestamp at the moment when Spark processes the row. Other options are:

    • eventTime - uses the column with name 'timestamp' (column must be of TimestampType).
    • customTime(columnName) - uses an arbitrary user-defined column (column must be of TimestampType).

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source

  3. final case class CustomSource(id: ID, description: Option[NonEmptyString], format: NonEmptyString, path: Option[URI], schema: Option[ID], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable

    Custom source configuration: used to read from source types that are not supported explicitly.

    id

    Source ID

    description

    Source description

    format

    Source format to set in spark reader.

    path

    Path to load the source from (if required)

    schema

    Explicit schema applied to source data (if required)

    persist

    Spark storage level used to persist the dataframe during job execution.

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
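
    As an illustration only, the sketch below shows how format, options and the optional path could map onto Spark's generic DataFrameReader. The helper is hypothetical; the framework's actual reader logic may differ.

      import org.apache.spark.sql.{DataFrame, SparkSession}

      // Hypothetical helper: read a custom source via the generic Spark reader.
      def readCustom(spark: SparkSession, format: String,
                     options: Map[String, String], path: Option[String]): DataFrame = {
        val reader = spark.read.format(format).options(options)
        path.map(p => reader.load(p)).getOrElse(reader.load())
      }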

  4. final case class DelimitedFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], delimiter: NonEmptyString = ",", quote: NonEmptyString = "\"", escape: NonEmptyString = "\\", header: Boolean = false, windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with DelimitedFileConfig with Product with Serializable

    Delimited file source configuration

    id

    Source ID

    description

    Source description

    path

    Path to file

    schema

    Schema ID (only if header = false)

    persist

    Spark storage level used to persist the dataframe during job execution.

    delimiter

    Column delimiter (default: ,)

    quote

    Quotation symbol (default: ")

    escape

    Escape symbol (default: \)

    header

    Boolean flag indicating whether schema should be read from file header (default: false)

    windowBy

    Source of timestamp used to build windows. Applicable only for streaming jobs! Default: processingTime - uses the current timestamp at the moment when Spark processes the row. Other options are:

    • eventTime - uses the column with name 'timestamp' (column must be of TimestampType).
    • customTime(columnName) - uses an arbitrary user-defined column (column must be of TimestampType).

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
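
    As an illustration only, the sketch below shows how the delimiter, quote, escape and header parameters could map onto Spark's CSV reader options. The mapping is assumed for clarity; the framework's actual reader may differ.

      import org.apache.spark.sql.{DataFrame, SparkSession}

      // Hypothetical helper: read a delimited file with the options described above.
      def readDelimited(spark: SparkSession, path: String, delimiter: String,
                        quote: String, escape: String, header: Boolean): DataFrame =
        spark.read
          .option("sep", delimiter)
          .option("quote", quote)
          .option("escape", escape)
          .option("header", header.toString)
          .csv(path)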

  5. sealed abstract class FileSourceConfig extends SourceConfig

    Base class for file source configurations. All file sources are streamable and therefore must contain a windowBy parameter, which defines the source of timestamp used to build stream windows.

  6. final case class FilterVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], windowBy: Option[StreamWindowing], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable

    Filter virtual source configuration

    id

    Virtual source ID

    description

    Source description

    parentSources

    Sequence containing exactly one source.

    expr

    Non-empty sequence of Spark SQL expressions used to filter the source. All expressions must return a boolean. The source is filtered using the logical conjunction of all provided expressions.

    persist

    Spark storage level used to persist the dataframe during job execution.

    save

    Configuration to save virtual source as a file.

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
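
    As an illustration only, the "logical conjunction" rule above could be expressed in plain Spark by reducing the boolean expressions with and; the helper below is an assumed sketch, not the framework's internal code.

      import org.apache.spark.sql.Column
      import org.apache.spark.sql.functions.lit

      // Hypothetical helper: combine all filter expressions into a single predicate.
      def combineFilters(exprs: Seq[Column]): Column =
        exprs.reduceOption(_ and _).getOrElse(lit(true))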

  7. final case class FixedFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with FixedFileConfig with Product with Serializable

    Fixed-width file source configuration

    id

    Source ID

    description

    Source description

    path

    Path to file

    schema

    Schema ID (must be either fixedFull or fixedShort schema)

    persist

    Spark storage level used to persist the dataframe during job execution.

    windowBy

    Source of timestamp used to build windows. Applicable only for streaming jobs! Default: processingTime - uses the current timestamp at the moment when Spark processes the row. Other options are:

    • eventTime - uses the column with name 'timestamp' (column must be of TimestampType).
    • customTime(columnName) - uses an arbitrary user-defined column (column must be of TimestampType).

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source

  8. final case class GreenplumSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, table: Option[NonEmptyString], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable

    Greenplum Table source configuration

    id

    Source ID

    description

    Source description

    connection

    Connection ID (must be pivotal connection)

    table

    Table to read

    persist

    Spark storage level used to persist the dataframe during job execution.

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source

  9. final case class HivePartition(name: NonEmptyString, expr: Option[Column], values: Seq[NonEmptyString] = Seq.empty) extends Product with Serializable

    Configuration for Hive Table partition values to read.

    name

    Name of partition column

    expr

    SQL Expression used to filter partitions to read

    values

    Sequence of partition values to read
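
    As an illustration only, the sketch below shows one assumed way a partition entry could be turned into a pruning predicate: an explicit SQL expression wins, otherwise the listed partition values are matched with isin. The helper is hypothetical, not the framework's internal code.

      import org.apache.spark.sql.Column
      import org.apache.spark.sql.functions.col

      // Hypothetical helper: build a partition filter from either expr or values.
      def partitionPredicate(name: String, expr: Option[Column], values: Seq[String]): Column =
        expr.getOrElse(col(name).isin(values: _*))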

  10. final case class HiveSourceConfig(id: ID, description: Option[NonEmptyString], schema: NonEmptyString, table: NonEmptyString, persist: Option[StorageLevel], partitions: Seq[HivePartition] = Seq.empty, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable

    Hive table source configuration

    id

    Source ID

    description

    Source description

    schema

    Hive schema

    table

    Hive table

    persist

    Spark storage level used to persist the dataframe during job execution.

    partitions

    Sequence of partitions to read. The order of partition columns should correspond to the order in which they are defined in the Hive table DDL.

    options

    List of additional Spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source

  11. final case class JoinVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: DoubleElemStringSeq, joinBy: NonEmptyStringSeq, joinType: SparkJoinType, persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable

    Join virtual source configuration

    id

    Virtual source ID

    description

    Source description

    parentSources

    Sequence of exactly two parent sources.

    joinBy

    Non-empty sequence of columns to join by.

    joinType

    Spark join type.

    persist

    Spark storage level used to persist the dataframe during job execution.

    save

    Configuration to save virtual source as a file.

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
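
    As an illustration only, joinBy and joinType correspond closely to Spark's column-based join API; the sketch below is an assumed mapping, not the framework's internal code.

      import org.apache.spark.sql.DataFrame

      // Hypothetical helper: join the two parent sources by the listed columns.
      def joinParents(left: DataFrame, right: DataFrame,
                      joinBy: Seq[String], joinType: String): DataFrame =
        left.join(right, joinBy, joinType)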

  12. final case class KafkaSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, topics: Seq[NonEmptyString] = Seq.empty, topicPattern: Option[NonEmptyString], startingOffsets: Option[NonEmptyString], endingOffsets: Option[NonEmptyString], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, keyFormat: KafkaTopicFormat = KafkaTopicFormat.String, valueFormat: KafkaTopicFormat = KafkaTopicFormat.String, keySchema: Option[ID] = None, valueSchema: Option[ID] = None, subtractSchemaId: Boolean = false, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable

    Kafka source configuration

    id

    Source ID

    description

    Source description

    connection

    Connection ID (must be a Kafka Connection)

    topics

    Sequence of topics to read

    topicPattern

    Pattern that defines topics to read

    startingOffsets

    JSON string defining starting offsets. If none is set, then "earliest" is used in batch jobs and "latest" is used in streaming jobs.

    endingOffsets

    JSON string defining ending offsets. Applicable only to batch jobs. If none is set, then "latest" is used.

    persist

    Spark storage level used to persist the dataframe during job execution.

    windowBy

    Source of timestamp used to build windows. Applicable only for streaming jobs! Default: processingTime - uses the current timestamp at the moment when Spark processes the row. Other options are:

    • eventTime - uses the Kafka message creation timestamp.
    • customTime(columnName) - uses an arbitrary user-defined column from the Kafka message (column must be of TimestampType).

    keyFormat

    Message key format. Default: string.

    valueFormat

    Message value format. Default: string.

    keySchema

    Schema ID used to parse the message key. Ignored when keyFormat is string. Mandatory for other formats.

    valueSchema

    Schema ID used to parse the message value. Ignored when valueFormat is string. Mandatory for other formats.

    subtractSchemaId

    Boolean flag indicating whether the schema ID is encoded into the Kafka message value, i.e. [1 Magic Byte] + [4 Schema ID Bytes] + [Message Value Binary Data]. If set to true, then the first five bytes are subtracted before value parsing. Default: false

    options

    Sequence of additional Kafka options

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
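
    As an illustration only, the byte layout described for subtractSchemaId can be pictured as dropping the first five bytes (one magic byte plus a four-byte schema ID) before the remaining payload is parsed according to valueFormat. The helper below is a hypothetical sketch of that step.

      // Hypothetical helper: strip the magic byte and 4-byte schema ID from a message value.
      def stripSchemaId(value: Array[Byte]): Array[Byte] = value.drop(5)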

  13. final case class OrcFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with OrcFileConfig with Product with Serializable

    Orc file source configuration

    id

    Source ID

    description

    Source description

    path

    Path to file

    schema

    Schema ID

    persist

    Spark storage level used to persist the dataframe during job execution.

    windowBy

    Source of timestamp used to build windows. Applicable only for streaming jobs! Default: processingTime - uses the current timestamp at the moment when Spark processes the row. Other options are:

    • eventTime - uses the column with name 'timestamp' (column must be of TimestampType).
    • customTime(columnName) - uses an arbitrary user-defined column (column must be of TimestampType).

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source

  14. final case class ParquetFileSourceConfig(id: ID, description: Option[NonEmptyString], path: URI, schema: Option[ID], persist: Option[StorageLevel], windowBy: StreamWindowing = ProcessingTime, options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends FileSourceConfig with ParquetFileConfig with Product with Serializable

    Parquet file source configuration

    id

    Source ID

    description

    Source description

    path

    Path to file

    schema

    Schema ID

    persist

    Spark storage level used to persist the dataframe during job execution.

    windowBy

    Source of timestamp used to build windows. Applicable only for streaming jobs! Default: processingTime - uses the current timestamp at the moment when Spark processes the row. Other options are:

    • eventTime - uses the column with name 'timestamp' (column must be of TimestampType).
    • customTime(columnName) - uses an arbitrary user-defined column (column must be of TimestampType).

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source

  15. final case class SelectVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: SingleElemStringSeq, expr: Refined[Seq[Column], NonEmpty], persist: Option[StorageLevel], save: Option[FileOutputConfig], windowBy: Option[StreamWindowing], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable

    Select virtual source configuration

    id

    Virtual source ID

    description

    Source description

    parentSources

    Sequence containing exactly one source.

    expr

    Non-empty sequence of Spark SQL expressions used to select columns from the parent source. One expression per resultant column.

    persist

    Spark storage level used to persist the dataframe during job execution.

    save

    Configuration to save virtual source as a file.

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
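
    As an illustration only, the expr parameter maps naturally onto a plain Spark select with one Column expression per resulting column; the sketch below is assumed, not the framework's internal code.

      import org.apache.spark.sql.{Column, DataFrame}

      // Hypothetical helper: project the parent source with the configured expressions.
      def selectColumns(parent: DataFrame, exprs: Seq[Column]): DataFrame =
        parent.select(exprs: _*)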

  16. sealed abstract class SourceConfig extends JobConfigEntity

    Base class for all source configurations. All sources are described as DQ entities that may have an optional sequence of keyFields, which uniquely identify a data row in error collection reports. Each source also indicates whether it is streamable or not. Additionally, the persist field specifies an optional storage level used to persist the source during job execution.
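
    As an illustration only, the sealed hierarchy makes it convenient to dispatch on the concrete source type; the sketch below assumes the subclasses listed on this page and a few of their documented fields, and is not part of the framework's API.

      // Hypothetical helper: describe a source configuration, e.g. for logging.
      def describe(src: SourceConfig): String = src match {
        case h: HiveSourceConfig      => s"hive table ${h.schema}.${h.table}"
        case _: TableSourceConfig     => "jdbc table source"
        case _: KafkaSourceConfig     => "kafka source"
        case _: GreenplumSourceConfig => "greenplum source"
        case c: CustomSource          => s"custom source (format: ${c.format})"
        case _: FileSourceConfig      => "file source"
        case _: VirtualSourceConfig   => "virtual source"
        case other                    => s"source: $other"
      }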

  17. final case class SourcesConfig(table: Seq[TableSourceConfig] = Seq.empty, hive: Seq[HiveSourceConfig] = Seq.empty, kafka: Seq[KafkaSourceConfig] = Seq.empty, greenplum: Seq[GreenplumSourceConfig] = Seq.empty, file: Seq[FileSourceConfig] = Seq.empty, custom: Seq[CustomSource] = Seq.empty) extends Product with Serializable

    Data Quality job configuration section describing sources

    table

    Sequence of table sources (read from JDBC connections)

    hive

    Sequence of Hive table sources

    kafka

    Sequence of sources based on Kafka topics

    greenplum

    Sequence of greenplum sources (read from pivotal connections)

    file

    Sequence of file sources

    custom

    Sequence of custom sources
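
    As an illustration only, because every parameter of SourcesConfig defaults to Seq.empty, a configuration can list only the source kinds it needs, and all entries can be flattened into a single sequence of SourceConfig values. The helper below is hypothetical, not part of the API.

      // Hypothetical helper: collect all configured sources into one sequence.
      def allSources(cfg: SourcesConfig): Seq[SourceConfig] =
        cfg.table ++ cfg.hive ++ cfg.kafka ++ cfg.greenplum ++ cfg.file ++ cfg.custom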

  18. final case class SqlVirtualSourceConfig(id: ID, description: Option[NonEmptyString], parentSources: NonEmptyStringSeq, query: NonEmptyString, persist: Option[StorageLevel], save: Option[FileOutputConfig], keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends VirtualSourceConfig with Product with Serializable

    Sql virtual source configuration

    id

    Virtual source ID

    description

    Source description

    parentSources

    Non-empty sequence of parent sources

    query

    SQL query to build virtual source from parent sources

    persist

    Spark storage level used to persist the dataframe during job execution.

    save

    Configuration to save virtual source as a file.

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source
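
    As an illustration only, an SQL virtual source can be pictured as registering each parent source as a temporary view under its ID and executing the configured query against those views; the sketch below assumes that mapping and is not the framework's internal code.

      import org.apache.spark.sql.{DataFrame, SparkSession}

      // Hypothetical helper: expose parent sources as temp views and run the query.
      def buildSqlSource(spark: SparkSession, parents: Map[String, DataFrame], query: String): DataFrame = {
        parents.foreach { case (id, df) => df.createOrReplaceTempView(id) }
        spark.sql(query)
      }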

  19. final case class StreamSourcesConfig(kafka: Seq[KafkaSourceConfig] = Seq.empty, file: Seq[FileSourceConfig] = Seq.empty) extends Product with Serializable

    Data Quality job configuration section describing streams

    kafka

    Sequence of streams based on Kafka topics

    file

    Sequence of streams based on file sources

  20. final case class TableSourceConfig(id: ID, description: Option[NonEmptyString], connection: ID, table: Option[NonEmptyString], query: Option[NonEmptyString], persist: Option[StorageLevel], options: Seq[SparkParam] = Seq.empty, keyFields: Seq[NonEmptyString] = Seq.empty, metadata: Seq[SparkParam] = Seq.empty) extends SourceConfig with Product with Serializable

    JDBC Table source configuration

    id

    Source ID

    description

    Source description

    connection

    Connection ID (must be JDBC connection)

    table

    Table to read

    query

    Query to execute

    persist

    Spark storage level used to persist the dataframe during job execution.

    options

    List of additional spark options required to read the source (if any)

    keyFields

    Sequence of key fields (columns that identify data row)

    metadata

    List of metadata parameters specific to this source

    Note

    Either the table to read or the query to execute must be defined, but not both.
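
    As an illustration only, the "table or query, but not both" rule from the note can be checked with a simple exclusive-or over the two options; the helper below is hypothetical.

      // Hypothetical check: exactly one of table or query must be defined.
      def hasValidTarget(cfg: TableSourceConfig): Boolean =
        cfg.table.isDefined ^ cfg.query.isDefined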

  21. sealed abstract class VirtualSourceConfig extends SourceConfig

    Base class for all virtual source configurations. In addition to the basic source configuration, virtual sources may have the following optional parameter:

    • save configuration used to save the virtual source as a file.

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  15. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  16. def toString(): String
    Definition Classes
    AnyRef → Any
  17. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
