Data Source

A data source represents some external data storage system.

Supported systems currently include:

  • File Source (including S3, HDFS, PARQUET)
  • Snowflake
  • Redshift
  • SQL (JDBC)

For example, a Snowflake data source can be declared as follows:

main.py
from glacius import SnowflakeSource

# Declare a Snowflake table as a data source.
item_engagement_data_source = SnowflakeSource(
    name="GLOBAL_ITEM_ENGAGEMENT_DATA",
    description="item engagement data",
    timestamp_col="timestamp",
    table="STREAMING_DATA_BENCHMARK_M",
    database="gradiently",
    schema="public",
)
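
As a side note on how table, database, and schema relate: Snowflake addresses a table by a three-part name of the form database.schema.table. The following is a minimal sketch, assuming the three arguments above map onto that name. The `SourceLocation` class and its `qualified_name` method are hypothetical helpers for illustration and are not part of glacius.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical helper (not a glacius class) holding a table's location."""

    database: str
    schema: str
    table: str

    def qualified_name(self) -> str:
        # Join the parts in Snowflake's database.schema.table order.
        return f"{self.database}.{self.schema}.{self.table}"


location = SourceLocation(
    database="gradiently",
    schema="public",
    table="STREAMING_DATA_BENCHMARK_M",
)
print(location.qualified_name())
```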

Optionally, you can also pass a Spark SQL query to the data source to transform its data before processing.

main.py
from glacius import SnowflakeSource

item_engagement_data_source = SnowflakeSource(
    name="GLOBAL_ITEM_ENGAGEMENT_DATA",
    description="item engagement data",
    timestamp_col="timestamp",
    table="STREAMING_DATA_BENCHMARK_M",
    database="gradiently",
    schema="public",
    # Spark SQL query applied to the source before processing.
    query="SELECT * FROM table WHERE request_date >= '2023-01-01'",
)
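
To see what a filter query of this shape does to the rows, here is a small sketch that runs the same kind of WHERE clause against an in-memory table. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, so it only illustrates the filtering and not how glacius executes the query. The `engagement` table, its columns other than `request_date`, and the sample rows are all made up for the example.

```python
import sqlite3

# Stand-in for the source table; the table name and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE engagement (item_id TEXT, request_date TEXT)")
conn.executemany(
    "INSERT INTO engagement VALUES (?, ?)",
    [("a", "2022-12-31"), ("b", "2023-01-01"), ("c", "2023-06-15")],
)

# Same predicate as the query passed to the data source above.
# ISO-formatted date strings compare correctly as plain text.
rows = conn.execute(
    "SELECT * FROM engagement "
    "WHERE request_date >= '2023-01-01' "
    "ORDER BY item_id"
).fetchall()

print(rows)
```

Only the rows dated on or after 2023-01-01 remain, so downstream processing never sees the older record.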