Data Source
A data source represents an external data storage system.
Supported systems currently include:
- File Source (including S3, HDFS, Parquet)
- Snowflake
- Redshift
- SQL (JDBC)
main.py
from glacius import SnowflakeSource

item_engagement_data_source = SnowflakeSource(
    name="GLOBAL_ITEM_ENGAGEMENT_DATA",
    description="item engagement data",
    timestamp_col="timestamp",
    table="STREAMING_DATA_BENCHMARK_M",
    database="gradiently",
    schema="public",
)
Optionally, you can also pass a Spark SQL query to the data source through the query argument to transform the data before processing.
main.py
from glacius import SnowflakeSource

item_engagement_data_source = SnowflakeSource(
    name="GLOBAL_ITEM_ENGAGEMENT_DATA",
    description="item engagement data",
    timestamp_col="timestamp",
    table="STREAMING_DATA_BENCHMARK_M",
    database="gradiently",
    schema="public",
    # Spark SQL applied to the source data before processing
    query="SELECT * FROM table WHERE request_date >= '2023-01-01'",
)
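The cutoff date in the query above is hard-coded. If it needs to change between runs, one option is to build the query string from a date value before passing it to the data source. The following is a minimal sketch under that assumption; build_engagement_query is a hypothetical helper written for illustration and is not part of glacius.

```python
from datetime import date


def build_engagement_query(start_date: date) -> str:
    """Return a Spark SQL query keeping rows on or after start_date.

    Hypothetical helper, not part of glacius. The date is rendered as
    YYYY-MM-DD, matching the literal used in the example above.
    """
    cutoff = start_date.strftime("%Y-%m-%d")
    return f"SELECT * FROM table WHERE request_date >= '{cutoff}'"


# Reproduces the query string from the example above.
query = build_engagement_query(date(2023, 1, 1))
print(query)
```

The resulting string can then be supplied as the query argument in place of the literal.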