# Quick Start Guide

This guide will help you get started with the PSR Lakehouse client.
## Basic Usage

### Import and Fetch Data
```python
from psr.lakehouse import client

# Fetch CCEE spot price data
df = client.fetch_dataframe(
    table_name="ccee_spot_price",
    data_columns=["reference_date", "subsystem", "spot_price"],
    start_reference_date="2023-05-01",
    end_reference_date="2023-05-02",
)
print(df)
```
The result is a pandas DataFrame with the query results.
## Filtering Data

### Filter by Column Values

Use the `filters` parameter to filter by exact column values:
```python
# Filter by subsystem
df = client.fetch_dataframe(
    table_name="ons_stored_energy_subsystem",
    data_columns=["reference_date", "subsystem", "max_stored_energy", "verified_stored_energy_percentage"],
    start_reference_date="2023-05-01",
    end_reference_date="2023-05-02",
    filters={"subsystem": "SOUTHEAST"},
)
```
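Conceptually, the `filters` parameter behaves like an exact-match boolean mask applied to the result. A local pandas sketch (the sample frame below is illustrative, not real ONS data):

```python
import pandas as pd

# Illustrative frame with columns like those the query above returns
df = pd.DataFrame({
    "reference_date": ["2023-05-01", "2023-05-01", "2023-05-02"],
    "subsystem": ["SOUTHEAST", "SOUTH", "SOUTHEAST"],
    "max_stored_energy": [204000.0, 21000.0, 204000.0],
})

# filters={"subsystem": "SOUTHEAST"} corresponds to this exact-match mask
southeast = df[df["subsystem"] == "SOUTHEAST"]
print(southeast)
```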
### Date Range Filtering

Use `start_reference_date` and `end_reference_date` to filter data by date:
```python
df = client.fetch_dataframe(
    table_name="ccee_spot_price",
    data_columns=["reference_date", "spot_price"],
    start_reference_date="2023-01-01",  # inclusive
    end_reference_date="2023-02-01",    # exclusive
)
```
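The inclusive start / exclusive end pair describes a half-open interval. A pandas sketch of the same semantics (dates and prices below are made up for illustration):

```python
import pandas as pd

dates = pd.to_datetime(["2023-01-01", "2023-01-15", "2023-02-01"])
df = pd.DataFrame({"reference_date": dates, "spot_price": [69.0, 70.5, 68.2]})

# start is inclusive (>=), end is exclusive (<)
start, end = pd.Timestamp("2023-01-01"), pd.Timestamp("2023-02-01")
mask = (df["reference_date"] >= start) & (df["reference_date"] < end)
print(df[mask])  # keeps 2023-01-01 and 2023-01-15; drops 2023-02-01
```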
## Aggregating Data

### Group and Aggregate

Use `group_by` and `aggregation_method` to aggregate data:
```python
# Group by subsystem and calculate average spot price
df = client.fetch_dataframe(
    table_name="ccee_spot_price",
    data_columns=["spot_price"],
    start_reference_date="2023-01-01",
    end_reference_date="2023-02-01",
    group_by=["subsystem"],
    aggregation_method="avg",
)
```
Available aggregation methods:

- `sum` - Sum of values
- `avg` - Average of values
- `min` - Minimum value
- `max` - Maximum value
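These map onto familiar pandas reductions, with `avg` corresponding to `mean`. A local sketch of what the grouped query computes (sample prices are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "subsystem": ["SOUTHEAST", "SOUTHEAST", "SOUTH"],
    "spot_price": [70.0, 80.0, 65.0],
})

# group_by=["subsystem"], aggregation_method="avg"
avg_price = df.groupby("subsystem")["spot_price"].mean()
print(avg_price)
# SOUTH        65.0
# SOUTHEAST    75.0
```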
### Temporal Aggregation

Use `datetime_granularity` to aggregate datetime data to a coarser granularity:
```python
# Aggregate hourly data to daily
df = client.fetch_dataframe(
    table_name="ons_power_plant_hourly_generation",
    data_columns=["plant_type", "generation"],
    start_reference_date="2025-01-01",
    end_reference_date="2025-01-31",
    group_by=["reference_date", "plant_type"],
    aggregation_method="sum",
    datetime_granularity="day",  # Options: hour, day, week, month
)
```
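Conceptually, `datetime_granularity="day"` truncates timestamps to day boundaries before grouping, much like a daily `pd.Grouper` in pandas (illustrative hourly values):

```python
import pandas as pd

df = pd.DataFrame({
    "reference_date": pd.to_datetime(
        ["2025-01-01 00:00", "2025-01-01 01:00", "2025-01-02 00:00"]
    ),
    "plant_type": ["HYDRO", "HYDRO", "HYDRO"],
    "generation": [100.0, 120.0, 90.0],
})

# Hourly generation summed per day and plant type
daily = (
    df.groupby([pd.Grouper(key="reference_date", freq="D"), "plant_type"])
    ["generation"].sum()
)
print(daily)
```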
## Ordering Results

### Sort By Multiple Columns

Use the `order_by` parameter to sort results:
```python
df = client.fetch_dataframe(
    table_name="ons_power_plant_hourly_generation",
    data_columns=["plant_type", "generation"],
    group_by=["reference_date", "plant_type"],
    aggregation_method="sum",
    datetime_granularity="day",
    order_by=[
        {"column": "reference_date", "direction": "desc"},
        {"column": "plant_type", "direction": "asc"},
    ],
)
```
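The `order_by` spec above is a multi-key sort: descending on date, then ascending on plant type. In pandas terms this would be a `sort_values` call (sample data is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "reference_date": ["2025-01-01", "2025-01-02", "2025-01-02"],
    "plant_type": ["WIND", "WIND", "HYDRO"],
    "generation": [50.0, 55.0, 200.0],
})

# desc on reference_date, asc on plant_type
ordered = df.sort_values(
    by=["reference_date", "plant_type"], ascending=[False, True]
).reset_index(drop=True)
print(ordered)
```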
## Schema Discovery

### List Available Tables
```python
tables = client.list_tables()
print(tables)
# Output: ['CCEESpotPrice', 'ONSEnergyLoadDaily', 'ONSStoredEnergySubsystem', ...]
```
### Get Table Schema

Retrieve detailed schema information including field types, descriptions, and allowed values:
```python
schema = client.get_schema("ccee_spot_price")
print(schema)
```
Example schema output:

```python
{
    'id': {
        'type': 'integer',
        'nullable': True,
        'title': 'Id'
    },
    'reference_date': {
        'type': 'string',
        'nullable': False,
        'format': 'date-time',
        'title': 'Reference Date',
        'description': 'Timestamp of the spot price'
    },
    'subsystem': {
        'type': 'enum',
        'nullable': True,
        'description': 'Subsystem identifier',
        'enum_values': ['NORTE', 'NORDESTE', 'SUDESTE', 'SUL', 'SISTEMA INTERLIGADO NACIONAL']
    },
    'spot_price': {
        'type': 'number',
        'nullable': False,
        'title': 'Spot Price',
        'description': 'Spot price in R$/MWh'
    }
}
```
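Because the schema is a plain Python dictionary, it can be inspected programmatically, for example to list non-nullable fields or the allowed values of enum-typed columns. A sketch using the example schema above:

```python
# Example schema dictionary, as returned in the output above
schema = {
    "id": {"type": "integer", "nullable": True, "title": "Id"},
    "reference_date": {"type": "string", "nullable": False, "format": "date-time"},
    "subsystem": {
        "type": "enum",
        "nullable": True,
        "enum_values": ["NORTE", "NORDESTE", "SUDESTE", "SUL",
                        "SISTEMA INTERLIGADO NACIONAL"],
    },
    "spot_price": {"type": "number", "nullable": False},
}

# Fields that can never be null
required = [name for name, info in schema.items() if not info["nullable"]]
print(required)  # ['reference_date', 'spot_price']

# Allowed values for every enum-typed field
enums = {name: info["enum_values"]
         for name, info in schema.items() if info["type"] == "enum"}
print(enums["subsystem"])
```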
## Next Steps

- Check out the Advanced Examples page for more advanced use cases
- Read the API Reference for complete method documentation
- Explore available tables using `client.list_tables()`