Quick Start Guide

This guide will help you get started with the PSR Lakehouse client.

Basic Usage

Import and Fetch Data

from psr.lakehouse import client

# Fetch CCEE spot price data
df = client.fetch_dataframe(
    table_name="ccee_spot_price",
    data_columns=["reference_date", "subsystem", "spot_price"],
    start_reference_date="2023-05-01",
    end_reference_date="2023-05-02",
)
print(df)

fetch_dataframe returns the query results as a pandas DataFrame.
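Once fetched, the DataFrame can be reshaped with ordinary pandas operations. The snippet below is a minimal sketch: it builds a small frame with the same columns as the example result (the values are hypothetical, not real spot prices) and pivots it so each subsystem becomes its own column:

```python
import pandas as pd

# Hypothetical rows standing in for a fetch_dataframe result
df = pd.DataFrame({
    "reference_date": pd.to_datetime(
        ["2023-05-01", "2023-05-01", "2023-05-02", "2023-05-02"]
    ),
    "subsystem": ["SUDESTE", "SUL", "SUDESTE", "SUL"],
    "spot_price": [120.5, 118.3, 125.0, 121.7],
})

# One column per subsystem, indexed by date
wide = df.pivot(index="reference_date", columns="subsystem", values="spot_price")
print(wide)
```

This pattern is handy for plotting one line per subsystem directly with wide.plot().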

Filtering Data

Filter by Column Values

Use the filters parameter to filter by exact column values:

# Filter by subsystem
df = client.fetch_dataframe(
    table_name="ons_stored_energy_subsystem",
    data_columns=["reference_date", "subsystem", "max_stored_energy", "verified_stored_energy_percentage"],
    start_reference_date="2023-05-01",
    end_reference_date="2023-05-02",
    filters={"subsystem": "SOUTHEAST"},
)

Date Range Filtering

Use start_reference_date and end_reference_date to filter data by date:

df = client.fetch_dataframe(
    table_name="ccee_spot_price",
    data_columns=["reference_date", "spot_price"],
    start_reference_date="2023-01-01",  # inclusive
    end_reference_date="2023-02-01",    # exclusive
)
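Because end_reference_date is exclusive, consecutive date windows can share a boundary without fetching any row twice. A minimal sketch (plain Python, no client calls) that generates month-long [start, end) windows suitable for chunked fetching:

```python
from datetime import date

def month_windows(start: date, end: date):
    """Yield (start, end) pairs covering [start, end) one calendar month at a time."""
    current = start
    while current < end:
        # First day of the following month
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        yield current, min(nxt, end)
        current = nxt

windows = list(month_windows(date(2023, 1, 1), date(2023, 4, 1)))
# Each window's end is the next window's start, so the [start, end) windows tile
# the full range with no gaps and no overlaps.
```

Each (start, end) pair can then be passed as start_reference_date / end_reference_date in a loop.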

Aggregating Data

Group and Aggregate

Use group_by and aggregation_method to aggregate data:

# Group by subsystem and calculate average spot price
df = client.fetch_dataframe(
    table_name="ccee_spot_price",
    data_columns=["spot_price"],
    start_reference_date="2023-01-01",
    end_reference_date="2023-02-01",
    group_by=["subsystem"],
    aggregation_method="avg",
)

Available aggregation methods:

  • sum - Sum of values

  • avg - Average of values

  • min - Minimum value

  • max - Maximum value
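Server-side aggregation should mirror what pandas produces locally. As a rough cross-check, the sketch below applies local equivalents of the four methods to hypothetical spot-price rows; note that pandas spells "avg" as "mean":

```python
import pandas as pd

# Hypothetical rows standing in for raw per-row spot-price data
df = pd.DataFrame({
    "subsystem": ["SUDESTE", "SUDESTE", "SUL", "SUL"],
    "spot_price": [100.0, 200.0, 80.0, 120.0],
})

# Local equivalents of aggregation_method: sum, avg (pandas "mean"), min, max
agg = df.groupby("subsystem")["spot_price"].agg(["sum", "mean", "min", "max"])
print(agg)
```

A server-side call with aggregation_method="avg" and group_by=["subsystem"] would be expected to match the "mean" column here.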

Temporal Aggregation

Use datetime_granularity to aggregate datetime data to a coarser granularity:

# Aggregate hourly data to daily
df = client.fetch_dataframe(
    table_name="ons_power_plant_hourly_generation",
    data_columns=["plant_type", "generation"],
    start_reference_date="2025-01-01",
    end_reference_date="2025-01-31",
    group_by=["reference_date", "plant_type"],
    aggregation_method="sum",
    datetime_granularity="day",  # Options: hour, day, week, month
)

Ordering Results

Sort By Multiple Columns

Use the order_by parameter to sort results:

df = client.fetch_dataframe(
    table_name="ons_power_plant_hourly_generation",
    data_columns=["plant_type", "generation"],
    group_by=["reference_date", "plant_type"],
    aggregation_method="sum",
    datetime_granularity="day",
    order_by=[
        {"column": "reference_date", "direction": "desc"},
        {"column": "plant_type", "direction": "asc"},
    ],
)
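The order_by clauses above map onto pandas sort_values with per-column directions, which can be useful when re-sorting a result locally. A small sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({
    "reference_date": ["2025-01-01", "2025-01-01", "2025-01-02"],
    "plant_type": ["WIND", "HYDRO", "WIND"],
    "generation": [10.0, 30.0, 12.0],
})

# desc on reference_date, asc on plant_type -- the same ordering the
# server-side order_by example would produce
ordered = df.sort_values(
    by=["reference_date", "plant_type"],
    ascending=[False, True],
).reset_index(drop=True)
```

The ascending list lines up position-by-position with the by list, mirroring the per-clause "direction" keys in order_by.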

Schema Discovery

List Available Tables

tables = client.list_tables()
print(tables)
# Output: ['CCEESpotPrice', 'ONSEnergyLoadDaily', 'ONSStoredEnergySubsystem', ...]
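Note that list_tables returns CamelCase class names while fetch_dataframe takes snake_case table names. Assuming the mapping is a plain CamelCase-to-snake_case conversion (an assumption, not confirmed by the API; fall back to the schema docs if a name does not resolve), a conversion sketch:

```python
import re

def to_table_name(class_name: str) -> str:
    """Convert a CamelCase class name to a snake_case table name.

    Assumes the API uses a plain case conversion (unverified assumption).
    """
    # Split a trailing acronym from a following word: CCEESpot -> CCEE_Spot
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", class_name)
    # Split a lowercase letter/digit from a following capital: SpotPrice -> Spot_Price
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s).lower()

print(to_table_name("CCEESpotPrice"))  # ccee_spot_price
```

The two-pass regex keeps acronyms like CCEE and ONS as single tokens instead of splitting them letter by letter.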

Get Table Schema

Retrieve detailed schema information including field types, descriptions, and allowed values:

schema = client.get_schema("ccee_spot_price")
print(schema)

Example schema output:

{
    'id': {
        'type': 'integer',
        'nullable': True,
        'title': 'Id'
    },
    'reference_date': {
        'type': 'string',
        'nullable': False,
        'format': 'date-time',
        'title': 'Reference Date',
        'description': 'Timestamp of the spot price'
    },
    'subsystem': {
        'type': 'enum',
        'nullable': True,
        'description': 'Subsystem identifier',
        'enum_values': ['NORTE', 'NORDESTE', 'SUDESTE', 'SUL', 'SISTEMA INTERLIGADO NACIONAL']
    },
    'spot_price': {
        'type': 'number',
        'nullable': False,
        'title': 'Spot Price',
        'description': 'Spot price in R$/MWh'
    }
}
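A schema dict like this can be inspected programmatically, for example to find required (non-nullable) fields or to validate a filter value against a field's enum before querying. A sketch over the example output above (the schema is inlined so the snippet is self-contained):

```python
# The example schema from above, inlined for a self-contained sketch
schema = {
    "id": {"type": "integer", "nullable": True, "title": "Id"},
    "reference_date": {"type": "string", "nullable": False, "format": "date-time",
                       "title": "Reference Date",
                       "description": "Timestamp of the spot price"},
    "subsystem": {"type": "enum", "nullable": True,
                  "description": "Subsystem identifier",
                  "enum_values": ["NORTE", "NORDESTE", "SUDESTE", "SUL",
                                  "SISTEMA INTERLIGADO NACIONAL"]},
    "spot_price": {"type": "number", "nullable": False, "title": "Spot Price",
                   "description": "Spot price in R$/MWh"},
}

# Fields that can never be null
required = [name for name, spec in schema.items() if not spec["nullable"]]

def valid_enum_value(field: str, value: str) -> bool:
    """Check a candidate filter value against the field's enum_values, if any."""
    return value in schema[field].get("enum_values", [])

print(required)                               # ['reference_date', 'spot_price']
print(valid_enum_value("subsystem", "SUL"))   # True
```

Checking filter values against enum_values up front catches typos before they turn into empty query results.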

Next Steps

  • Check out the Advanced Examples page for more complex use cases

  • Read the API Reference for complete method documentation

  • Explore available tables using client.list_tables()