Time Series
It is possible to store time series data in your database. Time series in PSRDatabaseSQLite
are very flexible. You can have missing values, and you can have sparse data.
There is a specific table format that must be followed. Consider the following example:
CREATE TABLE Resource (
id INTEGER PRIMARY KEY AUTOINCREMENT,
label TEXT UNIQUE NOT NULL
) STRICT;
CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
It is mandatory for a time series to be indexed by a date_time
column with the following format: YYYY-MM-DD HH:MM:SS
. You can use the Dates.jl
package for handling this format.
using Dates
date = DateTime(2024, 3, 1) # 2024-03-01T00:00:00 (March 1st, 2024)
Notice that in this example, there are two value columns some_vector1
and some_vector2
. You can have as many value columns as you want. You can also separate the time series data into different tables, by creating a table Resource_time_series_group2
for example.
It is also possible to add more dimensions to your time series, such as block
and scenario
.
CREATE TABLE Resource_time_series_group2 (
id INTEGER,
date_time TEXT NOT NULL,
block INTEGER NOT NULL,
some_vector3 REAL,
some_vector4 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time, block)
) STRICT;
Rules
Time series in PSRDatabaseSQLite
are very flexible. You can have missing values, and you can have sparse data.
If you are querying for a time series row entry that has a missing value, it first checks if there is a data with a date_time
earlier than the queried date_time
. If there is, it returns the value of the previous data. If there is no data earlier than the queried date_time
, it returns a specified value according to the type of data you are querying.
- For
Float64
, it returnsNaN
. - For
Int64
, it returnstypemin(Int)
. - For
String
, it returns""
(empty String). - For
DateTime
, it returnstypemin(DateTime)
.
For example, if you have the following data:
Date | some_vector1(Float64) | some_vector2(Float64) |
---|---|---|
2020 | 1.0 | missing |
2021 | missing | 1.0 |
2022 | 3.0 | missing |
- If you query for
some_vector1
at2020
, it returns1.0
. - If you query for
some_vector2
at2020
, it returnsNaN
. - If you query for
some_vector1
at2021
, it returns1.0
. - If you query for
some_vector2
at2021
, it returns1.0
. - If you query for
some_vector1
at2022
, it returns3.0
. - If you query for
some_vector2
at2022
, it returns1.0
.
Inserting data
When creating a new element that has a time series, you can pass this information via a DataFrame
. Consider the collection Resource
with the two time series tables Resource_time_series_group1
and Resource_time_series_group2
.
using DataFrames
using Dates
using PSRClassesInterface
PSRDatabaseSQLite = PSRClassesInterface.PSRDatabaseSQLite
db = PSRDatabaseSQLite.create_empty_db_from_schema(db_path, path_schema; force = true)
PSRDatabaseSQLite.create_element!(db, "Configuration"; label = "Toy Case", value1 = 1.0)
df_group1 = DataFrame(;
date_time = [DateTime(2000), DateTime(2001), DateTime(2002)],
some_vector1 = [missing, 1.0, 2.0],
some_vector2 = [1.0, missing, 5.0],
)
df_group2 = DataFrame(;
date_time = [
DateTime(2000),
DateTime(2000),
DateTime(2000),
DateTime(2000),
DateTime(2001),
DateTime(2001),
DateTime(2001),
DateTime(2009),
],
block = [1, 1, 1, 1, 2, 2, 2, 2],
some_vector3 = [1.0, 2.0, 3.0, 4.0, 1, 2, 3, 4],
some_vector4 = [1.0, 2.0, 3.0, 4.0, 1, 2, 3, 4],
)
PSRDatabaseSQLite.create_element!(
db,
"Resource";
label = "Resource 1",
group1 = df_group1,
group2 = df_group2,
)
It is also possible to insert a single row of a time series. This is useful when you want to insert a specific dimension entry. This way of inserting time series is less efficient than inserting a whole DataFrame
.
using DataFrames
using Dates
using PSRClassesInterface
PSRDatabaseSQLite = PSRClassesInterface.PSRDatabaseSQLite
db = PSRDatabaseSQLite.create_empty_db_from_schema(db_path, path_schema; force = true)
PSRDatabaseSQLite.create_element!(db, "Configuration"; label = "Toy Case", value1 = 1.0)
PSRDatabaseSQLite.create_element!(
db,
"Resource";
label = "Resource 1"
)
PSRDatabaseSQLite.add_time_series_row!(
db,
"Resource",
"some_vector1",
"Resource 1",
10.0; # new value
date_time = DateTime(2000)
)
PSRDatabaseSQLite.add_time_series_row!(
db,
"Resource",
"some_vector1",
"Resource 1",
11.0; # new value
date_time = DateTime(2001)
)
Reading data
You can read the information from the time series in two different ways.
Reading as a table
First, you can read the whole time series table for a given value, as a DataFrame
.
df = PSRDatabaseSQLite.read_time_series_table(
db,
"Resource",
"some_vector1",
"Resource 1",
)
Reading a single row
It is also possible to read a single row of the time series in the form of an array. This is useful when you want to query a specific dimension entry. For this function, there are performance improvements when reading the data via caching the previous and next non-missing values.
values = PSRDatabaseSQLite.read_time_series_row(
db,
"Resource",
"some_vector1",
Float64;
date_time = DateTime(2020)
)
When querying a row, all values should non-missing. However, if there is a missing value, the function will return the previous non-missing value. And if even the previous value is missing, it will return a specified value according to the type of data you are querying.
- For
Float64
, it returnsNaN
. - For
Int64
, it returnstypemin(Int)
. - For
String
, it returns""
(empty String). - For
DateTime
, it returnstypemin(DateTime)
.
For example, if you have the following data for the time series some_vector1
:
Date | Resource 1 | Resource 2 |
---|---|---|
2020 | 1.0 | missing |
2021 | missing | 1.0 |
2022 | 3.0 | missing |
- If you query at
2020
, it returns[1.0, NaN]
. - If you query at
2021
, it returns[1.0, 1.0]
. - If you query at
2022
, it returns[3.0, 1.0]
.
Updating data
When updating one of the entries of a time series for a given element and attribute, you need to specify the exact dimension values of the row you want to update.
For example, consider a time series that has block
and data_time
dimensions.
PSRDatabaseSQLite.update_time_series_row!(
db,
"Resource",
"some_vector3",
"Resource 1",
10.0; # new value
date_time = DateTime(2000),
block = 1
)
Deleting data
You can delete the whole time series of an element for a given time series group. Consider the following table:
CREATE TABLE Resource_time_series_group1 (
id INTEGER,
date_time TEXT NOT NULL,
some_vector1 REAL,
some_vector2 REAL,
FOREIGN KEY(id) REFERENCES Resource(id) ON DELETE CASCADE ON UPDATE CASCADE,
PRIMARY KEY (id, date_time)
) STRICT;
This table represents a "group" that stores two time series some_vector1
and some_vector2
. You can delete all the data from this group by calling the following function:
PSRDatabaseSQLite.delete_time_series!(
db,
"Resource",
"group1",
"Resource 1",
)
When trying to read a time series that has been deleted, the function will return an empty DataFrame
.