Introducing Jupyter Notebooks in Sphinx

This notebook showcases the basic functionality of rendering your Jupyter notebooks as tutorials inside your Sphinx documentation.

As part of the LINCC Frameworks Python project template, your notebooks will be executed AND rendered at documentation build time.

You can read more about Sphinx, ReadTheDocs, and building notebooks in LINCC’s documentation.

Create a source table and pack it into nested structures and lists
[1]:
import numpy as np
import pandas as pd
from pandas_ts.packer import pack_flat, pack_dfs


# Adapted from
# https://github.com/lincc-frameworks/tape/blob/6a694c4c138aadb1508c2a96de4fa63f90319331/tests/tape_tests/conftest.py#L15
def create_test_rows():
    num_points = 1000
    all_bands = np.array(["g", "r", "i", "z"])

    rows = {
        "id": 8000 + (np.arange(num_points) % 5),
        "time": np.arange(num_points),
        "flux": np.arange(num_points) % len(all_bands),
        # Integer division keeps the repeat count an int
        "band": np.repeat(all_bands, num_points // len(all_bands)),
        "err": 0.1 * (np.arange(num_points) % 10),
        "count": np.arange(num_points),
        # Not sure that I'm ready for Nones
        # "something_else": np.full(num_points, None),
    }

    return rows


sources = pd.DataFrame(create_test_rows())
sources.set_index("id", inplace=True)
sources
[1]:
time flux band err count
id
8000 0 0 g 0.0 0
8001 1 1 g 0.1 1
8002 2 2 g 0.2 2
8003 3 3 g 0.3 3
8004 4 0 g 0.4 4
... ... ... ... ... ...
8000 995 3 z 0.5 995
8001 996 0 z 0.6 996
8002 997 1 z 0.7 997
8003 998 2 z 0.8 998
8004 999 3 z 0.9 999

1000 rows × 5 columns

[2]:
packed = pack_flat(sources, name="sources")
packed
[2]:
8000         time  flux band  err  count
0       0    ...
8001         time  flux band  err  count
0       1    ...
8002         time  flux band  err  count
0       2    ...
8003         time  flux band  err  count
0       3    ...
8004         time  flux band  err  count
0       4    ...
Name: sources, dtype: ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>
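
Conceptually, packing groups the flat rows by their index value and collects each column into a per-object list. The following is a minimal plain-Python sketch of that idea on a 20-row miniature of the table above. It is not the `pack_flat` implementation, and `pack_rows`, `flat_rows`, and `packed_rows` are hypothetical names used only for illustration.

```python
from collections import defaultdict


def pack_rows(rows):
    """Group flat (id, time, flux) rows into one dict of lists per id."""
    packed = defaultdict(lambda: {"time": [], "flux": []})
    for obj_id, time, flux in rows:
        packed[obj_id]["time"].append(time)
        packed[obj_id]["flux"].append(flux)
    return dict(packed)


# A 20-row miniature of the source table: ids cycle through 8000..8004
flat_rows = [(8000 + i % 5, i, i % 4) for i in range(20)]

packed_rows = pack_rows(flat_rows)
print(packed_rows[8000])
```

Each of the five ids ends up with four points, mirroring how every object in the full table receives 200 of the 1000 rows.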

A single item of the packed series is returned as a new DataFrame

[3]:
packed.iloc[0]
[3]:
time flux band err count
0 0 0 g 0.0 0
1 5 1 g 0.5 5
2 10 2 g 0.0 10
3 15 3 g 0.5 15
4 20 0 g 0.0 20
... ... ... ... ... ...
195 975 3 z 0.5 975
196 980 0 z 0.0 980
197 985 1 z 0.5 985
198 990 2 z 0.0 990
199 995 3 z 0.5 995

200 rows × 5 columns

[4]:
# Get the linearly interpolated flux for time=10
packed.apply(lambda df: np.interp(10.0, df["time"], df["flux"]))
[4]:
8000    2.0
8001    2.8
8002    1.2
8003    0.4
8004    1.2
Name: sources, dtype: float64
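
To see where these values come from, here is a minimal plain-Python sketch of the linear interpolation that `np.interp` performs for a single query point. The helper `interp_at` is hypothetical and assumes the times are sorted in increasing order. For object 8001, the samples that bracket time=10 are (6, 2) and (11, 3), since the flux in the source table equals `time % 4`.

```python
from bisect import bisect_right


def interp_at(x, xs, ys):
    """Linearly interpolate ys(xs) at x; xs must be sorted ascending.

    Hypothetical helper for illustration. Queries outside the sampled
    range are clamped to the end points, as np.interp does by default.
    """
    if x <= xs[0]:
        return float(ys[0])
    if x >= xs[-1]:
        return float(ys[-1])
    right = bisect_right(xs, x)  # index of the first sample strictly after x
    left = right - 1
    weight = (x - xs[left]) / (xs[right] - xs[left])
    return ys[left] + weight * (ys[right] - ys[left])


# Rebuild the light curve of object 8001: every fifth point, starting at 1
times = list(range(1, 1000, 5))
fluxes = [t % 4 for t in times]

value_8001 = interp_at(10.0, times, fluxes)
print(round(value_8001, 6))
```

The query point lies 4/5 of the way from time 6 to time 11, so the result is 2 + 0.8 * (3 - 2) = 2.8, matching the row for 8001 above.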

Get the packed sources series and play with the .ts accessor

This series is a collection of structures. Each structure consists of multiple fields, and each field is a “list” of values.
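
This layout can be sketched with plain Python containers. The snippet is a conceptual illustration only and says nothing about how `pandas_ts` stores data internally; the names `nested`, `to_flat_rows`, and `to_lists_rows` are hypothetical.

```python
# One "structure" per object id; each field holds a list of values.
# All lists within one structure have the same length.
nested = {
    8000: {"time": [0, 5, 10], "flux": [0, 1, 2]},
    8001: {"time": [1, 6, 11], "flux": [1, 2, 3]},
}


def to_flat_rows(nested):
    """Explode the nested layout into one (id, time, flux) row per point."""
    rows = []
    for obj_id, fields in nested.items():
        for time, flux in zip(fields["time"], fields["flux"]):
            rows.append((obj_id, time, flux))
    return rows


def to_lists_rows(nested):
    """Keep one row per object, with a whole list in every column."""
    return [
        (obj_id, fields["time"], fields["flux"])
        for obj_id, fields in nested.items()
    ]


flat = to_flat_rows(nested)
lists = to_lists_rows(nested)
print(len(flat), len(lists))
```

The flat view repeats the object id once per point, while the list view keeps one row per object. These are the two shapes that the accessor methods in the following cells return.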

[5]:
packed.ts.to_flat()
[5]:
time flux band err count
8000 0 0 g 0.0 0
8000 5 1 g 0.5 5
8000 10 2 g 0.0 10
8000 15 3 g 0.5 15
8000 20 0 g 0.0 20
... ... ... ... ... ...
8004 979 3 z 0.9 979
8004 984 0 z 0.4 984
8004 989 1 z 0.9 989
8004 994 2 z 0.4 994
8004 999 3 z 0.9 999

1000 rows × 5 columns

[6]:
packed.ts.to_lists()
[6]:
time flux band err count
8000 [ 0 5 10 15 20 25 30 35 40 45 50 ... [0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2... ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... [0. 0.5 0. 0.5 0. 0.5 0. 0.5 0. 0.5 0. 0... [ 0 5 10 15 20 25 30 35 40 45 50 ...
8001 [ 1 6 11 16 21 26 31 36 41 46 51 ... [1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3... ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... [0.1 0.6 0.1 0.6 0.1 0.6 0.1 0.6 0.1 0.6 0.1 0... [ 1 6 11 16 21 26 31 36 41 46 51 ...
8002 [ 2 7 12 17 22 27 32 37 42 47 52 ... [2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0... ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... [0.2 0.7 0.2 0.7 0.2 0.7 0.2 0.7 0.2 0.7 0.2 0... [ 2 7 12 17 22 27 32 37 42 47 52 ...
8003 [ 3 8 13 18 23 28 33 38 43 48 53 ... [3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1... ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... [0.3 0.8 0.3 0.8 0.3 0.8 0.3 0.8 0.3 0.8 0.3 0... [ 3 8 13 18 23 28 33 38 43 48 53 ...
8004 [ 4 9 14 19 24 29 34 39 44 49 54 ... [0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2... ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... [0.4 0.9 0.4 0.9 0.4 0.9 0.4 0.9 0.4 0.9 0.4 0... [ 4 9 14 19 24 29 34 39 44 49 54 ...
[7]:
packed.ts["flux"]
[7]:
8000    0
8000    1
8000    2
8000    3
8000    0
       ..
8004    3
8004    0
8004    1
8004    2
8004    3
Name: flux, Length: 1000, dtype: int64[pyarrow]
[8]:
packed.ts[["time", "flux"]]
[8]:
8000         time  flux
0       0     0
1       5     ...
8001         time  flux
0       1     1
1       6     ...
8002         time  flux
0       2     2
1       7     ...
8003         time  flux
0       3     3
1       8     ...
8004         time  flux
0       4     0
1       9     ...
Name: sources, dtype: ts<time: [int64], flux: [int64]>
[9]:
packed.dtype
[9]:
ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>

Modify underlying fields with .ts accessor

[10]:
# Change flux in place with flat arrays
packed.ts["flux"] = -2 * packed.ts["flux"]
packed.ts["flux"]
[10]:
8000     0
8000    -2
8000    -4
8000    -6
8000     0
        ..
8004    -6
8004     0
8004    -2
8004    -4
8004    -6
Name: flux, Length: 1000, dtype: int64[pyarrow]
[11]:
# Change errors for object 8003
light_curve = packed.loc[8003]
light_curve["err"] += 25
# packed.loc[8003] = ... does not work
packed.iloc[3:4] = [light_curve]
packed.iloc[0]
[11]:
time flux band err count
0 0 0 g 0.0 0
1 5 -2 g 0.5 5
2 10 -4 g 0.0 10
3 15 -6 g 0.5 15
4 20 0 g 0.0 20
... ... ... ... ... ...
195 975 -6 z 0.5 975
196 980 0 z 0.0 980
197 985 -2 z 0.5 985
198 990 -4 z 0.0 990
199 995 -6 z 0.5 995

200 rows × 5 columns

[12]:
# Delete field and add new one
del packed.ts["count"]
packed.ts["filters"] = "lsst_" + packed.ts.pop_field("band")
packed
[12]:
8000         time  flux  err filters
0       0     0  ...
8001         time  flux  err filters
0       1    -2  ...
8002         time  flux  err filters
0       2    -4  ...
8003         time  flux   err filters
0       3    -6 ...
8004         time  flux  err filters
0       4     0  ...
Name: sources, dtype: ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>

Change all items and pack to a new Series

[13]:
# Subsample light curves
dfs = packed.apply(lambda df: df.iloc[::50])
subsampled = pack_dfs(dfs, name="subsampled")
packed.loc[8000], subsampled.loc[8000]
[13]:
(     time  flux  err filters
 0       0     0  0.0  lsst_g
 1       5    -2  0.5  lsst_g
 2      10    -4  0.0  lsst_g
 3      15    -6  0.5  lsst_g
 4      20     0  0.0  lsst_g
 ..    ...   ...  ...     ...
 195   975    -6  0.5  lsst_z
 196   980     0  0.0  lsst_z
 197   985    -2  0.5  lsst_z
 198   990    -4  0.0  lsst_z
 199   995    -6  0.5  lsst_z

 [200 rows x 4 columns],
    time  flux  err filters
 0     0     0  0.0  lsst_g
 1   250    -4  0.0  lsst_r
 2   500     0  0.0  lsst_i
 3   750    -4  0.0  lsst_z)
[14]:
# Query sources
# Currently, empty objects will be removed from the packed series
packed.ts.query_flat("err < 0.5")
[14]:
8000        time  flux  err filters
0      0     0  0....
8001        time  flux  err filters
0      1    -2  0....
8002        time  flux  err filters
0      2    -4  0....
8004        time  flux  err filters
0      4     0  0....
dtype: ts<time: [int64], flux: [int64], err: [double], filters: [string]>
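
Object 8003 is missing from this result because its errors were increased by 25 earlier, so none of its rows satisfies `err < 0.5`. A minimal plain-Python sketch of this filter-then-drop behaviour follows. It assumes nothing about the internals of `query_flat`; the function `query_nested` and the toy data are hypothetical.

```python
def query_nested(nested, predicate):
    """Filter the rows of every object and drop objects left with no rows."""
    result = {}
    for obj_id, rows in nested.items():
        kept = [row for row in rows if predicate(row)]
        if kept:  # objects with no matching rows disappear from the output
            result[obj_id] = kept
    return result


# Toy data: each object is a list of rows with "time" and "err" fields
nested = {
    8002: [{"time": 2, "err": 0.2}, {"time": 7, "err": 0.7}],
    8003: [{"time": 3, "err": 25.3}, {"time": 8, "err": 25.8}],
}

selected = query_nested(nested, lambda row: row["err"] < 0.5)
print(sorted(selected))
```

Only 8002 survives, and only with its first row. Keeping empty objects instead would require storing an empty list for 8003 rather than skipping it.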