Introducing Jupyter Notebooks in Sphinx

This notebook showcases the basic functionality of rendering your Jupyter notebooks as tutorials inside your Sphinx documentation.

As part of the LINCC Frameworks Python project template, your notebooks will be executed AND rendered at documentation build time.

You can read more about Sphinx, ReadTheDocs, and building notebooks in LINCC's documentation.

# Create a source table and pack it into nested structures and lists

[1]:

import numpy as np
import pandas as pd
from pandas_ts.packer import pack_flat, pack_dfs


# Adapted from
# https://github.com/lincc-frameworks/tape/blob/6a694c4c138aadb1508c2a96de4fa63f90319331/tests/tape_tests/conftest.py#L15
def create_test_rows():
    num_points = 1000
    all_bands = np.array(["g", "r", "i", "z"])

    rows = {
        "id": 8000 + (np.arange(num_points) % 5),
        "time": np.arange(num_points),
        "flux": np.arange(num_points) % len(all_bands),
        "band": np.repeat(all_bands, num_points // len(all_bands)),
        "err": 0.1 * (np.arange(num_points) % 10),
        "count": np.arange(num_points),
        # Not sure that I'm ready for Nones
        # "something_else": np.full(num_points, None),
    }

    return rows


sources = pd.DataFrame(create_test_rows())
sources.set_index("id", inplace=True)
sources

[1]:

	time	flux	band	err	count
id
8000	0	0	g	0.0	0
8001	1	1	g	0.1	1
8002	2	2	g	0.2	2
8003	3	3	g	0.3	3
8004	4	0	g	0.4	4
...	...	...	...	...	...
8000	995	3	z	0.5	995
8001	996	0	z	0.6	996
8002	997	1	z	0.7	997
8003	998	2	z	0.8	998
8004	999	3	z	0.9	999

1000 rows × 5 columns

[2]:

packed = pack_flat(sources, name="sources")
packed

[2]:

8000         time  flux band  err  count
0       0    ...
8001         time  flux band  err  count
0       1    ...
8002         time  flux band  err  count
0       2    ...
8003         time  flux band  err  count
0       3    ...
8004         time  flux band  err  count
0       4    ...
Name: sources, dtype: ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>
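To build some intuition for what packing does, the sketch below groups a small flat table by its repeated `id` index using plain pandas only. It is an illustration of the idea, not how `pack_flat` is implemented, and the names `flat` and `per_object` are made up for this example.

```python
import numpy as np
import pandas as pd

# Small flat table with a repeated "id" index, analogous to `sources`
flat = pd.DataFrame(
    {
        "id": 8000 + (np.arange(10) % 5),
        "time": np.arange(10),
        "flux": np.arange(10) % 4,
    }
).set_index("id")

# One sub-frame per id; this mimics the idea of packing, not its implementation
per_object = {
    obj_id: group.reset_index(drop=True)
    for obj_id, group in flat.groupby(level=0)
}

print(per_object[8000])
```

Each value of `per_object` plays the role of one item of the packed series: a small table holding all rows that share an `id`.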

A single item of the packed series is returned as a new DataFrame

[3]:

packed.iloc[0]

[3]:

	time	flux	band	err	count
0	0	0	g	0.0	0
1	5	1	g	0.5	5
2	10	2	g	0.0	10
3	15	3	g	0.5	15
4	20	0	g	0.0	20
...	...	...	...	...	...
195	975	3	z	0.5	975
196	980	0	z	0.0	980
197	985	1	z	0.5	985
198	990	2	z	0.0	990
199	995	3	z	0.5	995

200 rows × 5 columns

[4]:

# Get the linearly interpolated flux for time=10
packed.apply(lambda df: np.interp(10.0, df["time"], df["flux"]))

[4]:

8000    2.0
8001    2.8
8002    1.2
8003    0.4
8004    1.2
Name: sources, dtype: float64
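As a sanity check of one value above, the interpolation for object 8001 can be reproduced by hand with numpy alone. The arrays below rebuild that light curve from the construction rule in `create_test_rows` (every fifth row, starting from row 1, with the original unscaled flux); the variable names are only for this sketch.

```python
import numpy as np

# Light curve of object 8001 as constructed in this notebook:
# it owns every fifth row, starting from row 1
time = np.arange(1, 1000, 5)
flux = time % 4

# The samples bracketing t=10 are (time=6, flux=2) and (time=11, flux=3)
manual = 2 + (10.0 - 6) / (11 - 6) * (3 - 2)
interpolated = np.interp(10.0, time, flux)

print(manual, interpolated)
```

Both numbers should agree with the 2.8 reported for 8001 in the output above.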

Get packed sources series and play with `.ts` accessor

This series is a collection of structures: each structure consists of multiple fields, and each field is a "list" of values.
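One way to picture such a structure of lists is a set of flat value arrays plus an offsets array that marks where each object starts and ends, which is how Arrow-style list arrays are laid out. The sketch below uses numpy only and assumes that layout purely for illustration; whether `pandas_ts` stores its data exactly this way is not stated here, and `get_item` is a hypothetical helper.

```python
import numpy as np

# Two objects with 3 and 2 observations, stored as flat value arrays
time_values = np.array([0, 5, 10, 1, 6])
flux_values = np.array([0, 1, 2, 1, 2])
# offsets[i]:offsets[i + 1] selects the values that belong to object i
offsets = np.array([0, 3, 5])


def get_item(i):
    """Return the fields of object i as a dict of per-object arrays."""
    start, stop = offsets[i], offsets[i + 1]
    return {"time": time_values[start:stop], "flux": flux_values[start:stop]}


first = get_item(0)
second = get_item(1)
print(first["time"], second["time"])
```

In this picture, a flat view is just the value arrays themselves, while a list view slices them per object.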

[5]:

packed.ts.to_flat()

[5]:

	time	flux	band	err	count
8000	0	0	g	0.0	0
8000	5	1	g	0.5	5
8000	10	2	g	0.0	10
8000	15	3	g	0.5	15
8000	20	0	g	0.0	20
...	...	...	...	...	...
8004	979	3	z	0.9	979
8004	984	0	z	0.4	984
8004	989	1	z	0.9	989
8004	994	2	z	0.4	994
8004	999	3	z	0.9	999

1000 rows × 5 columns

[6]:

packed.ts.to_lists()

[6]:

	time	flux	band	err	count
8000	[ 0 5 10 15 20 25 30 35 40 45 50 ...	[0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2...	['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '...	[0. 0.5 0. 0.5 0. 0.5 0. 0.5 0. 0.5 0. 0...	[ 0 5 10 15 20 25 30 35 40 45 50 ...
8001	[ 1 6 11 16 21 26 31 36 41 46 51 ...	[1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3...	['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '...	[0.1 0.6 0.1 0.6 0.1 0.6 0.1 0.6 0.1 0.6 0.1 0...	[ 1 6 11 16 21 26 31 36 41 46 51 ...
8002	[ 2 7 12 17 22 27 32 37 42 47 52 ...	[2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0...	['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '...	[0.2 0.7 0.2 0.7 0.2 0.7 0.2 0.7 0.2 0.7 0.2 0...	[ 2 7 12 17 22 27 32 37 42 47 52 ...
8003	[ 3 8 13 18 23 28 33 38 43 48 53 ...	[3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1...	['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '...	[0.3 0.8 0.3 0.8 0.3 0.8 0.3 0.8 0.3 0.8 0.3 0...	[ 3 8 13 18 23 28 33 38 43 48 53 ...
8004	[ 4 9 14 19 24 29 34 39 44 49 54 ...	[0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2...	['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '...	[0.4 0.9 0.4 0.9 0.4 0.9 0.4 0.9 0.4 0.9 0.4 0...	[ 4 9 14 19 24 29 34 39 44 49 54 ...

[7]:

packed.ts["flux"]

[7]:

8000    0
8000    1
8000    2
8000    3
8000    0
       ..
8004    3
8004    0
8004    1
8004    2
8004    3
Name: flux, Length: 1000, dtype: int64[pyarrow]

[8]:

packed.ts[["time", "flux"]]

[8]:

8000         time  flux
0       0     0
1       5     ...
8001         time  flux
0       1     1
1       6     ...
8002         time  flux
0       2     2
1       7     ...
8003         time  flux
0       3     3
1       8     ...
8004         time  flux
0       4     0
1       9     ...
Name: sources, dtype: ts<time: [int64], flux: [int64]>

[9]:

packed.dtype

[9]:

ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>

Modify underlying fields with `.ts` accessor

[10]:

# Change flux in place with flat arrays
packed.ts["flux"] = -2 * packed.ts["flux"]
packed.ts["flux"]

[10]:

8000     0
8000    -2
8000    -4
8000    -6
8000     0
        ..
8004    -6
8004     0
8004    -2
8004    -4
8004    -6
Name: flux, Length: 1000, dtype: int64[pyarrow]

[11]:

# Change errors for object 8003
light_curve = packed.loc[8003]
light_curve["err"] += 25
# packed.loc[8003] = ... does not work
packed.iloc[3:4] = [light_curve]
packed.iloc[0]

[11]:

	time	flux	band	err	count
0	0	0	g	0.0	0
1	5	-2	g	0.5	5
2	10	-4	g	0.0	10
3	15	-6	g	0.5	15
4	20	0	g	0.0	20
...	...	...	...	...	...
195	975	-6	z	0.5	975
196	980	0	z	0.0	980
197	985	-2	z	0.5	985
198	990	-4	z	0.0	990
199	995	-6	z	0.5	995

200 rows × 5 columns

[12]:

# Delete field and add new one
del packed.ts["count"]
packed.ts["filters"] = "lsst_" + packed.ts.pop_field("band")
packed

[12]:

8000         time  flux  err filters
0       0     0  ...
8001         time  flux  err filters
0       1    -2  ...
8002         time  flux  err filters
0       2    -4  ...
8003         time  flux   err filters
0       3    -6 ...
8004         time  flux  err filters
0       4     0  ...
Name: sources, dtype: ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>

Change all items and pack to a new Series

[13]:

# Subsample light curves
dfs = packed.apply(lambda df: df.iloc[::50])
subsampled = pack_dfs(dfs, name="subsampled")
packed.loc[8000], subsampled.loc[8000]

[13]:

(     time  flux  err filters
 0       0     0  0.0  lsst_g
 1       5    -2  0.5  lsst_g
 2      10    -4  0.0  lsst_g
 3      15    -6  0.5  lsst_g
 4      20     0  0.0  lsst_g
 ..    ...   ...  ...     ...
 195   975    -6  0.5  lsst_z
 196   980     0  0.0  lsst_z
 197   985    -2  0.5  lsst_z
 198   990    -4  0.0  lsst_z
 199   995    -6  0.5  lsst_z

 [200 rows x 4 columns],
    time  flux  err filters
 0     0     0  0.0  lsst_g
 1   250    -4  0.0  lsst_r
 2   500     0  0.0  lsst_i
 3   750    -4  0.0  lsst_z)
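The subsampling step itself is ordinary pandas slicing. The sketch below applies the same `iloc[::50]` stride to a stand-in frame shaped like the light curve of object 8000 (200 rows, time step of 5); `light_curve` and `subsample` are names chosen for this example.

```python
import numpy as np
import pandas as pd

# Stand-in for one light curve: 200 rows, time step of 5, as for object 8000
light_curve = pd.DataFrame({"time": np.arange(0, 1000, 5)})

# A step of 50 keeps positions 0, 50, 100 and 150
subsample = light_curve.iloc[::50]

print(subsample)
```

In plain pandas the kept rows retain their original labels, whereas in the packed output above the subsampled rows are labeled 0 to 3.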

[14]:

# Query sources
# Currently, empty objects will be removed from the packed series
packed.ts.query_flat("err < 0.5")

[14]:

8000        time  flux  err filters
0      0     0  0....
8001        time  flux  err filters
0      1    -2  0....
8002        time  flux  err filters
0      2    -4  0....
8004        time  flux  err filters
0      4     0  0....
dtype: ts<time: [int64], flux: [int64], err: [double], filters: [string]>
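Object 8003 is missing from the result because its errors were raised by 25 earlier, so none of its rows pass `err < 0.5`. The sketch below reproduces that effect with a flat table and plain `DataFrame.query`; it illustrates the filtering logic under that assumption and is not the implementation of `query_flat`.

```python
import numpy as np
import pandas as pd

num_points = 1000
ids = 8000 + (np.arange(num_points) % 5)
err = 0.1 * (np.arange(num_points) % 10)

# Mirror the earlier edit: errors of object 8003 were raised by 25
err[ids == 8003] += 25

flat = pd.DataFrame({"err": err}, index=pd.Index(ids, name="id"))

# Row-wise filter on the flat table
selected = flat.query("err < 0.5")
remaining_ids = sorted(selected.index.unique().tolist())

print(remaining_ids)
```

Any object whose rows are all filtered out simply has no entries left in `selected`, which matches the note that empty objects are removed.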
