Introducing Jupyter Notebooks in Sphinx
This notebook showcases the basic functionality of rendering your Jupyter notebooks as tutorials inside your Sphinx documentation.
As part of the LINCC Frameworks Python project template, your notebooks will be executed and rendered at documentation build time.
You can read more about Sphinx, ReadTheDocs, and building notebooks in LINCC’s documentation.
Create a source table and pack it into nested structures and lists
[1]:
import numpy as np
import pandas as pd
from pandas_ts.packer import pack_flat, pack_dfs
# Adapted from
# https://github.com/lincc-frameworks/tape/blob/6a694c4c138aadb1508c2a96de4fa63f90319331/tests/tape_tests/conftest.py#L15
def create_test_rows():
    num_points = 1000
    all_bands = np.array(["g", "r", "i", "z"])
    rows = {
        "id": 8000 + (np.arange(num_points) % 5),
        "time": np.arange(num_points),
        "flux": np.arange(num_points) % len(all_bands),
        # Integer division: np.repeat expects an integer repeat count
        "band": np.repeat(all_bands, num_points // len(all_bands)),
        "err": 0.1 * (np.arange(num_points) % 10),
        "count": np.arange(num_points),
        # Not sure that I'm ready for Nones
        # "something_else": np.full(num_points, None),
    }
    return rows
sources = pd.DataFrame(create_test_rows())
sources.set_index("id", inplace=True)
sources
[1]:
| id | time | flux | band | err | count |
|---|---|---|---|---|---|
| 8000 | 0 | 0 | g | 0.0 | 0 |
| 8001 | 1 | 1 | g | 0.1 | 1 |
| 8002 | 2 | 2 | g | 0.2 | 2 |
| 8003 | 3 | 3 | g | 0.3 | 3 |
| 8004 | 4 | 0 | g | 0.4 | 4 |
| ... | ... | ... | ... | ... | ... |
| 8000 | 995 | 3 | z | 0.5 | 995 |
| 8001 | 996 | 0 | z | 0.6 | 996 |
| 8002 | 997 | 1 | z | 0.7 | 997 |
| 8003 | 998 | 2 | z | 0.8 | 998 |
| 8004 | 999 | 3 | z | 0.9 | 999 |
1000 rows × 5 columns
[2]:
packed = pack_flat(sources, name="sources")
packed
[2]:
8000 time flux band err count
0 0 ...
8001 time flux band err count
0 1 ...
8002 time flux band err count
0 2 ...
8003 time flux band err count
0 3 ...
8004 time flux band err count
0 4 ...
Name: sources, dtype: ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>
A single item of the packed series is returned as a new DataFrame.
[3]:
packed.iloc[0]
[3]:
| | time | flux | band | err | count |
|---|---|---|---|---|---|
| 0 | 0 | 0 | g | 0.0 | 0 |
| 1 | 5 | 1 | g | 0.5 | 5 |
| 2 | 10 | 2 | g | 0.0 | 10 |
| 3 | 15 | 3 | g | 0.5 | 15 |
| 4 | 20 | 0 | g | 0.0 | 20 |
| ... | ... | ... | ... | ... | ... |
| 195 | 975 | 3 | z | 0.5 | 975 |
| 196 | 980 | 0 | z | 0.0 | 980 |
| 197 | 985 | 1 | z | 0.5 | 985 |
| 198 | 990 | 2 | z | 0.0 | 990 |
| 199 | 995 | 3 | z | 0.5 | 995 |
200 rows × 5 columns
[4]:
# Get the linearly interpolated flux for time=10
packed.apply(lambda df: np.interp(10.0, df["time"], df["flux"]))
[4]:
8000 2.0
8001 2.8
8002 1.2
8003 0.4
8004 1.2
Name: sources, dtype: float64
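The interpolated values above can be cross-checked without the packed series. The following is a minimal sketch that uses only NumPy and pandas; the `flat` table and the `flux_at` helper are illustrative names that are not part of `pandas_ts`, and the table rebuilds only the `id`, `time`, and `flux` columns of the source data.

```python
import numpy as np
import pandas as pd

# Rebuild the relevant columns of the flat source table (illustrative copy).
num_points = 1000
flat = pd.DataFrame(
    {
        "id": 8000 + (np.arange(num_points) % 5),
        "time": np.arange(num_points),
        "flux": np.arange(num_points) % 4,
    }
)


def flux_at(group, t=10.0):
    """Linearly interpolate flux at time t for one object's rows (hypothetical helper)."""
    return float(np.interp(t, group["time"], group["flux"]))


# One interpolated value per object id, computed group by group.
interpolated = pd.Series(
    {obj_id: flux_at(group) for obj_id, group in flat.groupby("id")}
)
print(interpolated)
```

Grouping the flat table by `id` plays the role that iterating over the packed series plays in the cell above: each group is one light curve.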
Explore the packed sources series with the .ts accessor
This series is a collection of structures; each structure consists of multiple fields, and each field is a “list” of values.
[5]:
packed.ts.to_flat()
[5]:
| | time | flux | band | err | count |
|---|---|---|---|---|---|
| 8000 | 0 | 0 | g | 0.0 | 0 |
| 8000 | 5 | 1 | g | 0.5 | 5 |
| 8000 | 10 | 2 | g | 0.0 | 10 |
| 8000 | 15 | 3 | g | 0.5 | 15 |
| 8000 | 20 | 0 | g | 0.0 | 20 |
| ... | ... | ... | ... | ... | ... |
| 8004 | 979 | 3 | z | 0.9 | 979 |
| 8004 | 984 | 0 | z | 0.4 | 984 |
| 8004 | 989 | 1 | z | 0.9 | 989 |
| 8004 | 994 | 2 | z | 0.4 | 994 |
| 8004 | 999 | 3 | z | 0.9 | 999 |
1000 rows × 5 columns
[6]:
packed.ts.to_lists()
[6]:
| | time | flux | band | err | count |
|---|---|---|---|---|---|
| 8000 | [ 0 5 10 15 20 25 30 35 40 45 50 ... | [0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2... | ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... | [0. 0.5 0. 0.5 0. 0.5 0. 0.5 0. 0.5 0. 0... | [ 0 5 10 15 20 25 30 35 40 45 50 ... |
| 8001 | [ 1 6 11 16 21 26 31 36 41 46 51 ... | [1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3... | ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... | [0.1 0.6 0.1 0.6 0.1 0.6 0.1 0.6 0.1 0.6 0.1 0... | [ 1 6 11 16 21 26 31 36 41 46 51 ... |
| 8002 | [ 2 7 12 17 22 27 32 37 42 47 52 ... | [2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0... | ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... | [0.2 0.7 0.2 0.7 0.2 0.7 0.2 0.7 0.2 0.7 0.2 0... | [ 2 7 12 17 22 27 32 37 42 47 52 ... |
| 8003 | [ 3 8 13 18 23 28 33 38 43 48 53 ... | [3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1... | ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... | [0.3 0.8 0.3 0.8 0.3 0.8 0.3 0.8 0.3 0.8 0.3 0... | [ 3 8 13 18 23 28 33 38 43 48 53 ... |
| 8004 | [ 4 9 14 19 24 29 34 39 44 49 54 ... | [0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2... | ['g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' 'g' '... | [0.4 0.9 0.4 0.9 0.4 0.9 0.4 0.9 0.4 0.9 0.4 0... | [ 4 9 14 19 24 29 34 39 44 49 54 ... |
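The table above has one row per object and one list per field. A rough pandas-only analogue of that layout, and of flattening it back, might look like the sketch below. It assumes pandas 1.3 or later (for multi-column `explode`), uses a tiny hand-made table, and does not reproduce the Arrow-backed storage that the packed series uses.

```python
import pandas as pd

# A tiny flat table: two objects with two observations each.
flat = pd.DataFrame(
    {"time": [0, 5, 1, 6], "flux": [0, 1, 1, 2]},
    index=pd.Index([8000, 8000, 8001, 8001], name="id"),
)

# "Lists" layout: one row per object, each field holds a Python list.
nested = flat.groupby(level="id").agg(list)
print(nested)

# Back to the flat layout: one row per observation.
exploded = nested.explode(["time", "flux"])
print(exploded)
```

Note that `explode` returns object-dtype columns, so a real round trip would also need to restore the original dtypes.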
[7]:
packed.ts["flux"]
[7]:
8000 0
8000 1
8000 2
8000 3
8000 0
..
8004 3
8004 0
8004 1
8004 2
8004 3
Name: flux, Length: 1000, dtype: int64[pyarrow]
[8]:
packed.ts[["time", "flux"]]
[8]:
8000 time flux
0 0 0
1 5 ...
8001 time flux
0 1 1
1 6 ...
8002 time flux
0 2 2
1 7 ...
8003 time flux
0 3 3
1 8 ...
8004 time flux
0 4 0
1 9 ...
Name: sources, dtype: ts<time: [int64], flux: [int64]>
[9]:
packed.dtype
[9]:
ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>
Modify underlying fields with the .ts accessor
[10]:
# Change flux in place with flat arrays
packed.ts["flux"] = -2 * packed.ts["flux"]
packed.ts["flux"]
[10]:
8000 0
8000 -2
8000 -4
8000 -6
8000 0
..
8004 -6
8004 0
8004 -2
8004 -4
8004 -6
Name: flux, Length: 1000, dtype: int64[pyarrow]
[11]:
# Change errors for object 8003
light_curve = packed.loc[8003]
light_curve["err"] += 25
# packed.loc[8003] = ... does not work
packed.iloc[3:4] = [light_curve]
packed.iloc[0]
[11]:
| | time | flux | band | err | count |
|---|---|---|---|---|---|
| 0 | 0 | 0 | g | 0.0 | 0 |
| 1 | 5 | -2 | g | 0.5 | 5 |
| 2 | 10 | -4 | g | 0.0 | 10 |
| 3 | 15 | -6 | g | 0.5 | 15 |
| 4 | 20 | 0 | g | 0.0 | 20 |
| ... | ... | ... | ... | ... | ... |
| 195 | 975 | -6 | z | 0.5 | 975 |
| 196 | 980 | 0 | z | 0.0 | 980 |
| 197 | 985 | -2 | z | 0.5 | 985 |
| 198 | 990 | -4 | z | 0.0 | 990 |
| 199 | 995 | -6 | z | 0.5 | 995 |
200 rows × 5 columns
[12]:
# Delete field and add new one
del packed.ts["count"]
packed.ts["filters"] = "lsst_" + packed.ts.pop_field("band")
packed
[12]:
8000 time flux err filters
0 0 0 ...
8001 time flux err filters
0 1 -2 ...
8002 time flux err filters
0 2 -4 ...
8003 time flux err filters
0 3 -6 ...
8004 time flux err filters
0 4 0 ...
Name: sources, dtype: ts<time: [int64], flux: [int64], band: [string], err: [double], count: [int64]>
Change all items and pack to a new Series
[13]:
# Subsample light curves
dfs = packed.apply(lambda df: df.iloc[::50])
subsampled = pack_dfs(dfs, name="subsampled")
packed.loc[8000], subsampled.loc[8000]
[13]:
( time flux err filters
0 0 0 0.0 lsst_g
1 5 -2 0.5 lsst_g
2 10 -4 0.0 lsst_g
3 15 -6 0.5 lsst_g
4 20 0 0.0 lsst_g
.. ... ... ... ...
195 975 -6 0.5 lsst_z
196 980 0 0.0 lsst_z
197 985 -2 0.5 lsst_z
198 990 -4 0.0 lsst_z
199 995 -6 0.5 lsst_z
[200 rows x 4 columns],
time flux err filters
0 0 0 0.0 lsst_g
1 250 -4 0.0 lsst_r
2 500 0 0.0 lsst_i
3 750 -4 0.0 lsst_z)
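To see what the subsampling step does to a single light curve, here is a minimal pandas-only sketch. The `light_curve` frame is an illustrative reconstruction of the `time` and `flux` columns of object 8000 after the flux change above; it is not taken from the packed series.

```python
import numpy as np
import pandas as pd

# Object 8000 is observed at times 0, 5, 10, ..., 995 (200 rows).
time = np.arange(0, 1000, 5)
light_curve = pd.DataFrame({"time": time, "flux": -2 * (time % 4)})

# Keep every 50th row, then renumber the rows from zero.
subsampled = light_curve.iloc[::50].reset_index(drop=True)
print(subsampled)
```

Taking every 50th of 200 rows leaves four rows, which matches the shape of the subsampled frame shown above.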
[14]:
# Query sources
# Currently, empty objects will be removed from the packed series
packed.ts.query_flat("err < 0.5")
[14]:
8000 time flux err filters
0 0 0 0....
8001 time flux err filters
0 1 -2 0....
8002 time flux err filters
0 2 -4 0....
8004 time flux err filters
0 4 0 0....
dtype: ts<time: [int64], flux: [int64], err: [double], filters: [string]>
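Object 8003 is missing from the result because its errors were raised by 25 earlier, so none of its rows pass the filter. The sketch below reproduces that effect on a flat table with plain pandas; `flat`, `kept`, and `remaining_ids` are illustrative names, and this is not how `query_flat` is implemented.

```python
import numpy as np
import pandas as pd

# Rebuild the time and err columns of the flat source table, indexed by object id.
num_points = 1000
flat = pd.DataFrame(
    {
        "time": np.arange(num_points),
        "err": 0.1 * (np.arange(num_points) % 10),
    },
    index=pd.Index(8000 + (np.arange(num_points) % 5), name="id"),
)

# Mimic the earlier modification: add 25 to every error of object 8003.
flat["err"] = flat["err"] + 25 * (flat.index == 8003)

# Filter rows, then list the objects that still have at least one row.
kept = flat.query("err < 0.5")
remaining_ids = sorted(int(i) for i in kept.index.unique())
print(remaining_ids)
```

An object whose rows are all filtered out simply has no entries left in the flat table, which is consistent with it being dropped from the packed result.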