{ "cells": [ { "cell_type": "markdown", "id": "859eeb76-f3cc-4954-9416-ef12a8a61138", "metadata": {}, "source": [ "# Getting started with Diffsky mocks\n", "\n", "This notebook shows how to load diffsky data, and compute photometry and high-res SEDs.\n", "\n", "## Diffsky-native data format\n", "The mock data natively produced by diffsky is stored as a collection of flat hdf5 files.\n", "Each hdf5 file has a name such as:\n", "\n", "`lc_cores-{stepnum}.{lc_patch}.diffsky_gals.hdf5`,\n", "\n", "where `stepnum` specifies the N-body simulation snapshot, and `lc_patch` specifies the patch of sky. Galaxies with the same `stepnum` occupy the same narrow range of redshift, and galaxies with the same `lc_patch` fall within the same contiguous patch of `{ra, dec}`.\n", "\n", "The next cell shows how to load all the mock data stored in a single such file. (If you only need a subset of the columns, you can pass the `keys` keyward argument to the load_diffsky_lc_patch)." ] }, { "cell_type": "code", "execution_count": null, "id": "bb379d27-7415-4316-a5c9-fcd61dff1942", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "\n", "import os\n", "drn_mock = \"mock_download_dir\"\n", "bn_mock = \"lc_cores-453.0.diffsky_gals.hdf5\"\n", "fn_mock = os.path.join(drn_mock, bn_mock)\n", "\n", "from diffsky.data_loaders.hacc_utils import load_lc_mock as llcm\n", "diffsky_lc_patch, metadata = llcm.load_diffsky_lc_patch(fn_mock)" ] }, { "cell_type": "markdown", "id": "db72f4d3-4849-473e-8271-45172b9f0d70", "metadata": {}, "source": [ "### OpenCosmo-formatted data\n", "\n", "There are separate [OpenCosmo](https://opencosmo.readthedocs.io/en/stable/)-formatted data products that enable efficient querying and filtering with the OpenCosmo toolkit.\n", "\n", "This notebook demonstrates how to load and analyze diffsky-native data. A separate OpenCosmo-based tutorial is coming soon." ] }, { "cell_type": "markdown", "id": "874d4f92-e125-4f53-be7d-952ec5d3675f", "metadata": {}, "source": [ "### Inspect the mock data\n", "\n", "The `diffsky_lc_patch` dictionary stores every column of the mock, along with supplementary metadata needed to compute photometry.\n", "\n", "In this next cell we'll take a look at a histogram of specific star formation rate, sSFR = log10(Mstar/SFR)." ] }, { "cell_type": "code", "execution_count": null, "id": "1f336449-8222-49e8-a08a-0840cb182cf0", "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1)\n", "__=ax.hist(diffsky_lc_patch['logssfr_obs'], bins=40)\n", "xlabel = ax.set_xlabel('sSFR')" ] }, { "cell_type": "markdown", "id": "5e91c519-7ea7-4b43-bae7-bc63d13628ea", "metadata": {}, "source": [ "## Examining mock photometry\n", "\n", "The data stored in `diffsky_lc_patch` and `metadata` have all that is needed in order to recompute the photometry in the mock with the `compute_phot_from_mock` function. In the next few cells we'll see how to recompute the photometry for a subset of the transmission curves used to make the mock.\n", "\n", "The `tcurves` entry of `metadata` is a `namedtuple` that stores _all_ of the transmission curves that were used to generate the mock photometry. You can see what bandpasses are present in the mock by inspecting the fields:" ] }, { "cell_type": "code", "execution_count": null, "id": "b214299c-c273-4350-8b31-ccc3b2050624", "metadata": {}, "outputs": [], "source": [ "print(metadata['tcurves']._fields)" ] }, { "cell_type": "markdown", "id": "b7215a7a-24ca-4a76-b100-f19ba0510356", "metadata": {}, "source": [ "For each bandpass, there is a corresponding column of mock data storing the photometry through that filter:" ] }, { "cell_type": "code", "execution_count": null, "id": "8f230618-b553-4de8-98de-c68e44380afa", "metadata": {}, "outputs": [], "source": [ "for bandpass_name in metadata['tcurves']._fields:\n", " assert bandpass_name in diffsky_lc_patch.keys()" ] }, { "cell_type": "markdown", "id": "8e857001-7e37-4951-b1e6-ffb6cb195bac", "metadata": {}, "source": [ "## Recomputing mock photometry\n", "\n", "In the next few cells, we'll recompute photometry through the `lsst_u` and `lsst_i` bands, and then compare the recomputed result to the corresponding columns of the mock.\n", "\n", "### Calculating photometry/SEDs in batches\n", "Depending on your available computing resources, you may need to compute photometry for only a chunk of the data at a time to avoid overflowing memory. \n", "\n", "**Important note:** Use the `get_lc_mock_chunk` function to grab a chunk of mock data. If you attempt to compute photometry or SEDs for an arbitrarily subdivided `diffsky_lc_patch`, you will get incorrect results, because the data chunk needs to have a complete set of satellite galaxies for each central." ] }, { "cell_type": "code", "execution_count": null, "id": "f65c26e5-e189-41a6-ba9d-35338de33275", "metadata": {}, "outputs": [], "source": [ "batch_size = 20\n", "nchunks = llcm.estimate_nchunks(fn_mock, batch_size)\n", "\n", "chunknum = 0\n", "diffsky_lc_patch_chunk = llcm.get_lc_mock_chunk(\n", " diffsky_lc_patch, metadata,\n", " nchunks=nchunks, chunknum=chunknum)" ] }, { "cell_type": "markdown", "id": "67384bbe-dfda-47b3-bbba-cda5d9a0742e", "metadata": {}, "source": [ "By default, the `compute_phot_from_mock` function assumes you want to recompute photometry through all available bands. You can use the `tcurves` argument to select a subset of transmission curves for which to compute photometry. In this next cell, we'll pick just two bands: *u* and *i*, and show that our recomputed photometry agrees with the corresponding columns in the mock." ] }, { "cell_type": "code", "execution_count": null, "id": "adf6eb8f-59d7-4f42-b69d-c0b777b46710", "metadata": {}, "outputs": [], "source": [ "all_tcurves = metadata['tcurves']\n", "\n", "from collections import namedtuple\n", "Tcurves = namedtuple(\"Tcurves\", (\"lsst_u\", \"lsst_i\"))\n", "tcurves = Tcurves(all_tcurves.lsst_u, all_tcurves.lsst_i)\n", "\n", "from diffsky.data_loaders.hacc_utils import sed_from_mock\n", "phot_info = sed_from_mock.compute_phot_from_mock(\n", " diffsky_lc_patch_chunk, metadata, tcurves=tcurves\n", ")\n", "\n", "assert np.allclose(\n", " diffsky_lc_patch_chunk['lsst_u'], \n", " phot_info['obs_mags'][:, 0], rtol=1e-4)\n", "assert np.allclose(\n", " diffsky_lc_patch_chunk['lsst_i'], \n", " phot_info['obs_mags'][:, 1], rtol=1e-4)" ] }, { "cell_type": "markdown", "id": "c78850f0-b279-4cbc-b60c-6acb8dc3a717", "metadata": {}, "source": [ "### Recompute photometry of disk/bulge/knot decomposition\n", "\n", "The photometry of each morphological component can be calculated using the `compute_dbk_phot_from_diffsky_mock` function." ] }, { "cell_type": "code", "execution_count": null, "id": "18e3fe54-69c9-4e3b-a3c7-7b6c50538460", "metadata": {}, "outputs": [], "source": [ "dbk_phot_info, dbk_weights = sed_from_mock.compute_dbk_phot_from_mock(\n", " diffsky_lc_patch_chunk, metadata, tcurves=tcurves)\n", "\n", "assert np.allclose(\n", " diffsky_lc_patch_chunk['lsst_u_bulge'], \n", " dbk_phot_info['obs_mags_bulge'][:, 0], rtol=1e-4)\n", "assert np.allclose(\n", " diffsky_lc_patch_chunk['lsst_i_bulge'], \n", " dbk_phot_info['obs_mags_bulge'][:, 1], rtol=1e-4)" ] }, { "cell_type": "markdown", "id": "0927f1b6-d8db-4ead-9dc7-1741b3784aa0", "metadata": {}, "source": [ "### Compute SED of diffsky galaxies\n", "\n", "There is also a convenience function for computing the high-resolution SED of each diffsky galaxy." ] }, { "cell_type": "code", "execution_count": null, "id": "b001e9a6-803c-4e34-bcc7-1e7e08bb37c1", "metadata": {}, "outputs": [], "source": [ "sed_info = sed_from_mock.compute_sed_from_mock(\n", " diffsky_lc_patch_chunk, metadata)" ] }, { "cell_type": "code", "execution_count": null, "id": "a8990858-99eb-4eb6-b955-902621c564ce", "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1)\n", "__=ax.loglog()\n", "xlim = ax.set_xlim(1_100, 9_000)\n", "ylim = ax.set_ylim(1e-8, 1e-3)\n", "\n", "igal = 5\n", "__=ax.plot(metadata['ssp_data'].ssp_wave, \n", " sed_info['rest_sed'][igal, :])\n", "xlabel = ax.set_xlabel('wavelength [angstroms]')\n", "ylabel = ax.set_ylabel('Lsun/Hz')" ] }, { "cell_type": "markdown", "id": "b5c4768e-2f88-488b-b059-fd9d8c6f5eb5", "metadata": {}, "source": [ "#### Compute SED of disk/bulge/knot components\n", "\n", "There is an additional convenience function for computing the high-resolution SED of each morphological component." ] }, { "cell_type": "code", "execution_count": null, "id": "3ee44a56-3ad8-4345-a013-c2c23c24f863", "metadata": {}, "outputs": [], "source": [ "dbk_sed_info = sed_from_mock.compute_dbk_sed_from_mock(\n", " diffsky_lc_patch_chunk, metadata)" ] }, { "cell_type": "code", "execution_count": null, "id": "1b6591a1-0296-4a96-9dc8-aba40e77be81", "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 1)\n", "__=ax.loglog()\n", "xlim = ax.set_xlim(1_100, 9_000)\n", "ylim = ax.set_ylim(1e-8, 1e-3)\n", "\n", "igal = 12\n", "__=ax.plot(\n", " metadata['ssp_data'].ssp_wave, \n", " dbk_sed_info['rest_sed_disk'][igal, :], label='disk')\n", "__=ax.plot(\n", " metadata['ssp_data'].ssp_wave, \n", " dbk_sed_info['rest_sed_bulge'][igal, :], label='bulge')\n", "__=ax.plot(\n", " metadata['ssp_data'].ssp_wave, \n", " dbk_sed_info['rest_sed_knots'][igal, :], label='star-forming knots')\n", "xlabel = ax.set_xlabel('wavelength [angstroms]')\n", "ylabel = ax.set_ylabel('Lsun/Hz')\n", "leg = ax.legend()" ] }, { "cell_type": "markdown", "id": "dd1f342c-1033-47b5-81f6-9a6eceb5fb1f", "metadata": {}, "source": [ "#### Computing photometry in other bands\n", "\n", "You can compute photometry in other bandpasses by using the `tcurves` argument to store any sequence of transmission curves.\n", "Each individual transmission curve must be a namedtuple with two fields: `wave` and `transmission`;\n", "your sequence of transmission curves also needs to be formatted as a namedtuple, \n", "using whatever names you want to name each bandpass. \n", "\n", "The next few cells show how to compute photometry through two arbitrary bandpasses." ] }, { "cell_type": "code", "execution_count": null, "id": "af7f14f7-f0aa-4115-9d28-c6fac1df5581", "metadata": {}, "outputs": [], "source": [ "from jax.scipy.stats import norm as jnorm\n", "from collections import namedtuple\n", "TransmissionCurve = namedtuple(\n", " \"TransmissionCurve\", (\"wave\", \"transmission\"))\n", "\n", "wave = np.linspace(200, 8_000, 500)\n", "\n", "fake_tcurve1 = jnorm.pdf(wave, loc=3_000.0, scale=500.0)\n", "fake_tcurve1 = TransmissionCurve(\n", " wave, fake_tcurve1/fake_tcurve1.max())\n", "\n", "fake_tcurve2 = jnorm.pdf(wave, loc=5_000.0, scale=500.0)\n", "fake_tcurve2 = TransmissionCurve(\n", " wave, fake_tcurve2/fake_tcurve2.max())\n", "\n", "fig, ax = plt.subplots(1, 1)\n", "xlabel = ax.set_xlabel('wavelength [angstroms]')\n", "ylim = ax.set_ylim(0, 1.5)\n", "__=ax.plot(fake_tcurve1.wave, \n", " fake_tcurve1.transmission, label='fake_tcurve1')\n", "__=ax.plot(fake_tcurve2.wave, \n", " fake_tcurve2.transmission, label='fake_tcurve2')\n", "leg = ax.legend()" ] }, { "cell_type": "code", "execution_count": null, "id": "aec15422-b26a-4851-bb78-64ba0e4d08e0", "metadata": {}, "outputs": [], "source": [ "Tcurves = namedtuple(\"Tcurves\", (\"fake_tcurve1\", \"fake_tcurve2\"))\n", "fake_tcurves = Tcurves(fake_tcurve1, fake_tcurve2)\n", "\n", "dbk_phot_info, dbk_weights = sed_from_mock.compute_dbk_phot_from_mock(\n", " diffsky_lc_patch_chunk, metadata, tcurves=fake_tcurves)" ] }, { "cell_type": "code", "execution_count": null, "id": "74e92570-3213-46f9-b352-6b2f003f60bc", "metadata": {}, "outputs": [], "source": [ "print(dbk_phot_info['obs_mags'].shape)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.11" } }, "nbformat": 4, "nbformat_minor": 5 }