5_BNE_workflow.py — BNE Batch Script

This script computes the Bond-Network Entropy for a configurable range of LAE sizes and saves one HDF5 file per size. It is the data source for Notebook 6 (BNE growth-rate analysis).

When to run: before opening Notebook 6, unless you are using the pre-computed reference data already included in tutorials/bond_network_entropy/data/.

cd tutorials/bond_network_entropy
python 5_BNE_workflow.py

Output files are written to data/bond_network_entropy/structure_0/entropy_number_<N>.hdf5. To change the LAE size range, edit the local_environment_nat list in the if __name__ == "__main__": block at the bottom of the script.

  1"""
  25_BNE_workflow.py — Compute Bond-Network Entropy for a range of LAE sizes
  3==========================================================================
  4
  5This script calculates the Bond-Network Entropy (BNE) for a silica glass
  6structure and saves the results to HDF5 files.  It is the data source for
  7notebook 6 (BNE growth rate analysis).
  8
  9What is BNE?
 10------------
 11BNE is a set of numbers dependent on the sizes of local environments that 
 12describe how variable local environments are within a structure. Its growth rate 
 13can be used to quantify how disordered an atomic structure is.
 14It is computed from the H1 persistent homology barcodes of each atom's local
 15atomic environment (LAE).  
 16A barcode is a fingerprint of the local environment that finds 
 17algebraically independent rings in it.  
 18Overall, BNE is the Shannon entropy of the
 19distribution of these fingerprints across all atoms:
 20
 21    BNE = -sum_i  p_i * log(p_i)
 22
 23where p_i is the fraction of atoms that share the same barcode fingerprint.
 24Higher BNE means more variety of local environments → more topological disorder.
 25
 26How to run
 27----------
 28From the examples/ directory:
 29
 30    python 5_BNE_workflow.py
 31
 32Pre-computed data for LAE sizes 1-80 is already in
 33examples/data/bond_network_entropy/ and were calculated using this script.
 34To change the LAE size range, edit `local_environment_nat` in the ``__main__``
 35block at the bottom of this file.
 36
 37Output
 38------
 39One HDF5 file per LAE size, saved as:
 40    data/bond_network_entropy/structure_0/entropy_number_<N>.hdf5
 41
 42Each file stores:
 43    - "entropy"         : scalar BNE value
 44    - "probabilities"   : array of barcode class probabilities
 45    - "number_of_atoms" : LAE size N used for this calculation
 46"""
 47
 48import os
 49
 50import numpy as np
 51import h5py
 52from tqdm import tqdm
 53
 54# obtain_distances_big_structures is the fast manual MIC implementation suited
 55# for large structures.  It operates directly on NumPy arrays (positions and
 56# lattice vectors) rather than an ASE Atoms object, which avoids per-atom
 57# Python overhead inside the ASE call.
 58#
 59# WARNING — known limitations of this implementation:
 60#   1. It is only *exact* for orthorhombic cells (a, b, c mutually perpendicular).
 61#   2. For monoclinic or triclinic cells the single-image MIC wrap should only be accurate
 62#.     for distances shorter than the half the smallest cell dimension (d < min(|a|, |b|, |c|) / 2).
 63#      Distances beyond that threshold may be assigned to the wrong periodic image.
 64#      For the Si–O bond cutoff of 2.1 Å and an orthorhombic silica unit cell this is fine,
 65#      but always verify before applying to a different structure.
 66#
 67# For non-orthorhombic cells and large cutoffs, use obtain_distances_ase instead.
 68from smooth_disorder.structural import obtain_distances_big_structures, obtain_positions_and_lattice_vectors
 69
 70
 71from smooth_disorder.barcode import (
 72    obtain_local_number_environment_big_structures,
 73    obtain_H1_barcode,
 74    reduce_barcode,
 75    mu,
 76)
 77
 78
 79# ---------------------------------------------------------------------------
 80# Saving results
 81# ---------------------------------------------------------------------------
 82
 83def save_bne_data_to_files(filename, probabilities, entropy, LE_nat, filepath):
 84    """
 85    Save BNE results for one LAE size to an HDF5 file.
 86
 87    Parameters
 88    ----------
 89    filename : str
 90        Output file path without the .hdf5 extension.
 91    probabilities : np.ndarray
 92        Empirical probability of each barcode equivalence class.
 93    entropy : float
 94        Shannon entropy (BNE) computed from the probabilities.
 95    LE_nat : int
 96        Number of atoms in the local atomic environment (LAE size).
 97    filepath : str
 98        Path to the input structure file (stored for reference).
 99    """
100    with h5py.File(f"{filename}.hdf5", "w") as w:
101        w.create_dataset("probabilities", data=probabilities, compression="gzip")
102        w.create_dataset("entropy", data=np.array([entropy]), compression="gzip")
103        w.create_dataset("number_of_atoms", data=np.array([LE_nat]), compression="gzip")
104
105
106# ---------------------------------------------------------------------------
107# Main calculation
108# ---------------------------------------------------------------------------
109
110def calculate_barcode(distances, idx_distances, adjacency_matrix, n_atoms, save_folder, structure_idx, filepath, local_environment_nat):
111    """
112    Compute BNE for a range of LAE sizes and save results to disk.
113
114    For each LAE size N:
115      1. Extract the N-atom local environment around every atom using BFS.
116      2. Compute the H1 barcode (ring fingerprint) for each local environment.
117      3. Count how many atoms share the same barcode.
118      4. Convert counts to probabilities and compute Shannon entropy = BNE.
119      5. Save results to an HDF5 file.
120
121    Parameters
122    ----------
123    distances : np.ndarray, shape (n_atoms, n_smallest)
124        Pre-computed sorted nearest-neighbour distances (Ångström).
125    idx_distances : np.ndarray, shape (n_atoms, n_smallest)
126        Global atom indices corresponding to those distances.
127    adjacency_matrix : np.ndarray, shape (n_atoms, n_smallest)
128        Binary matrix: entry [i, j] = 1 if atoms i and idx_distances[i, j] are bonded.
129    n_atoms : int
130        Total number of atoms in the structure.
131    save_folder : str
132        Root directory for output HDF5 files.
133    structure_idx : int
134        Index label for this structure (used in the output subdirectory name).
135    filepath : str
136        Path to the original structure file.
137    local_environment_nat : list of int
138        LAE sizes to compute BNE for.  Defined in the ``__main__`` block.
139    """
140    # Create the output directory if it does not exist yet.
141    os.makedirs(f"{save_folder}/structure_{structure_idx}", exist_ok=True)
142
143    for LE_nat in local_environment_nat:
144        print(f"Current LAE size = {LE_nat}")
145
146        # Gs collects barcodes grouped by their shape (dimensions of the array).
147        # Two reduced barcodes with different shapes can never be equal, so grouping by
148        # shape first makes the equality checks much faster.
149        Gs = {}
150
151        for atom_index in tqdm(range(n_atoms), desc="Atoms"):
152
153            # Step 1 — extract the local atomic environment (LAE).
154            # Starting from atom_index, we grow outward by BFS until we have
155            # included LE_nat atoms.  The result is a small sub-graph.
156            local_adjacency_matrix, layers, local_atom_index, global_index = \
157                obtain_local_number_environment_big_structures(
158                    adjacency_matrix=adjacency_matrix,
159                    atom_index=atom_index,
160                    distances=distances,
161                    idx_distances=idx_distances,
162                    n_environment_atoms=LE_nat,
163                )
164
165            # Step 2 — compute the H1 barcode of the local sub-graph.
166            G, F = obtain_H1_barcode(
167                adjacency_matrix=local_adjacency_matrix,
168                layers=layers,
169                mu=mu,
170            )
171
172            # Step 3 — reduce the barcode to its canonical form so that two
173            # topologically identical environments always give the same array.
174            G = reduce_barcode(G)
175
176            # Accumulate barcodes, grouped by shape for efficient comparison.
177            key = G.shape
178            Gs.setdefault(key, []).append(G)
179
180        # Step 4 — count equivalence classes (atoms with identical barcodes).
181        # np.unique(..., axis=0) finds the distinct rows across all barcodes
182        # that share the same shape, and counts how many times each barcode appears.
183        class_counts = []
184        for key, value in Gs.items():
185            _, class_count = np.unique(value, return_counts=True, axis=0)
186            class_counts.extend(class_count)
187
188        class_counts = np.array(class_counts)
189
190        # Convert raw counts to probabilities (must sum to 1).
191        probabilities = class_counts / class_counts.sum()
192
193        # Step 5 — Shannon entropy: BNE = -sum_i p_i * log(p_i).
194        entropy = -(probabilities * np.log(probabilities)).sum()
195
196        # Save results for this LAE size.
197        BNE_save_filename = f"{save_folder}/structure_{structure_idx}/entropy_number_{LE_nat}"
198        save_bne_data_to_files(BNE_save_filename, probabilities, entropy, LE_nat, filepath)
199
200
201# ---------------------------------------------------------------------------
202# Entry point
203# ---------------------------------------------------------------------------
204
205if __name__ == "__main__":
206
207    # Output directory (relative to examples/).
208    save_directory = "./data"
209    BNE_folder = "bond_network_entropy"
210    save_folder = f"{save_directory}/{BNE_folder}"
211
212    # Input structure: silica glass with 5184 atoms.
213    # Downloaded from https://www.pnas.org/doi/abs/10.1073/pnas.2422763122
214    structure_filename = "./data/structural/silica_glass_5184_atoms/POSCAR"
215    atomic_positions, lattice_vectors = obtain_positions_and_lattice_vectors(structure_filename)
216    n_atoms = len(atomic_positions)
217
218    # Compute nearest-neighbour distances for every atom.
219    # n_smallest=300 is well above the LAE sizes we compute (up to 80 atoms),
220    # so the pre-computed neighbour list is always large enough.
221    distances, idx_distances = obtain_distances_big_structures(atomic_positions, lattice_vectors, n_smallest=300)
222
223    # Build the adjacency matrix: two atoms are bonded if their distance is
224    # between 0.1 Å (avoids self-distance = 0) and the cutoff.
225    # 2.1 Å is the Si-O bond cutoff appropriate for silica glass.
226    cutoff = 2.1
227    adjacency_matrix = ((distances < cutoff) & (distances > 0.1)).astype(int)
228
229    # Label for this structure (used in the output directory name).
230    structure_idx = 0
231
232    # LAE sizes to compute — edit this list to change the sweep range.
233    # Pre-computed data covers 1–80; extend or restrict as needed.
234    local_environment_nat = np.arange(1, 81, 1).tolist()
235
236    calculate_barcode(distances, idx_distances, adjacency_matrix, n_atoms, save_folder, structure_idx, structure_filename, local_environment_nat)