5_BNE_workflow.py — BNE Batch Script¶
This script computes the Bond-Network Entropy for a configurable range of LAE sizes and saves one HDF5 file per size. It is the data source for Notebook 6 (BNE growth-rate analysis).
When to run: before opening Notebook 6, unless you are using the pre-computed
reference data already included in tutorials/bond_network_entropy/data/.
cd tutorials/bond_network_entropy
python 5_BNE_workflow.py
Output files are written to data/bond_network_entropy/structure_0/entropy_number_<N>.hdf5.
To change the LAE size range, edit the local_environment_nat list in the
if __name__ == "__main__": block at the bottom of the script.
1"""
25_BNE_workflow.py — Compute Bond-Network Entropy for a range of LAE sizes
3==========================================================================
4
5This script calculates the Bond-Network Entropy (BNE) for a silica glass
6structure and saves the results to HDF5 files. It is the data source for
7notebook 6 (BNE growth rate analysis).
8
9What is BNE?
10------------
11BNE is a set of numbers dependent on the sizes of local environments that
12describe how variable local environments are within a structure. Its growth rate
13can be used to quantify how disordered an atomic structure is.
14It is computed from the H1 persistent homology barcodes of each atom's local
15atomic environment (LAE).
16A barcode is a fingerprint of the local environment that finds
17algebraically independent rings in it.
18Overall, BNE is the Shannon entropy of the
19distribution of these fingerprints across all atoms:
20
21 BNE = -sum_i p_i * log(p_i)
22
23where p_i is the fraction of atoms that share the same barcode fingerprint.
24Higher BNE means more variety of local environments → more topological disorder.
25
26How to run
27----------
28From the examples/ directory:
29
30 python 5_BNE_workflow.py
31
32Pre-computed data for LAE sizes 1-80 is already in
33examples/data/bond_network_entropy/ and were calculated using this script.
34To change the LAE size range, edit `local_environment_nat` in the ``__main__``
35block at the bottom of this file.
36
37Output
38------
39One HDF5 file per LAE size, saved as:
40 data/bond_network_entropy/structure_0/entropy_number_<N>.hdf5
41
42Each file stores:
43 - "entropy" : scalar BNE value
44 - "probabilities" : array of barcode class probabilities
45 - "number_of_atoms" : LAE size N used for this calculation
46"""
47
48import os
49
50import numpy as np
51import h5py
52from tqdm import tqdm
53
54# obtain_distances_big_structures is the fast manual MIC implementation suited
55# for large structures. It operates directly on NumPy arrays (positions and
56# lattice vectors) rather than an ASE Atoms object, which avoids per-atom
57# Python overhead inside the ASE call.
58#
59# WARNING — known limitations of this implementation:
60# 1. It is only *exact* for orthorhombic cells (a, b, c mutually perpendicular).
61# 2. For monoclinic or triclinic cells the single-image MIC wrap should only be accurate
62#. for distances shorter than the half the smallest cell dimension (d < min(|a|, |b|, |c|) / 2).
63# Distances beyond that threshold may be assigned to the wrong periodic image.
64# For the Si–O bond cutoff of 2.1 Å and an orthorhombic silica unit cell this is fine,
65# but always verify before applying to a different structure.
66#
67# For non-orthorhombic cells and large cutoffs, use obtain_distances_ase instead.
68from smooth_disorder.structural import obtain_distances_big_structures, obtain_positions_and_lattice_vectors
69
70
71from smooth_disorder.barcode import (
72 obtain_local_number_environment_big_structures,
73 obtain_H1_barcode,
74 reduce_barcode,
75 mu,
76)
77
78
79# ---------------------------------------------------------------------------
80# Saving results
81# ---------------------------------------------------------------------------
82
83def save_bne_data_to_files(filename, probabilities, entropy, LE_nat, filepath):
84 """
85 Save BNE results for one LAE size to an HDF5 file.
86
87 Parameters
88 ----------
89 filename : str
90 Output file path without the .hdf5 extension.
91 probabilities : np.ndarray
92 Empirical probability of each barcode equivalence class.
93 entropy : float
94 Shannon entropy (BNE) computed from the probabilities.
95 LE_nat : int
96 Number of atoms in the local atomic environment (LAE size).
97 filepath : str
98 Path to the input structure file (stored for reference).
99 """
100 with h5py.File(f"{filename}.hdf5", "w") as w:
101 w.create_dataset("probabilities", data=probabilities, compression="gzip")
102 w.create_dataset("entropy", data=np.array([entropy]), compression="gzip")
103 w.create_dataset("number_of_atoms", data=np.array([LE_nat]), compression="gzip")
104
105
106# ---------------------------------------------------------------------------
107# Main calculation
108# ---------------------------------------------------------------------------
109
110def calculate_barcode(distances, idx_distances, adjacency_matrix, n_atoms, save_folder, structure_idx, filepath, local_environment_nat):
111 """
112 Compute BNE for a range of LAE sizes and save results to disk.
113
114 For each LAE size N:
115 1. Extract the N-atom local environment around every atom using BFS.
116 2. Compute the H1 barcode (ring fingerprint) for each local environment.
117 3. Count how many atoms share the same barcode.
118 4. Convert counts to probabilities and compute Shannon entropy = BNE.
119 5. Save results to an HDF5 file.
120
121 Parameters
122 ----------
123 distances : np.ndarray, shape (n_atoms, n_smallest)
124 Pre-computed sorted nearest-neighbour distances (Ångström).
125 idx_distances : np.ndarray, shape (n_atoms, n_smallest)
126 Global atom indices corresponding to those distances.
127 adjacency_matrix : np.ndarray, shape (n_atoms, n_smallest)
128 Binary matrix: entry [i, j] = 1 if atoms i and idx_distances[i, j] are bonded.
129 n_atoms : int
130 Total number of atoms in the structure.
131 save_folder : str
132 Root directory for output HDF5 files.
133 structure_idx : int
134 Index label for this structure (used in the output subdirectory name).
135 filepath : str
136 Path to the original structure file.
137 local_environment_nat : list of int
138 LAE sizes to compute BNE for. Defined in the ``__main__`` block.
139 """
140 # Create the output directory if it does not exist yet.
141 os.makedirs(f"{save_folder}/structure_{structure_idx}", exist_ok=True)
142
143 for LE_nat in local_environment_nat:
144 print(f"Current LAE size = {LE_nat}")
145
146 # Gs collects barcodes grouped by their shape (dimensions of the array).
147 # Two reduced barcodes with different shapes can never be equal, so grouping by
148 # shape first makes the equality checks much faster.
149 Gs = {}
150
151 for atom_index in tqdm(range(n_atoms), desc="Atoms"):
152
153 # Step 1 — extract the local atomic environment (LAE).
154 # Starting from atom_index, we grow outward by BFS until we have
155 # included LE_nat atoms. The result is a small sub-graph.
156 local_adjacency_matrix, layers, local_atom_index, global_index = \
157 obtain_local_number_environment_big_structures(
158 adjacency_matrix=adjacency_matrix,
159 atom_index=atom_index,
160 distances=distances,
161 idx_distances=idx_distances,
162 n_environment_atoms=LE_nat,
163 )
164
165 # Step 2 — compute the H1 barcode of the local sub-graph.
166 G, F = obtain_H1_barcode(
167 adjacency_matrix=local_adjacency_matrix,
168 layers=layers,
169 mu=mu,
170 )
171
172 # Step 3 — reduce the barcode to its canonical form so that two
173 # topologically identical environments always give the same array.
174 G = reduce_barcode(G)
175
176 # Accumulate barcodes, grouped by shape for efficient comparison.
177 key = G.shape
178 Gs.setdefault(key, []).append(G)
179
180 # Step 4 — count equivalence classes (atoms with identical barcodes).
181 # np.unique(..., axis=0) finds the distinct rows across all barcodes
182 # that share the same shape, and counts how many times each barcode appears.
183 class_counts = []
184 for key, value in Gs.items():
185 _, class_count = np.unique(value, return_counts=True, axis=0)
186 class_counts.extend(class_count)
187
188 class_counts = np.array(class_counts)
189
190 # Convert raw counts to probabilities (must sum to 1).
191 probabilities = class_counts / class_counts.sum()
192
193 # Step 5 — Shannon entropy: BNE = -sum_i p_i * log(p_i).
194 entropy = -(probabilities * np.log(probabilities)).sum()
195
196 # Save results for this LAE size.
197 BNE_save_filename = f"{save_folder}/structure_{structure_idx}/entropy_number_{LE_nat}"
198 save_bne_data_to_files(BNE_save_filename, probabilities, entropy, LE_nat, filepath)
199
200
201# ---------------------------------------------------------------------------
202# Entry point
203# ---------------------------------------------------------------------------
204
205if __name__ == "__main__":
206
207 # Output directory (relative to examples/).
208 save_directory = "./data"
209 BNE_folder = "bond_network_entropy"
210 save_folder = f"{save_directory}/{BNE_folder}"
211
212 # Input structure: silica glass with 5184 atoms.
213 # Downloaded from https://www.pnas.org/doi/abs/10.1073/pnas.2422763122
214 structure_filename = "./data/structural/silica_glass_5184_atoms/POSCAR"
215 atomic_positions, lattice_vectors = obtain_positions_and_lattice_vectors(structure_filename)
216 n_atoms = len(atomic_positions)
217
218 # Compute nearest-neighbour distances for every atom.
219 # n_smallest=300 is well above the LAE sizes we compute (up to 80 atoms),
220 # so the pre-computed neighbour list is always large enough.
221 distances, idx_distances = obtain_distances_big_structures(atomic_positions, lattice_vectors, n_smallest=300)
222
223 # Build the adjacency matrix: two atoms are bonded if their distance is
224 # between 0.1 Å (avoids self-distance = 0) and the cutoff.
225 # 2.1 Å is the Si-O bond cutoff appropriate for silica glass.
226 cutoff = 2.1
227 adjacency_matrix = ((distances < cutoff) & (distances > 0.1)).astype(int)
228
229 # Label for this structure (used in the output directory name).
230 structure_idx = 0
231
232 # LAE sizes to compute — edit this list to change the sweep range.
233 # Pre-computed data covers 1–80; extend or restrict as needed.
234 local_environment_nat = np.arange(1, 81, 1).tolist()
235
236 calculate_barcode(distances, idx_distances, adjacency_matrix, n_atoms, save_folder, structure_idx, structure_filename, local_environment_nat)