Materials swelling revealed through automated semantic segmentation of cavities in electron microscopy images

Scientific Reports volume 13, Article number: 5178 (2023) Cite this article

Accurately quantifying swelling of alloys that have undergone irradiation is essential for understanding alloy performance in a nuclear reactor and critical for the safe and reliable operation of reactor facilities. However, typical practice is for radiation-induced defects in electron microscopy images of alloys to be manually quantified by domain-expert researchers. Here, we employ an end-to-end deep learning approach using the Mask Regional Convolutional Neural Network (Mask R-CNN) model to detect and quantify nanoscale cavities in irradiated alloys. We have assembled a database of labeled cavity images which includes 400 images, > 34 k discrete cavities, and numerous alloy compositions and irradiation conditions. We have evaluated both statistical (precision, recall, and F1 scores) and materials property-centric (cavity size, density, and swelling) metrics of model performance, and performed targeted analysis of materials swelling assessments. We find our model gives assessments of material swelling with an average (standard deviation) swelling mean absolute error based on random leave-out cross-validation of 0.30 (0.03) percent swelling. This result demonstrates our approach can accurately provide swelling metrics on a per-image and per-condition basis, which can provide helpful insight into material design (e.g., alloy refinement) and impact of service conditions (e.g., temperature, irradiation dose) on swelling. Finally, we find there are cases of test images with poor statistical metrics, but small errors in swelling, pointing to the need for moving beyond traditional classification-based metrics to evaluate object detection models in the context of materials domain applications.

Metal alloys used in nuclear reactor cores and surrounding structures undergo irradiation, causing damage to the material which can result in the production of extended defects such as dislocation loops, precipitates, and cavities (sometimes called voids when they do not contain gas, or bubbles when they do) that, in turn, have a deleterious impact on the mechanical properties via hardening, embrittlement, and swelling1,2,3,4,5. Bias-driven growth of cavities leading to unconstrained swelling under neutron irradiation generally occurs via the presence of helium (produced from nuclear transmutation) that stabilizes the cavities3,6. Significant swelling can result in material degradation and failure; hence, understanding the interplay of alloy composition, microstructure, and reactor conditions such as operating temperature and irradiation dose is important for informing safe and reliable reactor operation7. Bulk measurement methods of reactor components, such as the Archimedes method, are typically the easiest way to obtain information on the total volumetric swelling response of a material8. However, Transmission and Scanning Transmission Electron Microscopy (S/TEM) methods are also commonly employed in materials research and development evaluations for ex situ characterization of alloy microstructure and swelling quantification. TEM methods have an advantage over bulk measurement methods as they isolate the swelling response arising strictly from the presence of cavities, eliminating swelling contributions from other factors such as creep, secondary phase formation, and phase densification at high temperature.

TEM analysis can also be used to identify swelling responses locally, e.g., as is seen during ion irradiations or in complex microstructures due to localized microstructural effects on the helium and defect formation energetics and kinetics. Finally, TEM analysis can be used to help understand early-stage irradiation response, e.g., the nucleation and growth process of cavities, which initiates before significant macroscopic swelling has occurred. Such microscale characterization thus enables detailed mechanistic understanding important for the design of swelling resistant alloys, and enables researchers to understand linkages between material microstructure, composition, and swelling response as a function of key operational variables such as temperature, irradiation type (e.g., neutron vs. ion), dose rate, and total dose9. This information is in turn useful for informing materials modeling of swelling in different regimes (i.e., incubation, transient, and steady state swelling) and can help inform operational limits of a material in a nuclear reactor5.

It is important to note that quantification of swelling from S/TEM analysis reveals the perceived swelling of the material, which may differ from its actual swelling, e.g., due to effects arising from the TEM imaging conditions and the samples themselves10,11. For example, multi-slice simulation results have shown a dramatic change in the perceived size of a cavity with increasing underfocus when Fresnel contrast imaging is used and the imaging fringe serves as the size reference11. This occurs due to the phase shift caused by the difference in mean inner potential between the cavity and the encapsulating crystal12. Changes in the microscope configuration, such as the accelerating voltage and objective lens focus, as well as material factors such as composition, will alter the degree of phase shift and thus the localized contrast at and near a cavity, as captured by the multi-slice simulations.

The current approach for determining swelling via TEM analysis is to exploit the physics of contrast formation via Fresnel contrast imaging at small underfocus conditions (< −1 μm): knowing the focusing condition, an inference can be made about the true size, and thus the actual swelling, of cavity features in the material. By restricting imaging to relatively small underfocus, the characteristic white centroid and dark fringe contrast of cavities due to the phase shift can be observed while the displacement of the dark fringe is kept sufficiently low that a robust measurement of cavity size can be made. Even so, a small, known offset remains, and it is more significant for cavities with diameters less than 10 nm10. The errors arising from the complex imaging factors of Fresnel contrast imaging then compound with other factors such as imaging resolution and human accuracy in placing the physical or digital measuring device, and are further compounded by factors such as sample tilt, where background contrast can vary with the tilt and magnification(s) used. As a result, high-quality quantification of TEM-perceived swelling requires detailed consideration of the imaging and material factors. Perceived and actual swelling are strongly correlated, with variances decreasing as cavity size (and hence swelling in the material) increases; however, deconvoluting this correlation requires knowledge of the variances and of the imaging conditions that produced the perceived swelling quantification, and doing so is essential for understanding the true swelling response of a material.

At present, swelling quantification from TEM samples is typically performed by considering a handful of TEM images and manually counting and measuring individual cavities in each image, for example using image analysis programs such as ImageJ13. This approach typically treats relatively small sample sizes due to the time- and resource-intensive nature of both (1) TEM sample preparation and (2) the cavity labeling and counting analysis. Regarding the first issue, recent advances in TEM sample preparation, including high-throughput focused ion beam (FIB) methods (e.g., plasma FIB) and flash polishing, can be used to generate an extensive library of TEM samples14,15. Therefore, sample preparation limitations are rapidly being overcome. We note also that modern TEM instruments have undergone exponential growth in data acquisition rates with the development of new detector technologies, resulting in higher resolution images and larger overall data sizes16,17,18,19. Therefore, it is clear that manual labeling and measurement of cavities will not be able to keep pace with the scaling of TEM dataset sizes; thus, the second issue above is rapidly becoming the bottleneck in scaling up image-based analysis capabilities. An automated method that can quickly analyze large TEM datasets, automatically detect and quantify cavities, and then assess material swelling would enable researchers to evaluate many more areas of interest on a given sample, providing more robust statistics, quantification of effects of heterogeneity, and in-depth evaluations of cavity properties and material swelling.

In the past decade, deep learning methods have witnessed significant advancement and have brought revolutionary changes to the field of computer vision. Specifically, in the context of object detection, deep convolutional neural networks (CNNs) such as ResNet50, ResNet101 and VGG16 are used to extract detailed underlying feature sets from tens of thousands of images in canonical databases such as ImageNet20 and Common Objects in Context (COCO)21. These so-called "backbone" networks are implemented in CNN-based object detection frameworks such as the Faster Regional Convolutional Neural Network22 (R-CNN) and Mask R-CNN models23, which contain additional neural networks that suggest regions of interest in the image and classify and segment individual objects within each region of interest24,25. There has been a growing body of work applying object detection methods to electron microscopy images in materials science26, with applications ranging from detecting various defects (e.g., dislocations, precipitates, black dot defects) in irradiated metal alloys27,28,29 to quantifying micro and nanoparticles30,31 and finding individual atoms in high-resolution STEM images32,33. Most relevant to the present work, Anderson et al. used the Faster R-CNN model to detect cavities in Ni-based X-750 alloys34. Their Faster R-CNN model effectively found cavities, with reported F1 scores in the range of 0.7–0.8. Because the Faster R-CNN model does not provide pixel-level segmentation information, additional post-processing methods separate from the deep learning model were used to extract the cavity size information from the predicted bounding boxes. The present work employs the Mask R-CNN model to realize a fully end-to-end deep learning cavity detector. We include the publicly available data used in the work of Anderson et al. 
from the Canadian Nuclear Laboratory (CNL), which we refer to as the CNL dataset in this work, and significantly expand the previously available cavity image database to include images comprising a greater range of alloy compositions and irradiation conditions by including new images from the Nuclear Oriented Materials & Examination (NOME) Laboratory at the University of Michigan, which we refer to as the NOME dataset in this work (see section "Data and methods" for more information). The inclusion of the NOME database, which significantly expands the training data to include a wide range of materials and imaging conditions, then enables the evaluation of CNNs to perform image quantification automation across a broad feature domain within a single feature class (cavities), compared to the prior works which performed evaluations on a single feature domain. Two examples of images from each of the CNL and NOME datasets are shown in Fig. 1. These particular images were chosen as they qualitatively show the large differences in cavity sizes, densities, and physical appearance between underfocused and overfocused images in these databases. In Fig. 1, "ground truth" denotes the domain-expert(s) cavity annotations used to train and assess the performance of our model. More information on the image databases and their labeling can be found in section "Data and methods". Note, the images in Fig. 1 represent images taken within individual experiments of imaging cavities, and additional imaging conditions such as increasing/decreasing magnification and image focus could improve the overall capability to visualize cavities during these experiments but are not presented or considered here.
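As an illustration of the end-to-end quantification that pixel-level segmentation enables, the sketch below reduces a predicted binary cavity mask to an equivalent circular diameter, assuming a known nm-per-pixel calibration. The function name and inputs are hypothetical and illustrate only the area-to-size reduction step, not the actual post-processing used in this work.

```python
import math

def equivalent_diameter_nm(mask, nm_per_pixel):
    """Reduce a binary segmentation mask (nested lists of 0/1) to the
    diameter of a circle with the same projected area, in nm.

    Hypothetical helper for illustration only; the published pipeline's
    post-processing code is not reproduced here.
    """
    area_px = sum(sum(row) for row in mask)      # pixel area of the cavity
    area_nm2 = area_px * nm_per_pixel ** 2       # convert to physical area
    return 2.0 * math.sqrt(area_nm2 / math.pi)   # d = 2 * sqrt(A / pi)

# Example: a 5x5-pixel blob at 1 nm/pixel has a 25 nm^2 projected area,
# giving an equivalent circular diameter of about 5.64 nm.
mask = [[1] * 5 for _ in range(5)]
d = equivalent_diameter_nm(mask, nm_per_pixel=1.0)
```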

Example raw images (left column) with ground truth cavities labeled (middle column) and corresponding Mask R-CNN model predictions (right column). (A) CNL overfocused image with F1 = 0.82. (B) CNL underfocused image with F1 = 0.63. (C) NOME overfocused image with F1 = 0.81. (D) NOME underfocused image with F1 = 0.83. The width of the images shown in (A), (B), (C) and (D) are approximately 195 nm, 768 nm, 315 nm, and 168 nm, respectively.

There are many possible ways of assessing a segmentation machine learning model for defects. One level is how the model performs as a classification algorithm, which can be done for any object classified by the model. A typical model provides classification for pixels (in or out of the defect), defects (found or not found), and defect types (for cases with multiple defect types). Such classification performance is generally characterized by metrics such as precision (P), recall (R), accuracy, and F1 (harmonic mean of precision and recall) scores. A second level of assessment is how the model performs for defect properties, which might include basic properties (e.g., size distribution, mean size, density, shape, position, etc.) and evolutions or correlations associated with those basic properties (e.g., growth rate, diffusivity, pair distribution function, etc.). A third level of assessment is materials properties, which for irradiated alloys are generally swelling or hardening predictions based on physical models and properties of the observed defects. Assessments like those just listed can generally be done with different groupings of the data, e.g., for a fixed area, on a per image basis, or for a specific set of images. Also, since assessments are generally done on left-out test data, those test data sets can be generated by different methods, the most common being choosing them at random (e.g., k-fold cross-validation) or removing specific groups of data with select properties to represent likely use cases for the model. In this work, we focus assessment on classification scores for finding defects, defect size distribution and density, and material swelling. We do this on both a per-image basis and averaged over multiple images. Together, these assessments explore the accuracy of the model for the information typically utilized by the radiation effects community.
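The first, classification-based level of assessment described above can be made concrete with a small sketch: predicted and ground-truth boxes are matched one-to-one by intersection-over-union (IoU), matched pairs count as true positives, and precision, recall, and F1 follow from the counts. The greedy matcher and the 0.5 IoU threshold below are illustrative assumptions, not necessarily the exact matching protocol of this work.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def detection_scores(truth, preds, thresh=0.5):
    """Greedy one-to-one matching; returns (precision, recall, F1)."""
    unmatched = list(truth)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: iou(p, t), default=None)
        if best is not None and iou(p, best) >= thresh:
            tp += 1                  # matched: true positive
            unmatched.remove(best)   # each truth box matches at most once
    prec = tp / len(preds) if preds else 0.0
    rec = tp / len(truth) if truth else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Two ground-truth cavities, one good prediction and one false positive.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 11, 11), (50, 50, 60, 60)]
p, r, f1 = detection_scores(truth, preds)
```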

Throughout this work, we focus our model assessment on its ability to quantify material swelling, and we investigate the primary sources of error in those swelling assessments. Here, we first benchmark the performance of Mask R-CNN models trained and tested on different random subsets of our complete CNL + NOME cavity database (see section "Data and methods"). Evaluation with random cross-validation forms a baseline for how well the model is expected to perform on test images that, at least qualitatively, are drawn from the same domain as the training set. Figure 2 contains a parity plot comparing model predicted vs. true values of average per-image material swelling for five different train/test splits of the CNL + NOME database. Key fit statistics of coefficient of determination (R2), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE), and RMSE divided by the true dataset standard deviation (reduced RMSE, RMSE/σ) are included. A summary of the key classification metrics and materials property metrics for each split, together with the average and standard deviation across all five splits, can be found in SI Note 1. Regarding material swelling in Fig. 2, across the five splits examined, the average MAE is 0.30 percent swelling with a standard deviation of 0.03 percent swelling. The best split was the CNL + NOME initial split with an MAE = 0.26 percent swelling, while the worst split was CV split 1 with an MAE = 0.35 percent swelling. In addition to assessments of material swelling, we have also provided a detailed examination of model assessments of per-image average cavity size and cavity areal density, which is provided in Figure S1 of SI Note 1. 
The model can assess the average per-image cavity size with high accuracy, with an average (standard deviation) MAE of just 1.02 (0.14) nm, corresponding to an average (standard deviation) MAPE of 8.94% (0.84%) in cavity size, a similar error level to our previous work28. Our model has the highest errors in assessing cavity density, particularly for images with high cavity densities (> 20 × 10⁻⁴ nm⁻²), where the model has a clear bias toward lower values. The interplay of cavity size and density with regard to swelling assessments is discussed in section "Understanding model errors of swelling assessment". Overall, the Mask R-CNN model can assess the material swelling well, with a typical mean absolute error of about 0.30 percent swelling. This error is small enough for the model to discern changes in swelling responses based on material design (e.g., alloy refinement) and service conditions (e.g., temperature, dpa), and the model thus readily provides an accelerated means to assess these factors in TEM-based swelling quantification workflows.
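The fit statistics quoted above follow standard definitions; for reference, a plain-Python sketch of MAE, MAPE, RMSE, R2, and reduced RMSE (RMSE/σ) for paired true/predicted swelling values might look as follows. The sample values are illustrative, not data from this work.

```python
import math

def fit_statistics(true_vals, pred_vals):
    """Standard regression fit statistics for paired true/predicted values.

    MAPE assumes no true value is exactly zero; sigma is the population
    standard deviation of the true values.
    """
    n = len(true_vals)
    errs = [p - t for t, p in zip(true_vals, pred_vals)]
    mae = sum(abs(e) for e in errs) / n
    mape = 100.0 * sum(abs(e / t) for t, e in zip(true_vals, errs)) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mean_t = sum(true_vals) / n
    ss_tot = sum((t - mean_t) ** 2 for t in true_vals)
    r2 = 1.0 - sum(e * e for e in errs) / ss_tot
    sigma = math.sqrt(ss_tot / n)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse,
            "R2": r2, "RMSE/sigma": rmse / sigma}

# Illustrative per-image swelling values (percent swelling).
stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```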

Parity plot of true and predicted material swelling. The different symbols correspond to different cross-validation train/test splits. The fit statistics in black text denote the average ± standard deviation across all five splits for each metric.

From the above discussion, the model trained on our complete CNL + NOME dataset yielded accurate assessments of material swelling for randomly left-out test images, constituting a test of model performance on images drawn qualitatively from the same domain as the training set. A more demanding test of the ability of our Mask R-CNN models to assess material swelling involves testing the model on images quite distinct from those encountered in training by the use of leave-out group cross validation. While there are many ways one can leave out physically motivated groups of data, here we focus on the practical scenario of applying our trained Mask R-CNN to cavity images belonging to a distinct dataset from that used in training. To do this, we train a model solely on the CNL data, use it to predict CNL and NOME test data, and compare it to the model trained on the combined CNL + NOME dataset from section "Benchmarking model performance of assessing material swelling". Likewise, we also train a model only on the NOME data, and use it to predict CNL and NOME test data.

Figure 3 contains parity plots of material swelling assessment for our leave-out-group cross validation test. A detailed summary of the materials property statistics (cavity size, density, and swelling values) for the tests shown in Fig. 3 can be found in SI Note 2. In Fig. 3A, the model is trained on only CNL data and is used to predict swelling on CNL test images (blue points) and NOME test images (red points). The CNL and NOME test points are separated based on whether the test images correspond to overfocused (circle symbols) or underfocused (triangle symbols) imaging conditions, where the different conditions invert the contrast modulation of the cavities present in the material. In Fig. 3A, we see that the model trained on CNL images demonstrates good assessment of material swelling on the CNL image test set with an MAE of 0.40 percent swelling. The model performs better on underfocused images compared to overfocused images from the standpoint of MAE, where the swelling MAE values on underfocused (overfocused) images are 0.33 (0.53) percent swelling, respectively (see SI Note 2). The improved performance on underfocused images is likely due to there being more underfocused than overfocused images in the CNL database. A similar response was observed in our previous work using Mask R-CNN to detect dislocation loops in FeCrAl alloys, where our learning curves showed the best model performance on the defect types present in highest quantity in the training data28. In Fig. 3A, we can also see that the model trained on CNL data performs poorly on the NOME test set. While at first glance the MAE value of 0.66 percent swelling on the NOME test set does not appear much worse than the MAE of 0.40 percent swelling on the CNL test set, the range of swelling values for the NOME data is much smaller, and the higher error in this case is better exemplified by inspecting the MAPE value of about 215% for testing on NOME vs. 
just under 20% for testing on CNL, as well as the reduced RMSE value which is much higher (lower) than unity for the NOME (CNL) test set.

Parity plots assessing Mask R-CNN per-image performance of predicting materials swelling. (A) CNL initial split, with model trained on CNL and tested on CNL (blue data) and trained on CNL and tested on NOME (red data). (B) NOME initial split, with model trained on NOME and tested on CNL (blue data), and trained on NOME and tested on NOME (red data). In both plots, the circle and triangle points denote overfocused and underfocused images, respectively, and the color-coded fit statistics coincide with the corresponding set of points of like color.

In Fig. 3B, we perform the test case where the model is trained only on the NOME data and separately tested on the CNL and NOME test sets. The model trained and tested on NOME data shows an excellent overall ability to assess swelling, with an MAE of just 0.15 percent swelling (MAPE = 37.97%). In contrast, the model performs poorly on assessments of the CNL test set, with large swelling MAE (MAPE) values of 1.98 percent swelling (76.25%), respectively, and essentially no ability to assess swelling of samples with true swelling values greater than about 1.5 percent swelling. This result makes sense through the lens of model applicability domain. While the NOME dataset constitutes a more diverse set of alloy compositions and irradiation conditions, the swelling present in the NOME images has a maximum of about 2.5 percent swelling (with all but one test image having less than 1.5 percent swelling), in contrast to the large swellings of some CNL images of up to nearly 7 percent swelling.

We reiterate that by training a model which uses both the CNL and NOME data (Fig. 2 and Figure S2 in SI Note 2), the model provides an accurate assessment of material swelling both on the separate CNL (MAE = 0.44 percent swelling) and NOME (MAE = 0.15 percent swelling) test sets, and collectively shows an MAE of 0.26 percent swelling. The model trained on CNL and NOME data shows virtually unchanged performance on each test subset compared to individually trained models shown in Fig. 3 and Figure S2, indicating that the combined model has a larger applicability domain. The combined CNL + NOME model shows approximately identical performance predicting swelling of overfocused (0.26 percent swelling) vs. underfocused (0.27 percent swelling) images, though from the standpoint of MAPE the model performs better on underfocused images (29.03%) compared to overfocused images (39.13%) (see Table S2 in SI Note 2). In addition to the materials property statistics summarized here, we have collected the classification statistics of overall P, R, and F1 scores and average per-image P, R, and F1 scores for the tests discussed above. We find that the conclusions regarding model performance in the context of material swelling generally persist when considering the overall and average per-image F1 scores. Additional discussion of the classification metrics and a table of their values can be found in SI Note 2. Overall, our results demonstrate that it is preferable to simply train one model with training images from both datasets, as the model domain is widened without loss in classification or materials property metric performance within any given single dataset.

Here, we seek to better understand the source of error in the model swelling assessments. Based on the equation used to calculate material swelling (Eq. 1, see section "Data and methods"), it is intuitive that cavity size (cubic scaling) has a larger impact than cavity density (linear scaling) on the swelling (see SI Note 3 for a visualization of this fact using our present database). Given the detailed data obtained from the Mask R-CNN model output, we can show this effect in practice and more precisely quantify potentially problematic areas of model use. Figure 4A shows the relationship between the true per-image cavity size and the model error in the cavity density. In Fig. 4A the sizes of the data points scale with the model error in the swelling. What we learn from Fig. 4A is that the images with the highest density errors are those with small cavities, at least on average. The small sizes of the points with high density errors indicate that these images with poor density assessments also have minor swelling errors. From the standpoint of desiring a model which produces accurate swelling assessments, the fact that the model at times shows poor assessments of cavity density is not necessarily concerning, as the poor density assessments coincide with small swelling errors, at least for the images analyzed in our present database. It is worth noting that our model is largely unbiased with regard to cavity size predictions (see Figure S1A in SI Note 1) but biased to underpredict cavity densities (see Figure S1B in SI Note 1), resulting in essentially no bias in the swelling errors (see Fig. 2); this is because small cavities (in absolute units) have a small impact on the swelling values and tend to be the cavities that are undercounted in the density predictions.

In Fig. 4B, we plot the average absolute swelling error as a function of the true per-image cavity size, binned based on ranges of cavity sizes. The sizes of the points in Fig. 4B correspond to the number of test images contained in each cavity size bin. The error bars denote the standard error in the mean of the average absolute swelling error in each cavity size bin. As an example, to obtain the first data point of the 0–5 nm binned NOME data, the sizes of the red square points in Fig. 4A that lie between 0 and 5 nm on the y-axis are averaged to obtain the average absolute swelling error in Fig. 4B, the error bar is the standard error in the mean of those same points, and the size of the point in Fig. 4B scales with the number of data points in the 0–5 nm size bin (note this is why larger points tend to have smaller error bars). In Fig. 4B, we can see that the CNL (NOME) images with average cavity sizes greater than 10 nm (15 nm) have higher average swelling errors than the overall MAE of 0.3 percent swelling from random cross validation. Taken together, the analysis shown in Fig. 4 points to images with large cavities being the most susceptible to high swelling errors, with errors potentially twice as high as that obtained from our random cross validation test. As a further piece of analysis, Figure S4 in SI Note 3 contains additional plots like that shown in Fig. 4B, except plotting the average absolute swelling error as a function of the (binned) true swelling, for all test images together as well as split out by CNL and NOME subsets. This analysis indicates we have smaller (larger) absolute swelling errors (percentage swelling errors) when the true swelling is small (e.g., average swelling error of 0.13% and percentage error of 33.0% for true swelling < 1%) and larger (smaller) absolute swelling errors (percentage swelling errors) when the true swelling is large (e.g., average swelling error of 0.60% and percentage error of 16.0% for true swelling > 2%). Overall, across all test images in our database, our model shows average absolute swelling errors (percentage swelling errors) of about 0.3% (25%).
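Eq. 1 itself appears in section "Data and methods" rather than in this excerpt, so the sketch below assumes a common TEM form of the swelling calculation, in which the total cavity volume (π d³/6 summed over cavities) is compared against the analyzed volume (image area times foil thickness). It is included only to make the cubic-size versus linear-density scaling discussed above concrete.

```python
import math

def percent_swelling(diameters_nm, area_nm2, thickness_nm):
    """Percent swelling from cavity diameters in an analyzed volume.

    Assumed illustrative form: swelling = V_cav / (V_analyzed - V_cav) * 100,
    with V_cav = sum(pi * d^3 / 6) and V_analyzed = image area * thickness.
    This is not necessarily identical to Eq. 1 of the paper.
    """
    v_cav = sum(math.pi * d ** 3 / 6.0 for d in diameters_nm)
    v_total = area_nm2 * thickness_nm
    return 100.0 * v_cav / (v_total - v_cav)

# One 10 nm cavity contributes far more swelling than ten 2 nm cavities
# in the same analyzed volume, despite the 10x higher density:
# cubic scaling with size dominates linear scaling with count.
big = percent_swelling([10.0], area_nm2=200.0 * 200.0, thickness_nm=100.0)
small = percent_swelling([2.0] * 10, area_nm2=200.0 * 200.0, thickness_nm=100.0)
```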

(A) Relationship between the true per-image cavity size and the model error in assessing the corresponding cavity density. Each data point represents one test image, where the blue circles and red squares denote CNL and NOME test images, respectively. The size of the data points scales with the model error of the percent swelling. (B) The trend of model predicted absolute error in material swelling as a function of true average cavity size. Here, the x-axis represents binned values of true cavity size (i.e., groups of test images based on their range in true cavity sizes from the y-axis of plot (A)). The blue circles and red squares denote groups of CNL and NOME test images, respectively. The size of the points scales with the number of test images comprising the true average cavity size bin. The size legends denote the minimum, average, and maximum for the respective data trace. The error bars are the standard error in the mean of the absolute swelling error.

Throughout this work, much of our discussion of model predictive quality has focused on assessments at the per-image level. However, it is also important to assess how well the model predicts the distribution of cavity sizes across all the images, which is representative of the averaged distributions researchers might typically extract from a series of images for a given experiment or series of experiments. Figure 5A shows the true (in blue) and model predicted (in green) size distributions of all cavities in our set of test images. The number of true cavities is 20,597 and the number of predicted cavities is 18,169. The model correctly reproduces the qualitative shape of the distribution, including the bimodal behavior and critical bubble size (e.g., the valley between the two peaks in Fig. 5A), which are key indicators of the underlying physics of bias-driven cavity growth35. Quantitative distribution metrics like mean, median, standard deviation, skew, and kurtosis are all very well-predicted by the model. This result shows that the overall cavity size distribution is accurate enough not only for quantitative perceived swelling predictions, but also for qualitative understanding of underlying physics mechanisms and even quantitative constraints on physics-based models of cavity evolution.
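The distribution metrics compared above can be computed with moment-based definitions; the sketch below (with short, made-up size lists) shows one way to summarize and compare true and predicted distributions. Note that the kurtosis here is the raw fourth standardized moment (not excess kurtosis), and population rather than sample moments are used.

```python
import math

def distribution_summary(sizes):
    """Mean, median, std, skew, and kurtosis of a list of cavity sizes.

    Uses population central moments: skew = m3 / m2^(3/2),
    kurtosis = m4 / m2^2 (raw, not excess).
    """
    n = len(sizes)
    mean = sum(sizes) / n
    s = sorted(sizes)
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    m2 = sum((x - mean) ** 2 for x in sizes) / n
    m3 = sum((x - mean) ** 3 for x in sizes) / n
    m4 = sum((x - mean) ** 4 for x in sizes) / n
    std = math.sqrt(m2)
    return {"mean": mean, "median": median, "std": std,
            "skew": m3 / std ** 3, "kurtosis": m4 / m2 ** 2}

# Illustrative cavity sizes (nm) with a long right tail.
true_summary = distribution_summary([2.0, 3.0, 3.5, 4.0, 12.0])
pred_summary = distribution_summary([2.1, 3.1, 3.4, 4.2, 11.5])
```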

Plots of the full cavity size distribution showing the true cavity size distribution (blue) and predicted cavity size distribution (green) across all left-out test images. (A) Distributions of cavity sizes using physical units of nm. (B) Distributions of cavity sizes using reduced units corresponding to the fraction of the image size.

From Fig. 5A, we see that the regions where the model undercounts the cavity sizes the most lie in the 2.5–5 nm and 10–15 nm ranges. To better understand the nature of these errors, we plot the cavity size distribution in units of fraction of the image size in Fig. 5B. As expected, Fig. 5B no longer shows the bimodal size distribution, because the cavity sizes no longer have physical units and the database spans a large collection of magnifications. What we can see from Fig. 5B is that essentially all of the cavities missed by the model occur where the cavity spans about 3% or less of the image. We believe the source of this error is twofold. First, cavities that span only 3% or less of the image tend to consist of a limited number of pixels, e.g., roughly 30 pixels in width or fewer, and are thus difficult for the model to positively discern from the image background. Second, we have found from our own data labeling exercises that it is difficult even for domain expert human labelers to discern whether features of this size are in fact cavities or not. We believe this second source of error is particularly interesting, as it introduces an uncertainty or bias in the ground truth labels which can then propagate through the model training and resulting predictions. A better understanding of the uncertainty in ground truth labels and its impact on model performance is beyond the scope of this work but merits additional study.

The above discussion focused on correlating average values of cavity size with errors in cavity density and swelling, and on assessing the ability of our model to predict the full distribution of cavity sizes. We have shown that our model can both reproduce well the bimodal nature of the perceived cavity size distribution in real size units, along with key metrics like the distribution average, median, and standard deviation, and provide accurate per-image average cavity sizes. As a final piece of analysis, we examine two specific images in greater detail to better understand the impact of the entire cavity size distribution on the swelling error. These particular cases were test images from the CNL + NOME initial split case. We examine two extreme cases for this analysis: an underfocused NOME test image named "10 59 K.png", which showed the smallest swelling error of just 0.004 percent swelling, and an underfocused CNL test image named "02.jpg", which showed the largest swelling error of 1.46 percent swelling. The ground truth and model predicted cavity labels are shown in Fig. 6. As a first remark, the NOME image with the best swelling assessment showed a low F1 score of just 0.50, while the CNL image with the worst swelling assessment showed a high F1 score of 0.90. This finding points to the importance of evaluating materials property-centric metrics in addition to, or as a substitute for, conventional classification-based metrics for cases where the model is being evaluated for use in a specific materials domain application. Next, the cavity size distributions (represented here as fraction of the image size) shown in Fig. 6 highlight that the poor F1 of the NOME image is the result of the model missing many small cavities spanning about 2% or less of the image size (for this particular image, this amounts to cavities about 5 nm in size), while for the CNL image a handful of large cavities are missing or the found cavities were predicted with underestimated sizes.
For the NOME image in Fig. 6A, the small swelling error is the result of a slight overestimation of the average cavity size, driven mainly by the prediction of a single large cavity with a size of about 5% of the image, as shown by the cumulative swelling contributions overlaid with the size histogram. In Fig. 6B, the poor swelling assessments are driven by the model missing the largest cavities in the image, and the over-representation of small predicted cavities does not make up for the underpredicted swelling. Overall, the extreme examples presented in Fig. 6 show that, at times, our model can show a good swelling assessment that is the result of error cancellation (Fig. 6A—misses many small cavities but has one large false positive cavity) and our model can have a poor swelling assessment that is the result of a combination of errors (Fig. 6B—predicts too many small cavities and misses some large cavities). However, we reiterate that when evaluating the numerous images comprising our complete test set, our model shows good assessments of material swelling on average.

Plots of the cavity size (given here as fraction of total image size) histogram distribution and cumulative swelling contributions for two cases: (A) NOME image named “10 59 K.png”, which is an underfocused image where the model predicted a low F1 score of 0.50 and a swelling error of only 0.004 percent swelling. (B) CNL image named “02.jpg”, which is an underfocused image where the model predicted a high F1 score of 0.90 and a large swelling error of 1.46 percent swelling. For the histograms in each panel, the blue and green bars denote the number of instances of true and predicted cavities in each size bin, respectively, and the dashed blue and green lines denote the fraction of swelling (normalized to the total true swelling value) one would obtain by calculating swelling using the respective cavity size distributions. The percentage error in predicted swelling for (A) and (B) corresponds to 5.4% and 27.0%, respectively. The width of the images shown in (A) and (B) are approximately 377 nm and 195 nm, respectively.

In this work, we used an end-to-end deep learning approach based on the Mask R-CNN model to detect and characterize nanoscale cavities in irradiated metal alloy TEM micrographs. We have assembled a database of labeled cavity images which includes 400 images and > 34 k cavities, with a domain encompassing an array of alloy compositions and irradiation conditions. We evaluated the performance of our Mask R-CNN models using a set of canonical classification-based metrics (overall and per-image precision, recall, and F1 scores) and materials domain-specific metrics of cavity size, cavity density, and swelling assessments. Given the importance of accurately characterizing swelling in irradiated alloys for their use as materials in nuclear reactor components, we particularly emphasized assessments of material swelling. Our model provides material swelling assessments with an average (standard deviation) swelling mean absolute error based on random leave-out cross-validation of 0.30 (0.03) percent swelling, demonstrating good assessment ability of swelling with sufficiently small error to provide useful insight for new alloy design. We investigated the source of our swelling errors in greater detail, with three related findings of interest:

1. The model can occasionally give poor assessments of cavity density, but these poor density assessments always coincided (at least for the images evaluated here) with small swelling errors, as the missed cavities were all small (e.g., cavities spanning about 3% or less of the image size). Poor cavity density assessments are therefore not necessarily a worrisome sign for model performance.

2. Canonical classification-based metrics can sometimes paint a misleading picture of how well a model may perform for a specific materials-domain application. For example, we analyzed two extreme cases of test images with low (high) F1 scores which, in turn, displayed very low (high) swelling errors, indicating that, as with point (1) above, missing many cavities is not necessarily an issue, provided they are small.

3. Directly related to the above points: because swelling scales with the cube of the cavity size, it is essential to capture the sizes of large cavities accurately. While this is obvious from inspection of Eq. 1, we showed how this effect can manifest in practice, where even test images with small average cavity size errors may show larger-than-desired errors in swelling; in some cases, the errors in the full cavity size distribution that matter for accurately assessing swelling are mainly the result of errors in cavity sizes of about 15 nm or larger.

Given the key findings enumerated above, to obtain the most reliable model results we recommend that prospective users evaluate our model on new TEM images obtained in underfocus conditions, with cavities imaged at a magnification such that they span at least 3% of the image width or height, in order to maximize the likelihood they are detected and quantified correctly. Although the present results are very promising, the inability to reliably assess new types of cavity data, the errors in small cavity detection, and the swelling errors introduced for some large cavities all remain concerns. Some or all of these issues may be overcome with more data and more careful cross-checking of ground truth labels. For example, work is ongoing to quantify and mitigate human bias in ground truth labeling, where the issue of the ML model missing small cavities is, at least in part, a consequence of inconsistent ground truth labeling of these small cavities and can potentially be reduced by aggregating labels obtained from a large number of labelers. However, obtaining and annotating new TEM images of irradiated samples is very time-consuming, particularly if one also needs to conduct the irradiation experiments before imaging. We believe a potentially fruitful area of future research is to include synthetic training data, which can augment existing experimental databases and expand the model training domain to include different size distributions, focusing and imaging conditions, and noise levels. One avenue for creating synthetic data is to use generative models such as Generative Adversarial Networks (GANs); however, the main downside of GANs is their reliance on an initial set of training images of cavities. A different method that does not rely on an initial seed of training data is physics-based simulation of cavities.
Our initial work in this space combined simulated cavities onto experimental images containing real cavities to improve object detection model training36, and work is ongoing to address challenges of how to best integrate synthetic cavities with background TEM images and comprehensively evaluate object detection model performance with the addition of synthetic cavity data.

To encourage future studies of object detection and quantification in this space, we have made our full database of images and their associated ground truth annotations publicly available (see "Data and Code availability" section). In addition, we have provided a Python notebook tailored for running on the free GPU resources provided on Google Colab, to easily provide inference and basic analysis of material swelling on user-provided test images. Finally, our model is also hosted on DLHub37, which is part of the Foundry for data, models and science38. This infrastructure enables inference on new images using only two lines of Python code. We have also included a notebook which can be used to call our model from Foundry (see "Data and Code availability" section). The Mask R-CNN model used for this tool was trained on the complete CNL + NOME database of 400 images to create the most accurate present model for detecting cavities in new images. Provided a new test image, the notebook saves the image with the model-specified cavity segmentations overlaid, together with a spreadsheet containing the bounding box, segmentation, and calculated size of each cavity in the image, along with the computed cavity density and swelling. We hope that tools such as these assist experienced researchers and new users alike in the short term by reducing the barrier to using object detection tools. In the longer term, we hope to facilitate the generation of a broader community base of standardized (experimental and synthetic) image data and associated object detection models, with the goal of creating state-of-the-art models able to accurately detect cavities and quantify vital materials properties such as swelling for a range of alloy compositions, irradiation doses, and imaging conditions.

In this work, two datasets were used to train and test the performance of our Mask R-CNN object detection model. Both datasets consist of TEM images of irradiated metal alloys. The objects of interest for detection and quantification are cavities, which generally appear as spherical and faceted shapes in the microstructure, with contrast consistent with a region devoid of matrix material. Both datasets consist of underfocused and overfocused images of cavities. When the image is overfocused, a cavity appears as a disk surrounded by a bright ring, whereas when the image is underfocused, a cavity appears as a disk surrounded by a dark ring. In general, the ground truth labels in both datasets include both the disk and the fringe contrast of each cavity, for overfocused and underfocused cavities alike. For more accurate sizing, further refinement using post-analysis algorithms, such as a contrast peak identification scheme, could be applied to size the cavities based on the fringe contrast, as recommended11,39, but such effort was not within the scope of the data collection and generation. As a result, the database and its size information enable accurate relative, or "perceived", swelling via determination of the localized size of each cavity, but some inherent error is present in the reported size and subsequent swelling data; we therefore adopt the terminology of perceived swelling. Note that comparisons of ground truth and model-predicted sizes are still accurate in a relative sense, as the model predicts size in the same manner as the human-developed database. It is worth noting that all ML model assessments in this work include instances of overlapping cavities in the images, which may be counted as a single cavity. In the present work, we do not attempt to quantify the statistics of overlapping cavities.
However, based on our knowledge of the datasets, instances of overlapping cavities are generally a rare occurrence, where we estimate that less than 5% of the total cavities in the database are overlapping. Given additional data of overlapping cavity instances, the present model may be further improved.

The first dataset consists of bright-field TEM micrographs obtained and labeled by the Canadian Nuclear Laboratory (CNL), which we refer to as the CNL dataset throughout this work. The images were obtained from reactor spacer springs of commercial nuclear reactors in the Canada Deuterium Uranium (CANDU) reactor fleet40, and consist of both overfocused and underfocused images of cavities in Inconel X-750 Ni alloys which have undergone neutron irradiation. The reactor spacer springs used to obtain the CNL images were in reactor service for 14 years, with a damage dose of 30 displacements per atom (dpa). Additional details of the sample preparation, TEM imaging, and cavity annotation are described in the work of Anderson et al.34 The numbers of overfocused and underfocused images, and the corresponding numbers of overfocused and underfocused cavities, for the CNL dataset are summarized in SI Note 4. We note here that in the work of Anderson et al., it is stated that a total of 253 images comprise the database, where 230 images were used for training and 23 were reserved for testing their Faster R-CNN model. However, from the publicly available data linked in their paper, the available training set consists of 224 images and the testing set contains 19 images (243 total images). Further, when inspecting the provided annotations for all images, it was found that for 5 images the annotations did not coincide with the cavities present in the image. Rather than re-annotating these images, we simply removed them from the CNL database used in this work, yielding a total of 238 images. (The names of the 5 removed images are: 59_01.jpg, 59_02.jpg, 59_03.jpg, 59_04.jpg, 63_01.jpg.) While 68% of the present CNL database consists of underfocused images, a large majority (about 83%) of the cavities are underfocused, resulting in a class imbalance where the database is significantly biased toward underfocused cavities.

The second dataset consists of TEM micrographs obtained and labeled by us as part of the Nuclear Oriented Materials & Examination (NOME) Laboratory at the University of Michigan, which we refer to as the NOME dataset throughout this work. These images were obtained through a wide variety of collaborations and professional contacts within the field, and consist of both overfocused and underfocused images. The material compositions covered by these images are highly varied, including samples comprised of CW-316, T91, HT9, and 800H steel alloys. The irradiation undergone by each sample was also highly diverse, including damage from both light- and heavy-ion as well as neutron bombardment, with total doses of up to 100 dpa. For annotating these images, a team of undergraduate student researchers was first trained by a domain expert to label images by practicing on several pre-labeled images not part of the NOME database. As mentioned above, the ground truth labels include the outer bright and dark rings for the overfocused and underfocused cavities, respectively. Feedback on their labeling was provided until results approximated those obtained by expert researchers. Once trained, the undergraduate team labeled the entire NOME database. The labels of each NOME image were corrected by a graduate student researcher (Matthew Lynch) and checked by a post-doctoral researcher (Priyam Patki) to form the final set of annotations. All labeling was done using the VGG Image Annotator (VIA) web tool41. The labeled NOME database comprises 162 images, as detailed in SI Note 4. Like the CNL database, the NOME database is significantly biased toward underfocused cavities, with about 75% of the total cavities coming from underfocused images. In order to assess different aspects of the model, 7 different splits of our combined CNL + NOME dataset were used to train and test the ability of our Mask R-CNN models to detect and quantify cavities, as detailed in SI Note 4.
We note here that all of the images and annotations for the CNL and NOME datasets have been made publicly available on Figshare (see Data and Code availability section).

We use the Mask R-CNN object detection model to detect and quantify cavities in this work, as implemented in the Detectron2 package (PyTorch backend) developed by the Facebook AI Research (FAIR) team42. Detectron2 is freely available and enables the implementation of many object detection models, such as Faster R-CNN22, Mask R-CNN23, and Cascade R-CNN43. These object detection models have been pre-trained on either the ImageNet20 or Microsoft COCO21 (Common Objects in Context) image databases, enabling the use of transfer learning. When using transfer learning, the model backbone weights are frozen to those obtained from the previous ImageNet or Microsoft COCO training, save for a small number of terminal layers (2 throughout this work). The Mask R-CNN input configuration was the same as that used in our previous work on detecting and quantifying dislocation loops and black dot defects in FeCrAl alloys28, except that here we adjusted the candidate anchor box sizes to 4, 8, 16, 32, 64, 128, and 256 pixels to enable the model to better detect small cavities. Input files in the Detectron2 package typically use candidate anchor box sizes that are powers of 2, so we follow that practice and also include the small anchor box sizes of 4 and 8 pixels, as some of the images examined in this work contain cavities on this length scale.
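
As an illustration, a configuration along these lines can be expressed with Detectron2's config API roughly as follows. This is a sketch only: the backbone choice, freeze depth, and other settings shown here are our assumptions, not the exact input files used in this work.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
# Start from a COCO-pretrained Mask R-CNN for transfer learning
# (illustrative backbone choice, not necessarily the one used in the paper)
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
# Single prediction category: cavities
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
# Power-of-2 anchor sizes, extended down to 4 and 8 px for small cavities
cfg.MODEL.ANCHOR_GENERATOR.SIZES = [[4, 8, 16, 32, 64, 128, 256]]
# Freeze early backbone stages, leaving terminal layers trainable
# (illustrative value; the paper states 2 trainable terminal layers)
cfg.MODEL.BACKBONE.FREEZE_AT = 2
```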

This work evaluates our model using both classification-centric and materials property-centric metrics. For the classification metrics, we focus on the model precision (P), recall (R), and F1 scores. Since we have only a single prediction category (i.e., cavities), the precision is calculated by dividing the number of found defects by the number of predicted defects, and the recall is calculated by dividing the number of found defects by the number of true defects. We evaluate P, R, and F1 scores on a per-image basis, from which we obtain average per-image P, R, and F1 scores, and we also evaluate the so-called overall P, R, and F1 scores, which are computed in a single calculation using the total numbers of true, predicted, and found cavities for the entire test set. For the materials property metrics, we calculate size distributions of predicted cavities for every test image, but focus our evaluation on comparing the true vs. predicted per-image average cavity size, true vs. predicted per-image cavity density (obtained by counting the number of true and predicted cavities in an image and dividing by the image area), and true vs. predicted per-image swelling. The swelling \(\frac{\Delta V}{V}\) of an image (expressed as percent swelling) is calculated following the work of Jiao et al.9:

\[\frac{\Delta V}{V} = \frac{\frac{\pi}{6}\sum_{i=1}^{N} d_i^{3}}{A\,\delta - \frac{\pi}{6}\sum_{i=1}^{N} d_i^{3}} \times 100\%\]

where A is the area of the image, δ is the sample thickness, \(d_i\) is the diameter of cavity i, and N is the number of cavities in the image. Due to the lack of per-image sample thickness data, we have assumed that every image has a thickness of 100 nm. The cavity diameter is calculated as twice the cavity radius, where the radius is defined as the square root of the product of the minimum and maximum distances from the center of the cavity mask to its boundary.
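
The per-image swelling calculation and cavity size definition described above can be sketched as follows. This is a minimal illustration; the function names and example numbers are ours, not from the paper.

```python
import math

def percent_swelling(diameters_nm, image_area_nm2, thickness_nm=100.0):
    """Per-image percent swelling: total cavity volume divided by the
    remaining matrix volume (assumed foil thickness of 100 nm)."""
    cavity_volume = sum(math.pi / 6.0 * d ** 3 for d in diameters_nm)
    matrix_volume = image_area_nm2 * thickness_nm - cavity_volume
    return 100.0 * cavity_volume / matrix_volume

def cavity_diameter(min_extent_nm, max_extent_nm):
    """Diameter = 2 * sqrt(r_min * r_max): twice the geometric mean of the
    minimum and maximum center-to-boundary distances of the cavity mask."""
    return 2.0 * math.sqrt(min_extent_nm * max_extent_nm)

# Example: a 200 nm x 200 nm image with one 20 nm and fifty 4 nm cavities
d = [cavity_diameter(10, 10)] + [cavity_diameter(2, 2)] * 50
swell = percent_swelling(d, 200.0 * 200.0)
```

Because of the cubic dependence on diameter, the single 20 nm cavity in this example dominates the swelling despite being outnumbered fifty to one, which is the effect discussed for Fig. 6.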

When evaluating the performance of object detection models like Mask R-CNN, there are two key hyperparameters to choose: the intersection-over-union (IoU) threshold and the objectness score. The IoU threshold determines the cutoff between ground truth and predicted bounding boxes for deciding when a cavity can be considered found in the correct position, and the objectness score is a measure of the model's confidence that a predicted region corresponds to a cavity, and thus impacts the total number of predicted cavities. The method to match the true and predicted cavities based on IoU is the same as that employed in our previous work28; we provide a brief summary here. When evaluating an image, there is a list of true defect masks and predicted defect masks. To decide whether a defect has been found in the correct location, the IoU of every predicted defect is calculated for each true defect, and the prediction with the highest IoU score is considered the best possible match. The IoU values are calculated using the bounding boxes obtained from the region proposal network. If this computed IoU score is above the designated threshold, the predicted defect is considered found. Each true defect can only be found once, so if multiple predicted defects pass the IoU threshold with a particular true defect, the predicted defect with the highest IoU score is considered the found defect, and the other defect(s) are considered false positives. The hyperparameters were determined using the CNL + NOME initial split by evaluating the overall F1 score as a function of IoU threshold and objectness score, and by evaluating the error in predicted swelling as a function of objectness score (see Figure S5 in SI Note 5). This data split was chosen for hyperparameter optimization as it contains a representative and random subset of the full CNL + NOME image dataset examined in this work.
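
The IoU-based matching and the resulting single-class precision, recall, and F1 described above can be sketched as follows. This is a simplified illustration with hypothetical names; the actual evaluation operates on Detectron2 outputs.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) bounding boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_and_score(true_boxes, pred_boxes, iou_threshold=0.5):
    """Greedy matching: each true box is 'found' by at most one prediction
    (its highest-IoU match above threshold); unmatched predictions count as
    false positives. Returns precision, recall, F1 for the single cavity class."""
    matched_preds, found = set(), 0
    for t in true_boxes:
        candidates = [(iou(t, p), j) for j, p in enumerate(pred_boxes)
                      if j not in matched_preds]
        if candidates:
            best_iou, best_j = max(candidates)
            if best_iou >= iou_threshold:
                matched_preds.add(best_j)
                found += 1
    precision = found / len(pred_boxes) if pred_boxes else 0.0
    recall = found / len(true_boxes) if true_boxes else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Summing the found, predicted, and true counts over all test images before computing P, R, and F1 yields the "overall" variant of these metrics described earlier.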

The datasets generated during and/or analyzed during the current study are available on Figshare (https://doi.org/10.6084/m9.figshare.20063117). The trained model on the full database of all CNL and NOME images, a Google Colab notebook and associated python scripts to make predictions on new images and save the associated data is also available on Figshare (https://doi.org/10.6084/m9.figshare.20063117). In addition, we have hosted the final trained model on DLHub, which is part of the Foundry for data, models and science. A notebook to use the hosted model on Foundry is also provided in the above Figshare repository. A small subset of the images (≈3%) are omitted from the public database due to protected rights of these images. Access to the omitted images and corresponding labels can be obtained through request with the corresponding author.

1. Seeger, A., Diehl, J., Mader, S. & Rebstock, H. Work-hardening and work-softening of face-centred cubic metal crystals. Philos. Mag. 2, 323–350 (1957).
2. Zheng, C. et al. Microstructure response of ferritic/martensitic steel HT9 after neutron irradiation: Effect of dose. J. Nucl. Mater. 523, 421–433 (2019).
3. Zhang, H. K., Yao, Z., Judge, C. & Griffiths, M. Microstructural evolution of CANDU spacer material Inconel X-750 under in situ ion irradiation. J. Nucl. Mater. 443, 49–58 (2013).
4. Porter, D. L. & Garner, F. A. Irradiation creep and embrittlement behavior of AISI 316 stainless steel at very high neutron fluences. J. Nucl. Mater. 159, 114–121 (1988).
5. Garner, F. A. Recent insights on the swelling and creep of irradiated austenitic alloys. J. Nucl. Mater. 122, 459–471 (1984).
6. Snoeck, E., Majimel, J., Ruault, M. O. & Hÿtch, M. J. Characterization of helium bubble size and faceting by electron holography. J. Appl. Phys. 100, 66 (2006).
7. Morgan, D. et al. Machine learning in nuclear materials research. Curr. Opin. Solid State Mater. Sci. 26, 100975 (2022).
8. Cockeram, B. V., Smith, R. W., Hashimoto, N. & Snead, L. L. The swelling, microstructure, and hardening of wrought LCAC, TZM, and ODS molybdenum following neutron irradiation. J. Nucl. Mater. 418, 121–136 (2011).
9. Jiao, Z. et al. Microstructure evolution of T91 irradiated in the BOR60 fast reactor. J. Nucl. Mater. 504, 122–134 (2018).
10. Foreman, A. J. E., von Harrach, H. S. & Saldin, D. K. The TEM contrast of faceted voids. Philos. Mag. A 45, 625–645 (1981).
11. Yao, B., Edwards, D. J., Kurtz, R. J., Odette, G. R. & Yamamoto, T. Multislice simulation of transmission electron microscopy imaging of helium bubbles in Fe. J. Electron Microsc. 61, 393–400 (2012).
12. Ruhle, M. R. Transmission Electron Microscopy of Radiation-Induced Defects (1971). https://doi.org/10.2172/4027809
13. Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
14. Giannuzzi, L. A., Drown, J. L., Brown, S. R., Irwin, R. B. & Stevie, F. A. Applications of the FIB lift-out technique for TEM specimen preparation. Microsc. Res. Tech. 41, 285–290 (1998).
15. Schemer-Kohrn, A., Toloczko, M. B., Zhu, Y., Wang, J. & Edwards, D. J. Removal of FIB damage using flash electropolishing for artifact-free TEM foils. Microsc. Microanal. 25, 1606–1607 (2019).
16. Spurgeon, S. R. et al. Towards data-driven next-generation transmission electron microscopy. Nat. Mater. 20, 274–279 (2021).
17. Jiang, Y. et al. Electron ptychography of 2D materials to deep sub-ångström resolution. Nature 559, 343–349 (2018).
18. Chatterjee, D. et al. An ultrafast direct electron camera for 4D STEM. Microsc. Microanal. 27, 1004–1006 (2021).
19. Ophus, C. Four-dimensional scanning transmission electron microscopy (4D-STEM): From scanning nanodiffraction to ptychography and beyond. Microsc. Microanal. 25, 563–582 (2019). https://doi.org/10.1017/S1431927619000497
20. Deng, J. et al. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009).
21. Lin, T.-Y. et al. Microsoft COCO: Common objects in context. In: European Conference on Computer Vision (ECCV) 740–755 (2014).
22. Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2017).
23. He, K., Gkioxari, G., Dollar, P. & Girshick, R. Mask R-CNN. In: International Conference on Computer Vision (ICCV) (2017).
24. Liu, L. et al. Deep learning for generic object detection: A survey. Int. J. Comput. Vis. 128, 261–318 (2020).
25. Zhao, Z. Q., Zheng, P., Xu, S. T. & Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 30, 3212–3232 (2019).
26. Jacobs, R. Deep learning object detection in materials science: Current state and future directions. Comput. Mater. Sci. 211, 111527 (2022). https://doi.org/10.1016/j.commatsci.2022.111527
27. Roberts, G. et al. Deep learning for semantic segmentation of defects in advanced STEM images of steels. Sci. Rep. 9, 66 (2019).
28. Jacobs, R. et al. Performance and limitations of deep learning semantic segmentation of multiple defects in transmission electron micrographs. Cell Rep. Phys. Sci. 3, 100876 (2022). https://doi.org/10.1016/j.xcrp.2022.100876
29. Shen, M. et al. Multi defect detection and analysis of electron microscopy images with deep learning. Comput. Mater. Sci. 199, 110576 (2021).
30. Cohn, R. et al. Instance segmentation for direct measurements of satellites in metal powders and automated microstructural characterization from image data. JOM 73, 2159–2172 (2021).
31. Groschner, C. K., Choi, C. & Scott, M. C. Machine learning pipeline for segmentation and defect identification from high-resolution transmission electron microscopy data. Microsc. Microanal. 27, 549–556 (2021).
32. Ziatdinov, M. et al. Deep learning of atomically resolved scanning transmission electron microscopy images: Chemical identification and tracking local transformations. ACS Nano 11, 12742–12752 (2017).
33. Ge, M. & Xin, H. L. Deep learning based atom segmentation and noise and missing-wedge reduction for electron tomography. Microsc. Microanal. 24, 504–505 (2018).
34. Anderson, C. M., Klein, J., Rajakumar, H., Judge, C. D. & Béland, L. K. Automated detection of helium bubbles in irradiated X-750. Ultramicroscopy 217, 113068 (2020).
35. Mansur, L. K. & Coghlan, W. A. Mechanisms of helium interaction with radiation effects in metals and alloys: A review. J. Nucl. Mater. 119, 1–25 (1983).
36. Field, K. G. et al. Development and deployment of automated machine learning detection in electron microscopy experiments. Microsc. Microanal. 27, 2136–2137 (2021).
37. Chard, R. et al. DLHub: Model and data serving for science. In: Proceedings—2019 IEEE 33rd International Parallel and Distributed Processing Symposium (IPDPS 2019) 283–292 (2019). https://doi.org/10.1109/IPDPS.2019.00038
38. University of Chicago & University of Wisconsin-Madison. Foundry Materials Informatics Environment. https://ai-materials-and-chemistry.gitbook.io/foundry/v/docs/ (2021).
39. Jenkins, M. L. & Kirk, M. A. Characterisation of Radiation Damage by Transmission Electron Microscopy (Taylor & Francis Group, 2000). https://doi.org/10.1201/9781420034646
40. Zhang, H. K., Yao, Z., Morin, G. & Griffiths, M. TEM characterization of in-reactor neutron irradiated CANDU spacer material Inconel X-750. J. Nucl. Mater. 451, 88–96 (2014).
41. Dutta, A., Gupta, A. & Zisserman, A. VGG Image Annotator (VIA). https://www.robots.ox.ac.uk/~vgg/software/via/ (2020).
42. Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y. & Girshick, R. Detectron2. https://github.com/facebookresearch/detectron2 (2019).
43. Cai, Z. & Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 6154–6162 (2018).
44. Towns, J. et al. XSEDE: Accelerating scientific discovery. Comput. Sci. Eng. 16, 62–74 (2014).


This work was funded by the Electric Power Research Institute (EPRI) under award number 10012138. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant Number ACI-1548562. Specifically, it used the Bridges-2 system through allocation TG-DMR090023, which is supported by NSF award number ACI-1928147, at the Pittsburgh Supercomputing Center (PSC).

Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, 53706, USA

Ryan Jacobs & Dane Morgan

Nuclear Engineering and Radiological Sciences, University of Michigan - Ann Arbor, Ann Arbor, MI, 48109, USA

Priyam Patki, Matthew J. Lynch, Steven Chen & Kevin G. Field

K.G.F. and D.M. conceived and managed the project. R.J. performed all analysis, wrote the first version of the manuscript and reviewed the manuscript. P.P., M.L. and S.C. prepared the dataset and performed additional analysis. All authors reviewed the manuscript.

Correspondence to Ryan Jacobs.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Jacobs, R., Patki, P., Lynch, M.J. et al. Materials swelling revealed through automated semantic segmentation of cavities in electron microscopy images. Sci Rep 13, 5178 (2023). https://doi.org/10.1038/s41598-023-32454-2

Received: 14 November 2022

Accepted: 28 March 2023

Published: 30 March 2023
