Quantitative microstructural characterisation with AstroEBSD

In this article we’ll run through the capabilities of AstroEBSD 2.0 for analysing simultaneously collected EBSD and EDS datasets. We will assume a degree of knowledge about these techniques. Our approach lets you understand the interplay between structure and chemistry within metallurgical precipitates, inclusions, and other features of engineering interest.

We will demonstrate (1) data loading, (2) principal component analysis dataset decompositions, and (3) refined template matching for phase-ID. For (3) you will need Bruker DynamicS EBSP simulations.

We have set up a (hopefully easy to use) PCA workflow deck, so users can input their settings, filepaths to data, etc, and just press go to generate results.

The following sections describe our pipeline, and each will be considered in detail:

Step 1: Get AstroEBSD.

Step 1.5: Decide whether you’re going to do template matching.

Step 2: Load your data and run the code.

Step 3: Check your outputs.

Step 4: Observe and analyse trends.

The workflow is summarised in Figure 1, the graphical abstract of [1], reproduced here:

Figure 1: Our PCA analysis pipeline [1], using tools presented here and made available open source with AstroEBSD.

Step 1: Get AstroEBSD.

Simple — the latest version of AstroEBSD is available at http://github.com/benjaminbritton/AstroEBSD. This tutorial is written using version 2.0.

Step 1.5: Decide whether you’re going to do template matching.

The full functionality of our PCA pipeline uses refined template matching (RTM) to identify the crystal structure of extracted characteristic electron backscatter patterns (EBSPs). If you don’t have access to Bruker DynamicS simulations, or you just don’t want to run the RTM, parts of the following workflow will be redundant. These are marked with a (*). If so, please ignore these steps.

If you do wish to run RTM there are a few preliminary things you need to set up. We will assume you’ve generated .bin files containing the simulations using DynamicS. For example, Ni_1024.bin.

  1. Move the simulation file (Ni_1024.bin) to /phases/masterpatterns.
  2. Ensure a corresponding .cif file (Ni.cif) is present in /phases/cifs.
  3. Create an AstroEBSD .pha file (Ni.pha) and move it to /phases/phasefiles. You can copy a default one from the repository, for this tutorial the ONLY part that you need to edit are the following lines:
Ni.cif <---- edit this line to your .cif file's name
Ni_1024.bin <---- edit this line to your .bin file's name
0 <---- toggle this on/off

The software knows where to look for the .cif files and simulations, but doesn’t know what they’re called. The .pha file collates this information. A nuance of DynamicS also means that we need to specify whether or not the structure has hexagonal symmetry.

As above, if you do not wish to run RTM you can ignore this step. But you will need to specify some redundant settings. To reitereate, if this is the case the steps you can ignore are marked as (*). The end of an RTM-specific section is marked (/*).

Step 2: Load your data and run the code.

Our pipeline is built for handling Bruker exported .h5 files. We use the Bruker-supplied bcf2hdf5 converter application. An example datasets is available at https://zenodo.org/record/3737987#.X19Eti3MxQI, and will be used in this tutorial. If your data is in another format (and isn’t encrypted / you can actually read it), you could build your own data loader, and import into the same data structure as our workflow expects, or you could generate a .h5 file with the same hierarchical structure as our example. There’s an example of this approach in the deck: /decks/EBSD_2019_Astro2.m.

Our pipeline is built for handling Bruker exported .h5 files. We use the Bruker-supplied bcf2hdf5 converter application. An example datasets is available at https://zenodo.org/record/3737987#.X19Eti3MxQI, and will be used in this tutorial. If your data is in another format (and isn’t encrypted / you can actually read it), you could build your own data loader, and import into the same data structure as our workflow expects, or you could generate a .h5 file with the same hierarchical structure as our example.

Open the PCA example deck: /decks/PCA/PCA_deck.m.

Firstly, specify where your data is:


If you specify more than one file in the cell array, the software will loop through them and analyse each in turn with the same settings. This syntax would be given as:

files={'SuperalloyExample.h5', 'SomeOtherData.h5'};

InputUser.SavingFolder is then where your results will be saved.

There are a lot of settings for various aspects of our workflow. We’ve organised them into Critical (which all users will need to change), Medium (which provide advanced data sensitivities and options), and Low level (which few users will ever need to adjust). Not all of these will be discussed here, some require a read of Experimental Micromechanics group papers (eg. [1] for PCA EBSD/EDS and [2] for RTM) for a full explanation.

Critical settings:

(*) Specify the phases you wish to analyse with RTM. These should correspond to the .pha file names. If you are not using RTM, this is an unused variable.



Decide whether you wish to load and run the analysis on just EBSD, EDS, or both data types. These are binary 1=yes, 0=no indicator variables.


Next, assign an optional (standard deviation) weighting to the data-types (ie. EBSD and EDS parts of the data matrix). This can bias your decomposition to favour EBSD (structural) information, or EDS (chemical) information. Read more about this in [1]. A high number > 1 for EBSD weighting will push the analysis to decomposing (and calculating PCs) on the EBSD data, with EDS information coincidental. This leads to a smaller effective spatial resolution in phase separation. A small number << 1 will perform a decomposition on the EDS information, with EBSD effectively coincidental. This leads to a higher effective spatial resolution, due to larger interaction volume of the EDS signals. A well chosen intermediate value can improve cross-correlation quality between simulated template and measured EBSP. A good default value or starting point is just 1.


If you provide a vector of (float) values to PCA_Setup.EBSD_weighting, e.g. [1, 0.1, 0.001] our software will loop through them.

Figure 2[1] demonstrates geometrically how this weighting influences the calculation of principal components.

Figure 2: The calculation of PCs can be biased towards specific variable dimensions, by relative adjustment of their covariances. This reduces to linearly scaling EBSD measurements relative to EDS measurements.

Another possibly useful feature of AstroEBSD is the introduction of a spatial weighting kernel to measurements. While the above description of adjusting relative variable importance acts on the EBSD/EDS measurement axes of our dataset, without further adjustment all measurements are by default totally independent and there is no influence of locality on the analysis. In other words, you could completely reorder the measurements, perform our analysis, then reorder them back to the original scan grid and you would achieve exactly the same result. It can be useful to not throw away this information, and introduce local structure / dependence. It is more likely that spatially close measurements are similar. The effect of this on a complete analysis can be seen in Figure 3:

Figure 3: The effect of changing our spatial weighting kernel radius on outputs. Stuff gets blurred together as you increase r.

If you wish to do this with AstroEBSD, adjust the following lines:


The first line toggles on the spatial kernel, and the radius corresponds to how many data points around a test data point will be allowed to influence it. Specifically, for a PCA_Setup.KernelRadius of 1, a 3-by-3 Gaussian kernel is generated and convolved (stride 1) with the data grid. The kernel function can be adjusted in the lower level settings.

(*) Finally, decide whether you’d like to use AstroEBSD’s RTM-based pattern centre (PC) refinement tool:


If toggled on, a GUI will appear for you to select points at which RTM will find the best PC. This gets mapped across the area of interest for unseen points. You need some a priori knowledge though — you’ll need to tell it which phase the scan points are (for example, an FCC Ni matrix). This number should correspond to the position of the phase in InputUser.Phases cell array.


And that’s all you need to get started. At the end of this article we will discuss further some useful Medium and Low-level settings, but for now let’s take a look at what the software does and its outputs.

Step 3: Check the outputs.

Once the code has run, and it will take significantly longer if you ran with RTM, results are saved where you specified in InputUser.SavingFolder. There will be a folder with a descriptive name, for example: PCA_Output/SuperalloyExample_EBSD+EDS/weighting1_vt0,1_Guo_rad3. A folder with your sample name and _EBSD+EDS is created, then another folder with your settings is generated within. ‘Guo’ refers to the spatial kernel [3], and ‘vt’ refers to the variance tolerance limit (more on these in the Medium level settings description later). If you ran the code with more than one sample, weighting, etc then the results will be saved seperately.

Within this folder you’ll have a Results.mat file, containing the outputs. Also printed as .png files will be the Bruker-exported Radon peak height (image quality) map and the SEM image. You’ll also have a rotated characteristic component (RCC) assignment map, shown below in Figure 4. As discussed in our paper, RCCs are VARIMAX rotated principal components.

Figure 4: Rotated characteristic component assignment map for the SuperalloyExample.h5 dataset. It shows the regions of the dataset (split into tiles) that have been identified as sufficiently dissimilar to warrant a new component.

This may look a bit messy, but in this case we have oversampled the dataset with more than the bare minimum principal components / RCCs (ie. the variance tolerance is too relaxed), so in the top left tile the same region is represented by red, pink, and purple, for example. For memory reasons we by default analyse these datasets (you can change this in the Medium level settings) in tiles, which introduces the rectangular artefacts.

(*) If you ran RTM, the phase map and Euler angles will also be calculated. The phase index and colouring corresponds to the position in the list InputUser.Phases. The phase map is presented here in Figure 5, the quality metric (cross-correlation peak height) in Figure 6, and a map of the first Euler angle in Figure 7.

Figure 5: RTM phase assignment map.
Figure 6: RTM quality (cross-correlation function peak height) map.
Figure 7: Euler 1 map (ZXZ).


If you’d like details on how each scan point was influenced by each of the RCCs, take a look in the corresponding tiles’ folders. This contains .png images of each of the factors (here the 6th of the first tile).

Figure 8: 6th RCC of the first tile, for the SuperalloyExample dataset.

All of this information is also contained numerically in Results.mat, for you to analyse, plot, and use however you’d like. That’s it for the non-RTM stuff. To go ahead with Step 4 and beyond, you’ll need this functionality working and set up.

(*) Step 4: Observe and analyse trends.

This section requires RTM to be working, and your simulation libraries set up in the right place (see Step 1.5).

Compare RCC EBSPs to simulations.

Sometimes it can be extremely useful to check by eye how RTM has assigned phase. For example, and this is something we have seen before, a piece of charging dirt may be throwing off the cross-correlation by matching to a bright zone axis, with a faint band present in a runner-up simulation and the RCC pattern that isn’t in the pattern with ‘best’ cross-correlation (implying a misidentification). To inspect the ‘best’ template orientations for each candidate phase, open up /decks/PCA/PatternComparison.m. Load up into memory your Results.mat (if it isn’t already), using:


Specify the mtex and AstroEBSD filepaths, then:


InputPats shouldn’t need to be changed in most cases, as tile.pat will contain your RCC EBSPs. Specify the component you’d like to compare (a number can be selected by plotting the assignment map, as in Figure 4), and choose a subplot height and width. The script will generate the following image (and if you specified patcomp.print to be true, will save it to the current directory):

Figure 9: Pattern comparison for the SuperalloyExample dataset, using just three candidate phases.

In this case we have compared an FCC Ni pattern, which is reflected in the pattern comparison. More complicated examples using this functionality are included in [4], for example.

Plot IPFs and investigate chemistry as f(structure).

Using the correlated structure / chemistry features in the RCCs, extracted using PCA, we can investigate how microstructural chemistry varies across regions of differing structure or crystallography.

You may have noticed that in a PCA run’s results folder are two sub-directories: AverageSpectra and RCSpectra. Our analysis prints the inferred RCSpectra to a specific file format, .spx, that can be re-read into Bruker Esprit. We use this to chemically quantify the spectra in a pretty standard way. We do this for both RCC averages (AverageSpectra) and the RCCs themselves (RCSpectra).

Figure 10: Format of results.xlsx.

We then export the at.% quantification to results.xlsx, presented in Figure 10. You could use any other way of generating such a quantification table, so long as it keeps the format above, is called results.xlsx, and is placed into the AverageSpectra / RCSpectra folder.

We have built another deck specifically for interpreting these chemistries alongside the RTM inferred structures, found at /decks/PCA/PostAnalysis.m.

Specify the directory locations (as with previously described decks) and then list the datasets you’d like to look into:


In this case, the script is set up to analyse all datasets in list with settings as described in namesetting. The reason it’s set up this way by default, is you may have analysed SuperalloyExample with many different settings options, and perhaps an Example2 with just a handful, but only want to analyse further a specific settings set for both (a convoluted situation I found myself in…).

You can then detail what you’d like to analyse:

post.EBSD=1; %plot IPFs and colour keyspost.RCSpectra=1; %plot RC spectra quant
post.RCSpectra_std=1; %include standard deviation lines on the radar plot
post.AvSpectra=1; %plot average spectra quant
post.AvSpectra_std=1; %include standard deviation lines on the radar plot

If you just want to plot IPF maps, you can leave post.EBSD on and turn all the others off. If you’d like radar plots / bar charts of chemistry as a function of structure, turn on post.RCSpectra and post.AvSpectra to do this for each spectrum type (we generally find these to be near-identical). The settings post.AvSpectra_std and post.RCSpectra_std describe whether to include dotted lines for one standard deviation on the radar plot.

Run the code and you’ll get IPF maps (e.g. Figure 11), bar charts of chemistries (Figure 12) and radar plots of chemistries vs structure (Figure 13).

Figure 11: IPFZ for the SuperalloyExample dataset.
Figure 12: Bar charts of chemistry by phase for the SuperalloyExample dataset.
Figure 13: Radar plots of chemistry by phase for the SuperalloyExample dataset.

So that’s a run through of some capabilities of AstroEBSD for quantitative microstructural analysis. Most of the work presented in this article was conducted as part of my (Tom McAuliffe’s) PhD, written up in [1]. The RTM approach was developed as part of Alex Foden’s PhD, and is written up in [2].


[1] T. P. McAuliffe, A. Foden, C. Bilsland, D. Daskalaki-Mountanou, D. Dye, and T. B. Britton, “Advancing characterisation with statistics from correlative electron diffraction and X-ray spectroscopy, in the scanning electron microscope,” Ultramicroscopy, no. c, p. 112944, Jan. 2020.

[2] A. Foden, D. M. Collins, A. J. Wilkinson, and T. B. Britton, “Indexing electron backscatter diffraction patterns with a refined template matching approach,” Ultramicroscopy, vol. 207, p. 112845, Dec. 2019.

[3] R. Guo, M. Ahn, and H. Zhu, “Spatially Weighted Principal Component Analysis for Imaging Classification,” Journal of Computational and Graphical Statistics, vol. 24, p274–296, Mar. 2015.

[4] T. Dessolier, T. P. McAuliffe, W. J. Hamer, C. G. M. Hermse, and T.B. Britton, “Effect of high temperature service on the complex through-wall microstructure of centrifugally cast HP40 reformer tube,” ArXiv, 2020 (under review).

Appendix: Extra PCA settings.

There are a few more nuances to the types of decompositions you can do using AstroEBSD. These are included in the Medium and Low level settings. A few highlights that may be of particular interest include:

Medium level settings

Printing: decide whether to display and save figures.

PCA_Setup.crop_factor: What should the tiling of the total area of interest be? Eg. 3 gives 3-by-3 tiling. This has significant implications on memory usage.

PCA_Setup.variance_tolerance: in %, the ‘best’ way of choosing number of copmonents to retain. 0.1 is standard, if you want better spatial resolution at the cost of possible oversampling this can be reduced. If you want to group more stuff together (ie. think you’re already oversampling), raise it to 0.2 or 0.3. If you want to force a specific number of components (applied to every tile) set this in PCA_Setup.components. … Read [1] for help with choosing this.

RTM settings are as discussed in the RTM paper. RTM_setup.Screensize is (square) resolution of the EBSPs for cross-correlation, reduce this to save memory (esp. wrt to the template library).

Low level settings

These only have to be adjusted fairly infrequently. Of particular note are:

RTM_setup.Sampling_Freq is the SO3 angular spacing of template EBSPs. Increase this to save memory and processing time, but 7 or 8˚ is standard.

There are also settings for the spatial kernel function that can be toggled on/off in the critical settings. This includes a label to append to the directory name and an inline function describing how to calculate the kernel. A Gaussian kernel after Guo et al [3] is included by default. You can check the kernel weightings by running the GenerateKernel function.

PCA_Setup.KernelFnLabel='Guo'; %for labelling
PCA_Setup.KernelFunction = @(distance,r) ((1 - (distance)./r).^2).^2;
%check with k=GenerateKernel(PCA_Setup.KernelFunction,PCA_Setup.KernelRadius);

Finally, the lowest level includes AstroEBSD background corrections as required. Most of these you can leave as-is, but there are options in there for hot pixel correction, square cropping, mean-centring, and many others which users may want to adjust.

ML researcher | Microscopy PhD | Engineer