IR Search FAQ
Introduction
Spartan Parallel Suite ships with the SSPD (Spartan Spectra and Properties Database), which includes more than 300,000 entries containing
IR spectra computed with the EDF2/6-31G* model. We refer to these IR spectra as the SIRD (Spartan Infrared Database). In the Databases dialog (Search menu),
the SIRD tab accepts an unknown IR spectrum in JCAMP-DX (.dx) format, which is processed and matched against the entries in the SIRD.
The search returns a ranked list of possible matches.
Preprocessing the unknown spectrum.
The spectrum contained in an input JCAMP-DX (.dx) file varies
widely depending on its origin; in particular, the range and scale of its intensity
and wavenumber values differ from file to file. Before it can be compared against
database entries, it must be converted into a form compatible with
the spectra in the SIRD. The raw intensity data are first normalized
to the range [0,1] and then fitted to a function of the form
$$U(\nu)=\sum_{i=1}^N{H_i \over 4({\nu-X_i \over W})^2+1}.$$
Each term in the sum is a Lorentzian peak function, where \(X_i\) is the peak position,
\(H_i\) is the peak height, and \(W\) is the full width at half height (FWHH), shared by all \(N\) peaks.
The parameters \(X_i\), \(H_i\), and \(W\) are found with the nonlinear least-squares solver lmfit and are used in the
search steps that follow.
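The model above can be illustrated with a minimal Python sketch. It is not Spartan's code: min-max scaling is assumed for the [0,1] normalization (the exact scheme is not stated here), the names `normalize` and `lorentzian_sum` are hypothetical, and the lmfit parameter fit itself is not reproduced. The sketch only evaluates \(U(\nu)\) for given \(X_i\), \(H_i\), and \(W\).

```python
# Hypothetical sketch of the preprocessing model (not Spartan's implementation).


def normalize(intensities):
    """Rescale raw intensities linearly onto [0, 1] (assumed min-max scaling)."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        raise ValueError("a flat spectrum cannot be normalized")
    return [(y - lo) / (hi - lo) for y in intensities]


def lorentzian_sum(nu, positions, heights, width):
    """Evaluate U(nu) = sum_i H_i / (4 ((nu - X_i) / W)^2 + 1).

    positions -> X_i, heights -> H_i, width -> W (one FWHH shared by all peaks).
    """
    return sum(
        h / (4.0 * ((nu - x) / width) ** 2 + 1.0)
        for x, h in zip(positions, heights)
    )
```

Because of the factor 4 in the denominator, a peak of height \(H_i\) drops to exactly \(H_i/2\) at \(\nu = X_i \pm W/2\), which is why \(W\) is the full width at half height.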
The original unknown spectrum is displayed in the plot window as a black trace, and the
fitted spectrum, \(U(\nu)\), as a blue trace, which makes it easy to compare the two visually.
Step 1 Searching.
The entire set of SIRD spectra is searched in the first step.
Each entry is first checked against any filters that have been set; entries that fail any filter test are skipped and not scored. See Applying Search Filters.
Entries that pass all filter tests are scored against the unknown
spectrum. For each entry, a set of spectra, \(C_j(\nu)\), is generated from the
entry's parameters. These spectra are similar in form to \(U(\nu)\) above and
are generated by varying the FWHH and scaling the \(H_i\)s. Each one is scored
against the unknown spectrum with a sum of absolute differences over the \(M\)
wavenumbers \(\nu_i\) at which the spectra are compared:
$$score_j=\sum_{i=1}^M|U(\nu_i) - C_j(\nu_i)|.$$
The smallest \(score_j\) for the entry is retained and shown in the
"Score(step1)" column of the hit list table in the Spartan Infrared Database search
window. When all entries have been processed, a limited number of hits
with the best (smallest) scores is passed to Step 2. This number is 500
by default and can be changed with the "Keep Following Pass 1" setting in the filters dialog.
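The Step 1 loop can be sketched as follows. This is an illustrative stand-in, not Spartan's code: the trial FWHH values (`widths`), the height scale factors (`scales`), the `passes_filters` callback, and the representation of an entry as a `(positions, heights)` pair are all assumptions. The `keep` argument plays the role of the "Keep Following Pass 1" setting.

```python
import heapq


def lorentzian_sum(nu, positions, heights, width):
    """Sum of Lorentzian peaks sharing one FWHH (width), as in U(nu)."""
    return sum(h / (4.0 * ((nu - x) / width) ** 2 + 1.0)
               for x, h in zip(positions, heights))


def l1_score(u_values, c_values):
    """score_j = sum_i |U(nu_i) - C_j(nu_i)| over the sampled wavenumbers."""
    return sum(abs(u - c) for u, c in zip(u_values, c_values))


def step1_score(grid, u_values, entry, widths, scales):
    """Smallest L1 score over the candidate spectra C_j built from one entry.

    entry is a hypothetical (positions, heights) pair; each C_j uses one trial
    FWHH from `widths` and multiplies every height by one factor from `scales`.
    """
    positions, heights = entry
    best = float("inf")
    for w in widths:
        for s in scales:
            scaled = [s * h for h in heights]
            c_values = [lorentzian_sum(nu, positions, scaled, w) for nu in grid]
            best = min(best, l1_score(u_values, c_values))
    return best


def run_step1(grid, u_values, database, passes_filters, widths, scales,
              keep=500):
    """Score every entry that passes the filters; return the `keep` best hits.

    Returns (score, name) tuples, smallest score first.
    """
    scored = []
    for name, entry in database.items():
        if not passes_filters(entry):
            continue  # entries failing a filter are never scored
        scored.append((step1_score(grid, u_values, entry, widths, scales), name))
    return heapq.nsmallest(keep, scored)
```

Only the single best score per entry is kept, so the short list passed on holds at most `keep` entries no matter how many candidate spectra were tried.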
Step 2 Searching.
Step 2 is similar to Step 1, except that more \(C_j(\nu)\) are generated and scored for each entry.
They are generated by varying the FWHH and by translating and scaling the \(H_i\)s.
Two scores are computed for each entry in the short list:
$$score_j^{(0.1)}=\sum_{i=1}^M\min\left(|U(\nu_i) - C_j(\nu_i)|,\,0.1\right),$$
$$score_j^{(0.2)}=\sum_{i=1}^M\min\left(|U(\nu_i) - C_j(\nu_i)|,\,0.2\right).$$
That is, each absolute difference is clipped to 0.1 in the first score and to 0.2 in the second, so a large mismatch at any single wavenumber adds at most 0.1 or 0.2 to the total.
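A sketch of the two clipped Step 2 scores follows. It rests on assumptions the text does not confirm: "translating" is taken to mean shifting every peak position by a common offset (`shifts`), the best value of each score over all generated \(C_j\) is assumed to be kept as in Step 1, and all names are hypothetical.

```python
def lorentzian_sum(nu, positions, heights, width):
    """Sum of Lorentzian peaks sharing one FWHH, as in U(nu)."""
    return sum(h / (4.0 * ((nu - x) / width) ** 2 + 1.0)
               for x, h in zip(positions, heights))


def clipped_l1_score(u_values, c_values, cap):
    """sum_i min(|U(nu_i) - C_j(nu_i)|, cap): no single point adds more than cap."""
    return sum(min(abs(u - c), cap) for u, c in zip(u_values, c_values))


def step2_scores(grid, u_values, entry, widths, scales, shifts,
                 caps=(0.1, 0.2)):
    """Best clipped score for each cap over all candidate spectra C_j.

    Assumption: "translating" shifts every peak position by the same offset d.
    """
    positions, heights = entry
    best = [float("inf")] * len(caps)
    for w in widths:
        for s in scales:
            for d in shifts:
                shifted = [x + d for x in positions]
                scaled = [s * h for h in heights]
                c_values = [lorentzian_sum(nu, shifted, scaled, w) for nu in grid]
                for k, cap in enumerate(caps):
                    best[k] = min(best[k],
                                  clipped_l1_score(u_values, c_values, cap))
    return tuple(best)
```

Clipping makes the score robust: a candidate that matches most of the spectrum well but misses one strong band is penalized by at most the cap at each affected wavenumber, rather than by the full height of the missing band.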
Postprocessing matches.
The three scores from Step 1 and Step 2 are each converted to a ranking:
the hits are sorted by that score, and a hit's position in the sorted list
is its rank. The three ranks for each hit are then sorted into ascending
order, and a final sort of the hit list is performed on these sorted ranks.
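The rank aggregation can be sketched as below. The text does not say exactly how the final sort compares two hits; this sketch assumes a lexicographic comparison of each hit's ascending rank list, with ties broken by name. Function names are hypothetical and ranks start at 1.

```python
def rank_positions(scores):
    """Rank hits by one score table (name -> score): smallest score gets rank 1."""
    order = sorted(scores, key=lambda name: scores[name])
    return {name: rank for rank, name in enumerate(order, start=1)}


def final_order(score_tables):
    """Combine several score tables into one ordering of hit names.

    Each hit's ranks are sorted ascending; hits are then compared
    lexicographically on those sorted ranks (assumed), ties broken by name.
    """
    rankings = [rank_positions(table) for table in score_tables]
    names = list(score_tables[0])
    sorted_ranks = {n: sorted(r[n] for r in rankings) for n in names}
    return sorted(names, key=lambda n: (sorted_ranks[n], n))
```

For example, if three hits have rank triples (1, 3, 2), (2, 1, 1), and (3, 2, 3), their sorted ranks are [1, 2, 3], [1, 1, 2], and [2, 3, 3], so under this assumption the second hit is listed first: being best under two of the three scores outweighs being best under only one.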
Applying Search Filters.
If you have additional insight about your unknown molecule, it is possible to improve search results by
using filters. Filtering is possible by functional group, substructure, and peak matching.
- Functional Group: If you believe your unknown contains a phenyl group, for example, check the phenyl group filter; all database entries that do not contain that group will be skipped.
- Substructure: All entries that do not contain the specified substructure are skipped.
- Peak Matching: The computed spectra in the SIRD are represented by sets of parameters similar to those found for the unknown spectrum. The parameters of the unknown spectrum are compared to those of each database entry, and a proprietary algorithm pairs the \(X_i\)s from the two sets. If no "good" pairing is found, the database entry is skipped.
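Because the peak-matching algorithm is proprietary, the sketch below substitutes a simple and plainly different technique: greedy one-to-one pairing of each unknown peak with the nearest unused entry peak within a tolerance, accepting the entry if enough unknown peaks find a partner. The 30 cm^-1 tolerance, the 0.8 fraction, the dict layout of an entry, and modeling the functional-group filter as set containment are all invented for illustration. The substructure filter needs a chemical substructure search and is not modeled.

```python
def pair_peaks(unknown_positions, entry_positions, tolerance):
    """Greedy one-to-one pairing of peak positions (stand-in, not Spartan's).

    Each unknown peak, in ascending order, takes the nearest entry peak that
    is still unused and lies within `tolerance` (cm^-1); otherwise it stays
    unpaired.
    """
    available = sorted(entry_positions)
    pairs = []
    for x in sorted(unknown_positions):
        close = [p for p in available if abs(p - x) <= tolerance]
        if not close:
            continue
        best = min(close, key=lambda p: abs(p - x))
        available.remove(best)
        pairs.append((x, best))
    return pairs


def passes_filters(entry, unknown_positions, required_groups=frozenset(),
                   tolerance=30.0, min_fraction=0.8):
    """Return True if a hypothetical entry dict survives both filters.

    entry["groups"]: set of functional-group labels (assumed representation).
    entry["positions"]: peak positions X_i of the computed spectrum.
    """
    if not set(required_groups) <= set(entry["groups"]):
        return False  # functional-group filter: a required group is missing
    pairs = pair_peaks(unknown_positions, entry["positions"], tolerance)
    # peak-matching filter: enough unknown peaks must find a partner
    return len(pairs) >= min_fraction * len(unknown_positions)
```

Entries rejected here would simply be skipped before any scoring, which is how the filters save time in Step 1.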