IR Search FAQ

Introduction
The search steps.
Applying Search Filters.

Introduction

Spartan Parallel Suite ships with the SSPD (Spartan Spectra and Properties Database), which includes more than 300,000 entries containing IR spectra computed with the EDF2/6-31G* model. We refer to these IR spectra as the SIRD (Spartan Infrared Database). In the Databases dialog (Search menu), the SIRD tab accepts an unknown IR spectrum in JCAMP (.dx) format, which is processed and matched against the entries in the SIRD. The search returns a sorted list of possible matches.

Preprocessing the unknown spectrum.

The spectrum contained in an input JCAMP (.dx) file varies widely depending on its origin, in particular in the range and scale of its intensity and wavenumber values. Before it can be compared against database entries, it must be converted into a form more compatible with the spectra in the SIRD. The raw spectrum intensity data are normalized to the range [0,1] and fitted to a function of the form $$U(\nu)=\sum_{i=1}^N \frac{H_i}{4\left(\frac{\nu-X_i}{W}\right)^2+1}.$$

The terms in the sum are Lorentzian peak functions, where $X_i$ is the position of peak $i$, $H_i$ is its height, and $W$ is the full width at half height (FWHH).

The parameters $X_i$, $H_i$, and $W$ are found using a nonlinear least-squares solver, lmfit, and are used in the search steps that follow.

The original unknown spectrum is displayed in the plot window as a black trace, and the resulting spectrum, $U(\nu)$, is displayed as a blue trace. This allows an easy visual comparison between the two.
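
The normalization and the fitted form $U(\nu)$ can be illustrated with a short Python sketch. It is not Spartan's code: the min-max rescaling is only one plausible reading of "normalized to a range of [0,1]", the function names are hypothetical, and the peak parameters are made-up values rather than the output of a fit.

```python
def normalize(intensities):
    """Rescale raw intensities linearly onto [0, 1] (assumed min-max scheme)."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return [0.0 for _ in intensities]
    return [(y - lo) / (hi - lo) for y in intensities]


def lorentzian_sum(nu, positions, heights, width):
    """Evaluate U(nu) = sum_i H_i / (4 * ((nu - X_i) / W)**2 + 1)."""
    return sum(
        h / (4.0 * ((nu - x) / width) ** 2 + 1.0)
        for x, h in zip(positions, heights)
    )


# Hypothetical parameters, for illustration only (not fitted values).
positions = [1700.0, 2950.0]  # X_i, in cm^-1
heights = [1.0, 0.4]          # H_i, on the normalized intensity scale
width = 20.0                  # W, the FWHH shared by all peaks, in cm^-1

# At a peak center the Lorentzian term equals its height H_i.
peak_value = lorentzian_sum(1700.0, positions, heights, width)

# Half a width away from an isolated peak the value drops to H_i / 2,
# which is what "full width at half height" means.
half_value = lorentzian_sum(1710.0, [1700.0], [1.0], width)
```

Here `half_value` evaluates to 0.5, confirming that $W$ in the formula is the full width at half height.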

Step1 Searching.

The entire set of SIRD spectra is searched in the first step. Each entry is first checked against any filters that are set (see Applying Search Filters); entries that fail a filter test are not matched. Each entry that passes all filter tests is scored against the unknown spectrum by generating a set of spectra, $C_j(\nu)$, from the entry's parameters. These spectra, similar in form to $U(\nu)$ above, are generated by varying the FWHH and scaling the $H_i$. Each is scored against the unknown spectrum using the sum of absolute differences over the $M$ wavenumber points $\nu_i$ at which the spectra are compared: $$score_j=\sum_{i=1}^M|U(\nu_i) - C_j(\nu_i)|.$$ The smallest $score_j$ is retained and reported in the "Score(step1)" column of the hit list table in the Spartan Infrared Database search window. When all entries have been processed, a limited number of hits with the best (smallest) scores is passed to step2. This number is 500 by default, but can be changed in the filters dialog using the "Keep Following Pass 1" setting.
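
The step1 scoring loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the wavenumber grid, the trial widths and height scale factors, the entry names, and all function names are hypothetical, since the source does not say which values are tried.

```python
def lorentzian_sum(nu, positions, heights, width):
    """Sum of Lorentzian peaks with a shared FWHH, as in U(nu)."""
    return sum(h / (4.0 * ((nu - x) / width) ** 2 + 1.0)
               for x, h in zip(positions, heights))


def sad_score(unknown, candidate):
    """Sum of absolute differences between two sampled spectra."""
    return sum(abs(u - c) for u, c in zip(unknown, candidate))


def best_step1_score(unknown, grid, positions, heights, widths, scales):
    """Smallest score over all trial (width, height scale) combinations."""
    best = float("inf")
    for w in widths:
        for s in scales:
            scaled = [s * h for h in heights]
            candidate = [lorentzian_sum(nu, positions, scaled, w) for nu in grid]
            best = min(best, sad_score(unknown, candidate))
    return best


def keep_best(scored_entries, keep=500):
    """Return the `keep` (name, score) pairs with the smallest scores."""
    return sorted(scored_entries, key=lambda e: e[1])[:keep]


# Assumed sampling grid: 500 to 4000 cm^-1 in steps of 10.
grid = [500.0 + 10.0 * i for i in range(351)]
unknown = [lorentzian_sum(nu, [1700.0], [1.0], 20.0) for nu in grid]

# Two made-up database entries: (peak positions, peak heights).
entries = {
    "entry_a": ([1700.0], [0.5]),
    "entry_b": ([1200.0], [1.0]),
}
widths = [10.0, 20.0, 40.0]  # trial FWHH values (assumed)
scales = [1.0, 2.0]          # trial height scale factors (assumed)

scored = [(name, best_step1_score(unknown, grid, x, h, widths, scales))
          for name, (x, h) in entries.items()]
shortlist = keep_best(scored, keep=1)
```

In this toy case `entry_a` reproduces the unknown exactly once its height is doubled and its width set to 20, so it reaches a score of 0 and is the only hit kept.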

Step2 Searching.

Step2 is similar to step1, except that the number of $C_j(\nu)$ generated and scored is higher. They are generated by varying the FWHH and translating/scaling the $H_i$. Two scores are computed for each entry in the short list: $$score_j^{(0.1)}=\sum_{i=1}^M\min(|U(\nu_i) - C_j(\nu_i)|,\,0.1),$$ $$score_j^{(0.2)}=\sum_{i=1}^M\min(|U(\nu_i) - C_j(\nu_i)|,\,0.2).$$ The two scores differ only in the value at which the absolute differences are clipped: 0.1 in the first and 0.2 in the second.
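
The two clipped scores are straightforward to compute. The Python sketch below uses made-up sample values and a hypothetical function name; it only illustrates the formulas above.

```python
def clipped_score(unknown, candidate, clip):
    """Sum of absolute differences, each term capped at `clip`."""
    return sum(min(abs(u - c), clip) for u, c in zip(unknown, candidate))


# Made-up sampled intensities on a common wavenumber grid.
unknown = [0.0, 0.5, 1.0, 0.25]
candidate = [0.0, 0.45, 0.25, 0.5]

score_01 = clipped_score(unknown, candidate, 0.1)  # about 0.25
score_02 = clipped_score(unknown, candidate, 0.2)  # about 0.45
unclipped = sum(abs(u - c) for u, c in zip(unknown, candidate))  # about 1.05
```

One effect of the clipping, visible here, is that a single large mismatch (0.75 at the third point) contributes no more than the clip value, so it cannot dominate the total.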

Postprocessing matches.

The three scores (one from step1 and two from step2) are each converted to a ranking by sorting the hits on that score. A hit's position in the sorted list is defined to be its rank. The three ranks of each hit are then sorted into ascending order, and a final sort of the hit list is performed.
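
The rank-based postprocessing might look like the Python sketch below. The source does not state the key used for the final sort; this sketch assumes hits are ordered lexicographically by their ascending rank triples, and it assumes ranks start at 1. The function names, hit names, and scores are hypothetical.

```python
def ranks_by_score(scores):
    """Map each hit to its 1-based position when sorted by ascending score."""
    ordered = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


def final_order(step1, step2_a, step2_b):
    """Order hits by their three ranks, each triple sorted ascending.

    Assumption: the final sort compares the sorted rank triples
    lexicographically (best rank first, then second best, then worst).
    """
    rank_tables = [ranks_by_score(s) for s in (step1, step2_a, step2_b)]
    keyed = {name: sorted(t[name] for t in rank_tables) for name in step1}
    return sorted(keyed, key=keyed.get)


# Made-up scores for three hits (smaller is better).
step1 = {"a": 3.0, "b": 1.0, "c": 2.0}    # ranks: b=1, c=2, a=3
step2_a = {"a": 0.5, "b": 0.9, "c": 0.7}  # ranks: a=1, c=2, b=3
step2_b = {"a": 0.8, "b": 1.5, "c": 1.0}  # ranks: a=1, c=2, b=3

order = final_order(step1, step2_a, step2_b)
# Rank triples: a -> [1, 1, 3], b -> [1, 3, 3], c -> [2, 2, 2]
```

Under this assumed ordering the result is `["a", "b", "c"]`: hit `a` wins because it is ranked first by two of the three scores.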

Applying Search Filters.

If you have additional insight about your unknown molecule, it is possible to improve search results by using filters. Filtering is possible by functional group, substructure, and peak matching.

Functional Group:
As an example, if you believe your unknown contains a phenyl group, check the phenyl group filter, and all database entries not containing the group will be skipped.
Substructure:
Skips all entries that don't contain your specified substructure.
Peak Matching:
The computed spectra in the SIRD are represented by sets of parameters similar to those found for the unknown spectrum. The parameters from the unknown spectrum are compared to those in each database entry. A proprietary algorithm is used to pair the $X_i$ from the two sets. If a "good" pairing is not found, the database entry is skipped.
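
Because the pairing algorithm is proprietary and not described here, the Python sketch below is only an illustrative stand-in for the general idea of a peak-matching filter. The greedy nearest-peak strategy, the 15 cm^-1 tolerance, the 80% threshold for a "good" pairing, and all names are assumptions made for this example, not Spartan's method.

```python
def pair_peaks(unknown_peaks, entry_peaks, tolerance=15.0):
    """Greedily pair each unknown peak with the nearest unused entry peak.

    A pair is accepted only if the two positions differ by at most
    `tolerance` (hypothetical value, in cm^-1).
    """
    unused = list(entry_peaks)
    pairs = []
    for x in sorted(unknown_peaks):
        if not unused:
            break
        nearest = min(unused, key=lambda p: abs(p - x))
        if abs(nearest - x) <= tolerance:
            pairs.append((x, nearest))
            unused.remove(nearest)
    return pairs


def passes_peak_filter(unknown_peaks, entry_peaks,
                       tolerance=15.0, min_fraction=0.8):
    """Treat a pairing as "good" if enough unknown peaks found a partner."""
    if not unknown_peaks:
        return True
    pairs = pair_peaks(unknown_peaks, entry_peaks, tolerance)
    return len(pairs) / len(unknown_peaks) >= min_fraction


# Made-up peak positions X_i in cm^-1.
unknown_peaks = [1700.0, 2950.0]
kept = passes_peak_filter(unknown_peaks, [1705.0, 2940.0, 1100.0])
skipped = passes_peak_filter(unknown_peaks, [1100.0, 1300.0])
```

With these values the first entry is kept, since both unknown peaks find a partner within the tolerance, and the second entry is skipped.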