Data fitting is an empirical process of finding the best numerical description for an experimental dataset.
In its general form, fitting involves two closely related parts:
- Identification of the type of process, dependence, or function that adequately describes the experimental observation
- Finding the set of coefficients (amplitudes, rates, etc.) that gives the best fit between the data and the chosen function.
Data fitting is a highly interactive process in which the computer helps you gain a better qualitative and quantitative understanding of the phenomenon you are investigating. Data fitting is by no means an automatic analysis, and in some implementations it can morph into a form of art.
On the other hand, a superficial fitting analysis, especially one done without critical feedback, is often subtly or grossly misleading. Thorough knowledge of the underlying processes and a large amount of common sense are the key virtues in fitting analysis.
In a nutshell, the fitting routine makes small changes to each parameter of the function you are trying to fit and calculates the difference between the computed and experimental data. This process is repeated many times until the difference (the residual) falls below your goal. While we often visualize data as graphs, it is important to remember that typical fitting is done on a point-by-point basis, without regard to the fit at adjacent points.
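To make the point-by-point idea concrete, here is a minimal sketch, in Igor's procedure language, of the quantity a least-squares fitter works to minimize. The function and wave names are hypothetical, not part of Igor:

```
// Sum of squared point-by-point differences between data and model.
// Each point contributes independently of its neighbors.
Function SumSquaredResiduals(yData, yModel)
	Wave yData, yModel
	Variable i, ss = 0
	for (i = 0; i < numpnts(yData); i += 1)
		ss += (yData[i] - yModel[i])^2
	endfor
	return ss
End
```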
Fitting analysis of data is a four-step loop:
- Identify the type of process, dependence, or function that describes your data
- Provide initial guesses for the coefficients and specify the desired accuracy
- Let the program iteratively search for the best-fit coefficients
- Critically analyze the results and, if needed, return to the first step
The first, and most crucial, step in fitting analysis is to identify the type of process you are dealing with. Your knowledge of the system you are studying is absolutely paramount here. If you are measuring a titration curve, you can expect the response to depend linearly on the concentration of the reagent. If you are dealing with the transient kinetics of a chemical reaction, you can expect exponential changes in signal with time. Typically you will not know all the details of the reaction (otherwise, why spend time fitting?), but you need to make a good guess about where to start from what you do know already. The more you know, the more restrictions you can place on what would be considered an acceptable outcome, which will increase the accuracy of the results if your choice of function is correct. See more on fitting functions.
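When the dependence is not one of Igor's built-in fit types, you describe the model yourself as a user fit function. As a hypothetical illustration (the name SingleExpDecay and its parameterization are assumptions for this sketch), a single-exponential transient in Igor's standard FitFunc form could look like:

```
// Hypothetical user fit function for a single-exponential transient.
// w[0] = baseline, w[1] = amplitude, w[2] = time constant; x is the
// independent variable supplied by Igor at each data point.
Function SingleExpDecay(w, x) : FitFunc
	Wave w
	Variable x
	return w[0] + w[1]*exp(-x/w[2])
End
```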
The second step in fitting involves guessing sensible initial coefficients and specifying the desired accuracy. For built-in functions, initial guesses are handled by Igor automatically. For more specialized user functions, Igor needs to know ballpark numbers for each parameter that you want to find. As with choosing the type of dependence, providing reasonable initial guesses is a critical contribution of the operator to the outcome of the fit. The range of reasonable values depends on many factors: the type of function, the number of parameters, the dataset size, etc. Given a correct guess for the function and the initial values, setting a reasonable accuracy mostly balances the precision of the result against the time it takes to find it, but there is a limit to how close you can get to the ideal.
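Continuing the hypothetical single-exponential sketch, the initial guesses go into a coefficient wave; the convergence tolerance can be tightened through Igor's V_FitTol variable (the particular numbers below are illustrative only):

```
// Ballpark starting values for the hypothetical SingleExpDecay above:
// baseline ~0, amplitude ~1, time constant ~5 (in units of the x data).
Make/D/O coefs = {0, 1, 5}
// A smaller V_FitTol demands a tighter convergence criterion at the
// cost of more iterations (Igor's default is 0.001).
Variable/G V_FitTol = 1e-4
```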
In the third step, Igor performs an iterative search for the coefficients of the chosen function that give the best match with the data. This step is where the number-crunching ability of your computer comes in really handy. During the search, Igor displays the results of each iteration. Your contribution is limited to possibly interrupting the fitting process if you see that the coefficients are diverging from reasonable values.
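In command form, the search for the hypothetical example above might be launched as follows; /D generates a fitted-curve wave and /R a residual wave (myData is an assumed name for the experimental wave):

```
// Iteratively refine coefs until the fit converges or is interrupted.
FuncFit SingleExpDecay, coefs, myData /D /R
print coefs[0], coefs[1], coefs[2]	// fitted baseline, amplitude, time constant
print V_chisq						// final sum of squared residuals
```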
The fourth, and last, step involves a critical analysis of the results. It includes, but is not limited to, the following:
- Examine the residual, i.e. the difference between the computed and initial data. A residual that looks like random noise around the zero line is a good indication of a reasonable fit. Waviness or other systematic shapes in the residual suggest that you may need to look for another function or revise your initial guesses. It is good practice to show the residual curve, often on an expanded scale, whenever a fitted curve is shown graphically (see the sketch after this list).
- Compare fitted values of any known parameters with values from other sources or with expected ranges. For example, you would not expect to see a negative absolute absorption, or an absorption amplitude greater than that of the pure compound.
- Convergence to similar results from different initial values is a good measure of the reliability of the fit.
- Compare related fitting results. A monotonic change in a parameter across a series is an indication that you have described this parameter properly.
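As a sketch of the first two checks on the hypothetical fit above: the /R flag leaves a residual wave (named Res_myData for a wave called myData), which can be displayed around a zero line, and the fit can be re-run from different starting values to test convergence:

```
// Inspect the residual: random scatter around zero suggests a good fit.
Display Res_myData
ModifyGraph zero(left)=1			// draw a reference line at zero

// Reliability check: restart from different guesses and compare results.
Make/D/O coefs = {0.1, 2, 10}
FuncFit SingleExpDecay, coefs, myData /D /R
```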
At this point you need to make a decision: is the fit satisfactory, or do you need to return to the start, revise the function and/or the guesses, and repeat the analysis? Keep in mind that examining the pattern of discrepancy between the computed and initial data can often suggest a more accurate formulation than the eye can pick out in the raw data.