RGA: an algorithm for black-box global optimization.

RGA is an acronym for Regularized Global Approximation.

The RGA algorithm for black-box global optimization is a tool for finding the global optimum (or optima) of a cost/objective function (OF) of n variables, subject to general constraints; the goal is to obtain the result with as few OF calls as possible.
The OF is not assumed to be differentiable or even continuous*, and the values returned by the OF in response to calls may be corrupted by noise.

The algorithm is, in essence, a smooth nonlinear estimator with unsupervised learning.
It is a powerful tool for optimization problems governed by OFs that are expensive to evaluate and/or defined in high-dimensional spaces, and for estimation of (vector-valued) functions or functionals defined on multi-parameter families of functions.
The RGA algorithm has no externally adjustable parameters; all parameters are self-adapted.

• The RGA algorithm allows one:
1. to obtain an approximation f of a function F of n variables, i.e. to estimate the OF at any point of a given domain;
2. to find extreme points (minima/maxima/the global extremum) of the function F in the given domain.
• The algorithm requires:
1. a finite domain of search (the constraints);
2. evaluation of the OF on the sets of n-D points supplied by the algorithm: intrinsic points that are representative of the OF.
• The algorithm is based upon:
Regularized Global Approximation of the OF (hence the name RGA).

Given the values of the OF on an initial set of starting n-D points, a differentiable (regularized) approximation f of the function F is constructed over the entire domain of search (global approximation).
The next step expands the initial set of points by all the extreme points (minima and maxima) of the differentiable approximation f; the OF is then evaluated at these points. The expanded set of OF values constitutes the base for constructing a new approximation.
Thus, at every step ('try') the algorithm enriches the previous set of points and constructs a new approximation after the OF returns its values at the subset of new (intrinsic) points.
The approximation f can be interpreted as a prediction of the true value at any point of the domain of search.
The algorithm terminates when the minimum of the OF values calculated so far is attained exactly at the point that was predicted to be the global minimizer, provided the value itself is well predicted.
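The enrich-and-refit loop above can be sketched in a few lines. This is an illustrative 1-D sketch, not the authors' implementation: it assumes a Gaussian-kernel ridge fit as the regularized approximation (the Gaussian being the Green's function of e^{-tL}), a uniform rather than random starter for reproducibility, and a dense grid to locate the extrema of f; the test function F is invented for the example.

```python
import numpy as np

def F(x):
    """Illustrative black-box OF: a double well with global minimum near x = -1.03."""
    return (x**2 - 1.0)**2 + 0.3 * x

def fit(xs, ys, t=0.05, lam=1e-4):
    """Regularized global approximation: ridge fit with the Gaussian (heat) kernel."""
    K = np.exp(-(xs[:, None] - xs[None, :])**2 / (4.0 * t))
    c = np.linalg.solve(K + lam * np.eye(len(xs)), ys)
    return lambda x: np.exp(-(x[:, None] - xs[None, :])**2 / (4.0 * t)) @ c

# Starter: injected points (uniform here for reproducibility; random in the text).
xs = np.linspace(-2.0, 2.0, 5)
ys = F(xs)

grid = np.linspace(-2.0, 2.0, 401)
for _ in range(8):                                 # 'tries'
    f = fit(xs, ys)                                # predictor: global approximation
    fg = f(grid)
    d = np.diff(fg)
    ext = grid[1:-1][np.sign(d[:-1]) != np.sign(d[1:])]   # minima and maxima of f
    new = [x for x in ext if np.min(np.abs(x - xs)) > 0.02]
    if not new:
        break
    xs = np.concatenate([xs, new])                 # enrich the point set ...
    ys = np.concatenate([ys, F(np.asarray(new))])  # ... and call the OF there

x_best = xs[np.argmin(ys)]                         # predicted global minimizer
print(x_best, ys.min())
```

Each pass evaluates the OF only at the new extreme points of f, so the number of OF calls stays small compared to a grid search on F itself.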

Block scheme of the RGA algorithm**.

• Starter: "injection" of a set of random points {x_k, k = 1, 2, ..., K}. An external routine/controlled process (the black box) returns the values of an objective function {F(x_k)} (when noisy, the first two moments of the noise are assumed to be given).
• Predictor: approximation*** of F by minimizing
[ Σ_k ( f(x_k) − F(x_k) )² + ( f(x), e^{−tL} f(x) ) ] → min,
where L is the Laplacian and t is a parameter of regularization. We called the functional "diffusion regularization" because of {∂_t − L}, and the relevant space a "Sobolev space of infinite order" because of the expansion of the exponential operator. The first term is a conventional misfit function in the Euclidean K-dimensional space, K being the total number of points {x_k}; the second is an inner product in an infinite-dimensional Hilbert space with the weight e^{−tL}.
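A minimal numerical illustration of the diffusion-regularized functional, under assumptions not stated in the text: a 1-D periodic domain, where L = d²/dx² has Fourier eigenvalues −k², so e^{−tL} acts as the multiplier e^{+t k²} and the "Sobolev space of infinite order" penalty can be built explicitly with a DFT matrix. The sample locations, test function and parameter values are arbitrary.

```python
import numpy as np

N = 64
x = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
k = np.fft.fftfreq(N, d=1.0 / N)              # integer wavenumbers
t, tau = 0.01, 1e-3                           # smoothness scale, overall weight

# Quadratic form (f, e^{-tL} f) as an explicit real symmetric matrix R:
# each Fourier mode k is weighted by exp(t k^2), penalizing high frequencies.
W = np.fft.fft(np.eye(N), axis=0)             # DFT matrix
R = (W.conj().T @ np.diag(np.exp(t * k**2)) @ W).real / N

# Misfit term: the OF is known only at a few sample points.
idx = np.array([3, 11, 20, 28, 37, 45, 52, 60])
Ftrue = np.sin(x) + 0.3 * np.cos(3.0 * x)     # stand-in for the black-box OF
S = np.eye(N)[idx]                            # sampling operator
y = Ftrue[idx]

# Minimize sum_k (f(x_k) - F(x_k))^2 + tau * (f, e^{-tL} f): a linear solve.
f = np.linalg.solve(S.T @ S + tau * R, S.T @ y)
print(np.max(np.abs(f[idx] - y)))             # small misfit at the sample points
```

The exponential weight keeps every derivative of f under control at once, which is what makes the resulting approximation globally smooth rather than merely piecewise fitted.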
• Self-learning circuit: the latest modification of the RGA concerns the parameter of regularization: instead of a scalar parameter t we introduced a tensor ("anisotropic smoothness"). Tests of the RGA with Rosenbrock's function, which has a long curved valley, showed that the estimate of this tensor is fairly reliable. See also the Lennard-Jones cluster folding.
• Corrector, and so forth: since we obtain a differentiable linear estimate f, and the Green's function of e^{−tL} is simply e^{tL}, it is easy to estimate all necessary gradients/Hessians/etc. at any point of interest, and thus, starting from the given points {x_k}, to find the set of local extrema of f. Then x* = arg min over the enriched set {F(x_k)} is the predicted point of the global minimum. Afterwards it only remains to check the reliability of the prediction (the "stopping rule").
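Because the estimate f is a finite sum of smooth kernels, its gradient is available in closed form. A hedged sketch of the corrector step, assuming (as above) a Gaussian kernel and using plain gradient descent in place of whatever local search the authors used; the points, values and parameters are invented for illustration:

```python
import numpy as np

t = 0.1
xk = np.linspace(-2.0, 3.0, 11)               # points where the OF was already called
Fk = (xk - 0.5)**2                            # stand-in OF values (minimum at 0.5)

# Gaussian-kernel estimate f(x) = sum_k c_k G(x, x_k).
K = np.exp(-(xk[:, None] - xk[None, :])**2 / (4.0 * t))
c = np.linalg.solve(K + 1e-3 * np.eye(len(xk)), Fk)

def f(x):
    return np.exp(-(x - xk)**2 / (4.0 * t)) @ c

def fprime(x):
    # Closed-form gradient: differentiate each Gaussian analytically.
    g = np.exp(-(x - xk)**2 / (4.0 * t))
    return (-(x - xk) / (2.0 * t) * g) @ c

# Corrector: descend on the differentiable estimate f, not on the costly OF.
xstar = 0.0
for _ in range(500):
    xstar -= 0.1 * fprime(xstar)
print(xstar, fprime(xstar))                   # a stationary point of f near 0.5
```

Second derivatives of the Gaussians are equally explicit, so Hessians (and hence Newton-type steps) come at no extra OF calls; only the final check of the prediction touches the black box again.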

References

1. Ryzhikov, G.A. and Biryulina, M.S., 1995. 3D nonlinear inversion by Entropy of Image Contrast (EnIC) optimization.
2. Ryzhikov, G.A., Biryulina, M.S. and Keydar, S., 1995. Analysis of 1D seismic waveform inversion by Regularized Global Approximation.

* Of course, the OF is assumed to be continuous/smooth in a small vicinity of the global minimum.
** Calligraphic F stands here for the approximate function f.
*** Our approximation by no means leads to a loss of accuracy: see, e.g., protein folding. Owing to the use of so-called "partly unbiased estimation", we are in fact dealing with interpolation/extrapolation of the function F.


Gennady Ryzhikov