DR error measures: Difference between revisions

From Xenharmonic Reference
 
(79 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{technical}}
{{technical}}
This article will describe several '''least-squares error measures for [[delta-rational chord]]s'''. They have the advantage of not fixing a particular interval in the chord when constructing the chord of best fit. However, like any other numerical measure of concordance or error, you should take them with a grain of salt.
This article will describe several '''least-squares error measures for [[delta-rational chord]]s'''. The idea behind least-squares error measures is to find the chord with the exact specified delta signature such that it deviates as little as possible from the chord that is meant to approximate that delta signature, and to measure the deviation. Least-squares error measures have the advantage of not fixing a particular interval in the chord when constructing the chord of best fit. However, like any other numerical measure of concordance or error, you should take them with a grain of salt.


== Conventions and introduction ==
== Conventions and introduction ==
The idea motivating least-squares error measures on a chord as an approximation to a given delta signature is the following (for simplicity, let’s talk about the fully DR case first):
The idea motivating least-squares error measures on a chord as an approximation to a given delta signature is the following:


We want the error of a chord 1:''r''<sub>1</sub>:''r''<sub>2</sub>:...:''r''<sub>''n''</sub> (in increasing order), with ''n'' &gt; 1, in the linear domain as an approximation to a fully delta-rational chord with signature +δ<sub>1</sub> +δ<sub>2</sub> ... +δ<sub>''n''</sub>, i.e. a chord
We want the error of a chord 1:''r''<sub>1</sub>:''r''<sub>2</sub>:...:''r''<sub>''n''</sub> (in increasing order; we also take ''r''<sub>0</sub> = 1), with ''n'' &gt; 1, in the linear domain as an approximation to a delta-rational chord with signature +δ<sub>1</sub> +δ<sub>2</sub> ... +δ<sub>''n''</sub> (possibly with some deltas free), i.e. the target chord is


<math>x : x + \delta_1 : \cdots : x + \sum_{l=1}^n \delta_l</math>
<math>x : x + \delta_1 : \cdots : x + \sum_{l=1}^n \delta_l</math>


with root real-valued harmonic ''x''. Let <math>D_0 = 0, D_i = \sum_{k=1}^i \delta_k</math> be the delta signature +δ<sub>1</sub> +δ<sub>2</sub> ... +δ<sub>''n''</sub> written cumulatively.
with some root real-valued harmonic ''x''. Let <math>D_0 = 0, D_i = \sum_{k=1}^i \delta_k</math> be the delta signature +δ<sub>1</sub> +δ<sub>2</sub> ... +δ<sub>''n''</sub> written cumulatively.


We want to measure the error ''without having to fix any dyad'' (as one might naively fix a dyad and measure errors in the other deltas). To do this we solve a least-squares error problem: use a root-sum-square error function and optimize ''x'' to minimize that function.
We want to measure the error ''without having to fix any interval in the target chord'' (as one might naively fix an interval and measure errors in the other deltas in relation to the fixed interval). To do this we solve a least-squares optimization problem: use a root-sum-square objective function and optimize ''x'' (and any free deltas) to minimize that function.


== Domain and error model ==
== Domain and error model ==
Line 17: Line 17:
* to measure either the linear (frequency ratio) error or the logarithmic (cents) one (called the ''domain'');
* to measure either the linear (frequency ratio) error or the logarithmic (cents) one (called the ''domain'');
* the collection of intervals to sum over (which we call the ''error model''):
* the collection of intervals to sum over (which we call the ''error model''):
** ''Rooted'': Only intervals from the root real-valued harmonic ''x'' are chosen.
** ''rooted'': Only intervals from the root real-valued harmonic ''x'' are chosen.
** ''Pairwise'': All intervals in the chord are compared.
** ''pairwise'': All intervals in the chord are chosen.
** ''All-steps'': Only intervals between adjacent notes are compared.
** ''all-steps'': Only intervals between adjacent notes are chosen.
The method to solve the problem will also differ depending on the numbers of variables involved (only one variable ''x'' for fully delta-rational).


We arrive at the following general formula: Let <math>[n] = \{0, 1, 2, ..., n\},</math> let <math>I \subseteq {[n] \choose 2}</math> be the error model, and let <math>f_D</math> represent the domain function (identity or <math>\log</math>). Then the error function to be minimized by optimizing <math>x</math> and any free deltas is:
We arrive at the following general formula: Let <math>I \subseteq {\{0, 1, 2, ..., n\} \choose 2}</math> be the error model, and let <math>f_D</math> represent the domain function (identity for linear, or <math>\log</math>). Then the objective function (measuring the error) to be minimized by optimizing <math>x</math> and any free deltas is:


<math>
<math>
\sum_{i<j, \{i,j\} \in I} \Bigg( f_D \Bigg( \frac{x + D_j}{x + D_i} \Bigg) - f_D\Bigg( \frac{r_j}{r_i} \Bigg) \Bigg)^2.
\sqrt{\sum_{i<j, \{i,j\} \in I} \Bigg( f_D \Bigg( \frac{x + D_j}{x + D_i} \Bigg) - f_D\Bigg( \frac{r_j}{r_i} \Bigg) \Bigg)^2}.
</math>
</math>


{| class="wikitable"
{| class="wikitable"
|+ Error function for various modes and error models
|+ Objective function for various modes and error models
|-
|-
!|Domain
!|Domain
!|Error model
!|Error model
!|Error function
!|Objective function
|-
|-
!rowspan="3"|Linear
!rowspan="3"|Linear
Line 45: Line 44:
||<math>\sqrt{\sum_{0\leq i<n} \Bigg( \frac{x + D_{i+1}}{x + D_i} - \frac{r_{i+1}}{r_i} \Bigg)^2 }</math>
||<math>\sqrt{\sum_{0\leq i<n} \Bigg( \frac{x + D_{i+1}}{x + D_i} - \frac{r_{i+1}}{r_i} \Bigg)^2 }</math>
|-
|-
!rowspan="3"|Logarithmic<br/>(nepers)
!rowspan="3"|Logarithmic<br/>(nataves)
!|Rooted
!|Rooted
||<math>\sqrt{\sum_{i=1}^n \Bigg(\log \frac{x + D_i}{r_i x} \Bigg)^2}</math>
||<math>\sqrt{\sum_{i=1}^n \Bigg(\log \frac{x + D_i}{r_i x} \Bigg)^2}</math>
Line 55: Line 54:
||<math>\sqrt{\sum_{0\leq i<n} \Bigg(\log \frac{x + D_{i+1}}{x + D_i} - \log \frac{r_{i+1}}{r_i} \Bigg)^2}</math>
||<math>\sqrt{\sum_{0\leq i<n} \Bigg(\log \frac{x + D_{i+1}}{x + D_i} - \log \frac{r_{i+1}}{r_i} \Bigg)^2}</math>
|}
|}
To convert nepers to cents, multiply by <math>\frac{1200}{\log 2}.</math>
To convert nataves to cents, multiply by <math>\frac{1200}{\log 2}.</math>


== Solution methods ==
== Solution methods ==
=== Analytic (FDR case) ===
This section gets into the depths of mathematical optimization methods used to minimize DR error. (Optimization is a whole field and there are many different methods; the reason most of those methods exist is to find solutions when finding formulas for the solutions is infeasible.)
==== Rooted linear ====
=== Grid method (FDR case) ===
Setting the derivative to 0 gives us the closed-form solution
<syntaxhighlight lang="py" line="1">
class GridSolution:
    def __init__(self, x, fx):
        self.x = x
        self.fx = fx
   
    def __repr__(self):
        return f"GridSolution(x={self.x:.5f}, fx={self.fx:.5f})"


<math>x = \frac{\sum_{i=1}^n D_i }{-n + \sum_{i=1}^n r_i},</math>
def grid_method(f, window_size=100, coarse_steps=1000, fine_steps=1000):
    best_error = float("inf")
    best_x = 0
   
    coarse_step = window_size / coarse_steps
    # Fine step partitions one coarse step into smaller pieces
    fine_step = coarse_step / fine_steps
   
    # --- Phase 1: Coarse Grid Search ---
    # Search in the window (0, window_size]
    for i in range(1, coarse_steps + 1):
        x = i * coarse_step
        fx = f(x)
        if fx < best_error:
            best_error = fx
            best_x = x


which can be plugged back into
    # --- Phase 2: Fine Grid Search ---
    # Center the new search window around the best_x found in Phase 1
    # We search from (best_x - coarse_step/2) to (best_x + coarse_step/2)
   
    fine_window_lower = best_x - (coarse_step / 2)
   
    for j in range(1, fine_steps + 1):
        x = fine_window_lower + (j * fine_step)
        fx = f(x)
        if fx < best_error:
            best_error = fx
            best_x = x
           
    return GridSolution(best_x, best_error)
</syntaxhighlight>


<math>\sqrt{\sum_{1=1}^n \Bigg( 1 + \frac{D_i}{x} - r_i \Bigg)^2 }</math>
=== BFGS-B (one related delta set, arbitrary free deltas) ===
We let ''x''<sub>1</sub> = ''x'' and include additional free variables ''x''<sub>2</sub>, ..., ''x''<sub>''n''</sub>, one for every additional +?, after coalescing segments of consecutive +?'s into one +? and after trimming leading and trailing free delta segments.


to obtain the least-squares linear error.
BFGS-B is a quasi-Newton optimization method (based on BFGS) particularly suited for this problem:
* The objective function is smooth, allowing use of gradients
* Fast convergence, requiring at worst 20 iterations for accuracy
* Naturally deals with the ''x'' > 0 constraint using a log barrier and minimizing the transformed function using the unconstrained BFGS method
* Acceptable memory usage given a realistic number of parameters for practical DR chords (up to 3 interior free delta segments, thus 4 parameters).


=== Grid method (FDR case) ===
It is a ''quasi-Newton method'' because it uses an approximation of the Hessian (matrix of mixed second partial derivatives) of the objective function at each step.


=== Partially DR (one related delta set, one free variable) ===
In the Python implementation below, <code>x</code> represents the vector <math>(x_1, x_2, ..., x_n),</math> <code>x0</code> is the initial guess for the solution, and <code>f</code> is the objective function.
Suppose we wish to approximate a target delta signature of the form <math>+\delta_1 +? +\delta_3</math> with the chord <math>1:r_1:r_2:r_3</math> (where the +? is free to vary). By a derivation similar to the above, the least-squares problem is


<math>
<syntaxhighlight lang="py" line="1">
\displaystyle {\underset{x,y}{\text{minimize}} \sqrt{\bigg(\frac{x + \delta_1}{x} - r_1 \bigg)^2 + \bigg(\frac{x+\delta_1 + y}{x} - r_2 \bigg)^2 + \bigg(\frac{x+\delta_1 + y + \delta_3}{x} - r_3 \bigg)^2 }},
import numpy as np
</math>
import math


where ''y'' represents the free delta +?.
class BFGSSolution:
    def __init__(self, x, fx, iterations, success):
        self.x = x
        self.fx = fx
        self.iterations = iterations
        self.success = success


We can set the partial derivatives with respect to ''x'' and ''y'' of the inner expression equal to zero (since the derivative of sqrt() is never 0) and use SymPy to solve the system:
def numerical_gradient(f, x, eps=1e-8):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_minus = x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return grad


<syntaxhighlight lang="py">
def bfgs(f, grad, x0, max_iterations=100, tolerance=1e-5):
import sympy
    x = np.array(x0, dtype=float)
x = sympy.Symbol("x", real=True)
    n = len(x)
y = sympy.Symbol("y", real=True)
    fx = f(x)
d1 = sympy.Symbol("\\delta_{1}", real=True)
    g = grad(x)
d2 = sympy.Symbol("\\delta_{2}", real=True)
   
d3 = sympy.Symbol("\\delta_{3}", real=True)
    # Approximate inverse Hessian
r1 = sympy.Symbol("r_1", real=True)
    # Initial approximation is the identity matrix
r2 = sympy.Symbol("r_2", real=True)
    H = np.eye(n)
r3 = sympy.Symbol("r_3", real=True)
   
err_squared = ((x + d1) / x - r1) ** 2 + ((x + d1 + y) / x - r2) ** 2 + ((x + d1 + y + d3) / x - r3) ** 2
    for i in range(max_iterations):
err_squared.expand()
        # 0: Check convergence
err_squared_x = sympy.diff(err_squared, x)
        grad_norm = np.linalg.norm(g)
err_squared_y = sympy.diff(err_squared, y)
        if grad_norm < tolerance:
sympy.nonlinsolve([err_squared_x, err_squared_y], [x, y])
            return BFGSSolution(x, fx, i, True)
</syntaxhighlight>
       
        # 1: Set search direction p (negative gradient direction transformed by H)
        # p = -H * g
        p = -H @ g
       
        # 2: Get alpha satisfying Wolfe conditions (Armijo rule and curvature condition)
        c1 = 1e-4
        c2 = 0.9
        max_line_search = 20
        rho_ls = 0.5
       
        gp = np.dot(g, p)
        alpha = 1.0
       
        for _ in range(max_line_search):
            x_next_guess = x + alpha * p
           
            # Check Armijo rule
            # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
            if f(x_next_guess) <= fx + c1 * alpha * gp:
                # Check Curvature condition
                # -p^T * grad(x_next) <= -c2 * p^T * g
                g_next_guess = grad(x_next_guess)
                if -np.dot(p, g_next_guess) <= -c2 * gp:
                    break
           
            alpha *= rho_ls
       
        # 3: Set s = alpha * p and x_next = x + s
        s = alpha * p
        x_next = x + s
       
        # 4: Set y = grad(x_next) - grad(x)
        # We re-calculate grad(x_next) here to match the strict logic flow,
        # though optimization could reuse g_next_guess from the successful line search.
        g_next = grad(x_next)
        y = g_next - g
       
        # 5: BFGS Update
        # Update H += U + V
        sy = np.dot(s, y)
       
        # Prevent division by zero if step size was extremely small
        if sy == 0:
            # In a robust implementation, you might reset H to Identity here
            break
           
        Hy = H @ y
       
        # Calculate scalar for the first term: (s^T y + y^T H y) / (s^T y)^2
        scalar1 = (sy + np.dot(y, Hy)) / (sy * sy)
        U = scalar1 * np.outer(s, s)
       
        # Calculate the second term matrices
        # W = (H y) s^T + s (y^T H)
        # Note: Since H is symmetric, y^T H is equivalent to (H y)^T
        W = np.outer(Hy, s) + np.outer(s, Hy)
        V = (-1 / sy) * W
       
        H = H + U + V
       
        # Update x, fx, and g for next iteration
        x = x_next
        fx = f(x_next)
        g = g_next


The unique solution with ''x'' > 0 is
    return BFGSSolution(x, fx, max_iterations, False)


<math>
def bfgs_barrier(f, bounds, x0, history_size=10, max_iterations=100, tolerance=1e-5, barrier_weight=1e-4):
x = \frac{2 \delta_1 + \delta_3 + 2y}{r_2 + r_3 - 2},
    """
</math>
    Solves bounded optimization using a Log-Barrier method wrapped around BFGS.
    Note: x0 must be strictly feasible (inside bounds).
    """
    def transformed_f(x):
        penalty = 0
        for i, val in enumerate(x):
            lower, upper = bounds[i]
           
            # Hard cutoff prevents log domain errors during line search exploration
            if (lower is not None and val <= lower) or (upper is not None and val >= upper):
                return float("inf")
           
            # Log barrier penalties
            if lower is not None:
                penalty -= barrier_weight * math.log(val - lower)
            if upper is not None:
                penalty -= barrier_weight * math.log(upper - val)
       
        return f(x) + penalty
   
    # Use the transformed function for gradients as well
    grad = lambda x: numerical_gradient(transformed_f, x)
   
    result = bfgs(transformed_f, grad, x0, history_size, max_iterations, tolerance)
   
    # Restore actual function value
    result.fx = f(result.x)
    return result
</syntaxhighlight>


<math>
=== L-BFGS-B (one related delta set, arbitrary free deltas) ===
y = \frac{- 2 \delta_1^2 r_1 + \delta_1^2 r_2 + \delta_1^2 r_3 - \delta_1 \delta_3 r_1 + \delta_1 \delta_3 r_2 - \delta_1 \delta_3 r_3 + \delta_1 \delta_3 + \delta_3^2 r_2 - \delta_3^2}{2 \delta_1 r_1 - 2 \delta_1 - \delta_3 r_2 + \delta_3 r_3}.
L-BFGS-B is an approximation to BFGS-B limiting memory usage, particularly suited for high-dimensional problems but nevertheless agreeing very well with BFGS-B for typical DR test cases. Python code for the L-BFGS method is provided below.
</math>


=== Partially DR (one related delta set, arbitrary free deltas) ===
<syntaxhighlight lang="py" line="1">
We similarly include a free variable y<sub>i</sub> to be optimized for every additional +?, after coalescing strings of consecutive +?'s into segments and after trimming leading and trailing free delta segments.
import numpy as np
import math


The L-BFGS-B (limited-memory Broyden-Fletcher-Goldfarb-Shanno with Box constraints) algorithm is a bounded quasi-Newton method particularly well-suited for this problem:
class LBFGSSolution:
    def __init__(self, x, fx, iterations, success):
        self.x = x
        self.fx = fx
        self.iterations = iterations
        self.success = success


* '''Gradient-based:''' Uses numerical gradient information to find precise search directions, unlike derivative-free methods (Nelder-Mead, Powell) which explore via simplex transformations or conjugate directions
def numerical_gradient(f, x, eps=1e-8):
* '''Quasi-Newton approximation:''' Builds up an approximation of the Hessian (second derivative matrix) using only gradient evaluations, avoiding expensive exact Hessian computation
    grad = np.zeros_like(x)
* '''Memory efficient:''' Limited-memory variant stores only the last m gradient differences (typically m=5-10), making it suitable for problems with many free variables
    for i in range(len(x)):
* '''Box constraints:''' Naturally handles the constraint x > 0 via logarithmic barrier penalties, converting the constrained problem to unconstrained
        x_plus = x.copy()
* '''Smooth objective:''' The sum-of-squared-errors objective function is smooth and differentiable, ideal for gradient-based optimization
        x_minus = x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return grad


{| class="wikitable"
def l_bfgs(f, grad, x0, history_size=10, max_iterations=100, tolerance=1e-5):
|+ Performance comparison
    x = np.array(x0, dtype=float) # Ensure float array
! Method !! Iterations !! Typical Error !! Notes
    fx = f(x)
|-
    g = grad(x)
| L-BFGS-B || 14-20 || 0.000359-0.000681 || Fastest convergence, lowest errors
   
|-
    s_history = []
| Nelder-Mead || 200-400 || 0.000465-0.000993 || Derivative-free, slower but robust
    y_history = []
|-
    rho_history = []
| Powell || 150-250 || 0.000566-0.000681 || Conjugate directions, moderate speed
   
|}
    for iteration in range(max_iterations):
        grad_norm = np.linalg.norm(g)
        if grad_norm < tolerance:
            return LBFGSSolution(x, fx, iteration, True)
       
        # --- Two-loop recursion ---
        q = g.copy()
       
        # We need to store alpha to use it in the second loop
        m = len(s_history)
        alpha = [0.0] * m
       
        # First loop (Backward)
        for i in range(m - 1, -1, -1):
            alpha[i] = rho_history[i] * np.dot(s_history[i], q)
            q = q - alpha[i] * y_history[i]
       
        # Initial Hessian approximation scaling
        gamma = 1.0
        if m > 0:
            gamma = np.dot(s_history[-1], y_history[-1]) / np.dot(y_history[-1], y_history[-1])
       
        z = q * gamma


'''Implementation details:'''
        # Second loop (Forward)
* Variables: [x, y<sub>1</sub>, y<sub>2</sub>, ..., y<sub>k</sub>] where k = number of interior free segments
        for i in range(m):
* Bounds: x ∈ (10<sup>-6</sup>, ), y<sub>i</sub> ∈ (-∞, ∞)
            beta = rho_history[i] * np.dot(y_history[i], z)
* Normalization: Scales deltas to keep x ≈ 5 for numerical stability
            z = z + s_history[i] * (alpha[i] - beta)
* Multiple starting points: Tries 4-5 random initializations and returns best result
       
* Barrier weight: 10<sup>-10</sup> (very small to minimize interference with true objective)
        p = -z # Search direction


The gradient-based approach allows L-BFGS-B to converge in typically 15-20 iterations versus 200-400 for derivative-free methods, making it the preferred choice for PDR optimization.
        # --- Line search (Backtracking) ---
 
        step_size = 1.0
==== Pseudocode ====
        c1 = 1e-4
        rho_ls = 0.5
        max_line_search = 20
       
        # Check for sufficient decrease (Armijo rule)
        # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
       
        g_dot_p = np.dot(g, p)
       
        for _ in range(max_line_search):
            x_new = x + step_size * p
            fx_new = f(x_new)
           
            if fx_new <= fx + c1 * step_size * g_dot_p:
                break
            step_size *= rho_ls
        else:
            # If line search fails to find a better point, stop or accept best attempt
            # (Here we just accept the last reduced step_size)
            pass
       
        # --- Update History ---
        s = x_new - x
        g_new = grad(x_new)
        y = g_new - g
       
        sy = np.dot(s, y)
       
        if sy > 1e-10: # Ensure curvature condition
            s_history.append(s)
            y_history.append(y)
            rho_history.append(1.0 / sy)
           
            # Maintain history size
            if len(s_history) > history_size:
                s_history.pop(0)
                y_history.pop(0)
                rho_history.pop(0)
           
        x = x_new
        fx = fx_new
        g = g_new
   
    return LBFGSSolution(x, fx, max_iterations, False)
</syntaxhighlight>


== External links ==
== External links ==
* [https://inthar-raven.github.io/delta/ Inthar's DR chord explorer (includes least-squares error calculation in both domains and multiple error models)]
* [https://turbofishcrow.github.io/delta/ Inthar's DR chord explorer (includes least-squares error calculation in both domains and multiple error models)]
[[Category:Atypical ratios]]
{{cat|Atypical ratios}}

Latest revision as of 04:22, 24 February 2026

This is a technical or mathematical page. While the subject may be of some relevance to music, the page treats the subject in technical language.

This article will describe several least-squares error measures for delta-rational chords. The idea behind least-squares error measures is to find the chord with the exact specified delta signature such that it deviates as little as possible from the chord that is meant to approximate that delta signature, and to measure the deviation. Least-squares error measures have the advantage of not fixing a particular interval in the chord when constructing the chord of best fit. However, like any other numerical measure of concordance or error, you should take them with a grain of salt.

Conventions and introduction

The idea motivating least-squares error measures on a chord as an approximation to a given delta signature is the following:

We want the error of a chord 1:r1:r2:...:rn (in increasing order; we also take r0 = 1), with n > 1, in the linear domain as an approximation to a delta-rational chord with signature +δ12 ... +δn (possibly with some deltas free), i.e. the target chord is

x:x+δ1::x+l=1nδl

with some root real-valued harmonic x. Let D0=0,Di=k=1iδk be the delta signature +δ12 ... +δn written cumulatively.

We want to measure the error without having to fix any interval in the target chord (as one might naively fix an interval and measure errors in the other deltas in relation to the fixed interval). To do this we solve a least-squares optimization problem: use a root-sum-square objective function and optimize x (and any free deltas) to minimize that function.

Domain and error model

We have two choices:

  • to measure either the linear (frequency ratio) error or the logarithmic (cents) one (called the domain);
  • the collection of intervals to sum over (which we call the error model):
    • rooted: Only intervals from the root real-valued harmonic x are chosen.
    • pairwise: All intervals in the chord are chosen.
    • all-steps: Only intervals between adjacent notes are chosen.

We arrive at the following general formula: Let I({0,1,2,...,n}2) be the error model, and let fD represent the domain function (identity for linear, or log). Then the objective function (measuring the error) to be minimized by optimizing x and any free deltas is:

i<j,{i,j}I(fD(x+Djx+Di)fD(rjri))2.

Objective function for various modes and error models
Domain Error model Objective function
Linear Rooted i=1n(x+Dixri)2=i=1n(1+Dixri)2
Pairwise 0i<jn(x+Djx+Dirjri)2
All-steps 0i<n(x+Di+1x+Diri+1ri)2
Logarithmic
(nataves)
Rooted i=1n(logx+Dirix)2
Pairwise 0i<jn(logx+Djx+Dilogrjri)2
All-steps 0i<n(logx+Di+1x+Dilogri+1ri)2

To convert nataves to cents, multiply by 1200log2.

Solution methods

This section gets into the depths of mathematical optimization methods used to minimize DR error. (Optimization is a whole field and there are many different methods; the reason most of those methods exist is to find solutions when finding formulas for the solutions is infeasible.)

Grid method (FDR case)

class GridSolution:
    def __init__(self, x, fx):
        self.x = x
        self.fx = fx
    
    def __repr__(self):
        return f"GridSolution(x={self.x:.5f}, fx={self.fx:.5f})"

def grid_method(f, window_size=100, coarse_steps=1000, fine_steps=1000):
    best_error = float("inf")
    best_x = 0
    
    coarse_step = window_size / coarse_steps
    # Fine step partitions one coarse step into smaller pieces
    fine_step = coarse_step / fine_steps 
    
    # --- Phase 1: Coarse Grid Search ---
    # Search in the window (0, window_size]
    for i in range(1, coarse_steps + 1):
        x = i * coarse_step
        fx = f(x)
        if fx < best_error:
            best_error = fx
            best_x = x

    # --- Phase 2: Fine Grid Search ---
    # Center the new search window around the best_x found in Phase 1
    # We search from (best_x - coarse_step/2) to (best_x + coarse_step/2)
    
    fine_window_lower = best_x - (coarse_step / 2)
    
    for j in range(1, fine_steps + 1):
        x = fine_window_lower + (j * fine_step)
        fx = f(x)
        if fx < best_error:
            best_error = fx
            best_x = x
            
    return GridSolution(best_x, best_error)

We let x1 = x and include additional free variables x2, ..., xn, one for every additional +?, after coalescing segments of consecutive +?'s into one +? and after trimming leading and trailing free delta segments.

BFGS-B is a quasi-Newton optimization method (based on BFGS) particularly suited for this problem:

  • The objective function is smooth, allowing use of gradients
  • Fast convergence, requiring at worst 20 iterations for accuracy
  • Naturally deals with the x > 0 constraint using a log barrier and minimizing the transformed function using the unconstrained BFGS method
  • Acceptable memory usage given a realistic number of parameters for practical DR chords (up to 3 interior free delta segments, thus 4 parameters).

It is a quasi-Newton method because it uses an approximation of the Hessian (matrix of mixed second partial derivatives) of the objective function at each step.

In the Python implementation below, x represents the vector (x1,x2,...,xn), x0 is the initial guess for the solution, and f is the objective function.

import numpy as np
import math

class BFGSSolution:
    def __init__(self, x, fx, iterations, success):
        self.x = x
        self.fx = fx
        self.iterations = iterations
        self.success = success

def numerical_gradient(f, x, eps=1e-8):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_minus = x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return grad

def bfgs(f, grad, x0, max_iterations=100, tolerance=1e-5):
    x = np.array(x0, dtype=float)
    n = len(x)
    fx = f(x)
    g = grad(x)
    
    # Approximate inverse Hessian
    # Initial approximation is the identity matrix
    H = np.eye(n)
    
    for i in range(max_iterations):
        # 0: Check convergence
        grad_norm = np.linalg.norm(g)
        if grad_norm < tolerance:
            return BFGSSolution(x, fx, i, True)
        
        # 1: Set search direction p (negative gradient direction transformed by H)
        # p = -H * g
        p = -H @ g
        
        # 2: Get alpha satisfying Wolfe conditions (Armijo rule and curvature condition)
        c1 = 1e-4
        c2 = 0.9
        max_line_search = 20
        rho_ls = 0.5
        
        gp = np.dot(g, p)
        alpha = 1.0
        
        for _ in range(max_line_search):
            x_next_guess = x + alpha * p
            
            # Check Armijo rule
            # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
            if f(x_next_guess) <= fx + c1 * alpha * gp:
                # Check Curvature condition
                # -p^T * grad(x_next) <= -c2 * p^T * g
                g_next_guess = grad(x_next_guess)
                if -np.dot(p, g_next_guess) <= -c2 * gp:
                    break
            
            alpha *= rho_ls
        
        # 3: Set s = alpha * p and x_next = x + s
        s = alpha * p
        x_next = x + s
        
        # 4: Set y = grad(x_next) - grad(x)
        # We re-calculate grad(x_next) here to match the strict logic flow, 
        # though optimization could reuse g_next_guess from the successful line search.
        g_next = grad(x_next)
        y = g_next - g
        
        # 5: BFGS Update
        # Update H += U + V
        sy = np.dot(s, y)
        
        # Prevent division by zero if step size was extremely small
        if sy == 0: 
            # In a robust implementation, you might reset H to Identity here
            break 
            
        Hy = H @ y
        
        # Calculate scalar for the first term: (s^T y + y^T H y) / (s^T y)^2
        scalar1 = (sy + np.dot(y, Hy)) / (sy * sy)
        U = scalar1 * np.outer(s, s)
        
        # Calculate the second term matrices
        # W = (H y) s^T + s (y^T H)
        # Note: Since H is symmetric, y^T H is equivalent to (H y)^T
        W = np.outer(Hy, s) + np.outer(s, Hy)
        V = (-1 / sy) * W
        
        H = H + U + V
        
        # Update x, fx, and g for next iteration
        x = x_next
        fx = f(x_next)
        g = g_next

    return BFGSSolution(x, fx, max_iterations, False)

def bfgs_barrier(f, bounds, x0, history_size=10, max_iterations=100, tolerance=1e-5, barrier_weight=1e-4):
    """
    Solves bounded optimization using a Log-Barrier method wrapped around BFGS.
    Note: x0 must be strictly feasible (inside bounds).
    """
    def transformed_f(x):
        penalty = 0
        for i, val in enumerate(x):
            lower, upper = bounds[i]
            
            # Hard cutoff prevents log domain errors during line search exploration
            if (lower is not None and val <= lower) or (upper is not None and val >= upper):
                return float("inf")
            
            # Log barrier penalties
            if lower is not None:
                penalty -= barrier_weight * math.log(val - lower)
            if upper is not None:
                penalty -= barrier_weight * math.log(upper - val)
        
        return f(x) + penalty
    
    # Use the transformed function for gradients as well
    grad = lambda x: numerical_gradient(transformed_f, x)
    
    result = bfgs(transformed_f, grad, x0, history_size, max_iterations, tolerance)
    
    # Restore actual function value
    result.fx = f(result.x)
    return result

L-BFGS-B is an approximation to BFGS-B limiting memory usage, particularly suited for high-dimensional problems but nevertheless agreeing very well with BFGS-B for typical DR test cases. Python code for the L-BFGS method is provided below.

import numpy as np
import math

class LBFGSSolution:
    def __init__(self, x, fx, iterations, success):
        self.x = x
        self.fx = fx
        self.iterations = iterations
        self.success = success

def numerical_gradient(f, x, eps=1e-8):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        x_plus = x.copy()
        x_minus = x.copy()
        x_plus[i] += eps
        x_minus[i] -= eps
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * eps)
    return grad

def l_bfgs(f, grad, x0, history_size=10, max_iterations=100, tolerance=1e-5):
    x = np.array(x0, dtype=float) # Ensure float array
    fx = f(x)
    g = grad(x)
    
    s_history = []
    y_history = []
    rho_history = []
    
    for iteration in range(max_iterations):
        grad_norm = np.linalg.norm(g)
        if grad_norm < tolerance:
            return LBFGSSolution(x, fx, iteration, True)
        
        # --- Two-loop recursion ---
        q = g.copy()
        
        # We need to store alpha to use it in the second loop
        m = len(s_history)
        alpha = [0.0] * m 
        
        # First loop (Backward)
        for i in range(m - 1, -1, -1):
            alpha[i] = rho_history[i] * np.dot(s_history[i], q)
            q = q - alpha[i] * y_history[i]
        
        # Initial Hessian approximation scaling
        gamma = 1.0
        if m > 0:
            gamma = np.dot(s_history[-1], y_history[-1]) / np.dot(y_history[-1], y_history[-1])
        
        z = q * gamma

        # Second loop (Forward)
        for i in range(m):
            beta = rho_history[i] * np.dot(y_history[i], z)
            z = z + s_history[i] * (alpha[i] - beta)
        
        p = -z # Search direction

        # --- Line search (Backtracking) ---
        step_size = 1.0
        c1 = 1e-4
        rho_ls = 0.5
        max_line_search = 20
        
        # Check for sufficient decrease (Armijo rule)
        # f(x + alpha*p) <= f(x) + c1 * alpha * p^T * g
        
        g_dot_p = np.dot(g, p)
        
        for _ in range(max_line_search):
            x_new = x + step_size * p
            fx_new = f(x_new)
            
            if fx_new <= fx + c1 * step_size * g_dot_p:
                break
            step_size *= rho_ls
        else:
            # If line search fails to find a better point, stop or accept best attempt
            # (Here we just accept the last reduced step_size)
            pass 
        
        # --- Update History ---
        s = x_new - x
        g_new = grad(x_new)
        y = g_new - g
        
        sy = np.dot(s, y)
        
        if sy > 1e-10: # Ensure curvature condition
            s_history.append(s)
            y_history.append(y)
            rho_history.append(1.0 / sy)
            
            # Maintain history size
            if len(s_history) > history_size:
                s_history.pop(0)
                y_history.pop(0)
                rho_history.pop(0)
            
        x = x_new
        fx = fx_new
        g = g_new
    
    return LBFGSSolution(x, fx, max_iterations, False)