$$\newcommand{\v}{\mathbf}$$

- Let $f(\v{v}) = \frac{1}{2}\left(\langle \v{v}, \v{e}_3\rangle -||\v{v}||\right)^2$ and let $n(\v{a},\v{b},\v{c}) = (\v{b}-\v{a})\times(\v{c}-\v{a}).$ We want to understand the behavior of the function $(f \circ n)(\v{a},\v{b},\v{c}).$
- Geometrically, we can write $f(\v{n}) = \frac{1}{2}||\v{n}||^2(\cos{\theta_z} - 1)^2 = 2A^2(\cos{\theta_z} - 1)^2$, where $A$ is the area of the triangle and $\theta_z$ is the angle between the normal of the triangle and the z-axis. This is because the length of the triangle normal $||\v{n}||$ is twice the area of the triangle, and because $\langle \v{n}, \v{e}_3\rangle = ||\v{n}||\cos{\theta_z}$ by the definition of cosine in terms of the dot product.
- By the chain rule, the derivative factors as $D(f\circ n) = (Df \circ n) \cdot Dn$. The first factor $(Df \circ n)$ depends only (!) on $\v{n}$, i.e. on the area of the triangle and its orientation in space. Beyond that, it does not depend on the exact positions of the vertices $\v{a}$, $\v{b}$, $\v{c}$, so it is invariant to things like translation. The second factor $Dn = [\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial \v{b}}, \frac{\partial \v{n}}{\partial \v{c}}]$ describes how that area and orientation change as you move the vertices.
- We can compute each of these factors. Write $f = \frac{1}{2}g^2$ with $g(\v{v}) = \langle \v{v}, \v{e}_3\rangle - ||\v{v}||$. Then $\nabla g = \v{e}_3 - \v{v}/||\v{v}||$, so the first factor is: $$Df\circ n = ||\v{n}||\,(\langle \hat{\v{n}}, \v{e}_3\rangle - 1)(\v{e}_3 - \hat{\v{n}})$$
  where $\hat{\v{n}} \equiv \v{n}/||\v{n}||$ is the unit vector in the direction of the normal $\v{n}$. This vector shows how $f$ changes as you vary each of the components of $\v{n}$. It is proportional to $(\cos{\theta_z}-1)$ and to $||\v{n}|| = 2A$. So, as the triangle's normal aligns more closely with $+\v{e}_3$ ($\theta_z \to 0$), this term vanishes.
- The second factor is $Dn = [\frac{\partial \v{n}}{\partial \v{a}}, \frac{\partial \v{n}}{\partial \v{b}}, \frac{\partial \v{n}}{\partial \v{c}}]$. For simplicity, look at just one of the partial derivatives, to see how $\v{n}$ changes when you move one vertex $\v{a}$ of the triangle. By the product rule, a small change $\delta\v{a}$ changes the normal by $\delta\v{n} = -\delta\v{a} \times (\v{c}-\v{a}) - (\v{b}-\v{a}) \times \delta\v{a} = \delta\v{a} \times (\v{b}-\v{c})$, so:

  $$\frac{\partial \v{n}}{\partial \v{a}} = \frac{\partial}{\partial \v{a}}\left[(\v{b}-\v{a})\times(\v{c}-\v{a})\right] = \v{I} \times (\v{b}-\v{c}) = \begin{bmatrix}\v{e}_1 \times (\v{b}-\v{c})\\\v{e}_2 \times (\v{b}-\v{c})\\\v{e}_3 \times (\v{b}-\v{c})\end{bmatrix}$$

  In full, this is a 3x3 matrix whose $j$-th row $\v{e}_j \times (\v{b}-\v{c})$ records how the normal components $n_1$, $n_2$, $n_3$ vary as you change the coordinate $a_j$. The same calculation gives $\partial \v{n}/\partial \v{b} = \v{I} \times (\v{c}-\v{a})$ and $\partial \v{n}/\partial \v{c} = \v{I} \times (\v{a}-\v{b})$, so the edges appear in cyclic order. Stacking all three, you get a 9x3 matrix showing how changing any of the vertices $\v{a}$, $\v{b}$, $\v{c}$ affects the three components of $\v{n}$:

  $$D\v{n} = \begin{bmatrix}\v{I} \times (\v{b}-\v{c})\\ \v{I} \times (\v{c}-\v{a}) \\ \v{I} \times (\v{a}-\v{b})\end{bmatrix}$$

- Now we can compute the full derivative $D(f\circ n) = (Df \circ n)\cdot Dn$. The fourth line below uses the scalar triple product identity $\v{u} \bullet (\v{v} \times \v{w}) = (\v{u} \times \v{v}) \bullet \v{w}$ together with the antisymmetry of the cross product.

$$ \begin{align}D(f\circ n)&=||\v{n}||\,(\langle \hat{\v{n}}, \v{e}_3\rangle - 1)(\v{e}_3 - \hat{\v{n}}) \bullet \begin{bmatrix}\v{I} \times (\v{b}-\v{c})\\ \v{I} \times (\v{c}-\v{a}) \\ \v{I} \times (\v{a}-\v{b})\end{bmatrix}\\
&=
||\v{n}||\,(\langle \hat{\v{n}}, \v{e}_3\rangle - 1) \begin{bmatrix}(\v{e}_3 - \hat{\v{n}}) \bullet [\v{I} \times (\v{b}-\v{c})]\\ (\v{e}_3 - \hat{\v{n}}) \bullet[\v{I} \times (\v{c}-\v{a})] \\ (\v{e}_3 - \hat{\v{n}}) \bullet [\v{I} \times (\v{a}-\v{b})]\end{bmatrix}\\
&=
||\v{n}||\,(\langle \hat{\v{n}}, \v{e}_3\rangle - 1) \begin{bmatrix}\v{e}_3 \bullet [\v{I} \times (\v{b}-\v{c})] - \hat{\v{n}} \bullet [\v{I} \times (\v{b}-\v{c})] \\ 
\v{e}_3 \bullet [\v{I} \times (\v{c}-\v{a})] - \hat{\v{n}} \bullet [\v{I} \times (\v{c}-\v{a})] \\
\v{e}_3 \bullet [\v{I} \times (\v{a}-\v{b})] - \hat{\v{n}} \bullet [\v{I} \times (\v{a}-\v{b})]
\end{bmatrix}\\
&=
||\v{n}||\,(\langle \hat{\v{n}}, \v{e}_3\rangle - 1) \begin{bmatrix}[\v{e}_3 \times \v{I}] \bullet (\v{b}-\v{c}) + \v{I} \bullet [\hat{\v{n}} \times (\v{b}-\v{c})] \\ 
[\v{e}_3 \times \v{I}] \bullet (\v{c}-\v{a}) + \v{I} \bullet [\hat{\v{n}} \times (\v{c}-\v{a})] \\
[\v{e}_3 \times \v{I}] \bullet (\v{a}-\v{b}) + \v{I} \bullet [\hat{\v{n}} \times (\v{a}-\v{b})]
\end{bmatrix}\\
&=
||\v{n}||\,(\langle \hat{\v{n}}, \v{e}_3\rangle - 1) \begin{bmatrix}\v{R}\bullet (\v{b}-\v{c}) + \hat{\v{n}} \times (\v{b}-\v{c}) \\ 
\v{R} \bullet (\v{c}-\v{a}) + \hat{\v{n}} \times (\v{c}-\v{a})\\
\v{R} \bullet (\v{a}-\v{b}) + \hat{\v{n}} \times (\v{a}-\v{b})
\end{bmatrix}\\
&=
||\v{n}||\,(\langle \hat{\v{n}}, \v{e}_3\rangle - 1) \begin{bmatrix}(\v{b}-\v{c}) \times (\v{e}_3 - \hat{\v{n}}) \\ 
(\v{c}-\v{a}) \times (\v{e}_3 - \hat{\v{n}})\\
(\v{a}-\v{b}) \times (\v{e}_3 - \hat{\v{n}})
\end{bmatrix}
\end{align}$$
  where $\v{R} \equiv \v{e}_3 \times \v{I} = \begin{bmatrix}0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}$ is the matrix whose rows are $\v{e}_3 \times \v{e}_1$, $\v{e}_3 \times \v{e}_2$, $\v{e}_3 \times \v{e}_3$. It projects a vector onto the xy-plane and rotates it by $90^\circ$ clockwise as seen from $+\v{e}_3$. For any $\v{u}$, $\v{R}\bullet\v{u} = \v{u} \times \v{e}_3$, which gives the last line. The three blocks sum to zero because $(\v{b}-\v{c}) + (\v{c}-\v{a}) + (\v{a}-\v{b}) = \v{0}$, which is consistent with translation invariance.
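The closed form above can be checked against finite differences. The sketch below is a minimal pure-Python check; it covers the $(\v{a}-\v{b})$ edge for vertex $\v{c}$. It compares the per-vertex formula $||\v{n}||(\langle\hat{\v{n}},\v{e}_3\rangle - 1)\,(\text{edge}) \times (\v{e}_3 - \hat{\v{n}})$ with central differences of $(f\circ n)$.

Assumptions:

- The helper names (`normal`, `energy`, `grad_closed_form`, `grad_finite_diff`) are hypothetical and not from any library.
- The step size `h` is an assumed value.
- The triangle is assumed to be non-degenerate ($\v{n} \neq \v{0}$).

```python
import math

# Minimal 3-vector helpers on plain lists (no NumPy assumed).
def sub(u, v):
    return [u[0] - v[0], u[1] - v[1], u[2] - v[2]]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(u):
    return math.sqrt(sum(x * x for x in u))

E3 = [0.0, 0.0, 1.0]

def normal(a, b, c):
    """n(a, b, c) = (b - a) x (c - a)."""
    return cross(sub(b, a), sub(c, a))

def energy(a, b, c):
    """(f o n)(a, b, c) with f(v) = 1/2 (<v, e3> - ||v||)^2."""
    n = normal(a, b, c)
    return 0.5 * (n[2] - norm(n)) ** 2

def grad_closed_form(a, b, c):
    """Rows are d(f o n)/da, d(f o n)/db, d(f o n)/dc, each equal to
    ||n|| (<n_hat, e3> - 1) * (edge x (e3 - n_hat)), edges b-c, c-a, a-b."""
    n = normal(a, b, c)
    length = norm(n)  # assumes a non-degenerate triangle (length > 0)
    n_hat = [x / length for x in n]
    scale = length * (n_hat[2] - 1.0)
    w = sub(E3, n_hat)  # e3 - n_hat
    edges = (sub(b, c), sub(c, a), sub(a, b))
    return [[scale * x for x in cross(edge, w)] for edge in edges]

def grad_finite_diff(a, b, c, h=1e-6):
    """Central differences over all nine vertex coordinates."""
    verts = [list(a), list(b), list(c)]
    grad = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            orig = verts[i][j]
            verts[i][j] = orig + h
            f_plus = energy(*verts)
            verts[i][j] = orig - h
            f_minus = energy(*verts)
            verts[i][j] = orig  # restore the coordinate
            grad[i][j] = (f_plus - f_minus) / (2.0 * h)
    return grad

if __name__ == "__main__":
    a, b, c = [0.3, -0.2, 0.1], [1.1, 0.4, -0.5], [-0.4, 0.9, 0.7]
    g_exact = grad_closed_form(a, b, c)
    g_numeric = grad_finite_diff(a, b, c)
    err = max(abs(g_exact[i][j] - g_numeric[i][j])
              for i in range(3) for j in range(3))
    print("max |closed form - finite difference| =", err)
```

Summing the three rows of `grad_closed_form` gives (numerically) zero. This reflects the translation invariance noted above.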


<hr/>
- We can figure out the behavior of $f$ qualitatively, too.
- It turns out that $||n||$ is equal to twice the area of the triangle. (This is because the area of a triangle is half the magnitude of the cross product of two of its edge vectors.)
- Let $\theta_z$ be the angle between $n$ and the z-axis, and write $x_1, x_2, x_3$ for the vertices $\v{a}, \v{b}, \v{c}$. Then $\langle n, e_3\rangle = ||n||\, ||e_3|| \cos{(\theta_z)} = ||n|| \cos{(\theta_z)}$, so our expression becomes 
  $$f(x_1,x_2,x_3) = \frac{||n||^2}{2}\left(\cos{(\theta_z)}-1\right)^2$$
- So if $A$ is the area of the triangle and $\theta_z$ is the angle between the normal and the z-axis, then $||n|| = 2A$ gives $$f(x_1,x_2,x_3) = 2A^2\cdot(\cos(\theta_z)-1)^2.$$
- So there are two ways to shrink $f$. One is to shrink the area. The other is to rotate the triangle so that it lies parallel to the xy-plane with its normal pointing along $+\v{e}_3$, which makes $\cos(\theta_z) = 1$. Gradient descent will generally produce some combination of these two changes, depending on the triangle's size, shape, and orientation.
- The minimum value of $f$ is zero. It occurs when the area of the triangle shrinks to nothing, or when the normal points along $+\v{e}_3$. A triangle of area $A$ whose normal points along $-\v{e}_3$ gives the largest value, $f = 8A^2$.
- I believe Newton's method is rotating the triangle to make it more perpendicular to the z-axis. It may also be warping the triangle, i.e. moving its vertices to shrink its area. I don't know which strategy the gradient favors; maybe it depends on the particular triangle, e.g. its initial orientation? As a complete guess, I wonder if it does something like translate the vertices in the z-direction toward the geometric center of the triangle.

$$\frac{\partial f}{\partial x_i} = 4A\,\frac{\partial A}{\partial x_i}(\cos{\theta_z}-1)^2 - 4A^2 (\cos{\theta_z}-1)\sin{(\theta_z)}\frac{\partial \theta_z}{\partial x_i}$$
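The geometric form $f = 2A^2(\cos\theta_z - 1)^2$ can be checked numerically. The minimal pure-Python sketch below (helper names are hypothetical) evaluates $f$ two ways: directly from $\v{n}$, and from the area and angle. It then shows the zero of $f$ for a counterclockwise triangle in the xy-plane, and the $8A^2$ value when the winding is flipped. Because $f$ grows with $A^2$, scaling every vertex by $s$ should multiply $f$ by $s^4$.

```python
import math

def sub(u, v):
    return [u[0] - v[0], u[1] - v[1], u[2] - v[2]]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(u):
    return math.sqrt(sum(x * x for x in u))

def f_direct(a, b, c):
    """f(n) = 1/2 (<n, e3> - ||n||)^2 with n = (b - a) x (c - a)."""
    n = cross(sub(b, a), sub(c, a))
    return 0.5 * (n[2] - norm(n)) ** 2

def f_geometric(a, b, c):
    """2 A^2 (cos(theta_z) - 1)^2 with A = ||n|| / 2 and cos(theta_z) = n_3 / ||n||.
    Assumes a non-degenerate triangle (||n|| > 0)."""
    n = cross(sub(b, a), sub(c, a))
    area = 0.5 * norm(n)
    cos_theta = n[2] / norm(n)
    return 2.0 * area ** 2 * (cos_theta - 1.0) ** 2

if __name__ == "__main__":
    a, b, c = [0.3, -0.2, 0.1], [1.1, 0.4, -0.5], [-0.4, 0.9, 0.7]
    print(f_direct(a, b, c), f_geometric(a, b, c))  # agree up to rounding

    # Triangle in the xy-plane, counterclockwise seen from +z: normal is +e3.
    o, x, y = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    print(f_direct(o, x, y))  # 0.0
    # Same triangle with reversed winding: normal is -e3, so f = 8 A^2 = 2.0.
    print(f_direct(o, y, x))  # 2.0

    # Scaling every vertex by s = 2 multiplies the area by 4 and f by 16.
    scaled = [[2.0 * t for t in v] for v in (a, b, c)]
    print(f_direct(*scaled) / f_direct(a, b, c))  # approximately 16.0
```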