<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://markappprogrammer.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://markappprogrammer.github.io/" rel="alternate" type="text/html" /><updated>2025-04-23T06:21:45+00:00</updated><id>https://markappprogrammer.github.io/feed.xml</id><title type="html">Mark’s Blog</title><subtitle>A website to learn cool things.</subtitle><author><name>Mark Agib</name></author><entry><title type="html">Convolutional neural network</title><link href="https://markappprogrammer.github.io/learning/cnns/" rel="alternate" type="text/html" title="Convolutional neural network" /><published>2025-04-21T00:00:00+00:00</published><updated>2025-04-21T00:00:00+00:00</updated><id>https://markappprogrammer.github.io/learning/cnns</id><content type="html" xml:base="https://markappprogrammer.github.io/learning/cnns/"><![CDATA[<p>No not the media source. The deep learning model.</p>

<h1 id="textconvolutional-neural-network-cnn">\(\text{Convolutional neural network (CNN)}\)</h1>

<h2 id="textreview-of-convolutions">\(\text{Review of convolutions}\)</h2>

<h2 id="textconvolution-vs-cross-correlation">\(\text{Convolution vs Cross-Correlation}\)</h2>

<h2 id="textedge-detection-example">\(\text{Edge Detection Example}\)</h2>

<h2 id="textcnn-structures">\(\text{CNN Structures}\)</h2>

<h2 id="textclassic-cnn-networks">\(\text{Classic CNN Networks}\)</h2>

<h2 id="textresidual-connections-and-resnet">\(\text{Residual connections and ResNet}\)</h2>]]></content><author><name>Mark Agib</name></author><category term="Learning" /><category term="math" /><category term="deep learning" /><category term="cnn" /><category term="probability" /><summary type="html"><![CDATA[No not the media source. The deep learning model.]]></summary></entry><entry><title type="html">Convolutions</title><link href="https://markappprogrammer.github.io/learning/conv/" rel="alternate" type="text/html" title="Convolutions" /><published>2025-04-19T00:00:00+00:00</published><updated>2025-04-19T00:00:00+00:00</updated><id>https://markappprogrammer.github.io/learning/conv</id><content type="html" xml:base="https://markappprogrammer.github.io/learning/conv/"><![CDATA[<p>You’ve probably seen a convolution before, whether you know it or not. And they’re super useful in deep learning.</p>

<h1 id="textconvolutiast-textns">\(\text{Convoluti}\!\ast \!\text{ns}\)</h1>

<p>Let’s say we have two polynomials \(P(x)\) and \(Q(x)\). What is their product? You may recall the FOIL or box method from grade school. Let’s define these polynomials as</p>

\[P(x) = \sum_{i=0}^{m}{a_{i}x^{i}}, \:\:\: \text{and}\:\:\: Q(x) = \sum_{j=0}^{n}{b_{j}x^{j}}\]

<p>Now these definitions may look scary, but I encourage you to write down a polynomial of some small degree, say 2, and expand the summation from 0 to 2 to prove the above to yourself. Let \(R(x) = P(x)Q(x)\). What is the summation for \(R(x)\)? It is:</p>

\[R(x) = \sum_{k=0}^{m+n}{c_kx^k}, \:\:\: \text{where } c_k= \sum_{i=0}^{k}{a_ib_{k-i}}\]

<p>And so it turns out that \(c_k\) is a discrete convolution of the coefficients. To see this better, we can extend the bounds of the summation to infinity (we can do this because all the terms outside the original range are zero):</p>

\[c_k= \sum_{i=-\infty}^{\infty}{a_ib_{k-i}}\]

<p>which is exactly the formula for a discrete convolution. You might see it written as</p>

\[(a \ast b)[k] := \sum_{i=-\infty}^{\infty} a[i]b[k-i]\]

<p>Some quick notes on notation: the \(\ast\) symbol means convolution. The number/variable inside the brackets is the index for the corresponding sequence (<em>Ex: \(a_i= a[i]\)</em>). And \(:=\) is read “defined as”.</p>


<p>The formula for a convolution of continuous functions is the following:</p>

\[(f \ast g)(s) := \int_{-\infty}^{\infty} f(x) g(s - x) dx\]

<p>where \(f(x)\) is the kernel and \(g(x)\) is the function to be convolved. Again, this looks very similar to the discrete convolution; we switch to integration because we now need to catch all the values between the integers, which a summation can’t capture.</p>
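<p>Here is a rough numerical sketch of the continuous case: sample both functions on a grid, then scale a discrete convolution by the grid spacing to approximate the integral (the box function and Gaussian below are just illustrative choices):</p>

```python
import numpy as np

dx = 0.01
x = np.arange(-5, 5, dx)

f = np.where(np.abs(x) < 1, 1.0, 0.0)       # box function (kernel)
g = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard Gaussian

# Riemann-sum approximation of (f * g)(s) = ∫ f(x) g(s - x) dx
conv = np.convolve(f, g, mode="same") * dx
print(conv.max())  # ≈ 0.68: the Gaussian smooths out the box's sharp edges
```

<p>The peak of the smoothed box is the Gaussian’s mass on \([-1, 1]\), about \(0.683\), rather than the box’s original height of \(1\).</p>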

<h2 id="textapplications">\(\text{Applications}\)</h2>

<p>You can’t talk about convolutions without talking about signal processing, as it’s one of convolution’s best-known applications (at least before CNNs became popular).</p>

<h3 id="textsignal-processing">\(\text{Signal Processing}\)</h3>
<p>Before I start, I would just like to say I’m not too familiar with signal processing, so forgive me for any mistakes I make. According to Wikipedia, signal processing is the analysis of signals like radio waves, images, sound, and more.</p>

<p>According to Steven W. Smith in his <em>The Scientist and Engineer’s Guide to Digital Signal Processing</em>, convolution is the “single most important technique in Digital Signal Processing” (see, I told you they were important). The first two algorithms you learn in Digital Signal Processing (DSP) involving convolution are the input side and output side algorithms.</p>

<p>Essentially, the input side algorithm convolves the input with the impulse response to produce the output signal, or \(x[n] \ast h[n] = y[n]\). It does this by decomposing the input signal into scaled, shifted impulses, convolving each with the impulse response, and then synthesizing (summing) all the resulting components into one output signal.</p>

<p>The output side algorithm is the same thing except we go from the output’s perspective (it looks at which inputs are needed to produce a particular output sample). If you didn’t understand that, don’t worry. If you would like to learn more about convolutions and DSP, check out Smith’s book.</p>
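<p>Here is a bare-bones sketch of the input side idea in Python (the signal and impulse response values are made up for illustration):</p>

```python
def input_side_convolve(x, h):
    """Input side convolution: each input sample launches a scaled,
    shifted copy of the impulse response; the copies are summed."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):        # decompose the input into impulses
        for j, hj in enumerate(h):    # each impulse contributes xi * h, shifted by i
            y[i + j] += xi * hj       # synthesize the contributions
    return y

x = [1, 2, 3]    # input signal (illustrative)
h = [1, 0, -1]   # impulse response (illustrative)
print(input_side_convolve(x, h))  # [1, 2, 2, -2, -3]
```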

<h3 id="textprobability">\(\text{Probability}\)</h3>

<p>Let’s say we have two dice. Each die has its own probability distribution. If we want to find the probability that the two dice sum to some number, we need to be able to combine the <strong>probability distributions</strong>. We do this through a convolution (in this case, the new combined distribution shows how likely each sum is).</p>

<p>I will give an overview of what this would look like for discrete random variables (<em>SideQuest: Try to find out how convolutions work for continuous random variables</em>), but first some terminology (<em>this is only for review purposes</em>). We say \(X\) is a discrete random variable if its values can be counted; otherwise, \(X\) is a continuous random variable. The range or support of \(X\) (denoted \(R_X\)) contains all the possible values of \(X\), i.e. \(R_X = \{x_1, x_2, x_3, ...\}\). The probability mass function (PMF) of \(X\) is the function \(p_X(x_k) = p(X=x_k), \:\:\:\text{for} \:\:\: k =1,2,3,...\). And for all values of \(x\):</p>

\[\begin{equation}
             p_X(x) = \left\{
              \begin{array}{l l}
                p(X=x) &amp; \quad \text{if } x \in R_X\\
                0 &amp; \quad \text{otherwise}
              \end{array} \right.
            \end{equation}\]

<p>Let \(X\) be a discrete random variable with range \(R_X\) and PMF \(p_X(x)\). Let \(Y\) be another discrete random variable with range \(R_Y\) and PMF \(p_Y(y)\).</p>

<p>The PMF of \(Z\) such that \(Z = X + Y\) is</p>

\[\begin{align*}
p_Z(z) = \sum_{y \in R_Y}{p_Y(y)p_X(z-y)} \\
p_Z(z) = \sum_{x \in R_X}{p_X(x)p_Y(z-x)}
\end{align*}\]

<p>They both give us the same solution since \(Y+X = X+Y\). In the first equation we loop over the possible values of \(Y\) and find the probability that \(X = z-y\). The second one is the same but with the roles of the variables swapped.</p>
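<p>The two-dice example can be sketched directly from the first equation (fair six-sided dice assumed for illustration):</p>

```python
from fractions import Fraction

# PMF of one fair six-sided die: each face 1..6 has probability 1/6
p_die = {k: Fraction(1, 6) for k in range(1, 7)}

# p_Z(z) = sum over y of p_Y(y) * p_X(z - y): loop over both supports
p_sum = {}
for y, py in p_die.items():
    for x, px in p_die.items():
        p_sum[x + y] = p_sum.get(x + y, 0) + py * px

print(p_sum[7])             # 1/6 — the most likely sum
print(p_sum[2])             # 1/36
print(sum(p_sum.values()))  # 1 — the result is a valid PMF
```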

<p>Now, about the sidequest I mentioned before: make sure to take a look at probability density functions.</p>
<h2 id="textcnns">\(\text{CNNs}\)</h2>

<p>CNNs are where most people (nowadays) encounter convolutions, and there’s a lot to them, so I’ll be covering them in a part 2.</p>

<h2 id="textreferences">\(\text{References}\)</h2>
<ul>
  <li>Overlay image courtesy <a href="https://learnopencv.com/understanding-convolutional-neural-networks-cnn/">LearnOpenCV</a></li>
  <li>Steven W. Smith, <em>The Scientist and Engineer’s Guide to Digital Signal Processing</em>, ISBN 978-0-9660176-3-2</li>
  <li>Taboga, Marco (2021). “Convolutions”, Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. <a href="https://www.statlect.com/glossary/convolutions.">Website Link</a></li>
</ul>]]></content><author><name>Mark Agib</name></author><category term="Learning" /><category term="math" /><category term="application" /><category term="probability" /><summary type="html"><![CDATA[You’ve probably seen a convolution before, whether you know it or not. And they’re super useful in deep learning.]]></summary></entry><entry><title type="html">All about Integration</title><link href="https://markappprogrammer.github.io/learning/all-about-int/" rel="alternate" type="text/html" title="All about Integration" /><published>2025-04-18T00:00:00+00:00</published><updated>2025-04-18T00:00:00+00:00</updated><id>https://markappprogrammer.github.io/learning/all-about-int</id><content type="html" xml:base="https://markappprogrammer.github.io/learning/all-about-int/"><![CDATA[<p>Integration somehow finds its way everywhere, even after calculus classes. Deep learning, physics, you name it, we integrate it.</p>

<h1 id="all-about-integration">All About Integration</h1>

<p>While I assume most of you know what integration is, here’s a brief explanation/review. If you want a full explanation of integration, I would recommend 3Blue1Brown’s excellent playlist, the <a href="https://youtube.com/playlist?list=PLZHQObOWTQDMsr9K-rj53DwVRMYO3t5Yr&amp;si=YGEAEDvr00VFtibe">Essence of calculus</a>. 
For those who want a quick understanding, integration (at least for what we’re doing) is the “area under the curve”. So, integrating some function would give us the area under that function’s curve.</p>

<p>Before continuing, you should have seen at least basic integrals and derivatives and some of their “rules”. Btw, when I introduce ideas in my blog, I’m not going to give the exact assumptions we make and all the definitions unless necessary (like if we’re doing real analysis). By the second part of the Fundamental Theorem of Calculus (aka the Newton–Leibniz theorem), let us recall that</p>

\[\int_{a}^{b} f(x) dx = \text{F}(b) - \text{F}(a)\]

<p>where \(\text{F}(x)\) is the antiderivative of \(f(x)\). All it’s telling us is that the integral over some interval is just the difference of the antiderivative evaluated at the bounds. An indefinite integral is simply a definite integral (like the one we showed above) but with no bounds. Ex:</p>

\[\int f(x) dx = \text{F}(x) + \text{C}\]

<p>The only real difference is that when we have no bounds, we have to add a “ \(+ \text{C}\) “ after evaluating the integral, called the constant of integration. Why, you may ask? Because the derivative of any constant is always zero and so we account for the fact that when we undo the derivative of some function, there may have been some constant in that function.</p>
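<p>As a quick numerical sanity check of the theorem, we can compare \(\text{F}(b) - \text{F}(a)\) against a Riemann sum for the “area under the curve” (the choice \(f(x) = x^2\) on \([0, 3]\) is just for illustration):</p>

```python
# f(x) = x^2 has antiderivative F(x) = x^3 / 3
f = lambda t: t**2
F = lambda t: t**3 / 3

a, b, n = 0.0, 3.0, 100_000
dx = (b - a) / n

# midpoint Riemann sum: total area of n thin rectangles
riemann = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

print(F(b) - F(a))  # 9.0
print(riemann)      # very close to 9.0
```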

<h2 id="taking-antiderivatives">Taking Antiderivatives</h2>

<p>For most calculus students, the transition from taking derivatives to undoing them can be hard. The best way I like to imagine it is undoing the rules we learned for differentiating. For example, the power rule, which states:</p>

\[\frac{d}{dx}x^n = n x ^{n - 1}\]

<p>Undoing the power rule is as simple as going back to the original function, or in this case \(x^n\). Essentially, how do we go from \(n x^{n-1}\) to \(x^n\)? We multiply by \(\frac{1}{n}\) to get rid of the \(n\) and add \(1\) to the exponent. So in general, for \(n \neq -1\):</p>

\[\int x^n dx = \frac{1}{n+1} x^{n+1} + C\]

<p>Each derivative rule has a subsequent undoing rule. U-substitution can be used to undo chain rules. Integration by parts can be used to undo the product rule and handles many other integrals. There are many ways we can take integrals, and each “technique” we learn is another tool in our toolbox for taking each one apart.</p>
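<p>If you want to check your undoing, the SymPy library computes antiderivatives symbolically. Here is a quick sketch covering the power rule, a chain rule undo (u-substitution), and a product rule undo (integration by parts); note that SymPy omits the constant of integration:</p>

```python
import sympy as sp

x = sp.symbols('x')

# power rule undone: ∫ x^3 dx = x^4 / 4
print(sp.integrate(x**3, x))              # x**4/4

# chain rule undone (u-substitution): ∫ 2x cos(x^2) dx = sin(x^2)
print(sp.integrate(2*x*sp.cos(x**2), x))  # sin(x**2)

# product rule undone (integration by parts): ∫ x e^x dx = (x - 1) e^x
print(sp.integrate(x*sp.exp(x), x))
```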

<p><strong>Side Note:</strong> Many people often forget what \(dx\) means in both the integral and the derivative. You can think of it as infinitesimally small changes in x, which means \(\frac{dy}{dx}\) is almost \(\frac{\text{infinitesimally small changes in y}}{\text{infinitesimally small changes in x}}\).</p>

<h2 id="improper-integrals">Improper Integrals</h2>

<p>An improper integral is one where the integrand is undefined at one or more points in the interval, or where one or both of the limits of integration are infinite. So there are two main types of improper integrals: ones with infinite bounds and ones with discontinuities. With infinities, we take a limit and “approach” infinity (whether positive or negative) when evaluating. Ex:</p>

\[\int_{a}^{\infty} f(x) dx = \lim_{b \to \infty} \int_{a}^{b} f(x) dx\]

<p>We do something similar when one of the bounds is \(-\infty\). And if both bounds are infinite:</p>

\[\int_{-\infty}^{\infty} f(x) dx = \lim_{a \to -\infty} \int_{a}^{c} f(x) dx + \lim_{b \to \infty} \int_{c}^{b} f(x) dx\]

<p>where \(c\) is any chosen number. For discontinuities at the bounds, it’s pretty similar. If there is a discontinuity within the interval of integration (at \(e\), for example, where \(a &lt; e &lt; b\)), we simply take the improper integral from \(a\) to \(e\) and then from \(e\) to \(b\). It’s almost like the above integral, but we choose \(c\).</p>
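<p>We can watch one of these limits converge numerically; here is a sketch with the illustrative choice \(f(x) = 1/x^2\) on \([1, \infty)\), whose antiderivative is \(\text{F}(x) = -1/x\):</p>

```python
# ∫_1^∞ x^(-2) dx = lim_{b→∞} (F(b) - F(1)), with F(x) = -1/x
F = lambda t: -1.0 / t

for b in [10, 1_000, 1_000_000]:
    print(b, F(b) - F(1))  # 0.9, then 0.999, then 0.999999 — approaching 1
```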

<h2 id="cool-tricks">Cool Tricks</h2>

<p>The more time you spend integrating, the more shortcuts you will find. Yet beware: too many shortcuts can lead to more mistakes. For example, one cool integration trick I learned from Griffiths’s famous quantum mechanics textbook is a shortcut for integrating over symmetric intervals. Say we have an integral over the interval \([-a, a]\). If the function we are integrating is odd (that is, \(f(-x) = -f(x)\)), we have:</p>

\[\int_{-a}^{a} f(x) dx = 0\]

<p>Yes, I typed that right, the answer is \(0\). Now before you go and Google why, think about what an odd function looks like (specifically, its symmetry about the origin) and why the interval is symmetric. For an even function (that is, \(f(-x) = f(x)\)) we have:</p>

\[\int_{-a}^{a} f(x) dx = 2\int_{0}^{a} f(x) dx\]
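<p>Both shortcuts are easy to verify numerically with a Riemann sum (the functions \(x^3\) and \(x^2\) and the interval \([-2, 2]\) are illustrative choices):</p>

```python
def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum approximation of the integral of f on [a, b]."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

odd = lambda t: t**3    # odd: f(-t) = -f(t)
even = lambda t: t**2   # even: f(-t) = f(t)

print(integrate(odd, -2, 2))       # essentially 0
print(integrate(even, -2, 2))      # about 16/3
print(2 * integrate(even, 0, 2))   # the same value, from half the interval
```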

<p>I leave you with those two thought problems. And as they say at integration bees, “Integrate!”</p>

<p><em>Extra Credit: explain Tangent half-angle substitution in the comments</em></p>

<h2 id="references">References</h2>
<ul>
  <li>Overlay image from 3blue1Brown [<a href="https://youtu.be/rfG8ce4nNh0?si=aRLJ6cdK4dISm0V0">video</a>]</li>
  <li>James Stewart, Daniel Clegg, and Saleem Watson, <em>Single Variable Calculus: Early Transcendentals</em></li>
</ul>]]></content><author><name>Mark Agib</name></author><category term="Learning" /><category term="calculus" /><summary type="html"><![CDATA[Integration somehow finds its way everywhere, even after calculus classes. Deep learning, physics, you name it, we integrate it.]]></summary></entry></feed>