Project 2: Fun with Filters and Frequencies!
Overview
In this project, I explored image filtering, edge detection, and multi-resolution blending. The assignment was divided into two main parts:
- Part 1: Fun with Filters — convolution, finite differences, and derivative of Gaussian.
- Part 2: Fun with Frequencies — sharpening, hybrid images, Gaussian/Laplacian stacks, and multi-resolution blending (the famous Oraple!).
Part 1: Filters and Edges
Part 1.1: Convolutions from Scratch!
In this section, I implemented convolution from scratch in two ways, first with four nested loops and then with two nested loops, and compared both against the built-in scipy.signal.convolve2d.
All implementations support zero-padding to handle boundaries. The four-loop version runs in O(H * W * KH * KW) time (Height * Width * Kernel Height * Kernel Width). The two-loop version has the same asymptotic cost, but its inner two loops are replaced by a vectorized NumPy multiply-and-sum over each window, which makes it much faster in practice.
To test, I created a 3×3 box filter and applied it to a grayscale image of LeBron and Draymond Green.
I also convolved the image with finite difference operators
Dx and Dy.
Below is my main convolution function, which calls either the two-loop or four-loop implementation:
import numpy as np

def convolve_2D(image, kernel, pad_width=0, mul_type="two_loop"):
    image = np.array(image, dtype=np.float32)
    kernel = np.array(kernel, dtype=np.float32)

    # * Here is the padding part of the code: loops that insert 0s according to pad_width.
    # * numpy doesn't allow in-place size changes, so a new array has to be created.
    def _pad_image(image, pad_width):
        padded_image = np.full((image.shape[0] + 2 * pad_width, image.shape[1] + 2 * pad_width), 0, dtype=np.float32)
        for row in range(image.shape[0]):
            for col in range(image.shape[1]):
                padded_image[row + pad_width][col + pad_width] = image[row][col]
        return padded_image

    # * _pad_image is a helper we only pull out of the bag if padding is requested
    if pad_width:
        image = _pad_image(image, pad_width)

    # * After creating the (possibly padded) image, create the holder for the result.
    # * The formula for the output shape is padded_image_dim - kernel_dim + 1.
    result = np.full((image.shape[0] - kernel.shape[0] + 1, image.shape[1] - kernel.shape[1] + 1), 0, dtype=np.float32)

    # * Since this is a convolution (not a correlation), flip the kernel
    kernel = np.flip(kernel)

    # * Four for-loop implementation: slide the flipped kernel over every output
    # * pixel and accumulate the products one element at a time
    def element_wise_dot_four(image, kernel):
        for out_row in range(image.shape[0] - kernel.shape[0] + 1):
            for out_col in range(image.shape[1] - kernel.shape[1] + 1):
                addition_holder = 0
                for row in range(kernel.shape[0]):
                    for col in range(kernel.shape[1]):
                        addition_holder += kernel[row][col] * image[row + out_row][col + out_col]
                result[out_row][out_col] = addition_holder
        return result

    # * Two for-loop implementation: the inner two loops become a single
    # * vectorized element-wise multiply-and-sum over each window
    def element_wise_dot_two(image, kernel):
        for row in range(result.shape[0]):
            for col in range(result.shape[1]):
                window = image[row:row + kernel.shape[0], col:col + kernel.shape[1]]
                result[row][col] = np.sum(window * kernel)
        return result

    if mul_type == "two_loop":
        result = element_wise_dot_two(image, kernel)
    else:
        result = element_wise_dot_four(image, kernel)
    return result
Here are the kernels I used and how I applied them:
box_filter = 1/9 * np.full((3,3), 1, dtype=np.float32)
my_convolved_image = convolve_2D(gray, box_filter, pad_width=0)
builtin_convolved_image = scipy.signal.convolve2d(gray, box_filter, mode="valid")
# * Now I'm going to use these as kernels instead
dx = np.array([[1, 0, -1]])
dy = np.array([[1],[0],[-1]])
my_dx_image = convolve_2D(gray, dx, pad_width=0)
my_dy_image = convolve_2D(gray, dy, pad_width=0)
builtin_dx_image = scipy.signal.convolve2d(gray, dx, mode="valid")
builtin_dy_image = scipy.signal.convolve2d(gray, dy, mode="valid")
Below I show some results and comparisons:
The results from my implementation matched those from scipy.signal.convolve2d.
Runtime was significantly faster using the built-in function, but the custom code
helped build intuition for how convolutions operate at the pixel level.
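As a sanity check, the two outputs can be compared numerically. The snippet below is a minimal sketch of that comparison, reusing gray and box_filter from above; the tolerance and the timing approach are just illustrative.

import time
import numpy as np
import scipy.signal

# Confirm my valid-mode convolution matches scipy's on the box filter
mine = convolve_2D(gray, box_filter, pad_width=0)
ref = scipy.signal.convolve2d(gray, box_filter, mode="valid")
print(np.allclose(mine, ref, atol=1e-4))  # expected: True

# Rough runtime comparison between the two-loop version and the built-in
start = time.time(); convolve_2D(gray, box_filter); print("mine:", time.time() - start)
start = time.time(); scipy.signal.convolve2d(gray, box_filter, mode="valid"); print("scipy:", time.time() - start)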
Part 1.2: Finite Difference Operator
In this section, I applied the finite difference operators
Dx and Dy to the
Cameraman image to compute horizontal and vertical image gradients.
These operators highlight edges by capturing intensity changes along the x and y directions.
I then combined them to calculate the gradient magnitude, which emphasizes overall edge strength.
Finally, I binarized the gradient magnitude using a threshold value (0.34) to produce a clear edge map.
This demonstrates how simple derivative filters can be used for edge detection.
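To make the gradient-magnitude step concrete, here is a minimal sketch of the computation. It assumes the Cameraman image is loaded into gray as floats in [0, 1], and it uses scipy's "same" mode so the two gradient images have matching shapes.

import numpy as np
import scipy.signal

# Finite difference kernels (same as Part 1.1)
dx = np.array([[1, 0, -1]], dtype=np.float32)
dy = np.array([[1], [0], [-1]], dtype=np.float32)

# mode="same" keeps both gradient images the same shape so they can be combined
grad_x = scipy.signal.convolve2d(gray, dx, mode="same", boundary="symm")
grad_y = scipy.signal.convolve2d(gray, dy, mode="same", boundary="symm")

# Gradient magnitude emphasizes overall edge strength
grad_mag = np.sqrt(grad_x ** 2 + grad_y ** 2)

# Binarize with the chosen threshold (0.34) to get a clean edge map
edge_map = (grad_mag > 0.34).astype(np.float32)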
Part 1.3: Derivative of Gaussian (DoG) Filter
In this part, I reduced noise by first smoothing the image with a Gaussian filter before
applying the finite difference operators. This produced cleaner edge maps compared to
the raw derivatives from Part 1.2. I also constructed derivative of Gaussian (DoG) filters
by convolving the Gaussian kernel with Dx and Dy.
Using these DoG filters directly gives the same result as applying Gaussian smoothing first and then taking derivatives, since convolution is associative. Compare the DoG Filter Binarized Gradient Magnitude (threshold = 0.14)
to the one above and notice that the edges are cleaner and much less noisy.
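Here is a rough sketch of that equivalence check; the kernel size and σ are illustrative, not necessarily the values I used.

import numpy as np
import scipy.signal
import cv2

# Build a 2D Gaussian kernel as the outer product of two 1D Gaussians
g1d = cv2.getGaussianKernel(11, 2)
gauss = g1d @ g1d.T

dx = np.array([[1, 0, -1]], dtype=np.float32)

# Option 1: blur first, then differentiate
blurred = scipy.signal.convolve2d(gray, gauss, mode="same", boundary="symm")
grad_x_a = scipy.signal.convolve2d(blurred, dx, mode="same", boundary="symm")

# Option 2: build the DoG filter once, then apply it in a single convolution
dog_x = scipy.signal.convolve2d(gauss, dx, mode="same")
grad_x_b = scipy.signal.convolve2d(gray, dog_x, mode="same", boundary="symm")

# Up to boundary effects, the two results match (convolution is associative)
print(np.max(np.abs(grad_x_a - grad_x_b)))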
Part 2: Fun with Frequencies
Part 2.1: Image Sharpening
In this part, I implemented image sharpening using the unsharp masking technique. First, I applied a Gaussian filter to the Taj Mahal image to extract its low-frequency components (the blurred version). Subtracting this blurred image from the original left me with the high-frequency details such as edges and fine textures. By adding these high-frequency details back to the original image, I produced a sharpened result that enhances edges and makes the image appear crisper. This demonstrates how unsharp masking works by amplifying high-frequency content while preserving the overall structure of the image.
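As a rough sketch of the unsharp-masking step (assuming img is an RGB image with values in [0, 1]; the kernel size, σ, and the sharpening amount alpha here are illustrative):

import numpy as np
import scipy.signal
import cv2

# Gaussian low-pass filter (kernel size and sigma are illustrative)
g1d = cv2.getGaussianKernel(15, 3)
gauss = g1d @ g1d.T

def unsharp_mask(channel, alpha=1.0):
    # Low frequencies = blurred image; high frequencies = original minus blurred
    low = scipy.signal.convolve2d(channel, gauss, mode="same", boundary="symm")
    high = channel - low
    # Add the high-frequency detail back, scaled by alpha
    return np.clip(channel + alpha * high, 0, 1)

# Apply per channel for a color image in [0, 1]
sharpened = np.dstack([unsharp_mask(img[..., c]) for c in range(3)])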
I applied the same procedure to an image of a house. You can see that it does look sharper, but with a slightly dark, glazed appearance.
I also applied the same sharpening to an already high-resolution image of Jalen Brunson. You can see that the result is not nearly as good as the original.
Part 2.2: Hybrid Images
In this part, I created three hybrid images: the classic Derek + Nutmeg pair (as required) and two
additional hybrids of my own choosing. For the Derek + Nutmeg hybrid, I carefully walked through the
entire pipeline: starting with the original, aligned input images, computing and visualizing
their Fourier transforms, generating the low-pass filtered result for one image, and extracting
the high-pass component of the other. A key step here was the cutoff frequency choice, which I
set by selecting the Gaussian blur parameter (σ = 6 with a 31×31 kernel). This parameter
determined what counted as “low-frequency” structure versus “high-frequency” detail. Choosing too small of a
cutoff left both images overly sharp and hard to separate, while too large of a cutoff lost important
structure. With this setting, the hybrid image clearly looks like Derek up close (high frequencies dominate)
but transitions to Nutmeg from afar (low frequencies dominate). I also included the final hybrid image
and its frequency spectrum visualization to confirm that the low- and high-frequency information were
separated correctly.
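A minimal sketch of the hybrid construction, assuming the two input images are already aligned, grayscale, and scaled to [0, 1] (the helper and variable names here are just for illustration; the defaults match the σ = 6, 31×31 setting described above):

import numpy as np
import scipy.signal
import cv2

def gaussian_blur(img, ksize, sigma):
    g1d = cv2.getGaussianKernel(ksize, sigma)
    gauss = g1d @ g1d.T
    return scipy.signal.convolve2d(img, gauss, mode="same", boundary="symm")

def hybrid(im_low, im_high, ksize=31, sigma=6):
    # Keep only low frequencies from one image and high frequencies from the other
    low = gaussian_blur(im_low, ksize, sigma)
    high = im_high - gaussian_blur(im_high, ksize, sigma)
    return np.clip(low + high, 0, 1)

# e.g. hybrid(nutmeg_aligned, derek_aligned): the high-pass image dominates up
# close, while the low-pass image dominates from a distance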
For my two additional hybrids, I presented the original input images alongside the final blended
results. These demonstrate how the same technique generalizes beyond the canonical example, and highlight how
alignment, filter size, and cutoff frequency choices influence the outcome of the hybrid. Across all three
examples, this process shows how Gaussian blurs and frequency-domain reasoning can be combined to generate
perceptually interesting images that depend on viewing distance.
Below, I repeat the hybrid image process using a picture of LeBron and a goat:
Below, I repeat the hybrid image process using a picture of a face and a skull:
Now, here's a comprehensive figure including the alignment and all of the intermediate steps. I believe the skull has an odd color because normalization is applied before displaying the hybrid, so everything looks fine in the final result. It's a pretty cool look, though; maybe I should have kept it un-normalized.
Part 2.3: Gaussian and Laplacian Stacks
In this part, I recreated the famous "Oraple" figure by blending an apple and an orange using
Gaussian and Laplacian stacks. To do this, I reused and extended some of my code from Project 1,
where I had already implemented Gaussian pyramids and convolution from scratch. I modified the
functions so they would support RGB images and increased the kernel size to produce smoother
results suitable for multi-resolution blending. The process involved building Gaussian stacks for
the input images and a binary mask, constructing Laplacian stacks by subtracting adjacent Gaussian
levels, and then blending contributions from both images level by level. Finally, I collapsed the
blended Laplacian stack to form the hybrid "Oraple."
Using these functions, I was able to recreate Figure 3.42 in the Szeliski book.
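Here is a condensed sketch of the stack-based blending logic, assuming a caller-supplied Gaussian-blur helper (blur) and single-channel float images; as noted above, my actual implementation also supports RGB.

import numpy as np

def gaussian_stack(img, levels, blur):
    # Each level is blurred more, but never downsampled (stack, not pyramid)
    stack = [img]
    for _ in range(levels - 1):
        stack.append(blur(stack[-1]))
    return stack

def laplacian_stack(g_stack):
    # Band-pass levels are differences of adjacent Gaussian levels;
    # the last level is the residual low-pass image
    return [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)] + [g_stack[-1]]

def blend(imA, imB, mask, levels, blur):
    lA = laplacian_stack(gaussian_stack(imA, levels, blur))
    lB = laplacian_stack(gaussian_stack(imB, levels, blur))
    gM = gaussian_stack(mask, levels, blur)
    # Blend each band with the progressively blurred mask, then collapse by summing
    blended = [m * a + (1 - m) * b for a, b, m in zip(lA, lB, gM)]
    return np.clip(sum(blended), 0, 1)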
Part 2.4: Multiresolution Blending (The Oraple!)
In this final part, I extended the multi-resolution blending technique from Part 2.3 by experimenting with different mask shapes. Instead of only using a vertical split mask (square), I also created a circular mask to blend images more naturally around a central region. This allowed me to produce more interesting composite images where the transition between the two inputs is smoother and less obvious, which you can see for the eye-hand image. The implementation reused my Gaussian and Laplacian stack code from earlier parts, but generalized it so that any mask shape can be applied for blending. This demonstrates the flexibility of stack-based blending beyond just the traditional "Oraple" example. However, since we used a stack instead of a pyramid, there's no blending at L0, which makes those details appear somewhat stark, as seen below.
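For reference, a binary circular mask like the one behind the eye-hand blend can be generated with something like the following sketch (the center and radius parameters are illustrative); the Gaussian stack of the mask is what softens its edge during blending.

import numpy as np

def circular_mask(h, w, center, radius):
    # 1 inside the circle, 0 outside
    ys, xs = np.mgrid[:h, :w]
    dist = np.sqrt((ys - center[0]) ** 2 + (xs - center[1]) ** 2)
    return (dist <= radius).astype(np.float32)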
Here's an example of me using my circular mask for a cool hand-eye.
I also wanted to try using the square mask to combine Jalen Brunson and LeBron.
What I Learned
The most important thing I learned was how frequency analysis reveals structure that's invisible in raw pixel space. From edge detection to hybrid images and blending, frequency decomposition makes image manipulation both efficient and perceptually convincing. I also found the role of the lowest level of detail in the multiresolution blending section really interesting. Unfortunately, this project took quite a long time, so I didn't have time to attempt the bells and whistles.