Technical Notes from Ceshine Lee

Practical deep dives into machine learning and data science. Expect code, critical analysis, and real-world tips. No fluff.

[Notes] MaxViT: Multi-Axis Vision Transformer

MaxViT: Multi-Axis Vision Transformer [1] is a 2022 paper jointly produced by Google Research and the University of Texas at Austin. The paper proposes a new attention mechanism, named multi-axis attention, which comprises a blocked local attention module and a dilated global attention module. In addition, the paper introduces the MaxViT architecture, which combines multi-axis attention with convolutions and is highly effective on ImageNet benchmarks and downstream tasks. Figure: Multi-Axis Attention (source: [2]). ...
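The split between blocked local and dilated global attention can be illustrated with plain array reshapes (a minimal numpy sketch; the 4x4 map and window size 2 are toy values for illustration, not the paper's settings):

```python
import numpy as np

# Toy 4x4 single-channel feature map; each value encodes its position.
H = W = 4
x = np.arange(H * W).reshape(H, W)
P = 2  # block / grid size

# Block (local) attention: partition into non-overlapping P x P windows.
# Each attention group holds spatially contiguous tokens.
blocks = x.reshape(H // P, P, W // P, P).transpose(0, 2, 1, 3).reshape(-1, P * P)

# Grid (global) attention: fix a P x P grid and gather tokens at stride H // P.
# Each attention group holds dilated tokens that span the whole feature map.
grids = x.reshape(P, H // P, P, W // P).transpose(1, 3, 0, 2).reshape(-1, P * P)

print(blocks[0])  # first local window:  [0 1 4 5]
print(grids[0])   # first global group:  [0 2 8 10]
```

Both partitions produce groups of the same size, so the two attention modules have the same cost; only the gathering pattern (contiguous vs. strided) differs.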

July 16, 2023 · Ceshine Lee

[Notes] PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions

Introduction Recall that a one-dimensional Taylor series is an expansion of a real function $f(x)$ about a point $x=a$ [2]: $f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \cdots$ We can approximate the cross-entropy loss using the Taylor series (a.k.a. Taylor expansion) with $a=1$: $f(x) = -\log(x) = 0 + (-1)(1)^{-1}(x-1) + (-1)^2(1)^{-2}\frac{(x-1)^2}{2} + \cdots = \sum_{j=1}^{\infty}(-1)^j\frac{(j-1)!}{j!}(x-1)^j = \sum_{j=1}^{\infty}\frac{(1-x)^j}{j}$ We can get the expansion of the focal loss simply by multiplying the cross-entropy loss series by $(1-x)^\gamma$: ...
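The series above can be checked numerically (a quick sketch; the $\epsilon$ value below is illustrative, and "Poly-1" refers to the variant that perturbs only the leading polynomial coefficient):

```python
import math

def ce_series(x, n_terms):
    # Taylor expansion of the cross-entropy loss -log(x) around x = 1:
    # -log(x) = sum_{j >= 1} (1 - x)^j / j
    return sum((1 - x) ** j / j for j in range(1, n_terms + 1))

x = 0.8  # predicted probability of the target class
exact = -math.log(x)
approx = ce_series(x, 50)
print(exact, approx)  # the partial sum converges to -log(x)

# Poly-1 perturbs the leading coefficient by eps, adding eps * (1 - x);
# eps = 2.0 is an arbitrary value for this sketch.
eps = 2.0
poly1 = exact + eps * (1 - x)
```

Truncating the series or re-weighting its leading terms is exactly the knob the paper turns to interpolate between cross-entropy-like and focal-like behavior.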

May 15, 2022 · Ceshine Lee

[Notes] Understanding Visual Attention Network

Introduction At the start of 2022, we got a new pure-convolution architecture (ConvNeXt) [1] that challenges transformer architectures as a generic vision backbone. The new Visual Attention Network (VAN) [2] is yet another simple, pure-convolution architecture whose creators claim SOTA results with fewer parameters. Figure source: [2]. What ConvNeXt tries to achieve is modernizing a standard ConvNet (ResNet) without introducing any attention-based modules. VAN still has attention-based modules, but the attention weights are obtained from a large-kernel convolution instead of a self-attention block. To overcome the high computational cost of a large-kernel convolution, it is decomposed into three components: a spatial local convolution (depth-wise convolution), a spatial long-range convolution (depth-wise dilated convolution), and a channel convolution (1x1 point-wise convolution). ...
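A back-of-the-envelope parameter count shows why the decomposition pays off (a sketch assuming the commonly cited VAN configuration: a 21x21 target kernel decomposed with dilation 3; C = 64 is an arbitrary channel count chosen for illustration):

```python
import math

C = 64  # number of channels (arbitrary for this sketch)
K = 21  # target large kernel size
d = 3   # dilation of the long-range branch

single = C * K * K                      # one big K x K depth-wise conv
local = C * (2 * d - 1) ** 2            # 5x5 depth-wise conv
long_range = C * math.ceil(K / d) ** 2  # 7x7 depth-wise dilated conv
channel = C * C                         # 1x1 point-wise conv
decomposed = local + long_range + channel

print(single, decomposed)  # 28224 vs 8832
```

The gap widens as K or C grows, which is the point of decomposing the large kernel rather than applying it directly.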

March 14, 2022 · Ceshine Lee

[Notes] Understanding ConvNeXt

Introduction Hierarchical Transformers (e.g., Swin Transformers [1]) have made Transformers highly competitive as a generic vision backbone and in a wide variety of vision tasks. A new paper from Facebook AI Research, "A ConvNet for the 2020s" [2], gradually and systematically "modernizes" a standard ResNet [3] toward the design of a vision Transformer. The result is a family of pure ConvNet models dubbed ConvNeXt that compete favorably with Transformers in terms of accuracy and scalability. ...

January 28, 2022 · Ceshine Lee

Use MPIRE to Parallelize PostgreSQL Queries

Introduction Parallel programming is hard, and in most cases you probably should not use a low-level API to do it (I'd argue that Python's built-in multiprocessing package is low-level). I've been using Joblib's Parallel class for embarrassingly parallel tasks, and it works wonderfully. However, sometimes the task at hand is not simple enough for the Parallel class (e.g., you need to share something from the main process that is not pickle-able, or you want to maintain state in each child process). I've recently found MPIRE (MultiProcessing Is Really Easy), a library that offers this missing flexibility while keeping a high-level, user-friendly API. ...
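The per-worker-state pattern that MPIRE makes convenient can be approximated with the standard library alone. This is a minimal sketch using a thread pool and a fake connection class so it stays self-contained; a real setup would open a PostgreSQL connection (e.g., via psycopg2) in the initializer instead:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real database connection.
class FakeConnection:
    def query(self, n):
        return n * n

# Per-worker state lives in thread-local storage: each worker opens its
# own connection once, instead of reconnecting for every task.
state = threading.local()

def init_worker():
    state.conn = FakeConnection()

def run_query(n):
    return state.conn.query(n)

with ThreadPoolExecutor(max_workers=4, initializer=init_worker) as pool:
    results = list(pool.map(run_query, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

MPIRE's WorkerPool offers the same idea across processes, which matters when the work is CPU-bound rather than I/O-bound like a database query.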

January 7, 2022 · Ceshine Lee