<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Mathoholic Systems]]></title><description><![CDATA[Engineering notes, system designs, backend architecture, AI/ML pipelines, and real build logs from my journey.]]></description><link>https://mathoholic.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1763296964633/96041b75-b9fc-40f6-a82f-9dca9abc407c.png</url><title>Mathoholic Systems</title><link>https://mathoholic.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 20 Apr 2026 08:29:13 GMT</lastBuildDate><atom:link href="https://mathoholic.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Your Dockerfile Is the Problem]]></title><description><![CDATA[Why Some Builds Are Fast and Others Are Painfully Slow
When people complain “Docker builds are slow,” what they really mean is one thing:

They didn’t design their Dockerfile properly.

Most articles, blog posts, and AI chat responses try to “fix” sl...]]></description><link>https://mathoholic.dev/your-dockerfile-is-the-problem</link><guid isPermaLink="true">https://mathoholic.dev/your-dockerfile-is-the-problem</guid><category><![CDATA[Docker]]></category><category><![CDATA[docker images]]></category><category><![CDATA[Dockerfile]]></category><category><![CDATA[#dockerbuild]]></category><category><![CDATA[Devops]]></category><category><![CDATA[development]]></category><category><![CDATA[deployment]]></category><dc:creator><![CDATA[Shantanu Sharma]]></dc:creator><pubDate>Sat, 20 Dec 2025 19:34:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766259221355/9ccd6cc0-1402-431c-a165-18d8ccb180e1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-some-builds-are-fast-and-others-are-painfully-slow">Why Some Builds Are Fast and Others Are Painfully Slow</h2>
<p>When people complain “Docker builds are slow,” what they really mean is one thing:</p>
<blockquote>
<p><strong>They didn’t design their Dockerfile properly.</strong></p>
</blockquote>
<p>Most articles, blog posts, and AI chat responses try to “fix” slow builds by:</p>
<ul>
<li><p>disabling cache</p>
</li>
<li><p>increasing CI timeouts</p>
</li>
<li><p>using bigger runners</p>
</li>
<li><p>blaming Docker itself</p>
</li>
</ul>
<p>None of these actually fix the underlying issue.</p>
<p>Docker builds are predictable. Just look at the Dockerfile.<br />If your builds are slow or unpredictable, it’s because of <strong>how your Dockerfile is written</strong>.</p>
<p>This article explains what’s really going on, how Docker caching works in real life, and exactly what you must do to fix slow builds once and for all.</p>
<h2 id="heading-docker-builds-arent-magic-theyre-predictable">Docker Builds Aren’t Magic: They’re Predictable</h2>
<p>When you run:</p>
<pre><code class="lang-plaintext">docker build .
</code></pre>
<p>Docker does <strong>three things</strong>:</p>
<ol>
<li><p>Uploads your build context to the daemon (or remote builder)</p>
</li>
<li><p>Walks through your Dockerfile, step by step</p>
</li>
<li><p>For each instruction, checks if it can reuse a cached result</p>
</li>
</ol>
<p>This is not a compiler with AI. It’s a <strong>filesystem snapshot engine</strong>.<br />Each instruction creates a <strong>layer</strong>: a snapshot of the filesystem at that point.</p>
<p>These layers are <strong>immutable</strong>. Once a layer is created, it never changes. When you build again, Docker doesn’t “re-run” everything. Instead, it compares each instruction and its inputs with a previously built layer:</p>
<ul>
<li><p>If Docker can prove that <strong>the instruction and its inputs didn’t change</strong>, it will reuse the layer from cache.</p>
</li>
<li><p>If anything changed, even something seemingly irrelevant, the cache breaks and Docker re-runs that step and all subsequent ones.</p>
</li>
</ul>
<p>That’s the whole model.</p>
<hr />
<h2 id="heading-the-real-causes-of-slow-docker-builds-and-how-to-fix-them">The Real Causes of Slow Docker Builds (and How to Fix Them)</h2>
<p>Let’s unpack the real reasons builds slow down and how you fix them.</p>
<h2 id="heading-1-bad-layer-ordering-nukes-cache">1. Bad Layer Ordering Nukes Cache</h2>
<p>If you copy everything before installing dependencies, you’ve guaranteed rebuilds on every change.</p>
<h3 id="heading-bad-pattern">Bad Pattern</h3>
<pre><code class="lang-plaintext">FROM node:latest

WORKDIR /app

COPY . .

RUN npm install
RUN npm run build
CMD ["npm", "start"]
</code></pre>
<p><strong>What’s wrong here?</strong></p>
<ul>
<li><p><code>node:latest</code> is non-deterministic: you don’t know what you’ll be building tomorrow (if a newer image is published, the base layer changes, the cache misses, and everything after it rebuilds)</p>
</li>
<li><p><code>COPY . .</code> invalidates cache for <em>everything</em> whenever <em>any</em> file changes</p>
</li>
<li><p>So every tiny source tweak reruns <code>npm install</code></p>
</li>
</ul>
<h3 id="heading-fix-copy-only-what-matters-in-the-order-that-matters">Fix: Copy only what matters, in the order that matters</h3>
<pre><code class="lang-plaintext">FROM node:20-alpine AS builder

WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci

COPY . .
RUN npm run build

FROM nginx:1.25-alpine
COPY --from=builder /app/dist /usr/share/nginx/html
CMD ["nginx", "-g", "daemon off;"]
</code></pre>
<p><strong>Why this is better</strong></p>
<ul>
<li><p>Pin a stable base (<code>node:20-alpine</code>) → reproducible builds</p>
</li>
<li><p>Copy only dependency manifests before install → that layer stays cached as long as your dependencies don’t change</p>
</li>
<li><p>Copy app code later → changes here don’t invalidate the dependency install</p>
</li>
</ul>
<p>This simple reordering often cuts rebuild time by <strong>70–90%</strong>.</p>
<hr />
<h2 id="heading-2-latest-kills-reproducibility">2. “Latest” Kills Reproducibility</h2>
<p>If your BASE image changes under you, cache semantics become unreliable.</p>
<pre><code class="lang-plaintext">FROM node:latest
</code></pre>
<p>Today’s build is not the same as tomorrow’s build.</p>
<p>Use <strong>versioned base images</strong> instead:</p>
<pre><code class="lang-plaintext">FROM node:20-alpine
</code></pre>
<p>This fixes:</p>
<ul>
<li><p>reproducibility</p>
</li>
<li><p>downstream debugging</p>
</li>
<li><p>predictable cache</p>
</li>
</ul>
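<p>For fully reproducible builds you can go one step further and pin by digest. The digest below is a placeholder; substitute the one printed by <code>docker pull node:20-alpine</code> for your registry.</p>
<pre><code class="lang-plaintext">FROM node:20-alpine@sha256:...
</code></pre>
<p>A tag like <code>20-alpine</code> can still be re-pointed by the publisher; a digest cannot, so the same Dockerfile always resolves to the same base layers.</p>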
<hr />
<h2 id="heading-3-every-line-creates-a-layer">3. Every Line Creates a Layer</h2>
<p>Dockerfiles are <strong>immutable histories</strong>.<br />Every <code>RUN</code>, <code>COPY</code>, <code>ADD</code> becomes a layer.</p>
<p>If you install something and then delete it in a later line, the data still exists in the earlier layers, so your image stays just as big.</p>
<h3 id="heading-bad">Bad:</h3>
<pre><code class="lang-plaintext">RUN apt-get update
RUN apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*
</code></pre>
<h3 id="heading-better">Better:</h3>
<pre><code class="lang-plaintext">RUN apt-get update &amp;&amp; \
    apt-get install -y build-essential &amp;&amp; \
    rm -rf /var/lib/apt/lists/*
</code></pre>
<p>Now the package lists don’t survive in a separate layer.</p>
<hr />
<h2 id="heading-4-build-tools-dont-belong-in-runtime-images">4. Build Tools Don’t Belong in Runtime Images</h2>
<p>Build tools are only needed at build time. Shipping them to production is wasteful.</p>
<p>Use <strong>multi-stage builds</strong> intentionally:</p>
<pre><code class="lang-plaintext">FROM golang:1.22 as builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN go build -o app

FROM gcr.io/distroless/base-debian12
COPY --from=builder /app/app /app
CMD ["/app"]
</code></pre>
<p>Final image includes only:</p>
<ul>
<li><p>the binary</p>
</li>
<li><p>runtime libs</p>
</li>
</ul>
<p>No Go, no compilers, no shells, no build cache.</p>
<p>This:</p>
<ul>
<li><p>cuts image size</p>
</li>
<li><p>reduces attack surface</p>
</li>
<li><p>eliminates unnecessary rebuild work</p>
</li>
</ul>
<hr />
<h2 id="heading-5-large-build-context-slow-upload">5. Large Build Context = Slow Upload</h2>
<p>Docker always uploads your build context to the builder.</p>
<p>If your context includes:</p>
<ul>
<li><p>node_modules</p>
</li>
<li><p>.git</p>
</li>
<li><p>logs</p>
</li>
<li><p>tests, docs, temp files</p>
</li>
</ul>
<p>Then every build has to send megabytes or more over the wire.</p>
<p>Fix this with a <code>.dockerignore</code>:</p>
<pre><code class="lang-plaintext">node_modules
.git
*.log
</code></pre>
<p>Smaller context = faster uploads = faster builds.</p>
<hr />
<h2 id="heading-6-ci-isnt-lying-it-exposes-bad-dockerfiles">6. CI Isn’t Lying: <em>It Exposes Bad Dockerfiles</em></h2>
<p>Locally, you might have warm cache.<br />CI runs on fresh machines.</p>
<p>That means:</p>
<ul>
<li><p>no existing cache</p>
</li>
<li><p>slow cold builds</p>
</li>
<li><p>every package download happens again</p>
</li>
</ul>
<p>If your Dockerfile depends on warm local cache to be fast, you built it wrong.</p>
<p>In CI, you must explicitly:</p>
<ul>
<li><p>export/import cache</p>
</li>
<li><p>use BuildKit with <code>--cache-from</code> / <code>--cache-to</code></p>
</li>
<li><p>or use dedicated layer caching</p>
</li>
</ul>
<p>Otherwise your CI builds always recreate steps that could be cached.</p>
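<p>With BuildKit, exporting the cache to a registry looks roughly like this; the registry and image names are placeholders for your own.</p>
<pre><code class="lang-plaintext">docker buildx build \
  --cache-from type=registry,ref=registry.example.com/myapp:buildcache \
  --cache-to type=registry,ref=registry.example.com/myapp:buildcache,mode=max \
  -t registry.example.com/myapp:latest \
  --push .
</code></pre>
<p><code>mode=max</code> exports layers from every stage, not just the final one, which matters for multi-stage builds.</p>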
<hr />
<h2 id="heading-7-pin-dependencies-dont-let-them-float">7. Pin Dependencies, Don’t Let Them Float</h2>
<p>Floating dependencies (<code>latest</code>, <code>*</code>, unpinned versions) make builds unpredictable.</p>
<p>Lockfiles (<code>package-lock.json</code>, <code>go.sum</code>, <code>requirements.txt</code>) should only change when you change dependencies, not every code update.</p>
<p>This means:</p>
<ul>
<li><p>cache hits stay valid longer</p>
</li>
<li><p>CI resolves the same dependency graph on every run</p>
</li>
<li><p>debugging is possible</p>
</li>
</ul>
<hr />
<h2 id="heading-a-simple-mental-model">A Simple Mental Model</h2>
<p>Here’s the core truth you should adopt now:</p>
<blockquote>
<p><strong>Docker builds are predictable. Your Dockerfile determines whether they’re fast or slow.</strong></p>
</blockquote>
<p>Treat Dockerfiles as:</p>
<ul>
<li><p>deterministic build graphs</p>
</li>
<li><p>ordered instruction sequences</p>
</li>
<li><p>cache design problems, not scripts</p>
</li>
</ul>
<p>When you write a Dockerfile, ask:</p>
<ul>
<li><p>What changes frequently?</p>
</li>
<li><p>What changes rarely?</p>
</li>
<li><p>What steps can stay cached?</p>
</li>
</ul>
<p>Design around <strong>cache boundaries</strong>, not commands.</p>
<h2 id="heading-checklist-for-faster-docker-builds">Checklist for Faster Docker Builds</h2>
<p>Before shipping or committing a Dockerfile, ensure:</p>
<p>🟠 Base image is pinned<br />🟠 Dependency install layer is early<br />🟠 App code is copied after deps<br />🟠 Build context is minimal<br />🟠 Multi-stage builds separate build &amp; runtime<br />🟠 No unnecessary tools in final image<br />🟠 Lockfiles are present<br />🟠 CI cache is configured</p>
<p>If any of these are missing, your builds aren’t engineered, they’re accidental.</p>
]]></content:encoded></item><item><title><![CDATA[How Containers Actually Work (and Why Most Devs Get It Wrong)]]></title><description><![CDATA[Why this article exists
Most developers think they “know Docker” because they can run:
docker build, docker run, docker-compose up
That’s not understanding. That’s muscle memory. If this is where your Docker knowledge stops, you are operating at carg...]]></description><link>https://mathoholic.dev/how-containers-actually-work-and-why-most-devs-get-it-wrong</link><guid isPermaLink="true">https://mathoholic.dev/how-containers-actually-work-and-why-most-devs-get-it-wrong</guid><category><![CDATA[Docker]]></category><category><![CDATA[containers]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Shantanu Sharma]]></dc:creator><pubDate>Sun, 14 Dec 2025 13:30:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1765718232929/de2598e7-f5d8-427c-be95-f51353d4c3b8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-this-article-exists">Why this article exists</h2>
<p>Most developers think they “know Docker” because they can run:</p>
<p><code>docker build, docker run, docker-compose up</code></p>
<p>That’s not understanding. That’s muscle memory. If this is where your Docker knowledge stops, you are operating at cargo-cult level:</p>
<ul>
<li><p>You copy commands</p>
</li>
<li><p>You don’t understand consequences</p>
</li>
<li><p>You panic when things break</p>
</li>
</ul>
<p>Docker is <strong>not magic</strong>. Docker is Linux primitives glued together with tooling.</p>
<p>Until you understand what actually happens under the hood, you will:</p>
<ul>
<li><p>Debug blindly in production</p>
</li>
<li><p>Lose data due to bad volume configuration</p>
</li>
<li><p>Break networking and blame Docker</p>
</li>
<li><p>Ship bloated images</p>
</li>
</ul>
<p>This article strips Docker down to its bones.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765711535005/88d05cc1-d306-49f6-9b8a-3c9f1d607a67.jpeg" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-what-docker-really-is-no-marketing-nonsense">What Docker really is (no marketing nonsense)</h2>
<p>Let’s kill the myths first. Docker is <strong>not</strong> a virtual machine replacement, not a deployment platform, and not a magic packaging tool. What Docker actually is turns out to be much simpler and far more important: it uses <strong>Linux namespaces</strong> to isolate processes, <strong>cgroups</strong> to control resource usage, and <strong>union filesystems</strong> to build layered images, all coordinated by a long-running daemon called <strong>dockerd</strong>. Everything else you interact with — Dockerfiles, the CLI, Docker Compose — is just user experience built on top of these primitives. Docker didn’t invent containers; <strong>Linux did</strong>. Docker’s real contribution was making those low-level Linux features usable for everyday developers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765712151365/2c0c4397-f963-4d2a-8b48-8017484274cc.jpeg" alt class="image--center mx-auto" /></p>
<hr />
<h2 id="heading-containers-vs-virtual-machines-the-lie-you-were-told">Containers vs Virtual Machines (the lie you were told)</h2>
<p>People often say: “Containers are lightweight VMs.”<br />That sentence has caused more production failures than bugs.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Aspect</td><td>Virtual Machine</td><td>Container</td></tr>
</thead>
<tbody>
<tr>
<td>Kernel</td><td>Separate</td><td>Shared with host</td></tr>
<tr>
<td>Boot time</td><td>Minutes</td><td>Milliseconds</td></tr>
<tr>
<td>Isolation</td><td>Hardware-level</td><td>Process-level</td></tr>
<tr>
<td>Overhead</td><td>Heavy</td><td>Lightweight</td></tr>
<tr>
<td>Security boundary</td><td>Stronger</td><td>Weaker (by design)</td></tr>
</tbody>
</table>
</div><h3 id="heading-the-uncomfortable-truth">The uncomfortable truth</h3>
<p>A container is just a process with constraints. <em>(meaning the kernel can limit and account for the resources allocated to it)</em></p>
<p>Same kernel.<br />Same OS.<br />Same host underneath.</p>
<p>If that sentence makes you uncomfortable, good.<br />It means you’re starting to understand Docker properly.</p>
<hr />
<h2 id="heading-the-filesystem-illusion-union-fs-explained-simply">The filesystem illusion (Union FS explained simply)</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765712862316/49fc1217-5f26-46ee-8fed-e624fbcbe059.jpeg" alt class="image--center mx-auto" /></p>
<p>When you pull a Docker image, Docker does not download one big file. It downloads layers.</p>
<p>Typical layers look like:</p>
<ol>
<li><p>Base OS layer</p>
</li>
<li><p>Runtime layer (Python, Node, Go)</p>
</li>
<li><p>Dependency layer</p>
</li>
<li><p>Application code layer</p>
</li>
</ol>
<p>Important facts about these layers:</p>
<ul>
<li><p>They are read-only</p>
</li>
<li><p>They are shared across containers</p>
</li>
<li><p>They are cached aggressively</p>
</li>
</ul>
<h3 id="heading-what-happens-when-a-container-starts">What happens when a container starts?</h3>
<p>Docker:</p>
<ul>
<li><p>Stacks all read-only layers</p>
</li>
<li><p>Adds one thin writable layer on top</p>
</li>
</ul>
<p>All file changes go only to that writable layer.</p>
<p>When the container is deleted:</p>
<ul>
<li><p>The writable layer disappears</p>
</li>
<li><p>Your data disappears with it</p>
</li>
</ul>
<p>That’s why:</p>
<ul>
<li><p>Writing data inside containers is a rookie mistake</p>
</li>
<li><p>Containers are disposable by design</p>
</li>
<li><p>Volumes exist</p>
</li>
</ul>
<hr />
<h2 id="heading-volumes-where-most-systems-go-to-die">Volumes: where most systems go to die</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765714799909/758bac4a-16cb-48f0-829c-51bb6fb0c746.jpeg" alt class="image--center mx-auto" /></p>
<p>Storage is where Docker setups usually collapse. You don’t get many choices—and choosing the wrong one guarantees pain.</p>
<p>There are <strong>three options</strong>.</p>
<ol>
<li><strong>Container filesystem (bad)</strong></li>
</ol>
<ul>
<li><p>Data lives inside the container’s writable layer</p>
</li>
<li><p>Destroy the container → data is gone</p>
</li>
<li><p>Only acceptable for temporary, throwaway files</p>
</li>
</ul>
<p>Use this for anything important and you’ve built a <strong>self-destructing system</strong>.</p>
<ol start="2">
<li><strong>Docker volumes (correct)</strong></li>
</ol>
<ul>
<li><p>Managed by Docker</p>
</li>
<li><p>Independent of the container lifecycle</p>
</li>
<li><p>Portable, predictable, and easy to back up</p>
</li>
</ul>
<p>This is what <strong>production systems are supposed to use</strong>.</p>
<ol start="3">
<li><strong>Bind mounts (dangerous)</strong></li>
</ol>
<ul>
<li><p>Directly map host filesystem paths into containers</p>
</li>
<li><p>Environment-specific and brittle</p>
</li>
<li><p>Easy to break, painful to debug</p>
</li>
</ul>
<p>Great for <strong>local development</strong>.<br />Risky in <strong>production</strong> unless you know exactly what you’re doing.</p>
<h3 id="heading-one-rule-you-must-remember">One rule you must remember</h3>
<blockquote>
<p>App code lives in images.<br />App data lives in volumes.</p>
</blockquote>
<p>Break this rule and production will punish you.</p>
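<p>In practice the rule looks like this; the container and volume names are illustrative.</p>
<pre><code class="lang-plaintext"># Named volume: created on first use, survives container removal
docker volume create pgdata
docker run -d --name pg -v pgdata:/var/lib/postgresql/data postgres:16

# Destroy and recreate the container - the data is still there
docker rm -f pg
docker run -d --name pg -v pgdata:/var/lib/postgresql/data postgres:16
</code></pre>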
<hr />
<h2 id="heading-networking-why-localhosthttplocalhost-betrays-you">Networking: Why <a target="_blank" href="http://localhost"><code>localhost</code></a> Betrays You</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765717605626/359e9b50-1935-4f8c-9bee-79e09a222db2.jpeg" alt class="image--center mx-auto" /></p>
<p>Inside a container:</p>
<ul>
<li><p><a target="_blank" href="http://localhost"><code>localhost</code></a> points <strong>only to the container itself</strong></p>
</li>
<li><p>Not the host</p>
</li>
<li><p>Not other containers</p>
</li>
<li><p>Not “where the database runs”</p>
</li>
</ul>
<p>This is where many systems break.</p>
<hr />
<h3 id="heading-what-docker-actually-sets-up">What Docker Actually Sets Up</h3>
<p>Docker doesn’t rely on magic. It creates:</p>
<ul>
<li><p>Virtual network bridges</p>
</li>
<li><p>Virtual network interfaces</p>
</li>
<li><p>An internal DNS server</p>
</li>
</ul>
<p>Every container on the same Docker network gets <strong>automatic service discovery</strong>.</p>
<p>That’s why this works:</p>
<pre><code class="lang-plaintext">db:5432
</code></pre>
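<p>A minimal Compose sketch of that service discovery; the service names and connection string here are illustrative:</p>
<pre><code class="lang-plaintext">services:
  api:
    build: .
    environment:
      # "db" resolves via Docker's internal DNS, not localhost
      DATABASE_URL: postgres://db:5432/app
  db:
    image: postgres:16
</code></pre>
<p>Both containers join the default network Compose creates, so <code>api</code> reaches the database as <code>db:5432</code> with no IP addresses anywhere.</p>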
<hr />
<h3 id="heading-the-hard-truth">The Hard Truth</h3>
<p>Docker resolves service names through its internal DNS. The moment you hardcode IP addresses or depend on <a target="_blank" href="http://localhost"><code>localhost</code></a> across containers, you’ve brought fragility into the system. It may appear to work today, in your environment, on your machine, but it will fail in production, usually under load or during a redeploy.</p>
<hr />
<h2 id="heading-the-docker-daemon-single-point-of-control">The Docker daemon: single point of control</h2>
<p>Everything in Docker flows through dockerd:</p>
<ul>
<li><p>Building images</p>
</li>
<li><p>Pulling images</p>
</li>
<li><p>Creating networks</p>
</li>
<li><p>Managing volumes</p>
</li>
<li><p>Running containers</p>
</li>
</ul>
<p>If dockerd crashes:</p>
<ul>
<li><p>Containers keep running</p>
</li>
<li><p>You lose control and orchestration</p>
</li>
</ul>
<p>This surprises people. It shouldn’t.</p>
<p>Containers are Linux processes.<br />Docker is just the manager.</p>
<p>This is not a bug.<br />This is how Linux works.</p>
<hr />
<h2 id="heading-what-you-should-take-away">What you should take away</h2>
<p>If you remember only five things:</p>
<ol>
<li><p>Containers are not VMs</p>
</li>
<li><p>Containers are processes</p>
</li>
<li><p>Filesystems are layered illusions</p>
</li>
<li><p>Data inside containers is disposable</p>
</li>
<li><p>Docker is Linux with a nice CLI</p>
</li>
</ol>
<p>Once this clicks:</p>
<ul>
<li><p>Docker stops being scary</p>
</li>
<li><p>Debugging becomes logical</p>
</li>
<li><p>Production failures make sense</p>
</li>
</ul>
<p>Ignore this, and Docker will keep “mysteriously” failing.</p>
<hr />
<h2 id="heading-whats-next">What’s next</h2>
<p>In the next article, we’ll dissect:</p>
<blockquote>
<p>Why 90% of Dockerfiles are inefficient — and how to fix them</p>
</blockquote>
<p>Most Dockerfiles in the wild are <strong>bloated, slow, insecure, and poorly cached</strong>, usually because they are written by copying patterns without understanding how Docker actually builds images. And yes, you are probably doing it wrong; fixing it will immediately make your builds faster, your images smaller, and your systems easier to run in production.</p>
]]></content:encoded></item><item><title><![CDATA[Make Your Python Code Faster with Dictionary Lookups]]></title><description><![CDATA[In the world of programming, efficiency is key. Whether you're a beginner or a seasoned developer, finding ways to optimize your code can make a significant difference, especially when working with large datasets. One powerful tool in Python that oft...]]></description><link>https://mathoholic.dev/pythondictionarylookups01</link><guid isPermaLink="true">https://mathoholic.dev/pythondictionarylookups01</guid><category><![CDATA[Python]]></category><category><![CDATA[dictionary]]></category><category><![CDATA[python beginner]]></category><category><![CDATA[Python 3]]></category><dc:creator><![CDATA[Shantanu Sharma]]></dc:creator><pubDate>Mon, 12 Aug 2024 03:48:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1723434128903/ba28b92a-2e1a-4b6b-acd5-831ef7b69b45.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of programming, efficiency is key. Whether you're a beginner or a seasoned developer, finding ways to optimize your code can make a significant difference, especially when working with large datasets. One powerful tool in Python that often goes underutilized is the dictionary lookup. In this post, we'll explore how dictionary lookups work, why they're so efficient, and how you can leverage them to enhance your Python programs.</p>
<h3 id="heading-what-is-a-dictionary-lookup">What is a Dictionary Lookup?</h3>
<p>In Python, a dictionary is a collection of key-value pairs, where each key is unique, and each key maps to a specific value. This data structure is incredibly versatile and allows for fast access to data. A dictionary lookup refers to the process of retrieving a value associated with a specific key in a dictionary.<br />Here is an example:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Creating a dictionary</span>
fruit_colors = {
    <span class="hljs-string">"apple"</span>: <span class="hljs-string">"red"</span>,
    <span class="hljs-string">"banana"</span>: <span class="hljs-string">"yellow"</span>,
    <span class="hljs-string">"cherry"</span>: <span class="hljs-string">"red"</span>,
    <span class="hljs-string">"orange"</span>: <span class="hljs-string">"orange"</span>
}

<span class="hljs-comment"># Performing a dictionary lookup</span>
color_of_apple = fruit_colors[<span class="hljs-string">"apple"</span>]
print(color_of_apple)  <span class="hljs-comment"># Output: red</span>
</code></pre>
<p>In this example, the key <code>"apple"</code> is used to quickly access the value <code>"red"</code>. This operation is extremely fast, and that speed is one of the primary reasons why dictionary lookups are so valuable.</p>
<h3 id="heading-why-are-dictionary-lookups-so-fast">Why Are Dictionary Lookups So Fast?</h3>
<p>The speed of dictionary lookups is primarily due to the underlying data structure: <strong>hash tables</strong>. When you create a dictionary in Python, each key is hashed using a hash function, which generates a unique identifier for that key. This hash is then used to determine where the corresponding value is stored in memory.</p>
<p>Because the hash function allows for direct access to the location of the value, retrieving a value from a dictionary typically takes <strong>O(1) time</strong>, meaning it’s constant time, regardless of the size of the dictionary. This is in stark contrast to other data structures, such as lists, where searching for a value can take <strong>O(n) time</strong> (linear time) because you might need to check each element.</p>
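<p>You can see the gap directly with <code>timeit</code>. This is a rough benchmark sketch, not a rigorous one:</p>
<pre><code class="lang-python">import timeit

n = 100_000
haystack_list = list(range(n))
haystack_dict = dict.fromkeys(range(n))

# Searching for the last key: the list scans every element,
# the dict jumps straight to it via the hash
list_time = timeit.timeit(lambda: (n - 1) in haystack_list, number=100)
dict_time = timeit.timeit(lambda: (n - 1) in haystack_dict, number=100)

print(dict_time, list_time)  # dict_time is dramatically smaller
</code></pre>
<p>Grow <code>n</code> and the list time grows with it, while the dict time stays flat.</p>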
<hr />
<p><strong>When Should You Use Dictionary Lookups?</strong></p>
<p>Dictionary lookups are particularly useful in situations where you need to frequently access data based on a unique identifier. Here are a few scenarios where dictionary lookups can greatly enhance your code:</p>
<ol>
<li><strong>Data Mapping</strong>: When you have a set of unique keys that map to specific values, such as user IDs mapping to user information.</li>
</ol>
<pre><code class="lang-python">user_data = {
    <span class="hljs-string">"user123"</span>: {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Alice"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">30</span>},
    <span class="hljs-string">"user456"</span>: {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Bob"</span>, <span class="hljs-string">"age"</span>: <span class="hljs-number">25</span>},
    <span class="hljs-comment"># more users...</span>
}
</code></pre>
<ol start="2">
<li><p><strong>Caching Results</strong>: If you’re performing an expensive computation or accessing a slow resource (like a database or API), you can store the results in a dictionary and reuse them to avoid repeated operations.</p>
<pre><code class="lang-python"> expensive_computation_cache = {}
 <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">expensive_computation</span>(<span class="hljs-params">x</span>):</span>
     <span class="hljs-keyword">if</span> x <span class="hljs-keyword">in</span> expensive_computation_cache:
         <span class="hljs-keyword">return</span> expensive_computation_cache[x]
     <span class="hljs-comment"># Perform the computation</span>
     result = x * x
     expensive_computation_cache[x] = result
     <span class="hljs-keyword">return</span> result
</code></pre>
</li>
<li><p><strong>Fast Membership Testing</strong>: If you need to frequently check if an item exists in a collection, using a dictionary (or a set, which also uses hashing) allows for O(1) membership tests.</p>
<pre><code class="lang-python"> blacklisted_emails = {<span class="hljs-string">"spam@example.com"</span>, <span class="hljs-string">"junk@example.com"</span>}
 <span class="hljs-keyword">if</span> email <span class="hljs-keyword">in</span> blacklisted_emails:
     print(<span class="hljs-string">"This email is blacklisted."</span>)
</code></pre>
</li>
</ol>
<p><strong>Building Lookup Tables from Lists</strong></p>
<p>In some cases, you might start with a list of dictionaries or tuples and need to frequently search for items based on a specific attribute. Instead of iterating through the list every time, you can convert it into a dictionary (often referred to as a lookup table) to speed up your searches.</p>
<p>For example, let's say you have a list of products and you want to quickly find a product by its ID:</p>
<pre><code class="lang-python">products = [
    {<span class="hljs-string">"id"</span>: <span class="hljs-number">101</span>, <span class="hljs-string">"name"</span>: <span class="hljs-string">"Laptop"</span>, <span class="hljs-string">"price"</span>: <span class="hljs-number">799</span>},
    {<span class="hljs-string">"id"</span>: <span class="hljs-number">102</span>, <span class="hljs-string">"name"</span>: <span class="hljs-string">"Tablet"</span>, <span class="hljs-string">"price"</span>: <span class="hljs-number">499</span>},
    {<span class="hljs-string">"id"</span>: <span class="hljs-number">103</span>, <span class="hljs-string">"name"</span>: <span class="hljs-string">"Smartphone"</span>, <span class="hljs-string">"price"</span>: <span class="hljs-number">699</span>},
]

<span class="hljs-comment"># Build a lookup table for products by ID</span>
product_lookup = {product[<span class="hljs-string">"id"</span>]: product <span class="hljs-keyword">for</span> product <span class="hljs-keyword">in</span> products}

<span class="hljs-comment"># Now you can quickly find a product by its ID</span>
product = product_lookup.get(<span class="hljs-number">102</span>)
print(product)  <span class="hljs-comment"># Output: {'id': 102, 'name': 'Tablet', 'price': 499}</span>
</code></pre>
<p>By transforming your list into a dictionary, you transform your search operation from <strong>O(n)</strong> to <strong>O(1)</strong>, greatly improving efficiency.</p>
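<p>The lookup above uses <code>.get()</code> rather than square brackets, and the difference matters: indexing a missing key raises <code>KeyError</code>, while <code>.get()</code> returns <code>None</code> or a default you choose.</p>
<pre><code class="lang-python">products_by_id = {101: "Laptop", 102: "Tablet"}

print(products_by_id.get(102))             # Tablet
print(products_by_id.get(999))             # None, no exception raised
print(products_by_id.get(999, "unknown"))  # unknown
</code></pre>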
<hr />
<h4 id="heading-pitfalls-to-watch-out-for"><strong>Pitfalls to Watch Out For</strong></h4>
<p>While dictionary lookups are powerful, there are a few things to be aware of:</p>
<ul>
<li><p><strong>Memory Usage</strong>: Dictionaries are fast, but they can use more memory than lists due to the overhead of storing hash tables. If memory is a constraint, consider this trade-off.</p>
</li>
<li><p><strong>Mutable Keys</strong>: In Python, dictionary keys must be immutable (e.g., strings, numbers, tuples). If you try to use a mutable type (like a list) as a key, you’ll encounter an error.</p>
</li>
<li><p><strong>Collisions</strong>: Although rare, hash collisions can occur, where two different keys produce the same hash value. Python handles this internally, but it’s something to be aware of if you’re working with a large set of keys.</p>
</li>
</ul>
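<p>The mutable-key rule is easy to see in action:</p>
<pre><code class="lang-python">d = {}
try:
    d[[1, 2]] = "value"   # a list is mutable, hence unhashable
except TypeError as err:
    message = str(err)

print(message)  # unhashable type: 'list'

d[(1, 2)] = "value"       # a tuple works: it is immutable
print(d[(1, 2)])          # value
</code></pre>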
<hr />
<h4 id="heading-conclusion"><strong>Conclusion</strong></h4>
<p>Dictionary lookups are a fundamental tool in Python programming that offers significant performance benefits. By understanding how they work and when to use them, you can write more efficient, scalable, and maintainable code. Whether you're dealing with large datasets, building a cache, or simply trying to speed up your searches, dictionary lookups should be one of your go-to techniques.</p>
<p>So, next time you're faced with a problem that involves frequent data retrieval, consider reaching for a dictionary—you'll be amazed at how much faster your code can run!</p>
<p>#Python #Programming #DataScience #CodingEfficiency #TechTips #Optimization</p>
]]></content:encoded></item><item><title><![CDATA[Beginner's Guide to Machine Learning]]></title><description><![CDATA[This series of articles will make machine learning easier to understand with resources to learn in depth.
What is machine learning?
There are complex definitions available which defines machine learning in more technical and mathematical terms. To st...]]></description><link>https://mathoholic.dev/machine-learning-01</link><guid isPermaLink="true">https://mathoholic.dev/machine-learning-01</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Shantanu Sharma]]></dc:creator><pubDate>Sat, 22 Jun 2024 04:16:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1719029137736/80a95fcf-7eda-4b86-bbed-b10e6ff8cc9a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This series of articles will make machine learning easier to understand with resources to learn in depth.</p>
<h3 id="heading-what-is-machine-learning">What is machine learning?</h3>
<p>There are complex definitions that describe machine learning in more technical and mathematical terms. To start with a simple one, we will consider this:</p>
<blockquote>
<p>Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.</p>
<p>~ <a target="_blank" href="https://www.forbes.com/sites/gilpress/2021/05/28/on-thinking-machines-machine-learning-and-how-ai-took-over-statistics/">Arthur Samuel</a>, Computer Scientist</p>
</blockquote>
<p>Samuel's definition of machine learning distinguishes it from traditional programming. If you are a programmer, you know that in traditional programming the rules for handling input data are defined by hand, and the program generates output based on those rules. Machine learning flips this whole paradigm.</p>
<p>In machine learning, data points and their corresponding correct answers (labels) are provided to the computer. The computer uses this information to learn patterns and rules, so we don't write the rules ourselves. These learned rules allow the computer to predict the correct answers for new, unseen data. Machine learning is a data-driven process.</p>
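<p>As a toy illustration of this flip (my own example, in plain Python): below, the rule <em>y = 2x</em> is never written into the program — the computer recovers it from labelled data points and then applies it to unseen input.</p>

```python
# Traditional programming: the rule is hard-coded by the programmer.
def rule(x):
    return 2 * x

# Machine learning: only (input, correct answer) pairs are given.
data = [(1, 2), (2, 4), (3, 6), (4, 8)]  # labelled examples of y = 2x

# Least-squares estimate of the slope w in y = w * x (no intercept).
w = sum(x * y for x, y in data) / sum(x * x for x, y in data)

print(w)       # 2.0 — the learned rule matches the hidden pattern
print(w * 10)  # 20.0 — prediction for the unseen input x = 10
```

The learned parameter <code>w</code> plays the role of the rule; with more data and more parameters, the same idea scales up to real machine learning models.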
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1719029548992/14370fb3-5314-4247-95d2-b4d8635b0981.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-how-machine-learning-works">How does machine learning work?</h3>
<p>As machine learning is a data-driven process, good-quality data is essential for building an efficient machine learning solution. The process of machine learning involves seven broad steps.</p>
<ol>
<li><p><strong>Data Collection:</strong></p>
<p> Gathering data is the foundation of the machine learning process. The quality and quantity of the data we gather directly determine how well the model will perform.</p>
</li>
<li><p><strong>Preparing the Data:</strong></p>
<p> This step involves data wrangling: removing duplicates, correcting errors, handling missing values, converting data types, and so on. We use data visualization techniques to see relevant relationships between attributes, check for outliers, and select the right features for the model. The data is randomized so that its order does not affect what is learned. Another important step is to divide the data into two parts: the larger part (~80%) is used for training the model, and the smaller part is used to evaluate the trained model's performance.</p>
</li>
<li><p><strong>Choose a model:</strong></p>
<p> Data scientists around the world have developed models for different purposes and goals. From the existing models, we select the one that best suits our purpose and goal.</p>
</li>
<li><p><strong>Training:</strong></p>
<p> Training involves running the model on the prepared dataset, allowing it to learn from the data and make predictions. As the model processes the data, it tries to find hidden patterns.</p>
<p> The training process starts with random values for the model's parameters, say X and Y. The model uses these values to make predictions, which are then compared to the actual correct answers. Based on this comparison, the model adjusts X and Y to improve its predictions. Each cycle of updating is called a training step.</p>
<p> The training process is iterative, meaning the model will continually learn and improve as it processes more data.</p>
</li>
<li><p><strong>Evaluation:</strong></p>
<p> The portion of the dataset set aside for evaluation is used to check the model's proficiency. This puts the model in scenarios it did not encounter during training.</p>
</li>
<li><p><strong>Fine-Tuning:</strong></p>
<p> After evaluating the model's performance, we can often improve its accuracy further by fine-tuning certain parameters. These parameters, which took default values during the initial training, can be adjusted to enhance the model's predictions. This process of adjusting parameters to optimize the model's performance is known as parameter tuning or hyperparameter tuning.</p>
<p> These hyperparameters might include how fast the model learns (the learning rate), the number of layers in a neural network, or the number of groups a clustering algorithm forms.</p>
<p> To fine-tune, we try different combinations of these settings and see how well the model performs, with the goal of finding the best combination. There are several ways to tune hyperparameters:</p>
<ul>
<li><p><strong>Grid Search:</strong> Test all possible combinations.</p>
</li>
<li><p><strong>Random Search:</strong> Test a random selection of combinations.</p>
</li>
<li><p><strong>Bayesian Optimization:</strong> Use a smart method to predict which combinations will work best.</p>
</li>
</ul>
</li>
<li><p><strong>Deployment &amp; Monitoring:</strong></p>
<p> Once the model is trained and its hyperparameters are optimized, it is integrated into a production environment where it can start making predictions on new, unseen data. Often, models are deployed as part of a web service, accessible via APIs (Application Programming Interfaces) so other applications can send data to the model and receive predictions.</p>
</li>
</ol>
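<p>The steps above can be sketched end to end in plain Python. This is a toy example of my own (a one-parameter model learning the hidden rule <em>y = 3x</em>), not code from the book: it splits the data ~80/20, trains by iterative gradient steps, evaluates on the held-out part, and grid-searches the learning rate.</p>

```python
import random

# Hidden relationship the model must discover: y = 3x (noise-free for simplicity).
random.seed(0)
data = [(float(x), 3.0 * x) for x in range(1, 11)]

# Step 2 - Preparing the data: randomize the order, then split ~80/20.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Step 4 - Training: start from a random parameter w and improve it step by step.
def train_model(samples, lr, epochs=200):
    w = random.random()                  # random initial value
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x                 # model's prediction
            w -= lr * (pred - y) * x     # gradient of the squared error
    return w

# Step 5 - Evaluation: mean squared error on data the model never saw.
def mse(w, samples):
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)

# Step 6 - Fine-tuning: a tiny grid search over the learning rate.
best_w, best_err = None, float("inf")
for lr in (0.001, 0.005, 0.01):
    w = train_model(train, lr)
    err = mse(w, test)
    if err < best_err:
        best_w, best_err = w, err

print(round(best_w, 3), best_err < 1e-6)  # learned slope ~3.0, tiny held-out error
```

Real projects swap this hand-rolled loop for a library model and tools like grid or random search, but the shape of the workflow is the same.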
<p>    We continuously monitor the model's performance to ensure it maintains accuracy and efficiency. This involves tracking metrics such as prediction accuracy, response time, and resource usage. The model is re-trained periodically so that it remains up-to-date.</p>
<p>    The final step ensures that the model remains useful, reliable, and relevant in its production environment.</p>
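<p>To make the deployment step concrete, here is a minimal sketch (my own, using only Python's standard-library <code>http.server</code> and <code>urllib</code>; real deployments would use a proper web framework) of a trained model exposed as a prediction API: a client POSTs input data as JSON and receives a prediction back.</p>

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

W = 3.0  # parameter of the "trained" model (hypothetical value)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": W * payload["x"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client application sends new data to the model and receives a prediction.
req = urllib.request.Request(
    "http://127.0.0.1:%d" % server.server_port,
    data=json.dumps({"x": 7}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # {'prediction': 21.0}

server.shutdown()
```

Monitoring would then track this endpoint's accuracy, latency, and resource usage over time, triggering re-training when performance drifts.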
<p>Biblography:<br />Doshi, R., &amp; Hiran, K. K. (2021). <a target="_blank" href="https://amzn.to/3RDvVag"><em>Machine Learning</em></a>. Paperback.</p>
]]></content:encoded></item></channel></rss>