Red Hat npm breach, hands-on LLM engineering, agent-ready web, and practical tooling

NEWSLETTER

Beyond the Build • June 01, 2026

NEWSLETTER | Amplifi Labs

Supply‑chain attack hits @redhat-cloud-services npm packages with malicious versions

Around the web • June 1, 2026

Maintainers report multiple @redhat-cloud-services npm packages were published with malicious code, affecting specific versions across projects like chrome (2.3.1/2.3.2/2.3.4), compliance-client (4.0.3/4.0.4/4.0.6), rbac-client (9.0.3/9.0.4/9.0.6), and others. If you consume this scope, immediately audit lockfiles and CI caches for the listed versions, roll back or upgrade to known‑good releases, rotate potentially exposed credentials, and redeploy clean builds.
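
The lockfile audit can be scripted. Below is a minimal sketch in Go that scans an npm `package-lock.json` (lockfile v2/v3, which has a top-level `packages` map) for the versions listed above. The full package names are an assumption formed by joining the scope with each project name, the helper `findCompromised` is hypothetical, and the table is incomplete because the report also mentions other affected packages.

```go
// Sketch: flag known-bad versions in an npm package-lock.json (lockfile v2/v3).
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
	"strings"
)

// compromised maps a package name to the bad versions named in the report.
// ASSUMPTION: full names are "@redhat-cloud-services/<project>".
// The report says "and others", so extend this table from the advisory.
var compromised = map[string][]string{
	"@redhat-cloud-services/chrome":            {"2.3.1", "2.3.2", "2.3.4"},
	"@redhat-cloud-services/compliance-client": {"4.0.3", "4.0.4", "4.0.6"},
	"@redhat-cloud-services/rbac-client":       {"9.0.3", "9.0.4", "9.0.6"},
}

type lockEntry struct {
	Version string `json:"version"`
}

// lockfile models only the part of the v2/v3 format this sketch needs.
// Older v1 lockfiles use a nested "dependencies" tree and are not handled.
type lockfile struct {
	Packages map[string]lockEntry `json:"packages"`
}

// findCompromised returns one sorted line per installed package whose
// version appears in the compromised table.
func findCompromised(lockJSON []byte) ([]string, error) {
	var lf lockfile
	if err := json.Unmarshal(lockJSON, &lf); err != nil {
		return nil, err
	}
	var hits []string
	for path, pkg := range lf.Packages {
		// Keys look like "node_modules/a/node_modules/@scope/name";
		// the package name is whatever follows the last "node_modules/".
		name := path
		if i := strings.LastIndex(path, "node_modules/"); i >= 0 {
			name = path[i+len("node_modules/"):]
		}
		for _, bad := range compromised[name] {
			if pkg.Version == bad {
				hits = append(hits, name+"@"+pkg.Version+" ("+path+")")
			}
		}
	}
	sort.Strings(hits)
	return hits, nil
}

func main() {
	path := "package-lock.json"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	hits, err := findCompromised(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for _, h := range hits {
		fmt.Println("COMPROMISED:", h)
	}
	if len(hits) > 0 {
		os.Exit(1) // non-zero exit lets a CI job fail on a match
	}
}
```

A clean lockfile does not prove a clean environment: CI caches and already-built artifacts may still hold the bad versions, so the credential rotation and clean redeploy steps still apply.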

AI engineering and efficient inference

Stanford course teaches end-to-end LLMs with Triton and scaling

Around the web • June 1, 2026

Stanford’s CS336 offers a public, implementation-heavy path to build language models from scratch, spanning data pipelines, Transformer internals, distributed training, scaling laws, and alignment. Assignments include implementing tokenizers/optimizers, writing a Triton-based FlashAttention2 kernel, preprocessing Common Crawl, and applying SFT/RL (with optional DPO safety alignment), with lectures available on YouTube. The page also lists current B200 GPU cloud prices (e.g., Modal $6.25/hr, RunPod $4.99/hr), providing practical cost guidance for hands-on experiments.

Run Gemma 4's 26B MoE at Speed on 2016 Xeon

Around the web • June 1, 2026

A decade-old, GPU-less Xeon E5-2620 v4 with 128 GB DDR3 can run Gemma 4 26B-A4B at reading speed by exploiting CPU-centric optimizations in ik_llama.cpp. The setup combines MTP-driven speculative decoding with a tiny drafter, CPU-tuned MoE routing and fused kernels, runtime repack plus mlock, Flash Attention on CPU, and MLA to shrink the KV cache, yielding an 82 GB RAM footprint (25 GB weights + ~56 GB KV at 262K context). For developers, it shows that understanding the inference engine and memory hierarchy, rather than accepting black-box defaults, can work around the memory wall and enable capable local LLMs on inexpensive homelab hardware.

Web and product experience in the agent era

Prepare Your Website for AI Agents with Accessibility and Structure

UX Design • June 1, 2026

AI agents are starting to browse and act on the web, and sites without semantic HTML, strong accessibility, and structured data risk being ignored or misinterpreted. Prioritize accessible, machine-readable design—correct headings and landmarks, alt text, ARIA, schema.org markup, clean sitemaps/URLs, and predictable flows—so agents and screen readers can navigate, extract facts, and complete tasks. Teams that expose data and actions clearly (including APIs and sensible bot governance) will be better positioned as autonomous agents become standard clients.

Use RAS Trend and Velocity to Prioritize UX Investment

Nielsen Norman Group • May 29, 2026

RAS—a weighted measure of how much research recommendation value reaches users (3/2/1 points for high/medium/low; 0.66 for committed)—shifts leaders from activity metrics to outcomes. Pair the six‑month RAS rolling‑average trend with recommendation velocity (avg/month over last 3 months; 0–1=STOP, 2–3=CAUTION, 3+=CONTINUE) to produce a defensible STOP/CAUTION/CONTINUE investment signal for resourcing. The framework also guides staffing and coaching—assign influence‑oriented researchers to teams with rising but middling RAS, place juniors on healthy partnerships, and reserve high‑RAS teams for senior, high‑impact work.
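
To make the arithmetic concrete, here is one possible reading of the scoring as a Go sketch. The summary above does not give the exact formula, so everything beyond the 3/2/1 points, the 0.66 factor, and the STOP/CAUTION/CONTINUE labels is an assumption: the score is taken as earned points over possible points, shipped recommendations count in full, committed ones at 0.66, and all others at zero. The velocity cutoffs are also assumed (below 2 is STOP, 2 up to 3 is CAUTION, 3 and above is CONTINUE), since the stated ranges overlap at 3. All type and function names are hypothetical.

```go
// Sketch: one assumed interpretation of RAS and recommendation velocity.
package main

import "fmt"

// recommendation is a hypothetical record of one research recommendation.
type recommendation struct {
	Priority string // "high", "medium", or "low"
	Status   string // "shipped", "committed", or anything else (scores zero)
}

// Points per priority, as stated in the summary.
var priorityPoints = map[string]float64{"high": 3, "medium": 2, "low": 1}

// ASSUMPTION: shipped work counts in full; committed work counts at 0.66.
var statusFactor = map[string]float64{"shipped": 1.0, "committed": 0.66}

// rasScore returns earned points divided by possible points, in [0, 1].
// ASSUMPTION: this normalization is not confirmed by the source.
func rasScore(recs []recommendation) float64 {
	var earned, possible float64
	for _, r := range recs {
		p := priorityPoints[r.Priority]
		possible += p
		earned += p * statusFactor[r.Status]
	}
	if possible == 0 {
		return 0
	}
	return earned / possible
}

// velocitySignal averages the monthly counts of recommendations that
// reached users over the last three months and maps the average to a signal.
// ASSUMPTION: cutoffs are <2 STOP, [2,3) CAUTION, >=3 CONTINUE.
func velocitySignal(lastThreeMonths [3]int) string {
	sum := lastThreeMonths[0] + lastThreeMonths[1] + lastThreeMonths[2]
	avg := float64(sum) / 3
	switch {
	case avg >= 3:
		return "CONTINUE"
	case avg >= 2:
		return "CAUTION"
	default:
		return "STOP"
	}
}

func main() {
	recs := []recommendation{
		{Priority: "high", Status: "shipped"},
		{Priority: "medium", Status: "committed"},
		{Priority: "low", Status: "open"},
	}
	fmt.Printf("RAS: %.2f\n", rasScore(recs))
	fmt.Println("Velocity signal:", velocitySignal([3]int{1, 2, 3}))
}
```

In the sample data, the shipped high-priority item earns 3 points, the committed medium-priority item earns 2 × 0.66 = 1.32, and the open low-priority item earns nothing, giving 4.32 of a possible 6 points.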

Engineering tools, languages, and platforms

Radxa Dragon Q8B: Laptop-class Snapdragon SBC with dual 2.5GbE

Around the web • June 1, 2026

Radxa’s Dragon Q8B drops Qualcomm’s Snapdragon 8cx Gen 3 into an SBC, delivering strong single‑core gains (~27% in Geekbench 6) and solid efficiency, plus dual 2.5GbE, two USB‑C 3.2 Gen2 with DP Alt Mode, dual M.2 (M‑Key), and UFS. SKUs span 4–32GB ($149–$569), positioning it for high‑throughput edge workloads (routers/NAS), compact desktops, and light gaming; vkmark shows notable GPU wins. Software is early: Ubuntu 26/Debian 13 builds are evolving (Ethernet in Radxa OS currently broken, BIOS USB disabled, UFS not detected), while Armbian/Arch and Windows on Arm work—so early adopters should expect rough edges and watch for driver updates.

Trace Go HTTP Client Latency with net/http/httptrace Hooks

Around the web • June 1, 2026

A hands-on guide shows how to use Go’s net/http/httptrace (in the standard library since Go 1.7): attach a ClientTrace to the request context to capture DNS, connect, TLS, first-byte, and write events for outgoing requests. It includes a curl-style CLI for detailed timings and a reusable RoundTripper that logs DNS/TCP/TLS/TTFB/total durations and connection reuse, with tips on composing traces and measuring full body transfer by wrapping Response.Body.
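
As a rough illustration of the approach (not the guide's own code), the sketch below attaches a `ClientTrace` to a request context and records phase durations. The `timings` struct and the `tracedGet` and `newDemoServer` helpers are hypothetical names. To stay offline it targets a local `httptest` server, so the DNS and TLS hooks never fire and those fields stay zero. It also drains the body rather than wrapping `Response.Body`, which is a simpler stand-in for the full-transfer measurement the guide describes.

```go
// Sketch: time the phases of one HTTP request with net/http/httptrace.
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync"
	"time"
)

// timings holds per-phase durations for a single request (hypothetical type).
type timings struct {
	DNS, Connect, TLS, TTFB, Total time.Duration
	Reused                         bool
}

// tracedGet issues a GET and records phase timings via ClientTrace hooks.
func tracedGet(client *http.Client, url string) (timings, error) {
	var (
		mu                            sync.Mutex
		t                             timings
		dnsStart, connStart, tlsStart time.Time
	)
	// Hooks may be called concurrently from different goroutines, so every
	// write goes through the mutex. With dual-stack dialing the connect
	// hooks can fire more than once; this sketch keeps the last value.
	record := func(f func()) {
		mu.Lock()
		defer mu.Unlock()
		f()
	}
	start := time.Now()
	trace := &httptrace.ClientTrace{
		DNSStart:     func(httptrace.DNSStartInfo) { record(func() { dnsStart = time.Now() }) },
		DNSDone:      func(httptrace.DNSDoneInfo) { record(func() { t.DNS = time.Since(dnsStart) }) },
		ConnectStart: func(_, _ string) { record(func() { connStart = time.Now() }) },
		ConnectDone: func(_, _ string, _ error) {
			record(func() { t.Connect = time.Since(connStart) })
		},
		TLSHandshakeStart: func() { record(func() { tlsStart = time.Now() }) },
		TLSHandshakeDone: func(tls.ConnectionState, error) {
			record(func() { t.TLS = time.Since(tlsStart) })
		},
		GotConn:              func(info httptrace.GotConnInfo) { record(func() { t.Reused = info.Reused }) },
		GotFirstResponseByte: func() { record(func() { t.TTFB = time.Since(start) }) },
	}
	ctx := httptrace.WithClientTrace(context.Background(), trace)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return timings{}, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return timings{}, err
	}
	defer resp.Body.Close()
	// Drain the body so Total covers the full transfer, not just headers.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return timings{}, err
	}
	mu.Lock()
	defer mu.Unlock()
	t.Total = time.Since(start)
	return t, nil
}

// newDemoServer starts a local test server so the example needs no network.
func newDemoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	}))
}

func main() {
	srv := newDemoServer()
	defer srv.Close()
	// Two requests on the same client show the connection-reuse flag.
	for i := 0; i < 2; i++ {
		t, err := tracedGet(srv.Client(), srv.URL)
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("dns=%v connect=%v tls=%v ttfb=%v total=%v reused=%v\n",
			t.DNS, t.Connect, t.TLS, t.TTFB, t.Total, t.Reused)
	}
}
```

Pointing `tracedGet` at an HTTPS URL with a hostname would populate the DNS and TLS fields as well. A production version would more likely live in a `RoundTripper` wrapper, as the guide suggests, so every request on a client is traced without changing call sites.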

New Blorp language targets near-C speed with explicit effects

Around the web • June 1, 2026

Blorp is a statically typed language that compiles to C, combining purity tracking and explicit effects with typed failure (Option/Result), value semantics (ARC/COW), structured concurrency, and compile-time bounds. Project benchmarks on an M4 MacBook Air show performance near C on several workloads—often ahead of Go and far ahead of Python, though slower on some tasks—while HM-style inference and checked imports aim to keep failure and performance characteristics explicit and reviewable.
