AI Capabilities and Limitations — Notes for iOS Devs - Devashree Shukla

Source: Anthropic Academy / Skilljar course — companion to “AI Fluency: Framework & Foundations.” Same caveat as part 1: lesson videos/exercises are gated behind Skilljar login. These notes are built from the public curriculum outline plus Anthropic’s own course description. The four-property structure below is accurate; specific lesson wording is inferred, not quoted.

How this connects to Part 1

Part 1 (the 4D Framework) taught the human side: Delegation, Description, Discernment, Diligence. This course is the mechanical explanation underneath that — it’s the “why” behind the “what.” Specifically:

Weak Delegation decisions often come from not understanding Knowledge and Working Memory limits.
Weak Description (prompting) often comes from not understanding Steerability.
Weak Discernment often comes from not understanding Next Token Prediction — you can’t spot a plausible-sounding wrong answer if you don’t know why the model produces plausible-sounding wrong answers in the first place.

So: this course is basically “read the datasheet before you use the part.”

Setup: Pretraining vs Fine-tuning (in one paragraph)

A base model is trained on huge amounts of text to get good at one thing: predicting the next token. That’s pretraining. It’s where all the raw “knowledge” and pattern-matching ability comes from, but the resulting base model isn’t inherently helpful, safe, or instruction-following. Fine-tuning (including RLHF-style training) is the second stage that shapes that raw predictor into something that behaves like a cooperative assistant: it follows instructions, refuses certain things, and adopts a consistent “voice.” Everything downstream in this course is really about the tension between what pretraining gave the model (raw pattern completion over internet-scale text) and what fine-tuning layered on top (helpfulness, steerability, guardrails).

Why you should care as an iOS dev: this is why the model can write a beautiful, confident SwiftUI view for an API that doesn’t exist. Pretraining taught it what plausible Swift looks like; nothing forces “looks like real Swift” and “compiles against the current SDK” to be the same thing.

The Four Properties

1. Next Token Prediction — where every answer actually comes from

The model isn’t “looking up” the answer to your question. It’s generating a statistically plausible next piece of text, one token at a time, given everything before it. This is why:

It’s excellent at well-worn patterns: reformatting JSON, converting between naming conventions, writing a standard ViewModel skeleton, summarizing a stack trace.
It can produce fluent, confident nonsense the moment a task pushes outside “plausible continuation” into “requires actually knowing/computing a fact” — hallucinated API names, invented method signatures, made-up Apple documentation quotes, wrong Swift concurrency rules.
Fluency and correctness are two separate axes. A hallucinated URLSession convenience method will read exactly as confidently as a real one — there’s no tell in the tone.

iOS translation: if a task is “reformat/transform/scaffold something structurally similar to a million examples online” (a Codable model, a standard list-detail SwiftUI screen), trust the fluency more. If a task is “tell me the exact current signature of this specific, possibly-recent API,” trust it less — verify against Apple docs or the actual SDK, every time. This is the mechanical reason Discernment (Part 1) exists.
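
To make “plausible continuation” concrete, here is a minimal Swift sketch of a toy bigram predictor. It is nowhere near how a production model works internally (real models are neural networks conditioning on far more than the previous token). The three-line corpus, the token spellings, and the BigramModel type are all assumptions made up for illustration. It only shows how picking the most common next token at every step can stitch together a call that looks right yet never appears in what the predictor was trained on.

```swift
/// Toy bigram "language model": it only knows which token tends to follow which.
/// BigramModel is a hypothetical type for illustration, not a real library.
struct BigramModel {
    // followerCounts[a][b] = how often token b came right after token a.
    private var followerCounts: [String: [String: Int]] = [:]

    init(corpus: [[String]]) {
        for line in corpus {
            for (current, next) in zip(line, line.dropFirst()) {
                followerCounts[current, default: [:]][next, default: 0] += 1
            }
        }
    }

    /// Greedy decoding: keep appending the most frequent follower of the last token.
    func complete(_ prompt: [String], maxNewTokens: Int = 10) -> [String] {
        var tokens = prompt
        for _ in 0..<maxNewTokens {
            guard let last = tokens.last,
                  let followers = followerCounts[last],
                  // Highest count wins; ties break alphabetically so runs are repeatable.
                  let best = followers.min(by: { a, b in
                      a.value != b.value ? a.value > b.value : a.key < b.key
                  })
            else { break }
            tokens.append(best.key)
        }
        return tokens
    }
}

// Made-up "training data": three tokenized call shapes (an assumption, not real SDK docs).
let corpus: [[String]] = [
    ["URLSession", ".", "shared", ".", "data", "(", "from", ":", "url", ")"],
    ["URLSession", ".", "shared", ".", "dataTask", "(", "with", ":", "url", ")"],
    ["URLSession", ".", "shared", ".", "downloadTask", "(", "with", ":", "url", ")"],
]

let model = BigramModel(corpus: corpus)
let completion = model.complete(["URLSession", ".", "shared", ".", "data"])

print(completion.joined())          // URLSession.shared.data(with:url)
print(corpus.contains(completion))  // false
```

Every individual step chose the most common continuation (“with” follows “(” twice, “from” only once), yet the finished call is in none of the three corpus lines. Nothing in the output signals that it was assembled rather than recalled.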

2. Knowledge — what the model actually knows, and why it can be confidently wrong

The model “knows” best what appeared frequently and consistently in its training data, and nothing from after that data was collected. That has direct, practical consequences:

Popular, stable APIs (UIKit fundamentals, Foundation, common Swift stdlib): deep, reliable knowledge, tons of training examples.
Recent/niche APIs (this year’s WWDC additions, and less common territory like SwiftData edge cases, RegexBuilder, or the new Swift 6 concurrency rules): thin, possibly outdated, or entirely absent knowledge. The model may not “know it doesn’t know”; it’ll pattern-match toward something plausible from older APIs instead.
The knowledge cutoff is a hard wall for anything released after it: that material isn’t in the training data at all, even if the model sounds fine about it. The thin, possibly outdated zone described above sits just before that wall.

iOS translation: treat every AI answer about this year’s iOS/Swift features with extra suspicion by default — that’s exactly the zone where training data is thinnest and confident-wrong answers are most likely. Older, foundational APIs are comparatively safe ground.

3. Working Memory — what it’s paying attention to right now, and what falls off the edge

This maps to the context window — everything currently “in view” for the model (your prompt, pasted code, conversation history). Two failure modes:

Too much in the window at once dilutes attention — instructions given early in a huge pasted file can get “forgotten” relative to what’s most recent/salient.
Long, sprawling context (a giant file, a long chat) increases the odds that some part of the answer silently drifts away from an earlier constraint you gave.

iOS translation: if you paste in a 2,000-line ViewController or an entire module and ask for one focused change, don’t be surprised if the model “forgets” a detail you specified at the top, or fixates on the wrong section. Isolate the relevant slice of code, restate key constraints close to the actual ask, and don’t assume a rule stated once at the start of a long session still holds 40 messages later.
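
One way to act on that advice is to assemble the prompt mechanically, so the relevant slice and the constraints always sit right next to the ask. The sketch below is a hypothetical helper: FocusedPrompt, its field names, and the output layout are assumptions for illustration, not something from the course or any library. It uses only the Swift standard library.

```swift
/// Hypothetical helper: keeps only the relevant slice of a file and restates
/// the constraints immediately before the task, instead of pasting everything.
struct FocusedPrompt {
    var sourceLines: [String]
    var relevantRange: ClosedRange<Int>  // 1-based line numbers, inclusive
    var constraints: [String]
    var ask: String

    func render() -> String {
        // Clamp the range to the file so a slightly-off range cannot crash.
        let lower = max(relevantRange.lowerBound, 1)
        let upper = min(relevantRange.upperBound, sourceLines.count)
        let slice = lower <= upper ? Array(sourceLines[(lower - 1)...(upper - 1)]) : []

        var parts = ["Relevant code (lines \(lower)-\(upper)):"]
        parts.append(contentsOf: slice)
        // Constraints go last, right before the task, so they are the most recent context.
        parts.append("Constraints (restated next to the ask):")
        parts.append(contentsOf: constraints.map { "- " + $0 })
        parts.append("Task: " + ask)
        return parts.joined(separator: "\n")
    }
}

// Stand-in for a 2,000-line file; the contents are placeholders.
let file = (1...2000).map { "// line \($0)" }

let prompt = FocusedPrompt(
    sourceLines: file,
    relevantRange: 120...122,
    constraints: ["Keep the existing view model untouched", "No new dependencies"],
    ask: "Extract the list row into its own View."
)
let rendered = prompt.render()
print(rendered)
```

The rendered prompt is eight lines long instead of two thousand, and the constraints are the last thing the model reads before the task. For a long chat session the same idea applies: re-render and resend the constraints with each focused ask rather than relying on message one.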

4. Steerability — how much control your instructions really give you

Instructions matter, but they’re not absolute — they’re a strong bias on the next-token prediction process, not a hard rule enforced like a compiler. Vague instructions get filled in with whatever’s statistically likely; specific, well-scoped instructions narrow that space a lot but never to zero.

iOS translation: this is the mechanical justification for good Description (Part 1). “Make this SwiftUI view better” is low-steerability: you’ll get a plausible answer, not necessarily the one you wanted. “Refactor this view to extract the list row into its own View that takes an Identifiable item, keep the existing @Observable view model untouched, no new dependencies” is high-steerability: you’ve collapsed the plausible-answer space down to close to what you actually need.

When Properties Collide (the course’s closing module)

The real-world insight: these four properties don’t fail one at a time, they fail together, and that’s when things get genuinely confusing to debug. The course’s own framing, applied to iOS:

A long file (Working Memory strain) that also touches a recent API (Knowledge gap) → the model quietly reaches for an older, wrong API pattern because it’s more “plausible” (Next Token Prediction) given weak knowledge, and a vague ask (“clean this up”) gave it no constraints that would have steered it elsewhere (Steerability).
This is exactly the situation that produces the worst kind of AI-generated iOS bug: it compiles, it looks idiomatic, it’s subtly using deprecated concurrency patterns, and there’s no single obvious “tell.”

Practical rule of thumb: the more of these four properties are simultaneously under strain (long context + recent API + vague prompt + something hard to verify by eye), the harder you should lean on Discernment before merging anything.
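
That rule of thumb can be written down as a small checklist, sketched here in Swift. The ChangeRisk and ReviewLevel names and the specific thresholds are assumptions for illustration, not something the course prescribes; adjust them to your team’s review habits.

```swift
/// Hypothetical review tiers, ordered from lightest to heaviest.
enum ReviewLevel: Comparable {
    case skim, readCarefully, verifyAgainstDocs, verifyAndTest
}

/// One flag per strain named in the rule of thumb above.
struct ChangeRisk {
    var longContext = false        // Working Memory strain
    var recentAPI = false          // Knowledge gap
    var vaguePrompt = false        // weak Steerability
    var hardToVerifyByEye = false  // fluent output can hide errors (Next Token Prediction)

    /// How many of the four strains apply to this change.
    var strainCount: Int {
        [longContext, recentAPI, vaguePrompt, hardToVerifyByEye].filter { $0 }.count
    }

    /// More simultaneous strains means a heavier review (thresholds are an assumption).
    var reviewLevel: ReviewLevel {
        switch strainCount {
        case 0: return .skim
        case 1: return .readCarefully
        case 2: return .verifyAgainstDocs
        default: return .verifyAndTest
        }
    }
}

let tidyRename = ChangeRisk()
let gnarlyRefactor = ChangeRisk(longContext: true, recentAPI: true, vaguePrompt: true)

print(tidyRename.reviewLevel)      // skim
print(gnarlyRefactor.reviewLevel)  // verifyAndTest
```

The exact tiers matter less than the habit: count the strains before you read the diff, not after it already looks fine.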

Quick reference: symptom → likely cause

What you’re seeing	Likely underlying property
Confident code using an API that doesn’t exist / wrong signature	Next Token Prediction outrunning Knowledge
Fine on older iOS versions, wrong on the newest APIs/Swift features	Knowledge cutoff
Model “forgot” a constraint from earlier in a long session/paste	Working Memory
Output technically follows your prompt but not what you meant	Steerability — prompt was under-specified
Everything above at once, on a gnarly refactor	Properties colliding — slow down, verify harder

One-paragraph version to remember

The model isn’t recalling facts, it’s predicting plausible next tokens (Next Token Prediction). That prediction is reliable where training data was dense and unreliable where it was thin (Knowledge), constrained by how much is currently in its context window (Working Memory), and only loosely obedient to what you actually asked for (Steerability). Real bugs show up when several of these strain at once, which is exactly when your Discernment (Part 1) needs to be sharpest.