Hackworth

Hackworth@piefed.ca · 2 days ago

The paper is more rigorous with language but can be a slog.

Hackworth@piefed.ca · 2 days ago

Hackworth@piefed.ca · 2 days ago

Anthropic has some similar findings, and they propose an architectural change (activation capping) that apparently helps keep the Assistant character away from dark traits (sometimes). But it hasn’t been implemented in any models, I assume because of the cost of scaling it up.
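Rough sketch of what "activation capping" means mechanically. This assumes it boils down to clamping how far a hidden state projects onto one chosen direction (e.g. an "Assistant" direction) into an allowed range, while leaving everything orthogonal to that direction alone. The function name `cap_activation` and the `direction`, `lo`, and `hi` parameters are hypothetical; this is not Anthropic's code, and the real method picks the direction, layers, and thresholds in ways this toy ignores.

```python
import math


def cap_activation(h, direction, lo, hi):
    """Clamp the component of hidden state `h` along `direction` into [lo, hi].

    Hypothetical illustration only: `direction`, `lo` and `hi` stand in for
    whatever persona direction and thresholds a real implementation would use.
    """
    # Normalize the direction so the projection is a plain scalar coordinate.
    norm = math.sqrt(sum(d * d for d in direction))
    v = [d / norm for d in direction]

    # How far h currently sits along that direction.
    proj = sum(a * b for a, b in zip(h, v))

    # Clamp that coordinate into the allowed range.
    capped = min(max(proj, lo), hi)

    # Shift h only along v; components orthogonal to v are untouched.
    delta = capped - proj
    return [a + delta * b for a, b in zip(h, v)]


if __name__ == "__main__":
    # Projection of 3.0 along the x-axis gets pulled back to the cap of 1.0.
    print(cap_activation([3.0, 4.0], [1.0, 0.0], lo=-1.0, hi=1.0))
```

In a real model this would run on every token's residual-stream vector at some chosen layers during generation, which is part of why it isn't free to deploy at scale.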

Hackworth@piefed.ca · 2 days ago

Hackworth@piefed.ca · 2 days ago

Hush Spain, or you don’t get Catalonia either.

Hackworth@piefed.ca · 5 days ago

I’d rather play a re-launched Vanguard: SoH.

Hackworth@piefed.ca · 6 days ago

LLMs don’t read.

Hackworth@piefed.ca · 6 days ago

This is probably role play, per the persona selection model, but there’s a lot of interesting research into the hidden “thoughts” of LLMs. Check out Neuronopedia and the Opus model cards for some great examples.

Tracing the thoughts of an LLM

Signs of introspection in LLMs

Hackworth@piefed.ca · 11 days ago

Would you like your tax return in tokens?