
  • Yes. And despite that, one in three Australian homes now has rooftop solar.

    Renewables supplied over half the national grid in Q4 2025, with roughly 7 GW of new capacity added that year alone. Nearly 200,000 home batteries were installed in the second half of 2025.

    One in three new vehicles sold now has some form of electrification, with hybrids leading the shift and petrol sales dropping 10% last year.

    Even heavy industry is moving. Australia already operates the world’s largest fully driverless freight rail network - Rio Tinto’s AutoHaul runs heavy-haul trains over 1,700 km of track across the Pilbara, controlled remotely from Perth, straight from the mine to the deep-water port at Cape Lambert.

    Battery-electric locomotives are now in trial on those same lines. Electrification is happening at every scale here - rooftop, road, and rail - often despite the politics, not because of it.








  • FWIW: extra shit I cooked last night. It’s live now, so it deserves a PS of its own.

    PPS: I built in a spam blocker as well.

    • allow-list / deny-list domain filters
    • DDG-lite junk-domain blocklist
    • ad/tracker URL rejection
    • relevance gate before any provenance upgrade

    Enjoy :) Blurb below

    “But what if it just… Googled it?”

    We can do that. But better.

    You: Who won best picture at the 97th Academy Awards?
    
    Model: Anora won best picture at the 97th Academy Awards.
    See: https://www.wdsu.com/article/2025-oscars-biggest-moments/64003102
    Confidence: medium | Source: Web
    

    Without >>web, that same 4B model said “The Fabelmans.” Then when I pushed it, “Cannes Film Festival.” With web retrieval, the router searches the internet, scores every result deterministically (phrase match + token overlap + domain trust), and only accepts evidence that passes a hard threshold. Garbage results get rejected, not served. The model never touches the answer - it’s extracted straight from the evidence.
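    A minimal sketch of what that deterministic scoring could look like - the weights, threshold, and trusted-domain list here are all illustrative, not the actual llama-conductor values:

```python
# Hedged sketch: phrase match + token overlap + domain trust, with a hard
# accept threshold. Rejected evidence is dropped, never served.
from urllib.parse import urlparse

TRUSTED_DOMAINS = {"wikipedia.org": 0.3, "pubmed.ncbi.nlm.nih.gov": 0.3}
ACCEPT_THRESHOLD = 0.5  # illustrative cut-off

def score_result(query: str, title: str, snippet: str, url: str) -> float:
    text = f"{title} {snippet}".lower()
    q = query.lower()
    # 1. exact phrase match is the strongest signal
    phrase = 0.4 if q in text else 0.0
    # 2. token overlap between query and result text
    q_tokens = set(q.split())
    overlap = len(q_tokens & set(text.split())) / max(len(q_tokens), 1)
    # 3. domain trust bonus for known-good sources
    host = urlparse(url).netloc.lower()
    trust = max((v for d, v in TRUSTED_DOMAINS.items() if host.endswith(d)),
                default=0.0)
    return phrase + 0.3 * overlap + trust

def accept(query: str, title: str, snippet: str, url: str) -> bool:
    return score_result(query, title, snippet, url) >= ACCEPT_THRESHOLD
```

    Same inputs, same score, every time - no model in the loop, so a garbage result can’t talk its way past the threshold.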

    Retrieval cascade: Cheatsheets → Wiki → Web → Model. Each step fires only if the previous one missed. The model is last resort, not first call. Sound familiar?
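    That cascade reduces to a first-hit-wins loop over ordered lanes (lane names and the toy lane functions below are illustrative):

```python
# Hedged sketch of the retrieval cascade: each lane returns an answer or
# None, and the model only fires if every grounded lane misses.
def answer(query, lanes):
    """lanes: ordered list of (label, fn) pairs; fn returns str or None."""
    for label, lane in lanes:
        hit = lane(query)
        if hit is not None:
            return hit, label  # first lane to hit wins; later lanes never fire
    return None, "Model"       # last resort: fall through to model priors

# Toy lanes standing in for Cheatsheets -> Wiki -> Web:
lanes = [
    ("Cheatsheets", lambda q: "42" if "answer to everything" in q else None),
    ("Wiki",        lambda q: None),  # miss
    ("Web",         lambda q: None),  # miss
]
```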

    I asked it who wrote a paper and it invented “Dr. David J. G. Smith” - a person who does not exist. After wiring DOI/Crossref fallback for academic metadata:

    You: Who wrote The Anatomy of a Large-Scale Hypertextual Web Search Engine?
    
    Model: Sergey Brin and Lawrence Page.
    See: https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/
    Confidence: medium | Source: Web
    

    Deterministic extraction from metadata. No model synthesis.
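    For the DOI/Crossref fallback, deterministic extraction boils down to reading fields out of the metadata record. A hedged sketch over a canned, Crossref-shaped response (the real Crossref REST API nests authors under message.items[].author; the helper name is made up):

```python
# Pure field extraction from a Crossref-style JSON record - no network,
# no model synthesis, just metadata.
def authors_from_crossref(record):
    items = record.get("message", {}).get("items", [])
    if not items:
        return None  # fail loud upstream rather than let the model guess
    names = [
        f"{a.get('given', '')} {a.get('family', '')}".strip()
        for a in items[0].get("author", [])
    ]
    return " and ".join(n for n in names if n) or None

sample = {  # canned response shaped like a Crossref reply (illustrative)
    "message": {"items": [{
        "title": ["The Anatomy of a Large-Scale Hypertextual Web Search Engine"],
        "author": [
            {"given": "Sergey", "family": "Brin"},
            {"given": "Lawrence", "family": "Page"},
        ],
    }]}
}
```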

    >>web is provider-agnostic - it ships with DuckDuckGo (no API key, no account) and supports Tavily, SearxNG, or your own adapter. Add your own trusted domains in one config line (a bunch are baked in already, like pubmed). Every answer comes with a See: URL so you can verify with one click. Receipts, not pinky promises. PS: I even cooked in allow-list / deny-list domain filters, a junk-domain blocklist and ad/tracker URL rejection, so your results don’t get fouled with low-quality spam shit.
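    Roughly how an allow-list / deny-list / tracker filter like that can work - the domains and marker strings here are made up for the example:

```python
# Hedged sketch of URL filtering: deny-list wins, optional strict
# allow-list, and ad/tracker markers rejected outright.
from urllib.parse import urlparse

DENY = {"clickbait.example", "ads.example"}   # junk-domain blocklist
ALLOW = set()  # empty allow-list means "allow anything not denied"
TRACKER_MARKERS = ("utm_", "doubleclick", "/ads/")

def _matches(host, domains):
    return any(host == d or host.endswith("." + d) for d in domains)

def url_ok(url: str) -> bool:
    host = urlparse(url).netloc.lower()
    if _matches(host, DENY):
        return False                           # deny-list always wins
    if ALLOW and not _matches(host, ALLOW):
        return False                           # strict mode: allow-list only
    return not any(m in url for m in TRACKER_MARKERS)  # tracker rejection
```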


  • No, actually it’s probably one of the strongest 4Bs you can run. On par with GPT-4.1 in many benchmarks.

    https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507

    I use the DavidAU fine-tune, which is even a touch better:

    https://huggingface.co/DavidAU/Qwen3-4B-Hivemind-Instruct-NEO-MAX-Imatrix-GGUF

    The two-models thing is a router back-end switch that reduces hallucinations when using RAG. It’s separate from, and extra to, the main stuff.

    https://codeberg.org/BobbyLLM/llama-conductor/src/branch/main/FAQ.md#what-is-mentats

    There are multiple duo / tag-team orchestrations like this (e.g. the vision model I use is Qwen3-VL-4B, which does the vision stuff and then feeds the output to “thinker” to work with, etc).

    One of the eventual goals is parallel swarm or model decomposition, with “thinker” acting as the main orchestrator.

    The swarm idea is basically: instead of asking one 4B model to do everything at once (understand, retrieve, evaluate, synthesise, check its own work), you decompose the task into tiny (<1B) single-purpose workers - evidence extractors, contradiction detectors, refusal sentinels, a synthesis worker, and an arbiter (the current “critic”) that makes the final call. “Thinker” then reasons from that output.

    Each worker is small and dumb - good at exactly one thing - which means it’s auditable and replaceable.

    Think of it as breaking the 4B metacognitive ceiling by not asking any single model to be metacognitive.

    The deterministic routing backbone stays - workers only handle the ambiguous semantic stuff that can’t be solved with pure Python. It’s not “more models = better” - it’s “right model, right job, fail loud if they disagree.”
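    A toy version of that worker/arbiter split, with fail-loud behaviour on disagreement (the worker lambdas are stand-ins for the tiny single-purpose models):

```python
# Hedged sketch: single-purpose workers vote, an arbiter only passes a
# verdict on when there's a clear majority - otherwise it fails loud.
from collections import Counter

def arbiter(verdicts):
    """Majority vote across workers; raise instead of papering over a tie."""
    counts = Counter(verdicts)
    verdict, n = counts.most_common(1)[0]
    if n <= len(verdicts) // 2:
        raise ValueError(f"workers disagree: {dict(counts)}")
    return verdict

# Stand-ins for the tiny workers (evidence extractor, contradiction
# detector, refusal sentinel); each would be its own <1B model.
workers = [
    lambda claim, evidence: "supported" if claim in evidence else "unsupported",
    lambda claim, evidence: "supported" if claim.lower() in evidence.lower()
                            else "unsupported",
    lambda claim, evidence: "supported" if evidence else "unsupported",
]

def swarm_check(claim, evidence):
    return arbiter([w(claim, evidence) for w in workers])
```

    The point is the failure mode: a tie doesn’t get averaged away, it gets surfaced.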

    Basically, the same reasoning as the research I cited in the Mentats section.

    PS: when you load it up, you might notice it refers to itself internally as MoA router. That’s pulling double duty. In normal LLM circles that means Mixture of Agents. In my world it means “Mixture of Assholes”. See below -

    YOU (question)
     → ROUTER+DOCS (“Ah shit, here we go again. I hate my life”)
        → Asshole 1: SmolLM2-135M (“I’m right”)
        → Asshole 2: SmolLM2-360M (“No, I’m right”)
        → Asshole 3: Gemma-3-270M (“Idiots, I’m right!”)
        → Asshole 4: Qwen3-1.7B (“You’re all beneath me”)
     → ARBITER: Phi-4-mini (“Shut up, all of you.”) ← (all assholes)
     → THINKER: Qwen-4B (“I’m surrounded by idiots. Fine, I’ll do it myself.”)
     → ROUTER (please, let me die)
     → YOU (answer + mad cackle)


  • Everything you see - every feature - is everything I use. None of it is ornamental.

    But my head is in the code right now, so I don’t “use it” so much as try to break it and then fix it.

    The end game is a local expert system that I can rely on, automate and audit. Because I built it and know exactly how it works.

    If you’re asking for my most common uses right now (outside of kicking it and then picking it back up):

    • sentiment analysis (“what did they mean in this email by…”)
    • document analysis
    • word etymology (I got the language thing with my ASD)
    • pilot project (see: https://lemmy.world/comment/22058968)
    • To-do lists
    • THINKING (and this is a big one for me: I’ll pose a problem, it will rubber duck it with me)
    • all the side cars (calculations, currency look ups, weather etc)
    • drafting ideas and research
    • shooting the shit when bored (the local version of Claude-in-a-can is a bit more advanced than what’s on the repo; not stable yet. But when it cooks, fuck me, it cooks. Will not push it till it’s 100%).

    Basically, all the shit you would ideally like to use an LLM for, but self hosted, private and non-bullshitty. I run on a potato (so don’t really use it for coding very much) but if you have a better rig than mine and can run bigger models - the router is agnostic and it should just work ™.

    TL;DR:

    What I’m building towards: a local expert system that picks its own tools (I coded), executes them (how I taught it to), and gives me a single-line audit receipt for every decision (that I can check if it smells funny). I ask a question, the system decides whether to calculate, look up, search, retrieve from my docs, or reason from scratch - then tells me exactly which path it took and why. Think ChatGPT convenience but with a paper trail you can actually inspect.

    And when that’s done… I’ll probably stick it in a robot. Because why not? :)

    https://github.com/poboisvert/GPTARS_Interstellar

    (or tee it up with Home-Assistant)

    PS: If you want to know the why behind this whole thing -

    https://codeberg.org/BobbyLLM/llama-conductor/src/branch/main/DESIGN.md

    PPS: Give me about … 15 mins. I’m just about to push a >>web sidecar. It needs one more tweak to properly parse DOI / PubMed extraction. I was bored and it’s been on my TO-DO list for too long.

    PPPS: Those were some Planet Namek 15 minutes…but the deed is done. Enjoy



  • You’re describing the trust dynamics right, and that’s exactly why this project doesn’t ask you to trust the model. It asks you to trust observable outputs: provenance labels, deterministic lanes, fail-loud behaviour.

    When it fails, you can see exactly which layer failed and why. Then you can fix it yourself. That’s more than you get right now (and in part why LLMs are considered toxic).

    The correction mechanism is explicit rather than hoped for (“it learns” or “it earns my trust back”): you encode the fix via cheatsheets, memory, or lane contracts and it sticks permanently.

    The model can’t drift back to the wrong answer. That’s not the model earning trust back - it’s you patching the ground truth it reasons from. Progress is measured in artifacts, not vibes.

    Until someone makes better AI, that’s all we’ve got. Generally, we don’t get even this much.

    Sadly, AI isn’t “one mind learning”; it can’t be. So trust is earned by shrinking failure classes and proving the fix stuck, again and again and again (aka making sure the tool does what it should be doing).

    Whether that’s satisfying in the way a person earning trust back is satisfying - look, honestly, probably not. But it’s more auditable.

    LLMs aren’t people and I’m ok with meeting them where they are.


  • Nope.

    1. Source: Model is not pretending otherwise
      It is basically “priors lane.” That’s the point of the label: explicit uncertainty, not fake certainty.

    2. Source footer is harness-generated, not model-authored
      In this stack, footer normalization happens post-generation in Python. I’ve specifically hardened this because of earlier bleed cases. So the model does not get to self-award Wiki/Docs/Cheatsheets etc.

    3. Model lane is controlled, not roulette

    • deterministic-first routing where applicable
    • fail-loud behavior in grounded lanes
    • provenance downgrade when grounding didn’t actually occur

    So yes: Source: Model means “less trustworthy, verify me.” Always do that. Don’t trust the stochastic parrot.

    But also no: it’s not equivalent to a silent hallucination system pretending to be grounded. That’s exactly what the provenance layer is there to prevent.
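    Point 2 - the harness-generated footer - could be sketched like this; the labels, regex and confidence mapping are illustrative, not the repo’s actual code:

```python
# Hedged sketch of post-generation footer normalization: the provenance
# line is stamped in Python after the model finishes, so the model can't
# self-award a grounded source.
import re

GROUNDED = {"Wiki", "Docs", "Cheatsheets", "Web", "Context"}

def stamp_footer(text, lane_fired):
    # strip any footer the model tried to write itself
    body = re.sub(r"\n?Confidence:.*$", "", text, flags=re.S).rstrip()
    # provenance downgrade: no grounded lane fired -> Source: Model
    source = lane_fired if lane_fired in GROUNDED else "Model"
    conf = "medium" if source != "Model" else "low"
    return f"{body}\nConfidence: {conf} | Source: {source}"
```

    Even if the model writes “Source: Wiki” in its output, the harness strips it and re-stamps from what actually fired.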


  • Sure.

    Source means where the answer was grounded, not whether an LLM wrote the sentence.

    Quick split:

    • Source: Model
      No reliable grounding lane fired. It’s model priors.

    • Source: Context (Contextual)
      A deterministic lane fired and built a structured context for the turn (for example state/math carry-forward, bounded prior-turn facts, or a forced context frame), and the answer is expected to come from that frame.

    Key clarification:

    • Not all user input = Context.
    • User input becomes Context only when it is captured into a bounded deterministic frame/lane and used as grounding.
    • If user input is just normal chat and no grounding lane fires, that is still Model.

    Why this is more deterministic:

    • The routing decision is deterministic (same input pattern -> same lane).
    • The frame/evidence injected is deterministic (same extracted values -> same context block).
    • Wording can vary, but the answer is constrained to that frame.

    Concrete example:

    1. User: A jar has 12 marbles. I remove 3. How many left?
    2. Router hits deterministic state lane, computes 9, injects structured context.
    3. Assistant answers with Source: Context.

    If that lane doesn’t fire (or parse fails), it falls back to normal generation and you get Source: Model.

    So Context is not “perfect truth”; it means “grounded via deterministic context pipeline, not free priors.”
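    The marble example reduces to a parse-compute-inject lane; a toy sketch (the pattern and return shape are illustrative):

```python
# Hedged sketch of a deterministic state lane: regex parse, pure-Python
# arithmetic, and Source: Context only when the parse actually succeeded.
import re

PATTERN = re.compile(r"has (\d+) marbles\. I (remove|add) (\d+)", re.I)

def state_lane(user_msg):
    m = PATTERN.search(user_msg)
    if m is None:
        return None, "Model"  # lane didn't fire -> fall back to model priors
    start, op, delta = int(m.group(1)), m.group(2).lower(), int(m.group(3))
    left = start - delta if op == "remove" else start + delta
    return left, "Context"    # deterministic frame the answer is built from
```

    Same input pattern, same lane; same extracted values, same context block. Only the wording around the number can vary.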

    I hope that clarifies. I can try a different way if not; my brain is inside the code so much sometimes I forget what’s obvious to me really isn’t obvious.




  • I genuinely don’t know. A small part of llama-conductor is a triple-pass RAG system, using Qdrant, but the interesting bit is what sits on top of it: a thinker/critic/thinker pipeline over RAG retrieval.

    • Step 1 (Thinker): Draft answer using only the retrieved FACTS_BLOCK
    • Step 2 (Critic): Check for overstatement, constraint violations
    • Step 3 (Thinker): Fix issues, output structured answer
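    Those three steps can be sketched as one function, with llm() as a placeholder for whatever backend you run (the prompts and names are illustrative, not the repo’s):

```python
# Hedged sketch of the thinker -> critic -> thinker loop over a
# retrieved FACTS_BLOCK.
def answer_with_critic(llm, question, facts_block):
    # Step 1 (Thinker): draft strictly from the retrieved facts
    draft = llm(
        f"Answer using ONLY these facts:\n{facts_block}\nQ: {question}"
    )
    # Step 2 (Critic): hunt for overstatement / constraint violations
    critique = llm(
        f"Facts:\n{facts_block}\nDraft: {draft}\n"
        "List overstatements or constraint violations, or say OK."
    )
    if critique.strip() == "OK":
        return draft              # pass 2 found nothing to fix
    # Step 3 (Thinker): repair the draft against the critique
    return llm(
        f"Facts:\n{facts_block}\nDraft: {draft}\nIssues: {critique}\n"
        "Rewrite the answer fixing the issues, facts only."
    )
```

    The critic pass is cheap insurance: it only costs a second call when the draft is already clean, and a third when it isn’t.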

    I built it that way based on what the research shows works best to reduce hallucinations:

    Let’s Verify Step by Step,

    Inverse Knowledge Search over Verifiable Reasoning

    To be honest, I have been looking at converting to CAG (Cache-Augmented Generation) or GAG (Graph-Augmented Generation). The issues: GAG still has hops, and CAG eats VRAM fast. Technically, for a small, curated domain, CAG can outperform RAG (because you eliminate the retrieval lottery entirely), but on a potato that VRAM ceiling arrives fast.

    OTOH, for a domain-specific knowledge base like you’re describing, CAG is worth serious evaluation.

    Needs more braining on my end.