
  • Yes. And despite that, one in three Australian homes now has rooftop solar.

    Renewables supplied over half the national grid in Q4 2025, with roughly 7 GW of new capacity added that year alone. Nearly 200,000 home batteries were installed in the second half of 2025.

    One in three new vehicles sold now has some form of electrification, with hybrids leading the shift and petrol sales dropping 10% last year.

    Even heavy industry is moving. Australia already operates the world’s largest fully driverless freight rail network - Rio Tinto’s AutoHaul runs heavy-haul trains over 1,700 km of track across the Pilbara, controlled remotely from Perth, straight from the mine to the deep-water port at Cape Lambert.

    Battery-electric locomotives are now in trial on those same lines. Electrification is happening at every scale here - rooftop, road, and rail - often despite the politics, not because of it.








  • FWIW: extra shit I cooked last night. It’s live now, so it deserves a PS of its own.

    PPS: I built in a spam blocker as well.

    • allow-list / deny-list domain filters
    • DDG-lite junk-domain blocklist
    • ad/tracker URL rejection
    • relevance gate before any provenance upgrade

    Enjoy :) Blurb below

    “But what if it just… Googled it?”

    We can do that. But better.

    You: Who won best picture at the 97th Academy Awards?
    
    Model: Anora won best picture at the 97th Academy Awards.
    See: https://www.wdsu.com/article/2025-oscars-biggest-moments/64003102
    Confidence: medium | Source: Web
    

    Without >>web, that same 4B model said “The Fabelmans.” Then when I pushed it, “Cannes Film Festival.” With web retrieval, the router searches the internet, scores every result deterministically (phrase match + token overlap + domain trust), and only accepts evidence that passes a hard threshold. Garbage results get rejected, not served. The model never touches the answer - it’s extracted straight from the evidence.
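    A minimal sketch of what that deterministic scoring could look like - the weights, threshold, and trusted-domain list here are all illustrative, not the actual llama-conductor values:

```python
# Hedged sketch: phrase match + token overlap + domain trust, with a hard
# accept threshold. Rejected evidence is dropped, never served.
from urllib.parse import urlparse

TRUSTED_DOMAINS = {"wikipedia.org": 0.3, "pubmed.ncbi.nlm.nih.gov": 0.3}
ACCEPT_THRESHOLD = 0.5  # illustrative cut-off

def score_result(query: str, title: str, snippet: str, url: str) -> float:
    text = f"{title} {snippet}".lower()
    q = query.lower()
    # 1. exact phrase match is the strongest signal
    phrase = 0.4 if q in text else 0.0
    # 2. token overlap between query and result text
    q_tokens = set(q.split())
    overlap = len(q_tokens & set(text.split())) / max(len(q_tokens), 1)
    # 3. domain trust bonus for known-good sources
    host = urlparse(url).netloc.lower()
    trust = max((v for d, v in TRUSTED_DOMAINS.items() if host.endswith(d)),
                default=0.0)
    return phrase + 0.3 * overlap + trust

def accept(query: str, title: str, snippet: str, url: str) -> bool:
    return score_result(query, title, snippet, url) >= ACCEPT_THRESHOLD
```

    Same inputs, same score, every time - no model in the loop, so a garbage result can’t talk its way past the threshold.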

    Retrieval cascade: Cheatsheets → Wiki → Web → Model. Each step fires only if the previous one missed. The model is last resort, not first call. Sound familiar?
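    That cascade reduces to a first-hit-wins loop over ordered lanes (lane names and the toy lane functions below are illustrative):

```python
# Hedged sketch of the retrieval cascade: each lane returns an answer or
# None, and the model only fires if every grounded lane misses.
def answer(query, lanes):
    """lanes: ordered list of (label, fn) pairs; fn returns str or None."""
    for label, lane in lanes:
        hit = lane(query)
        if hit is not None:
            return hit, label  # first lane to hit wins; later lanes never fire
    return None, "Model"       # last resort: fall through to model priors

# Toy lanes standing in for Cheatsheets -> Wiki -> Web:
lanes = [
    ("Cheatsheets", lambda q: "42" if "answer to everything" in q else None),
    ("Wiki",        lambda q: None),  # miss
    ("Web",         lambda q: None),  # miss
]
```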

    I asked it who wrote a paper and it invented “Dr. David J. G. Smith” - a person who does not exist. After wiring DOI/Crossref fallback for academic metadata:

    You: Who wrote The Anatomy of a Large-Scale Hypertextual Web Search Engine?
    
    Model: Sergey Brin and Lawrence Page.
    See: https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/
    Confidence: medium | Source: Web
    

    Deterministic extraction from metadata. No model synthesis.
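    For the DOI/Crossref fallback, deterministic extraction boils down to reading fields out of the metadata record. A hedged sketch over a canned, Crossref-shaped response (the real Crossref REST API nests authors under message.items[].author; the helper name is made up):

```python
# Pure field extraction from a Crossref-style JSON record - no network,
# no model synthesis, just metadata.
def authors_from_crossref(record):
    items = record.get("message", {}).get("items", [])
    if not items:
        return None  # fail loud upstream rather than let the model guess
    names = [
        f"{a.get('given', '')} {a.get('family', '')}".strip()
        for a in items[0].get("author", [])
    ]
    return " and ".join(n for n in names if n) or None

sample = {  # canned response shaped like a Crossref reply (illustrative)
    "message": {"items": [{
        "title": ["The Anatomy of a Large-Scale Hypertextual Web Search Engine"],
        "author": [
            {"given": "Sergey", "family": "Brin"},
            {"given": "Lawrence", "family": "Page"},
        ],
    }]}
}
```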

    >>web is provider-agnostic - it ships with DuckDuckGo (no API key, no account) and supports Tavily, SearxNG, or your own adapter. Add your own trusted domains in one config line (a bunch are baked in already, like pubmed). Every answer comes with a See: URL so you can verify with one click. Receipts, not pinky promises. PS: I even cooked in allow-list / deny-list domain filters, a junk-domain blocklist and ad/tracker URL rejection, so your results don’t get fouled with low-quality spam shit.
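    Roughly how an allow-list / deny-list / tracker filter like that can work - the domains and marker strings here are made up for the example:

```python
# Hedged sketch of URL filtering: deny-list wins, optional strict
# allow-list, and ad/tracker markers rejected outright.
from urllib.parse import urlparse

DENY = {"clickbait.example", "ads.example"}   # junk-domain blocklist
ALLOW = set()  # empty allow-list means "allow anything not denied"
TRACKER_MARKERS = ("utm_", "doubleclick", "/ads/")

def _matches(host, domains):
    return any(host == d or host.endswith("." + d) for d in domains)

def url_ok(url: str) -> bool:
    host = urlparse(url).netloc.lower()
    if _matches(host, DENY):
        return False                           # deny-list always wins
    if ALLOW and not _matches(host, ALLOW):
        return False                           # strict mode: allow-list only
    return not any(m in url for m in TRACKER_MARKERS)  # tracker rejection
```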


  • No, actually it’s probably one of the strongest 4Bs you can run. On par with GPT-4.1 in many benchmarks.

    https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507

    I use the DavidAU fine-tune, which is even a touch better:

    https://huggingface.co/DavidAU/Qwen3-4B-Hivemind-Instruct-NEO-MAX-Imatrix-GGUF

    The two-models thing is a router back-end switch that reduces hallucinations when using RAG. It’s separate from, and extra to, the main stuff.

    https://codeberg.org/BobbyLLM/llama-conductor/src/branch/main/FAQ.md#what-is-mentats

    There are multiple duo / tag-team orchestrations like this (e.g. the vision model I use is Qwen3-VL-4B, which does the vision stuff and then feeds the output to “thinker” to work with, etc).

    One of the eventual goals is parallel swarm or model decomposition, with “thinker” acting as the main orchestrator.

    The swarm idea is basically: instead of asking one 4B model to do everything at once (understand, retrieve, evaluate, synthesise, check its own work), you decompose the task into tiny (<1B) single-purpose workers - evidence extractors, contradiction detectors, refusal sentinels, a synthesis worker, and an arbiter (the current “critic”) that makes the final call. “Thinker” then reasons from that output.

    Each worker is small and dumb - good at exactly one thing - which means it’s auditable and replaceable.

    Think of it as breaking the 4B metacognitive ceiling by not asking any single model to be metacognitive.

    The deterministic routing backbone stays - workers only handle the ambiguous semantic stuff that can’t be solved with pure Python. It’s not “more models = better” - it’s “right model, right job, fail loud if they disagree.”
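    A toy version of that worker/arbiter split, with fail-loud behaviour on disagreement (the worker lambdas are stand-ins for the tiny single-purpose models):

```python
# Hedged sketch: single-purpose workers vote, an arbiter only passes a
# verdict on when there's a clear majority - otherwise it fails loud.
from collections import Counter

def arbiter(verdicts):
    """Majority vote across workers; raise instead of papering over a tie."""
    counts = Counter(verdicts)
    verdict, n = counts.most_common(1)[0]
    if n <= len(verdicts) // 2:
        raise ValueError(f"workers disagree: {dict(counts)}")
    return verdict

# Stand-ins for the tiny workers (evidence extractor, contradiction
# detector, refusal sentinel); each would be its own <1B model.
workers = [
    lambda claim, evidence: "supported" if claim in evidence else "unsupported",
    lambda claim, evidence: "supported" if claim.lower() in evidence.lower()
                            else "unsupported",
    lambda claim, evidence: "supported" if evidence else "unsupported",
]

def swarm_check(claim, evidence):
    return arbiter([w(claim, evidence) for w in workers])
```

    The point is the failure mode: a tie doesn’t get averaged away, it gets surfaced.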

    Basically, the same reasoning as the research I cited in the Mentats section.

    PS: when you load it up, you might notice it refers to itself internally as MoA router. That’s pulling double duty. In normal LLM circles that means Mixture of Agents. In my world it means “Mixture of Assholes”. See below -

    YOU (question)
     → ROUTER+DOCS (“Ah shit, here we go again. I hate my life”)
        → Asshole 1: SmolLM2-135M (“I’m right”)
        → Asshole 2: SmolLM2-360M (“No, I’m right”)
        → Asshole 3: Gemma-3-270M (“Idiots, I’m right!”)
        → Asshole 4: Qwen3-1.7B (“You’re all beneath me”)
     → ARBITER: Phi-4-mini (“Shut up, all of you.”) ← (all assholes)
     → THINKER: Qwen-4B (“I’m surrounded by idiots. Fine, I’ll do it myself.”)
     → ROUTER (please, let me die)
     → YOU (answer + mad cackle)


  • Everything you see - every feature - is everything I use. None of it is ornamental.

    But my head is in the code right now, so I don’t “use it” so much as try to break it and then fix it.

    The end game is a local expert system that I can rely on, automate and audit. Because I built it and know exactly how it works.

    If you’re asking for my most common uses right now (outside of kicking it and then picking it back up):

    • sentiment analysis (“what did they mean in this email by…”)
    • document analysis
    • word etymology (I got the language thing with my ASD)
    • pilot project (see: https://lemmy.world/comment/22058968)
    • To-do lists
    • THINKING (and this is a big one for me: I’ll pose a problem, it will rubber duck it with me)
    • all the side cars (calculations, currency look ups, weather etc)
    • drafting ideas and research
    • shooting the shit when bored (the local version of Claude-in-a-can is a bit more advanced than what’s on the repo; not stable yet. But when it cooks, fuck me, it cooks. Will not push it till it’s 100%).

    Basically, all the shit you would ideally like to use an LLM for, but self hosted, private and non-bullshitty. I run on a potato (so don’t really use it for coding very much) but if you have a better rig than mine and can run bigger models - the router is agnostic and it should just work ™.

    TL;DR:

    What I’m building towards: a local expert system that picks its own tools (I coded), executes them (how I taught it to), and gives me a single-line audit receipt for every decision (that I can check if it smells funny). I ask a question, the system decides whether to calculate, look up, search, retrieve from my docs, or reason from scratch - then tells me exactly which path it took and why. Think ChatGPT convenience but with a paper trail you can actually inspect.

    And when that’s done… I’ll probably stick it in a robot. Because why not? :)

    https://github.com/poboisvert/GPTARS_Interstellar

    (or tee it up with Home-Assistant)

    PS: If you want to know the why behind this whole thing -

    https://codeberg.org/BobbyLLM/llama-conductor/src/branch/main/DESIGN.md

    PPS: Give me about … 15 mins. I’m just about to push a >>web sidecar. It needs one more tweak to properly parse DOI / PubMed extraction. I was bored and it’s been on my TO-DO list for too long.

    PPPS: Those were some Planet Namek 15 minutes…but the deed is done. Enjoy



  • You’re describing the trust dynamics right, and that’s exactly why this project doesn’t ask you to trust the model. It asks you to trust observable outputs: provenance labels, deterministic lanes, fail-loud behaviour.

    When it fails, you can see exactly which layer failed and why. Then you can fix it yourself. That’s more than you get right now (and in part why LLMs are considered toxic).

    The correction mechanism is explicit rather than hoped for (“it learns” or “it earns my trust back”): you encode the fix via cheatsheets, memory, or lane contracts and it sticks permanently.

    The model can’t drift back to the wrong answer. That’s not the model earning trust back - it’s you patching the ground truth it reasons from. Progress is measured in artifacts, not vibes.

    Until someone makes better AI, that’s all we’ve got. Generally, we don’t get even this much.

    Sadly, AI isn’t “one mind learning”; it can’t be. So trust is earned by shrinking failure classes and proving the fix stuck, again and again and again (aka making sure the tool does what it should be doing).

    Whether that’s satisfying in the way a person earning trust back is satisfying - look, honestly, probably not. But it’s more auditable.

    LLMs aren’t people and I’m ok with meeting them where they are.


  • Nope.

    1. Source: Model is not pretending otherwise
      It is basically “priors lane.” That’s the point of the label: explicit uncertainty, not fake certainty.

    2. Source footer is harness-generated, not model-authored
      In this stack, footer normalization happens post-generation in Python. I’ve specifically hardened this because of earlier bleed cases. So the model does not get to self-award Wiki/Docs/Cheatsheets etc.

    3. Model lane is controlled, not roulette

    • deterministic-first routing where applicable
    • fail-loud behavior in grounded lanes
    • provenance downgrade when grounding didn’t actually occur

    So yes: Source: Model means “less trustworthy, verify me.” Always do that. Don’t trust the stochastic parrot.

    But also no: it’s not equivalent to a silent hallucination system pretending to be grounded. That’s exactly what the provenance layer is there to prevent.
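    Point 2 - the harness-generated footer - could be sketched like this; the labels, regex and confidence mapping are illustrative, not the repo’s actual code:

```python
# Hedged sketch of post-generation footer normalization: the provenance
# line is stamped in Python after the model finishes, so the model can't
# self-award a grounded source.
import re

GROUNDED = {"Wiki", "Docs", "Cheatsheets", "Web", "Context"}

def stamp_footer(text, lane_fired):
    # strip any footer the model tried to write itself
    body = re.sub(r"\n?Confidence:.*$", "", text, flags=re.S).rstrip()
    # provenance downgrade: no grounded lane fired -> Source: Model
    source = lane_fired if lane_fired in GROUNDED else "Model"
    conf = "medium" if source != "Model" else "low"
    return f"{body}\nConfidence: {conf} | Source: {source}"
```

    Even if the model writes “Source: Wiki” in its output, the harness strips it and re-stamps from what actually fired.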


  • Sure.

    Source means where the answer was grounded, not whether an LLM wrote the sentence.

    Quick split:

    • Source: Model
      No reliable grounding lane fired. It’s model priors.

    • Source: Context (Contextual)
      A deterministic lane fired and built a structured context for the turn (for example state/math carry-forward, bounded prior-turn facts, or a forced context frame), and the answer is expected to come from that frame.

    Key clarification:

    • Not all user input = Context.
    • User input becomes Context only when it is captured into a bounded deterministic frame/lane and used as grounding.
    • If user input is just normal chat and no grounding lane fires, that is still Model.

    Why this is more deterministic:

    • The routing decision is deterministic (same input pattern -> same lane).
    • The frame/evidence injected is deterministic (same extracted values -> same context block).
    • Wording can vary, but the answer is constrained to that frame.

    Concrete example:

    1. User: A jar has 12 marbles. I remove 3. How many left?
    2. Router hits deterministic state lane, computes 9, injects structured context.
    3. Assistant answers with Source: Context.

    If that lane doesn’t fire (or parse fails), it falls back to normal generation and you get Source: Model.

    So Context is not “perfect truth”; it means “grounded via deterministic context pipeline, not free priors.”
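    The marble example reduces to a parse-compute-inject lane; a toy sketch (the pattern and return shape are illustrative):

```python
# Hedged sketch of a deterministic state lane: regex parse, pure-Python
# arithmetic, and Source: Context only when the parse actually succeeded.
import re

PATTERN = re.compile(r"has (\d+) marbles\. I (remove|add) (\d+)", re.I)

def state_lane(user_msg):
    m = PATTERN.search(user_msg)
    if m is None:
        return None, "Model"  # lane didn't fire -> fall back to model priors
    start, op, delta = int(m.group(1)), m.group(2).lower(), int(m.group(3))
    left = start - delta if op == "remove" else start + delta
    return left, "Context"    # deterministic frame the answer is built from
```

    Same input pattern, same lane; same extracted values, same context block. Only the wording around the number can vary.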

    I hope that clarifies. I can try a different way if not; my brain is inside the code so much sometimes I forget what’s obvious to me really isn’t obvious.




  • I genuinely don’t know. A small part of llama-conductor is a triple-pass RAG system, using Qdrant, but the interesting bit is what sits on top of it: a thinker/critic/thinker pipeline over RAG retrieval.

    • Step 1 (Thinker): Draft answer using only the retrieved FACTS_BLOCK
    • Step 2 (Critic): Check for overstatement, constraint violations
    • Step 3 (Thinker): Fix issues, output structured answer
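    Those three steps can be sketched as one function, with llm() as a placeholder for whatever backend you run (the prompts and names are illustrative, not the repo’s):

```python
# Hedged sketch of the thinker -> critic -> thinker loop over a
# retrieved FACTS_BLOCK.
def answer_with_critic(llm, question, facts_block):
    # Step 1 (Thinker): draft strictly from the retrieved facts
    draft = llm(
        f"Answer using ONLY these facts:\n{facts_block}\nQ: {question}"
    )
    # Step 2 (Critic): hunt for overstatement / constraint violations
    critique = llm(
        f"Facts:\n{facts_block}\nDraft: {draft}\n"
        "List overstatements or constraint violations, or say OK."
    )
    if critique.strip() == "OK":
        return draft              # pass 2 found nothing to fix
    # Step 3 (Thinker): repair the draft against the critique
    return llm(
        f"Facts:\n{facts_block}\nDraft: {draft}\nIssues: {critique}\n"
        "Rewrite the answer fixing the issues, facts only."
    )
```

    The critic pass is cheap insurance: it only costs a second call when the draft is already clean, and a third when it isn’t.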

    I built it that way based on what the research shows works best to reduce hallucinations:

    Let’s Verify Step by Step,

    Inverse Knowledge Search over Verifiable Reasoning

    To be honest, I have been looking at converting to CAG (Cache-Augmented Generation) or GAG (Graph-Augmented Generation). The issues: GAG still has hops, and CAG eats VRAM fast. Technically, for a small, curated domain, CAG can outperform RAG (because you eliminate the retrieval lottery entirely), but on a potato that VRAM ceiling arrives fast.

    OTOH, for a domain-specific knowledge base like you’re describing, CAG is worth serious evaluation.

    Needs more braining on my end.