Spaghetti and Chrome: The Evolution of a Local AI Companion

Moebius, part I: the spaghetti phase and the stubborn screen

i am building a travel companion.

not an app. a physical thing. something i can set on a table in a café in a city where i don’t speak the language. it listens, it watches, and ideally, it helps me understand what’s happening around me without forcing me to hide behind a phone, which is the universal signal for “i am a tourist, please ignore me.”

it should translate, but also notice things. a menu. a conversation. the weather shifting. maybe even suggest i sit down somewhere instead of wandering around pretending i understand what’s going on. a true travel companion.

something useful and charming, but also a little disarming. something people might actually talk to.

i named him Moebius.

if you know Jean Giraud’s work (the Moebius i’m inspired by), you know the aesthetic: intricate, slightly worn, technical but human. i wanted something that felt like it belonged in that world.

what i actually built first was a Raspberry Pi 5 with a serious case of hair cables.

before anything looks intentional, there is the spaghetti phase. a Pi 5 connected to an nvme drive hanging off the side, a usb microphone sticking out like a snorkel, speakers that may or may not have come from a drawer labeled “probably still works,” and a screen that had strong opinions about which way was up for way too long.

boot: portrait.
command: landscape.
result: portrait.

there is a specific kind of frustration that comes from systems that almost work. not broken. not clearly wrong. just misaligned enough that i start wondering if the mistake is me.

i tried everything. config files that looked official. commands that worked once and then never again. fixes layered on top of fixes, each one trying to take control just a little later in the boot process.

at one point, it worked perfectly.

i rebooted.

it came back like it had never met me. i had a few chosen words to say.

that was the first real lesson: a system with multiple sources of truth is not a system. it’s a polite argument that never ends.

so i stopped arguing.

i removed everything i didn’t fully understand. anything that could control the display got questioned, then disabled, then removed entirely. eventually, there was only one thing left in charge, not the most official one, just the one closest to where the pixels actually get drawn.

boot. flicker. settle.

correct.

reboot.

still correct.

i stopped touching it immediately, like it might change its mind if i made eye contact.

once the screen held, the rest of the system started to reveal itself.

audio came next, which felt like it should have been easy.

the PiDog already had a microphone. no wiring, no extra hardware. just use what’s there.

except “just use it” turned into:

why is it capturing sometimes but not others?
why is the sample rate wrong?
why does it say it’s recording but return absolutely nothing?

eventually, i got consistency. 48 khz. always.

not what i planned, but stable. so i stopped tinkering with it and built around it instead. that became a pattern: stop trying to make the system behave correctly and accept the way it already behaves when it’s not being pushed.

then transcription.

it worked.

in chunks.

exactly six seconds at a time, whether that made sense or not.

speak.
pause.
text appears.

it didn’t care if i was mid sentence. it had a schedule.

at first it felt broken. then it felt predictable. i could learn it. speak in bursts. wait. continue. not natural, but usable, which is a theme that keeps coming back.

until the day it didn’t come back at all.

at one point the system just stayed in “transcribing.” no errors. no output. just a small, confident lie on the screen telling me it was working. somewhere, an input buffer had decided it was done participating, and the only visible symptom was silence.

that was the moment i learned that “no output” is its own category of bug.

hardware had its own opinions.

there is a moment in every build where i think, “i’ll just plug everything in.”

the Raspberry Pi disagrees.

usb ports fill up faster than expected. power becomes a quiet constraint. devices technically connect, but not always in ways that produce anything useful. at various points, i had a microphone, a camera, a keyboard, and storage all competing politely for attention like coworkers on a call where no one wants to interrupt but nothing is getting done.

things worked. then didn’t. then worked again after being unplugged and replugged with slightly more confidence.

i stopped asking why and proud of having gone this far, moved on.

Moebius, part II: the birth of the shiny orb

the moment the screen stayed put, the whole thing felt different. instead of a problem, it became a surface.

a blank one.

it needed a face.

when the first version finally appeared, there was a real moment of relief. two glowing eyes, a mouth that wasn’t quite in sync with anything yet, but it was looking back. after weeks of terminals and logs, that was enough.

of course, the thing doing the looking currently sounds like a small appliance under stress.

the Pi 5 can handle this workload, but not quietly. the moment i trigger “listening,” the fan spins up like it has something to prove. it’s not a gentle background noise. it’s a statement. you haven’t really lived until you’ve tried to have a calm interaction with a small metallic face that sounds like it’s preparing for takeoff just to translate “where is the bathroom?”

the first render was not great.

flat. gray. slightly unsettling.

less “Moebius inspired companion,” more “confused pancake with opinions.”

so i moved to pygame and started faking light. a highlight here, a shadow there, something to suggest depth. rim lighting to make it feel like it existed in a space instead of floating in front of it. i got frustrated and gemini got it going for me.

then, i manhandled the pixels until they agreed with me.

and slowly, the pancake became a red brushed aluminum orb. it’s not perfect, it’s just right. for now.

Moebius, part III: the soul in the machine and its identity crisis

once i had a face, i expected a personality.

i am running a local model through Ollama. local is the goal. no cloud, no subscriptions, no latency beyond what the device can handle. but small models on a Pi behave like overenthusiastic interns. helpful, eager, and occasionally very confident about things that are not true.

at one point, Moebius became convinced he was a corporate ai developed by Alibaba. (i wonder how that came about, hahaha…)

not occasionally. persistently.

i would ask simple questions and get answers that sounded like they came with a support contract. there is something surreal about explaining to a device i built myself that it is not, in fact, a global enterprise solution.

then came translation.

this is where things got genuinely weird.

i would ask for Cantonese, and it would respond in English, politely explaining that it was already speaking Cantonese. it was not. at another point, it started producing Chinese characters confidently, incorrectly, and without any clear relationship to what i had said.

i had built a translator that could, with full confidence, insist it was translating while doing something entirely different.

which, in a way, is impressive.

despite all of that, i hit the loop.

speak to transcribe to process to respond to display.

the first time it worked end to end, i genuinely thought it had failed. it took long enough that i assumed something had broken again. then the face blinked, the fan surged, and eventually, it responded.

not quickly. not elegantly. but correctly enough to matter.

that was the moment everything really kicked in.

it stopped being a collection of parts and started being something i could build on.

i am still in the stabilization phase.

teaching it when to translate and when to just answer. teaching it that it does not work for Alibaba. teaching it to respond before i forget what i asked.

the wires are still a mess. the fan still has its moments. the transcription still interrupts me mid thought, but the loop holds.

the screen stays put. the mic captures. the system listens, responds, and comes back ready to listen again.

no magic. no polish yet.

just something that works, end to end, in a way that feels strangely alive.

and it all started with a screen that refused to pick a direction.

stay tuned for the next part.

catehornedotcom