It's 6:30 AM on an ordinary Tuesday. It's still dark outside, the house is silent, and I'm still dreaming. But on the Mac Studio in the corner of my desk, something has just awakened: the scheduler launched the pipeline, and a tiny invisible newsroom has started its shift.

In the following minutes, the "magic" happens. Hundreds of sources are combed through, noise is discarded, genuinely important news is selected and ranked. One AI agent reads them and decides which ones deserve to be told. Another writes them — in both Italian and English — with cited sources and historical context. A third checks for errors. Then come the cover images — a real photo (with credit) when available, a generated image when needed — and finally, publication: website, Telegram, LinkedIn, and WhatsApp.

By the time I sit at my desk with coffee in hand, the articles are already online. I did nothing.

The first time I witnessed a full run without a single manual intervention, I felt the same sensation as when a very long line of dominoes falls exactly as planned: it's not magic, it's the result of weeks of attempts, failures, corrections, and small but steady improvements.

The result is already online for everyone to see: https://www.alextech.ai/news/

This article is the story of how I got there. It's not a tutorial, and you won't find parameters, prompts, or ready-made recipes: it's the story of the choices I made, the compromises I accepted, and the mistakes that taught me the most. The human is still in the system, but wearing the hat of an editor-in-chief, not a reporter. Everything else is handled by two language models, one embedding model, and two image generation pipelines, orchestrated by a multi-agent architecture that reproduces the entire editorial workflow: source research, selection, writing, correction, cover images, and publication.

Of course, the system isn't perfect: every now and then a grammatical slip gets through, some cover images are off-topic, and inevitably it lacks the intuition, empathy, and sensitivity that only a real person can bring. Nevertheless, in its own small way, this experiment changes the rules of the game. It shows that today, with very limited resources, you can multiply your productivity to previously unimaginable levels, turning your computer into a 24/7 content factory.

The results are certainly improvable and lack creative spark, but they are absolutely usable and useful. And we're talking about results achieved with open-source LLMs running locally on my trusty Mac Studio M2 Ultra with 64 GB of RAM: a machine that, while not the latest, is still very respectable and, before the recent price surge, was affordable on the used market even for a hobbyist. I want to emphasize again that all models used in this project are free and have just a few tens of billions of parameters. Imagine what could be achieved with a commercial model on the level of the recent Claude Fable 5.

One final note: I know the online news world inside out, and I can say I'm among its pioneers. It all started in 1995, when I founded the university newsletter Mondo Bit. A couple of years later, I had the privilege of joining the editorial staff of what was then Punto Informatico, starting as a simple writer and eventually becoming a minority partner alongside the two founders, brothers Andrea and Paolo De Andreis. It's this field experience that guided my crucial choices in the project, first and foremost in writing the prompts for the LLMs.

1. Why Build an AI Newsroom Locally

The obvious question: why not simply use APIs from OpenAI, Anthropic, or Google?

Cost. An automated newsroom capable of generating dozens of articles per day requires processing millions of tokens. Relying on commercial provider APIs, the monthly bill would quickly reach unsustainable figures for an independent project. Running locally, on the other hand, drives the marginal cost to zero: once the hardware investment is amortized, each article costs only in terms of electricity (and this is where Apple Silicon architecture truly makes a difference: low power consumption, no fan noise, stable system even under heavy loads).

Latency and control. When you need a continuous feedback loop between selection and writing, going through the internet introduces friction. Locally, communication between models is direct, and throughput is limited only by the GPU.

Independence. No quotas, no rate limits, no model changes decided by the provider, no doubts about content privacy. The data never leaves my Mac.


2. The Architecture: Three Layers, Three Languages

The system is layered across three tiers, each using the right language for its task.

The frontend is a React dashboard with Vite and TailwindCSS. Two side-by-side panels — Articles and News — with ratings, inline editing, cover regeneration, and one-click publishing. News can be "pinned" to survive automatic filters.

The Rust layer (Tauri v2) is the operational brain. It launches the Python daemon, monitors its health, and manages the scheduler — configurable to start the editorial pipeline at multiple times during the day.

The Python daemon (FastAPI) exposes an internal REST API and orchestrates the specialized agents. The database is SQLite: for a single-machine system handling a few thousand articles, it's perfect — zero configuration, and backup is a simple file copy.

It goes without saying that I built the project through vibe coding, fully leveraging all the AI models available to me through free credits provided by coding agents like OpenCode, Google Antigravity, Kilo Code, and some paid API access. In general, I never used expensive frontier models like ChatGPT and Claude Code, relying mostly on more affordable or freely available models such as MiniMAX, GLM, Kimi, and most recently DeepSeek v4 Flash, which I consider the absolute best in its class for price-performance ratio.


3. The Arsenal: The Models, and Why These Ones

I spent more time here than on everything else combined. The models running today are the result of weeks of trials, errors, and improvised benchmarks. The journey is worth telling, because every choice stems from a specific hardware constraint.

The Constraint: Mac Studio M2 Ultra, 64 GB Unified Memory

Everything runs on a Mac Studio M2 Ultra with 64 GB of unified memory. Unified memory is the key: on Apple Silicon, CPU and GPU share the same RAM pool, so "VRAM" equals system RAM. This is a huge advantage over a discrete 24 GB GPU, but it also sets a hard ceiling.

In those 64 GB must fit the operating system, the IDE, the browser, the daemon, plus one or two models at 20–30 GB each. A dense 70B+ LLM, even quantized, is out of the question: it would occupy almost the entire machine by itself, leaving no room for context or a second rotating model.

This constraint determined every subsequent choice.

What I Tried (and Why It Didn't Work)

Large, dense models. Llama 3.3 70B (or Llama 4 Behemoth/Maverick), Qwen 3.7-Max, Mixtral 8x22B (or Mistral Large): very high quality in English, but they drop off in Italian, and they devour the memory budget.

Small, generic models. Phi-3, Llama 4 Scout (or Llama 3.2 3B), Gemma 4 12B, Qwen 2.5 7B: no memory issues, but Italian is often mediocre and they struggle with complex instructions. For news selection — which requires structured output, taxonomies, and reasoning — and for writing — which must follow a multi-line editorial decalogue — you need models that truly "understand" the task.

A single model for everything. I tried using the same model for every role. Result: none excels everywhere. Models strong at structured output (Qwen) are weaker at long-form prose; those good at prose (Gemma) are less precise on taxonomies and classifications. The compromise was mediocre quality across the board.

What Runs Today (and Why)

| Model | Role | Why This One |
| --- | --- | --- |
| Qwen3.6-35B-A3B (MoE, 3B active) | News selection, duplicate arbitration, guardrails, publication metadata, visual verification | The Mixture-of-Experts architecture delivers the "knowledge" of a 35B model at the computational cost of a much smaller one, and quantized it fits in about 20 GB. Excellent at structured output and multi-step reasoning. It's the editor-in-chief. |
| Gemma-4-31B-it | Article writing, cover image classification | The best 31B model I tried for long-form Italian prose. It follows style instructions without deviation, doesn't fabricate citations, and has a clean technical vocabulary. It also proved the most accurate at extracting discriminative English keywords, where Qwen tended to confuse similar terms. |
| Qwen3-Embedding-4B | Vector embeddings for deduplication, similarity, and user profiling | Small, fast, multilingual. For semantic comparison between news stories the quality is more than sufficient, and it leaves memory free for everything else. |
| ERNIE TURBO (diffusion, local) | Cover image generation (fallback) | A very recent diffusion model that runs entirely locally even on systems with only 8 GB of VRAM. In the Turbo version, photographic quality isn't always excellent, but the ability to generate decent images in just 8 steps (about 3 minutes of processing on my Mac) makes it unbeatable. |
| Higgsfield (optional, cloud) | Cover image generation (high quality) | The only cloud exception, optional. Superior photographic quality: I use it when an article describes a very specific subject and I want a cover that doesn't look "synthetic." |

The Load/Unload Choreography

The point isn't which models run, but how they run. On 64 GB of unified memory, you can't keep two LLMs of 20–30 GB each in memory simultaneously. So there's a director ensuring that only one LLM is loaded at any given phase of the process.
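To make the idea concrete, here is a minimal sketch of such a director. It is not the project's actual code: the `ModelDirector` class and the `load`/`unload` hooks are hypothetical names standing in for whatever inference backend is in use, and the only rule it enforces is "evict the current model before loading a different one."

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class ModelDirector:
    """Keeps at most one LLM resident in memory at a time (hypothetical helper)."""

    def __init__(self, load: Callable[[str], object], unload: Callable[[object], None]):
        self._load = load        # assumed backend hook: model name -> model handle
        self._unload = unload    # assumed backend hook: frees the handle's memory
        self._name: Optional[str] = None
        self._handle: Optional[object] = None

    @contextmanager
    def use(self, name: str) -> Iterator[object]:
        """Yield the requested model, evicting whichever one was loaded before."""
        if self._name != name:
            if self._handle is not None:
                self._unload(self._handle)  # free memory before the next load
            self._handle = self._load(name)
            self._name = name
        yield self._handle


# Stand-in loaders that only record what happens, instead of touching real models.
events = []
director = ModelDirector(
    load=lambda name: events.append(f"load {name}") or name,
    unload=lambda handle: events.append(f"unload {handle}"),
)
with director.use("selector"):
    pass
with director.use("selector"):  # already resident: no reload
    pass
with director.use("writer"):    # evicts the selector first
    pass
print(events)
```

Grouping pipeline phases by model (all selection work first, then all writing) keeps the number of load/unload cycles, and therefore the waiting, to a minimum.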


4. The Pipeline at a Glance

Here is the diagram of the entire workflow: all phases, the models involved, and the output channels, without diving into implementation details.

Now let's go phase by phase, at a functional level. The implementation lives in the code, not in this article.


5. The Phases: From Noise to News

5.1 Acquisition

Everything starts with about seventy RSS feeds organized into ten topic areas: general tech, artificial intelligence, software development, academic research, gaming, science, space, hardware, health, and a group dedicated to Italian sources (only for Italy-related news). To these, targeted web searches are added to catch what falls through the feeds, plus a manual path from the dashboard for news I flag myself.

An initial set of filters immediately removes the most obvious noise: articles that are too old, URLs already seen recently, content too sparse to be useful.
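As an illustration, these first-pass filters can be sketched in a few lines. The `RawItem` type and the two thresholds (`MAX_AGE`, `MIN_CHARS`) are assumptions chosen for the example, not the values used in production.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Set

MAX_AGE = timedelta(hours=48)  # assumed cutoff for "too old"
MIN_CHARS = 200                # assumed cutoff for "too sparse"


@dataclass
class RawItem:
    url: str
    text: str
    published: Optional[datetime]  # None when the feed omits the date


def prefilter(items: Iterable[RawItem], seen_urls: Set[str], now: datetime) -> List[RawItem]:
    """Drop items that were already seen, are too old, or are too short."""
    kept = []
    for item in items:
        if item.url in seen_urls:
            continue
        if item.published is not None and now - item.published > MAX_AGE:
            continue
        if len(item.text.strip()) < MIN_CHARS:
            continue
        kept.append(item)
    return kept


now = datetime(2025, 6, 3, 6, 30, tzinfo=timezone.utc)
sample = [
    RawItem("https://a.example/fresh", "x" * 300, now - timedelta(hours=2)),
    RawItem("https://a.example/stale", "x" * 300, now - timedelta(days=5)),
    RawItem("https://a.example/thin", "too short", now),
]
print([item.url for item in prefilter(sample, seen_urls=set(), now=now)])
```

Items without a date are deliberately let through at this stage, since date recovery happens later in the pipeline.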

5.2 Deduplication and Date Recovery

Two news items with different URLs can tell the same story. The embedding model vectorizes each article and compares it against history: if two pieces are semantically the same news, the richer one survives; if they cover the same topic from different angles, they're merged into a single source.
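A simplified sketch of the first case (same news, the richer piece survives) might look like this. The similarity threshold is an assumed value, the vectors are toy two-dimensional stand-ins for real embeddings, and "richer" is approximated here by text length, which is only one possible heuristic.

```python
import math
from typing import List, Sequence, Tuple

SAME_STORY = 0.90  # assumed cosine threshold for "semantically the same news"

Article = Tuple[str, Sequence[float]]  # (text, embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deduplicate(articles: List[Article]) -> List[Article]:
    """Keep one article per story; among near-identical ones, keep the longer text."""
    kept: List[Article] = []
    for text, vec in articles:
        for i, (kept_text, kept_vec) in enumerate(kept):
            if cosine(vec, kept_vec) >= SAME_STORY:
                if len(text) > len(kept_text):
                    kept[i] = (text, vec)  # the richer version replaces the poorer one
                break
        else:
            kept.append((text, vec))  # no close match: this is a new story
    return kept
```

Comparing against every kept article is quadratic, which is fine for a few hundred items per run; a larger archive would call for a vector index.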

Many articles arrive without a reliable date — feeds often omit it. A dedicated agent attempts to reconstruct it through various strategies and penalizes articles where it fails: news without a date is news you can't fully trust.
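The sketch below shows what a couple of such strategies could look like: first the feed's own date string, then a `/YYYY/MM/DD/` pattern in the URL. Both strategies, the `recover_date` name, and the penalty value are assumptions for illustration; the real agent may use different or additional ones.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

URL_DATE = re.compile(r"/(20\d{2})/(\d{1,2})/(\d{1,2})(?:/|$)")
MISSING_DATE_PENALTY = 0.15  # assumed score penalty when every strategy fails


def recover_date(feed_date: Optional[str], url: str) -> Optional[datetime]:
    """Try the feed's RFC 2822 date first, then a date embedded in the URL path."""
    if feed_date:
        try:
            parsed = parsedate_to_datetime(feed_date)
            # Naive results are assumed to be UTC.
            return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            pass  # unparsable string: fall through to the next strategy
    match = URL_DATE.search(url)
    if match:
        year, month, day = map(int, match.groups())
        try:
            return datetime(year, month, day, tzinfo=timezone.utc)
        except ValueError:
            pass  # e.g. month 13: not a real date
    return None


def date_penalty(recovered: Optional[datetime]) -> float:
    """Penalty to subtract from an article's score when no date was found."""
    return 0.0 if recovered is not None else MISSING_DATE_PENALTY
```

Ordering the strategies from most to least reliable means a cheap, trustworthy source always wins when it is available.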

5.3 Pre-filter and Selection

Before the actual selection, an AI gatekeeper discards spam, clickbait, commercial press releases, reviews, personal opinions, guides, and tutorials. The filter also tracks source reputation: sources that are systematically discarded end up being skipped upstream.

Then the editor-in-chief steps in: an LLM with a system prompt refined over many iterations. It decides which articles deserve to become news, with what importance, and in which category — using a taxonomy with cross-cutting exclusion rules to prevent, for example, a vulnerability in an AI model from being classified as "AI" instead of "cybersecurity." My editorial profile is injected into the prompt, so the system knows what interests me and what doesn't.

5.4 Category Quotas and Arbitration

An LLM curator left to its own devices tends to over-represent categories with more available material (surprise: AI). The system therefore applies a dynamic quota: the more a category has been covered recently, the higher the bar to let new stories in.
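One simple way to express such a quota is to raise each category's admission threshold in proportion to its share of recent coverage. The formula and the constants below are assumptions made for the example, not the project's real parameters.

```python
from collections import Counter
from typing import Dict, Iterable

BASE_THRESHOLD = 5.0     # assumed minimum importance on a 0-10 scale
PENALTY_PER_SHARE = 4.0  # assumed extra bar for a category holding 100% of coverage


def category_thresholds(recent_categories: Iterable[str]) -> Dict[str, float]:
    """Map each recently covered category to its current admission threshold."""
    counts = Counter(recent_categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        category: BASE_THRESHOLD + PENALTY_PER_SHARE * count / total
        for category, count in counts.items()
    }


def passes_quota(category: str, importance: float, thresholds: Dict[str, float]) -> bool:
    """Categories with no recent coverage only need to clear the base threshold."""
    return importance >= thresholds.get(category, BASE_THRESHOLD)


recent = ["ai"] * 6 + ["space"] * 2 + ["health"] * 2
thresholds = category_thresholds(recent)
print(passes_quota("ai", 7.0, thresholds), passes_quota("space", 7.0, thresholds))
```

With this shape, a story of the same importance can be rejected in an over-covered category and accepted in an under-covered one, which is exactly the rebalancing effect described above.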

After selection, approved articles are compared semantically against each other and against history. Similarity falls into three bands:

  • Low: unique article, passes through.
  • High: duplicate, discarded.
  • Gray: a second LLM pass acts as judge and decides if the article truly adds something.

The gray zone is where the system's quality shows: you need a model that understands whether two pieces are "the same update" or "the same story from two different angles." The difference is subtle, and it makes all the difference.
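The three bands translate naturally into a small routing function. In this sketch the band boundaries are assumed values and `judge` is a hypothetical callback standing in for the second LLM pass; the point is that the expensive call only happens in the gray zone.

```python
from typing import Callable

LOW_BAND = 0.75   # assumed: below this, the article is clearly unique
HIGH_BAND = 0.92  # assumed: at or above this, it is clearly a duplicate


def arbitrate(similarity: float, judge: Callable[[], bool]) -> str:
    """Return "keep" or "discard"; consult the judge only in the gray zone."""
    if similarity < LOW_BAND:
        return "keep"
    if similarity >= HIGH_BAND:
        return "discard"
    # Gray zone: the judge answers "does this article truly add something?"
    return "keep" if judge() else "discard"


print(arbitrate(0.50, judge=lambda: False))  # low band, judge never called
print(arbitrate(0.85, judge=lambda: True))   # gray zone, judge decides
print(arbitrate(0.97, judge=lambda: True))   # high band, judge never called
```

Passing the judge as a callable rather than a precomputed verdict keeps the LLM out of the loop for the large majority of comparisons.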

5.5 The Final Score

Each article receives a score combining three dimensions: the importance assigned by the curator, the affinity with my editorial profile (how close it is to what I've appreciated in the past, how far from what I've rejected), and a bonus for topics covered by many independent sources. Overly colloquial tone and missing dates weigh negatively.
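As a rough illustration, the combination can be written as a weighted sum with penalties. Every weight and constant here is an assumption picked for readability; the real values are tuned by the calibration engine and are not disclosed in this article.

```python
from dataclasses import dataclass

W_IMPORTANCE = 0.6           # assumed weight of the curator's importance
W_AFFINITY = 0.3             # assumed weight of the profile affinity
SOURCE_BONUS = 0.05          # assumed bonus per extra independent source
MAX_BONUS_SOURCES = 4        # assumed cap on how many extra sources count
COLLOQUIAL_PENALTY = 0.10    # assumed
MISSING_DATE_PENALTY = 0.15  # assumed


@dataclass
class Signals:
    importance: float         # curator's judgment, normalized to 0..1
    affinity: float           # closeness to liked minus closeness to rejected, -1..1
    independent_sources: int  # how many independent outlets cover the topic
    colloquial: bool          # overly colloquial tone detected
    has_date: bool            # a reliable publication date was recovered


def final_score(s: Signals) -> float:
    """Weighted sum of the three dimensions, minus tone and date penalties."""
    extra_sources = max(0, min(s.independent_sources - 1, MAX_BONUS_SOURCES))
    score = W_IMPORTANCE * s.importance + W_AFFINITY * s.affinity
    score += SOURCE_BONUS * extra_sources
    if s.colloquial:
        score -= COLLOQUIAL_PENALTY
    if not s.has_date:
        score -= MISSING_DATE_PENALTY
    return score


clean = Signals(importance=0.8, affinity=0.5, independent_sources=3, colloquial=False, has_date=True)
sloppy = Signals(importance=0.8, affinity=0.5, independent_sources=3, colloquial=True, has_date=False)
print(final_score(clean) > final_score(sloppy))
```

Capping the source bonus prevents a widely syndicated but minor story from outranking a genuinely important one.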

5.6 Sister Sources and Anti-Echo Chamber

When two articles are moderately similar, instead of discarding the second, the system merges it as an additional source for the first. The "lead" article can thus inherit multiple sister sources, which the writing agent uses to build a multi-angle piece. The result is more complete and more credible.

There's also a mirror mechanism: before publishing, every article is compared to what I've recently published, to prevent the site from turning into an echo chamber repeating itself.

5.7 Writing

The writing agent receives a system prompt defining a senior Italian tech journalist, with precise rules: original reworking (never paraphrase the source), valid markdown with links on meaningful anchor text, no first person, historical context retrieved from past articles to avoid self-repetition, and an "Italy rule" — the impact on the Italian market is only mentioned if sources mention it, never fabricated to please the local audience.

One subtle rule saves a lot: news from Italian sources doesn't generate the English version of the article. The bilingual pipeline serves the international audience; Italian news stays Italian. Fewer tokens, less waiting, less waste.

5.8 Guardrails and Inline Sources

Before publication, a surgical guardrail corrects only specific errors — macaronic neologisms, wrong accents, agreement errors, double spaces — without touching structure, tone, or style.

Immediately after, a post-processing step scans the text and collects all links to relevant external sources, promoting them to declared article sources. The result: a piece that openly cites five, ten, fifteen real sources even when the model had received only a couple. Backlinks for SEO, attributions for credibility, zero extra work for the model.
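Since the articles are markdown, this step needs no model at all. Here is a minimal sketch, assuming that "relevant external" simply means "not on the site's own domain"; the real relevance filter may be stricter, and the `collect_sources` name is hypothetical.

```python
import re
from typing import List
from urllib.parse import urlparse

# Matches inline markdown links of the form [anchor text](http(s)://url)
MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def collect_sources(markdown: str, own_domain: str) -> List[str]:
    """Return external link URLs in order of first appearance, without duplicates."""
    seen = set()
    sources = []
    for _anchor, url in MD_LINK.findall(markdown):
        host = urlparse(url).netloc.lower()
        if host == own_domain or host.endswith("." + own_domain):
            continue  # internal link: useful context, but not an external source
        if url not in seen:
            seen.add(url)
            sources.append(url)
    return sources


text = (
    "See [the paper](https://arxiv.org/abs/1234) and "
    "[our earlier piece](https://www.alextech.ai/news/old), then "
    "[the vendor blog](https://blog.example.com/post)."
)
print(collect_sources(text, own_domain="alextech.ai"))
```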

5.9 The Cover Image: Three Paths to One Picture

This is the most iterated part of the entire project. Generating a relevant image for a tech news story is surprisingly difficult: diffusion models tend to produce the usual "server room with neon lights" and cheerfully confuse brands and products.

The solution was to place a decision gate before generation. An LLM — the same one that writes articles, because it proved the most accurate on English keywords — classifies the article's subject:

  • Real and specific subject (a product, a person, a place) → search for real photos on free stock archives like Unsplash, Pexels, Wikimedia Commons, and finally DuckDuckGo Images.
  • Real but generic subject, or abstract concept ("AI ethics") → go directly to generation.

In the stock path, each candidate photo is examined by Qwen 3.6 (which is also a vision model), which answers simple but decisive questions: does the subject actually match the article? Is the composition editorial-cover quality? If at least one candidate passes, it's downloaded and used, with full author credits.

If no photo is good enough, or if the subject is abstract, generation kicks in: with ERNIE locally, or as a fallback with Higgsfield in the cloud when superior photographic quality is needed.
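Stripped of the models, the decision gate is a short piece of control flow. In the sketch below, `photo_passes` stands in for the visual verification and `generate` for the diffusion pipeline; both are hypothetical callbacks, as are the `"specific"` and `"abstract"` labels.

```python
from typing import Callable, Iterable, Tuple


def choose_cover(
    subject_kind: str,
    candidates: Iterable[str],
    photo_passes: Callable[[str], bool],
    generate: Callable[[], str],
) -> Tuple[str, str]:
    """Return (path_taken, image): a verified stock photo if possible, else a generated image."""
    if subject_kind == "specific":
        for photo in candidates:
            if photo_passes(photo):  # stand-in for the vision model's verdict
                return ("stock", photo)
    # Generic or abstract subject, or no candidate passed verification.
    return ("generated", generate())


print(choose_cover("specific", ["a.jpg", "b.jpg"], lambda p: p == "b.jpg", lambda: "gen.png"))
print(choose_cover("abstract", ["a.jpg"], lambda p: True, lambda: "gen.png"))
```

Because `generate` is only called when the stock path fails, the slow diffusion step is skipped whenever a real photo is good enough.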

Since the decision gate with visual verification was introduced, real photos have replaced most generations. The perceived quality is much higher: a real photo of a datacenter or a chip has an editorial credibility that diffusion models struggle to match.

5.10 Multi-Platform Publication

Once the article is written and the cover image is ready, the content goes out to four independent channels. The architecture follows a non-blocking failure pattern: an error on Telegram doesn't prevent LinkedIn from publishing, and vice versa.

1. Website. The system writes the article in the static site generator's format, copies the cover image, and triggers build and deploy via SSH, falling back to FTP if SSH fails.

2. Telegram. Sends a notification to the channel with cover photo, preview, and link, with a cumulative summary for batch publications.

3. LinkedIn. A digest summarizing the day's news; if the text exceeds platform limits, it's split into multiple numbered posts without ever breaking titles or links mid-way.

4. WhatsApp. The strangest integration of all. There's no usable official API, so I use a very useful open-source CLI called agent-desktop. Although this app was primarily designed for use with AI agents, to let them control on-screen applications, it can also be used effectively through a simple Python script. The system uses it to:

  • open the WhatsApp app if not already visible;
  • find the "AlexTech" channel in the interface (searching for "AlexTech" text among UI elements under the Updates tab);
  • click "Compose" to open the message field;
  • paste the article text with the link (pbcopy + cmd+v);
  • press Enter to publish, repeating the last three steps (compose, paste, Enter) for each post.

Everything happens in the background, as if someone were sitting at the computer typing on WhatsApp for you, but automatically, every time a new article is published. Naturally, for this to work you need to be already authenticated in the WhatsApp app. A word of caution: this automation works on WhatsApp, but if you tried it on social platforms like LinkedIn, Facebook, or Instagram, you'd likely run into those services' strict anti-bot filters.
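The non-blocking failure pattern that ties the four channels together can be sketched as follows. The channel functions here are placeholders, and the `publish_everywhere` name is hypothetical; what matters is that each channel is wrapped in its own `try` so one failure never stops the others.

```python
from typing import Callable, Dict


def publish_everywhere(article: dict, channels: Dict[str, Callable[[dict], None]]) -> Dict[str, str]:
    """Try every channel independently and report a per-channel outcome."""
    results: Dict[str, str] = {}
    for name, send in channels.items():
        try:
            send(article)
            results[name] = "ok"
        except Exception as exc:  # deliberately broad: no channel may block another
            results[name] = f"failed: {exc}"
    return results


def broken_telegram(article: dict) -> None:
    raise RuntimeError("token expired")  # simulated failure


delivered = []
outcome = publish_everywhere(
    {"title": "Example"},
    {"website": delivered.append, "telegram": broken_telegram, "linkedin": delivered.append},
)
print(outcome)
```

Returning the per-channel outcome instead of raising makes it easy to log failures and retry only the channels that need it.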

6. The Feedback System: How the AI Learns from Me

Every interaction from the dashboard — a like, a star, a rejection, an edit — feeds three parallel systems.

The editorial profile. It analyzes my judgment history, giving more weight to recent ones and more weight to strong signals over weak ones. It produces two portraits: one for selection, capturing my topic preferences, and one for writing, capturing the style and length I prefer. Both are injected into the prompts: the more I interact, the more the system chooses and writes the way I like.
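A common way to weight recent and strong signals more heavily is exponential decay combined with a per-signal strength. The sketch below assumes a 30-day half-life and made-up strengths; the actual profile is then turned into prose for the prompts, which is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SIGNAL_STRENGTH = {"star": 2.0, "like": 1.0, "reject": -2.0}  # assumed strengths
HALF_LIFE_DAYS = 30.0  # assumed: a judgment loses half its weight every 30 days

Judgment = Tuple[str, str, float]  # (topic, signal, age in days)


def topic_preferences(history: Iterable[Judgment]) -> Dict[str, float]:
    """Sum signal strengths per topic, discounting older judgments exponentially."""
    preferences: Dict[str, float] = defaultdict(float)
    for topic, signal, age_days in history:
        decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
        preferences[topic] += SIGNAL_STRENGTH[signal] * decay
    return dict(preferences)


history = [
    ("space", "star", 0.0),     # fresh, strong positive
    ("space", "like", 30.0),    # one half-life old, weak positive
    ("gaming", "reject", 0.0),  # fresh, strong negative
]
print(topic_preferences(history))
```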

Vector centroids. Articles I've appreciated form, in embedding space, a "center of gravity" for my interests; rejected ones form an opposing one. Every new story is also evaluated based on how close it is to the first and how far from the second. An additional clustering layer identifies my recurring "interest pockets" and rewards articles that resonate with them.
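In code, the two centers of gravity are just component-wise means, and the affinity is a difference of similarities. This is a toy sketch with two-dimensional vectors; it assumes cosine similarity as the distance measure and leaves out the clustering layer entirely.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def affinity(vec: Sequence[float], liked: Sequence[Sequence[float]],
             rejected: Sequence[Sequence[float]]) -> float:
    """Positive when vec is closer to the liked centroid than to the rejected one."""
    return cosine(vec, centroid(liked)) - cosine(vec, centroid(rejected))


liked = [[1.0, 0.0], [0.8, 0.2]]  # toy embeddings of appreciated articles
rejected = [[0.0, 1.0]]           # toy embedding of a rejected article
print(affinity([1.0, 0.0], liked, rejected) > 0)
print(affinity([0.0, 1.0], liked, rejected) < 0)
```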

Self-calibration. After each run, a calibration engine analyzes overall statistics and gradually tweaks its internal parameters, with dampening mechanisms to prevent oscillations. The system slowly adjusts itself.
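One plausible form of dampening is to move each parameter only a fraction of the way toward its suggested value, with a cap on the step size. The two constants and the `calibrate` name are assumptions for illustration.

```python
DAMPING = 0.2    # assumed: move 20% of the way toward the suggested value per run
MAX_STEP = 0.05  # assumed: hard cap on how far a parameter can move in one run


def calibrate(current: float, suggested: float) -> float:
    """Nudge a parameter toward its suggested value with a damped, capped step."""
    step = DAMPING * (suggested - current)
    step = max(-MAX_STEP, min(MAX_STEP, step))
    return current + step


threshold = 0.50
for _run in range(10):  # ten consecutive runs all suggesting 0.90
    threshold = calibrate(threshold, suggested=0.90)
print(round(threshold, 3))
```

A single anomalous run can therefore shift a threshold only slightly, while a consistent trend still gets there after enough runs.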


7. The Frontend: The Command Bridge

The React dashboard isn't an accessory: it's the cockpit from which I direct the operation.

Articles Panel on the left: all written pieces, with title, cover image, and rating. I can read, edit inline, regenerate the cover choosing the path, copy the social post, publish.

News Panel on the right: approved but unwritten stories, ordered by score. I can approve, reject, pin, or have a custom single article written. This is also where manually entered news goes.

Then: a status bar with system state and the next scheduler slot; real-time logs, where every agent reports what it's doing; a calibration panel with sliders to manually adjust thresholds when I want to override automation; and a progress bar showing approximately how much time remains for each process in the editorial pipeline.


8. The Numbers

In a typical morning run:

  • ~100 sources scanned
  • 200–400 articles incoming after fetch
  • 100–170 survivors after deduplication and pre-filter
  • 40–70 after selection and arbitration
  • 15–20 articles written and published
  • 45–80 minutes end-to-end pipeline

After weeks of operation, the database already contains hundreds of articles, thousands of embeddings, and a feedback history that is itself a resource: it's the reason the system improves over time instead of staying the same.

9. Lessons Learned

Vibe coding works, but you need guardrails. Much of the system was born by iterating quickly, in continuous dialogue with AI tools, with frequent commits. But without the right protections — tests, schema versioning, safe migrations — the project would have collapsed under its own weight.

The right model for the right task. I tried having Qwen write. I tried having Gemma select. I tried having Qwen classify covers, and it confused one title for another, producing uselessly vague keywords. Every model has a specific talent, and forcing it outside its role degrades quality measurably. The load/unload choreography is complex, but it's the right choice.

Prompts are code. The curator prompt reached its twenty-third iteration, and it shows. The writer prompt is a multi-line decalogue. I spent more time refining prompts than writing Python. The difference between a mediocre article and a good one often lies in a single line of instruction.

Feedback loops are the differentiator. Without the editorial profile and centroids, the system would produce generically good but misaligned content. Human feedback turns a content generator into a personal editor.

Multi-platform publishing is an engineering problem, not an AI one. Telegram, LinkedIn, and WhatsApp require three completely different approaches. The hard part isn't generating the texts, but managing states, failures, rate limits, and expired tokens. The non-blocking failure architecture was the most important decision.

Real photos beat generative AI, when they exist. Since the decision gate with visual verification was introduced, stock images have replaced most generations. The perceived quality is incomparable. Generation remains valuable, but as a safety net, not a first choice.

Inline sources are a credibility multiplier. Extracting links from the text and promoting them to declared sources had two effects: articles cite many more real sources, and backlinks return traffic to the original outlets. Minimal cost, massive gain in SEO, editorial reliability, and intellectual honesty.


10. What's Next

The system is in production and working. Open development directions:

  • Multi-tenancy: separate editorial profiles for different sites
  • Audio: local text-to-speech to turn every article into a podcast
  • Automated fact-checking: cross-referencing with primary sources
  • Video covers: short animated clips instead of static images

11. Conclusion

What I built is not a substitute for human journalism. It's an amplifier: it gives a single person the ability to cover the tech landscape with a depth and frequency that would otherwise require a team of five or six editors.

The most surprising part isn't that it works. It's that it works entirely locally, on a single computer, without depending on any cloud API (the only cloud component, Higgsfield, is optional). Two language models, one embedding model, real photographs with image generation as a safety net, about seventy sources, a dozen specialized agents, four publication channels, and one dashboard to rule them all. All in 64 GB of unified memory.

I'm not saying anyone can or should do this. I'm saying it can be done, and the result is surprisingly good.

If you're curious to see the articles, they're on the AlexTech website, a project I'm running with my lifelong friend and partner Alessandro Vignoli. Soon, on our YouTube channel, I'll also publish a video showing the system in action. Stay tuned, and if you enjoyed this article, share it!