https://huggingface.co/blog/martinsu/potus-broke-my-pipeline
How POTUS Completely Broke My Flash 2.5-Based Guardrail
I did quite a bit of deep research on this one, since IMHO it matters. At first I used this story to amuse fellow MLOps guys, but then I went deeper and was surprised.
To those who don't want to read too much, in plain English: when you give the model a high-stakes statement that clashes with what it "knows" about the world, it gets more brittle. Sometimes to a point of being unusable.
Or an even shorter version: do not clash with the model's given worldview—it will degrade to some extent.
And in practice, it means that in lower-resource languages like Latvian and Finnish (and probably others), Flash 2.5 is an unreliable guardrail model when something clashes with the model's general "worldview".
However, I'm sure this degradation applies to other languages and models as well to varying extents.
In one totally normal week of MLOps, my news summarization pipeline started failing intermittently. Nothing was changed. No deploys. No prompt edits. No model version bump (as far as I could tell). Yet the guardrail would suddenly turn into a grumpy judge and reject outputs for reasons that felt random, sometimes even contradicting itself between runs. It was the worst kind of failure: silent, flaky, and impossible to reproduce on demand.
Then I noticed the pattern: it started when one specific named entity appeared in the text: Donald Trump (and later, in tests, Bernie Sanders too).
And then down the rabbit hole I went.
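A cheap way to confirm this kind of entity-triggered flakiness is an A/B repeat harness: run the identical input many times, with and without the suspect entity, and compare rejection rates. Here is a minimal sketch; `call_guardrail` is a hypothetical seeded stub standing in for the real Flash 2.5 API call, chosen only so the demo is reproducible offline:

```python
import random

# Hypothetical stand-in for the real guardrail call. In the actual
# pipeline this would be an API request to the judge model; here the
# stub is seeded and "rejects" intermittently when the text contains
# the entity that clashes with the model's worldview.
def call_guardrail(text: str, rng: random.Random) -> bool:
    """Return True if the summary passes the guardrail."""
    if "Donald Trump" in text:
        return rng.random() > 0.4  # flaky: spurious rejections
    return True

def rejection_rate(text: str, runs: int = 100, seed: int = 0) -> float:
    """Run the same input repeatedly and measure how often it is rejected."""
    rng = random.Random(seed)
    rejections = sum(not call_guardrail(text, rng) for _ in range(runs))
    return rejections / runs

control = "The president signed the bill yesterday."
variant = "Donald Trump signed the bill yesterday."

print(f"control rejection rate: {rejection_rate(control):.2f}")
print(f"variant rejection rate: {rejection_rate(variant):.2f}")
```

With the real model you would swap the stub for an actual API call and keep everything else: a near-zero rejection rate on the control and a large, unstable rate on the variant is the signature of this failure mode, as opposed to a deterministic policy rejection.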