Building on HF

Miro Doporto PRO

spanofzero

AI & ML interests

None yet

Recent Activity

Replied to Zoberzzz's post about 3 hours ago
Hackernews post · TXT

Show HN: I compressed a 160GB KV cache to 640MB at 0.9994 fidelity on a $300 GPU

Title: Show HN: DenseMem — 256x KV cache compression, 0.9994 fidelity, runs on consumer hardware

---

A 72B model at 32K context needs 160GB of KV cache. That's an H100 and $32,000 in HBM3e memory. I built a protocol that stores the same KV cache in 640MB of DDR5 RAM — on a consumer RTX 4090 and Core i9. 256x compression. 0.9994 cosine similarity. 1.95ms average fetch latency. Verified.

**How:** Transformer KV cache activations are highly structured and correlated. SVD at rank=64 exploits that structure. Random noise compresses to 0.12 fidelity. Real KV cache activations compress to 0.9994. The math works because the data isn't random — it has geometry.

The system manages a two-tier hierarchy: VRAM is the hot tier, DDR5 is the warm tier. An attention-weighted evictor (0.5 attn + 0.3 recency + 0.2 freq) decides what stays hot. A prefetcher using layer lookahead and token prediction pre-positions pages before they're needed. Average fetch latency: 1.95ms. Max under load: 3.96ms.

Current hit rate is 25% — bottlenecked by my i9's 2-channel DDR5 bandwidth (~38 GB/s). On an 8-channel Threadripper PRO (~224 GB/s) I'm projecting 65-75%.

**Running live:**

- Qwen2.5-7B on RTX 4090 at 32K context (was 4K)
- Every inference tick compressed INT8 via PCA → DDR5
- 2.4s cold start

**The cost math:**

- Uncompressed 72B KV cache: $32,000 in HBM3e
- FoldedMemory: $1.88 in DDR5
- 99.4% cost reduction. Verified on consumer hardware.

GitHub: https://github.com/thorshammerztp-arch/densemem-protocol
Patent Pending: US 64/045,595

Solo developer. Navy veteran. No funding. Consumer hardware.
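The fidelity gap the post describes (structured activations reconstructing near-perfectly under a rank-64 SVD while random noise does not) can be sketched with synthetic data. The matrix shapes, noise level, and fidelity metric below are illustrative assumptions, not the project's actual KV layout:

```python
import numpy as np

rng = np.random.default_rng(0)

def svd_fidelity(X, rank=64):
    """Compress X via truncated SVD at the given rank, reconstruct,
    and return the mean row-wise cosine similarity."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    num = np.sum(X * X_hat, axis=1)
    den = np.linalg.norm(X, axis=1) * np.linalg.norm(X_hat, axis=1)
    return float(np.mean(num / den))

# "Structured" data: every row lives near a 64-dimensional subspace,
# mimicking the correlated geometry claimed for KV cache activations.
basis = rng.standard_normal((64, 512))
structured = rng.standard_normal((1024, 64)) @ basis \
    + 0.01 * rng.standard_normal((1024, 512))

# Random noise: full-rank, no low-dimensional structure to exploit.
noise = rng.standard_normal((1024, 512))

print("structured:", svd_fidelity(structured))  # close to 1.0
print("noise:     ", svd_fidelity(noise))       # far lower
```

The point of the sketch is the mechanism, not the exact numbers: a rank-64 truncation keeps almost all the energy of data that actually sits near a rank-64 subspace, and discards most of the energy of data that does not.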
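The eviction policy can also be sketched from the stated weights. Only the 0.5/0.3/0.2 split (attention, recency, frequency) comes from the post; the normalizations and the `Page` fields below are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Page:
    attn_weight: float   # normalized attention mass on this page, 0..1
    last_access: float   # timestamp of the most recent fetch
    access_count: int    # total fetches so far

def hotness(page: Page, now: float, max_count: int) -> float:
    """Score from the post's weights: 0.5*attn + 0.3*recency + 0.2*freq.
    Recency decays with age; frequency is relative to the hottest page."""
    recency = 1.0 / (1.0 + (now - page.last_access))
    freq = page.access_count / max(1, max_count)
    return 0.5 * page.attn_weight + 0.3 * recency + 0.2 * freq

def pick_eviction_victim(pages: list[Page], now: float) -> int:
    """Return the index of the coldest page: the eviction candidate."""
    max_count = max(p.access_count for p in pages)
    return min(range(len(pages)),
               key=lambda i: hotness(pages[i], now, max_count))

pages = [
    Page(attn_weight=0.9,  last_access=1000.0, access_count=10),  # hot
    Page(attn_weight=0.05, last_access=900.0,  access_count=1),   # stale
    Page(attn_weight=0.5,  last_access=999.0,  access_count=5),   # warm
]
print(pick_eviction_victim(pages, now=1000.0))  # evicts the stale page: 1
```

Attention weight dominating the score matches the post's design: a page the model still attends to stays in VRAM even if it has not been fetched recently.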

Organizations

Span Of Zero