Hugging Face
Miro Doporto (PRO) · spanofzero
3 followers · 7 following
AI & ML interests
None yet
Recent Activity

Replied to lulavc's post · about 3 hours ago
To the DeepSeek Team: I had some issues with my Google Workspace account (my DNS got messed up), so I can't access my Gmail, and I can't fix the DNS because Hostinger isn't helping much with it. I provided all my documents, IDs, payment receipts, etc., but I still can't get access. Why am I asking for help here? Because DeepSeek is one of the models whose API I have paid for over a long time, and I can't access my account. I have one API key that I don't know will work with v4, and I would like the DeepSeek team to help me. The registered email is the same one I use on Hugging Face. I have IDs and payment receipts from top-ups, everything, but I have already sent about five emails to DeepSeek support with no answer. Sorry for raising this here, but I had no other way to reach you. And for Hugging Face: I owe you about 2 USD, which I will pay, but I'm waiting for my new credit card to arrive. I'm a long-time user and I won't do anything wrong like not paying, but could you please reactivate some of my services? I go to the hospital every week for radiotherapy, and sometimes I don't feel well; I forgot to leave some hardware running, which is what incurred those 2 USD. I promise everything will be resolved soon. Thank you all!
Replied to Zoberzzz's post · about 3 hours ago
Hackernews post · TXT

Show HN: I compressed a 160GB KV cache to 640MB at 0.9994 fidelity on a $300 GPU

Title: Show HN: DenseMem — 256x KV cache compression, 0.9994 fidelity, runs on consumer hardware

---

A 72B model at 32K context needs 160GB of KV cache. That's an H100 and $32,000 in HBM3e memory. I built a protocol that stores the same KV cache in 640MB of DDR5 RAM — on a consumer RTX 4090 and Core i9. 256x compression. 0.9994 cosine similarity. 1.95ms average fetch latency. Verified.

**How:** Transformer KV cache activations are highly structured and correlated. SVD at rank=64 exploits that structure. Random noise compresses to 0.12 fidelity; real KV cache activations compress to 0.9994. The math works because the data isn't random — it has geometry.

The system manages a two-tier hierarchy: VRAM is the hot tier, DDR5 is the warm tier. An attention-weighted evictor (0.5 attn + 0.3 recency + 0.2 freq) decides what stays hot. A prefetcher using layer lookahead and token prediction pre-positions pages before they're needed. Average fetch latency: 1.95ms. Max under load: 3.96ms. Current hit rate is 25% — bottlenecked by my i9's 2-channel DDR5 bandwidth (~38 GB/s). On an 8-channel Threadripper PRO (~224 GB/s) I'm projecting 65-75%.

**Running live:**
- Qwen2.5-7B on RTX 4090 at 32K context (was 4K)
- Every inference tick compressed INT8 via PCA → DDR5
- 2.4s cold start

**The cost math:**
- Uncompressed 72B KV cache: $32,000 in HBM3e
- FoldedMemory: $1.88 in DDR5
- 99.4% cost reduction. Verified on consumer hardware.

GitHub: https://github.com/thorshammerztp-arch/densemem-protocol
Patent Pending: US 64/045,595

Solo developer. Navy veteran. No funding. Consumer hardware.
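The two ideas the post leans on, rank-64 truncated SVD compression and the weighted eviction score, can be sketched in a few lines of NumPy. This is an illustrative sketch only: the tensor shapes, the synthetic low-rank data, and all function names are assumptions of mine, not the DenseMem implementation. It does show the claimed qualitative effect, that correlated (low-rank) data survives rank-64 truncation with near-perfect cosine similarity while pure noise does not.

```python
import numpy as np

def compress_kv(kv: np.ndarray, rank: int = 64):
    """Truncated SVD: store (U_r * S_r, Vt_r) instead of the full matrix."""
    U, S, Vt = np.linalg.svd(kv, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (tokens, rank)
    B = Vt[:rank, :]             # (rank, dim)
    return A, B

def decompress_kv(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    return A @ B

def mean_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Mean per-row cosine similarity between original and reconstruction."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return float(np.mean(num / den))

def evict_score(attn: float, recency: float, freq: float) -> float:
    # Weighted eviction score from the post: 0.5 attn + 0.3 recency + 0.2 freq.
    return 0.5 * attn + 0.3 * recency + 0.2 * freq

rng = np.random.default_rng(0)
tokens, dim, true_rank = 2048, 1024, 48
# "Structured" stand-in for KV activations: low-rank signal plus small noise.
kv = rng.normal(size=(tokens, true_rank)) @ rng.normal(size=(true_rank, dim))
kv += 0.01 * rng.normal(size=(tokens, dim))

A, B = compress_kv(kv, rank=64)
recon = decompress_kv(A, B)
ratio = kv.size / (A.size + B.size)
print(f"compression: {ratio:.1f}x, cosine: {mean_cosine(kv, recon):.4f}")

# Pure noise has no low-rank geometry, so fidelity collapses at the same rank.
noise = rng.normal(size=(tokens, dim))
An, Bn = compress_kv(noise, rank=64)
print(f"noise cosine: {mean_cosine(noise, decompress_kv(An, Bn)):.4f}")
```

With these shapes the ratio comes out near 10x rather than the post's 256x; the post's figure presumably also folds in INT8 quantization and its specific tensor layout, which this sketch does not model.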
spanofzero's datasets (1)
spanofzero/SpaceTravelersUniversalPlaylist · Viewer · Updated 10 days ago · 121 · 30 · 1