sai_reddy

saireddy

New activity in Qwen/Qwen3.5-27B 2 months ago

Fine tune with lora

#26 opened 2 months ago by

New activity in Qwen/Qwen3.5-397B-A17B 3 months ago

Memory Requirements to run `Qwen/Qwen3.5-397B-A17B`

#20 opened 3 months ago by

New activity in Qwen/Qwen3.5-397B-A17B-FP8 3 months ago

can we deploy this using tp (H100-80GB each)=6 ?

#3 opened 3 months ago by

New activity in moonshotai/Kimi-Linear-48B-A3B-Instruct 6 months ago

insights on comparisons with Qwen/Qwen3-Next-80B-A3B-Instruct ?

#14 opened 6 months ago by

New activity in Qwen/Qwen3-VL-235B-A22B-Instruct-FP8 7 months ago

function calling

#4 opened 7 months ago by

New activity in Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 9 months ago

possible to extend context to 1m tokens ?

#5 opened 9 months ago by

New activity in google/gemma-2-9b over 1 year ago

RuntimeError: Index put requires the source and destination dtypes match, got BFloat16 for the destination and Float for the source.

#24 opened almost 2 years ago by

New activity in google/gemma-2-9b almost 2 years ago

model.generate is throwing AttributeError: 'HybridCache' object has no attribute 'float'

#18 opened almost 2 years ago by

base vs instruct model

#17 opened almost 2 years ago by

Inference error

#20 opened almost 2 years ago by

New activity in google/gemma-7b almost 2 years ago

8-bit precision error

#32 opened about 2 years ago by

New activity in google/gemma-7b-it almost 2 years ago

ValueError with multi A100 GPUS

#28 opened about 2 years ago by

New activity in meta-llama/Meta-Llama-3-8B-Instruct about 2 years ago

ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on.

#35 opened about 2 years ago by

New activity in meta-llama/Meta-Llama-3-70B-Instruct about 2 years ago

Base vs instruct

#17 opened about 2 years ago by

New activity in google/gemma-7b-it about 2 years ago

Could not find GemmaForCausalLM neither in <module 'transformers.models.gemma'

#36 opened about 2 years ago by