Anyone use nvlink on 2x3090s?
Mistral Small 3 24B GGUF quantization Evaluation results
How do I set kv cache quantisation in Ollama?
mistralai/Mistral-Small-24B-Base-2501 · Hugging Face
Berkeley AI research team claims to reproduce DeepSeek core technologies for $30
Deepseek is heavily overrated IMO, give me your opinion. Sonnet still better with API
DeepSeek added recommendations for R1 local use to model card
This free Chinese AI just crushed OpenAI's $200 o1 model...
Be honest, who would play a meth cooking simulator?
Guys, did Google just crack the Alberta Plan? Continual learning during inference?
This lady is on fire
New Model from https://novasky-ai.github.io/ Sky-T1-32B-Preview, open-source reasoning model that matches o1-preview on popular reasoning and coding benchmarks — trained under $450!
Tech lead of Qwen Team, Alibaba Group: "I often recommend people to read the blog of Anthropic to learn more about what agent really is. Then you will realize you should invest on it as much as possible this year." Blog linked in body text.
Train a 7B model that outperforms GPT-4o?
B42: do farm animals disappear after a while or could i still find them late game?
LLMs are not reasoning models
I had missed this patent filed by OpenAI
Hypnagogic Visions
Claude does something extremely Human; writes a partial codeblock, then a comment explaining it has no effin clue what to do next
Be careful where you load your credits...
Agent swarm framework aces spatial reasoning test.
How do you define rewards for RL on chain-of-thought reasoning? Trying to understand a bit more about how o3 from OpenAI was trained.
I asked Claude to "Please print this as one paragraph, without page breaks" and forgot to paste my text, and it gave me its entire ruleset 😐 Is this common knowledge or...
*Asking users of Qwen QwQ (or other open-weights compute-scaling models)... *
What the fuck happened here?