gpt-4-5

Here are 8 public repositories matching this topic...

dongri / openai-api-rs

OpenAI API client library for Rust (unofficial)

api rust realtime openai o1 gpt-4 gpt-3-5-turbo openrouter deepseek gpt-4o gpt-4o-mini gpt-4-5

Updated Aug 28, 2025
Rust

lechmazur / elimination_game

A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private conversations, form alliances, and vote to eliminate each other

game benchmark multi-agent eval strategy-game llm deepseek-r1 o3-mini claude-3-7-sonnet gpt-4-5

Updated Aug 14, 2025

lechmazur / writing

This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) in a short creative story

gemini llama claude o1 llm deepseek deepseek-r1 claude-3-7-sonnet gpt-4-5

Updated Aug 8, 2025
Batchfile

lechmazur / nyt-connections

Benchmark that evaluates LLMs using 651 NYT Connections puzzles extended with extra trick words

testing benchmark evaluation puzzles reasoning llm llms-benchmarking gpt-4o sonnet3-7 gpt-4-5

Updated Aug 23, 2025
Python

anyofai / chatgpt-plus-hezu

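Latest ChatGPT Plus shared-subscription guide: a recommendation for the most reliable ChatGPT Plus account-sharing platform in China (provides native, independent ChatGPT Plus accounts)! Supports GPT-5, GPT-4o, Grok-4, Gemini-2.5 Pro, and other large AI models for only 27 yuan per month!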

openai gpt gpt-4 gpt4 chatgpt openai-chatgpt chatgpt-4 chatgpt4 chatgptplus gpt-4o gpt-4-5

Updated Sep 2, 2025

lechmazur / step_game

Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player "step-race" that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.

game benchmark evaluation multi-agent eval o1 llm deepseek gpt-4o deepseek-r1 o3-mini sonnet3-7 gpt-4-5

Updated Aug 29, 2025

lechmazur / generalization

Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a small set of examples and anti-examples, then detect which item truly fits that theme among a collection of misleading candidates.

benchmark evaluation generalization llm llms llms-benchmarking sonnet3-7 gpt-4-5

Updated Aug 7, 2025

lechmazur / pgg_bench

Public Goods Game (PGG) Benchmark: Contribute & Punish is a multi-agent benchmark that tests cooperative and self-interested strategies among Large Language Models (LLMs) in a resource-sharing economic scenario. Our experiment extends the classic PGG with a punishment phase, allowing players to penalize free-riders or retaliate against others.

game benchmark multi-agent eval llm deepseek-r1 claude-3-7-sonnet gpt-4-5 qwq-32b

Updated Apr 10, 2025
