feat(evaluation): Added INSTRUCTION_TEMPLATE_NAME to run_infer.py in swe_bench by KevinMusgrave · Pull Request #10270 · All-Hands-AI/OpenHands
Merged
Changes from 4 commits
3 changes: 3 additions & 0 deletions evaluation/benchmarks/swe_bench/README.md
@@ -93,6 +93,9 @@ export USE_HINT_TEXT=true # Ignore this if you are not sure.
 
 # Specify a condenser configuration for memory management (default: NoOpCondenser)
 export EVAL_CONDENSER=summarizer_for_eval # Name of the condenser config group in config.toml
+
+# Specify the instruction prompt template file name
+export INSTRUCTION_TEMPLATE_NAME=swe_custom.j2 # Name of the file in the swe_bench/prompts folder.
 ```
 
 Let's say you'd like to run 10 instances using `llm.eval_gpt4_1106_preview` and CodeActAgent,
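To illustrate how the new variable is consumed, here is a minimal sketch of loading and rendering a custom template with Jinja2. The `prompts` path and the `problem_statement` variable are assumptions for illustration only; the real template context is built inside `get_instruction` in `run_infer.py`.

```python
import os

from jinja2 import Environment, FileSystemLoader

# Resolve the template name from the environment; fall back to the benchmark
# default when INSTRUCTION_TEMPLATE_NAME is unset.
template_name = os.environ.get('INSTRUCTION_TEMPLATE_NAME') or 'swe_default.j2'

# Custom templates are expected to live in the swe_bench/prompts folder.
prompts_dir = os.path.join('evaluation', 'benchmarks', 'swe_bench', 'prompts')
env = Environment(loader=FileSystemLoader(prompts_dir))
template = env.get_template(template_name)

# 'problem_statement' is an assumed template variable, used here only to show
# how a rendered instruction would be produced.
instruction = template.render(problem_statement='<SWE-bench issue text>')
```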
5 changes: 4 additions & 1 deletion evaluation/benchmarks/swe_bench/run_infer.py
@@ -108,7 +108,9 @@ def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
     llm_model = metadata.llm_config.model
 
     # Determine the template file based on mode and LLM
-    if mode.startswith('swt'):
+    if metadata.instruction_template_name:
+        template_name = metadata.instruction_template_name
+    elif mode.startswith('swt'):
         template_name = 'swt.j2'
     elif mode == 'swe':
         if 'gpt-4.1' in llm_model:
@@ -122,6 +124,7 @@ def get_instruction(instance: pd.Series, metadata: EvalMetadata) -> MessageAction:
         logger.error(f'Unexpected evaluation mode: {mode}. Falling back to default.')
         template_name = 'swe_default.j2'
 
+    logger.debug(f'Using instruction template file: {template_name}')
     # Set up Jinja2 environment
     # Assuming templates are in 'evaluation/benchmarks/swe_bench/prompts' relative to this script
     prompts_dir = os.path.join(os.path.dirname(__file__), 'prompts')
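The precedence introduced by this change, written out as a standalone sketch. `pick_template_name` is a hypothetical helper, not a function in the PR, and the gpt-4.1-specific file name is omitted because that branch is truncated in this diff.

```python
def pick_template_name(
    instruction_template_name: str | None, mode: str, llm_model: str
) -> str:
    """Illustrative helper capturing the selection order in run_infer.py."""
    if instruction_template_name:  # explicit override wins
        return instruction_template_name
    if mode.startswith('swt'):
        return 'swt.j2'
    if mode == 'swe':
        # The gpt-4.1-specific branch is not visible in this diff, so the
        # sketch simply falls through to the default here.
        return 'swe_default.j2'
    # Unexpected mode: run_infer.py logs an error and uses the default.
    return 'swe_default.j2'


# Example: when the override is set, the mode/model checks are never reached.
assert pick_template_name('swe_custom.j2', 'swe', 'gpt-4.1') == 'swe_custom.j2'
```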
2 changes: 2 additions & 0 deletions evaluation/utils/shared.py
@@ -53,6 +53,7 @@ class EvalMetadata(BaseModel):
     data_split: str | None = None
     details: dict[str, Any] | None = None
     condenser_config: CondenserConfig | None = None
+    instruction_template_name: str | None = None
 
 
 class EvalOutput(BaseModel):
@@ -205,6 +206,7 @@ def make_metadata(
         condenser_config=condenser_config
         if condenser_config
         else NoOpCondenserConfig(),
+        instruction_template_name=os.environ.get('INSTRUCTION_TEMPLATE_NAME')
     )
     metadata_json = metadata.model_dump_json()
     logger.info(f'Metadata: {metadata_json}')
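A minimal sketch of the new field's behaviour, using a trimmed-down stand-in for `EvalMetadata` rather than the full class in `shared.py`: an unset `INSTRUCTION_TEMPLATE_NAME` yields `None`, so `get_instruction` falls back to its existing mode/model heuristics.

```python
import os

from pydantic import BaseModel


class MiniEvalMetadata(BaseModel):
    # Stand-in for EvalMetadata: the new field defaults to None.
    instruction_template_name: str | None = None


# os.environ.get returns None when the variable is unset, which the optional
# field accepts, leaving template selection to run_infer.py's defaults.
meta = MiniEvalMetadata(
    instruction_template_name=os.environ.get('INSTRUCTION_TEMPLATE_NAME')
)
print(meta.instruction_template_name)
```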