URL: http://github.com/FAbrahamDev/unstructured/tree/main/scripts/performance

README.md

Performance

This is a collection of tools helpful for inspecting and tracking performance of the Unstructured library.

The benchmarking script lets a user measure partitioning time against a fixed set of test documents and store the results in S3, tagged with architecture, instance type, and git hash.

The profiling script lets a user inspect how time and memory are spent across called functions when partitioning a given document.

Install

Benchmarking requires no additional dependencies and should work without any initial setup. Profiling has a few dependencies which can be installed with:

pip install -r scripts/performance/requirements.txt
npm install -g speedscope

The second dependency, speedscope, provides a tool for viewing py-spy profiling results locally. Alternatively, you can drop the *.speedscope profile result into https://www.speedscope.app/ to view it online.

Run

Benchmark

Export / assign desired environment variable settings:

  • DOCKER_TEST: Set to true to run benchmark inside a Docker container (default: false)
  • NUM_ITERATIONS: Number of iterations for benchmark (e.g., 100) (default: 3)
  • INSTANCE_TYPE: Type of benchmark instance (e.g., "c5.xlarge") (default: unspecified)
  • PUBLISH_RESULTS: Set to true to publish results to S3 bucket (default: false)

Usage: ./scripts/performance/benchmark.sh
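
As a rough illustration of the settings listed above, the sketch below resolves each variable to its documented default when it is not set. The helper name benchmark_settings is hypothetical and not part of the repository, and treating "unspecified" as a literal fallback string for INSTANCE_TYPE is an assumption; the real benchmark.sh may handle these differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the effective
# benchmark settings, falling back to the defaults documented above.
benchmark_settings() {
  echo "DOCKER_TEST=${DOCKER_TEST:-false}"
  echo "NUM_ITERATIONS=${NUM_ITERATIONS:-3}"
  # Assumption: "unspecified" is used as a literal fallback value.
  echo "INSTANCE_TYPE=${INSTANCE_TYPE:-unspecified}"
  echo "PUBLISH_RESULTS=${PUBLISH_RESULTS:-false}"
}

# Show what a benchmark run started from this shell would see.
benchmark_settings

# Example invocation, overriding two settings for a single run only:
#   NUM_ITERATIONS=10 INSTANCE_TYPE="c5.xlarge" ./scripts/performance/benchmark.sh
```

Prefixing the assignments to the command, as in the commented invocation, limits the overrides to that one run, whereas export keeps them for the rest of the shell session.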

Profile

Export / assign desired environment variable settings:

  • DOCKER_TEST: Set to true to run profiling inside a Docker container (default: false)

Usage:

on Linux: ./scripts/performance/profile.sh

on macOS: sudo -E ./scripts/performance/profile.sh (py-spy requires root privileges on macOS)

  • Run the script and choose the profiling mode: 'run' or 'view'.
  • In the 'run' mode, you can profile custom files or select existing test files.
  • In the 'view' mode, you can view previously generated profiling results.
  • The script supports time profiling with cProfile and memory profiling with memray.
  • Users can choose different visualization options such as flamegraphs, tables, trees, summaries, and statistics.
  • Test documents are synced from an S3 bucket to a local directory before running the profiles.
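
The platform-specific invocation described under Usage can be captured in a small wrapper. The sketch below is only illustrative: profile_command is a hypothetical helper, not something the repository provides, and it only prints the command line rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the documented
# profiling command for a given kernel name, defaulting to the current one.
profile_command() {
  local os="${1:-$(uname -s)}"
  local script="./scripts/performance/profile.sh"
  if [ "$os" = "Darwin" ]; then
    # macOS: py-spy needs root; -E asks sudo to preserve the caller's
    # environment so exported settings such as DOCKER_TEST carry over.
    echo "sudo -E $script"
  else
    echo "$script"
  fi
}

# Print the command that applies to the machine this runs on.
profile_command
```

Passing the kernel name explicitly, for example profile_command Darwin, makes it easy to check both branches from any machine.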