URL: http://github.com/aws/sagemaker-python-sdk/discussions/4412

Multi GPU training on g5.12xlarge · aws/sagemaker-python-sdk · Discussion #4412 · GitHub
Hello, I would suggest trying the torch_distributed distribution in the estimator. It launches your training script with torchrun under the hood and should work on any multi-GPU instance type. In your training script, make sure to set the distributed backend to "nccl". This gives you DDP training without the instance-type constraints of SMDDP.

distribution={
    "torch_distributed": {
        "enabled": True
    }
}
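
To show where that argument fits, here is a minimal sketch of both sides: the keyword arguments you might pass to the `sagemaker.pytorch.PyTorch` estimator, and the process-group setup inside the training script. The entry point name `train.py`, the role value, the framework and Python versions, and the helper function names are all placeholders and assumptions, not part of the original answer; check them against the SDK version you are using.

```python
import os


def build_estimator_kwargs(role, instance_type="ml.g5.12xlarge", instance_count=1):
    """Return keyword arguments for sagemaker.pytorch.PyTorch.

    Everything except "distribution" is a placeholder for illustration.
    """
    return {
        "entry_point": "train.py",        # hypothetical script name
        "role": role,                     # your SageMaker execution role ARN
        "instance_type": instance_type,
        "instance_count": instance_count,
        "framework_version": "2.0.1",     # assumption: a version that supports torch_distributed
        "py_version": "py310",            # assumption: must match the framework version
        # The setting from the answer: launch the script with torchrun.
        "distribution": {"torch_distributed": {"enabled": True}},
    }


def local_rank_from_env(env=None):
    """Read the per-process GPU index that torchrun exports as LOCAL_RANK."""
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", 0))


def init_distributed():
    """Call once at the top of the training script (needs torch and GPUs).

    torchrun sets RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT, so the
    default env:// initialization picks them up without extra arguments.
    """
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")
    local_rank = local_rank_from_env()
    torch.cuda.set_device(local_rank)     # pin this process to its own GPU
    return local_rank


def launch(role, s3_input):
    """Submit the job (needs the sagemaker package and AWS credentials)."""
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(**build_estimator_kwargs(role))
    estimator.fit(s3_input)
    return estimator
```

In the training script you would then wrap the model with `torch.nn.parallel.DistributedDataParallel(model.to(local_rank), device_ids=[local_rank])`, using the value returned by `init_distributed()`. The imports of `torch` and `sagemaker` are kept inside the functions only so the sketch can be read and imported on a machine that has neither installed.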

Answer selected by ruhanprasad