URL: http://github.com/intel/sycl-tla/pull/738/files

Fix to allow all gemm tile shapes by vidyasiv · Pull Request #738 · intel/sycl-tla · GitHub
Open
19 changes: 19 additions & 0 deletions python/README.md
@@ -124,6 +124,25 @@ If these environment variables are not set, the installation process will infer
* `CUTLASS_PATH`: either one directory level above the current directory (i.e., `$(pwd)/..`) if installed locally or in the `source` directory of the location in which `cutlass_library` was installed
* `ONEAPI_ROOT`: the default Intel oneAPI installation path

#### Performance related environment variables

For improving performance on Intel PVC/BMG you could try the following:

* `export IGC_ExtraOCLOptions="-cl-intel-256-GRF-per-thread"`

Please refer to [Building with Sycl Support](../media/docs/cpp/build/building_with_sycl_support.md#building-with-sycl-for-intel-gpu-support) for the omplete environment setup.
Review comment: typo: omplete -> complete


* `CUTLASS_SYCL_ADDITIONAL_TILE_SHAPES` : Path to JSON file containing workgroup and subgroup tile sizes meant for Intel Xe architecture. Expected format is a list of dictionaries like [{"wg": [256, 256, 32], "sg": [8,4,1]}, ...]. Here `wg` refers to the workgroup tile shape and `sg` refers to the subgroup tile layout. This is enabled only for BF16/FP16 kernels.

Review comment: Can you please point out where we are reading this env value to update the tile descriptions?


Sample JSON file that may be used for adding tile shapes.
```
[{"wg":[512, 256, 32],"sg":[8,4,1]},
{"wg":[256, 128, 16],"sg":[8,4,1]}]
```
> Note: This feature is meant for advanced users and should be used only if the existing tile shapes don't match desired performance. We recommend you first validate and benchmark any custom tile shapes with SYCL-TLA GEMM examples which can be found [here](../examples/).

Review comment: I would also remove the example comment, since we don't really validate the examples for performance.

Please note additional tile shapes also increase the torch inductor's autotune benchmarking duration.
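
The JSON format described above can be checked before the variable is exported. The following is a minimal sketch, not part of this PR: `validate_tile_shapes` is a hypothetical helper, and the rule it enforces (each of `wg` and `sg` is a list of three positive integers) is an assumption inferred from the sample file rather than a documented constraint.

```python
import json
import os
import tempfile


def validate_tile_shapes(entries):
    """Return a list of (wg, sg) pairs, raising ValueError on malformed input.

    Hypothetical helper: the three-positive-integers rule is an assumption
    based on the sample JSON, not something cutlass_library enforces.
    """
    if not isinstance(entries, list):
        raise ValueError("expected a list of dictionaries")
    tiles = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index}: expected a dictionary")
        for key in ("wg", "sg"):
            value = entry.get(key)
            is_triple = isinstance(value, list) and len(value) == 3
            if not is_triple or not all(type(v) is int and v > 0 for v in value):
                raise ValueError(
                    f"entry {index}: '{key}' must be a list of three positive integers"
                )
        tiles.append((entry["wg"], entry["sg"]))
    return tiles


# The two shapes from the sample JSON file above.
shapes = [
    {"wg": [512, 256, 32], "sg": [8, 4, 1]},
    {"wg": [256, 128, 16], "sg": [8, 4, 1]},
]

# Write the shapes to a temporary file and point the environment variable at it.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(shapes, f)
    path = f.name
os.environ["CUTLASS_SYCL_ADDITIONAL_TILE_SHAPES"] = path

# Read the file back the way a consumer would and validate it.
with open(os.environ["CUTLASS_SYCL_ADDITIONAL_TILE_SHAPES"], "r") as f:
    tiles = validate_tile_shapes(json.load(f))
```

Running a check like this first turns a malformed entry into a `ValueError` that names the offending index, instead of a `KeyError` during kernel generation.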


#### Installation

Stable releases of the SYCL*TLA Python interface are available via the `sycl-tla` PyPI package.
41 changes: 26 additions & 15 deletions python/cutlass_library/generator.py
@@ -44,6 +44,7 @@
 import sys
 import copy
 from typing import Any, Dict, Optional, Sequence, Tuple
+import json

 _LOGGER = logging.getLogger(__name__)

@@ -200,12 +201,10 @@ def CreateGemmUniversal3xOperator(

   operations = []

-  # by default, only generate the largest tile and largest alignment
-  # but generate all tiles when --kernels=all is specified
+  # generate all tiles when --kernels=all is specified
   if manifest.kernel_filter == '' or manifest.kernel_filter == 'all':
     if len(tile_descriptions) == 0:
       return operations
-    tile_descriptions = [tile_descriptions[0]]

   combinations = product(layouts, tile_descriptions, data_types, complex_transforms, schedules, tile_schedulers)
   for layout, tile_description, data_type, complex_transform, schedules, tile_scheduler in combinations:
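
Because the hunk above no longer truncates `tile_descriptions` to its first entry, every tile takes part in the `product(...)` call, so the number of generated operations grows linearly with the number of tiles. A minimal sketch of that effect follows; the layout and data-type lists are placeholder strings invented for illustration, not the values the generator actually uses.

```python
from itertools import product

# Placeholder inputs (assumptions, for illustration only).
layouts = ["row_row", "row_col"]
data_types = ["bf16", "f16"]
tile_descriptions = [
    "256x256x32",
    "128x256x32",
    "256x128x32",
    "128x128x32",
    "64x128x32",
]


def count_combinations(tiles):
    """Count the combinations a product over (layout, tile, data type) yields."""
    return sum(1 for _ in product(layouts, tiles, data_types))


# Old behavior: only the first tile description was kept.
only_first_tile = count_combinations(tile_descriptions[:1])

# New behavior: every tile description is used.
all_tiles = count_combinations(tile_descriptions)
```

With these placeholder lists the count goes from 4 to 20 combinations, which is the same growth that lengthens autotune benchmarking when custom tile shapes are added.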
@@ -10901,21 +10900,33 @@ def GenerateXe_TensorOp_16b_DPAS_gemm(manifest, cuda_version, min_cc=20):
       MathOperation.multiply_add)
   ]

+  default_tiles_wg_sg = [
+    ([256, 256, 32],[8,4,1]),
+    ([128, 256, 32],[4,8,1]),
+    ([256, 128, 32],[8,4,1]),
+    ([128, 128, 32],[4,4,1]),
+    ([64, 128, 32],[2,4,1]),
+  ]
+
   max_cc = min_cc

+  # Expecting JSON of format i.e list of dictionaries [{"wg": [256, 256, 32], "sg": [8,4,1]}, ...]
+  custom_tile_shapes = []
+  if os.getenv("CUTLASS_SYCL_ADDITIONAL_TILE_SHAPES"):
+    custom_json = os.getenv("CUTLASS_SYCL_ADDITIONAL_TILE_SHAPES")
+    with open(custom_json, "r") as f:
+      try:
+        custom_tile_shapes = json.load(f)
+      except json.JSONDecodeError:
+        raise ValueError(f"Error decoding JSON : {custom_json}")
+    for tile in custom_tile_shapes:
+      default_tiles_wg_sg.append((tile["wg"],tile["sg"]))
+
+  tile_descriptions=[]
   for math_inst in math_instructions:
-    tile_descriptions = [
-      TileDescription([256, 256, 32],
-          0, [8, 4, 1], math_inst, min_cc, max_cc, [1, 1, 1]),
-      TileDescription([128, 256, 32],
-          0, [4, 8, 1], math_inst, min_cc, max_cc, [1, 1, 1]),
-      TileDescription([256, 128, 32],
-          0, [8, 4, 1], math_inst, min_cc, max_cc, [1, 1, 1]),
-      TileDescription([128, 128, 32],
-          0, [4, 4, 1], math_inst, min_cc, max_cc, [1, 1, 1]),
-      TileDescription([64, 128, 32],
-          0, [2, 4, 1], math_inst, min_cc, max_cc, [1, 1, 1]),
-    ]
+    for wg_tile,sg_tile in default_tiles_wg_sg:
+      tile_descriptions.append(TileDescription(wg_tile,
+          0, sg_tile, math_inst, min_cc, max_cc, [1, 1, 1]))

     # Generate kernels for different output (D) types
     # Default: accumulator type (FP32 for mixed precision, same as input for native precision)
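
The tile-loading step in this hunk can also be sketched as a standalone function, which makes it easier to test outside the generator. This is an illustrative variant, not the PR's code: `load_additional_tiles` and `merged_tiles` are hypothetical names, and the extra behavior (wrapping a missing file or missing key in `ValueError`, skipping duplicates of the default tiles) is an assumption about what a caller might want.

```python
import json
import os

ENV_VAR = "CUTLASS_SYCL_ADDITIONAL_TILE_SHAPES"

# The default (workgroup tile, subgroup layout) pairs listed in the hunk above.
DEFAULT_TILES = [
    ([256, 256, 32], [8, 4, 1]),
    ([128, 256, 32], [4, 8, 1]),
    ([256, 128, 32], [8, 4, 1]),
    ([128, 128, 32], [4, 4, 1]),
    ([64, 128, 32], [2, 4, 1]),
]


def load_additional_tiles(env_var=ENV_VAR):
    """Return (wg, sg) pairs from the JSON file named by env_var, or []."""
    path = os.getenv(env_var)
    if not path:
        return []
    try:
        with open(path, "r") as f:
            entries = json.load(f)
    except FileNotFoundError as exc:
        raise ValueError(f"Tile shape file not found: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ValueError(f"Error decoding JSON: {path}") from exc
    try:
        return [(entry["wg"], entry["sg"]) for entry in entries]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Each entry in {path} needs 'wg' and 'sg' keys") from exc


def merged_tiles(defaults=DEFAULT_TILES):
    """Combine defaults with custom tiles, keeping order and skipping duplicates.

    Deduplication is an assumption of this sketch; the PR appends every entry.
    """
    merged = []
    seen = set()
    for wg, sg in list(defaults) + load_additional_tiles():
        key = (tuple(wg), tuple(sg))
        if key not in seen:
            seen.add(key)
            merged.append((list(wg), list(sg)))
    return merged
```

Building a new list rather than appending to the defaults in place means repeated calls give the same result.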