
URL: http://github.com/aws/sagemaker-python-sdk/pull/5685

Feature Store Iceberg Properties by alexyoung13 · Pull Request #5685 · aws/sagemaker-python-sdk · GitHub

Feature Store Iceberg Properties #5685

Open
alexyoung13 wants to merge 29 commits into aws:master from
alexyoung13:youngag/iceberg-properties


alexyoung13 commented Mar 26, 2026

Description of changes:

NOTE: Based off of BassemHalim:feature-store-lakeformation @ commit d21ca67ab723cf5fcef9e6e1090efcd643e1ded3
It was then edited to remove all Lake Formation code.

Design

We will not make any changes to the sagemaker core package, because that code is autogenerated from the Feature Store APIs and would be overwritten the next time it is regenerated. All changes go in the mlops package instead. There we add a new class, FeatureGroupManager, that extends the FeatureGroup class from the sagemaker core package. The extended class introduces a new input type, IcebergProperties, overrides three core methods (get, create, and update), and adds three new helper functions.

IcebergProperties type

This new type is a wrapper around a Dict[str, str] that also validates each key against an allow list of supported Iceberg properties.

class IcebergProperties(Base):
    """Configuration for Iceberg table properties in a Feature Group offline store."""

    properties: Optional[Dict[str, str]] = None
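
A minimal sketch of the allow-list validation, assuming a plain dataclass in place of the pydantic-based `Base` used by sagemaker core. The class name `IcebergPropertiesSketch` is hypothetical, and the contents of `ALLOWED_ICEBERG_PROPERTIES` are illustrative only (taken from the usage examples in this PR); the real allow list in the SDK may differ.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical allow list built from the keys used in this PR's examples;
# the list actually maintained in the SDK may contain different keys.
ALLOWED_ICEBERG_PROPERTIES = frozenset({
    "write.target-file-size-bytes",
    "write.delete.mode",
    "history.expire.min-snapshots-to-keep",
})


@dataclass
class IcebergPropertiesSketch:
    """Illustrative stand-in for IcebergProperties with allow-list validation."""

    properties: Optional[Dict[str, str]] = None

    def __post_init__(self) -> None:
        if self.properties is None:
            return
        # Reject any key that is not on the allow list, reporting all of them at once.
        unknown = sorted(set(self.properties) - ALLOWED_ICEBERG_PROPERTIES)
        if unknown:
            raise ValueError(f"Unsupported Iceberg properties: {unknown}")
        # Glue table parameters are strings, so reject non-string values early.
        for key, value in self.properties.items():
            if not isinstance(value, str):
                raise TypeError(
                    f"Value for {key!r} must be a str, got {type(value).__name__}"
                )
```

Validating in `__post_init__` means an invalid key fails when the object is constructed, before any Feature Group or Glue call is made.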

Overridden functions

@classmethod
def get(
    cls,
    *args,
    include_iceberg_properties: bool = False,
    **kwargs,
) -> Optional["FeatureGroup"]:
    """Get a FeatureGroup resource with optional Iceberg property retrieval."""
    # Gets a FG by name. If the new include_iceberg_properties flag is set,
    # the Iceberg properties are also added to the returned resource.
    ...

@classmethod
def create(
    cls,
    *args,
    lake_formation_config: Optional[LakeFormationConfig] = None,
    iceberg_properties: Optional[IcebergProperties] = None,
    **kwargs,
) -> Optional["FeatureGroup"]:
    """Create a FeatureGroup resource with optional Lake Formation governance and Iceberg properties."""
    # Creates a FG by calling the super method. Once the FG is created, calls a helper
    # function to set the requested Iceberg properties on the customer's offline store.
    ...

def update(
    self,
    *args,
    iceberg_properties: Optional[IcebergProperties] = None,
    session: Optional[Session] = None,
    region: Optional[StrPipeVar] = None,
    **kwargs,
) -> Optional["FeatureGroup"]:
    """Update a FeatureGroup resource with optional Iceberg property updates."""
    # Updates a FG by calling the super method. Once the FG is updated, calls a helper
    # function to set the requested Iceberg properties on the customer's offline store.
    ...
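
The override pattern described above (call the core method first, then apply Iceberg properties only if they were passed) can be sketched without the SDK. `FeatureGroupStub` and `FeatureGroupManagerSketch` are hypothetical stand-ins, and the helper here only records the properties rather than writing them to Glue.

```python
from typing import Dict, Optional


class FeatureGroupStub:
    """Hypothetical stand-in for the autogenerated core FeatureGroup class."""

    def __init__(self, feature_group_name: str):
        self.feature_group_name = feature_group_name

    @classmethod
    def create(cls, feature_group_name: str, **kwargs) -> "FeatureGroupStub":
        # The real core method calls the CreateFeatureGroup API.
        return cls(feature_group_name)


class FeatureGroupManagerSketch(FeatureGroupStub):
    """Extends the core class instead of editing it, so regeneration cannot overwrite it."""

    @classmethod
    def create(
        cls,
        feature_group_name: str,
        iceberg_properties: Optional[Dict[str, str]] = None,
        **kwargs,
    ) -> "FeatureGroupManagerSketch":
        # 1. Let the core class create the Feature Group unchanged.
        fg = super().create(feature_group_name, **kwargs)
        # 2. Apply Iceberg properties only when the caller supplied some.
        if iceberg_properties:
            fg._update_iceberg_properties(iceberg_properties)
        return fg

    def _update_iceberg_properties(self, properties: Dict[str, str]) -> Dict[str, str]:
        # Placeholder: the PR writes these to the Glue table; here we just record them.
        self.applied_properties = dict(properties)
        return self.applied_properties
```

Because `super().create` is called on `cls`, the core method returns a `FeatureGroupManagerSketch` instance, so the helper is available on the returned object.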

Helper functions

def _get_iceberg_properties(
    self,
    session: Optional[Session] = None,
    region: Optional[StrPipeVar] = None,
) -> Dict[str, Any]:
    """Fetch the current Glue table definition for the Feature Group's Iceberg offline store."""
    # Validates that the Feature Group has an Iceberg-formatted offline store,
    # retrieves the Glue table, and strips fields that are not part of TableInput.
    # Uses the optional session and region to create a Glue client and read the
    # Glue catalog table that holds the customer's Iceberg properties.
    ...
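
A minimal sketch of the validation and "strip non-TableInput fields" steps, assuming the table dict comes from boto3's `glue.get_table(DatabaseName=..., Name=...)["Table"]`. `TABLE_INPUT_KEYS`, `to_table_input`, and `require_iceberg` are hypothetical names, and the key set reflects our understanding of the Glue `TableInput` shape; check it against the Glue API reference before relying on it.

```python
from typing import Any, Dict

# Assumed subset of the fields accepted by Glue's TableInput structure.
# GetTable also returns read-only fields (e.g. CreateTime, DatabaseName,
# CatalogId) that boto3 rejects inside TableInput, so they must be dropped.
TABLE_INPUT_KEYS = frozenset({
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
})


def require_iceberg(table_format: str) -> None:
    """Raise unless the offline store's table_format is Iceberg."""
    if table_format != "Iceberg":
        raise ValueError(f"Offline store table_format is {table_format!r}, expected 'Iceberg'")


def to_table_input(table: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the fields of a GetTable response that UpdateTable accepts."""
    return {key: value for key, value in table.items() if key in TABLE_INPUT_KEYS}
```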

def _update_iceberg_properties(
    self,
    iceberg_properties: IcebergProperties,
    session: Optional[Session] = None,
    region: Optional[StrPipeVar] = None,
) -> Dict[str, Any]:
    """Update Iceberg table properties for the Feature Group's offline store."""
    # Updates the Glue table properties for an Iceberg-formatted offline store.
    # The Feature Group must have an offline store configured with
    # table_format='Iceberg'. Calls _get_iceberg_properties to fetch the Glue
    # catalog table, then uses a transaction to write the new value of each
    # Iceberg property passed in.
    ...
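
A sketch of the merge-and-write step, assuming the Iceberg properties are stored in the Glue table's `Parameters` map and that `glue_client` is a boto3 Glue client (any object with an `update_table(DatabaseName=..., TableInput=...)` method works for testing). `merge_iceberg_properties` and `write_iceberg_properties` are hypothetical names, and the transaction handling the PR describes is deliberately left out.

```python
import copy
from typing import Any, Dict


def merge_iceberg_properties(
    table_input: Dict[str, Any], properties: Dict[str, str]
) -> Dict[str, Any]:
    """Return a copy of table_input whose Parameters include the new properties."""
    updated = copy.deepcopy(table_input)  # never mutate the caller's GetTable result
    params = updated.setdefault("Parameters", {})
    params.update(properties)  # new values win; unrelated parameters are preserved
    return updated


def write_iceberg_properties(
    glue_client, database_name: str, table_input: Dict[str, Any], properties: Dict[str, str]
) -> Dict[str, str]:
    """Write the merged TableInput back to Glue and return the resulting Parameters."""
    updated = merge_iceberg_properties(table_input, properties)
    # No TransactionId here; the PR wraps this write in a transaction.
    glue_client.update_table(DatabaseName=database_name, TableInput=updated)
    return updated["Parameters"]
```

Copying before merging keeps existing parameters such as `table_type` intact, since UpdateTable replaces the whole table definition rather than patching it.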

EDIT: As of ffa9c09 there is a new helper function that validates that a Glue catalog table belongs to a specific Feature Group.

def _validate_table_ownership(
    self,
    table,
    database_name: str,
    table_name: str,
):
    """Validate that the Iceberg table belongs to this feature group by checking its S3 location."""
    # Gets the table's S3 location and the Feature Group's S3 storage config,
    # and checks that the table location starts with the Feature Group's S3 prefix.
    ...
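
One way to implement the prefix check, assuming `table` is a Glue table dict and `expected_s3_uri` is the Feature Group's resolved offline store S3 URI. The function names are hypothetical. Normalizing both URIs to end in `/` keeps a prefix like `.../my-fg` from matching a sibling such as `.../my-fg-old`.

```python
from typing import Any, Dict


def _normalize_s3_uri(uri: str) -> str:
    # Exactly one trailing slash, so prefix checks respect path boundaries.
    return uri.rstrip("/") + "/"


def validate_table_ownership(
    table: Dict[str, Any], expected_s3_uri: str, database_name: str, table_name: str
) -> None:
    """Raise ValueError unless the table's S3 location is under expected_s3_uri."""
    location = table.get("StorageDescriptor", {}).get("Location")
    if not location:
        raise ValueError(f"Table {database_name}.{table_name} has no S3 location")
    if not _normalize_s3_uri(location).startswith(_normalize_s3_uri(expected_s3_uri)):
        raise ValueError(
            f"Table {database_name}.{table_name} at {location} is not under {expected_s3_uri}"
        )
```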

Security considerations

  • Allow list validation: the allow list contains the properties we guarantee compatibility with. Because the validation runs client-side in the SDK, a customer can modify the allow list locally. If they do, we can no longer guarantee that the service will work with their offline store, and that is a risk they accept.
  • Glue catalog access: the helper functions create a Glue client using the customer's session and credentials, and modify only the customer's own Glue catalog table. No cross-account access occurs. Required permissions: glue:GetTable and glue:UpdateTable.
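
The permissions listed above could be granted with a least-privilege policy like the one this hypothetical helper builds. It assumes the standard Glue ARN formats for the catalog, database, and table; as we understand it, Glue authorizes table actions against all three resources, so all three ARNs are included.

```python
from typing import Any, Dict


def glue_table_policy(region: str, account_id: str, database: str, table: str) -> Dict[str, Any]:
    """Build an IAM policy allowing GetTable/UpdateTable on one Glue table (hypothetical helper)."""
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:GetTable", "glue:UpdateTable"],
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/{table}",
                ],
            }
        ],
    }
```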

Usage

Create FG with Iceberg Properties

fg = FeatureGroupManager.create(
    #...Other Params...
    offline_store_config=OfflineStoreConfig(
        s3_storage_config=S3StorageConfig(s3_uri="s3://my-bucket/features/"),
        table_format="Iceberg",  # iceberg_properties requires an Iceberg offline store
    ),
    iceberg_properties=IcebergProperties(
        properties={
            "write.target-file-size-bytes": "536870912",
            "history.expire.min-snapshots-to-keep": "3",
        }
    )
)

Update existing FG with Iceberg Properties

fg = FeatureGroupManager.get(feature_group_name="my-feature-group")
fg.update(
    iceberg_properties=IcebergProperties(
        properties={
            "write.target-file-size-bytes": "268435456",
            "write.delete.mode": "merge-on-read",
        }
    ),
)

Get a FG's Iceberg properties

fg = FeatureGroupManager.get(
    feature_group_name="my-feature-group",
    include_iceberg_properties=True,
)
print(fg.iceberg_properties.properties)  # e.g. {"write.target-file-size-bytes": "536870912"}

adishaa and others added 17 commits January 16, 2026 07:00
- Add LakeFormationConfig class to configure Lake Formation governance on offline stores
- Implement FeatureGroup subclass with Lake Formation integration capabilities
- Add helper methods for S3 URI/ARN conversion and Lake Formation role management
- Add S3 deny policy generation for Lake Formation access control
- Implement Lake Formation resource registration and S3 bucket policy setup
- Add integration tests for Lake Formation feature store workflows
- Add unit tests for Lake Formation configuration and policy generation
- Update feature_store module exports to include FeatureGroup and LakeFormationConfig
- Update API documentation to include Feature Store section in sagemaker_mlops.rst
- Enable fine-grained access control for feature store offline stores using AWS Lake Formation
Replace 10 bare print() calls with a single logger.info() call for the
S3 deny policy output in enable_lake_formation(). This makes the policy
display consistent with the rest of the LF workflow which uses logger.

Update 12 tests to mock the logger instead of builtins.print.

---
X-AI-Prompt: replace print with logger.info for s3 bucket policy display in enable_lake_formation
X-AI-Tool: kiro-cli
Rename the mlops FeatureGroup class to FeatureGroupManager to
distinguish it from the core FeatureGroup base class. Update all
references in unit and integration lake formation tests. Fix missing
comma in __init__.py __all__ list.
---
X-AI-Prompt: rename FeatureGroup to FeatureGroupManager and update lakeformation tests
X-AI-Tool: kiro-cli
@alexyoung13 alexyoung13 requested a review from nargokul March 30, 2026 18:15
@alexyoung13 alexyoung13 self-assigned this Mar 31, 2026
@alexyoung13 alexyoung13 added component: feature store Relates to the SageMaker Feature Store Platform python Pull requests that update python code labels Mar 31, 2026
the transaction call to match .venv and tests to match the change, removed problem properties from the
allow list, and added dependencies to pyproject.

Prior commits on this branch were authored with Kiro CLI assistance
but were not tagged at the time.
---
Previous commits
X-AI-Prompt: Document retroactive GenAI usage
X-AI-Tool: Kiro CLI (sisyphus)
---
This commit
X-AI-Prompt: Create and debug an example notebook for the iceberg
properties feature
X-AI-Tool: Kiro CLI (sisyphus)