sfn-glue-sam: Replace deprecated Python shell 3.6 with 3.9 #3130
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,15 +6,15 @@ This pattern deploys a Step Functions that includes a Glue Job as one of its ste | |
|
|
||
| The SAM template deploys: | ||
| * A Step Functions State Machine | ||
| * An EventBridge rule that triggers the Step Functions every 2 days | ||
| * An EventBridge rule that triggers the Step Functions every 2 days (disabled by default) | ||
| * A Glue Job | ||
| * IAM roles required to run the application. | ||
|
|
||
|
|
||
| ## Download | ||
| 1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository: | ||
| ```bash | ||
| > git clone https://github.com/NicoliAraujo/serverless-patterns.git | ||
| > git clone https://github.com/aws-samples/serverless-patterns.git | ||
| ``` | ||
|
|
||
| 2. Change directory to the pattern directory: | ||
|
|
@@ -24,19 +24,25 @@ The SAM template deploys: | |
|
|
||
|
|
||
| ## Deploy | ||
| 1. Copy glue job script and libs: | ||
| 1. Manually create two Amazon S3 buckets: | ||
| - `ARTIFACTS_BUCKET` — stores the AWS Glue job script. | ||
| - `DATA_BUCKET` — stores the source data for the AWS Glue job. | ||
|
Contributor
Author
note: These two S3 buckets are not created by the stack, so they need to exist before deployment. The original README didn't mention this step at all, so first-time users would likely hit a deploy error without knowing why. Added explicit creation commands with a short note on each bucket's role 😀 |
||
|
|
||
| ```bash | ||
| > aws s3 cp code/glue s3://<ARTIFACTS_BUCKET>/glue --recursive | ||
| > aws s3 mb s3://<ARTIFACTS_BUCKET> | ||
| > aws s3 mb s3://<DATA_BUCKET> | ||
| ``` | ||
|
|
||
| 2. Build and deploy stack: | ||
| 2. Copy AWS Glue job script: | ||
|
|
||
| ```bash | ||
| > aws s3 cp code/glue s3://<ARTIFACTS_BUCKET>/glue --recursive | ||
| ``` | ||
|
|
||
| > sam build --template code/cloudformation/stack.yaml | ||
|
|
||
| 3. Build and deploy stack. During the `sam deploy --guided` prompts, pass the bucket names to `ArtifactsBucket` and `SourceBucket`: | ||
|
|
||
| ```bash | ||
| > sam build --template code/cloudformation/stack.yaml | ||
| > sam deploy --guided | ||
| ``` | ||
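The deploy steps above assume both buckets already exist when `sam deploy` runs. A pre-flight check could be sketched as follows. This is only an illustration: `find_missing_buckets` is a hypothetical helper, not part of the pattern, and it assumes a client object that behaves like `boto3.client("s3")`, whose `head_bucket(Bucket=...)` call raises when a bucket is missing or inaccessible.

```python
from typing import Iterable, List


def find_missing_buckets(s3_client, bucket_names: Iterable[str]) -> List[str]:
    """Return the bucket names that the given S3 client cannot reach.

    `s3_client` is assumed to behave like `boto3.client("s3")`:
    `head_bucket(Bucket=...)` returns normally when the bucket exists and is
    accessible, and raises an exception otherwise.
    """
    missing = []
    for name in bucket_names:
        try:
            s3_client.head_bucket(Bucket=name)
        except Exception:  # botocore raises ClientError for 403/404 responses
            missing.append(name)
    return missing
```

With boto3 installed, this might be called as `find_missing_buckets(boto3.client("s3"), ["<ARTIFACTS_BUCKET>", "<DATA_BUCKET>"])` before running `sam deploy --guided`; an empty list suggests both buckets are reachable with the current credentials.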
|
|
||
|
|
@@ -58,6 +64,7 @@ The output should be the description and status of Glue Job run: | |
| "ExecutionTime": 49, | ||
| "GlueVersion": "1.0", | ||
| "Id": "jr_ee276c9398f2d981c850f79cb7b54a28b57d9f6484deb8a9051db4bcbbc12d1f", | ||
| "JobMode": "SCRIPT", | ||
| "JobName": "Feature Engineering", | ||
| "JobRunState": "SUCCEEDED", | ||
| "LastModifiedOn": 1638296059130, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,14 +24,13 @@ Resources: | |
| Properties: | ||
| Command: | ||
| Name: pythonshell | ||
| PythonVersion: "3" | ||
| PythonVersion: "3.9" | ||
|
||
| ScriptLocation: !Sub "s3://${ArtifactsBucket}/glue/scripts/feature_engineering.py" | ||
| ExecutionProperty: | ||
| MaxConcurrentRuns: 1 | ||
| DefaultArguments: | ||
| "--extra-py-files": !Sub "s3://${ArtifactsBucket}/glue/libs/awswrangler-2.12.0-py3-none-any.whl" | ||
| "--library-set": "analytics" | ||
|
Contributor
Author
note: The bundled |
||
| "--config_path": !Sub "${ETLParametersPath}" | ||
| GlueVersion: "1.0" | ||
|
Contributor
Author
note: |
||
| MaxCapacity: 1 | ||
| MaxRetries: 0 | ||
| Name: "Feature Engineering" | ||
|
|
||
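For readers who want to compare the new job properties outside CloudFormation, the values added in the hunk above can be mirrored as a plain dictionary. This is a rough sketch, not code from the PR: `build_job_settings` is a hypothetical name, the key names follow the shape that boto3's Glue `create_job` call accepts, and a real call would additionally need a `Role`.

```python
from typing import Any, Dict


def build_job_settings(artifacts_bucket: str, config_path: str) -> Dict[str, Any]:
    """Mirror the Glue job properties from the template diff as a plain dict."""
    return {
        "Name": "Feature Engineering",
        "Command": {
            "Name": "pythonshell",
            # The PR moves the Python shell job to 3.9.
            "PythonVersion": "3.9",
            "ScriptLocation": (
                f"s3://{artifacts_bucket}/glue/scripts/feature_engineering.py"
            ),
        },
        "ExecutionProperty": {"MaxConcurrentRuns": 1},
        "DefaultArguments": {
            # Added in the diff next to the awswrangler wheel reference.
            "--library-set": "analytics",
            "--config_path": config_path,
        },
        "MaxCapacity": 1,
        "MaxRetries": 0,
    }
```

Keeping the settings in one function makes it easy to diff them against `stack.yaml` when the template changes again.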
note: The schedule is `Enabled: False` in `stack.yaml`, so it doesn't trigger automatically after deploy.
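Since the schedule stays disabled after deploy, the state machine has to be started by hand to test the pattern. One way this might look is sketched below. The helper name `start_state_machine` is hypothetical, and the client is assumed to behave like `boto3.client("stepfunctions")`, whose `start_execution` call takes `stateMachineArn` and a JSON string `input` and returns a dict containing `executionArn`.

```python
import json
from typing import Any, Dict, Optional


def start_state_machine(
    sfn_client,
    state_machine_arn: str,
    payload: Optional[Dict[str, Any]] = None,
) -> str:
    """Start one execution and return its ARN.

    `sfn_client` is assumed to behave like `boto3.client("stepfunctions")`.
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # Step Functions expects the execution input as a JSON string.
        input=json.dumps(payload or {}),
    )
    return response["executionArn"]
```

The state machine ARN is a placeholder here; it would come from the deployed stack.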