diff --git a/sfn-glue-sam/Readme.md b/sfn-glue-sam/Readme.md
index d721d4d36..6a92955b2 100644
--- a/sfn-glue-sam/Readme.md
+++ b/sfn-glue-sam/Readme.md
@@ -6,7 +6,7 @@ This pattern deploys a Step Functions that includes a Glue Job as one of its ste
 
 The SAM template deploys:
 * A Step Functions State Machine
-* An EventBridge rule that triggers the Step Functions every 2 days
+* An EventBridge rule that triggers the Step Functions every 2 days (disabled by default)
 * A Glue Job
 * IAM roles required to run the application.
 
@@ -14,7 +14,7 @@ The SAM template deploys:
 ## Download
 1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
 ```bash
-> git clone https://github.com/NicoliAraujo/serverless-patterns.git
+> git clone https://github.com/aws-samples/serverless-patterns.git
 ```
 
 2. Change directory to the pattern directory:
@@ -24,19 +24,25 @@ The SAM template deploys:
 
 ## Deploy
 
-1. Copy glue job script and libs:
+1. Manually create two Amazon S3 buckets:
+- `ARTIFACTS_BUCKET` — stores the AWS Glue job script.
+- `DATA_BUCKET` — stores the source data for the AWS Glue job.
 ```bash
-> aws s3 cp code/glue s3:///glue --recursive
+> aws s3 mb s3://
+> aws s3 mb s3://
 ```
 
-2. Build and deploy stack:
+2. Copy AWS Glue job script:
 ```bash
+> aws s3 cp code/glue s3:///glue --recursive
+```
 
-> sam build --template code/cloudformation/stack.yaml
-
+3. Build and deploy stack. During the `sam deploy --guided` prompts, pass the bucket names to `ArtifactsBucket` and `SourceBucket`:
+```bash
+> sam build --template code/cloudformation/stack.yaml
 > sam deploy --guided
 ```
 
@@ -58,6 +64,7 @@ The output should be the description and status of Glue Job run:
         "ExecutionTime": 49,
         "GlueVersion": "1.0",
         "Id": "jr_ee276c9398f2d981c850f79cb7b54a28b57d9f6484deb8a9051db4bcbbc12d1f",
+        "JobMode": "SCRIPT",
         "JobName": "Feature Engineering",
         "JobRunState": "SUCCEEDED",
         "LastModifiedOn": 1638296059130,
diff --git a/sfn-glue-sam/code/cloudformation/stack.yaml b/sfn-glue-sam/code/cloudformation/stack.yaml
index 56781b5c4..b700f6084 100644
--- a/sfn-glue-sam/code/cloudformation/stack.yaml
+++ b/sfn-glue-sam/code/cloudformation/stack.yaml
@@ -24,14 +24,13 @@ Resources:
     Properties:
       Command:
         Name: pythonshell
-        PythonVersion: "3"
+        PythonVersion: "3.9"
         ScriptLocation: !Sub "s3://${ArtifactsBucket}/glue/scripts/feature_engineering.py"
       ExecutionProperty:
         MaxConcurrentRuns: 1
       DefaultArguments:
-        "--extra-py-files": !Sub "s3://${ArtifactsBucket}/glue/libs/awswrangler-2.12.0-py3-none-any.whl"
+        "--library-set": "analytics"
         "--config_path": !Sub "${ETLParametersPath}"
-      GlueVersion: "1.0"
       MaxCapacity: 1
       MaxRetries: 0
       Name: "Feature Engineering"
diff --git a/sfn-glue-sam/code/glue/libs/awswrangler-2.12.0-py3-none-any.whl b/sfn-glue-sam/code/glue/libs/awswrangler-2.12.0-py3-none-any.whl
deleted file mode 100644
index c85c93a0a..000000000
Binary files a/sfn-glue-sam/code/glue/libs/awswrangler-2.12.0-py3-none-any.whl and /dev/null differ
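For reviewers who want to script the updated Deploy steps instead of answering the `sam deploy --guided` prompts, the sequence can be sketched as below. This is a minimal sketch that only builds and prints the CLI invocations. The `deploy_commands` helper, the default stack name, `--resolve-s3`, and `CAPABILITY_IAM` are assumptions, not part of this pattern (use `CAPABILITY_NAMED_IAM` if the template gives its IAM roles explicit names). `--parameter-overrides` cannot be combined with `--guided`, and any other template parameter without a default, such as `ETLParametersPath` referenced in `stack.yaml`, would also need to be added to the override string.

```python
import shlex
from typing import List


def deploy_commands(artifacts_bucket: str, data_bucket: str,
                    stack_name: str = "sfn-glue-sam") -> List[List[str]]:
    """Build the CLI invocations for the README's Deploy steps, in order.

    Hypothetical helper: bucket names, the stack name default, and the
    deploy flags are assumptions for illustration.
    """
    artifacts_uri = f"s3://{artifacts_bucket}"
    data_uri = f"s3://{data_bucket}"
    return [
        # Step 1: create the artifacts bucket and the data bucket.
        ["aws", "s3", "mb", artifacts_uri],
        ["aws", "s3", "mb", data_uri],
        # Step 2: upload code/glue under the glue/ prefix that the job's
        # ScriptLocation in stack.yaml points at.
        ["aws", "s3", "cp", "code/glue", f"{artifacts_uri}/glue", "--recursive"],
        # Step 3a: build the SAM application.
        ["sam", "build", "--template", "code/cloudformation/stack.yaml"],
        # Step 3b: deploy without the --guided prompts. The stack name,
        # --resolve-s3 and CAPABILITY_IAM are assumptions; the bucket names
        # go to the ArtifactsBucket and SourceBucket template parameters.
        ["sam", "deploy",
         "--stack-name", stack_name,
         "--resolve-s3",
         "--capabilities", "CAPABILITY_IAM",
         "--parameter-overrides",
         f"ArtifactsBucket={artifacts_bucket} SourceBucket={data_bucket}"],
    ]


if __name__ == "__main__":
    # Dry run: print each command as a shell-quoted line instead of running it.
    # Pass each list to subprocess.run(cmd, check=True) to execute for real.
    for cmd in deploy_commands("my-artifacts-bucket", "my-data-bucket"):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps the bucket names from being split or interpreted by a shell; `shlex.join` is used only to display them.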