21 changes: 14 additions & 7 deletions sfn-glue-sam/Readme.md
@@ -6,15 +6,15 @@ This pattern deploys a Step Functions that includes a Glue Job as one of its ste

The SAM template deploys:
* A Step Functions State Machine
* An EventBridge rule that triggers the Step Functions every 2 days
* An EventBridge rule that triggers the Step Functions every 2 days (disabled by default)

Contributor Author

note: The schedule is set to `Enabled: False` in stack.yaml, so it doesn't trigger automatically after deploy.
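
If the schedule is wanted after deploy, one option is to enable the rule from the CLI rather than editing stack.yaml. A minimal sketch, assuming the rule's physical name has already been looked up (for example with `aws events list-rules`); the `enable_schedule` helper and its argument are hypothetical and not part of this PR:

```shell
# Hypothetical helper: enable the EventBridge rule that the stack
# deploys in a disabled state.
# Usage: enable_schedule <rule-name>
enable_schedule() {
  local rule_name="$1"
  if [ -z "$rule_name" ]; then
    echo "usage: enable_schedule <rule-name>" >&2
    return 2
  fi
  # `aws events enable-rule` switches an existing rule to the enabled state.
  aws events enable-rule --name "$rule_name"
}
```

Nothing runs until the function is called with a real rule name.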

* A Glue Job
* IAM roles required to run the application.


## Download
1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
```bash
> git clone https://github.com/NicoliAraujo/serverless-patterns.git
> git clone https://github.com/aws-samples/serverless-patterns.git
```

2. Change directory to the pattern directory:
@@ -24,19 +24,25 @@ The SAM template deploys:


## Deploy
1. Copy glue job script and libs:
1. Manually create two Amazon S3 buckets:
- `ARTIFACTS_BUCKET` — stores the AWS Glue job script.
- `DATA_BUCKET` — stores the source data for the AWS Glue job.

Contributor Author

note: These two S3 buckets are not created by the stack. They need to exist before deployment. The original README didn't mention this step at all, so first-time users would likely hit a deploy error without knowing why. Added explicit creation commands with a short note on each bucket's role😀
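
A pre-deploy check along these lines could catch a missing bucket before `sam deploy` does. This is only a sketch: `bucket_exists` and `check_buckets` are hypothetical helper names, and it assumes the AWS CLI is configured with credentials that can call `s3api head-bucket`, which fails both when a bucket is absent and when access is denied.

```shell
# Hypothetical helper: return 0 if the bucket exists and is reachable
# with the current credentials.
bucket_exists() {
  aws s3api head-bucket --bucket "$1" >/dev/null 2>&1
}

# Hypothetical helper: check every bucket passed in and report each result.
# Returns 1 if at least one bucket could not be reached.
check_buckets() {
  local missing=0 bucket
  for bucket in "$@"; do
    if bucket_exists "$bucket"; then
      echo "ok: $bucket"
    else
      echo "missing or inaccessible: $bucket" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example with the README's placeholders:
# check_buckets "<ARTIFACTS_BUCKET>" "<DATA_BUCKET>" && sam deploy --guided
```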


```bash
> aws s3 cp code/glue s3://<ARTIFACTS_BUCKET>/glue --recursive
> aws s3 mb s3://<ARTIFACTS_BUCKET>
> aws s3 mb s3://<DATA_BUCKET>
```

2. Build and deploy stack:
2. Copy AWS Glue job script:

```bash
> aws s3 cp code/glue s3://<ARTIFACTS_BUCKET>/glue --recursive
```

> sam build --template code/cloudformation/stack.yaml

3. Build and deploy stack. During the `sam deploy --guided` prompts, pass the bucket names to `ArtifactsBucket` and `SourceBucket`:

```bash
> sam build --template code/cloudformation/stack.yaml
> sam deploy --guided
```

@@ -58,6 +64,7 @@ The output should be the description and status of Glue Job run:
"ExecutionTime": 49,
"GlueVersion": "1.0",
"Id": "jr_ee276c9398f2d981c850f79cb7b54a28b57d9f6484deb8a9051db4bcbbc12d1f",
"JobMode": "SCRIPT",
"JobName": "Feature Engineering",
"JobRunState": "SUCCEEDED",
"LastModifiedOn": 1638296059130,
5 changes: 2 additions & 3 deletions sfn-glue-sam/code/cloudformation/stack.yaml
@@ -24,14 +24,13 @@ Resources:
Properties:
Command:
Name: pythonshell
PythonVersion: "3"
PythonVersion: "3.9"

ScriptLocation: !Sub "s3://${ArtifactsBucket}/glue/scripts/feature_engineering.py"
ExecutionProperty:
MaxConcurrentRuns: 1
DefaultArguments:
"--extra-py-files": !Sub "s3://${ArtifactsBucket}/glue/libs/awswrangler-2.12.0-py3-none-any.whl"
"--library-set": "analytics"

Contributor Author

note: The bundled awswrangler-2.12.0 whl is from 2021, and there's no real need to bundle it ourselves anymore. We can use the `--library-set: analytics` option instead 👍
See https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html
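
To confirm the deployed job picked up these settings, the job definition can be read back with `aws glue get-job`. A rough sketch, assuming the job name from stack.yaml (`Feature Engineering`) and a configured AWS CLI; `show_glue_job_settings` is a hypothetical helper, not part of the pattern:

```shell
# Hypothetical helper: print the Python version and default arguments
# of a deployed Glue job.
# Usage: show_glue_job_settings <job-name>
show_glue_job_settings() {
  local job_name="$1"
  # The --query expression is JMESPath; it keeps only the two fields of interest.
  aws glue get-job \
    --job-name "$job_name" \
    --query 'Job.{PythonVersion: Command.PythonVersion, DefaultArguments: DefaultArguments}' \
    --output json
}

# Example:
# show_glue_job_settings "Feature Engineering"
```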

"--config_path": !Sub "${ETLParametersPath}"
GlueVersion: "1.0"

Contributor Author

note: `GlueVersion` is ignored for Python shell jobs. Per the AWS Glue docs:

You don't need to specify the version of AWS Glue since the parameter --glue-version doesn't apply for AWS Glue shell jobs. Any version specified will be ignored.
https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html

MaxCapacity: 1
MaxRetries: 0
Name: "Feature Engineering"
Binary file not shown.