AWS Data Engineering Pipeline

This is a repository for the Duke University Cloud Computing course project on Serverless Data Engineering Pipeline. For this project, I recreated the below pipeline in iCloud9 (reference: https://github.com/noahgift/awslambda):

Below are the steps of how to build this pipeline in AWS:

1️⃣ Create a new iCloud9 environment dedicated to this project.

🤔 Need a refresher? Please check this repo.

⚠️ Make sure to use name as your unique id for your items in the fang table.

2️⃣ Create a fang table in DynamoDB and SQS queue.

You can check how to do it here.

3️⃣ Build producer Lambda Function

In iCloud9, initialize a serverless application with SAM template:
```
sam init 
```

Inputs: 1, 2, 4, "producer"

Set virtual environment and source it:

# I called my virtual environment "comprehendProducer"
python3 -m venv ~/.comprehendProducer
source ~/.comprehendProducer/bin/activate

Add the code for your application to app.py
Add relevant packages used in your app to requirements.txt file

Install requirements

 cd hello_world/
 pip install -r requirements.txt 
 cd ..

Create a repository (producer) in Elastic Container Registry (ECR) and copy its URI
Build and deploy your serverless application:
```
sam build 
sam deploy --guided
```
When prompted to input URI, paste the URI for the producer repository that you've just created.
Create IAM Role granting Administrator Access to the Producer Lambda function.

🤔 Not sure how to create IAM Role? Check out this video (17 min ).
Add the execution role that you created to the Producer Lambda function.

In case you forgot how to do it:

In AWS console: Lambda ➡️ click on producer function ➡️ configuration ➡️ permissions ➡️ Edit ➡️ Select the role under Existing role.
You are all set with the producer function! Now deactivate virtual environment:
```
deactivate 
cd .. 
```

4️⃣ Create an S3 bucket and note its name

5️⃣ Build consumer Lambda Function

Repeat steps in 3️⃣.

⚠️ In #3 when you add the code for a consumer app to app.py, make sure to replace bucket="fangsentiment" with the name of your S3 bucket.

6️⃣ Add triggers to Lambda Functions

🤔 Not sure how to do it? Check out this video (start times are noted below):

Producer Lambda Function: CloudWatchEvent(30 min)

Consumer Lambda Function: SQS (42 min)

7️⃣ If all goes well, you will see sentiment results in your S3 bucket:

💡Tip: If you've already deployed your Lambda function but need to edit your application, you can make the necessary edits to your app and build and deploy the app again:

sam build && sam deploy

💡Tip: If you don't have space left on disk, you may want to remove a few docker containers that you don't use.

#list containers 
docker image ls 
# remove a container 
docker image rm <containerId>

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
consumer		consumer
images		images
producer		producer
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consumer

consumer

images

images

producer

producer

.gitignore

.gitignore

README.md

README.md

Repository files navigation

AWS Data Engineering Pipeline

About

Releases

Packages

Languages

Klalena/AWS-Serverless-Data-Engineering-Pipeline

Folders and files

Latest commit

History

Repository files navigation

AWS Data Engineering Pipeline

About

Resources

Stars

Watchers

Forks

Languages