In this lab, you will work with DynamoDB as a key-value database and apply some Create, Read, Update and Delete (CRUD) operations on this NoSQL database.
Note:
First, let’s import some packages. Among these packages, you can find boto3
, which is the AWS Software Development Kit (SDK) for Python that allows you to interact with various AWS services using Python code. With boto3
, you can programmatically access AWS resources such as EC2 instances, S3 buckets, Amazon DynamoDB tables, and more. It provides you with a simple and intuitive interface for managing and integrating AWS services into your Python applications efficiently.
For more information on each of the methods that you will use throughout this lab, you can check out boto3 documentation.
import decimal
import json
import logging
from typing import Any, Dict, List
import boto3
from botocore.exceptions import ClientError
Let’s define the following variable that you will use throughout this lab.
COURSE_PREFIX = 'de-c2w1-dynamodb'
The dataset that you will use in this lab is the sample data from the Amazon DynamoDB Developer Guide (dataset zip file). The sample data consists of 4 JSON files that you can find in the folder data/aws_sample_data
:
ProductCatalog
: Catalog of products that contains information about some products such as the product ID and its characteristics.Forum
: Information about some AWS forums where users post questions or start a thread (i.e., conversation) about AWS services. The information includes the name of the forum and the total number of threads, messages, and views in each forum.Thread
: Information about each forum thread (i.e., conversation), such as the thread subject, the thread message, the total number of views and replies to the given thread, and who lastly posted on the thread.Reply
: Information about the replies of each thread, such as the time of the reply, the reply message, and the user who posted the reply.In this lab, you will create 4 DynamoDB tables (de-c2w1-dynamodb-ProductCatalog
, de-c2w1-dynamodb-Forum
, de-c2w1-dynamodb-Thread
, de-c2w1-dynamodb-Reply
) and load in each the data from the corresponding JSON file.
Note: if you check the content of each JSON file, you will notice the use of letters such as N, S, B. These are known as Data type descriptors that tell DynamoDB how to interpret the type of each field. We will talk more about it later in this lab.
What is a DynamoDB table?
DynamoDB database is a key-value store that stores a set of key-value pairs. Let’s say you have a set of key-value items where each item represents a product. Each item is characterized by a unique key (product ID) and has a set of corresponding attributes (the value of the key). DynamoDB stores this key-value data in a table where each row contains the attributes of one product and it uses the key to uniquely identify each row. This table is different from relational tables because it’s schemaless, which means that neither the attributes nor their data types need to be defined beforehand. Each item can have its own distinct attributes. For example in the product table that you will create in this section, you will have one item that represents a book (Title, Authors, ISBN, Price) and another item that represents a bicycle (BicycleType, Brand, Price, Color) both stored in the same DynamoDB table.
What is DynamoDB table’s primary key?
When you create a dynamoDB table, you need to specify the primary key which is the key that uniquely identifies each item. The primary key could be a simple key - partition key - or a composite primary key - partition key and sort key.
You can learn more about the core components of DynamoDB here.
How will you create the tables?
You will use the DyanmoDB create_table() method. This method expects 3 required parameters:
TableName
: the name of the table.KeySchema
: an array of the attributes that make up the primary key for a table. For each element in this array, you need to specify: AttributeName
: the name of the attribute, and KeyType
: the role that the key attribute will assume (HASH
if it is a partition key and RANGE
if it is a sort key). For example,
'KeySchema'= [
{'AttributeName': 'ForumName', 'KeyType': 'HASH'},
{'AttributeName': 'Subject', 'KeyType': 'RANGE'}
]
AttributeDefinitions
: an array that describes the attributes that make up the primary key. For each element in this array, you need to specify AttributeName
and AttributeType
: the data type for the attribute (S: String, N: Number, B: Binary,…). For example,
'AttributeDefinitions': [
{'AttributeName': 'ForumName', 'AttributeType': 'S'},
{'AttributeName': 'Subject', 'AttributeType': 'S'}
]
There is an additional parameter that you can specify if you don’t wish to pay for DynamoDB based on demand and you want to choose the provisioned mode:
ProvisionedThroughput
: a dictionary that specifies the read/write capacity (or throughput) for a specified table. It consists of two items:
ReadCapacityUnits
: the maximum number of strongly consistent reads consumed per second;WriteCapacityUnits
: the maximum number of writes consumed per second.In this lab, you will create 4 tables, and for each table, you need to specify the parameters that we just listed here. To make it easy for you to access the properties of each table throughout this notebook, we created the following dictionaries that specify the properties for each table.
capacity_units = {'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5}
product_catalog_table = {'table_name': f'{COURSE_PREFIX}-ProductCatalog',
'kwargs': {
'KeySchema': [{'AttributeName': 'Id', 'KeyType': 'HASH'}],
'AttributeDefinitions': [{'AttributeName': 'Id', 'AttributeType': 'N'}],
'ProvisionedThroughput': capacity_units
}
}
forum_table = {'table_name': f'{COURSE_PREFIX}-Forum',
'kwargs': {
'KeySchema': [{'AttributeName': 'Name', 'KeyType': 'HASH'}],
'AttributeDefinitions': [{'AttributeName': 'Name', 'AttributeType': 'S'}],
'ProvisionedThroughput': capacity_units
}
}
thread_table = {'table_name': f'{COURSE_PREFIX}-Thread',
'kwargs': {
'KeySchema': [{'AttributeName': 'ForumName', 'KeyType': 'HASH'},
{'AttributeName': 'Subject', 'KeyType': 'RANGE'}],
'AttributeDefinitions': [{'AttributeName': 'ForumName', 'AttributeType': 'S'},
{'AttributeName': 'Subject', 'AttributeType': 'S'}],
'ProvisionedThroughput': capacity_units
}
}
reply_table = {'table_name': f'{COURSE_PREFIX}-Reply',
'kwargs': {
'KeySchema': [{'AttributeName': 'Id', 'KeyType': 'HASH'},
{'AttributeName': 'ReplyDateTime', 'KeyType': 'RANGE'}],
'AttributeDefinitions': [{'AttributeName': 'Id', 'AttributeType': 'S'},
{'AttributeName': 'ReplyDateTime', 'AttributeType': 'S'}],
'ProvisionedThroughput': capacity_units
}
}
Note that the thread and reply tables will both use a composite primary key, and that the product and forum tables will use a simple primary key.
Note: To interact with AmazonDynamoDB throughout this notebook, you are going to create a boto3
client object. This object allows you to make API requests directly to AWS services to create, delete, or modify resources. When you create a boto3
client object, you will need to specify the AWS services you want to interact with, and then, with the created client object, you can call methods to perform various operations on that resource.
To create the 4 tables, you will use the function create_table_db()
provided in the following cell. This function calls the DynamoDB create_table()
method, and takes in two arguments:
table_name
: the name of the table;kwargs
: A dictionary that specifies the additional arguments for DynamoDB create_table()
such as KeySchema
, AttributeDefinitions
and ProvisionedThroughput
as shown in the previous cell. **kwargs
means that the elements in the dictionary are unpacked into a sequence of arguments.In this first exercise, you will need to replace None
with the appropriate values.
def create_table_db(table_name: str, **kwargs):
client = boto3.client("dynamodb")
### START CODE HERE ### (~ 1 line of code)
response = client.create_table(TableName=table_name, **kwargs)
### END CODE HERE ###
waiter = client.get_waiter("table_exists")
waiter.wait(TableName=table_name)
return response
Now that the create_table_db()
function is ready, you can test it by creating the ProductCatalog
table. The execution should take less than a minute.
response = create_table_db(table_name=product_catalog_table['table_name'], **product_catalog_table["kwargs"])
print(response)
{'TableDescription': {'AttributeDefinitions': [{'AttributeName': 'Id', 'AttributeType': 'N'}], 'TableName': 'de-c2w1-dynamodb-ProductCatalog', 'KeySchema': [{'AttributeName': 'Id', 'KeyType': 'HASH'}], 'TableStatus': 'CREATING', 'CreationDateTime': datetime.datetime(2024, 9, 23, 4, 9, 45, 788000, tzinfo=tzlocal()), 'ProvisionedThroughput': {'NumberOfDecreasesToday': 0, 'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5}, 'TableSizeBytes': 0, 'ItemCount': 0, 'TableArn': 'arn:aws:dynamodb:us-east-1:322144634018:table/de-c2w1-dynamodb-ProductCatalog', 'TableId': '8ca9426b-fae8-43d8-a39a-5ab8026ffb4e', 'DeletionProtectionEnabled': False}, 'ResponseMetadata': {'RequestId': 'DAM05E2JTB5JQM2BO4BJBVB2QJVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:09:45 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '557', 'connection': 'keep-alive', 'x-amzn-requestid': 'DAM05E2JTB5JQM2BO4BJBVB2QJVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '3979629113'}, 'RetryAttempts': 0}}
{'TableDescription': {'AttributeDefinitions': [{'AttributeName': 'Id', 'AttributeType': 'N'}], 'TableName': 'de-c2w1-dynamodb-ProductCatalog', 'KeySchema': [{'AttributeName': 'Id', 'KeyType': 'HASH'}], 'TableStatus': 'CREATING', 'CreationDateTime': datetime.datetime(2024, 2, 14, 6, 42, 38, 872000, tzinfo=tzlocal()), 'ProvisionedThroughput': {'NumberOfDecreasesToday': 0, 'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5}, 'TableSizeBytes': 0, 'ItemCount': 0, 'TableArn': 'arn:aws:dynamodb:us-east-1:631295702609:table/de-c2w1-dynamodb-ProductCatalog', 'TableId': '639df373-f498-4a2d-9851-6c6f6c26d908', 'DeletionProtectionEnabled': False}, 'ResponseMetadata': {'RequestId': 'OJ9GC0U10JH5ILK020C4PM094VVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Wed, 14 Feb 2024 06:42:38 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '557', 'connection': 'keep-alive', 'x-amzn-requestid': 'OJ9GC0U10JH5ILK020C4PM094VVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '1500356689'}, 'RetryAttempts': 0}}
Execute the following command to create the other three tables. The creation of all tables can take around 2 minutes.
for dynamodb_tab in [forum_table, thread_table, reply_table]:
response = create_table_db(dynamodb_tab["table_name"], **dynamodb_tab["kwargs"])
print(response)
{'TableDescription': {'AttributeDefinitions': [{'AttributeName': 'Name', 'AttributeType': 'S'}], 'TableName': 'de-c2w1-dynamodb-Forum', 'KeySchema': [{'AttributeName': 'Name', 'KeyType': 'HASH'}], 'TableStatus': 'CREATING', 'CreationDateTime': datetime.datetime(2024, 9, 23, 4, 11, 8, 283000, tzinfo=tzlocal()), 'ProvisionedThroughput': {'NumberOfDecreasesToday': 0, 'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5}, 'TableSizeBytes': 0, 'ItemCount': 0, 'TableArn': 'arn:aws:dynamodb:us-east-1:322144634018:table/de-c2w1-dynamodb-Forum', 'TableId': 'ee19182e-a625-429f-8c90-829a90ec2c19', 'DeletionProtectionEnabled': False}, 'ResponseMetadata': {'RequestId': 'RI3NNPURAG0JKTHI6NRJ3AKUIBVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:11:08 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '543', 'connection': 'keep-alive', 'x-amzn-requestid': 'RI3NNPURAG0JKTHI6NRJ3AKUIBVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '702040401'}, 'RetryAttempts': 0}}
{'TableDescription': {'AttributeDefinitions': [{'AttributeName': 'ForumName', 'AttributeType': 'S'}, {'AttributeName': 'Subject', 'AttributeType': 'S'}], 'TableName': 'de-c2w1-dynamodb-Thread', 'KeySchema': [{'AttributeName': 'ForumName', 'KeyType': 'HASH'}, {'AttributeName': 'Subject', 'KeyType': 'RANGE'}], 'TableStatus': 'CREATING', 'CreationDateTime': datetime.datetime(2024, 9, 23, 4, 11, 28, 456000, tzinfo=tzlocal()), 'ProvisionedThroughput': {'NumberOfDecreasesToday': 0, 'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5}, 'TableSizeBytes': 0, 'ItemCount': 0, 'TableArn': 'arn:aws:dynamodb:us-east-1:322144634018:table/de-c2w1-dynamodb-Thread', 'TableId': '9f3c5685-8f81-427f-bc03-b05399ef6804', 'DeletionProtectionEnabled': False}, 'ResponseMetadata': {'RequestId': '9FK93PUFDT7Q4T7DJGJKNMNFKNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:11:28 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '649', 'connection': 'keep-alive', 'x-amzn-requestid': '9FK93PUFDT7Q4T7DJGJKNMNFKNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '2195994683'}, 'RetryAttempts': 0}}
{'TableDescription': {'AttributeDefinitions': [{'AttributeName': 'Id', 'AttributeType': 'S'}, {'AttributeName': 'ReplyDateTime', 'AttributeType': 'S'}], 'TableName': 'de-c2w1-dynamodb-Reply', 'KeySchema': [{'AttributeName': 'Id', 'KeyType': 'HASH'}, {'AttributeName': 'ReplyDateTime', 'KeyType': 'RANGE'}], 'TableStatus': 'CREATING', 'CreationDateTime': datetime.datetime(2024, 9, 23, 4, 11, 48, 646000, tzinfo=tzlocal()), 'ProvisionedThroughput': {'NumberOfDecreasesToday': 0, 'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5}, 'TableSizeBytes': 0, 'ItemCount': 0, 'TableArn': 'arn:aws:dynamodb:us-east-1:322144634018:table/de-c2w1-dynamodb-Reply', 'TableId': '8556a55a-9bd0-426e-af19-dcc179d4e9d5', 'DeletionProtectionEnabled': False}, 'ResponseMetadata': {'RequestId': '1B0C2P146076P3D04360UC0N5NVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:11:48 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '645', 'connection': 'keep-alive', 'x-amzn-requestid': '1B0C2P146076P3D04360UC0N5NVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '2267847364'}, 'RetryAttempts': 0}}
Go to the AWS Console, search for DynamoDB, click on Tables on the left, and check that the tables have been created.
You will now load data into each table from the following JSON files:
Forum.json
ProductCatalog.json
Reply.json
Thread.json
You can load data item by item or as a batch of items. Let’s explore each option.
To load data item by item, you will use the method: DynamoDB put_item(). This method expects two required arguments (1) the table name and (2) the item you need to add. The item should be a dictionary that contains the attributes of the item (and most importantly the value of its primary key), for example, here’s the format of what the item should look like (an item in the reply table):
item = {
"Id": {
"S": "Amazon DynamoDB#DynamoDB Thread 1"
},
"ReplyDateTime": {
"S": "2015-09-15T19:58:22.947Z"
},
"Message": {
"S": "DynamoDB Thread 1 Reply 1 text"
},
"PostedBy": {
"S": "User A"
}
}
This JSON structure that looks as follows:
{
"<AttributeName>": {
"<DataType>": "<Value>"
},
"<ListAttribute>": {
"<DataType>": [
{
"<DataType>": "<Value1>"
},
{
"<DataType>": "<Value2>"
}]
}
}
is called Marshal JSON. This is similar to a regular JSON file but it also includes the types of each value. The <DataType>
placeholders specify the data type of the corresponding value; you can learn more about the Data type conventions for DynamoDB in the documentation. The good news is that all of the items provided in the sample JSON files are already in this expected format for DynamoDB put_item()
.
In this section, you are provided with two functions:
read_data()
: reads a sample JSON file and returns the items as a Python dictionary;put_item_db()
: This function takes in as arguments the table name and the details of the item as a Python dictionary, calls DynamoDB put_item()
and passes to it the table name and the item.In the exercise of this section, you just need to replace None
inside the function put_item_db()
. You don’t need to modify anything inside read_data()
. You will use the read_data()
function to read all items from the JSON file, and then you will use the function put_item_db()
to load each item to a given DynamoDB table.
def read_data(file_path: str) -> Dict[str, Any]:
with open(file_path, "r") as json_file:
items = json.load(json_file)
return items
In this exercise, you need to replace None
with the appropriate values:
client.put_item()
method of the object client
to load the data, which expects three arguments: TableName
, the Item
to be loaded, and some keyword arguments.def put_item_db( table_name: str, item: Dict[str, Any], **kwargs):
### START CODE HERE ### (~ 2 lines of code)
client = boto3.client("dynamodb")
response = client.put_item(TableName=table_name, Item=item, **kwargs)
### END CODE HERE ###
return response
Now, let’s load the items from ProductCatalog
and Thread
files one by one to the corresponding tables.
for dynamodb_tab in [product_catalog_table, thread_table]:
file_name = dynamodb_tab['table_name'].split('-')[-1]
items = read_data(file_path=f'./data/aws_sample_data/{file_name}.json')
for item in items[dynamodb_tab["table_name"]]:
put_item_db(table_name=dynamodb_tab["table_name"], item=item['PutRequest']['Item'])
Now, you will create the batch_write_item_db()
function which calls the DynamoDB batch_write_item(). This later method allows you to put or delete multiple items in one or more tables.
Again, you will have to read the two JSON files Reply
and Forum
and then load the items into the tables. Let’s load the data into the Reply
and Forum
tables.
In this exercise, you need to replace None
with the appropriate values:
client.batch_write_item()
method of the client
object. It should receive the items that need to be loaded and some keyword arguments. Assume that the input items
is in the right format that batch_write_item()
(the format of the items stored in the sample JSON files is exactly the format that batch_write_item()
expects. For more info, you can check the documentation here).def batch_write_item_db(items: Dict[str, Any], **kwargs):
### START CODE HERE ### (~ 2 lines of code)
client = boto3.client("dynamodb")
response = client.batch_write_item(RequestItems=items, **kwargs)
### END CODE HERE ###
return response
Now, let’s read the data from the JSON sample files: Reply
and Forum
and then load the items as a batch into the corresponding tables.
for dynamodb_tab in [reply_table, forum_table]:
file_name = dynamodb_tab['table_name'].split('-')[-1]
items = read_data(file_path=f'./data/aws_sample_data/{file_name}.json')
response = batch_write_item_db(items=items)
print(response)
{'UnprocessedItems': {}, 'ResponseMetadata': {'RequestId': '000045D81B5VRT7H6CQREDNENFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:18:29 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '23', 'connection': 'keep-alive', 'x-amzn-requestid': '000045D81B5VRT7H6CQREDNENFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '4185382651'}, 'RetryAttempts': 0}}
{'UnprocessedItems': {}, 'ResponseMetadata': {'RequestId': 'S1FMCT8BADAURF0Q6C8RH5SB9RVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:18:30 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '23', 'connection': 'keep-alive', 'x-amzn-requestid': 'S1FMCT8BADAURF0Q6C8RH5SB9RVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '4185382651'}, 'RetryAttempts': 0}}
{'UnprocessedItems': {}, 'ResponseMetadata': {'RequestId': '4P678E81BOHRCUN82FFREOTC8NVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Wed, 14 Feb 2024 06:44:36 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '23', 'connection': 'keep-alive', 'x-amzn-requestid': '4P678E81BOHRCUN82FFREOTC8NVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '4185382651'}, 'RetryAttempts': 0}}
{'UnprocessedItems': {}, 'ResponseMetadata': {'RequestId': 'R53NDPHFEH0UEL0MG6PFG8FEJRVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Wed, 14 Feb 2024 06:44:36 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '23', 'connection': 'keep-alive', 'x-amzn-requestid': 'R53NDPHFEH0UEL0MG6PFG8FEJRVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '4185382651'}, 'RetryAttempts': 0}
}```
<a name='5'></a>
## 5 - Read Data from the Tables
In this section, you will experiment with various approaches to read data from the DynamoDB tables.
<a name='5.1'></a>
### 5.1 - Scan the Full Table
You can perform a `DynamoDB scan()` operation on a DynamoDB table that fully scans the table and returns the items in 1MB chunks. Scanning is the slowest and most expensive way to get data out of DynamoDB. Let's first explore this approach.
<a name='ex04'></a>
### Exercise 4
In this exercise, you need to replace `None` with the appropriate values:
1. Create the Client object `client`.
2. Call the `client.scan()` method of the `client` object. It should receive the table name and keyword arguments. In the [DynamoDB boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html), search for the `scan` method to check what it takes as parameters.
```python
def scan_db(table_name: str, **kwargs):
### START CODE HERE ### (~ 2 lines of code)
client = boto3.client("dynamodb")
response = client.scan(TableName=table_name, **kwargs)
### END CODE HERE ###
return response
Let’s make a full scan on the ProductCatalog
table:
response = scan_db(product_catalog_table['table_name'])
print(f"Queried data for table {product_catalog_table['table_name']}:\n{response}")
Queried data for table de-c2w1-dynamodb-ProductCatalog:
{'Items': [{'Title': {'S': '18-Bike-204'}, 'Price': {'N': '500'}, 'Brand': {'S': 'Brand-Company C'}, 'Description': {'S': '205 Description'}, 'Color': {'L': [{'S': 'Red'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '205'}, 'BicycleType': {'S': 'Hybrid'}}, {'Title': {'S': '19-Bike-203'}, 'Price': {'N': '300'}, 'Brand': {'S': 'Brand-Company B'}, 'Description': {'S': '203 Description'}, 'Color': {'L': [{'S': 'Red'}, {'S': 'Green'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '203'}, 'BicycleType': {'S': 'Road'}}, {'Title': {'S': '21-Bike-202'}, 'Price': {'N': '200'}, 'Brand': {'S': 'Brand-Company A'}, 'Description': {'S': '202 Description'}, 'Color': {'L': [{'S': 'Green'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '202'}, 'BicycleType': {'S': 'Road'}}, {'Title': {'S': '18-Bike-201'}, 'Price': {'N': '100'}, 'Brand': {'S': 'Mountain A'}, 'Description': {'S': '201 Description'}, 'Color': {'L': [{'S': 'Red'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '201'}, 'BicycleType': {'S': 'Road'}}, {'Title': {'S': '18-Bike-204'}, 'Price': {'N': '400'}, 'Brand': {'S': 'Brand-Company B'}, 'Description': {'S': '204 Description'}, 'Color': {'L': [{'S': 'Red'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '204'}, 'BicycleType': {'S': 'Mountain'}}, {'Title': {'S': 'Book 102 Title'}, 'InPublication': {'BOOL': True}, 'PageCount': {'N': '600'}, 'Dimensions': {'S': '8.5 x 11.0 x 0.8'}, 'ISBN': {'S': '222-2222222222'}, 'Authors': {'L': [{'S': 'Author1'}, {'S': 'Author2'}]}, 'Price': {'N': '20'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '102'}}, {'Title': {'S': 'Book 103 Title'}, 'InPublication': {'BOOL': False}, 'PageCount': {'N': '600'}, 'Dimensions': {'S': '8.5 x 11.0 x 1.5'}, 'ISBN': {'S': '333-3333333333'}, 'Authors': {'L': [{'S': 'Author1'}, {'S': 'Author2'}]}, 'Price': {'N': '2000'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '103'}}, {'Title': {'S': 'Book 101 Title'}, 'InPublication': {'BOOL': True}, 'PageCount': {'N': '500'}, 'Dimensions': {'S': '8.5 x 11.0 x 0.5'}, 'ISBN': {'S': '111-1111111111'}, 'Authors': {'L': [{'S': 'Author1'}]}, 'Price': {'N': '2'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '101'}}], 'Count': 8, 'ScannedCount': 8, 'ResponseMetadata': {'RequestId': '6ADEF83A3G8JFMEQFJA38MKBUVVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:19:53 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '2043', 'connection': 'keep-alive', 'x-amzn-requestid': '6ADEF83A3G8JFMEQFJA38MKBUVVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '873234463'}, 'RetryAttempts': 0}}
You can that the returned data has the same input structure that the method DynamoDB put_item()
expects, which is the Marshal JSON. Marshal JSON is different from the usual JSON format that looks like the following:
{
"AttributeName": "Value",
"ListAttribute": [
"Value1",
"Value2"
]
}
The usual JSON format is the typical format you will find in real life, as it can be easily parsed into Python Dictionaries. So you may need to convert the output returned by the DynamoDB scan()
method into the usual JSON format, or you may need to convert data that is in the usual JSON format into Marshal JSON before inserting it into a DynamoDB table. The next optional part shows you how you can convert data in Marshal JSON into the usual JSON format. You can try the optional part or feel free to skip it.
Now, if you want to process data returned from DynamoDB operations with Python, you have to convert the data format to the usual JSON. boto3
provides some utilities to help you with this process.
To convert the ProductCatalog
data returned by the scan method into a regular JSON format to be used in Python dictionaries, you can use the data_deserializer()
function provided below that takes in as input the data in Marshal JSON. This function consists of the following:
boto3
resource instantiation: Resources is a higher-level abstraction class built on top of Client that is used to represent AWS resources as Python objects, providing in this way a Pythonic and Object Oriented interface. With that resource, you can create a deserializer object by calling the method TypeDeserializer()
.deserializer
object to call the deserializer.deserialize()
method and apply it to each value to convert it into its deserialized version. (Note: if the returned value from deserializer.deserialize(v)
is an instance of decimal.Decimal
, you should convert it to float. This process of checking if the returned value is an instance of decimal.Decimal
should be done because, by default, numerical values in DynamoDB are deserialized as decimals, which need to be handled properly if you want to work with the result; the easiest way is to convert them directly to float data type).The below function uses dictionary comprehension to iterate through the dictionary items.
def data_deserializer(data: Dict[str, Any]):
boto3.resource("dynamodb")
deserializer = boto3.dynamodb.types.TypeDeserializer()
deserialized_data = {
k: (
float(deserializer.deserialize(v))
if isinstance(deserializer.deserialize(v), decimal.Decimal)
else deserializer.deserialize(v)
)
for k, v in data.items()
}
return deserialized_data
Execute the method over the previous response to see the difference in the format.
for item in response['Items']:
print(f"DynamoDB returned Marshal JSON:\n{item}")
print(f"Deserialized python dictionary:\n {data_deserializer(item)}")
DynamoDB returned Marshal JSON:
{'Title': {'S': '18-Bike-204'}, 'Price': {'N': '500'}, 'Brand': {'S': 'Brand-Company C'}, 'Description': {'S': '205 Description'}, 'Color': {'L': [{'S': 'Red'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '205'}, 'BicycleType': {'S': 'Hybrid'}}
Deserialized python dictionary:
{'Title': '18-Bike-204', 'Price': 500.0, 'Brand': 'Brand-Company C', 'Description': '205 Description', 'Color': ['Red', 'Black'], 'ProductCategory': 'Bicycle', 'Id': 205.0, 'BicycleType': 'Hybrid'}
DynamoDB returned Marshal JSON:
{'Title': {'S': '19-Bike-203'}, 'Price': {'N': '300'}, 'Brand': {'S': 'Brand-Company B'}, 'Description': {'S': '203 Description'}, 'Color': {'L': [{'S': 'Red'}, {'S': 'Green'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '203'}, 'BicycleType': {'S': 'Road'}}
Deserialized python dictionary:
{'Title': '19-Bike-203', 'Price': 300.0, 'Brand': 'Brand-Company B', 'Description': '203 Description', 'Color': ['Red', 'Green', 'Black'], 'ProductCategory': 'Bicycle', 'Id': 203.0, 'BicycleType': 'Road'}
DynamoDB returned Marshal JSON:
{'Title': {'S': '21-Bike-202'}, 'Price': {'N': '200'}, 'Brand': {'S': 'Brand-Company A'}, 'Description': {'S': '202 Description'}, 'Color': {'L': [{'S': 'Green'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '202'}, 'BicycleType': {'S': 'Road'}}
Deserialized python dictionary:
{'Title': '21-Bike-202', 'Price': 200.0, 'Brand': 'Brand-Company A', 'Description': '202 Description', 'Color': ['Green', 'Black'], 'ProductCategory': 'Bicycle', 'Id': 202.0, 'BicycleType': 'Road'}
DynamoDB returned Marshal JSON:
{'Title': {'S': '18-Bike-201'}, 'Price': {'N': '100'}, 'Brand': {'S': 'Mountain A'}, 'Description': {'S': '201 Description'}, 'Color': {'L': [{'S': 'Red'}, {'S': 'Black'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '201'}, 'BicycleType': {'S': 'Road'}}
Deserialized python dictionary:
{'Title': '18-Bike-201', 'Price': 100.0, 'Brand': 'Mountain A', 'Description': '201 Description', 'Color': ['Red', 'Black'], 'ProductCategory': 'Bicycle', 'Id': 201.0, 'BicycleType': 'Road'}
DynamoDB returned Marshal JSON:
{'Title': {'S': '18-Bike-204'}, 'Price': {'N': '400'}, 'Brand': {'S': 'Brand-Company B'}, 'Description': {'S': '204 Description'}, 'Color': {'L': [{'S': 'Red'}]}, 'ProductCategory': {'S': 'Bicycle'}, 'Id': {'N': '204'}, 'BicycleType': {'S': 'Mountain'}}
Deserialized python dictionary:
{'Title': '18-Bike-204', 'Price': 400.0, 'Brand': 'Brand-Company B', 'Description': '204 Description', 'Color': ['Red'], 'ProductCategory': 'Bicycle', 'Id': 204.0, 'BicycleType': 'Mountain'}
DynamoDB returned Marshal JSON:
{'Title': {'S': 'Book 102 Title'}, 'InPublication': {'BOOL': True}, 'PageCount': {'N': '600'}, 'Dimensions': {'S': '8.5 x 11.0 x 0.8'}, 'ISBN': {'S': '222-2222222222'}, 'Authors': {'L': [{'S': 'Author1'}, {'S': 'Author2'}]}, 'Price': {'N': '20'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '102'}}
Deserialized python dictionary:
{'Title': 'Book 102 Title', 'InPublication': True, 'PageCount': 600.0, 'Dimensions': '8.5 x 11.0 x 0.8', 'ISBN': '222-2222222222', 'Authors': ['Author1', 'Author2'], 'Price': 20.0, 'ProductCategory': 'Book', 'Id': 102.0}
DynamoDB returned Marshal JSON:
{'Title': {'S': 'Book 103 Title'}, 'InPublication': {'BOOL': False}, 'PageCount': {'N': '600'}, 'Dimensions': {'S': '8.5 x 11.0 x 1.5'}, 'ISBN': {'S': '333-3333333333'}, 'Authors': {'L': [{'S': 'Author1'}, {'S': 'Author2'}]}, 'Price': {'N': '2000'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '103'}}
Deserialized python dictionary:
{'Title': 'Book 103 Title', 'InPublication': False, 'PageCount': 600.0, 'Dimensions': '8.5 x 11.0 x 1.5', 'ISBN': '333-3333333333', 'Authors': ['Author1', 'Author2'], 'Price': 2000.0, 'ProductCategory': 'Book', 'Id': 103.0}
DynamoDB returned Marshal JSON:
{'Title': {'S': 'Book 101 Title'}, 'InPublication': {'BOOL': True}, 'PageCount': {'N': '500'}, 'Dimensions': {'S': '8.5 x 11.0 x 0.5'}, 'ISBN': {'S': '111-1111111111'}, 'Authors': {'L': [{'S': 'Author1'}]}, 'Price': {'N': '2'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '101'}}
Deserialized python dictionary:
{'Title': 'Book 101 Title', 'InPublication': True, 'PageCount': 500.0, 'Dimensions': '8.5 x 11.0 x 0.5', 'ISBN': '111-1111111111', 'Authors': ['Author1'], 'Price': 2.0, 'ProductCategory': 'Book', 'Id': 101.0}
If you want to understand more about the transformation process between Marshall JSON and JSON/Python dictionaries, you can find tools like this one that will let you practice with them. You can also take a look at the boto3
’s documentation to see how TypeSerializer and TypeDeserializer are implemented.
The DynamoDB scan()
method returns all items in a table. If you want to read a single item, you could use the DynamoDB get_item()
method. This method expects the name of the table and the primary key of the requested item. It is the cheapest and fastest way to get data from DynamoDB.
In the following function, call the client.get_item()
method of the client
object. It should receive the table name, key and keyword arguments. For more information about this method, you can search for the get_item
in the documentation.
def get_item_db(table_name, key: Dict[str, Any], **kwargs):
client = boto3.client("dynamodb")
try:
### START CODE HERE ### (~ 1 line of code)
response = client.get_item(TableName=table_name, Key=key, **kwargs)
### END CODE HERE ###
except ClientError as e:
error = e.response.get("Error", {})
logging.error(
f"Failed to query DynamoDB. Error: {error.get('Message')}"
)
response = {}
return response
Get the item with Id 101 from the ProductCatalog
table.
response = get_item_db(table_name=product_catalog_table['table_name'],
key={'Id': {'N': '101'}})
print(response)
{'Item': {'Title': {'S': 'Book 101 Title'}, 'InPublication': {'BOOL': True}, 'PageCount': {'N': '500'}, 'Dimensions': {'S': '8.5 x 11.0 x 0.5'}, 'ISBN': {'S': '111-1111111111'}, 'Authors': {'L': [{'S': 'Author1'}]}, 'Price': {'N': '2'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '101'}}, 'ResponseMetadata': {'RequestId': 'FU95D5B56JKK4H8CJ62OTCR5VBVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:21:37 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '263', 'connection': 'keep-alive', 'x-amzn-requestid': 'FU95D5B56JKK4H8CJ62OTCR5VBVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '3181387427'}, 'RetryAttempts': 0}}
{'Item': {'Title': {'S': 'Book 101 Title'}, 'InPublication': {'BOOL': True}, 'PageCount': {'N': '500'}, 'Dimensions': {'S': '8.5 x 11.0 x 0.5'}, 'ISBN': {'S': '111-1111111111'}, 'Authors': {'L': [{'S': 'Author1'}]}, 'Price': {'N': '2'}, 'ProductCategory': {'S': 'Book'}, 'Id': {'N': '101'}}, 'ResponseMetadata': {'RequestId': '08VIS0M7LH396M766IOPU54E9JVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Wed, 14 Feb 2024 06:44:53 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '263', 'connection': 'keep-alive', 'x-amzn-requestid': '08VIS0M7LH396M766IOPU54E9JVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '3181387427'}, 'RetryAttempts': 0}}
By default, a read from DynamoDB will use eventual consistency. A consistent read in DynamoDB is cheaper than a strongly consistent read. Several options can be added to the read methods, some of the regularly used ones are:
ConsistentRead
: specifies that a strongly consistent read of the table is required;ProjectionExpression
: specifies what attributes should be returned;ReturnConsumedCapacity
: determines what level of detail about the consumed capacity the response should return.You can find more information about the parameters that the DynamoDB.Client.get_item()
accepts by reading the documentation.
In the following code, you will:
ConsistentRead
to True
to ensure strongly consistent reads.ProductCategory
, Price
, and Title
using the ProjectionExpression
attribute.ReturnConsumedCapacity
to 'TOTAL'
.Id=101
from the ProductCatalog
table.kwargs = {'ConsistentRead': True,
'ProjectionExpression': 'ProductCategory, Price, Title',
'ReturnConsumedCapacity': 'TOTAL'}
response = get_item_db(table_name=product_catalog_table['table_name'], key={'Id': {'N': '101'}}, **kwargs)
print(response)
{'Item': {'Price': {'N': '2'}, 'Title': {'S': 'Book 101 Title'}, 'ProductCategory': {'S': 'Book'}}, 'ConsumedCapacity': {'TableName': 'de-c2w1-dynamodb-ProductCatalog', 'CapacityUnits': 1.0}, 'ResponseMetadata': {'RequestId': 'FL2GOVEBK5DB701N31NT6F89FBVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:21:48 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '177', 'connection': 'keep-alive', 'x-amzn-requestid': 'FL2GOVEBK5DB701N31NT6F89FBVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '2450922370'}, 'RetryAttempts': 0}}
The previous request consumed 1.0 RCU because this item is less than 4KB. (RCU stands for Read Capacity Unit: “One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size”, reference).
If you run again the command but remove the ConsistentRead option, you can see that eventually consistent reads consume half as much capacity:
kwargs = {'ReturnConsumedCapacity': 'TOTAL',
'ProjectionExpression': 'ProductCategory, Price, Title'
}
response = get_item_db(table_name=product_catalog_table['table_name'],
key={'Id': {'N': '101'}}, **kwargs
)
print(response)
{'Item': {'Price': {'N': '2'}, 'Title': {'S': 'Book 101 Title'}, 'ProductCategory': {'S': 'Book'}}, 'ConsumedCapacity': {'TableName': 'de-c2w1-dynamodb-ProductCatalog', 'CapacityUnits': 0.5}, 'ResponseMetadata': {'RequestId': 'JHD3INLPGG15LUL05NF1PB6SRRVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:21:53 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '177', 'connection': 'keep-alive', 'x-amzn-requestid': 'JHD3INLPGG15LUL05NF1PB6SRRVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '685575682'}, 'RetryAttempts': 0}}
In DynamoDB, an item collection is a group of items that share the same partition key value, which means that items are related. You can query the items that belong to an item collection (i.e., that have the same partition key) using DynamoDB query() method. In this method, you need to specify the particular value of the partition key of the items in interest.
Item Collections only exist in tables that have both a Partition Key and a Sort Key. Optionally, you can provide the query method a sort key attribute and use a comparison operator to refine the search results.
In the following exercise, you will use the Reply
table as it has both a Partition and a Sort key. Let’s first check its content.
response = scan_db(reply_table['table_name'])
print(response)
{'Items': [{'ReplyDateTime': {'S': '2015-09-15T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 1 Reply 1 text'}, 'PostedBy': {'S': 'User A'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'}}, {'ReplyDateTime': {'S': '2015-09-22T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 1 Reply 2 text'}, 'PostedBy': {'S': 'User B'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'}}, {'ReplyDateTime': {'S': '2015-09-29T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 2 Reply 1 text'}, 'PostedBy': {'S': 'User A'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 2'}}, {'ReplyDateTime': {'S': '2015-10-05T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 2 Reply 2 text'}, 'PostedBy': {'S': 'User A'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 2'}}], 'Count': 4, 'ScannedCount': 4, 'ResponseMetadata': {'RequestId': 'DHP6GEQCSE36BF7RLDSAR5S2AFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:22:08 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '730', 'connection': 'keep-alive', 'x-amzn-requestid': 'DHP6GEQCSE36BF7RLDSAR5S2AFVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '1330173439'}, 'RetryAttempts': 0}}
Each reply in this table has an Id
(Partition Key) that specifies in which thread the given reply appeared. The data consists of two threads in total that belong to the forum “Amazon DynamoDB”, and each thread has 2 replies. Let’s query the replies that belong to Thread 1.
You will use the query_db()
function defined below. This function calls the method DynamoDB query()
which expects the particular value of the partition key and returns all items that have the specified partition key value. You can assume that the kwargs
input of query_db()
method contains the needed information (particular primary key value) for the DynamoDB query()
method.
def query_db(table_name: str,**kwargs,):
client = boto3.client("dynamodb")
try:
response = client.query(
TableName=table_name,
**kwargs,
)
logging.info(f"Response {response}")
except ClientError as e:
error = e.response.get("Error", {})
logging.error(
f"Failed to query DynamoDB. Error: {error.get('Message')}"
)
raise
else:
logging.info(f"Query result {response.get('Items', {})}")
return response
Now let’s get into the details of the dictionary kwargs
that is passed to client.query()
.
The following cell shows an example of what kwargs
should contain, as expected by the DynamoDB query()
method:
KeyConditionExpression
: is the condition that specifies the partition key value of the items that need to be retrieved; you can see in this syntax the name of the partition key which is Id
and its particular value is denoted with another parameter :Id
which is defined in the next argument ExpressionAttributeValues
. To understand more about this syntax, you can always check the documentation. The parameter: ReturnedConsumedCapacity
determines what level of detail about the consumed capacity the response should return.
kwargs = {'ReturnConsumedCapacity': 'TOTAL',
'KeyConditionExpression': 'Id = :Id',
'ExpressionAttributeValues': {':Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'}}
}
# returns the items that has ID = 'Amazon DynamoDB#DynamoDB Thread 1'
response = query_db(table_name=reply_table['table_name'], **kwargs)
print(response)
{'Items': [{'ReplyDateTime': {'S': '2015-09-15T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 1 Reply 1 text'}, 'PostedBy': {'S': 'User A'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'}}, {'ReplyDateTime': {'S': '2015-09-22T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 1 Reply 2 text'}, 'PostedBy': {'S': 'User B'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'}}], 'Count': 2, 'ScannedCount': 2, 'ConsumedCapacity': {'TableName': 'de-c2w1-dynamodb-Reply', 'CapacityUnits': 0.5}, 'ResponseMetadata': {'RequestId': 'UPCPJTA04R22TEA23V7MP2TCQVVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:22:37 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '462', 'connection': 'keep-alive', 'x-amzn-requestid': 'UPCPJTA04R22TEA23V7MP2TCQVVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '584739435'}, 'RetryAttempts': 0}}
You can also query the items that share the same partition key and also satisfy a certain condition on the sort key. Since the sort key of the Reply table is a timestamp, you can add a condition to KeyConditionExpression
to get the replies of a particular thread that were posted after a certain time. Take a closer look at how the sort key is compared with the :ts
parameter and how this parameter is defined in the ExpressionAttributeValues
.
kwargs = {'ReturnConsumedCapacity': 'TOTAL',
'KeyConditionExpression': 'Id = :Id and ReplyDateTime > :ts',
'ExpressionAttributeValues': {':Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'},
':ts' : {'S':"2015-09-21"}
}
}
response = query_db(table_name=reply_table['table_name'], **kwargs)
print(response)
{'Items': [{'ReplyDateTime': {'S': '2015-09-22T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 1 Reply 2 text'}, 'PostedBy': {'S': 'User B'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'}}], 'Count': 1, 'ScannedCount': 1, 'ConsumedCapacity': {'TableName': 'de-c2w1-dynamodb-Reply', 'CapacityUnits': 0.5}, 'ResponseMetadata': {'RequestId': 'V3OP7U9HHDPEVHG558QFAP947JVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:22:41 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '289', 'connection': 'keep-alive', 'x-amzn-requestid': 'V3OP7U9HHDPEVHG558QFAP947JVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '4120834871'}, 'RetryAttempts': 0}}
In addition to keyConditionExpression
, you can also use FilterExpression
to filter the results based on non-key attributes. For example, to find all the replies to Thread 1 that were posted by User B, you can do:
kwargs = {'ReturnConsumedCapacity': 'TOTAL',
'KeyConditionExpression': 'Id = :Id ',
'FilterExpression': 'PostedBy = :user',
'ExpressionAttributeValues': {':Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'},
':user' : {'S':'User B'}
}
}
response = query_db(table_name=reply_table['table_name'], **kwargs)
print(response)
{'Items': [{'ReplyDateTime': {'S': '2015-09-22T19:58:22.947Z'}, 'Message': {'S': 'DynamoDB Thread 1 Reply 2 text'}, 'PostedBy': {'S': 'User B'}, 'Id': {'S': 'Amazon DynamoDB#DynamoDB Thread 1'}}], 'Count': 1, 'ScannedCount': 2, 'ConsumedCapacity': {'TableName': 'de-c2w1-dynamodb-Reply', 'CapacityUnits': 0.5}, 'ResponseMetadata': {'RequestId': '1IJAIUKN3FQ2CI7B1US47I5QUNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'Server', 'date': 'Mon, 23 Sep 2024 04:22:47 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '289', 'connection': 'keep-alive', 'x-amzn-requestid': '1IJAIUKN3FQ2CI7B1US47I5QUNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '3736314100'}, 'RetryAttempts': 0}}
Note that in the response you will see these lines:
"Count": 1,
"ScannedCount": 2,
This tells you that the Key Condition Expression matched 2 items (ScannedCount based on the value of the partition key) and that’s what you were charged to read, but the Filter Expression reduced the result set size down to 1 item (Count).
Open the documentation for the DynamoDB query()
method and search for the Limit
and ScanIndexForward
parameters. In this exercise, you need to write the following query: return only the first reply to Thread 1.
Limit
and ScanIndexForward
parameters. If you want to sort items in ascending order based on the sort key, use the parameter ScanIndexForward
. If you want to limit the number of items then use the Limit
parameter. This would be analogous in SQL to: ORDER BY ReplyDateTime ASC LIMIT 1
.