CS/COE 0449
Introduction to Systems Software
6: Memory Management
Our Story So Far
You Hear a Voice Whisper: “The Memory Layout is a Lie”
Reallocating our thoughts
• A program has several sections:
• Code
• Static data
• Stack
• Heap
• Today, we take a deeper dive into how dynamic memory is allocated in the heap.
[Figure: potential memory layout (32-bit addresses) — code and static data near 0x00000000, the heap above them growing upward, the stack near 0xFFFFFFFF growing downward, and currently unused but available memory in between]
Reallocating our thoughts
• We have looked at malloc and calloc.
• They stake out space in the heap and return an
address.
• Right now, we live in a nice ideal world.
• No other programs are running.
• We have access to all of the memory.
• Muhahahaha!!
• The OS is lying to our program.
• This memory is… virtual... reality.
• We will investigate this lie later in the course.
[Figure: the same potential 32-bit memory layout as before]
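As a quick refresher, here is a minimal sketch (not from the slides; the sizes and variable names are just for illustration) of malloc and calloc staking out heap space and handing back addresses:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int* a = (int*)malloc(10 * sizeof(int)); // room for 10 ints, contents uninitialized
    int* b = (int*)calloc(10, sizeof(int));  // room for 10 ints, zero-initialized
    if (a == NULL || b == NULL)
        return 1;                            // allocation can fail; always check

    printf("heap addresses: %p %p\n", (void*)a, (void*)b);

    free(a);
    free(b);
    return 0;
}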
The World of Allocation
It is a puzzle without any optimal solution. Welcome to computers!
A heap of possibilities
• Stack access often does not deviate much.
• We allocate a little bit at a time.
• We allocate and free the memory VERY often.
• Heap allocations can follow many possible access patterns.
• You might allocate a lot at a time and keep it around for a
long time. Or a short time.
• You might allocate a lot of small things, instead.
• Maybe you do a little bit of everything?
• Often, such patterns are not easy to predict.
• Do you get a big file as input? A small file?
[Figure: the usual 32-bit layout, with several separate chunks of available memory scattered through the heap]
A heaping helping of good luck
• Allocations could happen in a nice order.
• When something is allocated, it can be allocated
after everything else.
• When freed, it makes room for new things.
• IF ONLY.
• I mean, it’s possible… but like…
• the heap and stack are different things for a reason.
[Figure: the lucky case — each new allocation lands after everything else, and freed blocks simply become available memory again]
Digital potholes… as annoying as real ones
• Small allocations interfere with large ones.
• When small gaps interfere with allocation, this is
called fragmentation.
[Figure: after a mix of malloc() and free() calls, the heap is dotted with small gaps; the next allocation does not fit in any one of them]
If we had omniscience of future allocations, we could avoid this… but we can’t know ahead of time!
The worst case
• When you allocate a lot of small things…
• Free every other one…
• And then attempt to allocate a bigger thing…
• Even though there is technically enough memory…
• There is no contiguous space.
• Therefore, our naïve malloc will fail.
• We have to come up with some strategy.
[Figure: a fragmented heap; malloc() has nowhere to place the larger request]
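Here is a minimal sketch of that worst case (block counts and sizes are arbitrary, and a real allocator may still satisfy the big request by growing the heap; the point is only the allocation pattern):

#include <stdlib.h>

int main(void) {
    enum { N = 100 };
    void* blocks[N];

    // Allocate a lot of small things...
    for (int i = 0; i < N; i++)
        blocks[i] = malloc(64);

    // ...free every other one, leaving many small gaps...
    for (int i = 0; i < N; i += 2) {
        free(blocks[i]);
        blocks[i] = NULL;
    }

    // ...then ask for something bigger than any single gap.
    // There may be enough total free memory, yet no contiguous run of it.
    void* big = malloc(64 * (N / 2));

    for (int i = 1; i < N; i += 2)
        free(blocks[i]);
    free(big);
    return 0;
}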
Moving is never easy
• Why not move things around??
• A defragmentation process/algorithm
• Moving around something in the heap is hard!
• Any pointers referring to data within a block must be updated.
• Finding these pointers automatically is effectively as difficult
as garbage collection.
• Because of this, moving blocks around is discouraged.
(Easier to solve it another way.)
[Figure: the same fragmented heap; the larger malloc() still has nowhere to go]
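One everyday place this bites: realloc is allowed to move a block, and nothing updates the pointers you saved into the old one. A minimal sketch (names and sizes are just for illustration):

#include <stdlib.h>

int main(void) {
    int* arr = (int*)malloc(100 * sizeof(int));
    if (!arr)
        return 1;
    arr[50] = 7;
    int* into_block = &arr[50];      // a pointer into the middle of the block

    int* bigger = (int*)realloc(arr, 100000 * sizeof(int));
    if (!bigger) {
        free(arr);
        return 1;
    }
    arr = bigger;

    // If realloc moved the block, into_block now points at freed memory,
    // and nothing fixed it up for us -- this is why moving heap blocks is hard.
    (void)into_block;

    free(arr);
    return 0;
}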
Moving is NEVER easy
• When blocks move, pointers
to anything within them must be updated.
• This is hard to keep track of!
• C does not check validity of pointers after free()
[Figure: int* my_int and float* my_float still point into the old data of a freed int arr[100], reading stale or garbage values]
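A minimal sketch of the kind of dangling pointer the figure shows; C compiles this happily, and the commented-out line is undefined behavior:

#include <stdlib.h>

int main(void) {
    int* my_int = (int*)malloc(sizeof(int));
    if (!my_int)
        return 1;
    *my_int = 42;

    free(my_int);            // the block is released...

    // ...but my_int still holds the old address, and C does not check it.
    // Dereferencing it now is undefined: it might read 42, garbage, or crash.
    // printf("%d\n", *my_int);

    my_int = NULL;           // a common defensive habit after free()
    return 0;
}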
Stressing it out
• If we allocate a large array it will be allocated on the
heap somewhere.
• Other allocations can also happen, and they go “above”
that array.
• What happens when you need to append a 101st
element to this array?
• Uh oh!
• You will need to allocate more space.
• And then copy the array contents.
• Free the old array.
• How long does that take?
[Figure: the old int arr[100] is freed and a new int arr[200] is allocated elsewhere in the heap, leaving a fragmentation gap behind]
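A minimal sketch of that grow-and-copy dance, done by hand (realloc bundles up the same idea; the sizes mirror the figure):

#include <stdlib.h>
#include <string.h>

int main(void) {
    int* arr = (int*)malloc(100 * sizeof(int));
    if (!arr)
        return 1;
    for (int i = 0; i < 100; i++)
        arr[i] = i;

    // Need a 101st element: allocate a bigger block...
    int* bigger = (int*)malloc(200 * sizeof(int));
    if (!bigger) {
        free(arr);
        return 1;
    }

    memcpy(bigger, arr, 100 * sizeof(int));  // ...copy every existing element...
    free(arr);                               // ...and free the old block (a future gap)
    arr = bigger;

    arr[100] = 100;                          // finally, the append itself
    free(arr);
    return 0;
}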
Stressing it out: Big Arrays
• This happens in very practical situations!
• Reallocating means getting rid of a small thing
• And replacing it with a larger thing.
• You could have TiBs of memory and this will be a problem.
• This affects performance (in terms of writes):
• Appending item arr[0]: 1
• Appending item arr[1]: 1
• …
• Appending item arr[99]: 1
• Appending item arr[100]: 100 + 1 … oh no!
• When you would overflow the buffer…
• You then need to copy all previous values as well.
[Figure: memory layout with int arr[100] in the heap, above the old, freed copy of the array]
Stressing it out: Performance Consistency
• Big arrays want to be contiguous.
• Ensuring contiguous space is difficult when you do not know how much you will ultimately need.
• This is exactly why linked lists exist!
• Since a linked list allocates on every append, each append takes the same amount of time.
• However, everything is a trade-off.
• Dang it!!!
• One cost is extra overhead for metadata.
• Linked list traversal can stress memory caches.
• It means traversing the list is slower.
• However, we will mostly ignore this for now.
[Figure: memory layout with int arr[100] in the heap]
The Linked List
A story about trade-offs.
What is a linked list?
• A linked list is a non-contiguous data structure representing an ordered list.
• Each item in the linked list is represented by metadata called a node.
• This metadata indirectly refers to the actual data.
• Furthermore, it indirectly refers to at least one other item in the list.
[Diagram: a Node box holding a Node* next pointer and a char* data pointer]

typedef struct _Node {
    struct _Node* next;   // the next item in the list
    char* data;           // this item's data
} Node;

The "struct _Node" tag is required for the next pointer because the typedef name Node is not defined until the end of the declaration.
Keeping ahead of the list.
• Creation of a list occurs when one allocates a single node and tracks it in a
pointer. This is the head of our list (first element.)
[Diagram: Node* list points at a single Node whose next and data fields are NULL]
Node* list = (Node*)malloc(sizeof(Node));
list->next = NULL; // NULL is our end-of-list marker
list->data = NULL; // Allocate/copy the data you want
Adding some links to our chain
• If we want to append an item, we can add a node anywhere!
[Diagram: Node* list … → the old "tail" node → the new "node"]

void append(Node* tail, const char* value) {
    Node* node = (Node*)malloc(sizeof(Node));
    node->next = NULL;    // The new end of our list
    tail->next = node;    // We attach this node to the old last node
    node->data = (char*)malloc(strnlen(value, 100) + 1); // Remember the '\0' sentinel!
    strncpy(node->data, value, 100);                     // This is a problem! Why?
}
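One possible answer to that question: strncpy always writes exactly its count (overflowing a buffer smaller than 100 bytes) and skips the '\0' when the source fills the cap. A hypothetical helper that keeps the 100-character cap but stays safe might look like this (a sketch, not the slide's code):

#include <stdlib.h>
#include <string.h>

// Hypothetical helper: copy at most `cap` characters of `value` into a
// freshly allocated string that is always '\0'-terminated.
static char* copyCapped(const char* value, size_t cap) {
    size_t len = strnlen(value, cap);     // measure, capped
    char* out = (char*)malloc(len + 1);   // exactly enough room, plus the '\0'
    if (out) {
        memcpy(out, value, len);          // copy only what we measured
        out[len] = '\0';                  // always terminate
    }
    return out;
}

With this, the last two lines of append would become node->data = copyCapped(value, 100);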
We can add them anywhere!!
• Consider what happens if we update our append to take any Node:
void linkedListAppend(Node* curNode, const char* value) {
    Node* node = (Node*)malloc(sizeof(Node));
    node->next = curNode->next;
    curNode->next = node;
    node->data = (char*)malloc(strnlen(value, 100) + 1);
    strcpy(node->data, value);
}
[Diagram: the new "node" is spliced in between "curNode" and the node that used to follow it (here, the old tail)]
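A quick usage sketch (assuming the Node struct and linkedListAppend above, and a head node whose data we leave NULL):

Node* list = (Node*)malloc(sizeof(Node));
list->next = NULL;
list->data = NULL;

linkedListAppend(list, "first");         // list -> "first"
linkedListAppend(list, "second");        // goes right after the head: list -> "second" -> "first"
linkedListAppend(list->next, "middle");  // after "second": list -> "second" -> "middle" -> "first"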
We can add them anywhere!!
• This function has very consistent performance (constant time):
• The append always allocates the same amount.
• It always copies the same amount.
• Compare to a big array where you may have to copy the entire thing to
append something new!
Traversal… on the other hand…
• Accessing an array element is generally very simple.
• arr[42] is the same as *(arr + 42) because its location is very well-known!
• This is because array items are contiguous in memory. Not true for linked lists!
• Here is a function that performs the equivalent for linked lists:
Node* linkedListGet(Node* head, size_t index) {
    Node* curNode = head;
    while (curNode && index) {   // walk forward until we reach the index or run off the end
        index--;
        curNode = curNode->next;
    }
    return curNode;              // NULL if the list is shorter than index + 1
}
Q: How many times is memory accessed relative to the requested index?
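For comparison, fetching item 42 (assuming a Node* list built as before, plus <stdio.h> for printf) looks like this; unlike arr[42], it has to walk every node along the way:

Node* n = linkedListGet(list, 42);   // follows up to 42 next pointers
if (n)
    printf("%s\n", n->data);         // only valid if the list is long enough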
Removing… on the other, other hand!
• One nice thing about linked lists is how flexibly they can change shape.
• I used to be able to bend a lot
better, too, when I was in my 20s.
Alas.
• Since we don’t have a way to go “backward”:
• We first find the node we want to delete (curNode),
• keeping track of the node at index – 1 (lastNode).
• Then we rewire lastNode to cut curNode out.
Node* linkedListDelete(Node* head, size_t index) {
    Node* lastNode = NULL;
    Node* curNode = head;
    while (curNode && index) {
        index--;
        lastNode = curNode;
        curNode = curNode->next;
    }
    if (!curNode)
        return head;                    // Can't find item at index.
    if (!lastNode)
        head = curNode->next;           // We are deleting the head; new head is next item
    else
        lastNode->next = curNode->next; // Orphans item
    free(curNode->data);
    free(curNode);
    return head;                        // Returns new head (or old head if unchanged).
}
Removing… on the other, other hand!
• This looks complex, but it really is
a simple traversal.
• So it takes time proportional to the index to find the item.
• And it performs a simple update
and deallocation. (quick to do)
• A big array, on the other hand.
• It can find the element to remove
immediately.
• However, removing it means shifting every item after it over to the left.
• That can be an expensive update!
(Memory is slow!!)
On your own!
Think about the code you would need to do any of the following:
• Delete/free the entire linked list.
• Sort a linked list.
• Append a linked list to an existing one.
• Copy a subset of a linked list to a new list.
Often, operations can be abstracted in such a way that all of these can be written relatively
simply.
Consider the performance of these operations compared to an Array.
Linked lists … link you … to the world!
• Consider how much cleaner you can make certain operations if you tracked the
previous node as well.
• This is a doubly linked list.
• This is typically “double-ended” as well: keeping track of both head and tail.
[Diagram: a doubly linked list — each Node holds next, prev, and data pointers, and Node* head and Node* tail track both ends]
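A possible shape for those nodes (a sketch; the struct and field names just mirror the singly linked Node):

typedef struct _DNode {
    struct _DNode* next;   // toward the tail
    struct _DNode* prev;   // toward the head
    char* data;
} DNode;

// A double-ended list then just tracks both ends:
typedef struct {
    DNode* head;
    DNode* tail;
} DList;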
Seeing the trees through the forest
• A binary tree can be represented by the same nodes as a linked list.
• In this case, you have a left and right child node instead of next and prev.
• The operations are
very different, though.
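A sketch of that reuse (same shape as Node, with left/right children instead of next/prev):

typedef struct _TreeNode {
    struct _TreeNode* left;    // left child, or NULL
    struct _TreeNode* right;   // right child, or NULL
    char* data;
} TreeNode;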