CS/COE 0449 Introduction to Systems Software
6: Memory Management

Our Story So Far
You Hear a Voice Whisper: "The Memory Layout is a Lie"

Reallocating our thoughts
• A program has several sections:
  • Code
  • Static data
  • Stack
  • Heap
• Today, we take a deeper dive into how dynamic memory is allocated in the heap.
[Figure: Potential layout (32-bit addresses): code near ~0x00000000, then static data, the heap, currently unused but available memory, and the stack near ~0xFFFFFFFF.]

Reallocating our thoughts
• We have looked at malloc and calloc.
  • They stake out space in the heap and return an address.
• Right now, we live in a nice ideal world.
  • No other programs are running.
  • We have access to all of the memory.
  • Muhahahaha!!
• The OS is lying to our program.
  • This memory is… virtual... reality.
  • We will investigate this lie later in the course.

The World of Allocation
It is a puzzle without any optimal solution. Welcome to computers!

A heap of possibilities
• Stack access often does not deviate much.
  • We allocate a little bit at a time.
  • We allocate and free the memory VERY often.
• Heap allocations can follow many different access patterns.
  • You might allocate a lot at a time and keep it around for a long time. Or a short time.
  • You might allocate a lot of small things, instead.
  • Maybe you do a little bit of everything?
• Often, such patterns are not easy to predict.
  • Do you get a big file as input? A small file?

A heaping helping of good luck
• Allocations could happen in a nice order.
  • When something is allocated, it can be placed after everything else.
  • When freed, it makes room for new things.
• IF ONLY.
  • I mean, it's possible… but like…
  • the heap and stack are different things for a reason.

Digital potholes… as annoying as real ones
• Small allocations interfere with large ones.
• When small gaps interfere with allocation, this is called fragmentation.
• If we had omniscience of future allocations, we could avoid this… but we can't know ahead of time!
[Figure: a free() leaves small gaps in the heap that the next malloc() cannot use.]

The worst case
• When you allocate a lot of small things…
• Free every other one…
• And then attempt to allocate a bigger thing…
• Even though there is technically enough memory…
  • There is no contiguous space.
  • Therefore, our naïve malloc will fail.
• We have to come up with some strategy.

Moving is never easy
• Why not move things around??
  • A defragmentation process/algorithm.
• Moving around something in the heap is hard!
  • Any pointers referring to data within a block must be updated.
  • Finding these pointers automatically is effectively as difficult as garbage collection.
• Because of this, moving blocks around is discouraged. (It is easier to solve the problem another way.)

Moving is NEVER easy
• When blocks move, pointers to anything within them must be updated.
• This is hard to keep track of!
• C does not check the validity of pointers after free().
[Figure: my_int and my_float still point into freed heap memory (old data: int arr[100]); whatever values sit there now are meaningless.]
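To make the last two slides concrete, here is a minimal sketch (not from the slides) of how a pointer silently goes stale when its block is freed or moved, for example by realloc; the variable names are just for illustration.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int* arr = (int*)malloc(100 * sizeof(int));   // a block somewhere in the heap
    if (!arr) return 1;
    arr[0] = 42;

    int* alias = &arr[0];                         // a second pointer into the same block

    // Growing the block may move it somewhere else in the heap. If it does,
    // 'alias' still holds the old address and is now dangling; C will not
    // update it or warn us.
    int* bigger = (int*)realloc(arr, 200 * sizeof(int));
    if (!bigger) { free(arr); return 1; }

    printf("%d\n", bigger[0]);                    // fine: prints 42
    // printf("%d\n", *alias);                    // undefined behavior if the block moved

    free(bigger);
    // printf("%d\n", bigger[0]);                 // also undefined: use after free
    return 0;
}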
Stressing it out
• If we allocate a large array, it will be allocated on the heap somewhere.
• Other allocations can also happen, and they go "above" that array.
• What happens when you need to append a 101st element to this array?
  • Uh oh!
• You will need to allocate more space.
  • And then copy the array contents.
  • And free the old array.
• How long does that take?
[Figure: the old int arr[100] is freed and a new int arr[200] is allocated elsewhere in the heap, leaving fragmentation behind.]

Stressing it out: Big Arrays
• This happens in very practical situations!
  • Reallocating means getting rid of a small thing
  • and replacing it with a larger thing.
  • You could have TiBs of memory and this will be a problem.
• This affects performance (in terms of writes):
  • Appending item arr[0]: 1
  • Appending item arr[1]: 1
  • …
  • Appending item arr[99]: 1
  • Appending item arr[100]: 100 + 1… oh no!
• When you would overflow the buffer…
  • You then need to copy all previous values as well.

Stressing it out: Performance Consistency
• Big arrays want to be contiguous.
• Ensuring contiguous space is difficult when you do not know how much you will ultimately need.
• This is exactly why linked lists exist!
  • Since a linked list allocates on every append,
  • each append takes the same amount of time.
• However, everything is a trade-off.
  • Dang it!!!
  • One cost is extra overhead for metadata.
  • Linked list traversal can also stress memory caches.
  • That means traversing the list is slower than traversing an array.
  • However, we will mostly ignore this for now.

The Linked List
A story about trade-offs.

What is a linked list?
• A linked list is a non-contiguous data structure representing an ordered list.
• Each item in the linked list is represented by metadata called a node.
• This metadata indirectly refers to the actual data.
• Furthermore, it indirectly refers to at least one other item in the list.

typedef struct _Node {
    struct _Node* next;   // the next item in the list
    char* data;           // the actual data for this item
} Node;

The "struct _Node" spelling is required inside the body since the "Node" typedef is not defined until after the struct itself is!

Keeping ahead of the list.
• Creation of a list occurs when one allocates a single node and tracks it in a pointer. This is the head of our list (its first element).

Node* list = (Node*)malloc(sizeof(Node));
list->next = NULL;   // NULL is our end-of-list marker
list->data = NULL;   // Allocate/copy the data you want

Adding some links to our chain
• If we want to append an item, we can add a node anywhere! (A short usage sketch follows this slide.)

void append(Node* tail, const char* value) {
    Node* node = (Node*)malloc(sizeof(Node));
    node->next = NULL;                        // The new end of our list
    tail->next = node;                        // We attach this node to the old last node

    size_t len = strnlen(value, 100);
    node->data = (char*)malloc(len + 1);
    strncpy(node->data, value, len);
    node->data[len] = '\0';                   // Remember the '\0' sentinel!
}
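Here is a small usage sketch (my own, not from the slides) that strings the last two slides together. It assumes the Node typedef and the append() function shown above are in scope; the printList helper and the strings are made up for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumes the Node typedef and append() from the slides above. */

static void printList(const Node* curNode) {
    while (curNode) {
        if (curNode->data)              // the head node carries no data
            printf("%s\n", curNode->data);
        curNode = curNode->next;
    }
}

int main(void) {
    // Create the head node, exactly as on the "Keeping ahead of the list" slide.
    Node* list = (Node*)malloc(sizeof(Node));
    list->next = NULL;
    list->data = NULL;

    append(list, "hello");          // 'list' is currently also the last node
    append(list->next, "world");    // the tail moved, so we pass the new last node

    printList(list);                // prints "hello" then "world"
    return 0;
}

Notice that the caller has to keep track of which node is currently last; that is the "problem" the next slide turns into a feature.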
This is a problem! Why?

We can add them anywhere!!
• Consider what happens if we update our append to take any Node:

void linkedListAppend(Node* curNode, const char* value) {
    Node* node = (Node*)malloc(sizeof(Node));
    node->next = curNode->next;               // The new node points at whatever followed curNode
    curNode->next = node;                     // ...and curNode now points at the new node

    size_t len = strnlen(value, 100);
    node->data = (char*)malloc(len + 1);
    strncpy(node->data, value, len);
    node->data[len] = '\0';
}

[Figure: the new node is spliced in between curNode and the node that used to follow it.]

We can add them anywhere!!
• This function has very consistent performance (constant time):
  • The append always allocates the same amount.
  • It always copies the same amount.
• Compare to a big array, where you may have to copy the entire thing to append something new!

Traversal… on the other hand…
• Accessing an array element is generally very simple.
  • arr[42] is the same as *(arr + 42) because its location is very well-known!
  • This is because array items are contiguous in memory. Not true for linked lists!
• Here is a function that performs the equivalent for linked lists:

Node* linkedListGet(Node* head, size_t index) {
    Node* curNode = head;
    while (curNode && index) {
        index--;
        curNode = curNode->next;
    }
    return curNode;
}

Q: How many times is memory accessed relative to the requested index?

Removing… on the other, other hand!
• One nice thing about linked lists is their flexibility in changing shape.
  • I used to be able to bend a lot better, too, when I was in my 20s. Alas.
• Since we don't have a way to go "backward":
  • We first find the node we want to delete (curNode),
  • keeping track of the node at index - 1 (lastNode).
  • Then we rewire lastNode to cut out curNode.

// Returns the new head (or the old head if unchanged).
Node* linkedListDelete(Node* head, size_t index) {
    Node* lastNode = NULL;
    Node* curNode = head;
    while (curNode && index) {
        index--;
        lastNode = curNode;
        curNode = curNode->next;
    }
    if (!curNode) return head;              // Can't find an item at that index
    if (!lastNode) head = curNode->next;    // We are deleting the head; the new head is the next item
    else lastNode->next = curNode->next;    // Orphans the item
    free(curNode->data);
    free(curNode);
    return head;
}

Removing… on the other, other hand!
• This looks complex, but it really is a simple traversal.
  • So it takes time proportional to the index to find the item.
  • And then it performs a simple update and deallocation. (Quick to do.)
• A big array, on the other hand:
  • It can find the element to remove immediately.
  • However, removing it means shifting every item after it over to the left.
  • That can be an expensive update! (Memory is slow!!)

On your own!
Think about the code you would need to do any of the following (a sketch of the first one follows this slide):
• Delete/free the entire linked list.
• Sort a linked list.
• Append a linked list to an existing one.
• Copy a subset of a linked list to a new list.
Often, operations can be abstracted in such a way that all of these can be written relatively simply. Consider the performance of these operations compared to an array.
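For the first exercise above, here is one possible sketch (my own take, not an official solution). It assumes the Node typedef from the earlier slides, and that each node and its data string were allocated with malloc; linkedListFree is just an illustrative name.

#include <stdlib.h>

/* Assumes the Node typedef from the earlier slides. */
void linkedListFree(Node* head) {
    while (head) {
        Node* next = head->next;   // grab the rest of the list before the node disappears
        free(head->data);          // free(NULL) is a harmless no-op, so the head node is fine
        free(head);                // then release the node itself
        head = next;
    }
}

The only subtlety is saving head->next before freeing the node, since the node's memory (including its next field) must not be read after free().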
Linked lists … link you … to the world!
• Consider how much cleaner you can make certain operations if you tracked the previous node as well.
  • This is a doubly linked list.
• Such a list is typically "double-ended" as well: it keeps track of both the head and the tail.
[Figure: three doubly linked nodes, each with prev, next, and data pointers, plus separate head and tail pointers into the list.]

Seeing the trees through the forest
• A binary tree can be represented by the same kind of nodes as a linked list.
• In this case, you have a left and a right child node instead of next and prev.
• The operations are very different, though.
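A rough sketch of what those node shapes might look like; the type and function names here (DNode, DList, dlistAppend, TreeNode) are one possible choice, not code from the course.

#include <stddef.h>

// One possible doubly linked node, plus a small header that tracks both ends.
typedef struct _DNode {
    struct _DNode* prev;
    struct _DNode* next;
    char* data;
} DNode;

typedef struct {
    DNode* head;
    DNode* tail;
} DList;

// With a tail pointer, appending is constant time and needs no traversal at all.
void dlistAppend(DList* list, DNode* node) {
    node->next = NULL;
    node->prev = list->tail;
    if (list->tail) list->tail->next = node;   // attach after the old tail
    else            list->head = node;         // empty list: the node is also the head
    list->tail = node;
}

// The same node shape, renamed: a binary tree node has left/right children
// instead of prev/next, but the operations on it are very different.
typedef struct _TreeNode {
    struct _TreeNode* left;
    struct _TreeNode* right;
    char* data;
} TreeNode;

With both a prev pointer in each node and a tail pointer in the list header, appending or removing at either end no longer requires walking the whole list.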