CS 485-006, Spring 2016
Midterm Exam Review

The midterm exam for CS 485G will be held in class on Friday, 4 March, 2016. The exam will be closed-note, closed-book.

Exam topics
Study questions

There will be a mix of multiple-choice, fill-in-the-blank, short-answer (one or two sentences), long-answer (a paragraph or two), and code-writing questions (C or C++ and assembly).

Topics

Chapter 2: Representing and Manipulating Information:
- Converting among binary, hexadecimal, and decimal.
- Big-endian vs little-endian
- Two's-complement representation of negative numbers
- Sign extension and zero extension
- Bitwise operators in C: &, |, ^, <<, >>.
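A small sketch for the Chapter 2 topics, assuming a C99 compiler on x86-64 Linux. It uses 0x12345678 and -5 rather than the values in the study questions, so those remain exercises. The function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3], lowest
   address first. On a little-endian machine such as x86-64, out[0]
   receives the least-significant byte. */
void bytes_of_u32(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Return 1 if the lowest-addressed byte of the 32-bit value 1 is 1,
   i.e. the machine is little-endian. */
int is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Sign extension: widening a signed byte copies its sign bit into
   all of the new upper bits. */
int64_t sign_extend8(int8_t b)
{
    return (int64_t)b;
}

/* Zero extension: widening an unsigned byte fills the new upper bits
   with zeros. */
uint64_t zero_extend8(uint8_t b)
{
    return (uint64_t)b;
}
```

For example, on x86-64 bytes_of_u32(0x12345678, buf) leaves 0x78, 0x56, 0x34, 0x12 in buf, and the byte 0xfb (the 8-bit two's-complement form of -5) sign-extends to 0xfffffffffffffffb but zero-extends to 0xfb (251).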
Chapter 3 (part I): Machine-Level Representation of Programs (basics)
- Size of data types (x86-64): char, short, int, long, size_t, pointer; instruction suffixes b, w, l, q.
- x86-64 register set: %rax, %rbx, %rcx, %rdx, %rdi, %rsi, %rbp, %rsp, %r8–%r15, %rip. 32-bit versions: %eax etc.
- The mov instructions: movq, movl, movb; valid source and destination operands.
- Addressing memory in assembly: disp(base,index,scale).
- Arithmetic and bitwise instructions: add, sub, imul, xor, or, and, sal/shl, sar, shr.
- Doing arithmetic with the leaq instruction.
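Both halves of Chapter 3 part I can be checked from C. The _Static_assert lines below encode the x86-64 Linux sizes (they fail to compile under a different data model, e.g. 64-bit Windows, where long is 4 bytes). times3 is a hypothetical function showing the kind of arithmetic leaq handles: compiled with gcc -O1 -S, a multiply by 3 is typically emitted as a single leaq (%rdi,%rdi,2), %rax rather than an imul.

```c
#include <stddef.h>

/* Sizes on x86-64 Linux, with the matching instruction suffix. */
_Static_assert(sizeof(char) == 1, "char: 1 byte (b)");
_Static_assert(sizeof(short) == 2, "short: 2 bytes (w)");
_Static_assert(sizeof(int) == 4, "int: 4 bytes (l)");
_Static_assert(sizeof(long) == 8, "long: 8 bytes (q)");
_Static_assert(sizeof(size_t) == 8, "size_t: 8 bytes (q)");
_Static_assert(sizeof(void *) == 8, "pointer: 8 bytes (q)");

/* a * 3 == a + a * 2, which fits the disp(base,index,scale) form
   (base = a, index = a, scale = 2), so leaq can compute it. */
long times3(long a)
{
    return a * 3;
}
```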
Chapter 3 (part II): Control
- Flags (condition codes): CF, ZF, SF, OF.
- The test and cmp instructions.
- Conditional jumps: je/jz, jne/jnz, etc. Particularly, the difference between ja/jb and jg/jl.
- Converting if and if/else into assembly (and vice versa).
- Converting do–while, while, and for loops into gotos.
- Converting loops into assembly (and vice versa).
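As a sketch of the goto translations (using hypothetical functions that are not among the study questions), here is an if/else and a while loop, each written twice: once with high-level constructs and once with only if and goto, the form that maps directly onto cmp and conditional jumps. The while loop uses the "guarded do" shape; depending on the optimization level, gcc may instead use a "jump to middle" shape.

```c
/* |x - y| with if/else. */
long absdiff(long x, long y)
{
    if (x < y)
        return y - x;
    else
        return x - y;
}

/* Same computation in goto form: test the inverted condition and jump
   over the "then" part, just as the assembly would. */
long absdiff_goto(long x, long y)
{
    long result;
    if (x >= y)
        goto else_part;
    result = y - x;
    goto done;
else_part:
    result = x - y;
done:
    return result;
}

/* Count the 1 bits in x with a while loop. */
int popcount_while(unsigned long x)
{
    int count = 0;
    while (x) {
        count += (int)(x & 1);
        x >>= 1;
    }
    return count;
}

/* Guarded-do goto form: test once before entering the loop, then test
   again at the bottom of each iteration. */
int popcount_goto(unsigned long x)
{
    int count = 0;
    if (!x)
        goto done;
loop:
    count += (int)(x & 1);
    x >>= 1;
    if (x)
        goto loop;
done:
    return count;
}
```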
Chapter 3 (part III): Procedures
- Structure of a stack frame: excess arguments, return address, saved registers, local variables
- Behavior of the call and ret instructions.
- Calling conventions: first six arguments in %rdi, %rsi, %rdx, %rcx, %r8, %r9; return value in %rax
- Caller-saved vs. callee-saved registers: what is the difference, which registers fall into which category, when to use each.
- Allocating/freeing stack space by subtracting/adding to %rsp.
- Saving and restoring registers with push and pop
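To see callee-saved registers in practice, you could compile a function like the hypothetical square_plus below with gcc -O1 -S, with square moved to a separate file so gcc cannot inline it or see which registers it uses. Because x must survive the call, gcc will typically copy it into a callee-saved register such as %rbx, bracketed by pushq %rbx and popq %rbx, rather than keep it in a caller-saved register that square is free to overwrite.

```c
/* Hypothetical helper. To reproduce the register behavior described
   above, move this into its own .c file. */
long square(long v)
{
    return v * v;
}

/* x is still needed after the call, so it cannot simply stay in %rdi
   (caller-saved, and square may overwrite it). */
long square_plus(long x)
{
    return square(x) + x;
}
```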
Chapter 3 (part IV): Arrays and structures.
- Size of an array (indices vs bytes).
- Allocating stack space for a local array variable.
- Accessing array elements in machine code: disp(,index,scale) and (base,index,scale).
- Pointer arithmetic and array indexing in C and C++; array parameters (not other array variables) are really pointers.
- Nested arrays: memory layout, row-major vs column-major layout.
- Nested arrays in assembly code: getting a pointer to a row; accessing an element; calculating offsets.
- Multi-level arrays (arrays of pointers): advantages and disadvantages compared to nested arrays.
- Accessing an element of a multi-level array.
- structs: differences between C structs and C++ classes/structs.
- Size and offset of elements in a struct; accessing an element at a particular offset.
- Alignment and padding: why? How much padding for different data types?
- Alignment and padding: size of the whole struct; arranging a struct to reduce padding; accessing an array of structs.
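A sketch of the address arithmetic behind nested and multi-level arrays, with hypothetical helper names. nested_offset gives the row-major byte offset of a[i][j] in a nested array T a[R][C]; get_multi reads an element through an array of row pointers, which in machine code takes two memory loads (one for the row pointer, one for the element) instead of one.

```c
#include <stddef.h>

/* Row-major layout: a[i][j] in T a[R][C] is i*C + j elements from the
   start of a, so its byte offset is (i*C + j) * sizeof(T). */
size_t nested_offset(size_t i, size_t j, size_t cols, size_t elem_size)
{
    return (i * cols + j) * elem_size;
}

/* Multi-level array: rows[i] is a pointer that must first be loaded
   from memory (at byte address rows + 8*i on x86-64); only then can
   element j be read from that row. Rows may have different lengths
   and need not be contiguous. */
int get_multi(int *rows[], size_t i, size_t j)
{
    const int *row = rows[i];   /* first memory load */
    return row[j];              /* second memory load */
}
```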
Chapter 3 (part V): Memory layout and buffer overflows
- Layout of memory regions: text (code), data, heap, stack.
- Buffer overflows: causes and consequences.
- Buffer overflows: historical examples (be prepared to list one or two).
- Buffer overflows: overwriting the return address.
- Mitigating the danger of buffer overflows: stack randomization, non-executable stacks, canaries.
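One common way to remove the overflow risk in input code is to replace gets with fgets, which is told the size of the buffer. A minimal sketch; read_line is a hypothetical helper name, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf, storing at most size - 1 characters plus a
   terminating '\0'. Unlike gets(), fgets() cannot write past the end
   of buf. A trailing newline, if one was read, is removed. Returns buf,
   or NULL at end of file or on error. Input longer than the buffer is
   left in the stream for the next call. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```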
Programming on Unix: Tools:
- Stages of compilation: compiler, assembler, linker.
- Invoking gcc: stopping after compilation to produce assembly (-S); compiling and assembling to an object file without linking (-c).
- Invoking gcc: optimization flags
- Invoking gcc: including information for debugging (-g) and profiling (-pg).
- gdb commands: break, run, backtrace (bt), disas(semble), p(rint), examine (x), disp(lay)
- gdb commands: next, step, n(ext)i, s(tep)i, continue, finish
- Other programming tools: disassembling (objdump -d), profiling (gprof), checking for memory errors (valgrind).

Study questions

The following questions and problems are representative of those that might appear on the exam. The actual exam will be nowhere near this long, of course.

We will not be posting solutions to these problems. If you would like to verify your answers, send them to Dr. Moore by email.

All questions assume we are talking about Linux running on the x86-64 architecture.

1. Convert hexadecimal 0x2b to decimal.
2. Convert decimal 485 to binary.
3. Convert binary 0b1001011010110011 to hexadecimal.
Is x86-64 big-endian or little-endian? Show how the 32-bit number 0x40046f is represented as a sequence of bytes (you can leave the bytes in hexadecimal).
What is the value of the largest unsigned integer, if interpreted instead as a signed integer?
Represent the number -12 as an 8-bit two's-complement number.
What is the signed decimal value of the 8-bit two's-complement number 0b10011100?
Find the value of:
1. 0x42 | 0x2a
2. 0x42 & 0x2a
3. 0x42 ^ 0x2a
4. 7 << 3
5. -5 >> 1
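For the shift questions, note that C's >> does not specify which kind of right shift you get for negative signed values. A sketch with different values from the questions, assuming gcc on x86-64 (the function names are hypothetical):

```c
#include <stdint.h>

/* On an unsigned operand, >> is a logical shift: zeros enter at the
   top (typically compiled to shr). */
uint64_t shift_right_unsigned(uint64_t x, int n)
{
    return x >> n;
}

/* On a negative signed operand, the result of >> is implementation-
   defined in C. gcc defines it as an arithmetic shift: copies of the
   sign bit enter at the top (typically compiled to sar). */
int64_t shift_right_signed(int64_t x, int n)
{
    return x >> n;
}
```

With x = -16 (0xfffffffffffffff0), shifting right by 2 gives -4 arithmetically but 0x3ffffffffffffffc logically.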
Name a C integer type that has the same size as a pointer.
What C integer type is guaranteed to be able to hold any valid array index (on every architecture, not just x86-64)?
How many bytes are required to hold an array of five ints?
What is the meaning of the value stored in the %rip register?
Which of the following are illegal instructions? Select all the correct answers.
1. movq (%rsp), %rdi
2. movq %rbx, %rbp
3. movq (%rdi), (%rsi,%rdx,4)
4. movl $1, %edx
5. movq %r10, 44(,%r10,2)
6. movq %r16, %rax
7. movq %rbp, (%rdx,%rcx,6)
If %rdi = 5000 and %rsi = 100, what is the (decimal) address computed by each of the following operands?
1. (%rdi)
2. (%rdi,%rsi,4)
3. (%rsi,%rdi,4)
4. (%rdi,%rsi)
5. 12(%rdi)
6. 12(,%rsi,2)
7. 12(%rdi,%rsi,8)
What is the difference between the sar and shr instructions? Give an example where they compute different answers, and show the result of each (decimal, binary, or hexadecimal is fine).
Write a single instruction to compute the value of %rdi - %rax. Where does the instruction store its result?
Write a single assembly instruction that computes a*2 + b, if a is in register %rax and b in %rbx.
Write a single leaq instruction that computes a*5, if a is in register %rax.

What is the value of %rax after the following code executes?

      movl $1, %eax
      movl $3, %ebx
      leaq (%rax, %rbx, 2), %rcx
      shlq $4, %rax
      subq %rcx, %rax

What are the values of the flags CF, ZF, SF, and OF after executing each of the following instructions? Assume that, before executing each instruction, %rax contains the value 10 and %rbx contains the value -10.
1. add %rax, %rax
2. add %rbx, %rbx
3. add %rax, %rbx
4. sub %rax, %rax
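The rules for the four flags can be written out in C. This sketch covers only 64-bit addq and subq (and therefore cmpq, which sets the flags like subq without storing the result); struct flags and the function names are hypothetical. Note the AT&T operand order: addq src, dst computes dst + src, and subq src, dst computes dst - src.

```c
#include <stdint.h>

struct flags { int cf, zf, sf, of; };

/* Flags after addq src, dst (result = dst + src). */
struct flags flags_after_addq(uint64_t dst, uint64_t src)
{
    uint64_t r = dst + src;        /* unsigned arithmetic wraps mod 2^64 */
    struct flags f;
    f.cf = r < dst;                /* carry out of bit 63 */
    f.zf = r == 0;
    f.sf = (int)(r >> 63);         /* top bit of the result */
    /* signed overflow: the result's sign differs from both operands' */
    f.of = (int)((((dst ^ r) & (src ^ r)) >> 63) & 1);
    return f;
}

/* Flags after subq src, dst or cmpq src, dst (result = dst - src). */
struct flags flags_after_subq(uint64_t dst, uint64_t src)
{
    uint64_t r = dst - src;
    struct flags f;
    f.cf = dst < src;              /* borrow: unsigned dst < src */
    f.zf = r == 0;
    f.sf = (int)(r >> 63);
    /* signed overflow: the operands' signs differ and the result's
       sign differs from dst's */
    f.of = (int)((((dst ^ src) & (dst ^ r)) >> 63) & 1);
    return f;
}
```

Pass negative operands as, e.g., (uint64_t)-3; the bit pattern is the same as the signed value's.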
What is the difference between the instructions subq and cmpq?
What is the difference between the instructions testq and cmpq?
Suppose %rax contains the value 10 and %rdx contains the value -2. After executing the instruction cmp %rax, %rdx, which of the following instructions will jump?
Translate the following C code into assembly. Assume that a and b are of type long and are stored in %rax and %rbx, respectively.
```
      if (a > b)
          a = b;
```
Translate the following C code into assembly. Assume that a and b are of type size_t and are stored in %rax and %rbx, respectively.
```
      if (a < b)
          b -= a;
      else
          a -= b;
```

What are the values of %rax and %rbx after the following code executes?

      movl $10, %eax
      movl $5,  %ebx
      cmpq %rax, %rbx
      jge L2
      subq $1, %rax
  L2:
      subq $1, %rbx

Rewrite the following fragments of C code to use goto rather than the high-level loop constructs. Write your answers in C, not assembly.

        do {
            sum += x;
            x *= 2;
        } while (x < 64);

        while (x) {
            sum += x->data;
            x = x->next;
        }

        for (i = 0; i < size; i++) {
            a[i] = 0;
        }

Translate the following fragments of C code into assembly. Assume that a and b are of type long and are stored in %rax and %rbx, respectively.

        do {
            a += b;
            b += 4;
        } while (b < 10);

        while (a) {
            b++;
            a /= 2;
        }

Translate the following fragments of assembly code into C. Use the variable names rax, rbx, etc. in your code to represent the registers.

  L1: addq %rbx, %rcx
      addq $1, %rbx
      cmp %rax, %rbx
      jl L1

      jmp L2
  L1: addq %rbx, %rcx
      addq $1, %rbx
  L2: cmp %rax, %rbx
      jl L1

      cmp %rax, %rbx
      jg L2
  L1: addq %rbx, %rcx
      addq $1, %rbx
      cmp %rax, %rbx
      jl L1
  L2:

The callq instruction does two separate things. What are they?
When a function is called, what is stored at the address pointed to by %rsp?
Suppose we have the function: long myfunc(long x, long y, long z), and that we have three long variables a, b, and c, stored in %rax, %rbx, and %rcx, respectively. Write assembly code to call myfunc(a, b, c).
If a function modifies the contents of %rbp, what should be the first instruction executed by that function? What should be the last instruction before ret?
Suppose you are writing an assembly function with no parameters that does not call any other functions. You need to use a register to store a local variable. All else being equal, should you prefer to use %rax or %rbx? Why?
List all the callee-saved registers. In what situations is it preferable to use a callee-saved register rather than a caller-saved register?
Why is %rax caller-saved rather than callee-saved?
Suppose %rsp has the value 5000 and %rbp has the value 100. When the instruction pushq %rbp is executed:
1. What is the new value of %rsp?
2. At what memory address is the value 100 stored?
Suppose we have the variable d in register %rdx. We want to call func1(d) then func2(d). Why does the following code not work?
```
      movq %rdx, %rdi
      callq func1
      movq %rdx, %rdi
      callq func2
```
Write a corrected version of the code.
Which registers are used for passing arguments to a function? If a function call has more arguments than the number of argument registers, where are the other arguments stored?

Write a complete implementation in assembly of the following C function:

      long calc(long x, long y)
      {
          long result = (x + y) / 2;
          return result;
      }

Suppose the array variable int a[6] = { 0, 10, 20, 30, 40, 50 } is stored at address 4000. What is the type and value of each of the following C expressions?
1. a[1]
2. &a[1]
3. a + 3
4. a[0] + 3
5. a[6]
6. &a[6]
Suppose the array variable long a[10]; is stored at address 4000, and that the variable i is stored in register %rsi. Write an assembler instruction to load the value of a[i] into the register %rdx.
Suppose that %rdi holds a pointer p to an array of longs, and that %rsi stores the variable i. Write an assembler instruction to load the value of p[i] into the register %rdx.
Suppose we have a nested array int a[3][5] stored at address 1000. What is the address of:
1. a[0]
2. a[0][0]
3. a[1]
4. a[1][0]
5. a[1][1]
6. a[2][4]
Suppose we have a nested array int a[10][8] stored at address 1000, and two variables i and j stored in %rax and %rdx, respectively. Write an assembly instruction to add 1 to the value of a[i][j].
List at least one advantage and one disadvantage of multi-level arrays (arrays of pointers to arrays) compared to nested (multidimensional) arrays.
Suppose we have a multi-level array int *a[3] stored at address 1000, and two variables i and j stored in %rax and %rdx, respectively. Write a sequence of assembly instructions to add 1 to the value of a[i][j].
Consider the data structure:
```
      struct data {
          char array[5];
          int number;
          size_t size;
      };
```
1. What are the alignment requirements for each of the three members?
2. What is the alignment requirement for the struct as a whole?
3. What is sizeof(struct data)?
4. If struct data d is stored at address 4000, what is the address of d.number? Of d.size?
5. If %rbx stores a pointer struct data *p, write an assembly instruction to load p->size into register %rdx.
Consider the data structure:
```
      struct too_big {
          char x;
          long y;
          char z;
      };
```
1. What is the value of sizeof(struct too_big)?
2. Write C code to define a structure just_right that contains the same data members as too_big, but that requires less space. What is sizeof(struct just_right)?
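offsetof and sizeof let you check padding answers on a real compiler. A sketch with hypothetical structs that are not the ones in the study questions, assuming x86-64 Linux alignment rules (each primitive type aligned to its own size):

```c
#include <stddef.h>

/* Declaration order forces padding: 1 + 7 (pad) + 8 + 2 + 6 (pad). */
struct loose {
    char tag;      /* offset 0 */
    double value;  /* offset 8: needs 8-byte alignment */
    short id;      /* offset 16 */
};                 /* size 24: rounded up to a multiple of 8 */

/* Largest members first: 8 + 2 + 1 + 5 (pad). */
struct tight {
    double value;  /* offset 0 */
    short id;      /* offset 8 */
    char tag;      /* offset 10 */
};                 /* size 16 */

_Static_assert(offsetof(struct loose, value) == 8, "value after 7 bytes of padding");
_Static_assert(offsetof(struct loose, id) == 16, "id right after value");
_Static_assert(sizeof(struct loose) == 24, "trailing padding to a multiple of 8");
_Static_assert(sizeof(struct tight) == 16, "reordering saves 8 bytes");
```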
In what region of memory (text, data, heap, stack) is each of the following stored?
1. A global variable.
2. A local variable.
3. The machine code for main.
4. An array allocated with malloc.
5. An array defined as int a[4]; inside a function.
Which region of memory is located at the very top of the program's address space?
How can the gets() function be used safely with no risk of buffer overflows?
Consider the following assembly code:
```
badfunc:
      subq $8, %rsp
      movq %rsp, %rdi
      callq gets
      addq $8, %rsp
      ret
```
1. What is the maximum amount of user input that will allow this function to execute without problems?
2. If the user provides a few more bytes of input than that, which instruction is likely to crash?
3. If another function evil is at address 0x00414243, what input could the user provide to this gets call to cause evil to be executed?
  Hint: 0x41 is the ASCII code for "A".
What two techniques does the x86-64 architecture use by default to mitigate against buffer overflow attacks? How can an attack still be carried out despite these techniques?
What technique does gcc -fstack-protector use to mitigate against buffer overflow attacks?
Describe how stack canaries work.
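The idea can be simulated safely in plain C. In this hypothetical sketch, a 16-byte array stands in for part of a stack frame: an 8-byte buffer followed by an 8-byte canary. An unchecked copy (like the one gets performs) that runs past the buffer changes the canary, and checking it before returning detects the overflow. In real gcc-generated code on x86-64 Linux, the canary is typically loaded from %fs:0x28, and a mismatch calls __stack_chk_fail, which aborts the program.

```c
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 8, FRAME_SIZE = 16 };

/* Returns 1 if the canary survived copying len bytes of input into the
   simulated buffer, 0 if the copy overwrote it ("stack smashing"). */
int canary_survives(const char *input, size_t len, uint64_t canary)
{
    unsigned char frame[FRAME_SIZE] = {0};
    uint64_t after;

    /* frame[0..7]: buffer; frame[8..15]: canary, which sits between the
       buffer and (in a real frame) the saved return address. */
    memcpy(frame + BUF_SIZE, &canary, sizeof canary);

    /* Copy without checking against BUF_SIZE, as gets would. The clamp
       to FRAME_SIZE only keeps this simulation itself in bounds. */
    if (len > FRAME_SIZE)
        len = FRAME_SIZE;
    memcpy(frame, input, len);

    /* The check inserted before ret. */
    memcpy(&after, frame + BUF_SIZE, sizeof after);
    return after == canary;
}
```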
List at least two significant current or historical buffer overflow attacks.
What have buffer overflow attacks been used for, other than gaining unauthorized access to other people's computers? List at least one real-world example.
List the gcc command-line options for each of the following:
1. Generate a .o file rather than an executable.
2. Generate a .s (assembly code) file rather than an executable.
3. Enable the highest level of optimization.
4. Enable optimizations that do not make debugging more difficult.
5. Include debugging information in the executable file.
Give the gdb command to do each of the following:
1. Set a breakpoint at the beginning of main.
2. Set a breakpoint at line 150 of prog.c.
3. Set a breakpoint at address 0x4005fc.
4. Print the value of %rax.
5. Print the value of %rax in hexadecimal.
6. Print the 64-bit value stored at the top of the stack.
7. Print the C string pointed to by %rdi.
8. Print the address of the next instruction to be executed.
9. Print the next instruction to be executed.
10. List the assembly code for the entire current function.
11. See the list of stack frames, including the current function, its caller, its caller's caller, etc.
What is the difference between the print and display commands in gdb?
What is the difference between the nexti and stepi commands in gdb?
What is gprof used for? How does one compile a program so that it can be used with gprof?
List at least two different kinds of problems that valgrind can help detect.
Without running gdb, how can you list the assembly code for an executable program?
