CIS314-Assignment 4 Solved

1. Consider the following x86-64 code:

loop:

      movq %rsi, %rcx       movl $1, %edx  movl $0, %eax

.L2:

     testq %rdx, %rdx

       je .L4

 movq %rdx, %r8  andq %rdi, %r8  orq %r8, %rax  salq %cl, %rdx  jmp .L2

.L4:

      ret

 

The code above was generated by compiling C code (with Arch gcc) that has the following overall form:

long loop(long a, long b) {     long result = ? ;

    for (long mask = ? ; mask != ? ; mask <<= ? ) {         result |= (? & ?);

    }

    return result;

}



Copy the above x86-64 code into a C file as a comment.  Annotate each line of the x86-64 code in terms of a, b, result, and mask.  Assume that this x86-64 code follows the register usage conventions outlined in B&O’H section 3.7.5 (it does).  Then implement the above C function by filling in the blanks so that it’s functionally equivalent to the x86-64 code. 

Here are some test runs:

loop(1, 5): 1 loop(2, 4): 0 loop(3, 3): 1 loop(4, 2): 4 loop(5, 1): 5

 

Also write a main() function to test your loop function.  

Hint: the register conventions in B&O’H section 3.7.5 with inform which registers are holding a and b.

Hint: try compiling your C code to x86-64 using gcc with the -S, -Og flags.

Name your source file 4-1.c.  

2 Consider the following C code:

int sum(int from, int to) {     int result = 0;     do {

        result += from;         ++from;

    } while (from <= to);     return result;

}

 

Implement the do-while loop above in x86-64.  Use the following as a framework:

long sum(long from, long to) {      long result = 0;

      // Ensure that argument *from* is in %rdi, 

      // argument *to* is in %rsi, *result* is in %rax - do not modify.

     __asm__ ("movq %0, %%rdi # from in rdi;" :: "r" ( from )); 

      __asm__ ("movq %0, %%rsi # to in rsi;" :: "r" ( to )); 

  __asm__ ("movq %0, %%rax # result in rax;" :: "r" ( result )); 

 

   // Your x86-64 code goes below - comment each instruction...

      __asm__(

         "movq %rsi, %rax # For example, this sets result = to;"

      );

 

  // Ensure that *result* is in %rax for return - do not modify.

      __asm__ ("movq %%rax, %0 #result in rax;" : "=r" ( result ));        return result;

}

 

Add a comment describing the purpose of each of your x86-64 instructions.  Your x86-64 code must follow the register usage conventions outlined in B&O’H section 3.7.5.  FWIW, I was able to get this working with 4 instructions and 1 label.

Here are some test runs:

sum(1, 6): 21 sum(3, 5): 12

 

Also write a main() function to test your sum function.  Name your source file 4-2.c.  

3. The following C code transposes the elements of an N × N array: 

#define N 4

typedef long array_t[N][N];

 

void transpose(array_t a) {     for (long i = 0; i < N; ++i) {         for (long j = 0; j < i; ++j) {             long t1 = a[i][j];             long t2 = a[j][i];             a[i][j] = t2;             a[j][i] = t1;

        }

    } 

}

 

When compiled with -Og, gcc (on Arch) generates the following x86-64 code for the inner loop of the function: 

.L3:

  cmpq %rcx, %rsi

  jle .L7

  movq (%rdx), %r9   movq (%rax), %r8   movq %r9, (%rax)   movq %r8, (%rdx)   addq $8, %rax   addq $32, %rdx

  addq $1, %rcx

  jmp .L3

 

Copy the x86-64 code above into a C file as a comment.  Annotate each line of the x86-64code in terms of N, a, i, j, t1, and t2.  Note that this x86-64 code uses pointer dereferencing and pointer arithmetic (e.g., *a, a += 8) rather than nested array lookups (e.g., a[i][j]) as optimizations.  Write a new C version of transpose that uses these optimizations.

For example, the input:

{{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}}

should be transposed as:

{{1, 5, 9, 13}, {2, 6, 10, 14}, {3, 7, 11, 15}, {4, 8, 12, 16}}

Also write a main() function to test your procedure.  Name your source file 4-3.c.  

Hint: declare a pointer to the ith row and a pointer to the ith column in the outer loop; use pointer arithmetic in the inner loop.