**Introduction**

In lab6 we touch on how sound is represented internally in our computers. As we can guess, sound does not get processed as analog signals within our programs. Instead, the typical representation of sound is in the for int16_t (16-bit integers) samples. Should we decide to change the volume, the sound sample would have to be re-scaled, based on the sound factor.

We are going to use 3 different approaches to scale sound samples. For each, I am going to provide the code and explain how these algorithms differ from each other. The algorithms will be tested on two separate architectures: Aarch64 and x86-64.

**Alogrithm 1 (Aarch-64)
**

As per lab instruction I wrote the following programs,

// // lab6.c // SPO600-Lab6 // // Created by Evgeni Kolev on 2017-10-18. // Copyright © 2017 Evgeni Kolev. All rights reserved. // #include #include #include #define SIZE 250000 // array size #define LOW -32767 // lowest value in the range #define HIGH 32767 // highest value in the range int main(void) { int16_t* samples = (int16_t*) malloc(SIZE * sizeof(int16_t)); float* scaled = (float*) malloc(SIZE * sizeof(float)); // second array int16_t div = (HIGH + 1 - LOW); // divider for the random number to be in the range float total = 0; int x; /* Generate the samples. */ printf("Generating the samples"); for (x = 0; x < SIZE; x++) samples[x] = (LOW + rand() % div); /* Scalte the samples */ printf("Scaling the samples"); for (x = 0; x < SIZE; x++) { scaled[x] = samples[x] * 0.75; //scale the sample } for(x = 0; x < SIZE; x++) total += scaled[x]; printf("Total: %.2f\n ", total); return 0; }

First I defined the size of the sound samples arrays as well as their range. After declaring the arrays to hold the data, I initialize the elements in the array to hold the samples. Then, I scale each sample from the array and save them in the array containing scaled samples.

Now, let’s run the solution with the “time” command to see how long it takes for the whole program to execute.

`time ./lab6.1`

Here are the results.

**Algorithm 2 (Aarch-64)
**

In the second approach, we need to amend the scaling operation. Basically, instead of scaling each individual sample in an array of millions of values we are going to scale all of the possible values with the range and save them in a table. This way, we already have all the scaled results that a sample could have.

Here is the code:

// main.c // SPO600-Lab6 // // Created by Evgeni Kolev on 2017-10-17. // Copyright © 2017 Evgeni Kolev. All rights reserved. // #include #include #include #define SIZE 250000 // array #define LOW -32767 // lowest value in the range #define HIGH 32767 // highest value in the range int main(void) { /* Declear a table and initialize arrays. */ float* scaled = (float*) malloc(SIZE * sizeof(float)); float* table = (float*) malloc(SIZE * sizeof(float)); int div = (HIGH + 1 - LOW); int16_t* samples = (int16_t*) malloc(SIZE * sizeof(int16_t)); int x, j, i; float total = 0; /* Create the scales in the table. */ printf("Creating a look up table\n"); for (x = 0; x <65534 ; x++) { table[x] = (HIGH - x) * 0.75; } /* Create samples. */ printf("Creating samples\n"); for (x = 0; x < SIZE; x++) samples[x] = (LOW + rand() % div); printf("Scaling samples\n"); for(x = 0; x < SIZE; x++){ scaled[x] = table[HIGH - (samples[x])]; } for(x = 0; x < SIZE; x++) total += scaled[x]; printf("Total: %.2f\n", total); return 0; }

Now, let’s see how this approach is goint to perform. I am going to run again the program using the time command.

`time ./lab6.2`

Here is the output:

We can see that there is a significant improvement in the user time. In the previous approach user time was 0.18 seconds, where with the second algorithm it is 0.09. The second algortihm run 2 times faster than the first. The reason is, using a scaled table we have limited the number of multiplication operation only to the size of the samples’ range.

**Algorithm 3 (Aarch-64)
**

In the third approach, we are going to turn the scaling expression into integer math. The first thing to do is turn the scaling factor into an integer, multiply 0.75 by 256. When scaling the samples we multiply the sample by the factor and then shift the results 8 bits to the right.

Here is the code:

// // main.c // lab6.3 // // Created by Evgeni Kolev on 2017-10-18. // Copyright © 2017 Evgeni Kolev. All rights reserved. // #include #include #include #define SIZE 250000 // array size #define SHIFT 8 // used to shift bits #define FACTORIAL 0.75 #define HIGH -32767 // lowest value in the range #define LOW 32767 // highest value in the range int main() { int factorial = FACTORIAL * 256; int16_t* samples = (int16_t*) malloc(SIZE * sizeof(int16_t)); int16_t* scaled = (int16_t*) malloc(SIZE * sizeof(float)); // second array // divider for the random number to be in the range int16_t div = (int16_t)((HIGH + 1) - LOW); int16_t total = 0; int x; /* Initialize the samples in the array. */ printf("Generating samples \n"); for (x = 0; x < SIZE; x++) samples[x] = (LOW + rand() % div); /* Sacle the samples */ printf("Scaling the samples \n"); for(x = 0; x < SIZE; x++) scaled[x] = (samples[x] * factorial) >> SHIFT; for(x = 0; x < SIZE; x++) total += scaled[x]; printf("Total: %d\n", total); return 0; }

Here are the ruslts from the test:

**x86-64**

Let’s see what results we are getting from running the same algorithms on a different architecture.

**Algorithm 1**

**Algorithm 2**

**Algorithm 3**

**Conclusion**

Based on the results we got from the 3 different algorithms we can see that there are a number of dependencies. First, when we select a given algorithm for our application we need to take into account the architecture our app is going to be run on. Second, we should consider the amount of time it takes for the CPU to perform different operations. In example, multiplication is a more expensive operation than addition. Third, the approach we take should be dependent on the data load our application is going to handle.