More C & Pointers

Last time, we left some stuff on the table, so let’s address that again

Pointers and Functions

Remember how earlier we talked about pointers in relation to function calls? Let’s talk about that

You should recall that your program is stored in RAM which, like all memory, is finite

Whenever you declare a variable in a function, the appropriate memory space is placed into a portion of memory called the stack

This is different from what we referred to as a stack in Java, since each function as its own stack, meaning functions can’t call upon variables from other stacks

…well, directly at least

Let’s take the following code to see what I mean

void my_func (int *x, int *y) { // taking pointers
	x = x * 2;
	y += 3;
}

int main (int argc, char *argv[]) {
	int x, y;
	x = 2;
	y = 2;
	printf("%d\n", x); // 2
	printf("%d\n\n", y); // 2
	my_func(&x, &y); // passing addresses
	printf("%d\n", x); // 4
	printf("%d\n\n", y); // 5
}

The C compiler allocates memory for x and y to the stack of main(), as well as *x and *y to the stack of my_func(), letting the locations of x and y be in the stack

Since these two pointers in my_func() point to the addresses of x and y, the C compiler jumps to x and y in the main stack directly

After the my_func() call finishes, all memory in the my_func() stack is deallocated, fresh for more computing

If we were to have something like this…

void my_func (int x, int y) { // taking values
	x = x * 2;
	y += 3;
}

int main (int argc, char *argv[]) {
	int x, y;
	x = 2;
	y = 2;
	printf("%d\n", x); // 2
	printf("%d\n\n", y); // 2
	my_func(&x, &y); // passing addresses
	printf("%d\n", x); // 4
	printf("%d\n\n", y); // 5
}

…then x and y in main() would be unaffected; x and y in my_func() would be treated as a separate variable because it’s in a different stack

This method is called pass-by-value, whereas the method with pointers is called pass-by-reference

Pointers and Arrays

I’m gonna throw a bombshell at you real quick

Remember array labels? Those are actually pointers

Let’s take the following code to see what I mean

int arr[3];

for (int i = 0; i < 3; i++) {
	arr[i] = i;
}

int *arr_point = &(arr[0]);

Look at that last line and then look at the following

int *arr_point = arr

These are the same

The array label is simply a pointer to the address of its first element

You can use offsets the same way

&(arr[0]) + 2 = 5
arr + 2 = 5 //these are equivalent

Pointers and Dynamic Memory

So far we’ve only done static memory allocation, meaning that whatever memory we allocate is decided before execution

But what if we need extra? Or we don’t know exactly how much memory we’ll need? That’s where dynamic memory allocation comes in

For this, we need to ask the OS to set aside a certain amount of memory during execution

Take the following code as an example

double *a;
a = (double*) malloc(40); //malloc is a C function that sets aside n bytes in memory
//more on malloc later

In this, we request the OS to set aside 40 bytes of memory configured to handle double values and assign its address to a

But where is this extra memory stored? It’s in the heap

Remember that for each process, you have allocated memory for the stack at the top, shared memory/libraries in the middle and the heap at the bottom (along with your static code)

For memory in the stack, you have labels to describe it and call it

For malloc (and calloc), however, no label is assigned to the memory, so you must rely entirely on pointer arithmetic to access said addresses

Now let’s say that I want to use some of this memory

a[0] = 8;

Now, we take the first 8 bytes out of our 40 that we declared with malloc() and assign it to the value of 8, since a double is 8 bytes

If the address allocated by malloc() started at 10000, we would take that byte + 7 down and assign all of those bytes to the value of 8

This memory assignment behaves the same way it does in arrays, except it’s in the heap this time

The difference between doing this and using an array is that static memory is deallocated once you leave the call frame, but this is not the case with dynamic memory

If you were to allocate those 40 bytes and not deallocate them in the program, the memory will still be set aside and the OS can no longer use it

Luckily, this is only in RAM, so you can physically free this memory by resetting RAM (aka rebooting your machine)

One or two memory leaks on a personal machine isn’t that big of a deal, but they can accumulate over time and cause a lot of problems

Fortunately, we can reallocate this memory in C with a function

free(a);

This function frees the dynamic memory block that a points to, making everything alright

…right?

Wrong

Let’s say you continue using a and accidentally reassign it

a = (double*) malloc(40);
//...
//...
//...
a = (double*) malloc(32);
free(a);

In here, you only free the 32 byte block, but not the 40 byte block, since a no longer points to that chunk of memory

Sadly, the only way to reallocate this is through a system reboot

calloc

I mentioned earlier that calloc also allocates dynamic memory, but what’s the difference?

While malloc allocates exact bytes, calloc takes in two parameters, one being the # of cells and the other being the # of bytes in each cell

a = (double*) calloc(5, 8); //5 cells of 8 bytes each = 40 bytes

This is sometimes safer than malloc since you can’t accidentally allocate too much or too little memory by fractions of cells

Double Pointers

The concept of a pointer is very simple; it’s a pointer pointing to a pointer

Sounds complicated? Let me explain

Let’s take the following code

int i;
int *p1;
int **p2; //we use two asterisks for a double pointer

i = 37;
p1 = &1;
p2 = &p1;

Here, we point the first pointer towards i and the second pointer towards the first pointer, creating a kind of chain of pointers

\[p_2 \rightarrow p_1 \rightarrow i\]

Now, we were to assign a variable to 37, we can do the following

int x = **p2 //each asterisk is a step down the chain; go to p1 and then go to i

The real interesting thing occurs when we combine double pointers and malloc

Let’s try this out for a second

int **m; /* 4 bytes (just an address) */
m = (int **) calloc (2, sizeof(int *) ); /* 2 x 4 bytes */
m[0] = (int *) calloc ( 3, sizeof(int)); /* 3 x 4 bytes */
m[1] = (int *) calloc ( 2, sizeof(int)); /* 2 x 4 bytes */

First, we declare a double pointer as normal, right before we do something strange

Then, we allocate dynamic memory equivalent to two pointers and assign it to m

After that, we initialize each pointer as a heap of different sizes

With this, we have essentially created a 2D pointer

After you’re done using this, every call to dynamic memory must be freed as so

free(m[0])
free(m[1])
free(m)

File Organization

So far, we’ve done most of our code in one file and one function, but this gets clunky fast with larger programs

What we can do instead is divide your functions up into multiple files

Usually you’ll have a main.c file which simply works to control your program (do a, do b, do c, etc), along with other files for different functions

We can then compile together as follows

% gcc -o [name] main.c file1.c file2.c ... fileN.c

If you do this by itself, however, it will cause runtime errors, so we must prototype the functions

But how would we do this? With a header file

Header Files

Header files can be used to dictate some preprocessor commands to be used in multiple files

These will include:

These header files can get very large too, so you can break these up as well with one header file acting as a main, which you will include into every C file in your program

A problem you could possibly run into is the situation where your header file gets defined more than once, in which case hijinks will ensue

In order to prevent this, you can do the following

#ifndef HEADER_INCLUDED
	#define HEADER_INCLUDED

	#include "something.h"

	...

#endif // HEADER_INCLUDED

You can also take advantage of this modularity by having pointers to functions included in your prototypes

//header file
...
void* func(...)
...

This way, you can change the return type and the form of the function depending on its use

Size of function = Size of all local variable which has declared in function + Size of those global variables which has used in function+ Size of all its parameter+ Size of returned value if it is an address.

Scope

In 1027, we discussed that different variables will have a different scope depending on where in the program they’re declared

The same exists in C, only there’s 5 types

These types generally fall into 2 categories: compile-time variables, which persists throughout the whole execution period of the program, and run-time variables, which only exist in the system stack

Compile-time

Global - accessible everywhere

Static global - accessible within the same C program file

Static local - accessible within the same C function

Run-time

Local - accessible only in the block declared

Parameter accessible only in the called function

#include ...

int z; //global
static int n; //static global

int func1 (int k) //k is parameter
{
	int j; //local to func1
	j = z + k + n;
	return j;
}

int main (int argc, char *argv[])
{
	int x, y; //local to main
	x = 9;
	n = 2; //global to all code
	z = 3; //global to all code
	for (int i = 0; i < 10; i++)
	{
		y = i; //i local to the block (for loop)
	}
	x = func1(y);
}

(Recursion, CLA, Unix)

Recursion

We should know what recursion is by now, so let’s see how it works in C

Iterative Solution

But first, we should trace the iterative approach to a program that returns the factorial of 3

#include <stdio.h>

int fact(int );

int main()
{
	int i = 3 , x;
	x = fact(i);
	printf("\nIteration x: %d\n", x);
	return 0;
}

int fact(int ir)
{
	if (ir == 1)
		return (1);
	else
	{
		int totalSum = 0;
		for (int k = 1; k <= ir; k++)
		{
			totalSum = totalSum + k;
		}
		return (totalSum);
	}
}

When we go into the main program, we allocate space for i and x

Then, we go into fact and allocate a new variable ir (since we’re passing by value)

Now we go into the else condition and allocate totalSum and k

Now our program is finished, printing out the value of 6

Recursive Approach

What happens if we instead opt for a recursive solution?

#include <stdio.h>

int fact(int );

int main()
{
	int i = 3 , x;
	x = fact(i);
	printf("\nIteration x: %d\n", x);
	return 0;
}

int fact(int ir)
{
	int result;
	if (ir == 1)
		return (1);
	else
	{
		result = (it + fact(ir - 1));
	}
	return result;
}

With this, we create i and x as normal, then go into fact and create ir and result

After this, we call fact again and make a new call frame with ir and result

After THAT, we call fact AGAIN and make a third call frame with ir and result

Once we know we’re finished, we pop everything sequentially out of the memory map and return back to main with the value of 6

This is about the same as how it works in Java, so I’ll stop here

Command Line Arguments

Sometimes your main function can have some number of arguments

//argc is the number of arguments
//argv is the actual arguments
int main (int argc, char *argv[]) //argv[0] is the name of the program itself
{
	//...
	return 0;
}

For example, if we passed one value to the program, argc would be 2 (one for the program name and one for the argument itself)

For argv, everything from argv[0] to argv[argc-1] will contain a pointer to a string

To add command line arguments, you execute the executable with the arguments

% ./my_program arg1 arg2 "argument 3"

This will pass in 3 arguments to my_program

It is also with noting at argv[argc] is a NULL pointer, which is pretty obvious

Interacting with Unix

Much of what you can do in Unix you can do in C, with some different syntax

Process ID

You can get the process ID for the current executable with getpid() in the standard library unistd.h

#include <stdio.h>
#include <unistd.h>
int main()
{
	pid_t n = getpid(); //pid_t is the return type
	printf("Process id is %d\n", n);
	return 0;
}

With unistd.h, you can also do many of the other tasks that you can in Unix, such as changing the directory

#include <stdio.h>
#include <unistd.h>
int main(){
	char str[1000];
	char*p=getcwd(str,1000); //gets current working directory
	if(p!=str){
		printf("Could not get cwd!");
		exit(1);
	}
	printf("cwd is %s\n", str);
	chdir("/usr/bin"); //changes the working directory
	printf("cwd is now %s\n",getcwd(str,1000));
}

Time

With time.h, you can get the current system time

You get two types with this, the first being time_t (similar to long), which is the number of seconds since epoch (00:00:00 UTC, Jan 1, 1970)

You also get the struct tm, which looks like so

struct tm{
	int tm_sec; // seconds [0,61]
	int tm_min; // minutes [0,59]
	int tm_hour; // hour [0,23]
	int tm_mday; // day of month [1,31]
	int tm_mon; // month of year [0,11]
	int tm_year; // years since 1900
	int tm_wday; // day of week [0,6] (Sunday = 0)
	int tm_yday; // day of year [0,365]
	int tm_isdst; // daylight savings flag
}

To use this in a program, we can do something like this, which detects whether or not you’ve passed the expiration date of the time

#include <stdio.h>
#include <time.h>

int main() {
	time_t t = time(NULL); //gets the current epoch time
	struct tm *p = localtime(&t); //populates the time structure from the time at t
	if( p->tm_year >= 121 ){ //year 121 translates to 2021
		printf("Trial version expired!\n");
		exit(0);
	}
	return 0;
}

Shell

With the system() function in stdlib.h, we can invoke a subshell and run any Unix command we want

For example, we can do this

#include <stdio.h>
#include <stdlib.h>

int main() {
	int k;
	printf("Files in Directory are: \n");
	k = system("ls -l"); //k is equal to the response code from the command
	printf("%d is returned.\n", k); //if ls worked, k should be 0
	return k;
}

Sometimes you don’t want to actually give control back, which is where execl() comes in

This command creates a new process and replaces it with the process in execl(), effectively deleting the old process

The synopsis is as follows

#include <unistd.h>
//path is the path to the new executable
//arg0 is the same as the path or the file
//arg1 to argn are the actual arguments
//last parameter must be NULL
int execl(const char *path, const char *arg0,
..., const char *argn, char * /*NULL*/);

Here’s an example

#include <stdio.h>
#include <unistd.h>
int main() {
	printf("Files in Directory are:\n");
	execl("/bin/ls", "ls”, "-l", NULL);
	printf("This line should not be printed out!\n"); //this won't print
	return 0;
}

Every statement after execl() will not be executed

You can create a new subprocess with fork(), which allows you to run several process in parallel

The synopsis of fork() is as so

#include <unistd.h>
pid_t fork(); //notice that fork returns a process ID, much like getpid()

The only difference between a child process and a parent process is the return value of fork()

Here’s an example of fork() in action

#include <stdio.h>
#include <unistd.h>
int main(){
	int pid; /* Process identifier */
	pid = fork();
	if ( pid < 0 ) {
		printf("Cannot fork!!\n"); exit(1);
	} else if ( pid == 0 ) {
		/* Child process */ ......
	} else {
		/* Parent process */ ....
	}
}

Piping

system() uses the same standard IO as the main program, but what if you don’t want that?

Something you can do instead is use the popen() function in stdio.h

#include <stdio.h>
FILE *popen(const char *command, const char *mode); //opens the file
int pclose(FILE *fp); //closes the file

If the mode is r, popen() returns a pointer that can read the standard input of command

If the mode is w, popen() returns a pointer that can write the standard input of command

The following is an example of this in action

int main() {
	FILE *fp;
	char buffer[100]; //used for getting lines from fgets
		if ((fp = popen("ls -l", "r")) != NULL) {
			//loop through max 100 lines and print them
			//buffer is the max number of characters printed at one time
			while(fgets(buffer, 100, fp) != NULL) {
				printf("Line from ls:\n");
				printf(" %s\n", buffer);
			}
			pclose(fp);
		}
	return 0;
}