CS 485 Programming Assignment 1
A counted string library

Due: Friday, 12 February, 2016 by 11:59:59 PM.

Programming assignments in CS 485G are individual work. You may discuss approaches with other students, but may not share code or pseudocode for the assignment. If do get ideas from somebody, or use snippets of code from elsewhere, you must cite the source in your documentation.

For this assignment, you are explicitly permitted to share the disassembled code of test-full.o, should you and your classmates wish to work together to figure out exactly what the tests are doing.

See this note if you are using C++ and test case 5 fails only under valgrind (make memcheck).

A new alternative Makefile has been added for C++ users.

Contents

Background

The built-in C string library uses null-terminated strings, where a "string" is simply (a pointer to) an array of characters, with the character '\0' (NUL) marking the end of the string. While this is a convenient representation because a string parameter can be a simple primitive type (char * or const char *), it has several problems:

The most common alternative to null-terminated strings are so-called counted strings. A counted string stores both the data and its length. An example data structure for a counted string (similar to the one you will use in your program) would be:

    struct kstring
    {
        char *data;
        size_t length;
    };

Your task

Your task is to implement a library (as a set of source and header files, documentation, and a Makefile) that provides a few useful functions for working with a counted string data structure. The definition of the data structure and prototypes for all the functions may be found in kstring.h. A set of function stubs, which you may use as the basis for your implementation, is provided in kstring-stubs.c.

Provided files

Makefile-cpp is an alternative Makefile if you are writing in C++. Just rename it to "Makefile" and name your library kstring.cc or kstring.cpp

Specifications

Your library should implement the following eight functions. Prototypes for the functions, and the definition of the kstring struct (like a class, but with public data and no member functions), are in the provided kstring.h.

kstring kstralloc(size_t nbytes)

Creates and returns a new kstring object with the specified length. The .length member of the returned kstring should be equal to nbytes. The .data member should be a pointer to newly-allocated memory; the memory should be initialized by filling it with nbytes copies of the null byte ('\0').

If there is an error allocating memory, this function should call abort() or throw an uncaught exception.

Note: your program must support allocating (and subsequently using and freeing) zero-byte strings.

kstring kstrfrom(const char *cstr)

Creates and returns a new kstring object that contains a copy of the contents of a null-terminated C string, including the null terminator.

The .length member of the returned kstring should be the length of cstr, plus one for the null terminator. The .data member should be a pointer to newly-allocated memory, into which you have copied the contents of cstr, including the null byte at the end.

If there is an error allocating memory, this function should call abort() or throw an uncaught exception.

void kstrfree(kstring str)

Frees the memory to which str.data points. Should correctly handle any kstring created with either kstralloc() or kstrfrom(), including those with length 0.

Note: Since the kstring is passed by value, setting its data pointer to NULL would not be reflected in the caller. This means that it is the caller's responsibility not to re-use the kstring it passed to this function.

Note: Remember to free the memory in the way appropriate to how you allocated it: if you used new char[N], you should free with delete[]; if you used malloc() or calloc(), you should use free().

size_t kstrlen(kstring str)

Returns the length of the kstring str (that is, str.length).

char kstrget(kstring str, size_t pos)

Returns the character at index pos in str's data.

If pos is out of bounds (greater than or equal to the length of str), this function should abort the program: either by throwing an uncaught exception (C++), or by calling abort() (C or C++).

void kstrput(kstring str, size_t pos, char ch)

Replaces the character at index pos in str's data with the character ch.

If pos is out of bounds (greater than or equal to the length of str), this function should abort the program: either by throwing an uncaught exception (C++), or by calling abort() (C or C++).

void kstrextend(kstring *strp, size_t nbytes)

Modifies an existing kstring, pointed to by strp, to be at least nbytes bytes long.

If strp->length was already nbytes or longer, does nothing. That is, this function will never reduce the length of a string, only increase it.

If nbytes is longer than the current length, this function should take the following steps:

  1. Allocate a new array with the larger size.
  2. Copy data over from the old array to the new one.
  3. Fill the additional elements of the new array with null bytes ('\0').
  4. Free the old array.
  5. Make strp->data point to the new array.
  6. Set strp->length to the new size.
Note: if you are using malloc() for memory allocation, the function realloc() will take care of steps 1, 2, and 4 with a single function call.

If there is an error allocating memory, this function should call abort() or throw an uncaught exception.

Note that this function takes a pointer to a kstring rather than taking a kstring by value. That means that changes you make to the strp->length and strp->data members will be visible to the caller.

void kstrcat(kstring *destp, kstring src)

Concatenates str onto the end of *destp. First extends *destp to be long enough to hold the contents of both strings, by calling kstrextend(). Then copies the contents of src into the dest

So, for example, if we execute the following code:

      kstring a = kstrfrom("hello"); // 6 bytes including \0
      kstring b = kstrfrom("world"); // 6 bytes including \0
      kstrcat(&a, b);
Now a should have length 12, and the contents "hello\0world\0".

Note that this function takes a pointer to the destination kstring. That means that changes you make to the destp->length and destp->data members will be visible to the caller.

Note: Remember to save the original length of *destp before calling kstrextend(). Otherwise, you won't know where to copy src's data to!

Constraints and technical restrictions

  1. Your library may be written in either C or C++. You may use any version of the C or C++ standard, but you must edit the Makefile to use the correct version flag in CFLAGS:
  2. Your library should compile with the -Wall compiler flag (enable all warnings), without producing any compiler or linker warnings or errors.
  3. Your library must work with the provided kstring.h without modification.
  4. You may not use C++ data structure or smart pointer classes such as vector, uniq_ptr, string, and so on.
  5. All dynamic memory allocation should use either new, delete, and delete[] (C++); or malloc, calloc, realloc, and free (either C or C++).
  6. You may use the standard C string functions (strncpy, strlen, etc.). However, do keep in mind that they do not properly handle data containing null bytes, so it is probably a mistake to use them anywhere except kstrfrom (which takes a null terminated string as its argument).
  7. It must be possible to link and run your library with the provided test-full.o on the class virtual machines.
  8. If memory allocation fails, your program should abort. In C++, an uncaught exception (such as the one thrown by new if allocation fails) aborts the program. In C, you must detect the condition and call abort(). Simply exiting the program with exit() is not enough!

Test cases

Two test suites are provided for your program. The first, test-abbrev.c, contains thirteen functions, each implementing a test cases, as well as a main() that runs all the tests and tallies the number of successes (the function returns a positive number), failures (the function returns zero), and skipped tests (the function returns a negative number). You are welcome to modify this file, for example to add your own test cases.

The second test suite is provided as an object file, test-full.o. This test suite is similar to test-abbrev, but includes thirteen additional test cases, for a total of twenty-six. Source code is not provided for the additional tests, and you may not modify this file (though you are welcome to reverse-engineer it if you'd like).

A Makefile is provided that will build both test programs, given an object file kstring.o. If you are using C++, use Makefile-cpp instead (and rename it to "Makefile").

Grading will be based in part on the performance of your library under the test-full suite. Your program should pass all twenty-six tests. Note that, if your library causes a test case to crash, you are considered to have failed that test and all the subsequent tests. The three test cases that expect your library to abort are constructed so that aborting does not terminate the test program itself.

The twenty-six test cases in test-full are as follows. The ones marked with an asterisk (*) also appear, with source code, in test-abbrev.c.

  1. * kstralloc + kstrfree (simply checks that calling the two functions properly does not crash)
  2. * kstralloc initializes memory to 0
  3. * kstralloc sets length
  4. kstralloc sets length (big)
  5. * kstralloc sets length (0 bytes)
  6. kstralloc aborts on allocation failure (expected to abort)
  7. * kstrlen matches allocation
  8. kstrlen matches allocation (big)
  9. kstrlen matches allocation (0 bytes)
  10. * kstrfrom gives correct length
  11. * kstrfrom contains null byte
  12. kstrfrom contains correct data
  13. kstrfrom copies, not shares, data
  14. * kstrget fetches all indices correctly
  15. kstrget aborts when out of bounds (expected to abort)
  16. * kstrput sets all indices correctly
  17. kstrput aborts when out of bounds (expected to abort)
  18. * kstrextend lengthens kstring
  19. * kstrextend does not shorten kstring
  20. kstrextend supports length-0 kstring
  21. kstrextend copies old contents
  22. kstrextend extends with null bytes
  23. kstrcat of two empty kstrings ("empty" here means length zero)
  24. * kstrcat with empty kstring (second argument is empty)
  25. kstrcat onto empty kstring (first argument is empty)
  26. * kstrcat has correct data (neither argument is empty)

Memory safety

The test suites should execute without any memory leaks, crashes, out-of-bounds memory accesses, or incorrectly freed memory (freeing the same memory twice, attempting to free memory that was not dynamically allocated, or in C++ using the wrong version of delete).

Update: A note about valgrind and C++

If you are using C++ and the new operator for your project, make memcheck will report the tests that throw exceptions (test 5, and possibly tests 14 and 16) as skipped, because of a limitation in valgrind's C++ support. This will not affect your grade, as long as those tests pass when run without valgrind.

To test your program for memory errors, you can use valgrind :

    valgrind --tool=memcheck --child-silent-after-fork=yes ./test-full
A Makefile rule is provided to run this command for you: make memcheck. Near the end of the output you want to see:
    ==22657== HEAP SUMMARY:
    ==22657==     in use at exit: 0 bytes in 0 blocks
    ...
    ==22657== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    
You may find it helpful to consult the Valgrind manual, especially the Quick Start Guide at the beginning.

Note: It is acceptible for your program to leak memory when it aborts; the three relevant test cases already take that possibility into account.

Documentation

Your submission should include a README file. This should be a plain text file with at least the following sections:

What to submit

Submit a zip or tar.gz file containing a directory with the following files:

To make a .tar.gz archive of the directory program1, you can use a command like:

    tar czvf cs485-program1.tar.gz program1/
To make a .zip archive:
    zip -r cs485-program1.zip program1/

Submit your .tar.gz or .zip file at the CS Portal website, under course CS485G006 and assignment "Programming Assignment 1 - kstring library".

Grading

This assignment will be scored out of 100 points:

Hints and FAQs

  1. I would recommend implementing your functions in the order they appear in the test cases. First get the second test to pass without crashing, then the third test, then the fourth, and so on.
  2. Be sure to use valgrind to check your program for memory leaks and array bounds exceptions.
  3. My implementation of kstring.c came out to around 90 lines not including comments. Yours may be shorter or longer of course, but if it gets to be more than a few hundred lines, you may be making the task more complicated than it needs to be.