.ZIP

# CSCI 620 Program 5: Huffman Compression, Step 2 Solution

1 The Goals and the Purpose.

1. To use an array and a heap.

2. To implement the rst two phases of a Human Encoding application.

A Human code can be used to compress a le. It replaces xed-size, one-byte characters by variable- length codes (strings of bits). The codes for the most frequently used characters will be shorter than 8 bits, those for unusual characters may be longer. A new code is dened for each le.

This helps to keep the codes short, since no codes are generated for characters that are not used in the le. The input le is read twice, once to generate the code, then again to encode the le. This assignment and the next two form one large project to generate and use a Huffman code. It is essential

to debug this part before going on to the rest of the project.

2 Modules to Create

Class Tally You have a complete and working Tally class from Program 2. Include your Tally class as part of this project.

Class Node and Tree

Dene a tree Node with four data members: a char, an int, and two Node*s. Also typedef Tree to be a Node*.

Write Node constructor that takes two parameters: a char (the input character), and an int (its frequency). Use the parameters and two nullptrs to initialize the Node object.

Write a second Node constructor that takes two dierent parameters: two Node*s (the left and right sons of the new node). Use the parameters to initialize the pointer elds of the Node object.

Set the char eld to any arbitrary visible char value: you will never use it but you need to be able to see it on a printout. Set the frequency eld to the sum of the frequencies of the left and right sons.

You may also need a default constructor for Node. If so, use =default.

For this assignment, dene a null default destructor. A real destructor will be needed in the last phase of this program, but not at this stage. The output of this phase will be a binary tree of Nodes that contains every Node that you have allocated.

Write a print function that displays all four elds. The pointers will be printed in hex if you use

<<.

Write accessor functions if and only if you need them.

A Heap Class Adapt the function denitions from the class Heap example to implement a MIN-heap that stores Trees instead of numbers.

Your Heap should have one data member: a vector<Node*.

Write a Heap constructor with one parameter, an array of type Tally. See details below.

Implement the downHeap, buildHeap, and printHeap functions rst, and debug that much.

After your heap works, implement push() and pop() to put things into the heap and take them out.

Also write a function reduceHeap() that calls push() and pop() to reduce the heap to one element. See details below.

The Heap Constructor

Start by pushing a dummy Node* into the vector to occupy subscript 0. We want the real data

to start at subscript 1.

Then process the data in the tally vector:

{ Take each non-zero Tally in the array.

{ Use the character and the counter to create a new Node*.

{ Push the Node* into the vector.

Call the buildHeap function to arrange the data in heap order, according to the frequency of the

letters, with the least frequent letter in slot 1 of the vector

reduceHeap() The goal of this function is to combine all the Nodes in the heap into one Human

tree. Repeat this until there is only one node left:

Call pop() to remove the node at the root, replace it by the last node in the heap, and perform

a downHeap() operation to re-establish heap order. This node will be the left son of a new node.

Call pop() again to remove the node at the root, replace it by the last node in the heap, and

perform a downHeap() operation to re-establish heap order. This node will be the right son of a

new node.

Create a new Node with these to Node*s.

Push the new node onto the end of the heap and perform an upHeap() operation to re-establish

heap order.

The main Function

1. Create a Tally object and use it, as you did in P2, to tally the characters in an input le. Main

should own the Tally array.

2. Construct a Heap object using the lled Tally array. The constructor will call buildHeap().

When the heap constructor nishes, print the heap array and make sure the heap is correct. Then go

on to the next phase:

1. Call reduceHeap() to unify the nodes in the Heap.

2. Print the resulting tree that is in subscript 1 of the heap. To do this, you will need to do a

recursive tree walk. In this recursion, print the contents of a node then indent four columns and

recursively print the let son, then the right son. Return from the recursion when the tree you are

trying to print is a nullptr; print \|||-" instead. (Remember: every leaf node has two nullptr

sons.)

3 Future Work

The last two phases of the Human project are:

P6: Do a recursive treewalk to generate a code. Write the code to a binary le.

P7: Reopen the original text le. Encode each letter in it. Write the encodings to a binary le.

In a real Human project, the code and the encoded message are written to the same le. I am asking

you to keep them separate for reasons related to debugging and grading.

1. To use an array and a heap.

2. To implement the rst two phases of a Human Encoding application.

A Human code can be used to compress a le. It replaces xed-size, one-byte characters by variable- length codes (strings of bits). The codes for the most frequently used characters will be shorter than 8 bits, those for unusual characters may be longer. A new code is dened for each le.

This helps to keep the codes short, since no codes are generated for characters that are not used in the le. The input le is read twice, once to generate the code, then again to encode the le. This assignment and the next two form one large project to generate and use a Huffman code. It is essential

to debug this part before going on to the rest of the project.

2 Modules to Create

Class Tally You have a complete and working Tally class from Program 2. Include your Tally class as part of this project.

Class Node and Tree

Dene a tree Node with four data members: a char, an int, and two Node*s. Also typedef Tree to be a Node*.

Write Node constructor that takes two parameters: a char (the input character), and an int (its frequency). Use the parameters and two nullptrs to initialize the Node object.

Write a second Node constructor that takes two dierent parameters: two Node*s (the left and right sons of the new node). Use the parameters to initialize the pointer elds of the Node object.

Set the char eld to any arbitrary visible char value: you will never use it but you need to be able to see it on a printout. Set the frequency eld to the sum of the frequencies of the left and right sons.

You may also need a default constructor for Node. If so, use =default.

For this assignment, dene a null default destructor. A real destructor will be needed in the last phase of this program, but not at this stage. The output of this phase will be a binary tree of Nodes that contains every Node that you have allocated.

Write a print function that displays all four elds. The pointers will be printed in hex if you use

<<.

Write accessor functions if and only if you need them.

A Heap Class Adapt the function denitions from the class Heap example to implement a MIN-heap that stores Trees instead of numbers.

Your Heap should have one data member: a vector<Node*.

Write a Heap constructor with one parameter, an array of type Tally. See details below.

Implement the downHeap, buildHeap, and printHeap functions rst, and debug that much.

After your heap works, implement push() and pop() to put things into the heap and take them out.

Also write a function reduceHeap() that calls push() and pop() to reduce the heap to one element. See details below.

The Heap Constructor

Start by pushing a dummy Node* into the vector to occupy subscript 0. We want the real data

to start at subscript 1.

Then process the data in the tally vector:

{ Take each non-zero Tally in the array.

{ Use the character and the counter to create a new Node*.

{ Push the Node* into the vector.

Call the buildHeap function to arrange the data in heap order, according to the frequency of the

letters, with the least frequent letter in slot 1 of the vector

reduceHeap() The goal of this function is to combine all the Nodes in the heap into one Human

tree. Repeat this until there is only one node left:

Call pop() to remove the node at the root, replace it by the last node in the heap, and perform

a downHeap() operation to re-establish heap order. This node will be the left son of a new node.

Call pop() again to remove the node at the root, replace it by the last node in the heap, and

perform a downHeap() operation to re-establish heap order. This node will be the right son of a

new node.

Create a new Node with these to Node*s.

Push the new node onto the end of the heap and perform an upHeap() operation to re-establish

heap order.

The main Function

1. Create a Tally object and use it, as you did in P2, to tally the characters in an input le. Main

should own the Tally array.

2. Construct a Heap object using the lled Tally array. The constructor will call buildHeap().

When the heap constructor nishes, print the heap array and make sure the heap is correct. Then go

on to the next phase:

1. Call reduceHeap() to unify the nodes in the Heap.

2. Print the resulting tree that is in subscript 1 of the heap. To do this, you will need to do a

recursive tree walk. In this recursion, print the contents of a node then indent four columns and

recursively print the let son, then the right son. Return from the recursion when the tree you are

trying to print is a nullptr; print \|||-" instead. (Remember: every leaf node has two nullptr

sons.)

3 Future Work

The last two phases of the Human project are:

P6: Do a recursive treewalk to generate a code. Write the code to a binary le.

P7: Reopen the original text le. Encode each letter in it. Write the encodings to a binary le.

In a real Human project, the code and the encoded message are written to the same le. I am asking

you to keep them separate for reasons related to debugging and grading.

You'll get a 474.1KB .ZIP file.