BNK File Converter




Assume that you are working for a local bank to develop a
new banking system. The current system stores bank accounts in a text file,
commonly called "comma separated values" (CSV). The first line in
this file contains the field names and remaining ones have detailed information
of the stored accounts, each line for an account. All data

values are separated by This is an example snippet of this
file.

Account, Name, Balance

333,Kate "son, 1512.34

101,Adam smith,100.23

212,Mary Lee,-10.56

It is easy to see that this is not an efficient to store and
manage data. For example, when a customer deposits a check, the current system
needs to search for his account and update the new balance. Because this is a
text file, to search for an account, the system needs to read all the file
content until it finds the corresponding account. Then, it needs to rewrite the
whole file so the line for that account can be updated with the new balance.
(There is no Other way to update a line in a text file). Thus, in the new
system, you decide to store bank accounts in a binary file because you could
read and update a specified part of a binary file at the same time without
touching others. In addition, you can also maintain an in-memory index which
can help the search for the accounts stored in the file faster. You call this
file format BNK and design it with the following structure.

Header

The first part of a BNK file is the header. It is 32-byte
long and stores the following information:

• The first 4 bytes is the char sequence BANK, which is used
as a signature for BNK files. That means, if a file

does not contain 4 characters BANK at the beginning, it is
not a BNK file.

• The next item is the total number of accounts, stored as a
4-byte integer.

• The remaining 24 bytes are space reversed for future
usages.

This header can be declared in C++ as a struct, for example:


struct BNKHeader

char signature (41; // B int numberOfAccounts;

char reserved1241;

Account Data

If N is the total number Of accounts, there will be N
account records stored consecutively after the header. Each  record stores the account number (a 4-byte
integer), the holder name (at most 20 characters including the ending  NULL), the balance (a double value of 8
bytes), and a reserved space Of 96 bytes. That means, the size Of an account
record is 128 bytes. We can declare such a record as a struct in C++ as the
following:

 

struct BNKAccount

int number;

char name12e);

double balance;

char reserved 1961;

Index Data

The last part Of a BNK file contains N consecutive index
records. Each record contains the account number and the position Of the
corresponding account record in the BNK file. For those index records, the
account numbers are sorted increasingly, so can use binary search later on it.
For example, for the account data given above in the CSV file, the record Of
account 333 is stored as position 32 (right after the 32-byte header), that Of
account 101 is stored at position 32+128—160, and that Of account 212 is stored
at position 160+128—288. The index data WII

contain the following records: (101, 160), (212, 288), and
(333, 32). With such file positions, can access an account easily. For example,
after loading the index into if we want to know the record Of account 212, will
search for account number 212 in the index and find its position is 288 in the
BNK file.

Index records can be declared using the following struct in
C++:

struct BNKIndex

int accountNumber;

long filePosition;

Attention: To
ensure the portability of your code, you should always use function sizeof to
compute the data size Of those structs. For example, to read to the memory an
array Of N BNKlndex structs frorn a binary file, you could use this statement:

file. read((char*) index, N * sizeof(BNKIndex));

Content:

An example CSV file can found here

Programming Tasks:

Task 1 (20 points):

You need to write a program to convert a CSV file to a BNK
file. This program can work following this procedure:

• Open the CSV file (as text, for input) and BNK file (as
binary, for output).

• Read quickly through the CSV file (e.g. just counting the
number Of lines without parsing each line) to determine  the total number Of accounts (N).

• Put N into a BNKHeader struct and write it to the BNK
file.

• Allocate a (dynamic) array of N BNKlndex records for the
in-memory index data.

• Reread the CSV file line by line, parse each line into a
BNKAccount record, and Mite it to the BNK file. Before writing, you also add to
the in-memory index array the information for this BNKAccount record (e.g. its
account  number and its corresponding
file position, which is provided by function tellp).

• After parsing the CSV file, you sort the index array by
account numbers, and write it to BNK file.
Powered by