Streams & Files

CPP09 W2 M1 Streams and Files

Learning Objective

Understanding streams
Understand the stream class hierarchy
Understand the concepts of stream insertion and extraction
Use streams for file input and output
Distinguish between text and binary file input and output
Write programs for random access of data files using:
- get pointer
- put pointer
- seekg()
- tellg()
- seekp()
- tellp()

Streams and Files

Introduction

In simple terms, a stream is a sequence of bytes. In input operations, the bytes are transferred from a device (a keyboard, a disk drive, a network connection, etc.) to main memory. In output operations the direction is reversed: the bytes are transferred from main memory to a device such as a display screen, a printer, a disk drive, a network connection, or a tape (a file on tape).

Streams

In C++, a stream is a source or destination for a collection of characters. Streams are of two types:

Output stream
Input stream

Output streams allow you to write or store characters, and input streams allow you to read or fetch characters. An application can use both kinds of stream.

The Stream Class Hierarchy

The stream classes are arranged in a rather complex hierarchy. Figure below shows the arrangement of the most important of these classes.

We’ve already made extensive use of some stream classes. The extraction operator >> is a member of the istream class, and the insertion operator << is a member of the ostream class. Both of these classes are derived from the ios class. The cout object, representing the standard output stream, which is usually directed to the video display, is a predefined object of the ostream_withassign class, which is derived from the ostream class. Similarly, cin is an object of the istream_withassign class, which is derived from istream.
The istream and ostream classes are derived from ios and are dedicated to input and output, respectively. The istream class contains such functions as get(), getline(), read(), and the overloaded extraction (>>) operators, while ostream contains put() and write(), and the overloaded insertion (<<) operators.
The iostream class is derived from both istream and ostream by multiple inheritance. Classes derived from it can be used with devices, such as disk files, that may be opened for both input and output at the same time. Three classes—istream_withassign, ostream_withassign, and iostream_withassign—are inherited from istream, ostream, and iostream, respectively. They add assignment operators to these classes.
The ios class is the granddaddy of all the stream classes, and contains the majority of the features you need to operate C++ streams. The three most important features are the formatting flags, the error-status flags, and the file operation mode.

Stream Insertion and Extraction

Stream Insertion
Stream classes have their own member data and function definitions. Class ostream contains functions defined for output operations. These operations are called stream insertions. The << operator is called the inserter. The statement
cout<<"Hello World\n";
translates as: the text string to the right of the inserter is to be stored in the stream object on the left.
Stream Extraction
The opposite of insertion is extraction, which is the fetching of data from an input stream. Input stream operations are defined in the istream class. The >> operator, called the extractor, can accept any fundamental data type. The most important point to note is that the extractor skips leading whitespace (' ', '\n', '\t') while extracting any of these data types. cin is the predefined object of the istream class attached to the standard input device, which is usually the keyboard.

User defined Streams

The input and output streams discussed so far dealt only with standard input and output. C++ also provides specific classes which deal with user-defined streams. User-defined streams are in the form of files. In C++, a file is linked to a stream: before a file can be opened, a stream must be obtained. These streams are more powerful than the predefined streams.

File I/O with Streams

Most programs need to save data to disk files and read it back in. Working with disk files requires another set of classes: ifstream for input, fstream for both input and output, and ofstream for output. Objects of these classes can be associated with disk files, and we can use their member functions to read and write to the files.
Referring back to the class hierarchy figure, you can see that ifstream is derived from istream, fstream is derived from iostream, and ofstream is derived from ostream. These ancestor classes are in turn derived from ios. Thus the file-oriented classes derive many of their member functions from more general classes. The file-oriented classes are also derived, by multiple inheritance, from the fstreambase class. This class contains an object of class filebuf, which is a file-oriented buffer, and its associated member functions, derived from the more general streambuf class. You don’t usually need to worry about these buffer classes.
The ifstream, ofstream, and fstream classes are declared in the <fstream> header file.

Formatted File I/O

In formatted I/O, numbers are stored on disk as a series of characters. Thus 6.02, rather than being stored as a 4-byte type float or an 8-byte type double, is stored as the characters ‘6’, ‘.’, ‘0’, and ‘2’. This can be inefficient for numbers with many digits, but it’s appropriate in many situations and easy to implement. Characters and strings are stored more or less normally.

Writing Data

The following program writes a character, an integer, a type double, and two string objects to a disk file. There is no output to the screen. Here’s the listing for FORMATO:

// formato.cpp
// writes formatted output to a file, using <<
#include <fstream> //for file I/O
#include <iostream>
#include <string>
using namespace std;
int main()
{
char ch = 'x';
int j = 77;
double d = 6.02;
string str1 = "Kafka"; //strings without
string str2 = "Proust"; // embedded spaces
ofstream outfile("fdata.txt"); //create ofstream object
if(!outfile){
 cout<<"Error Opening File";
 return 0;
}
outfile << ch //insert (write) data
        << j
        << ' ' //needs space between numbers
        << d
        << str1
        << ' ' //needs spaces between strings
        << str2;
cout << "File written\n";
return 0;
}

Here we define an object called outfile to be a member of the ofstream class. At the same time, we initialize it to the file FDATA.TXT. This initialization sets aside various resources for the file, and accesses or opens the file of that name on the disk. If the file doesn’t exist, it is created. If it does exist, it is truncated and the new data replaces the old. The outfile object acts much as cout did in previous programs, so we can use the insertion operator (<<) to output variables of any basic type to the file. This works because the insertion operator is appropriately overloaded in ostream, from which ofstream is derived.
When the program terminates, the outfile object goes out of scope. This calls its destructor, which closes the file, so we don’t need to close the file explicitly.
There are several potential formatting glitches. First, you must separate numbers (such as 77 and 6.02) with nonnumeric characters. Since numbers are stored as a sequence of characters, rather than as a fixed-length field, this is the only way the extraction operator will know, when the data is read back from the file, where one number stops and the next one begins. Second, strings must be separated with whitespace for the same reason. This implies that strings cannot contain embedded blanks. In this example we use the space character (' ') for both kinds of delimiters. Characters need no delimiters, since they have a fixed length.
You can verify that FORMATO has indeed written the data by examining the FDATA.TXT file with a text editor such as the Windows WORDPAD accessory, or with the DOS command TYPE.

Reading Data

We can read the file generated by FORMATO by using an ifstream object, initialized to the name of the file. The file is automatically opened when the object is created. We can then read from it using the extraction (>>) operator.
Here’s the listing for the FORMATI program, which reads the data back in from the FDATA.TXT file:

// formati.cpp
// reads formatted output from a file, using >>
#include <fstream> //for file I/O
#include <iostream>
#include <string>
using namespace std;

int main()
{
char ch;
int j;
double d;
string str1;
string str2;
ifstream infile("fdata.txt"); //create ifstream object
if(!infile){
  cout<<"File Cannot be opened";
  return 0;
  }
//extract (read) data from it
infile >> ch >> j >> d >> str1 >> str2;
cout << ch << endl //display the data
     << j << endl
     << d << endl
     << str1 << endl
     << str2 << endl;

return 0;
}

Here the ifstream object, which we name infile, acts much the way cin did in previous programs. Provided that we have formatted the data correctly when inserting it into the file, there’s no trouble extracting it, storing it in the appropriate variables, and displaying its contents. The program’s output looks like this:

x
77
6.02
Kafka
Proust

Of course the numbers are converted back to their binary representations for storage in the program. That is, the 77 is stored in the variable j as a type int, not as two characters, and the 6.02 is stored as a double.

Character I/O

The Character I/O functions such as get() and put() can be used when the programmer wishes to read / write white space characters also. So, with Character I/O the problem of accepting and writing character/string with white spaces is solved.
The put() and get() functions, which are members of ostream and istream, respectively, can be used to output and input single characters. Here’s a program, OCHAR, that outputs a string, one character at a time:

// ochar.cpp
// file output with characters
#include <fstream> //for file functions
#include <iostream>
#include <string>
using namespace std;
int main()
{
string str = "Time is a great teacher, but unfortunately "
             "it kills all its pupils. Berlioz";
ofstream outfile("TEST.TXT"); //create file for output

if(!outfile){
  cout<<"File Cannot be opened";
  return 0;
  }

for(int j=0; j<str.size(); j++) //for each character,
 outfile.put( str[j] ); //write it to file
cout << "File written\n";
return 0;
}

The length of the string object str is found using the size() member function, and the characters are output using put() in a for loop.
We can read this file back in and display it using the ICHAR program.

// ichar.cpp
// file input with characters
#include <fstream> //for file functions
#include <iostream>
using namespace std;
int main()
{
char ch; //character to read
ifstream infile("TEST.TXT"); //create file for input

if(!infile){
  cout<<"File Cannot be opened";
  return 0;
  }

while( infile.get(ch) ) //read a character; stop at EOF or error
{
 cout << ch; //display it
}
cout << endl;
return 0;
}

Binary Input and Output

All the input and output operations you have seen so far are text or character based. That is, all information is stored in the same format as it would be displayed on the screen.
You can write a few numbers to disk using formatted I/O, but if you’re storing a large amount of numerical data it’s more efficient to use binary I/O, in which numbers are stored as they are in the computer’s RAM memory, rather than as strings of characters. In binary I/O an int is stored in 4 bytes, whereas its text version might be "12345", requiring 5 bytes. Similarly, a float is always stored in 4 bytes, while its formatted version might be "6.02314e13", requiring 10 bytes.
Our next example shows how an array of integers is written to disk and then read back into memory, using binary format. We use two new functions: write(), a member of ostream (and so available in ofstream); and read(), a member of istream (and so available in ifstream). These functions think about data in terms of bytes (type char). They don’t care how the data is formatted; they simply transfer a buffer full of bytes to and from a disk file. The parameters to write() and read() are the address of the data buffer and its length. The address must be cast, using reinterpret_cast, to type char*, and the length is the length in bytes (characters), not the number of data items in the buffer. Here’s the listing for BINIO:

// binio.cpp
// binary input and output with integers
#include <fstream> //for file streams
#include <iostream>
using namespace std;
const int MAX = 100; //size of buffer
int buff[MAX]; //buffer for integers
int main()
{
for(int j=0; j<MAX; j++) //fill buffer with data
buff[j] = j; //(0, 1, 2, ...)
             //create output stream
ofstream os("edata.dat", ios::binary);
if(!os){
        cerr << "File cannot be opened\n";
        return 1; //cannot continue without the file
}
//write to it
os.write( reinterpret_cast<char*>(buff), MAX*sizeof(int) );
os.close(); //must close it
for(int j=0; j<MAX; j++) //erase buffer
buff[j] = 0;
//create input stream
ifstream is("edata.dat", ios::binary);
//read from it
is.read( reinterpret_cast<char*>(buff), MAX*sizeof(int) );
for(int j=0; j<MAX; j++) //check data
if( buff[j] != j )
{
  cerr << "Data is incorrect\n";
  return 1;
  }
  cout << "Data is correct\n";
  return 0;
}

You must pass the ios::binary argument as the second parameter when creating (or opening) a stream that will be used with write() and read() on binary data. This is because the default, text mode, takes some liberties with the data. For example, on DOS and Windows systems the '\n' character is expanded in text mode into two bytes, a carriage return and a linefeed, before being stored to disk. This makes a formatted text file more readable by DOS-based utilities such as TYPE, but it causes confusion when it is applied to binary data, since every byte that happens to have the ASCII value 10 is translated into 2 bytes. The ios::binary argument is an example of a mode bit. We’ll say more about this when we discuss the open() function later in this chapter.

The reinterpret_cast Operator

In the BINIO program (and many others to follow) we use the reinterpret_cast operator to make it possible for a buffer of type int to look to the read() and write() functions like a buffer of type char.
is.read( reinterpret_cast<char*>(buff), MAX*sizeof(int) );
The reinterpret_cast operator is how you tell the compiler, “I know you won’t like this, but I want to do it anyway.” It changes the type of a section of memory without caring whether it makes sense, so it’s up to you to use it judiciously. You can also use reinterpret_cast to change pointer values into integers and vice versa. This is a dangerous practice, but one which is sometimes necessary.

Closing Files

So far in our example programs there has been no need to close streams explicitly because they are closed automatically when they go out of scope; this invokes their destructors and closes the associated file. However, in BINIO, since both the output stream os and the input stream is are associated with the same file, EDATA.DAT, the first stream must be closed before the second is opened. We use the close() member function for this. You may want to use an explicit close() every time you close a file, without relying on the stream’s destructor. This is potentially more reliable, and certainly makes the listing more readable.

Object I/O

Since C++ is an object-oriented language, it’s reasonable to wonder how objects can be written to and read from disk. The next examples show the process.

Writing an Object to Disk

When writing an object, we generally want to use binary mode. This writes the same bit configuration to disk that was stored in memory, and ensures that numerical data contained in objects is handled properly. Here’s the listing for OPERS, which asks the user for information about an object of class person, and then writes this object to the disk file PERSON.DAT:

// opers.cpp
// saves person object to disk
#include <fstream> //for file streams
#include <iostream>
using namespace std;
////////////////////////////////////////////////////////////////
class person //class of persons
{
protected:
          char name[80]; //person’s name
          short age; //person’s age
public:
       void getData() //get person’s data
       {
        cout << "Enter name: "; cin >> name;
        cout << "Enter age: "; cin >> age;
        }
};

////////////////////////////////////////////////////////////////
int main()
{
person pers; //create a person
pers.getData(); //get data for person
//create ofstream object
ofstream outfile("PERSON.DAT", ios::binary);
if(!outfile){
 cerr << "File cannot be Opened";
 return 0;
}
//write to it
outfile.write(reinterpret_cast<char*>(&pers), sizeof(pers));
return 0;
}

The getData() member function of person is called to prompt the user for information, which it places in the pers object. Here’s some sample interaction:

Enter name: Coleridge
Enter age: 62

The contents of the pers object are then written to disk, using the write() function. We use the sizeof operator to find the length of the pers object.

Reading an Object from Disk

Reading an object back from the PERSON.DAT file requires the read() member function. Here’s the listing for IPERS:

// ipers.cpp
// reads person object from disk
#include <fstream> //for file streams
#include <iostream>
using namespace std;
////////////////////////////////////////////////////////////////
class person //class of persons
{
protected:
          char name[80]; //person’s name
          short age; //person’s age
public:
       void showData() //display person’s data
       {
        cout << "Name: " << name << endl;
        cout << "Age: " << age << endl;
        }
};

////////////////////////////////////////////////////////////////
int main()
{
person pers; //create person variable
ifstream infile("PERSON.DAT", ios::binary); //create stream
if(!infile){
 cerr << "File cannot be opened";
 return 0;
 }
//read stream
infile.read( reinterpret_cast<char*>(&pers), sizeof(pers) );
pers.showData(); //display person
return 0;
}

The output from IPERS reflects whatever data the OPERS program placed in the PERSON.DAT file:

Name: Coleridge
Age: 62

Compatible Data Structures

To work correctly, programs that read and write objects to files, as do OPERS and IPERS, must be talking about the same class of objects. Objects of class person in these programs are exactly 82 bytes long: The first 80 are occupied by a string representing the person’s name, and the last 2 contain an integer of type short, representing the person’s age. If two programs thought the name field was a different length, for example, neither could accurately read a file generated by the other.
Notice, however, that while the person classes in OPERS and IPERS have the same data, they may have different member functions. The first includes the single function getData(), while the second has only showData(). It doesn’t matter what member functions you use, since they are not written to disk along with the object’s data. The data must have the same format, but inconsistencies in the member functions have no effect. However, this is true only in simple classes that don’t use virtual functions.
If you read and write objects of classes that use virtual functions (which is common with derived classes), you must be more careful. Such objects typically include a hidden value, placed by the compiler alongside the object’s data in memory, that identifies the object’s class when virtual functions are called. When you write an object to disk with write(), this value is written along with the object’s other data, and it can change if you change the class’s member functions. If you write an object of one class to a file, and then read it back into an object of a class that has identical data but a different member function, you’ll encounter big trouble if you try to use virtual functions on the object. The moral: Make sure a class used to read an object is identical to the class used to write it.
You should also not attempt disk I/O with objects that have pointer data members. As you might expect, the pointer values won’t be correct when the object is read back into a different place in memory.

I/O with Multiple Objects

The OPERS and IPERS programs wrote and read only one object at a time. Our next example opens a file and writes as many objects as the user wants. Then it reads and displays the entire contents of the file. Here’s the listing for DISKFUN:

// diskfun.cpp
// reads and writes several objects to disk
#include <fstream> //for file streams
#include <iostream>
using namespace std;
////////////////////////////////////////////////////////////////
class person //class of persons
{
protected:
          char name[80]; //person’s name
          int age; //person’s age
public:
       void getData() //get person’s data
       {
            cout << "\n Enter name: "; cin >> name;
            cout << " Enter age: "; cin >> age;
        }
        void showData() //display person’s data
        {
         cout << "\n Name: " << name;
         cout << "\n Age: " << age;
        }
};

////////////////////////////////////////////////////////////////
int main()
{
char ch;
person pers; //create person object
fstream file; //create input/output file
//open for append
file.open("GROUP.DAT", ios::app | ios::out |
                       ios::in | ios::binary );
do //data from user to file
{
cout << "\nEnter person's data:";
pers.getData(); //get one person’s data
//write to file
file.write( reinterpret_cast<char*>(&pers), sizeof(pers) );
cout << "Enter another person (y/n)? ";
cin >> ch;
}
while(ch=='y'); //quit on 'n'

file.seekg(0); //reset to start of file

//read first person
file.read( reinterpret_cast<char*>(&pers), sizeof(pers) );
while( !file.eof() ) //quit on EOF
{
cout << "\nPerson:"; //display person
pers.showData();
//read another person
file.read( reinterpret_cast<char*>(&pers), sizeof(pers) );
}

cout << endl;
return 0;
}

Here’s some sample interaction with DISKFUN. The output shown assumes that the program has been run before and that two person objects have already been written to the file.

Enter person’s data:
 Enter name: McKinley
 Enter age: 22
Enter another person (y/n)? n

Person:
 Name: Whitney
 Age: 20
Person:
 Name: Rainier
 Age: 21
Person:
 Name: McKinley
 Age: 22

Here one additional object is added to the file, and the entire contents, consisting of three objects, are then displayed.

The fstream Class

So far in this chapter the file objects we have created have been for either input or output. In DISKFUN we want to create a file that can be used for both input and output. This requires an object of the fstream class, which is derived from iostream, which is derived from both istream and ostream so it can handle both input and output.

The open() Function

In previous examples we created a file object and initialized it in the same statement:
ofstream outfile(“TEST.TXT”);
In DISKFUN we use a different approach: We create the file in one statement and open it in another, using the open() function, which is a member of the fstream class. This is a useful approach in situations where the open may fail. You can create a stream object once, and then try repeatedly to open it, without the overhead of creating a new stream object each time.

The Mode Bits

We’ve seen the mode bit ios::binary before. In the open() function we include several new mode bits. The mode bits, defined in ios, specify various aspects of how a stream object will be opened.
The mode bits for the open() function are:

Mode bit    Result
in          Open for reading (default for ifstream)
out         Open for writing (default for ofstream)
ate         Start reading or writing at end of file (AT End)
app         Start writing at end of file (APPend)
trunc       Truncate file to zero length if it exists (TRUNCate)
nocreate    Error when opening if file does not already exist
noreplace   Error when opening for output if file already exists, unless ate or app is set
binary      Open file in binary (not text) mode

Note that nocreate and noreplace come from older iostream libraries. nocreate is not part of standard C++, and noreplace was standardized only in C++23, so check whether your compiler supports them.

In DISKFUN we use ios::app because we want to preserve whatever was in the file before. That is, we can write to the file, terminate the program, and start up the program again, and whatever we write to the file will be added following the existing contents. We use ios::in and ios::out because we want to perform both input and output on the file, and we use ios::binary because we’re writing binary objects. The vertical bars between the flags cause the bits representing these flags to be combined with a bitwise OR into a single value, so that several flags can apply simultaneously.
We write one person object at a time to the file, using the write() function. When we’ve finished writing, we want to read the entire file. Before doing this we must reset the file’s current position. We do this with the seekg() function, which we’ll examine in the next section. It ensures we’ll start reading at the beginning of the file. Then, in a while loop, we repeatedly read a person object from the file and display it on the screen. This continues until we’ve read all the person objects—a state that we discover using the eof() function, which returns the state of the ios::eofbit.

File Pointers

Each file object has associated with it two integer values called the get pointer and the put pointer. These are also called the current get position and the current put position, or—if it’s clear which one is meant—simply the current position. These values specify the byte number in the file where writing or reading will take place. (The term pointer in this context should not be confused with normal C++ pointers used as address variables.)
Often you want to start reading an existing file at the beginning and continue until the end. When writing, you may want to start at the beginning, deleting any existing contents, or at the end, in which case you can open the file with the ios::app mode specifier. These are the default actions, so no manipulation of the file pointers is necessary. However, there are times when you must take control of the file pointers yourself so that you can read from and write to an arbitrary location in the file. The seekg() and tellg() functions allow you to set and examine the get pointer, and the seekp() and tellp() functions perform these same actions on the put pointer.

Resources

PPT:

Chapter 14 - File Processing, from C++ How to Program by Deitel and Deitel
- 14.1 Introduction
- 14.2 The Data Hierarchy
- 14.3 Files and Streams
- 14.4 Creating a Sequential-Access File
- 14.5 Reading Data from a Sequential-Access File
- 14.6 Updating Sequential-Access Files
- 14.7 Random-Access Files
- 14.8 Creating a Random-Access File
- 14.9 Writing Data Randomly to a Random-Access File
- 14.10 Reading Data Sequentially from a Random-Access File
- 14.11 Example: A Transaction-Processing Program
- 14.12 Input/Output of Objects

Web Resources:

http://www.arachnoid.com/cpptutor/student3.html ---> Output Formatting
http://www.cplusplus.com/doc/tutorial/files/ ---> Input/Output with files

I/O Stream Library:

http://www.cplusplus.com/reference/iostream/ios_base/ ---> ios_base
http://www.cplusplus.com/reference/iostream/ios/ ---> ios
http://www.cplusplus.com/reference/iostream/ ---> IO Stream Library
http://www.cplusplus.com/reference/iostream/fstream/ ---> fstream

Steps to solve the problem sets

Step 1: Read about Streams and Files given
Step 2: Try the sample programs given (you can copy and paste the programs and try them yourself).
Step 3: Go through the PPT on file processing (you can copy and paste the programs and try them yourself).
Step 4: Go through the additional resources given.
Step 5: Attack the problem sets in the given order (Problem Set A and then Problem Set B ...).
Step 6: If you are not clear about the problem sets, contact your mentor for clarification.

Spend at least 2 - 3 hours on Steps 1 to 6.
Try the programs given in the PPT for Random access files.

Problem Sets

Problem Set A

1. Write a C++ program that uses a binary file, cricket.binary. The binary file holds the statistics of Indian cricket players.
The binary file is divided into 2 parts.

1st part(Header part) is the header, having information of number of records in the file.
2nd part(Data part) contains the data of the players.

A record in binary file is the information of one player.
The layout of the file is

Header Part is of 4 Bytes( Size of Integer) -- Number of records(N).
Data Part is of N * Size of Player Bytes -- N Records.

The structure of the player is

//Player Class
class Player{
private:
    int id;       //#ID -- Unique ID given to each player and starts from 1.
    char name[50];//#Name -- Name of the Player
    int career;   //#Career -- Career Started year
    int matches;  //#Matches -- Number of Matches Played
    int runs;     //#Runs -- Total runs scored
    int highest;  //#Highest -- Highest score of the player
    int r50;      //#50's -- Number of half centuries of the player.
    int r100;     //#100's -- Number of Centuries of the player.
    float average;//#Average -- Average score of the player. (Total Runs/Number of Matches played)
public:
//Declare the Implementation specific functions here and define them.
};

The program should display a menu with the following five choices:

1. Print all player details
  1. Display the details of all the players in a tabular format.
2. Print individual player details
  1. Display the individual player's details based on the ID entered by the user.
3. Update a player record for the match
  1. The user enters the ID of the player to be updated.
  2. If the player is not found, print "Player not found" and redisplay the menu.
  3. Display the player details.
  4. Ask for the runs scored in the most recent match played by the player.
  5. Update the record of the entered player in the same file with the following details:
    1. matches: increment by one
    2. runs: runs + runs scored
    3. highest: compare the highest score with the runs scored and keep whichever is higher
    4. r50: if the runs scored is a half century, increment r50 by 1
    5. r100: if the runs scored is a century, increment r100 by 1
    6. average: recalculate the average including the runs scored
4. Add a new player
  1. Generate a new ID for the player (the next ID after the highest ID among all the records).
  2. Ask the user for all the details of the player except the ID.
  3. Update the number of players in the file.
  4. Append the new player's details at the end of the file.
5. Quit

Use a loop for the Menu choice and perform the operation selected by the user. The program should quit only when the user selects quit option in the menu.
The program should open the binary file in Random Access Mode for reading and writing. Use seekg() to move the cursor position in the file.
You are not supposed to use Array of Players or Dynamic memory allocation.
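You can use the ID to seek to the location of a record in the file. Since IDs start from 1 and the records follow the 4-byte header, the record is at 4 bytes + ((ID - 1) * size of Player) from the start of the file.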
The data file should be passed as command line argument.
A sample program cricket_read.cpp is given.

Compile the cricket_read.cpp
Execute the program as $> cricket_read.exe in the directory where you saved the binary file.
1. Prints all the player details
