University of Leeds

Reading CSV files

Introduction

As well as writing programs that perform calculations and write the output data to a .csv file, you will often need to read the data contained in a .csv file. Typically the data has been created by a separate program or downloaded from the Internet, and you then have to write a program to process it.

Data

In this example, we will write a program to process some fictitious student grades. The data (in a .csv file) consists of 80 rows, each representing one student. Each row has a column for the surname, a column for the student ID number, and four columns holding the grades (on a 0 to 100 scale) for four separate tests. There is no header row. The data was created using Mockaroo.

MOCK_DATA.csv

Cohalan,771298484,67,51,74,51
Vallack,615278957,45,57,79,49
Bungey,319585891,42,37,57,86
Balden,019478797,80,90,79,26
Probet,701116170,58,35,76,98
Baigent,998970542,86,61,63,65
Quarton,971939596,58,40,100,91
Witt,054978458,59,99,72,46
Burnand,283808017,59,92,51,11
Rawsthorn,814535776,63,50,62,100
Vasyatkin,925395193,84,73,85,57
Crossby,427723213,33,56,100,100
Hanham,597141409,74,30,68,60
Scorthorne,720102259,58,33,100,67
Evans,473955894,93,96,92,89
O'Bee,058336708,44,11,66,44
Lewsam,623498290,60,58,72,74
Skim,412729216,70,51,88,84
Feehely,073095508,49,50,82,100
Brandassi,027384415,77,59,93,51
Umbert,058510748,73,100,78,58
Lygo,140824408,55,75,78,85
Schulkins,200434642,54,52,65,37
Scanlon,962865222,39,49,55,85
Allridge,115735549,75,52,70,27
Batchan,333316640,71,39,94,54
Purveys,583193483,46,74,59,66
Shemilt,374260199,58,88,85,32
Conant,640666260,39,51,89,64
Botright,292406035,77,27,82,95
Schulkins,152835084,60,22,70,45
Domke,624409539,61,71,48,51
Elies,634033768,77,100,56,100
Larham,145602942,63,32,81,29
Vacher,498955516,72,14,73,21
Di Filippo,711731193,65,65,89,64
Blench,496815213,40,24,73,79
Goodread,010007942,50,42,66,56
Minchin,988677183,94,14,69,8
Orrock,509070758,69,54,85,69
Wyles,154625310,49,25,86,48
Tookill,193475472,23,62,53,28
Wimmer,853474046,45,37,83,97
Spira,979696276,14,54,93,61
Felix,875835071,63,2,71,81
Reckhouse,148144410,71,88,77,59
Le Breton De La Vieuville,558486912,47,41,81,57
Horrigan,812230116,45,52,80,64
Le Port,014113685,67,34,91,83
Spitell,043792447,40,88,83,79
Dils,538529529,84,88,55,48
Goodge,602790717,100,50,67,39
Vern,235405979,95,51,62,40
Death,436211337,62,2,93,20
Giacomello,790772359,57,18,80,68
McAvey,843110330,61,17,53,74
Sevitt,711438118,61,40,69,76
Utridge,013821354,69,51,74,56
Calleja,304154084,82,69,72,74
Barenskie,545937322,57,72,70,75
Beckingham,380600744,61,82,61,33
Coggen,520808387,63,0,100,61
Christene,005768811,74,38,100,53
Hodgen,248294860,81,18,73,87
Bisco,203249260,43,45,54,68
Woodruff,721092637,65,68,76,90
Binham,298643920,73,60,76,49
Ivasechko,199652751,31,16,78,92
Gecke,019609957,57,29,53,34
Izakof,347786735,57,14,48,9
Pringour,210495355,71,11,100,74
Burgum,066120460,32,67,93,76
Thomel,827027608,73,54,52,42
Doldon,700489818,69,73,71,58
Benyon,026302393,76,86,89,80
Norvel,010489034,45,52,68,65
Mullard,111450702,85,56,92,63
Phelp,395748563,62,60,57,84
Androli,746030914,55,64,100,71
Vinall,758604004,86,60,60,27

Example

The code to read in the .csv file is shown below.

main.cpp

#include <cstdlib>  // exit()
#include <fstream>
#include <iostream>
#include <string>

// struct to hold student data
struct Student {
  std::string surname;
  int sid;         // student ID number
  int test1;       // test 1 grade (0 to 100)
  int test2;       // test 2 grade (0 to 100)
  int test3;       // test 3 grade (0 to 100)
  int test4;       // test 4 grade (0 to 100)
  double average;  // average of all tests (equal weight)
};

// function prototypes
int count_lines();
void read_into_array(Student array[], int n);
void print_array(const Student array[], int n);

int main() {
  // count the number of lines in the CSV
  int n = count_lines();
  std::cout << n << " lines read." << std::endl;
  // then create dynamic array of correct size
  Student *students = new Student[n];
  // read the CSV file into the array
  read_into_array(students, n);
  // and then print for debug purposes
  print_array(students, n);
  // release the dynamic array now that we are finished with it
  delete[] students;
}

int count_lines() {
  // create an input file stream
  std::ifstream input;
  // use it to open a file named 'MOCK_DATA.csv'
  input.open("MOCK_DATA.csv");
  // check if the file is not open
  if (!input.is_open()) {
    // print error message and quit if a problem occurred
    std::cerr << "Error! No input file found!\n";
    exit(1);
  }
  int n = 0;
  std::string dummy;
  // keep reading lines in file until no lines left to read
  // read into dummy string and increment count
  while (getline(input, dummy)) {
    n++;
  }
  return n;
}

void read_into_array(Student array[], int n) {
  // create an input file stream
  std::ifstream input;
  // use it to open a file named 'MOCK_DATA.csv'
  input.open("MOCK_DATA.csv");
  // check if the file is not open
  if (!input.is_open()) {
    // print error message and quit if a problem occurred
    std::cerr << "Error! No input file found!\n";
    exit(1);
  }
  std::string dummy;
  // loop through each line in file
  for (int i = 0; i < n; i++) {
    getline(input, dummy, ',');  // read until first comma
    array[i].surname = dummy;    // write to array
    getline(input, dummy, ',');  // read until next comma
    array[i].sid = std::stoi(dummy);
    getline(input, dummy, ',');  // read until next comma
    array[i].test1 = std::stoi(dummy);
    getline(input, dummy, ',');  // read until next comma
    array[i].test2 = std::stoi(dummy);
    getline(input, dummy, ',');  // read until next comma
    array[i].test3 = std::stoi(dummy);
    // for the last element, read until end of line (the default)
    getline(input, dummy);
    array[i].test4 = std::stoi(dummy);
  }
}

void print_array(const Student array[], int n) {
  // just loop through array and print to terminal
  for (int i = 0; i < n; i++) {
    std::cout << array[i].surname << " | " << array[i].sid << " | "
              << array[i].test1 << " | " << array[i].test2 << " | "
              << array[i].test3 << " | " << array[i].test4 << std::endl;
  }
}

A struct has been defined to store the data for each student. A more object-oriented approach would be to create a class instead. The main() function is relatively simple: the number of lines in the data file is counted and a dynamic array of the struct type is created to hold the data. This array is then passed into a function that reads the data from the .csv file into it. Finally, the array is passed into a print function so that it can be printed to the command line.
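As an alternative to counting the lines and then allocating a dynamic array, a std::vector can grow as rows are read and releases its memory automatically. The following is a minimal sketch of that idea, not the approach used in main.cpp. StudentRow, next_int() and read_rows() are hypothetical names introduced here, and the input is assumed to be well formed (six comma-separated fields per line, no quoted fields).

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical struct mirroring the fields read from each CSV row.
struct StudentRow {
  std::string surname;
  int sid;
  int test1;
  int test2;
  int test3;
  int test4;
};

// Read the next comma-separated field and convert it to an int.
// std::stoi() throws if the field does not start with a number.
int next_int(std::istringstream& fields) {
  std::string field;
  std::getline(fields, field, ',');
  return std::stoi(field);
}

// Read every non-empty line of 'input' into a vector, one row per student.
// The vector grows as needed, so no separate line count is required.
std::vector<StudentRow> read_rows(std::istream& input) {
  std::vector<StudentRow> rows;
  std::string line;
  while (std::getline(input, line)) {
    if (line.empty()) continue;  // skip blank lines
    std::istringstream fields(line);
    StudentRow row;
    std::getline(fields, row.surname, ',');
    row.sid = next_int(fields);
    row.test1 = next_int(fields);
    row.test2 = next_int(fields);
    row.test3 = next_int(fields);
    row.test4 = next_int(fields);
    rows.push_back(row);
  }
  return rows;
}
```

Passing a std::ifstream opened on MOCK_DATA.csv to read_rows() should return one element per student, and rows.size() then gives the number of students without a second pass over the file.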

To count the number of lines, an input file stream is created and the .csv file opened. The getline() function is then used inside a while loop. getline() reads a line from the input stream into a string; by default it reads until it finds a newline (\n) character. At this point we are not interested in the content of the line, so it is read into a dummy string and a counter is incremented on each iteration. When the end of the file is reached, the counter equals the number of lines in the file.
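One thing to watch with this counting approach is a blank line at the end of the file: getline() counts it like any other line, and a later call to std::stoi() on that row would throw because there is no number to convert. The sketch below shows one possible way to guard against this by counting only non-empty lines. count_data_lines() is a hypothetical name, and it takes any std::istream so that it can be tried without a file.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Count the lines in 'input', ignoring completely empty ones.
int count_data_lines(std::istream& input) {
  int n = 0;
  std::string line;
  // getline() returns the stream, which tests false once nothing is left
  while (std::getline(input, line)) {
    if (!line.empty()) {
      n++;
    }
  }
  return n;
}
```

For example, given a std::istringstream holding two rows followed by an empty line, the function returns 2 rather than 3. Note that the reading loop would need the same check for the two to stay in step.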

Once the dynamic array of the required size has been created, the data in the .csv file is read into the array. The code loops through each line in the file. By default, getline() reads until a newline character is found. However, for CSV data, we wish to read until we find a comma.

getline(input, dummy, ',');

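The values are read into a dummy string and converted to integers where required using std::stoi(). Note that for the last element we want to read to the end of the line, i.e. up to a newline character rather than a comma. Because the student ID is stored as an int, any leading zeros are lost, so 019478797 is printed as 19478797.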
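The same delimiter technique can be tried on a single line of text by wrapping it in a std::istringstream. The sketch below splits one row into its fields and keeps them as strings. split_fields() is a hypothetical helper, and it assumes simple CSV with no quoted fields or embedded commas, as in MOCK_DATA.csv.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one CSV line into fields using getline() with a ',' delimiter.
// Assumes no quoted fields and no commas inside a field.
std::vector<std::string> split_fields(const std::string& line) {
  std::vector<std::string> fields;
  std::istringstream stream(line);
  std::string field;
  // the final field has no trailing comma; getline() stops at end of input
  while (std::getline(stream, field, ',')) {
    fields.push_back(field);
  }
  return fields;
}
```

Splitting the row for Balden gives six fields, and the second one is still the text "019478797". Converting it with std::stoi() gives 19478797, which is why that ID appears without its leading zero in the output. If the leading zeros matter, one option is to store the ID as a std::string instead of an int.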

Now that the data is in an array, it can be iterated over for analysis and processing. It can also be printed to the terminal easily.
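As one illustration of such processing, the Student struct declares an average field that the example never fills in. The sketch below shows how it might be computed once the array has been read, and how the array could then be searched for the highest average. compute_averages() and index_of_highest_average() are hypothetical names; the struct is repeated so that the block stands alone.

```cpp
#include <string>

// Same layout as the Student struct in main.cpp.
struct Student {
  std::string surname;
  int sid;
  int test1;
  int test2;
  int test3;
  int test4;
  double average;
};

// Fill in the average field for every student (equal weight per test).
void compute_averages(Student array[], int n) {
  for (int i = 0; i < n; i++) {
    int total =
        array[i].test1 + array[i].test2 + array[i].test3 + array[i].test4;
    array[i].average = total / 4.0;  // 4.0 forces floating-point division
  }
}

// Return the index of the student with the highest average,
// or -1 if the array is empty. Ties keep the earliest student.
int index_of_highest_average(const Student array[], int n) {
  int best = -1;
  for (int i = 0; i < n; i++) {
    if (best == -1 || array[i].average > array[best].average) {
      best = i;
    }
  }
  return best;
}
```

In main(), these could be called after read_into_array(), for example compute_averages(students, n) followed by index_of_highest_average(students, n).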

Output

If the above example is run, the following will appear in the terminal.

80 lines read.
Cohalan | 771298484 | 67 | 51 | 74 | 51
Vallack | 615278957 | 45 | 57 | 79 | 49
Bungey | 319585891 | 42 | 37 | 57 | 86
Balden | 19478797 | 80 | 90 | 79 | 26
Probet | 701116170 | 58 | 35 | 76 | 98
Baigent | 998970542 | 86 | 61 | 63 | 65
Quarton | 971939596 | 58 | 40 | 100 | 91
Witt | 54978458 | 59 | 99 | 72 | 46
Burnand | 283808017 | 59 | 92 | 51 | 11
Rawsthorn | 814535776 | 63 | 50 | 62 | 100
Vasyatkin | 925395193 | 84 | 73 | 85 | 57
Crossby | 427723213 | 33 | 56 | 100 | 100
Hanham | 597141409 | 74 | 30 | 68 | 60
Scorthorne | 720102259 | 58 | 33 | 100 | 67
Evans | 473955894 | 93 | 96 | 92 | 89
O'Bee | 58336708 | 44 | 11 | 66 | 44
Lewsam | 623498290 | 60 | 58 | 72 | 74
Skim | 412729216 | 70 | 51 | 88 | 84
Feehely | 73095508 | 49 | 50 | 82 | 100
Brandassi | 27384415 | 77 | 59 | 93 | 51
Umbert | 58510748 | 73 | 100 | 78 | 58
Lygo | 140824408 | 55 | 75 | 78 | 85
Schulkins | 200434642 | 54 | 52 | 65 | 37
Scanlon | 962865222 | 39 | 49 | 55 | 85
Allridge | 115735549 | 75 | 52 | 70 | 27
Batchan | 333316640 | 71 | 39 | 94 | 54
Purveys | 583193483 | 46 | 74 | 59 | 66
Shemilt | 374260199 | 58 | 88 | 85 | 32
Conant | 640666260 | 39 | 51 | 89 | 64
Botright | 292406035 | 77 | 27 | 82 | 95
Schulkins | 152835084 | 60 | 22 | 70 | 45
Domke | 624409539 | 61 | 71 | 48 | 51
Elies | 634033768 | 77 | 100 | 56 | 100
Larham | 145602942 | 63 | 32 | 81 | 29
Vacher | 498955516 | 72 | 14 | 73 | 21
Di Filippo | 711731193 | 65 | 65 | 89 | 64
Blench | 496815213 | 40 | 24 | 73 | 79
Goodread | 10007942 | 50 | 42 | 66 | 56
Minchin | 988677183 | 94 | 14 | 69 | 8
Orrock | 509070758 | 69 | 54 | 85 | 69
Wyles | 154625310 | 49 | 25 | 86 | 48
Tookill | 193475472 | 23 | 62 | 53 | 28
Wimmer | 853474046 | 45 | 37 | 83 | 97
Spira | 979696276 | 14 | 54 | 93 | 61
Felix | 875835071 | 63 | 2 | 71 | 81
Reckhouse | 148144410 | 71 | 88 | 77 | 59
Le Breton De La Vieuville | 558486912 | 47 | 41 | 81 | 57
Horrigan | 812230116 | 45 | 52 | 80 | 64
Le Port | 14113685 | 67 | 34 | 91 | 83
Spitell | 43792447 | 40 | 88 | 83 | 79
Dils | 538529529 | 84 | 88 | 55 | 48
Goodge | 602790717 | 100 | 50 | 67 | 39
Vern | 235405979 | 95 | 51 | 62 | 40
Death | 436211337 | 62 | 2 | 93 | 20
Giacomello | 790772359 | 57 | 18 | 80 | 68
McAvey | 843110330 | 61 | 17 | 53 | 74
Sevitt | 711438118 | 61 | 40 | 69 | 76
Utridge | 13821354 | 69 | 51 | 74 | 56
Calleja | 304154084 | 82 | 69 | 72 | 74
Barenskie | 545937322 | 57 | 72 | 70 | 75
Beckingham | 380600744 | 61 | 82 | 61 | 33
Coggen | 520808387 | 63 | 0 | 100 | 61
Christene | 5768811 | 74 | 38 | 100 | 53
Hodgen | 248294860 | 81 | 18 | 73 | 87
Bisco | 203249260 | 43 | 45 | 54 | 68
Woodruff | 721092637 | 65 | 68 | 76 | 90
Binham | 298643920 | 73 | 60 | 76 | 49
Ivasechko | 199652751 | 31 | 16 | 78 | 92
Gecke | 19609957 | 57 | 29 | 53 | 34
Izakof | 347786735 | 57 | 14 | 48 | 9
Pringour | 210495355 | 71 | 11 | 100 | 74
Burgum | 66120460 | 32 | 67 | 93 | 76
Thomel | 827027608 | 73 | 54 | 52 | 42
Doldon | 700489818 | 69 | 73 | 71 | 58
Benyon | 26302393 | 76 | 86 | 89 | 80
Norvel | 10489034 | 45 | 52 | 68 | 65
Mullard | 111450702 | 85 | 56 | 92 | 63
Phelp | 395748563 | 62 | 60 | 57 | 84
Androli | 746030914 | 55 | 64 | 100 | 71
Vinall | 758604004 | 86 | 60 | 60 | 27