17 March 2008

PNOI 2k8 Problems: Input for Multiple Data Sets

Problems from PNOI2008

I had not given attention to the methods of getting input for multiple data sets. This presents an additional challenge, on top of the problem itself, especially when not used to the usual methods that problems in programming competitions provide the input.

The problems in PNOI 2k8 did not try to make parsing input too difficult, using only spaces and newlines to separate data, and without additional parsing issues.

It's best to keep the loop mechanism for getting input as simple and unobtrusive, making sure that it keeps out of the way of the solution itself: this usually means that the solution should be put into a function that is called by the main function.

There were two types of data set input shown in PNOI 2k8. First, as in problems A, B, E and F, the first number in the input gives the number of data sets in the input. Generally, this is easier to implement:

int main(void) {
int numCases, currentCase;
input numCases;
for (currentCase = 1; currentCase <= numCases; currentCase ++)
solveProblem();
return 0;
}

The other type of data set input shown was the header type: basically, each data set is preceded by a header line which tells how many lines are in the data set, a trivial value (such as all zero header values) indicates the end of the input. This is seen in problems C, D and to some extent F. For problem D:

int main(void) {
int numPeople, numPairs;
for (input numPeople, numPairs; numPeople > 0 || numPairs > 0; input numPeople, numPairs)
solveProblem(numPeople, numPairs);
return 0;
}

Generally, with just numbers and separators, reading the data from the input is simply reading the next number. In problems E and F, though, strings have to be read, with different expectations: F required that a line have an alphanumeric string followed by a space, then a number, then a space and a number again; E required the whole line of characters, spaces and all, until the newline as a string.

This is where familiarity with the input schemes of the language you are using will come in handy. If you are experienced with separating a string with spaces into "words", and then converting some or all of these "words" into their proper type, such as integers, or floating-point numbers, then you can always read input line-per-line and then do the conversion: this will apply to both E and F, although the second step is not needed for E. In C, the function to use is gets() or fgets() for file input (or stdin); in C++, the function to use is getline(), which can be used with cin or a file stream; in Java, the method readLine() of a BufferedReader object does the job - you will have to import the package java.io.* and use an InputStream object - read here for more details.

For problem F, you have the added option of taking into account that there are input functions that only reads strings only up to a blank space or a newline. In C:

scanf("%s %d %d\n", name[item], &cost[item], &size[item]);

Note that the newline in the end is necessary so that the next call to this command does not read the previous line's newline character.

In C++:

cin >> name >> cost >> size;
item[currentItem] = new Item(name, cost, size);

In Java, the only available option is to read per line.

Parsing is a skill all programmers must grasp up to some extent (even those that deal with GUI need to handle text input now and then), and the more experience you have with different types of file input, the better prepared you'll be. My advice is to always test your parsing before working on the solution: make the parsing code then have the program output what it had parsed to ensure that you are getting the data correctly.

1 comment:

Chipi Buenafe said...

I would suggest that contestants (and their respective coaches) build a "template" for input parsing. This helps maximize contest time for "thinking the solution" rather than spending the time to "try and fix the input reading".

To those who have been my contemporaries back in the SEARCC ISC days, the input scheme introduced by the ACM has been quite challenge and indeed paradigm-shifting. When we started competing in the ACM in 2000, we really spent most of the time trying to parse the inputs correctly. Fortunately, at least for the UA&P ACM teams back then (in 2000 and 2001), we had one or two guys whose job is simply to code the input-reading logic into the solution. However, this puts the burden to one or two guys who will do the "thinking" portion for the contest.

At the world stage, each member should be a "thinking" member and that input-reading must be second nature by this time. Besides, the input-reading challenge is not a big deal once the "language barrier" is solved, as alluded by Mike Samson in this post.