17 March 2008

PNOI 2k8 Problems: Input for Multiple Data Sets

Problems from PNOI2008

I had not given attention to the methods of getting input for multiple data sets. This presents an additional challenge, on top of the problem itself, especially when not used to the usual methods that problems in programming competitions provide the input.

The problems in PNOI 2k8 did not try to make parsing input too difficult, using only spaces and newlines to separate data, and without additional parsing issues.

It's best to keep the loop mechanism for getting input as simple and unobtrusive, making sure that it keeps out of the way of the solution itself: this usually means that the solution should be put into a function that is called by the main function.

There were two types of data set input shown in PNOI 2k8. First, as in problems A, B, E and F, the first number in the input gives the number of data sets in the input. Generally, this is easier to implement:

int main(void) {
int numCases, currentCase;
input numCases;
for (currentCase = 1; currentCase <= numCases; currentCase ++)
solveProblem();
return 0;
}

The other type of data set input shown was the header type: basically, each data set is preceded by a header line which tells how many lines are in the data set, a trivial value (such as all zero header values) indicates the end of the input. This is seen in problems C, D and to some extent F. For problem D:

int main(void) {
int numPeople, numPairs;
for (input numPeople, numPairs; numPeople > 0 || numPairs > 0; input numPeople, numPairs)
solveProblem(numPeople, numPairs);
return 0;
}

Generally, with just numbers and separators, reading the data from the input is simply reading the next number. In problems E and F, though, strings have to be read, with different expectations: F required that a line have an alphanumeric string followed by a space, then a number, then a space and a number again; E required the whole line of characters, spaces and all, until the newline as a string.

This is where familiarity with the input schemes of the language you are using will come in handy. If you are experienced with separating a string with spaces into "words", and then converting some or all of these "words" into their proper type, such as integers, or floating-point numbers, then you can always read input line-per-line and then do the conversion: this will apply to both E and F, although the second step is not needed for E. In C, the function to use is gets() or fgets() for file input (or stdin); in C++, the function to use is getline(), which can be used with cin or a file stream; in Java, the method readLine() of a BufferedReader object does the job - you will have to import the package java.io.* and use an InputStream object - read here for more details.

For problem F, you have the added option of taking into account that there are input functions that only reads strings only up to a blank space or a newline. In C:

scanf("%s %d %d\n", name[item], &cost[item], &size[item]);

Note that the newline in the end is necessary so that the next call to this command does not read the previous line's newline character.

In C++:

cin >> name >> cost >> size;
item[currentItem] = new Item(name, cost, size);

In Java, the only available option is to read per line.

Parsing is a skill all programmers must grasp up to some extent (even those that deal with GUI need to handle text input now and then), and the more experience you have with different types of file input, the better prepared you'll be. My advice is to always test your parsing before working on the solution: make the parsing code then have the program output what it had parsed to ensure that you are getting the data correctly.

PNOI 2k8 Problem F: Shopping Spree

Problems from PNOI2008

Type: dynamic programming
Difficulty: medium-hard, tedious

Consider this the clarification-test problem: some pertinent upper limits were not given in the description: the maximum number of data sets (expected to be twenty, as the other problems); the maximum capacity of the cart; the maximum number of items. However, those not familiar with the core idea of the problem and its simplest solution would not know this, so it was more of a do-you-know-dynamic-programming problem.

Dynamic programming is actually a solution approach: you generate the solution by finding the solution to "smaller versions" of the problem you want to solve, and then build up solutions to "larger versions" from solutions to "smaller versions" until the solution to the original problem is achieved. The usual problem used to illustrate this approach is called the knapsack problem: given a knapsack of a particular capacity, and different items that each occupy a given space and have a given worth, what is the maximum worth that the knapsack can take? This particular problem has a small variation added: determine the number of each item that makes up this maximum worth.

However, the first - and more tedious problem - involves sorting the items by name. Sorting strings lexicographically is not really a trivial process: the comparison of strings is not as simple as the comparison of numbers. Add to this the difficulty that each item has three pieces of information that go with it: name, size and cost. These have to go together, as they define one item.

The best solution is to define an item structure/object (with get and set functions for each field) and defining a function to sort these items by name. The object in C++:

class Item {
String name;
int cost, size;
public:
Item(String newName, int newCost, int newSize) {
name = newName;
cost = newCost;
size = newSize;
}
String getName() {return name;}
int getCost() {return cost;}
int getSize() {return size;}
}

In cases where comparison between Strings isn't defined, here is a possible definition:

int compare(String str1, String str2) {
int index;
for (index = 0; str1.charAt(index) == str2.charAt(index); index ++);
return str2.charAt(index) - str1.charAt(index);
}

The function returns a non-zero value, the difference between the first characters where the Strings are not the same: if it is positive, str2 comes after str1 in lexicographic order; if it is negative, the reverse. This function can be used as an argument for the predefined quicksort function in C/C++, or in your own sorting method.

Then, the dynamic programming approach can be applied: in this case, the maximum cost we can get for any cart size is where we get "smaller versions":

for (cartSize = 0; cartSize <= cartCapacity; cartSize ++) {
maxCartCost[cartSize] = 0;
for (item = 0; item < numOfItems; item ++) {
if (cartSize >= items[item].getSize()) {
costSmaller = maxCartCost[cartSize - items[item].getSize()];
if (costSmaller + items[item].getCost() > maxCartCost[cartSize])
maxCartCost[cartSize] = costSmaller + items[item].getCost();
}
}
}

Basically, for a given cartSize, for each item whose size is at least cartSize we determine if the currently stored maximum cost at cartSize is larger than [the maximum cost when the size of the cart was [cartSize minus the size of the item]] plus [the cost of that item]: if the sum is larger, the sum becomes the new value for the maximum cost at cartSize. After this loop ends, maxCartCost[cartCapacity] should have the maximum cost that can be taken in a cart that has capacity cartCapacity.

Now we have the total needed, but what were the items that composed the shopping cart to get that total? Here we "backtrack": using the same method to stock the shopping cart, we try to remove items one by one, until the cart is empty, as in:

for (item = 0; item < numOfItems; item ++)
counts[item] = 0;
cartSize = cartCapacity;
while (cartSize > 0) {
for (item = 0; item < numOfItems; item ++) {
costSmaller = maxCartCost[cartSize - items[item].getSize()];
if (costSmaller + items[item].getCost() == maxCartCost[cartSize])
break;
}
counts[item] ++;
cartSize -= items[item].getSize();
}

PNOI 2k8 Problem E: Text Messages

Problems from PNOI2008

Type: string
Difficulty: easy-medium, tedious

This problem was probably overlooked due to its position: since the easiest problems came earlier, and some harder problems came before this, it can be supposed that this was assumed to be harder than on first blush. The tediousness in this problem is in the encoding of the key-press map, whereas in problem C it is in the output formatting.

The key idea here is not only to be able to translate characters into key-presses, but to be able to determine which sequences of characters require additional key-presses. Dr. Pablo Manalastas, head judge of PNOI 2k8, and whose daughter suggested this problem, suggests the use of an array of structures, each structure storing the number of the button to press and the number of times that button has to be pressed to generate the character. My own approach is a little more convoluted, but only relies on a reference string and some mathematical overhead.

To determine both the button and the button-press multiplicity, we can use a function that returns a number that stores both pieces of information:

int button_and_multiplicity(char ch) {
return index of position of ch in "abc2ghi3jkl4mno5pqrstuv8wxyz "
}

The number returned is [(B - 2) * 4] + (P - 1) where B is the number of the button to press and P is the number of times the button is pressed, noting that the function returns 0 for a. The numbers are there to space the characters properly, since only s and z require four key-presses.

Then, the process is simply to read through every character and adding the number of key-presses, noting that if the current character and the one before it are on the same button, an additional key-press is required:

prevChar = -1;
pressCount = 0;
for (index = 0; index < length(message); index ++) {
thisChar = button_and_multiplicity(charAt(message, index));
if (prevChar / 4 == thisChar / 4)
pressCount ++;
pressCount += (thisChar % 4) + 1;
prevChar = thisChar;
}

prevChar is initially set to -1, since the first character does not compare with any character before it; the last statement makes the current character-value become the value of the previous character, for the next comparison. When dividing integers, the result is an integer, without the decimals, if there are any. That is why dividing the result of the function gives the number of the button that is pressed minus two.

This approach may not be obvious, but experience will determine if you can use it and how to set up such a reference string and then such a function.

PNOI 2k8 Problem D: How Many Groups?

Problems from PNOI2008

Type: graph, set
Difficulty: medium

Problem D disguises the problem of determining the number of components of a graph.

A graph is a structure composed of nodes/vertices and edges. A node or vertex can be considered a point and an (unordered) edge connects two nodes; the edge is said to be incident on those nodes. In the problem, the persons can be represented by nodes, and there is an edge connecting two nodes when the persons they represent know each other.

A component of a graph is a portion of the graph consisting of nodes and edges, where all the edges incident to the nodes is included, and that between any two nodes in that portion, there is a sequence of edges that can be traced between the two nodes.

In the language of the problem, two persons A and B are in the same component if there is a sequence of people (X1, X2, ... Xn) such that A knows X1, X1 knows X2, and so on, and Xn knows B. This is another way to describe the groups given in the problem.

There are several graph-oriented methods that can be used to solve this problem, but another way to look at this problem is in terms of sets.

Consider each person to be an element of a set, and that each person is the only element of that person's set. When a person knows another person, their sets are joined (unless they are in the same set already). When all the knowledge relations have been considered, the sets that remain are the groups in the problem.

It isn't natural to store sets or graphs as data structures in the computer languages given in the contest, so some way to store them is necessary. We can determine who is and who isn't in a set using a two-dimensional array. Since each person starts off in a set of their own, there are as many sets as persons at the start, so we initialize the array:

for (i = 1; i <= persons; i ++)
for (j = 1; j <= persons; j ++)
if (i == j)
groups[i][j] = TRUE;
else
groups[i][j] = FALSE;

For the array, the first index is the number of the group/set, the second is the number of the person.

Then, as each pair is input, we update the sets; if the persons in the pair are from different groups, we move everyone from the group where the second person belongs to the group that the first person belongs:

for (index = 1; index <= pairs; index ++) {
input person1, person2;
group1 = belongs_to_group(person1);
group2 = belongs_to_group(person2);
if (group1 == group2)
continue;
for (person = 1; person <= persons; person ++) {
if (groups[group2][person]) {
groups[group2][person] = FALSE;
groups[group1][person] = TRUE;
}
}
}

You may use a function or an array to determine to which group a person belongs; for example, if the array is global:

int belongs_to_group(int person) {
int group = 1;
while (!groups[group][person])
group ++;
return group;
}

Then you may count the number of groups that remain after all the pairs have been input. Note that you will be begin with the number of persons as the number of groups, and each time a pair is composed of two persons from different groups, the number of groups decreases by one, since their groups will merge to become one group.

PNOI 2k8 Problem C: Checking Collinearity

Problems from PNOI2008

Type: geometry
Difficulty: easy to medium, tedious

This problem was easier before editing: I suggested to keep all the coefficients as integers, so that there would be less problems with the precision of floating-point numbers. It would have been much more straightforward if the lines were given in standard form (i.e., the form y = Mx + B) instead of general form (i.e., the form Ax + By + C = 0), even if that required a special case for vertical lines, which have the form x = C. This also added the process of removing the common factors of A, B and C, and the output format considerations.

In geometry, we find that two points can determine a line, and a third point is collinear with the first two if it falls on the determined line. We get the first two points (ax, ay) and (bx, by) and determine the equation of the line that they determine.

In standard form, M = (by - ay) / (bx - ax) is the slope of the line, and B is the y-intercept of the line (and not the same B in the general form). We can determine the general form of the equation of a line from its standard form: y = Mx + B has general form Mx - y + B = 0. To make both coefficients integers, we multiply both sides by the denominator of M, so, using the formula for the slope above, we have(by - ay)x - (bx - ax)y + B(bx - ax) = (by - ay)x + (ax - bx)y + C = 0. The value of C is determined by plugging in a point of the line, say (ax, ay) into the equation, i.e. C = -(A * ax + B * ay) = (ay - by)ax + (bx - ax)ay. We choose, however, to remove the common factors of A and B before determining C.

First, to remove the common factors of A and B, we have to determine their greatest common factor (GCF), so we can divide them both by their GCF and they will have no common factors, i.e. be relatively prime.

Euclid, father of geometry, also has an efficient algorithm that determines the GCF of two positive numbers, a and b. He noted that the GCF of a and b is also the GCF of a and the remainder when dividing b by a. Noting that the GCF of 0 and any nonzero number is the number, Euclid's algorithm can be implemented in a function:

int gcf(int a, int b) {
if (a < b)
swap a, b;
while (b > 0) {
a %= b;
swap a, b;
}
return a;
}

We can now find the equation of the line determined by the first two points in expected format:

if (by < ay) {
A = ay - by;
B = bx - ax;
} else {
A = by - ay;
B = ax - bx;
}
if (B < 0)
signB = -1;
else
signB = 1;
temp = gcf(A, signB * B);
A /= temp;
B /= temp;
if (A == 0)
B = 1;
C = -(A * ax + B * ay);

signB is used because the GCF function can only be used on positive integers. It is possible that the formula will determine A = 0 and B < 0, after dividing both by their GCF, we'll have B = -1. For purposes of the formatting, we'll set B = 1.

Also note that if we didn't use temp to store the GCF of A and B and instead immediately divided A by the GCF, we would no longer find the common factors with which B needs to be divided.

Then we determine if the other points are on this line with the use of flag:

flag = TRUE;
while (index = 3; index <= points; index ++) {
input cx, cy;
if (flag && (A * cx + B * cy + C != 0))
flag = FALSE;
}

The other difficulty is in the output of the equation of the line when the points given are collinear. Several conditions are to be considered, as shown below:

if (A > 1)
output A;
if (A > 0)
output "x";
if (B > 0)
output "+";
if (B == -1)
output "-";
if (B < -1 || B > 1)
output B;
if (B != 0)
output "y";
if (C > 0)
output "+";
if (C != 0)
output C
output "=0";

PNOI 2k8 Problem B: Rina's Triangles

Problems from PNOI2008

Type: mathematics, data structure
Difficulty: easy

Rina, in the title, refers to Rina Caballar, a former co-worker of mine and current co-worker of Dr. Pablo Manalastas, head judge at PNOI 2k8, who provided this problem. This problem was originally given in last year's Javacup, which is an annual collegiate two-man team Java programming competition hosted by CURSOR, a UP Computer Science organization in which I have had the privilege of previously worked; they were last year's judges.

Dr. Felix Muga II, another judge at PNOI 2k8, was the one who first noted that the triangles resembled Pascal's triangle, though the operation was subtraction instead of addition, and that the number of elements in each "succeeding" row decreased by one instead of increased. Keeping the visual, it probably works better imagining that the triangle pointed downward instead of upward.

89 78 12 13
11 66 1
55 65
10

The solution is to generate the triangle and compare all pairs of numbers to determine if any two of them are the same. This can be done with one array, and it would be easy to do the comparison that way, but I'll try to maintain the triangle shape in a two-dimensional array, where generating the triangle is easier.

for (i = 1; i < 4; i ++)
for (j = 0; j < 4 - i; j ++)
if (tri[i - 1][j] > tri[i - 1][j + 1])
tri[i][j] = tri[i - 1][j] - tri[i - 1][j + 1];
else
tri[i][j] = tri[i - 1][j + 1] - tri[i - 1][j];

The trick is to compare all possible pairs of elements of the triangle: what you basically want is a way to keep track of one element while going through all the other elements "after" it, comparing them, then going to the "next" element. This is why an array would make it easier to do: "after" and "next" are simply the elements whose indices are larger. We try to simulate that here:

flag = TRUE;
for (i = 0; flag && i < 4; i ++) {
for (j = 0; flag && j < 4 - i; j ++) {
for (m = i; flag && m < 4; m ++) {
if (m == i)
n = j + 1;
else
n = 0;
while (flag && n < 4 - m)
if (tri[i][j] == tri[n][m])
flag = FALSE;
else
n ++;
}
}
}

Note that flag is used to determine whether or not two elements have been found equal. flag does not need to be a Boolean (truth) variable: flag can be numeric, 1 and 0 can replace TRUE and FALSE, and the comparisons changed accordingly. When two equal elements are found, all the loops will terminate, so no additional comparisons will be made.

PNOI 2k8 Problem A: A Simple Task

Problems from PNOI2008

Type: mathematics
Difficulty: easy

The first thing that came to mind in looking at this problem was "prime factorization".

Every positive integer has a unique prime factorization, and there is an algorithm for outputting the prime factors of a number n:

p = 2;
while (p * p <= n) {
if (n % p == 0) {
output p;
n /= p;
} else p ++;
}

Simply put, if the current prime number being considered divides the number n, divide n by p (and output p), otherwise go to the next prime number, until what remains is prime.

There are some details of the loop that need noting: p does go through non-prime numbers, like 4, but, at that point in time (what is left of) n cannot be divisible by 4, otherwise, it would have been divisible by 2, and then divided by 2 before p becomes 4; the loop condition basically ensures that, if it is not met, the value in n is a prime value, but how?

Those familiar with the Sieve of Eratosthenes know that you can determine the primes in the range of 2 to n by marking the lowest uncrossed number as prime, crossing out all its multiples, then repeating until no more values can be crossed out. Experience will tell you that once you cross out the multiples of the largest prime less than or equal to the square root of n, the remaining uncrossed numbers are all prime.

This is because if one of the numbers between the square root of n and n was composite, say c, it would have to be a multiple of a number also in the range, say p < c, otherwise it would have been crossed out. But c has to be at least p * p, and p * p is greater than n, so that's not possible.

That explains the prime factorization algorithm, but how do you solve problem A?

We simply divide n by 2 as many times as possible, counting: whatever remains is odd.

p = 0;
while (n % 2 == 0) {
p ++;
n /= 2;
}
output n, p;

15 March 2008

Philippine National Olympiad in Informatics 2008

And, for my next post, another blow-by-blow of a programming competition I am a judge at: PNOI 2008. Here at Faura Hall in AdMU, basically checking internet access. Will see if I can actually have a blow by blow. If not, join me here for the summary.

Edit: Boo, they shut down the internet throughout our side to ensure lack of net-access in the venue. Ten contestants, winner got problems A and B (trying D) and runner-up getting A (trying B and D). One more contestant tried A.

Edit: Forgot to mention technical problems causing late submissions. Fourth contestant tried A and B.

Will do a rundown of problems in separate posts. Stay tuned.