17 March 2008

PNOI 2k8 Problem D: How Many Groups?

Problems from PNOI2008

Type: graph, set
Difficulty: medium

Problem D disguises the problem of determining the number of components of a graph.

A graph is a structure composed of nodes/vertices and edges. A node or vertex can be considered a point and an (unordered) edge connects two nodes; the edge is said to be incident on those nodes. In the problem, the persons can be represented by nodes, and there is an edge connecting two nodes when the persons they represent know each other.

A component of a graph is a portion of the graph consisting of nodes and edges, where all the edges incident to the nodes is included, and that between any two nodes in that portion, there is a sequence of edges that can be traced between the two nodes.

In the language of the problem, two persons A and B are in the same component if there is a sequence of people (X1, X2, ... Xn) such that A knows X1, X1 knows X2, and so on, and Xn knows B. This is another way to describe the groups given in the problem.

There are several graph-oriented methods that can be used to solve this problem, but another way to look at this problem is in terms of sets.

Consider each person to be an element of a set, and that each person is the only element of that person's set. When a person knows another person, their sets are joined (unless they are in the same set already). When all the knowledge relations have been considered, the sets that remain are the groups in the problem.

It isn't natural to store sets or graphs as data structures in the computer languages given in the contest, so some way to store them is necessary. We can determine who is and who isn't in a set using a two-dimensional array. Since each person starts off in a set of their own, there are as many sets as persons at the start, so we initialize the array:

for (i = 1; i <= persons; i ++)
for (j = 1; j <= persons; j ++)
if (i == j)
groups[i][j] = TRUE;
else
groups[i][j] = FALSE;

For the array, the first index is the number of the group/set, the second is the number of the person.

Then, as each pair is input, we update the sets; if the persons in the pair are from different groups, we move everyone from the group where the second person belongs to the group that the first person belongs:

for (index = 1; index <= pairs; index ++) {
input person1, person2;
group1 = belongs_to_group(person1);
group2 = belongs_to_group(person2);
if (group1 == group2)
continue;
for (person = 1; person <= persons; person ++) {
if (groups[group2][person]) {
groups[group2][person] = FALSE;
groups[group1][person] = TRUE;
}
}
}

You may use a function or an array to determine to which group a person belongs; for example, if the array is global:

int belongs_to_group(int person) {
int group = 1;
while (!groups[group][person])
group ++;
return group;
}

Then you may count the number of groups that remain after all the pairs have been input. Note that you will be begin with the number of persons as the number of groups, and each time a pair is composed of two persons from different groups, the number of groups decreases by one, since their groups will merge to become one group.

4 comments:

Chipi Buenafe said...

For this problem, I think you can develop another solution. I haven't gone around to check if this alternative is faster or simpler [or even correct] since I haven't implemented this, but here's my solution --->

Procedure Initialization:
Let group[i] indicate the group number of person i. Initialize group[] to zero.
Let ctr be the "current group number", initialized at zero.
Let cnt be the "final count of groups", initialized at zero.

Procedure GroupAssign:
**Begin loop for the sorted pairs
Let a be the first person
Let b be the second person
Now do this check ...
1. If whichgroup[a] and whichgroup[b] is equal to zero, this means that neither person has been assigned to a group, so ctr++, cnt++ and assign ctr to whichgroup[a] and whichgroup[b].
2. If whichgroup[a] > 0 and whichgroup[b] = 0, then person b should be part of the group that person a belongs, thus whichgroup[b] = whichgroup[a].
3. If whichgroup[b] > 0 and whichgroup[a] = 0, same line of thinking as #2, thus whichgroup[a] = whichgroup[b]
4. If whichgroup[a] and whichgroup[b] are both >0 but not equal to each other, there is "double assignment" so we correct that with a Procedure Correct.
**End loop for the sorted pairs

Procedure Correct:
Given i,j as the persons to correct their grouping since whichgroup[i]<>whichgroup[j] and whichgroup[i],whichgroup[j]>0.
Let reassign be the lesser of whichgroup[i] and whichgroup[j].
Let elim be the greater of whichgroup[i] and whichgroup[j].
**For k=1 to Persons
If whichgroup[k]==elim, then whichgroup[k]=reassign
**Next k
cnt--

At the end of the GroupAssign process, cnt counts the number of groups.


Analysis --->

The advantage here is that you're using a one-dimension array instead of the Boolean matrix that you've presented. The whichgroup[] array is analogous to what belongs_to_group() intends to do and you save on the loop.

Also, in your solution, your correction process is more of the norm than the exception since you're assuming that initially, all persons belong to a group (they're on their own!) whereas Procedure Correct will only execute as an exception since we're assuming that they don't have a group and will adjust already if the other element in the pair already belongs to a group. As such, your correction process will run for more number of times.

Unknown said...

Hmm, you do have a point: the correction process will probably take more time when the people are pre-assigned groups. In my original solution, I actually did use an array for belongs_to_group() and I do agree that that is a simpler solution.

The only thing lacking is counting the number of people that aren't in groups yet. So at the end, there should be a loop that counts the number of people whose whichgroup[] is zero, as each of these is a group by him/herself, and add that value to cnt.

Thanks! This should run faster because it doesn't need a two-dimensional array. I was wondering how to do it without one. :)

Chipi Buenafe said...

Oops! You got it -- I didn't include those numbers that were not explicitly listed in the pairs but are implicitly included based on the number of persons.

Thanks! A one-off loop will cover that.

Or you can do a backward-induction type of logic.

Start with cnt=number of people
If condition #1, #2, #3 or #4 happens, cnt--.

Now I'm not sure if this logic will double-count (and cause cnt to be negative), but I think you can still avoid using another loop (even if it's only one-off) with this style.

Unknown said...

I think that back counting will work: that is effectively what I used in my solution. Cases 1 through 4 cover everything and won't cause double-counting.