Algorithm Design and Analysis

# FSA Part IV: Concoction of design-strategies

## Kindly note that I have moved my website and blog and I will no longer be active here. Please visit my new website.

Again, this post is meant specifically for undergraduate students (first timers into automata theory) – a very brief introduction of FSA: Finite State Automaton (Automata when plural). This post is the third of four parts: the first part covers an elementary introduction to FSA and DFA (Deterministic Finite Automaton), the second part encompasses some common examples illustrating design-strategies for constructing DFA, the third part provides an introduction to NFA (Nondeterministic Finite Automaton) and its equivalence with DFA, and this part introduces the Pumping Lemma and its usage in the context of FSA.

This part delves only into the application of the pumping lemma to prove that a language is NOT regular. It does not cover the properties of Regular languages or inter-conversion of FSA and the regular expressions.

# Regular Languages:

Recall that a language is simply a set of sequences (or strings) over an alphabet. This set can be finite or infinite making the language finite or infinite accordingly. For instance, let L1 = {“00”, “01”}, L2 = {0k |0 ≤ k ≤ 4}, and L3 = {0k | k ≥ 0} be three languages over an alphabet Σ = {0, 1}. Here, L1 is a set of only two strings — “00” and “01”; L2 is a set of five strings — ε (empty string when k = 0), “0” (when k=1), “00” (when k=2), , “000” (when k=3), , “0000” (when k=4);   and L3 is a set of infinite strings i.e. {ε, “0”, “00”, “000”, “0000”, ….}. Consequently, L1 and L2 are finite languages whereas L3 is infinite.

There are different criteria based on which languages can be classified. The class of languages that can be expressed by finite automata (recall that DFA and NFA are equivalent in terms of their expressive power, see Part III), is called Regular. Simply, a language L is regular if an automaton M can be constructed which accepts all sequences of L and rejects all other sequences (i.e. those in its complement LC).

Now the question arises – How to identify whether a given language L is regular or not? If we can construct a DFA for it, it definitely is regular. What if we can not build a DFA? Does it mean that no DFA is possible for L implying that L is not regular? Or is it the case that a DFA may be  possible but we do not know how to define it. This is where Pumping Lemma comes to our rescue.

It is to make explicit that a finite language can always be expressed by a finite automaton. It may be practically an unwieldy FSA with scores of states, still theoretically it is possible. Resultantly, every finite language is a regular language.

# Pumping Lemma:

The pumping lemma specifies a vital property exhibited by a regular language. There are many variants, correspondingly various formal statements, of the pumping lemma. The one we will use is as follows:

If $L$ is an infinite regular language, then there exists some $x$, $y$, $z$ in $\Sigma^*$ such that
$y \neq \epsilon$
$xy^nz \in L \text{ for } n \geq 0$

Note that this definition takes into account only an infinite language which is regular. If that is the case then there exists some string x, y, and z over the alphabet of the language such that y is not empty and all the strings of the form x.z, x.y.z, x.y2.z (which is same as x.y.y.z), x.y3.z (or x.y.y.y.z) , and so on are in that language. Recall that “.” (dot) represents concatenation.

The important point to understand here is that if L is regular and infinite language then there will be some string x, y, z such that we can keep pumping (increasing power of) y (which is non-empty) in x.y.z so that all such strings obtained after pumping will be in L. That’s it. It doesn’t say anything at all about how to find those x, y, and z or what if we have found them; having found some x, y, z that satisfy these conditions does not mean that the language is regular. In other words, the converse of this statement is not true.

Let’s take an example to understand what the statement really implies. Consider a language L over alphabet {0, 1} such that $L = \{x | x \in \Sigma^* \text{ and } |x| \text{ is odd} \}$ i.e. L is a language of all strings of 0s and 1s whose length is odd (L = {“0”, “1”,”111″,”000″,”001″,”010″, …. }). It is an infinite language and is regular. Following is its automaton (refer to Post II to see how we can construct it):

There may be multiple x, y, z that satisfy the pumping lemma for L (which is an infinite regular language): one such tuple is x= “0”, y=”01″, z=”00″. Here, x.z (0.00 = “000”) is in L (as it has length 3); so is x.y.z (0.01.00 = “00100”) with length 5; and so is x.y2.z (0.01.01.00 = “0010100”) with length 7; and it goes on. In fact, any x and y such that there combined length is odd and a y whose length is even will satisfy the pumping lemma for this language — we can keep pumping y of even length => resulting pumped up y will be of even length, but |x| + |z| is odd => total length of xynz is odd (even + odd = odd).

## Application –

Pumping lemma simply states a property of a regular language; it does not claim that if that property is exhibited by some language then that language will be regular. So, the property (as noted in the pumping lemma) can not be used to prove the regularity of a language. However, if we can prove that no x, y, z fulfilling the property can exist for a language, that would mean that the language is not regular. Therefore, the pumping lemma finds its application in proving that a language is non-regular.

There are two approaches to attack the problem of proving a language is non-regular using the pumping lemma; I call them direct and indirect. The direct approach is used when the sequence in the language has some shape (or structure or pattern); whereas, the indirect approach comes handy where no such structure in the sequences can be identified.

Nevertheless, both the approaches work on the lines of proof by contradiction. We begin by assuming that the given infinite language, say L, is regular. In the direct approach, we prove that no x,y, z over $\Sigma^*$ are possible using which we can pump up y (non-empty) and have the pumped up string in L. It is in contradiction to Pumping Lemma which guarantees that some such x, y, z do exist if L is regular, which in turn forces us to conclude that our assumption is wrong. Thus, we prove L is non-regular.

Likewise in the indirect approach, we begin by making an assumption that the infinite language L is regular. Then, we try to use the following property of regular languages (it’s is called closure under the intersection):

If L1 and L2 are two regular languages then so is their intersection (ie. L1 ∩ L2 will be regular).

How do we proceed now? We have already assumed that L is regular. But L does not have a structure. So we try to come up with another regular language say L’, which has some kind of structure. When we take their intersection, L” = L ∩ L’ will adopt the same structure. Since, L” has some structure, we can use Pumping lemma to prove it is non-regular (making use of the direct approach on it). But following the above property, the language L” ( = L ∩ L’) must be regular. What went wrong? The answer is our assumption that L is regular. L” is not regular and L’ is definitely regular. It only means that L can not be regular.

# Examples:

## Direct Approach –

### Language 0i1i

Problem 1: Let $L$ be a language over an alphabet $\Sigma = \{0, 1\}$, such that $L = \{0^i1^i | i \geq 1\}$. Is $L$ regular?

#### Example

Here $L$ = {“01”, “0011”, “000111”, … }. $L$ is an infinite language which is a set of sequences consisting of a certain number of 0s followed by the same number of 1s. Note that the number itself is a variable. But L has a structure: All zeroes in the beginning, followed by all 1s.

A rule of thumb: An FSA can not make an arbitrary number of comparisons. Any language that needs comparing a variable number of symbols, will not be regular.

#### Proof

$L$ is infinite. Let us assume that $L$ is regular. Then according to the pumping lemma, there must exist some $x$, $y$, $z$ in $\{0,1\}^*$ such that $y$ is non-empty and $xy^nz \in L \text{ for } n \geq 0$. Every string in $L$ is of the form: all 0s followed by an equal number of 1s (i.e. 00000….0011111….11). If we try to split up such a string in $x$, $y$, $z$, then a non-empty $y$ can take a form from one of the following possibilities: all 0s, all 1s, some 0s followed by some 1s.

Now we will show that any of the three types of $y$ when pumped up will result in a string that can not be a member of $L$ because either the balance (of the number of 0s and 1s) or the structure gets violated by pumping.

• $y = 0^l$ for some $l \geq 1$. Then $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in more 0s than 1s (violating the balance of 0s and 1s); therefore, such strings can not be in $L$.
• $y = 1^m$ for some $m \geq 1$. Then $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in more 1s than 0s (violating the balance of 0s and 1s); therefore, such strings can not be in $L$.
• $y = 0^p1^q$ for some $p,q \geq 1$. Then any $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in a string of 0s, followed by $2 (or~ i)$ copies of $0^p1^q$, followed by 1s (i.e. $000....0^p1^q0^p1^q.....11111$ which destroys the structure); such strings can not be in $L$.Hence, there is no $y$ possible that can satisfy the pumping lemma for this language. Thus, $L$ is not regular.

### Language 0i10i1

Problem 2: Let $L$ be a language over an alphabet $\Sigma = \{0, 1\}$, such that $L = \{0^i10^i1 | i \geq 1\}$. Is $L$ regular?

#### Example

Here $L$ = {“0101”, “001001”, “00010001”, … }. $L$  is an infinite language which is a set of sequences consisting of a certain number of 0s followed by 1, then followed by the same number of 0s which in turn ends with a 1. L has a structure: A segment of all zeroes in the beginning, then a 1 followed by another segment of 0s which is followed by another 1.

#### Proof

$L$ is infinite. Let us assume that $L$ is regular. Then according to the pumping lemma, there must exist some $x$, $y$, $z$ in $\{0,1\}^*$ such that $y$ is non-empty and $xy^nz \in L \text{ for } n \geq 0$. Every string in $L$ has the above-mentioned form.:  00000….00100000….001). If we try to split up such a string in $x$, $y$, $z$, then a non-empty $y$ can take a form from one of the following possibilities: all 0s, zero or more 0s followed by a 1 followed by zero or more 0s, zero or more 0s followed by a 1 followed by all 0s of the second segment followed by 1.

Now we will show that any of the three types of $y$ when pumped up will result in a string that can not be a member of $L$ because either the balance (number of 0s in the two segments) or the structure is violated by pumping.

• $y = 0^l$ for some $l \geq 1$. This comes from either the first or the second segment. Then $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in more 0s in that segment than the other (violating the balance of 0s in the two segments); therefore, such strings can not be in $L$.
• $y = 0^p10^q$ for some $p,q \geq 0$. Then any $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in a string of 0s, followed by $2 (or~ i)$ copies of $0^p10^q$, followed by 0s and a 1 (i.e. $000....0^p10^q0^p10^q.....0001$ which destroys the structure); such strings can not be in $L$.
• $y = 0^p10^q1$ for some $p \geq 0$ and $q$ is the number of 0s in the second segment. Then any $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in a string of 0s, followed by $2 (~or~ i)$ copies of $0^p10^q1$, followed by 0s and a 1 (i.e. $000....0^p10^q10^p10^q1$ which destroys the structure); such strings can not be in $L$.

Hence, there is no $y$ possible that can satisfy the pumping lemma for this language. Thus, $L$ is not regular.

## Indirect Approach –

### Language of equal number of 0s and 1s

Problem 3: Let $L$ be a language over an alphabet $\Sigma = \{0, 1\}$, such that $L = \{ x | x \in \{ 0, 1\}^* \text { and x has equal number of 0s and 1s} \}$. Is $L$ regular?

#### Example

Here $L$ = {“01”, “10”, “0011”, “0101”, “1010, “1100”, “1001”, … }. $L$  is an infinite language which is a set of sequences consisting of 0s and the same number of 1s but in no definite pattern. Here L has no structure: a sequence can take any form. Thus, we will go by the indirect approach.

#### Proof

Let us assume that $L$ is regular. Consider another language $L' = \{0^p1^q | p,q \geq 1\}$$L'$ is a language consisting of sequences which takes the form: some 0s followed by some 1s. Note that it doesn’t require 0s and 1s to be equal. It is a regular language as the following automaton (DFA) accepts it.

As L is a language with equal number of 0s and 1s and L’ is a language of 0s followed by 1s. The language resulting from the intersection of these two, say L”, consists of sequences with 0s followed by equal number of 1s. In other words $L'' = L \cap L' = \{0^i1^i | i \geq 1\}$. L’ is regular (shown by the DFA) and as per our assumption, L is also regular. Then according to the closure property under intersection, L” should also be regular.

But in Problem 1 we have shown that L” is not regular using the pumping lemma. Therefore, our assumption about L that it is regular is wrong. Hence, L is non-regular.

### Language wwR

Problem 3: Let $L$ be a language over an alphabet $\Sigma = \{0, 1\}$, such that $L = \{ww^R | w \in \{0, 1\}^* \text{ and } w^R \text{ is reverse of w}\}$. Is $L$ regular?

#### Example

Here $L$ = {“0”, “1001”, “11011”, … }. $L$  is an infinite language which is a set of (even length) palindromes i.e. the sequences consisting of strings (of 0s and 1s here) followed by the reverse of the same string. Here L has no structure: a sequence can take any form. Thus, we will go by the indirect approach.

#### Proof

Let us assume that $L$ is regular. Consider another language $L' = \{0^p110^q | p,q \geq 1\}$$L'$ is a language consisting of sequences which takes the form: some 0s followed by two 1s which in turn are followed by some 0s. Note that it doesn’t require 0s in the two segments to be equal. It is a regular language as the following automaton (DFA) accepts it.

As L is a language with a string of 0s and 1s followed by its reverse and L’ is a language of 0s followed by two 1s followed by 0s (which can be seen as a string of 0s ending in 1 followed by another string starting from 1 and followed by 0s). The language resulting from the intersection of these two, say L”, consists of sequences with a segment of 0s followed by a 1 followed by another 1 followed by another segment of equal number of 0s. In other words, $L'' = L \cap L' = \{0^i110^i | i \geq 1\}$. L’ is regular (shown by the DFA) and as per our assumption, L is also regular. Then according to the closure property under intersection, L” should also be regular.

But similar to Problem 2, we can show that L” is not regular using the pumping lemma.

$L''$ is infinite and regular. Then according to the pumping lemma, there must exist some $x$, $y$, $z$ in $\{0,1\}^*$ such that $y$ is non-empty and $xy^nz \in L \text{ for } n \geq 0$. Every string in $L$ has the above-mentioned form.:  00000….001100000….00). If we try to split up such a string in $x$, $y$, $z$, then a non-empty $y$ can take a form from one of the following possibilities: all 0s, zero or more 0s followed by a 11 followed by zero or more 0s, zero or more 0s followed by a 1, or a 1 followed by zero or more 0s.

Now we will show that pumping any of the four types of $y$ either disturbs the balance (number of 0s in the two segments) or the structure making it infeasible for the resulting pumped up string to be in L” .

• $y = 0^l$ for some $l \geq 1$. This comes from either the first or the second segment. Then $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in more 0s in that segment than the other (violating the balance of 0s in the two segments); therefore, such strings can not be in $L''$.
• $y = 0^p110^q$ for some $p,q \geq 0$. Then any $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in a string of 0s, followed by $2 (or~i)$ copies of $0^p110^q$, followed by 0s (i.e. $000....0^p110^q0^p110^q.....000$ which destroys the structure); such strings can not be in $L$.
• $y = 0^p1$ (or $y = 10^p$) for some $p \geq 0$. Then any $xy^2z$ (in fact any $xy^iz$ for $i > 1$) will result in a string of 0s, followed by $2 (or~ i)$ copies of $0^p1$ (or$10^p$), followed by 0s and a 1 (i.e. $000....0^p10^p100...0000$ or $000....10^p10^p000...0000$ which destroys the structure); such strings can not be in $L$.Hence, there is no $y$ possible that can satisfy the pumping lemma for this language. Thus, $L''$ is not regular.

Therefore, our initial assumption about L being regular is wrong. Hence, L is non-regular.

Note that here, in effect, we used 0i1 in place of w such that wR became 10i in L”. To obtain a regular language with the structure of L”, we just obviate the need of two numbers (representing length of the two segments) being equal.

In similar fashion it can be shown that the languages ww or wwC (where wC is the complement of w obtained by replacing 0 with 1 and vice versa) or any such combination are not regular. We can use the above cheat code of  using 0i1 in place of w, then wC becomes 1i0 in L”.

## Epilogue:

Please pardon the verbosity, oversimplification, and baby-steps. This part of the post is also more relevant to undergraduate students. Congrats, you made through all the four parts!!.🙂

Cheers!