Is the language of pairs of equal-length words whose Hamming distance is 2 or more context-free?

Is the following language context-free?

$L = \{ uxvy \mid u,v,x,y \in \{ 0,1 \}^+, |u| = |v|, u \neq v, |x| = |y|, x \neq y\}$

As sdcvvc pointed out, a word in this language can also be described as the concatenation of two words of the same length whose Hamming distance is 2 or more.
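The equivalence of the two descriptions can be checked by brute force on short words. The following is only a sketch: the function names (`in_L_by_split`, `in_L_by_distance`, `definitions_agree`) are made up for illustration, and an exhaustive check up to a small length is a sanity test, not a proof.

```python
from itertools import product


def hamming(u, v):
    """Number of positions where the equal-length strings u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def in_L_by_split(w):
    """Original definition: w = u x v y with |u| = |v| >= 1, |x| = |y| >= 1,
    u != v and x != y."""
    n = len(w)
    if n % 2 or n < 4:
        return False
    half = n // 2
    for k in range(1, half):  # k = |u| = |v|, so |x| = |y| = half - k
        u, x = w[:k], w[k:half]
        v, y = w[half:half + k], w[half + k:]
        if u != v and x != y:
            return True
    return False


def in_L_by_distance(w):
    """Characterization via Hamming distance: the two halves differ in >= 2 places."""
    half, odd = divmod(len(w), 2)
    return odd == 0 and hamming(w[:half], w[half:]) >= 2


def definitions_agree(max_len):
    """Compare both tests on every binary word of length <= max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if in_L_by_split(w) != in_L_by_distance(w):
                return False
    return True


print(definitions_agree(10))
```

The split point `k` only has to fall between the first and the second differing position, which is why the two tests coincide.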

I think it is not context-free, but I am having a hard time proving it. I tried intersecting this language with a regular language (for example $0^*1^*0^*1^*$) and then using the pumping lemma and/or homomorphisms, but I always end up with a language that is too complicated to characterize and write down.

— Robert777

Have you tried pumping the string $0^u1^x1^u0^x$?

— Pål GD

Yes, but I did not manage to pump this string out of the language (which does not mean it is impossible, only that I did not manage to do it).

— Robert777

@PålGD, you probably need a way to "mark" the pieces, for example $1^u 0 1^x 0 1^u 0 1^x 0$

— vonbrand

This language can be written as $\{uv:|u|=|v|,d(u,v) \geq 2\}$, where $d$ is the Hamming distance. Note that if we replace 2 with 1, the language is context-free ( cs.stackexchange.com/questions/307 ), but the trick used there will not work here. Personally, I would bet that it is not context-free.

— sdcvvc

@sdcvvc: You are right, one splits $u$ as $u'x$ so that one of the differing bits is in $u'$ and the other in $x$. I stand corrected.

— András Salamon

Answers:

Note [2019-07-30]: The proof is wrong ... the question is more complicated than it seems.

After a failed attempt, here is another idea.

If we intersect $L$ with the regular language $L_{reg} = 0^*10^*10^*10^*$, we get a CF language.

Perhaps we have more luck if we use $L_{reg}' = 0^*10^*10^*10^*10^*$ (strings with exactly four 1s).

Let $L_1 = L \cap L_{reg}'$. Informally, $w \in L_1$ if it can be split into two halves such that one half contains exactly 0, 1, 3 or 4 of the four 1s, or both halves contain two 1s each but their positions do not match.

Suppose $L_1$ is CF, let $G$ be a grammar for it in Chomsky normal form, and let

$w = uv = 0^a 1 0^b 1 0^c 1 0^d 1 0^e \in L_1$

We have $|u|=|v|$ (so $w$ has even length) and $d(u,v) \geq 2$.

If we restrict our attention to the ways in which the four 1s of $w$ can be generated, we have three cases, shown at the top of Figure 1. The central part of Figure 1 shows the first case (the others are similar).

Figure 1 (the full image can be downloaded here)

If we choose $a=e$, $c=2a$ and $b,d \gg a$, we see that the zeros between the two pairs of 1s must be independently pumpable (red nodes in the figure). In particular, for a large enough $b \gg a$, we get either a duplicated nonterminal node in the inner subtree (node X in Figure 2) or a repeated subsequence on the path towards the first or second 1 (node Y in Figure 2). Note that Figure 2 is slightly simplified: there can be more nonterminal nodes between the two $X$s, and also between the two $Y$s ($Y\to ... \to Z_i \to ... Y$, but each $Z_i$ produces only 0s to the right of the first 1).

Figure 2

So we can fix an arbitrary $a = e = k$, $c = 2a$, and then choose a large enough $b$ to obtain an independently pumpable node in the sequence of zeros between the first and the second 1. For the sequence of zeros between the third and the fourth 1 we can choose $d = b! + b$.
But $0^b$ is independently pumpable, so there is a pumpable substring $y$ of length $p \leq b$, i.e. one such that $0^b = xyz$, $|y|=p$, $|x|\geq 0$, $|z|\geq 0$, and $|xy^iz| = b!+b$ for $i = b!/p + 1$ (an integer, because $p \leq b$ divides $b!$). The string we get is:

$w' = 0^k 1 0^{b!+b} 1 0^{2k} 1 0^{b!+b} 1 0^k$

and $w' \notin L_1$. Hence $L_1$ is not CF, and finally $L$ is not CF.
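The membership claims for the starting string and for the pumped string $w'$ can be checked directly on small parameters. This is a minimal sketch with hypothetical helper names (`word`, `in_L1`, `start_and_pumped`) and arbitrarily chosen values of $k$ and $b$. It only confirms where the two strings lie; it says nothing about whether the pumping step itself is justified.

```python
import re
from math import factorial

# Strings with exactly four 1s, i.e. the regular language 0*10*10*10*10*.
FOUR_ONES = re.compile(r"0*10*10*10*10*")


def hamming(u, v):
    """Number of positions where the equal-length strings u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def in_L(w):
    """Even length and the two halves differ in at least two positions."""
    half, odd = divmod(len(w), 2)
    return odd == 0 and hamming(w[:half], w[half:]) >= 2


def in_L1(w):
    """Membership in the intersection of L with 0*10*10*10*10*."""
    return FOUR_ONES.fullmatch(w) is not None and in_L(w)


def word(a, b, c, d, e):
    """Build 0^a 1 0^b 1 0^c 1 0^d 1 0^e."""
    return "0" * a + "1" + "0" * b + "1" + "0" * c + "1" + "0" * d + "1" + "0" * e


def start_and_pumped(k, b):
    """Return (w, w') with a = e = k, c = 2k, d = b! + b; in w' the first
    block of zeros has been grown from b to b! + b."""
    target = factorial(b) + b
    return word(k, b, 2 * k, target, k), word(k, target, 2 * k, target, k)


for k, b in [(1, 3), (2, 4)]:
    w, w_pumped = start_and_pumped(k, b)
    print(k, b, in_L1(w), in_L1(w_pumped))
```

In both sample cases the starting string is in $L_1$, while the two halves of $w'$ are identical, so $w'$ is not.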

If the proof is correct (???), it can be extended to every language $L_k = \{ uv : |u|=|v|, d(u,v)\geq k\}, k\geq 2$

— Vor

I am afraid the bounty will expire before we are able to verify this proof, so unless some drastic information turns up within the next 4 hours, the points will go to the best attempt so far.

— jmite

@jmite: don't worry, there is a good chance that this is a wrong attempt, just like the previous one (which lasted about 30 minutes before a trivial error was found) :-) :-)

— Vor

Why the case distinction? The branches of the grammar have no relation to the halves of the word. But I think it does not matter; if the proof works, the case distinction is not needed. Looking at the assumed grammar and using the proof of the pumping lemma instead of the lemma itself is a nice trick (it should be done more often). I have one (real) issue: if you pump a substring of $0^b$, you get $0^{b+p(i-1)}$; I do not see how you get to $b+b!$. I do not think this should hurt the proof, but better check. You may also want to straighten out the notation (and the typos).

— Raphael

@Raphael: thanks for the comments. Maybe I am wrong, but if you choose the target length $b+b!$, then for every pumping length $p$ the string $0^b$ can be decomposed as $0^{xyz}, (|xyz|=b, |y|=p \leq b)$ and can be pumped to $xy^iz = b + b!$; indeed, in your example $p$ certainly divides $b!$, so there exists $(i-1)$ for which $p(i-1)=b!$, but the original length of the string is $b$, so the total pumped length is $|xy^{(i-1)}z| = b+b!$. I remember this from a few exercises that use Ogden's lemma ... I will double-check them now.

— Vor

@Raphael: ... I didn't find the proof anywhere but only a paper by Zach Tomaszewski that proves that the complement of $L_{dup} = \{ ww \}$ is CF (see question), so perhaps it is a new result (though simple); and a pumping-lemma-style theorem can be derived for languages with strings that contain a finite number of a particular symbol and substrings of arbitrary length between them.

— Vor

After 2 failed attempts, that were disproved by @Hendrik Jan (thank you), here is another one, that is not more successful. @Vor found an example of a deterministic CF language where the same construction would apply, if correct. This allowed identifying an error in the anchoring of the $y$ string in the application of the lemma. The lemma itself does not seem at fault. This is clearly too simplistic a construction. See more details in the comments.

The language $L = \{ uxvy \mid u,v,x,y \in \{ 0,1 \}^* \setminus \{ \epsilon \},\ |u| = |v|,\ u \neq v,\ |x| = |y|,\ x \neq y \}$ is not Context-Free.

It is helpful to keep in mind the characterization $L= \{uv:|u|=|v|,d(u,v) \geq 2\}$ where d is the Hamming distance, proposed by @sdcvvc. What one needs to think about are 2 selected positions in each half string such that the corresponding symbols differ.

Then you consider a string $10^i10^j$ such that $i \lt j$ and $i+j$ is even. It is clearly in the language L, by cutting $u$ and $x$ anywhere between the two 1's. We want to pump that string on the first part between the 1's, so that it will become $10^j10^j$ which is not supposed to be in the language.
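Both membership claims about these strings are easy to confirm with the Hamming-distance characterization. A small sketch follows; the helper names `in_L` and `two_ones` and the sample values of $i$ and $j$ are chosen here purely for illustration.

```python
def hamming(u, v):
    """Number of positions where the equal-length strings u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def in_L(w):
    """Even length and the two halves differ in at least two positions."""
    half, odd = divmod(len(w), 2)
    return odd == 0 and hamming(w[:half], w[half:]) >= 2


def two_ones(i, j):
    """Build the string 1 0^i 1 0^j."""
    return "1" + "0" * i + "1" + "0" * j


# Sample pairs with i < j and i + j even.
for i, j in [(2, 4), (3, 7), (5, 9)]:
    print(i, j, in_L(two_ones(i, j)), in_L(two_ones(j, j)))
```

When $i \lt j$ and $i+j$ is even, both 1's fall in the first half and the second half is all 0's, so the distance is exactly 2. In $10^j10^j$ the two halves are identical.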

We first try to use Ogden's lemma, which is like the pumping lemma, but applies to $p$ or more distinguished symbols that are marked on the string, $p$ being the pumping length for marked symbols (but the lemma can pump more because it can pump also unmarked symbols). The pumping marked-length $p$ depends only on the language. This attempt will fail, but the failure will be a hint.

We can then choose $i=p$ and we mark symbols on the first sequence of $i$ 0's. We know that neither of the two 1's will be in the pump, because the lemma also allows pumping out (exponent 0) instead of pumping in, and pumping out a 1 would get us out of the language.

However, we could be pumping on both sides of the second 1 as fast or even faster on the right side, so that the second 1 would never get across the middle of the string. Also Ogden's lemma does not fix an upper limit to the size of what is being pumped, so that it is not possible to organize the pumping to get the rightmost 1 exactly across the middle of the string.

We use a modified version of the lemma, here called Nash's Lemma, which can handle these difficulties.

We first need a definition (it probably has another name in the literature, but I do not know which - help is welcome). A string $u$ is said to be an erasure of a string $v$ iff it is obtained from $v$ by erasing symbols in $v$. We write $u \prec v$.
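The erasure relation can be tested with a single left-to-right scan of $v$. The sketch below uses a hypothetical function name `is_erasure` and assumes that erasing zero symbols is allowed, so every string counts as an erasure of itself; the text above does not settle that point.

```python
def is_erasure(u, v):
    """True if u can be obtained from v by erasing symbols of v.

    Assumption: erasing no symbol at all is permitted, so is_erasure(v, v)
    holds. The scan consumes v once; each symbol of u must be found in the
    part of v that has not been consumed yet.
    """
    remaining = iter(v)
    return all(symbol in remaining for symbol in u)


print(is_erasure("00", "0100"))
print(is_erasure("11", "0100"))
```

The first call succeeds because two 0's can be kept from `0100`; the second fails because `0100` contains only one 1.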

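Nash's Lemma: If a language $L$ is context-free, then there exist two numbers $p\gt0$ and $q\gt 0$ such that for any string $w$ of length at least $p$ in $L$, and every way of marking $p$ or more of the positions in $w$, $w$ can be written as $w=uxyzv$ with strings $u$ , $x$ , $y$ , $z$ , $v$ , such that

$xz$ has at least one marked position,
$xyz$ has at most $p$ marked positions, and
there are three strings $\hat x$ , $\hat y$ , $\hat z$ such that
1. $\hat x \prec x$ , $\hat y \prec y$ , $\hat z \prec z$ ,
2. $1 \leq \mid \hat x \hat z \mid \leq q$ , $1 \leq \mid \hat y \mid \leq q$ , and
3. $ux^j\hat x^i\hat y\hat z^iz^jv$ is in $L$ for every $i \geq 0$ and for every $j \geq 0$ .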

Proof: Similar to the proof of Ogden's lemma, but the subtrees corresponding to the strings $y$ and $xz$ are pruned so that they do not contain any path with twice the same non-terminal (except for the roots of these two subtrees). This necessarily limits the size of the generated strings $\hat x\hat z$ and $\hat y$ by a constant $q$ . The strings $x^j$ and $z^j$ , for $j \geq 0$ , corresponding to an unpruned version of the tree, are used mainly with $j=1$ to simplify the accounting when the lemma is applied.

We modify the above proof attempt by marking the $p$ leftmost symbols 0, but they are followed by $2q$ symbols 0 to make sure that we pump in the left part of the string, between the two 1's. That makes a total of $i = p + 2q$ 0's between the 1's (actually $i = p + q$ would be sufficient, since the rightmost 1 cannot be in $\hat z$: otherwise pumping out would simply remove it).

What is left is to have chosen $j$ so that we can pump exactly the right number of 0's so that the two sequences are equal. But so far, the only constraint on $j$ is to be greater than $i$. And we also know that the number of 0's that are pumped at each pumping is between 1 and $q$. So let $h$ be the product of the first $q$ integers, $h = q!$. We choose $j=i+h$.

Hence, since the pumping increment $d$ - whatever it is - is in $[1,q]$ , it divides $h$ . Let $k$ be the quotient. If we pump exactly $k$ times, we get a string $10^j10^j$ which is not in the language. Hence L is not context-free.
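The arithmetic behind the choice $j = i + h$ can be tabulated for a small bound. The names `pumps_needed` and `always_reachable` are hypothetical, and the sketch only checks the counting step: whatever increment $d \in [1,q]$ occurs, $h/d$ pumps add exactly $h$ zeros.

```python
from math import factorial


def pumps_needed(q):
    """Map each possible increment d in [1, q] to the number k of pumps
    with k * d == h, where h = q! is the product of the first q integers."""
    h = factorial(q)
    return {d: h // d for d in range(1, q + 1)}


def always_reachable(i, q):
    """True if, starting from i zeros and adding d zeros per pump, the count
    j = i + q! is hit exactly for every increment d in [1, q]."""
    h = factorial(q)
    j = i + h
    return all(i + (h // d) * d == j for d in range(1, q + 1))


print(pumps_needed(4))
print(always_reachable(7, 5))
```

This works because every integer in $[1,q]$ divides $q!$, so the quotient $k = h/d$ is always an integer.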

I think that I shall never see
A string lovely as a tree.
For if it does not have a parse,
The string is naught but a farce

— babou

Note however that the pass over the second half reads the stack in reverse. That seems to mean that the two positions are in the same position in both halves, but in reverse?

— Hendrik Jan

you are correct ... I goofed ... now I know what was nagging me at the back of my head.

— babou

I recognized the argument (because I could not make it work when I tried myself).

— Hendrik Jan

Should I leave this wrong answer? It is somehow helping, I think, as it makes the problem suspiciously similar to ${a^ib^jc^ka^ib^jc^k}$. The problem is that rules of the site are not intended to encourage wrong results for discussion (I mean I do not enjoy downvotes more than anyone else).

— babou

@HendrikJan Did I goof again ? (BTW, thanks for making it a discussion)

— babou


by this question I think $L$ is context-free and generated by the following grammar $\qquad\begin{align} S &\to AXBY \mid BYAX \\ A &\to 0 \mid 0A0 \mid 0A1 \mid 1A0 \mid 1A1 \\ B &\to 1 \mid 0B0 \mid 0B1 \mid 1B0 \mid 1B1 \\ X &\to 0 \mid 0X0 \mid 0X1 \mid 1X0 \mid 1X1 \\ Y &\to 1 \mid 0Y0 \mid 0Y1 \mid 1Y0 \mid 1Y1 \\ \end{align}$

— M.K. Dadsetani

This is incorrect; you cannot guard that length of AX is the same as BY. For example, your grammar generates S -> AXBY -> A011 -> 0A1011 -> 001011 which is not in the original language. Also, your symbols A and X generate the same language, same for B and Y; they can be merged.

— sdcvvc
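The counterexample from the last comment can be reproduced mechanically. The sketch below relies on one observation about the proposed grammar: $A$ and $X$ generate exactly the odd-length strings whose middle symbol is 0, and $B$ and $Y$ those whose middle symbol is 1. The function names (`center_is`, `generated`, `in_L`) are made up for illustration.

```python
def hamming(u, v):
    """Number of positions where the equal-length strings u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def in_L(w):
    """Even length and the two halves differ in at least two positions."""
    half, odd = divmod(len(w), 2)
    return odd == 0 and hamming(w[:half], w[half:]) >= 2


def center_is(s, bit):
    """True if s has odd length and its middle symbol is `bit`
    (the language of A and X for bit '0', of B and Y for bit '1')."""
    return len(s) % 2 == 1 and s[len(s) // 2] == bit


def generated(w):
    """True if w derives from S -> AXBY | BYAX, by trying every way of
    cutting w into four non-empty parts."""
    n = len(w)
    for p in range(1, n):
        for q in range(p + 1, n):
            for r in range(q + 1, n):
                parts = (w[:p], w[p:q], w[q:r], w[r:])
                # "0011" encodes the centers for AXBY, "1100" for BYAX.
                for centers in ("0011", "1100"):
                    if all(center_is(s, c) for s, c in zip(parts, centers)):
                        return True
    return False


w = "001011"
print(generated(w), in_L(w))
```

The string `001011` splits as `001`, `0`, `1`, `1`, so the grammar generates it, while its halves `001` and `011` differ in only one position.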