1P5: Zmieniacz słów

Zostało to napisane w ramach First Periodic Premier Programming Puzzle Push .

Gra

Zapewnione jest słowo początkowe i końcowe o tej samej długości. Celem gry jest zmiana jednej litery w słowie początkowym, aby utworzyć inne prawidłowe słowo, powtarzanie tego kroku, aż do osiągnięcia słowa końcowego, przy użyciu jak najmniejszej liczby kroków. Na przykład, biorąc pod uwagę słowa TREE i FLED, wynikiem byłoby:

TREE
FREE
FLEE
FLED
2

Dane techniczne

Artykuły w Wikipedii dotyczące OWL lub SOWPODS mogą być przydatnym punktem wyjścia do listy słów.
Program powinien obsługiwać dwa sposoby wyboru słów początkowych i końcowych:
1. Określony przez użytkownika za pomocą wiersza polecenia, standardowego wejścia lub innego odpowiedniego dla wybranego języka (wystarczy wspomnieć o tym, co robisz).
2. Wybieranie losowo 2 słów z pliku.
Słowa początkowe i końcowe, a także wszystkie słowa pośrednie powinny mieć tę samą długość.
Każdy krok powinien być wydrukowany na linii.
Ostatnim wierszem danych wyjściowych powinna być liczba kroków pośrednich wymaganych do przejścia między słowami początkowymi i końcowymi.
Jeśli nie można znaleźć dopasowania między słowami początkowymi i końcowymi, wynik powinien składać się z 3 wierszy: słowa początkowego, słowa końcowego i słowa OY.
W odpowiedzi umieść oznaczenie Big O dla swojego rozwiązania
Dołącz 10 unikalnych par słów początkowych i końcowych (oczywiście z ich wynikiem), aby pokazać kroki, które wykonuje Twój program. (Aby zaoszczędzić miejsce, podczas gdy twój program powinien wypisywać je w poszczególnych wierszach, możesz skonsolidować je w jednym wierszu w celu opublikowania, zastępując nowe wiersze spacjami i przecinkiem między każdym uruchomieniem.

Cele / kryteria wygranej

Zwycięży najszybsze / najlepsze rozwiązanie Big O zapewniające najkrótsze tymczasowe kroki po tygodniu.
Jeśli remis wynika z kryteriów Big O, wygrywa najkrótszy kod.
Jeśli nadal będzie remis, wygra pierwsze rozwiązanie, które umożliwi najszybszą i najkrótszą wersję.

Testy / Wyjście próbki

DIVE
DIME
DAME
NAME
2

PEACE
PLACE
PLATE
SLATE
2

HOUSE
HORSE
GORSE
GORGE
2

POLE
POSE
POST
PAST
FAST
3

Uprawomocnienie

Pracuję nad skryptem, którego można użyć do sprawdzenia poprawności danych wyjściowych.

To będzie:

Upewnij się, że każde słowo jest poprawne.
Upewnij się, że każde słowo różni się dokładnie o 1 literę od poprzedniego słowa.

Nie będzie:

Sprawdź, czy zastosowano najmniejszą liczbę kroków.

Kiedy już to otrzymam, oczywiście zaktualizuję ten post. (:

code-challenge word-puzzle 1p5

— Rebecca Chernoff
źródło

Wydaje mi się dziwne, że wykonanie 3 operacji, aby dostać się HOUSEdo, GORGEjest zgłaszane jako 2. Zdaję sobie sprawę, że są 2 pośrednie słowa, więc ma to sens, ale # operacji byłoby bardziej intuicyjne.

— Mateusz

@Peter, według strony wikipedii sowpods jest ~ 15 000 słów dłuższych niż 13 liter

— gnibbler

Nie chcę być wszystkim znanym, ale łamigłówka faktycznie ma nazwę, została wymyślona przez Lewisa Carrolla en.wikipedia.org/wiki/Word_ladder

— st0le

Masz nierozstrzygalny cel w pytaniu: The fastest/best Big O solution producing the shortest interim steps after one week will win.Ponieważ nie możesz zagwarantować, że najszybszym rozwiązaniem jest w międzyczasie ten, który wykorzystuje najmniej kroków, powinieneś podać preferencję, jeśli jedno rozwiązanie używa mniejszej liczby kroków, ale osiągnie cel później.

— użytkownik nieznany

Chcę tylko potwierdzić BATi CATmam zero kroków, prawda?

— st0le

Odpowiedzi:

Ponieważ długość jest podana jako kryterium, oto wersja golfowa z 1681 znakami (prawdopodobnie nadal mogłaby być poprawiona o 10%):

import java.io.*;import java.util.*;public class W{public static void main(String[]
a)throws Exception{int n=a.length<1?5:a[0].length(),p,q;String f,t,l;S w=new S();Scanner
s=new Scanner(new
File("sowpods"));while(s.hasNext()){f=s.next();if(f.length()==n)w.add(f);}if(a.length<1){String[]x=w.toArray(new
String[0]);Random
r=new Random();q=x.length;p=r.nextInt(q);q=r.nextInt(q-1);f=x[p];t=x[p>q?q:q+1];}else{f=a[0];t=a[1];}H<S>
A=new H(),B=new H(),C=new H();for(String W:w){A.put(W,new
S());for(p=0;p<n;p++){char[]c=W.toCharArray();c[p]='.';l=new
String(c);A.get(W).add(l);S z=B.get(l);if(z==null)B.put(l,z=new
S());z.add(W);}}for(String W:A.keySet()){C.put(W,w=new S());for(String
L:A.get(W))for(String b:B.get(L))if(b!=W)w.add(b);}N m,o,ñ;H<N> N=new H();N.put(f,m=new
N(f,t));N.put(t,o=new N(t,t));m.k=0;N[]H=new
N[3];H[0]=m;p=H[0].h;while(0<1){if(H[0]==null){if(H[1]==H[2])break;H[0]=H[1];H[1]=H[2];H[2]=null;p++;continue;}if(p>=o.k-1)break;m=H[0];H[0]=m.x();if(H[0]==m)H[0]=null;for(String
v:C.get(m.s)){ñ=N.get(v);if(ñ==null)N.put(v,ñ=new N(v,t));if(m.k+1<ñ.k){if(ñ.k<ñ.I){q=ñ.k+ñ.h-p;N
Ñ=ñ.x();if(H[q]==ñ)H[q]=Ñ==ñ?null:Ñ;}ñ.b=m;ñ.k=m.k+1;q=ñ.k+ñ.h-p;if(H[q]==null)H[q]=ñ;else{ñ.n=H[q];ñ.p=ñ.n.p;ñ.n.p=ñ.p.n=ñ;}}}}if(o.b==null)System.out.println(f+"\n"+t+"\nOY");else{String[]P=new
String[o.k+2];P[o.k+1]=o.k-1+"";m=o;for(q=m.k;q>=0;q--){P[q]=m.s;m=m.b;}for(String
W:P)System.out.println(W);}}}class N{String s;int k,h,I=(1<<30)-1;N b,p,n;N(String S,String
d){s=S;for(k=0;k<d.length();k++)if(d.charAt(k)!=S.charAt(k))h++;k=I;p=n=this;}N
x(){N r=n;n.p=p;p.n=n;n=p=this;return r;}}class S extends HashSet<String>{}class H<V>extends
HashMap<String,V>{}

Wersja bez golfa, która używa nazw pakietów i metod i nie daje ostrzeżeń ani nie rozszerza klas tylko dla ich aliasu, to:

package com.akshor.pjt33;

import java.io.*;
import java.util.*;

// WordLadder partially golfed and with reduced dependencies
//
// Variables used in complexity analysis:
// n is the word length
// V is the number of words (vertex count of the graph)
// E is the number of edges
// hash is the cost of a hash insert / lookup - I will assume it's constant, but without completely brushing it under the carpet
public class WordLadder2
{
    private Map<String, Set<String>> wordsToWords = new HashMap<String, Set<String>>();

    // Initialisation cost: O(V * n * (n + hash) + E * hash)
    private WordLadder2(Set<String> words)
    {
        Map<String, Set<String>> wordsToLinks = new HashMap<String, Set<String>>();
        Map<String, Set<String>> linksToWords = new HashMap<String, Set<String>>();

        // Cost: O(Vn * (n + hash))
        for (String word : words)
        {
            // Cost: O(n*(n + hash))
            for (int i = 0; i < word.length(); i++)
            {
                // Cost: O(n + hash)
                char[] ch = word.toCharArray();
                ch[i] = '.';
                String link = new String(ch).intern();
                add(wordsToLinks, word, link);
                add(linksToWords, link, word);
            }
        }

        // Cost: O(V * n * hash + E * hash)
        for (Map.Entry<String, Set<String>> from : wordsToLinks.entrySet()) {
            String src = from.getKey();
            wordsToWords.put(src, new HashSet<String>());
            for (String link : from.getValue()) {
                Set<String> to = linksToWords.get(link);
                for (String snk : to) {
                    // Note: equality test is safe here. Cost is O(hash)
                    if (snk != src) add(wordsToWords, src, snk);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        // Cost: O(filelength + num_words * hash)
        Map<Integer, Set<String>> wordsByLength = new HashMap<Integer, Set<String>>();
        BufferedReader br = new BufferedReader(new FileReader("sowpods"), 8192);
        String line;
        while ((line = br.readLine()) != null) add(wordsByLength, line.length(), line);

        if (args.length == 2) {
            String from = args[0].toUpperCase();
            String to = args[1].toUpperCase();
            new WordLadder2(wordsByLength.get(from.length())).findPath(from, to);
        }
        else {
            // 5-letter words are the most interesting.
            String[] _5 = wordsByLength.get(5).toArray(new String[0]);
            Random rnd = new Random();
            int f = rnd.nextInt(_5.length), g = rnd.nextInt(_5.length - 1);
            if (g >= f) g++;
            new WordLadder2(wordsByLength.get(5)).findPath(_5[f], _5[g]);
        }
    }

    // O(E * hash)
    private void findPath(String start, String dest) {
        Node startNode = new Node(start, dest);
        startNode.cost = 0; startNode.backpointer = startNode;

        Node endNode = new Node(dest, dest);

        // Node lookup
        Map<String, Node> nodes = new HashMap<String, Node>();
        nodes.put(start, startNode);
        nodes.put(dest, endNode);

        // Heap
        Node[] heap = new Node[3];
        heap[0] = startNode;
        int base = heap[0].heuristic;

        // O(E * hash)
        while (true) {
            if (heap[0] == null) {
                if (heap[1] == heap[2]) break;
                heap[0] = heap[1]; heap[1] = heap[2]; heap[2] = null; base++;
                continue;
            }

            // If the lowest cost isn't at least 1 less than the current cost for the destination,
            // it can't improve the best path to the destination.
            if (base >= endNode.cost - 1) break;

            // Get the cheapest node from the heap.
            Node v0 = heap[0];
            heap[0] = v0.remove();
            if (heap[0] == v0) heap[0] = null;

            // Relax the edges from v0.
            int g_v0 = v0.cost;
            // O(hash * #neighbours)
            for (String v1Str : wordsToWords.get(v0.key))
            {
                Node v1 = nodes.get(v1Str);
                if (v1 == null) {
                    v1 = new Node(v1Str, dest);
                    nodes.put(v1Str, v1);
                }

                // If it's an improvement, use it.
                if (g_v0 + 1 < v1.cost)
                {
                    // Update the heap.
                    if (v1.cost < Node.INFINITY)
                    {
                        int bucket = v1.cost + v1.heuristic - base;
                        Node t = v1.remove();
                        if (heap[bucket] == v1) heap[bucket] = t == v1 ? null : t;
                    }

                    // Next update the backpointer and the costs map.
                    v1.backpointer = v0;
                    v1.cost = g_v0 + 1;

                    int bucket = v1.cost + v1.heuristic - base;
                    if (heap[bucket] == null) {
                        heap[bucket] = v1;
                    }
                    else {
                        v1.next = heap[bucket];
                        v1.prev = v1.next.prev;
                        v1.next.prev = v1.prev.next = v1;
                    }
                }
            }
        }

        if (endNode.backpointer == null) {
            System.out.println(start);
            System.out.println(dest);
            System.out.println("OY");
        }
        else {
            String[] path = new String[endNode.cost + 1];
            Node t = endNode;
            for (int i = t.cost; i >= 0; i--) {
                path[i] = t.key;
                t = t.backpointer;
            }
            for (String str : path) System.out.println(str);
            System.out.println(path.length - 2);
        }
    }

    private static <K, V> void add(Map<K, Set<V>> map, K key, V value) {
        Set<V> vals = map.get(key);
        if (vals == null) map.put(key, vals = new HashSet<V>());
        vals.add(value);
    }

    private static class Node
    {
        public static int INFINITY = Integer.MAX_VALUE >> 1;

        public String key;
        public int cost;
        public int heuristic;
        public Node backpointer;

        public Node prev = this;
        public Node next = this;

        public Node(String key, String dest) {
            this.key = key;
            cost = INFINITY;
            for (int i = 0; i < dest.length(); i++) if (dest.charAt(i) != key.charAt(i)) heuristic++;
        }

        public Node remove() {
            Node rv = next;
            next.prev = prev;
            prev.next = next;
            next = prev = this;
            return rv;
        }
    }
}

Jak widać, analiza kosztów bieżących jest O(filelength + num_words * hash + V * n * (n + hash) + E * hash). Jeśli zaakceptujesz moje założenie, że wstawianie / wyszukiwanie tablicy skrótów jest ciągłym czasem, to znaczy O(filelength + V n^2 + E). Szczegółowe statystyki wykresów w SOWPODS oznaczają, że O(V n^2)naprawdę dominuje ono O(E)dla większości n.

Przykładowe wyniki:

IDOLA, IDOLE, IDYLS, ODYLS, ODALS, OVALS, FOLIE, PIEKARNIKI, WIECZORY, ETENS, STENS, SKENS, SKÓRY, SPINY, KRĘGOSŁUP, 13

WICCA, PROSY, OY

BRINY, BRINS, TRINS, TAINS, TARNS, YARNS, YAWNS, YAWPS, YAPPS, 7

GALESY, GAZY, GASTY, GESTY, GESTE, GESSE, DESSE, 5

SURES, DURES, DUNES, DINES, DINGS, DINGY, 4

LICHT, LIGHT, BIGHT, BIGOT, BIGOS, BIROS, GIROS, GIRNS, GURNS, GUANS, GUANA, RUANA, 10

SARGE, SERGE, SERRE, SERRS, SEERS, DEERS, DYERS, OYERS, OVERS, OVELS, OVALS, ODALS, ODYLS, IDYLS, 12

KEIRY, SEIRS, SEERS, PIWA, BRERS, BRERE, BREME, CREME, CREPE, 7

Jest to jedna z 6 par o najdłuższej najkrótszej ścieżce:

GAINEST, FAINEST, FAIREST, SAIREST, SAIDEST, SADDEST, MADDEST, MIDDEST, MILDEST, WILDEST, WILIEST, WALIEST, WANIEST, CANIEST, CANTEST, KONKURS, KONFEST, CONFESS, CONFERS, CONKERS, KOOKERY, POPOPI, POPIERS, MIEDŹ MIESZANKI, MĄKI, POPSIE, MIĘSKI, MUZY, MUSZKI, MASY, PLUSZKI, MISZKI, PRASY, PRASY, PRÓBY, MOBILE, LĘKNIĘCIA, NIEPOKÓJ, NIEZBĘDNE, NIEZALEŻNE, NIEZBĘDNE, NIEZBĘDNE, NIEZMODNIONE, ZNOWIONE, ZNOWIONE, NIEDRODOWANE, ZAKOŃCZONE. INDEKSY, INDENE, INDENTACJE, INCENTY, INCESTY, INFESTY, INFEKTY, INIEKTY, 56

I jedna z najgorzej rozpuszczalnych 8-literowych par:

WYGŁADZANIE, ROZKŁADANIE, ROZKŁADANIE, ODKRYWANIE, ODKRYWANIE, ODKRYWANIE, ODKRYWANIE, WYGŁADZANIE, WYKRAWANIE, USUWANIE, USUWANIE, WYKŁADANIE, UTWORZANIE, WYKŁADANIE, OPRYSKIWANIE, MODLOWANIE, STROYING, STROKING, STUMK, ZESPÓŁ, STUMK ZACISKANIE, ZACISKANIE, CRISPINY, CRISPENS, CRISPERS, CRIMPERS, CRAMPERS, CLAMPERS, CLASPERS, CLASHERS, SLASHERS, SLATHERS, SLITHERS, SMITHERS, SMOTHERS, SOOTHERS, SOUTHERS, MOUTHERS, MOUCHERS PACHER, COACH LUNCHERY, LYNCHERY, LYNCHETY, LINCHETY, 52

Teraz, gdy wydaje mi się, że mam wszystkie wymagania pytania na bok, moja dyskusja.

W przypadku CompSci pytanie oczywiście sprowadza się do najkrótszej ścieżki na wykresie G, którego wierzchołki są słowami i których krawędzie łączą słowa różniące się jedną literą. Efektywne generowanie wykresu nie jest trywialne - mam pomysł, że muszę ponownie odwiedzić, aby zmniejszyć złożoność do O (Vn hash + E). Sposób, w jaki to robię, polega na utworzeniu wykresu, który wstawia dodatkowe wierzchołki (odpowiadające słowom z jednym znakiem wieloznacznym) i jest homeomorficzny dla danego wykresu. Zastanawiałem się nad użyciem tego wykresu zamiast zmniejszania do G - i przypuszczam, że z golfowego punktu widzenia powinienem to zrobić - na podstawie tego, że węzeł wieloznaczny z więcej niż 3 krawędziami zmniejsza liczbę krawędzi na wykresie, a standardowy najgorszy przypadek czasu działania algorytmów najkrótszej ścieżki to O(V heap-op + E).

Jednak pierwszą rzeczą, jaką zrobiłem, było przeprowadzenie analizy wykresów G dla różnych długości słów i odkryłem, że są one bardzo rzadkie dla słów o długości 5 lub więcej liter. 5-literowy wykres ma 12478 wierzchołków i 40759 krawędzi; dodanie węzłów łącza pogarsza wykres. Do czasu, gdy masz do 8 liter, jest mniej krawędzi niż węzłów, a 3/7 słów jest „na uboczu”. Odrzuciłem więc pomysł optymalizacji, ponieważ nie był zbyt pomocny.

Pomysł, który okazał się pomocny, to zbadanie stosu. Mogę szczerze powiedzieć, że w przeszłości stosowałem umiarkowanie egzotyczne sterty, ale żadne z nich nie było tak egzotyczne. Używam gwiazdki A (ponieważ C nie daje korzyści, biorąc pod uwagę stos, którego używam) z oczywistą heurystyką liczby liter różnych od celu, a trochę analizy pokazuje, że w dowolnym momencie nie ma więcej niż 3 różnych priorytetów w kupie. Kiedy wbijam węzeł, którego priorytetem jest (koszt + heurystyka) i patrzę na jego sąsiadów, rozważam trzy przypadki: 1) koszt sąsiada to koszt + 1; heurystyka sąsiada jest heurystyczna-1 (ponieważ zmieniana litera staje się „poprawna”); 2) koszt + 1 i heurystyka + 0 (ponieważ litera, którą zmienia, zmienia się z „zła” na „nadal zła”; 3) koszt + 1 i heurystyka + 1 (ponieważ litera, którą zmienia, zmienia się z „poprawna” na „zła”). Więc jeśli zrelaksuję sąsiada, wstawię go z tym samym priorytetem, priorytetem + 1 lub priorytetem + 2. W rezultacie mogę użyć 3-elementowej tablicy połączonych list dla sterty.

Powinienem dodać notatkę o moim założeniu, że wyszukiwania skrótów są stałe. Można powiedzieć, że dobrze, ale co z obliczeniami mieszania? Odpowiedź brzmi: amortyzuję je: java.lang.Stringbuforuję hashCode(), więc całkowity czas spędzony na obliczaniu wartości skrótu wynosi O(V n^2)(przy generowaniu wykresu).

Jest jeszcze jedna zmiana, która wpływa na złożoność, ale pytanie, czy jest to optymalizacja, czy nie, zależy od twoich założeń dotyczących statystyki. (IMO podając jako kryterium „najlepsze rozwiązanie Big O” jest błędem, ponieważ nie ma najlepszej złożoności z prostego powodu: nie ma jednej zmiennej). Ta zmiana wpływa na krok generowania wykresu. W powyższym kodzie jest to:

        Map<String, Set<String>> wordsToLinks = new HashMap<String, Set<String>>();
        Map<String, Set<String>> linksToWords = new HashMap<String, Set<String>>();

        // Cost: O(Vn * (n + hash))
        for (String word : words)
        {
            // Cost: O(n*(n + hash))
            for (int i = 0; i < word.length(); i++)
            {
                // Cost: O(n + hash)
                char[] ch = word.toCharArray();
                ch[i] = '.';
                String link = new String(ch).intern();
                add(wordsToLinks, word, link);
                add(linksToWords, link, word);
            }
        }

        // Cost: O(V * n * hash + E * hash)
        for (Map.Entry<String, Set<String>> from : wordsToLinks.entrySet()) {
            String src = from.getKey();
            wordsToWords.put(src, new HashSet<String>());
            for (String link : from.getValue()) {
                Set<String> to = linksToWords.get(link);
                for (String snk : to) {
                    // Note: equality test is safe here. Cost is O(hash)
                    if (snk != src) add(wordsToWords, src, snk);
                }
            }
        }

To O(V * n * (n + hash) + E * hash). Ale O(V * n^2)część pochodzi z wygenerowania nowego ciągu n-znaków dla każdego łącza, a następnie obliczenia jego kodu mieszającego. Można tego uniknąć dzięki klasie pomocniczej:

    private static class Link
    {
        private String str;
        private int hash;
        private int missingIdx;

        public Link(String str, int hash, int missingIdx) {
            this.str = str;
            this.hash = hash;
            this.missingIdx = missingIdx;
        }

        @Override
        public int hashCode() { return hash; }

        @Override
        public boolean equals(Object obj) {
            Link l = (Link)obj; // Unsafe, but I know the contexts where I'm using this class...
            if (this == l) return true; // Essential
            if (hash != l.hash || missingIdx != l.missingIdx) return false;
            for (int i = 0; i < str.length(); i++) {
                if (i != missingIdx && str.charAt(i) != l.str.charAt(i)) return false;
            }
            return true;
        }
    }

Następnie staje się pierwsza połowa generowania wykresu

        Map<String, Set<Link>> wordsToLinks = new HashMap<String, Set<Link>>();
        Map<Link, Set<String>> linksToWords = new HashMap<Link, Set<String>>();

        // Cost: O(V * n * hash)
        for (String word : words)
        {
            // apidoc: The hash code for a String object is computed as
            // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
            // Cost: O(n * hash)
            int hashCode = word.hashCode();
            int pow = 1;
            for (int j = word.length() - 1; j >= 0; j--) {
                Link link = new Link(word, hashCode - word.charAt(j) * pow, j);
                add(wordsToLinks, word, link);
                add(linksToWords, link, word);
                pow *= 31;
            }
        }

Korzystając ze struktury skrótu, możemy wygenerować linki O(V * n). Ma to jednak efekt domina. Nieodłącznym elementem mojego założenia, że wyszukiwania skrótów są ciągłe, jest założenie, że porównywanie obiektów pod kątem równości jest tanie. Jednak test równości Link jest O(n)w najgorszym przypadku. Najgorszym przypadkiem jest zderzenie mieszające między dwoma równymi linkami wygenerowanymi z różnych słów - tj. Występuje to O(E)w drugiej połowie generowania wykresu. Poza tym, z wyjątkiem mało prawdopodobnego przypadku kolizji skrótu między nierównymi linkami, jesteśmy dobrzy. Więc wymieniliśmy się O(V * n^2)na O(E * n * hash). Zobacz mój poprzedni punkt na temat statystyk.

— Peter Taylor
źródło

Wierzę, że 8192 jest domyślnym rozmiarem bufora dla BufferedReader (na SunVM)

— st0le

@ st0le, pominąłem ten parametr w wersji golfowej i nie szkodzi to w wersji bez golfa.

— Peter Taylor

Jawa

Złożoność : ?? (Nie mam stopnia CompSci, więc byłbym wdzięczny za pomoc w tej sprawie.)

Wprowadzanie : Podaj parę słów (więcej niż 1 parę, jeśli chcesz) w wierszu polecenia. Jeśli nie podano linii poleceń, wybierane są 2 różne losowe słowa.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class M {

    // for memoization
    private static Map<String, List<String>> memoEdits = new HashMap<String, List<String>>(); 
    private static Set<String> dict;

    private static List<String> edits(String word, Set<String> dict) {
        if(memoEdits.containsKey(word))
            return memoEdits.get(word);

        List<String> editsList = new LinkedList<String>();
        char[] letters = word.toCharArray();
        for(int i = 0; i < letters.length; i++) {
            char hold = letters[i];
            for(char ch = 'A'; ch <= 'Z'; ch++) {
                if(ch != hold) {
                    letters[i] = ch;
                    String nWord = new String(letters);
                    if(dict.contains(nWord)) {
                        editsList.add(nWord);
                    }
                }
            }
            letters[i] = hold;
        }
        memoEdits.put(word, editsList);
        return editsList;
    }

    private static Map<String, String> bfs(String wordFrom, String wordTo,
                                           Set<String> dict) {
        Set<String> visited = new HashSet<String>();
        List<String> queue = new LinkedList<String>();
        Map<String, String> pred = new HashMap<String, String>();
        queue.add(wordFrom);
        while(!queue.isEmpty()) {
            String word = queue.remove(0);
            if(word.equals(wordTo))
                break;

            for(String nWord: edits(word, dict)) {
                if(!visited.contains(nWord)) {
                    queue.add(nWord);
                    visited.add(nWord);
                    pred.put(nWord, word);
                }
            }
        }
        return pred;
    }

    public static void printPath(String wordTo, String wordFrom) {
        int c = 0;
        Map<String, String> pred = bfs(wordFrom, wordTo, dict);
        do {
            System.out.println(wordTo);
            c++;
            wordTo = pred.get(wordTo);
        }
        while(wordTo != null && !wordFrom.equals(wordTo));
        System.out.println(wordFrom);
        if(wordTo != null)
            System.out.println(c - 1);
        else
            System.out.println("OY");
        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        BufferedReader scan = new BufferedReader(new FileReader(new File("c:\\332609\\dict.txt")),
                                                 40 * 1024);
        String line;
        dict = new HashSet<String>(); //the dictionary (1 word per line)
        while((line = scan.readLine()) != null) {
            dict.add(line);
        }
        scan.close();
        if(args.length == 0) { // No Command line Arguments? Pick 2 random
                               // words.
            Random r = new Random(System.currentTimeMillis());
            String[] words = dict.toArray(new String[dict.size()]);
            int x = r.nextInt(words.length), y = r.nextInt(words.length);
            while(x == y) //same word? that's not fun...
                y = r.nextInt(words.length);
            printPath(words[x], words[y]);
        }
        else { // Arguments provided, search for path pairwise
            for(int i = 0; i < args.length; i += 2) {
                if(i + 1 < args.length)
                    printPath(args[i], args[i + 1]);
            }
        }
    }
}

— st0le
źródło

Użyłem Memoization, aby uzyskać szybsze wyniki. Ścieżka słownika jest zakodowana na stałe.

— st0le

@Joey, kiedyś tak było, ale już nie. Teraz ma pole statyczne, które zwiększa za każdym razem i powiększa System.nanoTime().

— Peter Taylor

@Joey, aah, ok, ale zostawię to na razie, nie chcę zwiększać moich wersji: P

— st0le

och, btw, jestem w pracy i te scrabble strony są najwyraźniej zablokowane, więc nie mam dostępu do słowników ... wygeneruje te 10 unikalnych słów najlepiej jutro rano. Twoje zdrowie!

— st0le

Możesz zmniejszyć złożoność (obliczeniową), wykonując dwukierunkowe bfs, tj. Szukaj z obu stron i zatrzymaj się, gdy napotkasz węzeł odwiedzany z drugiej strony.

— Nabb

c na Uniksie

Korzystanie z algorytmu dijkstra.

Duża część kodu to kostiumowa implementacja drzewa n-ary, która służy do przechowywania

Lista słów (minimalizując w ten sposób liczbę odczytów pliku wejściowego (dwa razy dla braku argumentów, raz dla innych przypadków) przy założeniu, że IO pliku jest wolne
Częściowe drzewa, kiedy je budujemy.
Ostatnia ścieżka.

Ktoś ciekaw jak to działa prawdopodobnie powinien czytać findPath, processoraz processOne(i związane z nimi komentarze). A może buildPathi buildPartialPath. Reszta to księgowość i rusztowania. Pozostało kilka procedur używanych podczas testowania i rozwoju, ale nie w wersji „produkcyjnej”.

Używam /usr/share/dict/wordsna moim Mac OS 10.5 skrzynki, która ma tak wiele długich, ezoteryczne wpisy pozwalając działać całkowicie losowo generowanych jest wiele z OYs.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <getline.h>
#include <time.h>
#include <unistd.h>
#include <ctype.h>

const char*wordfile="/usr/share/dict/words";
/* const char*wordfile="./testwords.txt"; */
const long double RANDOM_MAX = (2LL<<31)-1;

typedef struct node_t {
  char*word;
  struct node_t*kids;
  struct node_t*next;
} node;


/* Return a pointer to a newly allocated node. If word is non-NULL, 
 * call setWordNode;
 */
node*newNode(char*word){
  node*n=malloc(sizeof(node));
  n->word=NULL;
  n->kids=NULL;
  n->next=NULL;
  if (word) n->word = strdup(word);
  return n;
}
/* We can use the "next" links to treat these as a simple linked list,
 * and further can make it a stack or queue by
 *
 * * pop()/deQueu() from the head
 * * push() onto the head
 * * enQueue at the back
 */
void push(node*n, node**list){
  if (list==NULL){
    fprintf(stderr,"Active operation on a NULL list! Exiting\n");
    exit(5);
  }
  n->next = (*list);
  (*list) = n;
}
void enQueue(node*n, node**list){
  if (list==NULL){
    fprintf(stderr,"Active operation on a NULL list! Exiting\n");
    exit(5);
  }
  if ( *list==NULL ) {
    *list=n;
  } else {
    enQueue(n,&((*list)->next));
  }
}
node*pop(node**list){
  node*temp=NULL;
  if (list==NULL){
    fprintf(stderr,"Active operation on a NULL list! Exiting\n");
    exit(5);
  }
  temp = *list;
  if (temp != NULL) {
    (*list) = temp->next;
    temp->next=NULL;
  }
  return temp;
}
node*deQueue(node**list){ /* Alias for pop */
  return pop(list);
}

/* return a pointer to a node in tree matching word or NULL if none */
node* isInTree(char*word, node*tree){
  node*isInNext=NULL;
  node*isInKids=NULL;
  if (tree==NULL || word==NULL) return NULL;
  if (tree->word && (0 == strcasecmp(word,tree->word))) return tree;
  /* prefer to find the target at shallow levels so check the siblings
     before the kids */
  if (tree->next && (isInNext=isInTree(word,tree->next))) return isInNext;
  if (tree->kids && (isInKids=isInTree(word,tree->kids))) return isInKids;
  return NULL;
}

node* freeTree(node*t){
  if (t==NULL) return NULL;
  if (t->word) {free(t->word); t->word=NULL;}
  if (t->next) t->next=freeTree(t->next);
  if (t->kids) t->kids=freeTree(t->kids);
  free(t);
  return NULL;
}

void printTree(node*t, int indent){
  int i;
  if (t==NULL) return;
  for (i=0; i<indent; i++) printf("\t"); printf("%s\n",t->word);
  printTree(t->kids,indent+1);
  printTree(t->next,indent);
}

/* count the letters of difference between two strings */
int countDiff(const char*w1, const char*w2){
  int count=0;
  if (w1==NULL || w2==NULL) return -1;
  while ( (*w1)!='\0' && (*w2)!='\0' ) {
    if ( (*w1)!=(*w2) ) count++;
    w1++;
    w2++;
  }
  return count;
}

node*buildPartialPath(char*stop, node*tree){
  node*list=NULL;
  while ( (tree != NULL) && 
      (tree->word != NULL) && 
      (0 != strcasecmp(tree->word,stop)) ) {
    node*kid=tree->kids;
    node*newN = newNode(tree->word);
    push(newN,&list);
    newN=NULL;
    /* walk over all all kids not leading to stop */
    while ( kid && 
        (strcasecmp(kid->word,stop)!=0) &&
        !isInTree(stop,kid->kids) ) {
      kid=kid->next;
    }
    if (kid==NULL) {
      /* Assuming a preconditions where isInTree(stop,tree), we should
       * not be able to get here...
       */
      fprintf(stderr,"Unpossible!\n");
      exit(7);
    } 
    /* Here we've found a node that either *is* the target or leads to it */
    if (strcasecmp(stop,kid->word) == 0) {
      break;
    }
    tree = kid;
  }
  return list; 
}
/* build a node list path 
 *
 * We can walk down each tree, identfying nodes as we go
 */
node*buildPath(char*pivot,node*frontTree,node*backTree){
  node*front=buildPartialPath(pivot,frontTree);
  node*back=buildPartialPath(pivot,backTree);
  /* weld them together with pivot in between 
  *
  * The front list is in reverse order, the back list in order
  */
  node*thePath=NULL;
  while (front != NULL) {
    node*n=pop(&front);
    push(n,&thePath);
  }
  if (pivot != NULL) {
    node*n=newNode(pivot);
    enQueue(n,&thePath);
  }
  while (back != NULL) {
    node*n=pop(&back);
    enQueue(n,&thePath);
  }
  return thePath;
}

/* Add new child nodes to the single node in ts named by word. Also
 * queue these new word in q
 * 
 * Find node N matching word in ts
 * For tword in wordList
 *    if (tword is one change from word) AND (tword not in ts)
 *        add tword to N.kids
 *        add tword to q
 *        if tword in to
 *           return tword
 * return NULL
 */
char* processOne(char *word, node**q, node**ts, node**to, node*wordList){
  if ( word==NULL || q==NULL || ts==NULL || to==NULL || wordList==NULL ) {
    fprintf(stderr,"ProcessOne called with NULL argument! Exiting.\n");
    exit(9);
  }
  char*result=NULL;
  /* There should be a node in ts matching the leading node of q, find it */
  node*here = isInTree(word,*ts);
  /* Now test each word in the list as a possible child of HERE */
  while (wordList != NULL) {
    char *tword=wordList->word;
    if ((1==countDiff(word,tword)) && !isInTree(tword,*ts)) {
      /* Queue this up as a child AND for further processing */
      node*newN=newNode(tword);
      enQueue(newN,&(here->kids));
      newN=newNode(tword);
      enQueue(newN,q);
      /* This might be our pivot */
      if ( isInTree(tword,*to) ) {
    /* we have found a node that is in both trees */
    result=strdup(tword);
    return result;
      }
    }
    wordList=wordList->next;
  }
  return result;
}

/* Add new child nodes to ts for all the words in q */
char* process(node**q, node**ts, node**to, node*wordList){
  node*tq=NULL;
  char*pivot=NULL;
  if ( q==NULL || ts==NULL || to==NULL || wordList==NULL ) {
    fprintf(stderr,"Process called with NULL argument! Exiting.\n");
    exit(9);
  }
  while (*q && (pivot=processOne((*q)->word,&tq,ts,to,wordList))==NULL) {
    freeTree(deQueue(q));
  }
  freeTree(*q); 
  *q=tq;
  return pivot;
}

/* Find a path between w1 and w2 using wordList by dijkstra's
 * algorithm
 *
 * Use a breadth-first extensions of the trees alternating between
 * trees.
 */
node* findPath(char*w1, char*w2, node*wordList){
  node*thePath=NULL; /* our resulting path */
  char*pivot=NULL; /* The node we find that matches */
  /* trees of existing nodes */
  node*t1=newNode(w1); 
  node*t2=newNode(w2);
  /* queues of nodes to work on */
  node*q1=newNode(w1);
  node*q2=newNode(w2);

  /* work each queue all the way through alternating until a word is
     found in both lists */
  while( (q1!=NULL) && ((pivot = process(&q1,&t1,&t2,wordList)) == NULL) &&
     (q2!=NULL) && ((pivot = process(&q2,&t2,&t1,wordList)) == NULL) )
    /* no loop body */ ;


  /* one way or another we are done with the queues here */
  q1=freeTree(q1);
  q2=freeTree(q2);
  /* now construct the path */
  if (pivot!=NULL) thePath=buildPath(pivot,t1,t2);
  /* clean up after ourselves */
  t1=freeTree(t1);
  t2=freeTree(t2);

  return thePath;
}

/* Convert a non-const string to UPPERCASE in place */
void upcase(char *s){
  while (s && *s) {
    *s = toupper(*s);
    s++;
  }
}

/* Walks the input file stuffing lines of the given length into a list */
node*getListWithLength(const char*fname, int len){
  int l=-1;
  size_t n=0;
  node*list=NULL;
  char *line=NULL;
  /* open the word file */
  FILE*f = fopen(fname,"r");
  if (NULL==f){
    fprintf(stderr,"Could not open word file '%s'. Exiting.\n",fname);
    exit(3);
  }
  /* walk the file, trying each word in turn */
  while ( !feof(f) && ((l = getline(&line,&n,f)) != -1) ) {
    /* strip trailing whitespace */
    char*temp=line;
    strsep(&temp," \t\n");
    if (strlen(line) == len) {
      node*newN = newNode(line);
      upcase(newN->word);
      push(newN,&list);
    }
  }
  fclose(f);
  return list;
}

/* Assumes that filename points to a file containing exactly one
 * word per line with no other whitespace.
 * It will return a randomly selected word from filename.
 *
 * If veto is non-NULL, only non-matching words of the same length
 * wll be considered.
 */
char*getRandomWordFile(const char*fname, const char*veto){
  int l=-1, count=1;
  size_t n=0;
  char *word=NULL;
  char *line=NULL;
  /* open the word file */
  FILE*f = fopen(fname,"r");
  if (NULL==f){
    fprintf(stderr,"Could not open word file '%s'. Exiting.\n",fname);
    exit(3);
  }
  /* walk the file, trying each word in turn */
  while ( !feof(f) && ((l = getline(&line,&n,f)) != -1) ) {
    /* strip trailing whitespace */
    char*temp=line;
    strsep(&temp," \t\n");
    if (strlen(line) < 2) continue; /* Single letters are too easy! */
    if ( (veto==NULL) || /* no veto means chose from all */ 
     ( 
      ( strlen(line) == strlen(veto) )  && /* veto means match length */
      ( 0 != strcasecmp(veto,line) )       /* but don't match word */ 
       ) ) { 
      /* This word is worthy of consideration. Select it with random
         chance (1/count) then increment count */
      if ( (word==NULL) || (random() < RANDOM_MAX/count) ) {
    if (word) free(word);
    word=strdup(line);
      }
      count++;
    }
  }
  fclose(f);
  upcase(word);
  return word;
}

void usage(int argc, char**argv){
  fprintf(stderr,"%s [ <startWord> [ <endWord> ]]:\n\n",argv[0]);
  fprintf(stderr,
      "\tFind the shortest transformation from one word to another\n");
  fprintf(stderr,
      "\tchanging only one letter at a time and always maintaining a\n");
  fprintf(stderr,
      "\tword that exists in the word file.\n\n");
  fprintf(stderr,
      "\tIf startWord is not passed, chose at random from '%s'\n",
      wordfile);
  fprintf(stderr,
      "\tIf endWord is not passed, chose at random from '%s'\n",
      wordfile);
  fprintf(stderr,
      "\tconsistent with the length of startWord\n");
  exit(2);
}

int main(int argc, char**argv){
  char *startWord=NULL;
  char *endWord=NULL;

  /* intialize OS services */
  srandom(time(0)+getpid());
  /* process command line */
  switch (argc) {
  case 3:
    endWord = strdup(argv[2]);
    upcase(endWord);
  case 2:
    startWord = strdup(argv[1]);
    upcase(startWord);
  case 1:
    if (NULL==startWord) startWord = getRandomWordFile(wordfile,NULL);
    if (NULL==endWord)   endWord   = getRandomWordFile(wordfile,startWord);
    break;
  default:
    usage(argc,argv);
    break;
  }
  /* need to check this in case the user screwed up */
  if ( !startWord || ! endWord || strlen(startWord) != strlen(endWord) ) {
    fprintf(stderr,"Words '%s' and '%s' are not the same length! Exiting\n",
        startWord,endWord);
    exit(1);
  }
  /* Get a list of all the words having the right length */
  node*wordList=getListWithLength(wordfile,strlen(startWord));
  /* Launch into the path finder*/
  node *theList=findPath(startWord,endWord,wordList);
  /* Print the resulting path */
  if (theList) {
    int count=-2;
    while (theList) {
      printf("%s\n",theList->word);
      theList=theList->next;
      count++;
    }
    printf("%d\n",count);
  } else {
    /* No path found case */
    printf("%s %s OY\n",startWord,endWord);
  }
  return 0;
}

Niektóre dane wyjściowe:

$ ./changeword dive name
DIVE
DIME
DAME
NAME
2
$ ./changeword house gorge
HOUSE
HORSE
GORSE
GORGE
2
$ ./changeword stop read
STOP
STEP
SEEP
SEED
REED
READ
4
$ ./changeword peace slate
PEACE
PLACE
PLATE
SLATE
2
$ ./changeword pole fast  
POLE
POSE
POST
PAST
FAST
3
$ ./changeword          
QUINTIPED LINEARITY OY
$ ./changeword sneaky   
SNEAKY WAXILY OY
$ ./changeword TRICKY
TRICKY
PRICKY
PRINKY
PRANKY
TRANKY
TWANKY
SWANKY
SWANNY
SHANNY
SHANTY
SCANTY
SCATTY
SCOTTY
SPOTTY
SPOUTY
STOUTY
STOUTH
STOUSH
SLOUSH
SLOOSH
SWOOSH
19
$ ./changeword router outlet
ROUTER
ROTTER
RUTTER
RUTHER
OUTHER
OUTLER
OUTLET
5
$ ./changeword 
IDIOM
IDISM
IDIST
ODIST
OVIST
OVEST
OVERT
AVERT
APERT
APART
SPART
SPARY
SEARY
DEARY
DECRY
DECAY
DECAN
DEDAN
SEDAN
17

Analiza złożoności jest nietrywialna. Poszukiwanie jest dwustronnym, iteracyjnym pogłębianiem.

Dla każdego badanego węzła chodzę po całej liście słów (choć ograniczonej do słów o odpowiedniej długości). Zadzwoń na długość listy W.
Minimalna liczba kroków polega na tym, S_min = (<number of different letter>-1)że jeśli dzieli nas tylko jedna litera, zmianę oceniamy na 0 kroków pośrednich. Maksymalny poziom jest trudny do oszacowania, patrz powyżej TRICKY - SWOOSH. Każda połowa drzewa będą S/2-1doS/2
Nie przeprowadziłem analizy zachowania rozgałęzień drzewa, ale nazywam to B.

Tak więc minimalna liczba operacji jest w pobliżu 2 * (S/2)^B * W, niezbyt dobra.

— dmckee
źródło

Może to na mnie naiwne, ale nie widzę nic w twoim projekcie lub implementacji, które wymagałoby obciążeń krawędzi. Chociaż narzędzie Dijkstry rzeczywiście działa w przypadku nieważonych wykresów (waga krawędzi jest niezmiennie „1”), czy nie zastosuje się tutaj prostego wyszukiwania w celu zwiększenia swoich granic O(|V|+|E|)zamiast O(|E|+|V| log |V|)?

— MrGomez