Java equivalent to JavaScript's encodeURIComponent that produces identical output?

Question 1

I've been experimenting with various bits of Java code, trying to come up with something that will encode a string containing quotes, spaces and "exotic" Unicode characters and produce output identical to JavaScript's encodeURIComponent function.

My torture test string is: "A" B ± "

If I enter the following JavaScript statement in Firebug:

encodeURIComponent('"A" B ± "');

Then I get:

"%22A%22%20B%20%C2%B1%20%22"

Here's my little test Java program:

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodingTest
{
  public static void main(String[] args) throws UnsupportedEncodingException
  {
    String s = "\"A\" B ± \"";
    System.out.println("URLEncoder.encode returns "
      + URLEncoder.encode(s, "UTF-8"));

    System.out.println("getBytes returns "
      + new String(s.getBytes("UTF-8"), "ISO-8859-1"));
  }
}

This program outputs:

URLEncoder.encode returns %22A%22+B+%C2%B1+%22
getBytes returns "A" B ± "

Close, but no cigar! What is the best way to encode a UTF-8 string in Java so that it produces the same output as JavaScript's encodeURIComponent?

EDIT: I'm using Java 1.4, moving to Java 5 shortly.

Question 2

Looking at the implementation differences, I see that:

MDC on encodeURIComponent():

literal characters (regex representation): [-a-zA-Z0-9._*~'()!]

Java 1.5.0 documentation on URLEncoder:

literal characters (regex representation): [-a-zA-Z0-9._*]
the space character " " is converted into a plus sign "+".

So basically, to get the desired result, use URLEncoder.encode(s, "UTF-8") and then do some post-processing:

replace all occurrences of "+" with "%20"
replace all occurrences of "%xx" representing any of [~'()!] back to their literal counterparts
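
The two post-processing steps above can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the class name UriComponentSketch and the method name encode are assumptions made for the example, and it needs Java 5 or later for String.replace(CharSequence, CharSequence).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the class and method names are illustrative assumptions.
final class UriComponentSketch {
    private UriComponentSketch() {}

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8")
                    .replace("+", "%20")  // step 1: URLEncoder writes a space as '+'
                    .replace("%21", "!")  // step 2: restore the characters that
                    .replace("%27", "'")  // encodeURIComponent leaves literal
                    .replace("%28", "(")
                    .replace("%29", ")")
                    .replace("%7E", "~");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }
}
```

Because URLEncoder turns a literal "%" into "%25", a "%21" in the encoded text can only come from an encoded "!", so the literal replacements do not touch anything else.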

Question 3

This is the class I came up with in the end:

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * Utility class for JavaScript compatible UTF-8 encoding and decoding.
 * 
 * @see http://stackoverflow.com/questions/607176/java-equivalent-to-javascripts-encodeuricomponent-that-produces-identical-output
 * @author John Topley 
 */
public class EncodingUtil
{
  /**
   * Decodes the passed UTF-8 String using an algorithm that's compatible with
   * JavaScript's <code>decodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   *
   * @param s The UTF-8 encoded String to be decoded
   * @return the decoded String
   */
  public static String decodeURIComponent(String s)
  {
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLDecoder.decode(s, "UTF-8");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;  
    }

    return result;
  }

  /**
   * Encodes the passed String as UTF-8 using an algorithm that's compatible
   * with JavaScript's <code>encodeURIComponent</code> function. Returns
   * <code>null</code> if the String is <code>null</code>.
   * 
   * @param s The String to be encoded
   * @return the encoded String
   */
  public static String encodeURIComponent(String s)
  {
    // Honour the documented contract: a null input yields null.
    if (s == null)
    {
      return null;
    }

    String result = null;

    try
    {
      result = URLEncoder.encode(s, "UTF-8")
                         .replaceAll("\\+", "%20")
                         .replaceAll("\\%21", "!")
                         .replaceAll("\\%27", "'")
                         .replaceAll("\\%28", "(")
                         .replaceAll("\\%29", ")")
                         .replaceAll("\\%7E", "~");
    }

    // This exception should never occur.
    catch (UnsupportedEncodingException e)
    {
      result = s;
    }

    return result;
  }  

  /**
   * Private constructor to prevent this class from being instantiated.
   */
  private EncodingUtil()
  {
    super();
  }
}
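
One difference on the decoding side is worth guarding against: URLDecoder treats a literal "+" as an encoded space, whereas JavaScript's decodeURIComponent leaves it alone. A minimal sketch of one way around this, assuming a hypothetical helper named UriComponentDecodeSketch.decode, escapes any "+" before handing the string to URLDecoder:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper; the class and method names are illustrative assumptions.
final class UriComponentDecodeSketch {
    private UriComponentDecodeSketch() {}

    static String decode(String s) {
        if (s == null) {
            return null;
        }
        try {
            // Protect literal '+' so URLDecoder does not turn it into a space.
            return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }
}
```

Malformed escapes still surface as an IllegalArgumentException from URLDecoder, where JavaScript would throw a URIError.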

Question 4

Using the JavaScript engine that is shipped with Java 6:


import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class Wow
{
    public static void main(String[] args) throws Exception
    {
        ScriptEngineManager factory = new ScriptEngineManager();
        ScriptEngine engine = factory.getEngineByName("JavaScript");
        engine.eval("print(encodeURIComponent('\"A\" B ± \"'))");
    }
}

Output: %22A%22%20B%20%c2%b1%20%22

The case is different, but it's closer to what you want.
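
If the only remaining difference is the case of the hex digits, the escapes can be normalised afterwards. The following is a small sketch under that assumption; HexCaseSketch and upperCaseEscapes are hypothetical names, not part of any library.

```java
// Hypothetical helper; the class and method names are illustrative assumptions.
final class HexCaseSketch {
    private HexCaseSketch() {}

    // Upper-cases the two hex digits that follow each '%' and leaves
    // everything else, including a truncated trailing escape, untouched.
    static String upperCaseEscapes(String encoded) {
        StringBuilder out = new StringBuilder(encoded.length());
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                out.append('%')
                   .append(Character.toUpperCase(encoded.charAt(i + 1)))
                   .append(Character.toUpperCase(encoded.charAt(i + 2)));
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```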

Question 5

I use java.net.URI#getRawPath(), e.g.

String s = "a+b c.html";
String fixed = new URI(null, null, s, null).getRawPath();

The value of fixed will be a+b%20c.html, which is what you want.

Post-processing the output of URLEncoder.encode() will obliterate any pluses that are supposed to be in the URI. For example

URLEncoder.encode("a+b c.html").replaceAll("\\+", "%20");

will give you a%20b%20c.html, which will be interpreted as a b c.html.
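
A small side-by-side check can confirm what each approach actually returns for a given input before relying on either. The sketch below uses hypothetical names (PlusHandlingSketch, viaUri, viaUrlEncoder) and the two-argument URLEncoder.encode with an explicit charset. Note that URLEncoder.encode itself escapes a literal "+" as "%2B", so the later replacement only rewrites the pluses that stand for spaces; URI#getRawPath(), by contrast, leaves "+" as it is because it is a legal path character.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical comparison helper; all names here are illustrative assumptions.
final class PlusHandlingSketch {
    private PlusHandlingSketch() {}

    // Quotes only the characters that are illegal in a URI path.
    static String viaUri(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form-encodes the string, then rewrites the '+' that stands for a space.
    static String viaUrlEncoder(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }
}
```

Which result is wanted depends on whether the string is a whole path, where "+" and "/" should survive, or a single component, where they should be escaped as encodeURIComponent does.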

Question 6

I came up with my own version of encodeURIComponent, because the posted solution has one problem: if the string contains a + that should be encoded, it will be converted to a space.

Here's my class:

import java.io.UnsupportedEncodingException;
import java.util.BitSet;

public final class EscapeUtils
{
    /** used for the encodeURIComponent function */
    private static final BitSet dontNeedEncoding;

    static
    {
        dontNeedEncoding = new BitSet(256);

        // a-z
        for (int i = 97; i <= 122; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // A-Z
        for (int i = 65; i <= 90; ++i)
        {
            dontNeedEncoding.set(i);
        }
        // 0-9
        for (int i = 48; i <= 57; ++i)
        {
            dontNeedEncoding.set(i);
        }

        // '()*
        for (int i = 39; i <= 42; ++i)
        {
            dontNeedEncoding.set(i);
        }
        dontNeedEncoding.set(33); // !
        dontNeedEncoding.set(45); // -
        dontNeedEncoding.set(46); // .
        dontNeedEncoding.set(95); // _
        dontNeedEncoding.set(126); // ~
    }

    /**
     * A Utility class should not be instantiated.
     */
    private EscapeUtils()
    {

    }

    /**
     * Escapes all characters except the following: alphabetic, decimal digits, - _ . ! ~ * ' ( )
     * 
     * @param input
     *            A component of a URI
     * @return the escaped URI component
     */
    public static String encodeURIComponent(String input)
    {
        if (input == null)
        {
            return input;
        }

        StringBuilder filtered = new StringBuilder(input.length());
        // Walk the input by code point rather than by char, so that a
        // surrogate pair (a character outside the BMP) is encoded as one
        // UTF-8 sequence instead of two unencodable halves.
        for (int i = 0; i < input.length();)
        {
            final int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (dontNeedEncoding.get(cp))
            {
                filtered.append((char) cp);
            }
            else
            {
                final byte[] b = codePointToBytesUTF(cp);

                for (int j = 0; j < b.length; ++j)
                {
                    filtered.append('%');
                    filtered.append("0123456789ABCDEF".charAt(b[j] >> 4 & 0xF));
                    filtered.append("0123456789ABCDEF".charAt(b[j] & 0xF));
                }
            }
        }
        return filtered.toString();
    }

    private static byte[] codePointToBytesUTF(int cp)
    {
        try
        {
            return new String(Character.toChars(cp)).getBytes("UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            // UTF-8 is always supported, so this should never happen.
            return new byte[] { (byte) cp };
        }
    }
}

Question 7

I came up with another implementation, documented at http://blog.sangupta.com/2010/05/encodeuricomponent-and.html. The implementation can also handle Unicode bytes.

Question 8

I have successfully used the java.net.URI class like so:

import java.net.URI;
import java.net.URISyntaxException;

public static String uriEncode(String string) {
    String result = string;
    if (null != string) {
        try {
            String scheme = null;
            String ssp = string;
            int es = string.indexOf(':');
            if (es > 0) {
                scheme = string.substring(0, es);
                ssp = string.substring(es + 1);
            }
            result = (new URI(scheme, ssp, null)).toString();
        } catch (URISyntaxException usex) {
            // ignore and use string that has syntax error
        }
    }
    return result;
}

Question 9

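Here is a straightforward example of Ravi Wallau's solution:

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public String buildSafeURL(String partialURL, String documentName)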
        throws ScriptException {
    ScriptEngineManager scriptEngineManager = new ScriptEngineManager();
    ScriptEngine scriptEngine = scriptEngineManager
            .getEngineByName("JavaScript");

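    // Bind the value as a script variable instead of splicing it into the
    // script source, so quotes or backslashes in the name cannot break it.
    scriptEngine.put("documentName", documentName);
    String urlSafeDocumentName = String.valueOf(scriptEngine
            .eval("encodeURIComponent(documentName)"));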
    String safeURL = partialURL + urlSafeDocumentName;

    return safeURL;
}

public static void main(String[] args) {
    EncodeURIComponentDemo demo = new EncodeURIComponentDemo();
    String partialURL = "https://www.website.com/document/";
    String documentName = "Tom & Jerry Manuscript.pdf";

    try {
        System.out.println(demo.buildSafeURL(partialURL, documentName));
    } catch (ScriptException se) {
        se.printStackTrace();
    }
}

Output: https://www.website.com/document/Tom%20%26%20Jerry%20Manuscript.pdf

It also answers the hanging question in the comments by Loren Shqipognja about how to pass a String variable to encodeURIComponent(). The method scriptEngine.eval() returns an Object, so it can be converted to a String via String.valueOf(), among other methods.

Question 10

This worked for me:

import org.apache.http.client.utils.URIBuilder;

String encodedString = new URIBuilder()
  .setParameter("i", stringToEncode)
  .build()
  .getRawQuery() // output: i=encodedString
  .substring(2);

or with a different UriBuilder:

import javax.ws.rs.core.UriBuilder;

String encodedString = UriBuilder.fromPath("")
  .queryParam("i", stringToEncode)
  .toString()   // output: ?i=encodedString
  .substring(3);

In my opinion, using a standard library is a better idea than post-processing manually. Also, @Chris's answer looked good, but it doesn't work for URLs like "http://a+b c.html".

Question 11

This is what I'm using:

import java.nio.charset.StandardCharsets;

private static final String HEX = "0123456789ABCDEF";

public static String encodeURIComponent(String str) {
    if (str == null) return null;

    byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
    StringBuilder builder = new StringBuilder(bytes.length);

    for (byte c : bytes) {
        if (c >= 'a' ? c <= 'z' || c == '~' :
            c >= 'A' ? c <= 'Z' || c == '_' :
            c >= '0' ? c <= '9' :  c == '-' || c == '.')
            builder.append((char)c);
        else
            builder.append('%')
                   .append(HEX.charAt(c >> 4 & 0xf))
                   .append(HEX.charAt(c & 0xf));
    }

    return builder.toString();
}

It goes beyond JavaScript's behaviour by percent-encoding every character that is not an unreserved character according to RFC 3986.
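
To make the difference concrete, the sketch below lists the printable ASCII characters that encodeURIComponent leaves literal but that are not in the RFC 3986 unreserved set (ALPHA, DIGIT, "-", ".", "_", "~"). The class and method names are hypothetical and exist only for this illustration.

```java
// Hypothetical helper; the class and method names are illustrative assumptions.
final class LiteralSetSketch {
    private LiteralSetSketch() {}

    // RFC 3986 unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isRfc3986Unreserved(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // encodeURIComponent additionally leaves ! * ' ( ) unescaped.
    static boolean isJsLiteral(char c) {
        return isRfc3986Unreserved(c) || "!*'()".indexOf(c) >= 0;
    }

    // Printable ASCII characters escaped by the strict RFC 3986 variant
    // but kept literal by encodeURIComponent, in ASCII order.
    static String escapedOnlyByStrictVariant() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0x21; c <= 0x7E; c++) {
            if (isJsLiteral(c) && !isRfc3986Unreserved(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For strings that contain none of these five characters, the strict variant and encodeURIComponent should produce the same output.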

This is the opposite conversion:

public static String decodeURIComponent(String str) {
    if (str == null) return null;

    int length = str.length();
    byte[] bytes = new byte[length / 3];
    StringBuilder builder = new StringBuilder(length);

    for (int i = 0; i < length; ) {
        char c = str.charAt(i);
        if (c != '%') {
            builder.append(c);
            i += 1;
        } else {
            int j = 0;
            do {
                char h = str.charAt(i + 1);
                char l = str.charAt(i + 2);
                i += 3;

                h -= '0';
                if (h >= 10) {
                    h |= ' ';
                    h -= 'a' - '0';
                    if (h >= 6) throw new IllegalArgumentException();
                    h += 10;
                }

                l -= '0';
                if (l >= 10) {
                    l |= ' ';
                    l -= 'a' - '0';
                    if (l >= 6) throw new IllegalArgumentException();
                    l += 10;
                }

                bytes[j++] = (byte)(h << 4 | l);
                if (i >= length) break;
                c = str.charAt(i);
            } while (c == '%');
            builder.append(new String(bytes, 0, j, StandardCharsets.UTF_8));
        }
    }

    return builder.toString();
}

Question 12

I found the PercentEscaper class in the google-http-java-client library, which can be used to implement encodeURIComponent quite easily.

PercentEscaper from the google-http-java-client javadoc; google-http-java-client home page

Question 13

The Guava library has PercentEscaper:

Escaper percentEscaper = new PercentEscaper("-_.*", false);

"-_.*" are the safe characters

false tells PercentEscaper to escape a space as "%20" rather than "+"

Question 14

I used String encodedUrl = new URI(null, url, null).toASCIIString(); to encode URLs. To add parameters after the existing ones in the url, I use UriComponentsBuilder.