Strip spaces/tabs/newlines - Python

Question 1

I am trying to remove all spaces/tabs/newlines in Python 2.7 on Linux.

I wrote this, which should do the job:

myString="I want to Remove all white \t spaces, new lines \n and tabs \t"
myString = myString.strip(' \n\t')
print myString

Output:

I want to Remove all white   spaces, new lines 
 and tabs

It seems like a simple thing to do, yet I am missing something here. Should I be importing something?

Question 2

Use str.split([sep[, maxsplit]]) with no sep or with sep=None:

From the docs:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

Demo:

>>> myString.split()
['I', 'want', 'to', 'Remove', 'all', 'white', 'spaces,', 'new', 'lines', 'and', 'tabs']

Use str.join on the returned list to get this output:

>>> ' '.join(myString.split())
'I want to Remove all white spaces, new lines and tabs'
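
To make the difference from the question's attempt concrete, here is a minimal sketch. The names `stripped`, `collapsed` and `removed` are mine, not from the original post. As far as I can tell, the question's code failed because str.strip only trims characters from the two ends of the string, so the tab and newline in the middle survive.

```python
my_string = "I want to Remove all white \t spaces, new lines \n and tabs \t"

# strip() only trims the listed characters from both ends of the string;
# the tab and newline in the middle are left untouched.
stripped = my_string.strip(' \n\t')

# split() with no arguments splits on runs of any whitespace;
# joining with one space collapses each run into a single space.
collapsed = ' '.join(my_string.split())

# Joining with the empty string removes the whitespace entirely.
removed = ''.join(my_string.split())

print(repr(stripped))
print(repr(collapsed))
print(repr(removed))
```

Pick `collapsed` if the words should stay separated, and `removed` if every whitespace character really has to go.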

Question 3

If you want to remove multiple whitespace characters and replace each run with a single space, the easiest way is a regular expression like this:

>>> import re
>>> myString="I want to Remove all white \t spaces, new lines \n and tabs \t"
>>> re.sub(r'\s+', ' ', myString)
'I want to Remove all white spaces, new lines and tabs '

You can then remove the trailing space with .strip() if you want to.
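
Putting both steps together, a small helper could look like the sketch below. The function name `collapse_whitespace` and the module-level `_WHITESPACE_RUN` are hypothetical names chosen for illustration. Precompiling the pattern is optional and only a convenience if the helper is called often.

```python
import re

# Hypothetical helper names; only re.compile, sub and strip are library calls.
_WHITESPACE_RUN = re.compile(r'\s+')


def collapse_whitespace(text):
    """Replace each run of whitespace with one space, then trim both ends."""
    return _WHITESPACE_RUN.sub(' ', text).strip()


my_string = "I want to Remove all white \t spaces, new lines \n and tabs \t"
print(repr(collapse_whitespace(my_string)))
```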

Question 4

Use the re library:

import re
myString = "I want to Remove all white \t spaces, new lines \n and tabs \t"
myString = re.sub(r"\s+", "", myString)  # \s already covers \n and \t
print myString

Output:

IwanttoRemoveallwhitespaces,newlinesandtabs

Question 5

import re

mystr = "I want to Remove all white \t spaces, new lines \n and tabs \t"
# \W matches every non-word character, so the comma is removed along with the whitespace
print re.sub(r"\W", "", mystr)

Output : IwanttoRemoveallwhitespacesnewlinesandtabs
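
One caveat worth spelling out: `\W` and `\s` are not interchangeable. The sketch below (variable names are mine) runs both on the same string. `\W` drops every non-word character, punctuation included, while `\s` drops only whitespace.

```python
import re

mystr = "I want to Remove all white \t spaces, new lines \n and tabs \t"

# \W: anything that is not a letter, digit or underscore (the comma goes too).
no_nonword = re.sub(r"\W", "", mystr)

# \s: whitespace only, so the comma is kept.
no_whitespace = re.sub(r"\s", "", mystr)

print(no_nonword)
print(no_whitespace)
```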

Question 6

This removes only whitespace (tabs, newlines, spaces, and anything else matched by \s) and nothing more.

import re
myString = "I want to Remove all white \t spaces, new lines \n and tabs \t"
output = re.sub(r"\s+", "", myString)  # \s already covers \n and \t

Output:

IwanttoRemoveallwhitespaces,newlinesandtabs

Good day!

Question 7

The solutions above that suggest a regular expression aren't ideal: this is such a small task, and a regex carries more overhead than the simplicity of the task justifies.

Here's what I do:

myString = myString.replace(' ', '').replace('\t', '').replace('\n', '')

or, if you had a bunch of things to remove such that a one-line solution would be gratuitously long:

removal_list = [' ', '\t', '\n']
for s in removal_list:
  myString = myString.replace(s, '')
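
Whether the regex overhead matters is easy to measure rather than assume. The sketch below uses the standard timeit module. The function names `with_replace` and `with_regex` are hypothetical, and the numbers will vary by machine and Python version, so treat it as a way to check the claim, not as proof of it.

```python
import re
import timeit

my_string = "I want to Remove all white \t spaces, new lines \n and tabs \t"


def with_replace(text):
    # Chained str.replace calls, one per character to remove.
    for ch in (' ', '\t', '\n'):
        text = text.replace(ch, '')
    return text


# Precompiled so the timing measures substitution, not pattern compilation.
_PATTERN = re.compile(r'[ \t\n]+')


def with_regex(text):
    return _PATTERN.sub('', text)


# Time each approach over the same number of calls.
for func in (with_replace, with_regex):
    seconds = timeit.timeit(lambda: func(my_string), number=10000)
    print("{}: {:.4f} s for 10000 calls".format(func.__name__, seconds))
```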

Question 8

Since none of the other answers covers anything more intricate, I wanted to share this, as it helped me out.

This is what I originally used:

import requests
import re

url = '/programming/10711116/strip-spaces-tabs-newlines-python' # noqa
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
print("{}".format(r.content))

Undesired result:

b'<!DOCTYPE html>\r\n\r\n\r\n    <html itemscope itemtype="http://schema.org/QAPage" class="html__responsive">\r\n\r\n    <head>\r\n\r\n        <title>string - Strip spaces/tabs/newlines - python - Stack Overflow</title>\r\n        <link

This is what I changed it to:

import requests
import re

url = '/programming/10711116/strip-spaces-tabs-newlines-python' # noqa
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
regex = r'\s+'
print("CNT: {}".format(re.sub(regex, " ", r.content.decode('utf-8'))))

Desired result:

<!DOCTYPE html> <html itemscope itemtype="http://schema.org/QAPage" class="html__responsive"> <head> <title>string - Strip spaces/tabs/newlines - python - Stack Overflow</title>

The exact regex that @MattH mentioned is what worked for me once I fitted it into my code. Thanks!

Note: this is Python 3.
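
For anyone who wants to try the decode-then-collapse step without making a request, here is a minimal offline sketch. The `raw` bytes literal is a made-up stand-in for `r.content`, not the real page. In Python 3 the response body is bytes, which is why it is decoded before the str pattern is applied.

```python
import re

# Hypothetical sample standing in for r.content (bytes in Python 3).
raw = (b'<!DOCTYPE html>\r\n\r\n\r\n    <html>\r\n\r\n'
       b'    <head>\r\n        <title>Example</title>')

# Decode to str first; a str pattern cannot be applied to a bytes object.
text = raw.decode('utf-8')

# Collapse every run of whitespace (including \r\n) into a single space.
flattened = re.sub(r'\s+', ' ', text)
print(flattened)
```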

Question 9

What about a one-liner using a list comprehension inside join?

>>> foobar = "aaa bbb\t\t\tccc\nddd"
>>> print(foobar)
aaa bbb                 ccc
ddd

>>> print(''.join([c for c in foobar if c not in [' ', '\t', '\n']]))
aaabbbcccddd