I have the following string:
word = u'Buffalo,\xa0IL\xa060625'
I don’t want the “\xa0” in there. How can I get rid of it? The string I want is:
word = 'Buffalo, IL 60625'
If you know for sure that is the only character you don’t want, you can .replace it:
>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'
If you need to handle all non-ASCII characters, encoding and replacing the bad characters might be a good start:
>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
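In Python 3 the same call works on a str, but it returns bytes rather than text, so the result prints with a b prefix. A minimal sketch, assuming Python 3, that also decodes the result back to a str:

```python
word = 'Buffalo,\xa0IL\xa060625'  # Python 3 str with the same content as in the question

# The 'replace' error handler substitutes b'?' for every character
# that has no ASCII encoding.
encoded = word.encode('ascii', 'replace')
print(encoded)                   # b'Buffalo,?IL?60625'

# Decode back to str if the rest of the program expects text, not bytes.
print(encoded.decode('ascii'))   # Buffalo,?IL?60625
```

Note that the no-break spaces become question marks here, not spaces.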
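\xa0 (not \xa, as you stated) is a NO-BREAK SPACE, and the closest ASCII equivalent would of course be a regular space. The third-party unidecode package converts it, along with other non-ASCII characters, to the closest ASCII equivalent: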
import unidecode
word = unidecode.unidecode(word)
There is no \xa there. If you try to put that into a string literal, you’re going to get a syntax error if you’re lucky, or it’s going to swallow up the next attempted character if you’re not, because \x sequences always have to be followed by two hexadecimal digits.
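You can see the two-digit rule at work in the question’s own string: the tail \xa060625 is one escape plus five ordinary digits. A small Python 3 sketch; compile() is used only so the invalid literal can be tried without breaking the script:

```python
# '\xa060625' is parsed as the escape \xa0 followed by the digits 60625:
# a \x escape consumes exactly two hex digits, no more and no fewer.
s = '\xa060625'
print(len(s))              # 6
print(s[0] == '\u00a0')    # True
print(s[1:])               # 60625

# A \x escape followed by fewer than two hex digits does not compile.
try:
    compile("'\\xa'", '<example>', 'eval')
    truncated_ok = True
except SyntaxError:
    truncated_ok = False
print(truncated_ok)        # False
```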
What you have is \xa0, which is an escape sequence for the character U+00A0, aka “NO-BREAK SPACE”.
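If you want to confirm what a mystery character is, the standard library can tell you. A minimal Python 3 sketch using unicodedata.name:

```python
import unicodedata

ch = '\xa0'
print(ch == '\u00a0')          # True: two spellings of the same code point
print(hex(ord(ch)))            # 0xa0
print(unicodedata.name(ch))    # NO-BREAK SPACE
print(ch.isspace())            # True: Python 3 treats it as whitespace
```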
I think you want to replace them with spaces, but whatever you want to do is pretty easy to write:
word.replace(u'\xa0', u' ')  # replaced with space
word.replace(u'\xa0', u'0')  # closest to what you were literally asking for
word.replace(u'\xa0', u'')   # removed completely
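The same substitutions can also be written with str.translate, which applies a whole mapping in a single pass; that may be handy if more characters need the same treatment later. A minimal Python 3 sketch; the table name NBSP_TABLE is purely illustrative:

```python
# Hypothetical table name; extend the dict with any other characters to map.
NBSP_TABLE = str.maketrans({'\xa0': ' '})

word = 'Buffalo,\xa0IL\xa060625'
print(word.translate(NBSP_TABLE))    # Buffalo, IL 60625

# Mapping a code point to None deletes that character instead.
print(word.translate({0xA0: None}))  # Buffalo,IL60625
```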
You can easily use unicodedata to get rid of all of the \xa0 characters:
>>> from unicodedata import normalize
>>> normalize('NFKD', word)
'Buffalo, IL 60625'
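NFKD applies Unicode compatibility decomposition, under which U+00A0 decomposes to an ordinary space (U+0020). A Python 3 sketch that also shows the limit of this approach: other non-ASCII characters are decomposed, not removed.

```python
from unicodedata import normalize

word = 'Buffalo,\xa0IL\xa060625'
print(normalize('NFKD', word))   # Buffalo, IL 60625

# NFKD is not an ASCII filter: 'é' is split into a base 'e' plus a
# combining acute accent (U+0301), and both are kept.
decomposed = normalize('NFKD', 'caf\xe9')
print(len(decomposed))           # 5
print(decomposed.isascii())      # False
```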
This seems to work for getting rid of non-ASCII characters, though the no-break spaces are dropped rather than turned into spaces:
fixedword = word.encode('ascii', 'ignore')
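Normalizing with NFKD first keeps those spaces, because it turns each no-break space into an ordinary space before the ASCII filter runs. A minimal Python 3 sketch of the difference, assuming a plain ASCII str is the goal (in Python 3, encode returns bytes, hence the final decode):

```python
from unicodedata import normalize

word = 'Buffalo,\xa0IL\xa060625'

# Dropping non-ASCII characters outright also drops the no-break spaces.
print(word.encode('ascii', 'ignore'))   # b'Buffalo,IL60625'

# Normalizing first turns them into regular spaces, which survive the filter.
cleaned = normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')
print(cleaned)                          # Buffalo, IL 60625
```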