I am confused about why I am not able to parse this JSON string. Similar code works fine on other JSON strings but not on this one. I am trying to parse a JSON string and extract the script from it.

Below is my code.

for step in steps:
    step_path = "/example/v1" + "/" + step

    data, stat = zk.get(step_path)
    jsonStr = data.decode("utf-8")
    j = json.loads(json.dumps(jsonStr))
    shell_script = j['script']

So the first print(jsonStr) will print out something like this –

{"script":"#!/bin/bash\necho Hello world1\n"}

And the second print(j) will print out something like this –

{"script":"#!/bin/bash\necho Hello world1\n"}

And then the third print doesn’t get printed, and it gives this error –

Traceback (most recent call last):
  File "test5.py", line 33, in <module>
    shell_script = j['script']
TypeError: string indices must be integers

So I am wondering what I am doing wrong here?

I have used the same code to parse other JSON and it works fine.

The problem is that jsonStr is a string that encodes some object in JSON, not the actual object.

You obviously knew it was a string, because you called it jsonStr. And it’s proven by the fact that this line works:

jsonStr = data.decode("utf-8")

So, jsonStr is a string. Calling json.dumps on a string is perfectly legal. It doesn’t matter whether that string was the JSON encoding of some object, or your last name; you can encode that string in JSON. And then you can decode that string, getting back the original string.

So, this:

j = json.loads(json.dumps(jsonStr))

… is going to give you back the exact same string as jsonStr in j. Which you still haven’t decoded to the original object.

To do that, just don’t do the extra encode:

j = json.loads(jsonStr)
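As a quick check, here is a minimal sketch using the sample payload from the question. The `data` bytes are only a stand-in for what `zk.get` returns (an assumption, so the snippet runs without a ZooKeeper connection), and the snake_case names are illustrative.

```python
import json

# Stand-in for the bytes returned by zk.get(step_path) (assumed for illustration)
data = b'{"script":"#!/bin/bash\\necho Hello world1\\n"}'
json_str = data.decode("utf-8")

# Encoding and then decoding simply round-trips the string
round_tripped = json.loads(json.dumps(json_str))
print(type(round_tripped))

# Decoding once yields the actual object, which can be indexed by key
parsed = json.loads(json_str)
shell_script = parsed["script"]
print(type(parsed))
print(shell_script)
```

The first value is still a `str`, which is why indexing it with `'script'` raises the `TypeError` from the question.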

If that isn’t clear, try playing with it in an interactive terminal:

>>> import json
>>> obj = ['abc', {'a': 1, 'b': 2}]
>>> type(obj)
<class 'list'>
>>> obj[1]['b']
2
>>> j = json.dumps(obj)
>>> type(j)
<class 'str'>
>>> j[1]['b']
TypeError: string indices must be integers
>>> jj = json.dumps(j)
>>> type(jj)
<class 'str'>
>>> j
'["abc", {"a": 1, "b": 2}]'
>>> jj
'"[\\"abc\\", {\\"a\\": 1, \\"b\\": 2}]"'
>>> json.loads(j)
['abc', {'a': 1, 'b': 2}]
>>> json.loads(j) == obj
True
>>> json.loads(jj)
'["abc", {"a": 1, "b": 2}]'
>>> json.loads(jj) == j
True
>>> json.loads(jj) == obj
False

Try replacing j = json.loads(json.dumps(jsonStr)) with j = json.loads(jsonStr).

Ok… So for people who are still lost because they are used to JS this is what I understood after having tested multiple use cases :

  • json.dumps does not make your string ready to be loaded with json.loads. It will only encode it to JSON specs (by adding escapes pretty much everywhere) !

  • json.loads will transform a correctly formatted JSON string into a Python object (a dictionary, when the JSON is an object). It will only work if the string follows the JSON specs (no single quotes, no capitalized booleans such as True, etc).
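The two points above can be sketched roughly like this. The variable names are made up for illustration.

```python
import json

text = '{"foobar": true}'

# json.dumps treats the string as plain data: it wraps it in double quotes
# and escapes the quotes already inside it
encoded = json.dumps(text)
print(encoded)

# json.loads returns whichever Python type matches the top-level JSON value
as_dict = json.loads(text)         # JSON object -> dict
as_list = json.loads('[1, 2, 3]')  # JSON array  -> list
as_str = json.loads(encoded)       # JSON string -> str (the original text)
```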

Dumping JSON – An encoding story

Let’s take an example !

$ obj = {"foobar": True}

This is NOT json ! This is a python dictionary that uses python types (like booleans).

True is not compatible with the JSON specs so in order to send this to an API you would have to serialize it to REAL JSON. That’s where json.dumps comes in !

$ json.dumps({"foobar": True})
'{"foobar": true}'

See ? True became true which is real JSON. You now have a string that you can send to the real world. Good job.
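Booleans are not the only values that change on the way out. A small sketch with an invented dictionary:

```python
import json

# Illustrative data only; the keys and values are made up
obj = {"flag": True, "missing": None, "pair": (1, 2), "ratio": 0.5}

encoded = json.dumps(obj)
print(encoded)

# Decoding gives Python types back, but the tuple returns as a list,
# because JSON only has arrays
decoded = json.loads(encoded)
print(decoded)
```

So a dumps/loads round trip is not always an exact copy of the original object, which is worth remembering when comparing results.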

Loading JSON – A decoding story

So now let’s talk about json.loads.

You have a string that looks like JSON, but it’s only a string, and what you want is a Python dictionary. Let’s walk through the following examples :

$ string = '{"foobar": true}'
$ d = json.loads(string)
$ d
{'foobar': True}

Here we have a string that looks like JSON. You can use json.loads to transform this string into a dictionary and then do d["foobar"], which will return True.

So, why so many errors ?

Well, if your JSON looks like JSON but is not really JSON compatible (spec wise), for instance :

$ string = "{'foobar': true}"
$ json.loads(string)
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes

BAM ! This is not working because the JSON specs won’t allow single quotes, only double ones…
If you reverse the quotes to '{"foobar": true}' then it will work.
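If you would rather detect the problem than crash on it, one option is to catch the exception. This is only a sketch, and `try_parse` is a hypothetical helper name, not something from the json module.

```python
import json

def try_parse(text):
    """Return the decoded object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg and err.pos describe what went wrong and where
        print(f"invalid JSON at position {err.pos}: {err.msg}")
        return None

bad = try_parse("{'foobar': true}")   # single quotes: rejected
good = try_parse('{"foobar": true}')  # double quotes: accepted
```

Note that returning None is ambiguous if the input can legitimately be the JSON value null, so a real program might prefer to re-raise or return a separate flag.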

What you probably have tried is :

string = json.loads(json.dumps("{'foobar': true}"))

This JSON is invalid (check the quotes) and moreover you’ll get a string as a result. Disappointed ? I know…

  • json.dumps will NOT repair your JSON string. It only encodes the whole string as a JSON string value, so the result is valid JSON, but json.loads will just hand you back the original (still broken) string.

You have to understand that json.dumps encodes and json.loads decodes !

So what you did here is encode a string and then decode the string. But it’s still a string ! You haven’t done anything to change that fact ! If you want to get from string to dictionary then you need an extra step… => A second json.loads !

Let’s try that with valid JSON (no mean single quotes)

$ obj = json.loads(json.loads(json.dumps('{"foobar": true}')))
$ obj["foobar"]
True

The JSON string went through json.dumps and got encoded. Then it went through json.loads where it got decoded (useless… YAY). Finally, it went through json.loads AGAIN and got transformed from string to dictionary. As you can see, using json.dumps only adds a useless step in that situation.
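If you receive data that someone else has already encoded twice and you cannot fix it at the source, one possible workaround is to keep decoding while the result is still a string. This is a rough sketch, and `decode_fully` and `max_depth` are hypothetical names chosen for illustration.

```python
import json

def decode_fully(text, max_depth=5):
    """Decode JSON text, unwrapping extra layers of string encoding."""
    value = json.loads(text)
    for _ in range(max_depth):
        if not isinstance(value, str):
            break  # reached a real object, list, number, etc.
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # an ordinary string, not encoded JSON
    return value

double_encoded = json.dumps(json.dumps({"foobar": True}))
obj = decode_fully(double_encoded)
print(obj["foobar"])
```

Be aware that this also decodes strings that merely look like JSON (the string "123" would come back as the number 123), so it only makes sense when you know the payload was encoded more than once.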

One last thing. If we do the same thing again but with a bad JSON:

$ string = json.loads(json.loads(json.dumps("{'foobar': true}")))
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes

Quotes are wrong here (ain’t you getting used to this by now ?).
What happened here is that json.dumps wrapped your bad JSON in a valid JSON string, the first json.loads unwrapped it again (lol), and finally the second json.loads got the bad JSON, unchanged, as the first 2 steps canceled each other.
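The same chain can be checked step by step. A small sketch, with variable names invented for illustration:

```python
import json

bad = "{'foobar': true}"

wrapped = json.dumps(bad)        # valid JSON, but only as a quoted string
unwrapped = json.loads(wrapped)  # the original text again, single quotes included
print(unwrapped == bad)

# The second decode sees exactly the same bad input as before
try:
    json.loads(unwrapped)
    second_decode_failed = False
except json.JSONDecodeError:
    second_decode_failed = True
print(second_decode_failed)
```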


In conclusion :
Fix your JSON yourself ! Don’t give json.loads wrongly formatted JSON and don’t try to mix json.loads with json.dumps to fix what only you can fix.
Hope this helped someone 😉

Disclaimer. I’m no python expert.
Feel free to challenge this answer in the comment section.