I’ve pulled data from Twitter. Currently, the data is in multiple files and I could not merge it into one single file.
Note: all files are in JSON format.
I wrote this code, following some tutorials about merging JSON with Python:
```python
from glob import glob
import json
import pandas as pd

with open('Desktop/json/finalmerge.json', 'w') as f:
    for fname in glob('Desktop/json/*.json'):  # reads every JSON file in the directory
        with open(fname) as j:
            f.write(str(j.read()))
            f.write('\n')
```
I successfully merged all the files, and the result is finalmerge.json.
Now I used this as suggested in several threads:
```python
df_lines = pd.read_json('finalmerge.json', lines=True)
df_lines
```

This gives 1000000 rows × 23 columns. What should I do to put each feature into a separate column? I'm not sure what's wrong with the JSON files: I checked the merged file and found it is not valid JSON. What should I do to turn this into a data frame? I am asking because I have very basic Python knowledge, and all the answers to similar questions that I have found are more complicated than I can understand. Please help this new Python user convert multiple JSON files into one JSON file. Thank you.
I think the problem is that your files are not really JSON (or rather, they are structured as JSONL). You have two ways of proceeding:
- you could read every file as a text file and merge them line by line
- you could convert them to valid JSON (add an opening square bracket at the beginning of the file, a comma after every JSON element, and a closing bracket at the end).
Try following this question and let me know if it solves your problem: Loading JSONL file as JSON objects
You can also try to edit your code this way:
```python
with open('finalmerge.json', 'w') as f:
    for fname in glob('Desktop/json/*.json'):
        with open(fname) as j:
            f.write(str(j.read()))
            f.write('\n')
```
Every line will then be a separate JSON element.