I am trying to extract all files from a .zip that contains subfolders into a single folder. I want every file from the subfolders extracted into that one folder, without keeping the original structure. At the moment, I extract everything, move the files to a folder, then remove the leftover subfolders. Files with the same name are overwritten.
Is it possible to do it before writing files?
Here is an example structure:

    my_zip/file1.txt
    my_zip/dir1/file2.txt
    my_zip/dir1/dir2/file3.txt
    my_zip/dir3/file4.txt

In the end I want this:

    my_dir/file1.txt
    my_dir/file2.txt
    my_dir/file3.txt
    my_dir/file4.txt
What can I add to this code?
    import zipfile

    my_dir = "D:\\Download\\"
    my_zip = "D:\\Download\\my_file.zip"

    zip_file = zipfile.ZipFile(my_zip, 'r')
    for files in zip_file.namelist():
        zip_file.extract(files, my_dir)
    zip_file.close()
If I rename the file paths returned by zip_file.namelist(), I get this error:
KeyError: "There is no item named 'file2.txt' in the archive"
This opens a file handle for each member of the zip archive, takes the base name of the member, and copies the contents to a target file. That is how ZipFile.extract works internally, minus the handling of subdirectories.
    import os
    import shutil
    import zipfile

    my_dir = r"D:\Download"
    my_zip = r"D:\Download\my_file.zip"

    with zipfile.ZipFile(my_zip) as zip_file:
        for member in zip_file.namelist():
            filename = os.path.basename(member)
            # skip directories
            if not filename:
                continue

            # copy file (taken from zipfile's extract)
            source = zip_file.open(member)
            target = open(os.path.join(my_dir, filename), "wb")
            with source, target:
                shutil.copyfileobj(source, target)
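The question mentions that files with the same name overwrite each other once the structure is flattened. The following is a minimal sketch of one way to avoid that, building on the copy loop above. The helper names `unique_path` and `extract_flat` and the `_1`, `_2` suffix scheme are assumptions made for illustration, not part of the original answer. Note that a file already present in the target folder also counts as a collision.

```python
import os
import shutil
import zipfile


def unique_path(directory, filename):
    """Return a path inside `directory` that does not exist yet.

    Hypothetical helper: appends _1, _2, ... before the extension
    when `filename` is already taken.
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(directory, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate


def extract_flat(zip_path, target_dir):
    """Copy every file of the archive directly into `target_dir`.

    Returns the list of paths that were written.
    """
    os.makedirs(target_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zip_file:
        for member in zip_file.namelist():
            filename = os.path.basename(member)
            if not filename:  # directory entry, nothing to copy
                continue
            target_path = unique_path(target_dir, filename)
            with zip_file.open(member) as source, open(target_path, "wb") as target:
                shutil.copyfileobj(source, target)
            written.append(target_path)
    return written
```

With the paths from the question, this would be called as `extract_flat(r"D:\Download\my_file.zip", r"D:\Download")`.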
It is possible to iterate over ZipFile.infolist(). On the returned ZipInfo objects you can then change the filename to remove the directory part, and finally extract each one to a specified directory.
    import zipfile
    import os

    my_dir = "D:\\Download\\"
    my_zip = "D:\\Download\\my_file.zip"

    with zipfile.ZipFile(my_zip) as zip_file:
        for zip_info in zip_file.infolist():
            # skip directory entries, whose names end with "/"
            if zip_info.filename[-1] == "/":
                continue
            # drop the directory part so the file lands directly in my_dir
            zip_info.filename = os.path.basename(zip_info.filename)
            zip_file.extract(zip_info, my_dir)
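A variant of the same idea, sketched here as a reusable function, uses `ZipInfo.is_dir()` (available since Python 3.6) instead of inspecting the last character of the name. The function name `extract_flat_with_zipinfo` is made up for this example. Changing `zip_info.filename` only affects the in-memory object that decides where the file is written; the archive on disk is not modified.

```python
import os
import zipfile


def extract_flat_with_zipinfo(zip_path, target_dir):
    """Flatten an archive by renaming each ZipInfo before extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        for zip_info in archive.infolist():
            if zip_info.is_dir():  # directory entry, nothing to extract
                continue
            # Keep only the last path component as the output name.
            zip_info.filename = os.path.basename(zip_info.filename)
            archive.extract(zip_info, target_dir)
```

As with the loop above, two members that share a base name end up at the same path, so the later one replaces the earlier one.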
Just read the data into memory as bytes, compute the filename, and write the file yourself instead of letting the library do it. Mostly, this means using the read() method instead of extract():
Python 3.6+ update (2020): the same code as in the original answer, but using pathlib.Path, which eases file-path manipulation and other operations (like write_bytes):
    from pathlib import Path
    import zipfile

    my_dir = Path("D:\\Download\\")
    my_zip = my_dir / "my_file.zip"

    zip_file = zipfile.ZipFile(my_zip, 'r')
    for files in zip_file.namelist():
        if files.endswith("/"):  # skip directory entries
            continue
        data = zip_file.read(files)
        # namelist() yields plain strings, so take the last path component
        myfile_path = my_dir / Path(files).name
        myfile_path.write_bytes(data)
    zip_file.close()
Original code in answer without pathlib:
    import zipfile
    import os

    my_dir = "D:\\Download\\"
    my_zip = "D:\\Download\\my_file.zip"

    zip_file = zipfile.ZipFile(my_zip, 'r')
    for files in zip_file.namelist():
        if files.endswith("/"):  # skip directory entries
            continue
        data = zip_file.read(files)
        # I am almost sure zip represents the directory separator
        # char as "/" regardless of OS, but I don't have DOS or Windows here to test it
        myfile_path = os.path.join(my_dir, files.split("/")[-1])
        myfile = open(myfile_path, "wb")
        myfile.write(data)
        myfile.close()
    zip_file.close()
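On the separator question raised in the comment: the ZIP format specifies forward slashes in member names, and the names returned by namelist() use "/" on every platform. A small sketch that makes this explicit by parsing names with `PurePosixPath`, independent of the host OS, follows. The helper name `flat_name` is an assumption for illustration.

```python
from pathlib import PurePosixPath


def flat_name(member_name):
    """Return the last path component of a zip member name.

    Returns an empty string for directory entries (names ending in "/").
    """
    if member_name.endswith("/"):
        return ""
    # Zip member names always use "/" as separator, so parse them as POSIX paths.
    return PurePosixPath(member_name).name
```

In the loops above, `flat_name(files)` could stand in for `files.split("/")[-1]`, with an empty result meaning the entry should be skipped.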
A similar concept to the solution of Gerhard Götz, but adapted for extracting single files instead of the entire zip:
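    import os
    from zipfile import ZipFile

    # zipPath, path_in_zip and destination are defined by the caller
    with ZipFile(zipPath, 'r') as zipObj:
        zipInfo = zipObj.getinfo(path_in_zip)
        # rename the member so it is written under the destination's file name
        zipInfo.filename = os.path.basename(destination)
        zipObj.extract(zipInfo, os.path.dirname(os.path.realpath(destination)))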
If you are getting a BadZipFile error, you can unzip the archive by running 7-Zip as a subprocess. Assuming 7-Zip is installed, use the following code.
    import subprocess

    my_dir = destFolder  # destination folder
    my_zip = destFolder + "/" + "filename.zip"  # file you want to extract (placeholder name)
    ziploc = "C:/Program Files/7-Zip/7z.exe"  # location where 7zip is installed
    # extract only .txt files, from all subdirectories, without their paths
    cmd = [ziploc, 'e', my_zip, '-o' + my_dir, '*.txt', '-r']
    sp = subprocess.Popen(cmd, stderr=subprocess.STDOUT, stdout=subprocess.PIPE)
    output, _ = sp.communicate()  # wait for 7-Zip to finish and collect its output