How can we compress large files in Python?
Working with large files in Python can be challenging due to storage space and processing requirements. Python's zipfile module provides an effective solution for compressing files, reducing their size significantly while maintaining data integrity.
Why Compress Files?
File compression offers several benefits:
- Storage Efficiency − Reduces disk space usage
- Transfer Speed − Faster file uploads and downloads
- Memory Management − Lower memory consumption when handling files
- Archiving − Better organization of multiple files
Basic Syntax
The zipfile module provides a simple interface for creating compressed archives:
import zipfile

# Basic syntax
zipfile.ZipFile(filename, mode, compression)
Where:
- filename: Name of the zip file to create
- mode: 'w' for write, 'r' for read, 'a' for append
- compression: Compression method (ZIP_STORED, the default, stores files uncompressed; ZIP_DEFLATED, ZIP_BZIP2, and ZIP_LZMA compress them)
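As a minimal sketch of how these arguments fit together, the snippet below writes an archive in 'w' mode and reopens it in 'r' mode. The helper `make_archive`, the archive name `example.zip`, and the member name `notes.txt` are made up for illustration; `compresslevel` is an optional extra argument that is available from Python 3.7 onward.

```python
import os
import tempfile
import zipfile


def make_archive(path, text, level=6):
    """Write `text` into a new archive at `path` and return the member names."""
    # 'w' creates (or overwrites) the archive; ZIP_DEFLATED enables compression.
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as zf:
        # writestr() stores a string as a member without needing a file on disk.
        zf.writestr("notes.txt", text)
    # 'r' opens the existing archive read-only.
    with zipfile.ZipFile(path, "r") as zf:
        return zf.namelist()


workdir = tempfile.mkdtemp()  # scratch directory so nothing is overwritten
names = make_archive(os.path.join(workdir, "example.zip"), "hello zip\n" * 50)
print(names)
```

For ZIP_DEFLATED, `compresslevel` accepts values from 0 to 9, trading speed for size.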
Creating and Compressing a Text File
Let's create a sample text file and then compress it using the zipfile module:
# Create a sample text file
with open("sample.txt", "w") as f:
    for i in range(10):
        f.write(f"This is line {i + 1}\n")

# Read and display the file content
with open("sample.txt", "r") as f:
    content = f.read()

print("Original file content:")
print(content)
Original file content:
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10
Using zipfile Module for Compression
Now let's compress the text file with ZIP_DEFLATED and compare the file sizes before and after:
import zipfile
import os

# Method 1: Basic compression
with zipfile.ZipFile('compressed_file.zip', 'w') as zip_file:
    zip_file.write('sample.txt', compress_type=zipfile.ZIP_DEFLATED)

# Check file sizes
original_size = os.path.getsize('sample.txt')
compressed_size = os.path.getsize('compressed_file.zip')

print(f"Original file size: {original_size} bytes")
print(f"Compressed file size: {compressed_size} bytes")
print(f"Compression ratio: {compressed_size/original_size:.2%}")
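Original file size: 140 bytes
Compressed file size: 167 bytes
Compression ratio: 119.29%
The archive is larger than the original here. Every zip archive carries a fixed overhead (a local header and a central directory entry per file, plus an end-of-archive record), and for a file this small that overhead outweighs what compression saves. Exact byte counts vary with the platform's line endings and the Python version.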
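To make the overhead effect visible, here is a small sketch that zips a tiny payload and a large repetitive one entirely in memory. The helper `zipped_size` and the member name `data.txt` are illustrative, not part of the zipfile API.

```python
import io
import zipfile


def zipped_size(data: bytes) -> int:
    """Return the size of a DEFLATE zip archive holding `data` as one member."""
    buffer = io.BytesIO()  # in-memory archive, nothing touches the disk
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.txt", data)
    return len(buffer.getvalue())


small = b"This is a short line of text.\n"   # a few dozen bytes
large = b"This is line 1\n" * 10_000         # 150,000 repetitive bytes

for label, payload in (("small", small), ("large", large)):
    size = zipped_size(payload)
    print(f"{label}: {len(payload)} -> {size} bytes ({size / len(payload):.2%})")
```

The small payload grows once archived, while the large one shrinks to a small fraction of its size, which is why compression pays off mainly on bigger inputs.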
Advanced Compression with Multiple Files
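Compression pays off once files are larger and contain repeated content. Let's create several bigger text files and add them to one archive (each file is still compressed independently inside the zip):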
import zipfile
import os

# Create multiple text files
file_names = []
for i in range(3):
    filename = f"file_{i+1}.txt"
    with open(filename, "w") as f:
        for j in range(100):
            f.write(f"File {i+1}: This is line {j+1} with some repeated content.\n")
    file_names.append(filename)

# Compress all files
with zipfile.ZipFile('multi_files.zip', 'w', zipfile.ZIP_DEFLATED) as zip_file:
    for filename in file_names:
        zip_file.write(filename)
        print(f"Added {filename} to archive")

# Calculate compression statistics
total_original = sum(os.path.getsize(f) for f in file_names)
compressed_size = os.path.getsize('multi_files.zip')

print(f"\nTotal original size: {total_original} bytes")
print(f"Compressed size: {compressed_size} bytes")
print(f"Space saved: {total_original - compressed_size} bytes")
print(f"Compression ratio: {compressed_size/total_original:.2%}")
Added file_1.txt to archive
Added file_2.txt to archive
Added file_3.txt to archive

Total original size: 16500 bytes
Compressed size: 1087 bytes
Space saved: 15413 bytes
Compression ratio: 6.59%
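For genuinely large inputs, the data may not exist as a file on disk at all (for example, rows produced by a generator). A possible approach, sketched here under that assumption, is to open an archive member for writing with `ZipFile.open(name, mode="w")` (Python 3.6+) and feed it chunk by chunk. The helper names `zip_from_chunks` and `generate_lines`, and the member name `big_data.txt`, are hypothetical.

```python
import io
import zipfile


def zip_from_chunks(archive, member_name, chunks):
    """Write an iterable of bytes chunks into one archive member."""
    total = 0
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # force_zip64=True permits members over 4 GiB whose size is
        # not known before writing starts.
        with zf.open(member_name, mode="w", force_zip64=True) as member:
            for chunk in chunks:
                member.write(chunk)
                total += len(chunk)
    return total


def generate_lines(count):
    """Hypothetical data source: yields one encoded line at a time."""
    for i in range(count):
        yield f"This is line {i + 1} with some repeated content.\n".encode("utf-8")


buffer = io.BytesIO()  # stands in for a path such as 'big_data.zip'
written = zip_from_chunks(buffer, "big_data.txt", generate_lines(50_000))
print(f"Streamed {written} bytes into a {len(buffer.getvalue())}-byte archive")
```

Only one chunk is held in memory at a time, so the same pattern works for inputs far larger than available RAM. When the source is an ordinary file, `zip_file.write(filename)` already copies it in chunks internally.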
Extracting Compressed Files
You can also extract files from a zip archive:
import zipfile

# Extract all files from the archive
with zipfile.ZipFile('multi_files.zip', 'r') as zip_file:
    # List contents
    print("Files in archive:")
    for filename in zip_file.namelist():
        print(f" {filename}")

    # Extract all files
    zip_file.extractall('extracted_files/')
    print("\nFiles extracted to 'extracted_files/' directory")
Files in archive:
 file_1.txt
 file_2.txt
 file_3.txt

Files extracted to 'extracted_files/' directory
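Extraction is not always necessary. As a rough sketch, the snippet below verifies an archive with `testzip()` and then reads a member line by line straight from the archive. It builds its own small in-memory archive so it runs on its own; the member name `file_1.txt` simply mirrors the example above.

```python
import io
import zipfile

# Build a small in-memory archive so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file_1.txt", "".join(f"This is line {i + 1}\n" for i in range(100)))

with zipfile.ZipFile(buffer, "r") as zf:
    # testzip() checks every member's CRC and returns the first bad name, or None.
    bad_member = zf.testzip()
    print("Archive OK" if bad_member is None else f"Corrupt member: {bad_member}")

    # Stream the member line by line instead of extracting it to disk.
    line_count = 0
    with zf.open("file_1.txt") as member:
        for raw_line in member:
            line_count += 1
    print(f"file_1.txt contains {line_count} lines")
```

Reading through `zf.open()` decompresses data incrementally, which keeps memory use low for large members.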
Comparison of Compression Methods
| Method | Compression Level | Speed | Best For |
|---|---|---|---|
| ZIP_STORED | None | Fastest | Already compressed files |
| ZIP_DEFLATED | Good | Moderate | Text files, source code |
| ZIP_BZIP2 | Better | Slower | Large files, better compression |
| ZIP_LZMA | Best | Slowest | Maximum compression needed |
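The trade-offs in the table can be checked on your own data. The following sketch, with an illustrative helper `archive_sizes` and made-up sample text, archives the same payload in memory with each method and reports the resulting sizes. Methods whose backing module (zlib, bz2, or lzma) is missing from the Python build are skipped.

```python
import io
import zipfile

METHODS = {
    "ZIP_STORED": zipfile.ZIP_STORED,
    "ZIP_DEFLATED": zipfile.ZIP_DEFLATED,
    "ZIP_BZIP2": zipfile.ZIP_BZIP2,
    "ZIP_LZMA": zipfile.ZIP_LZMA,
}


def archive_sizes(data: bytes) -> dict:
    """Return {method name: archive size in bytes} for one payload."""
    sizes = {}
    for name, method in METHODS.items():
        buffer = io.BytesIO()
        try:
            with zipfile.ZipFile(buffer, "w", method) as zf:
                zf.writestr("data.txt", data)
        except RuntimeError:
            # zipfile raises RuntimeError when the required module is unavailable.
            continue
        sizes[name] = len(buffer.getvalue())
    return sizes


# Sample payload: 300 similar lines, assumed here purely for demonstration.
data = "".join(
    f"This is line {i + 1} with some repeated content.\n" for i in range(300)
).encode("utf-8")

sizes = archive_sizes(data)
for name, size in sizes.items():
    print(f"{name:<13} {size:>7} bytes ({size / len(data):.2%} of original)")
```

Which method wins depends on the data: highly repetitive text can rank the methods differently from binary or already compressed input, so measuring on a representative sample is worthwhile.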
Conclusion
Python's zipfile module provides an efficient way to compress large files, saving storage space and improving transfer speeds. Use ZIP_DEFLATED for general purposes, and consider ZIP_BZIP2 or ZIP_LZMA for maximum compression when dealing with very large files.
