How we can compress large Python files?

Working with large files in Python can be challenging due to storage space and processing requirements. Python's zipfile module provides an effective solution for compressing files, reducing their size significantly while maintaining data integrity.

Why Compress Files?

File compression offers several benefits:

  • Storage Efficiency − Reduces disk space usage
  • Transfer Speed − Faster file uploads and downloads
  • Memory Management − Lower memory consumption when handling files
  • Archiving − Better organization of multiple files

Basic Syntax

The zipfile module provides a simple interface for creating compressed archives:

import zipfile

# Basic syntax
zipfile.ZipFile(filename, mode, compression)

Where:

  • filename: Name of the zip file to create
  • mode: 'w' for write, 'r' for read, 'a' for append
  • compression: Compression method (ZIP_DEFLATED, ZIP_STORED)

Creating and Compressing a Text File

Let's create a sample text file and then compress it using the zipfile module ?

# Create a sample text file
with open("sample.txt", "w") as f:
    for i in range(10):
        f.write(f"This is line {i + 1}\n")

# Read and display the file content
with open("sample.txt", "r") as f:
    content = f.read()
    print("Original file content:")
    print(content)
Original file content:
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10

Using zipfile Module for Compression

Now let's compress the text file using different compression methods ?

import zipfile
import os

# Method 1: Basic compression
with zipfile.ZipFile('compressed_file.zip', 'w') as zip_file:
    zip_file.write('sample.txt', compress_type=zipfile.ZIP_DEFLATED)

# Check file sizes
original_size = os.path.getsize('sample.txt')
compressed_size = os.path.getsize('compressed_file.zip')

print(f"Original file size: {original_size} bytes")
print(f"Compressed file size: {compressed_size} bytes")
print(f"Compression ratio: {compressed_size/original_size:.2%}")
Original file size: 140 bytes
Compressed file size: 167 bytes
Compression ratio: 119.29%

Advanced Compression with Multiple Files

For better compression results, let's create multiple files and compress them together ?

import zipfile
import os

# Create multiple text files
file_names = []
for i in range(3):
    filename = f"file_{i+1}.txt"
    with open(filename, "w") as f:
        for j in range(100):
            f.write(f"File {i+1}: This is line {j+1} with some repeated content.\n")
    file_names.append(filename)

# Compress all files
with zipfile.ZipFile('multi_files.zip', 'w', zipfile.ZIP_DEFLATED) as zip_file:
    for filename in file_names:
        zip_file.write(filename)
        print(f"Added {filename} to archive")

# Calculate compression statistics
total_original = sum(os.path.getsize(f) for f in file_names)
compressed_size = os.path.getsize('multi_files.zip')

print(f"\nTotal original size: {total_original} bytes")
print(f"Compressed size: {compressed_size} bytes") 
print(f"Space saved: {total_original - compressed_size} bytes")
print(f"Compression ratio: {compressed_size/total_original:.2%}")
Added file_1.txt to archive
Added file_2.txt to archive
Added file_3.txt to archive

Total original size: 16500 bytes
Compressed size: 1087 bytes
Space saved: 15413 bytes
Compression ratio: 6.59%

Extracting Compressed Files

You can also extract files from a zip archive ?

import zipfile

# Extract all files from the archive
with zipfile.ZipFile('multi_files.zip', 'r') as zip_file:
    # List contents
    print("Files in archive:")
    for filename in zip_file.namelist():
        print(f"  {filename}")
    
    # Extract all files
    zip_file.extractall('extracted_files/')
    print("\nFiles extracted to 'extracted_files/' directory")
Files in archive:
  file_1.txt
  file_2.txt
  file_3.txt

Files extracted to 'extracted_files/' directory

Comparison of Compression Methods

Method Compression Level Speed Best For
ZIP_STORED None Fastest Already compressed files
ZIP_DEFLATED Good Moderate Text files, source code
ZIP_BZIP2 Better Slower Large files, better compression
ZIP_LZMA Best Slowest Maximum compression needed

Conclusion

Python's zipfile module provides an efficient way to compress large files, saving storage space and improving transfer speeds. Use ZIP_DEFLATED for general purposes, and consider ZIP_BZIP2 or ZIP_LZMA for maximum compression when dealing with very large files.

Updated on: 2026-03-25T04:50:47+05:30

452 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements