
Reading and Writing CSV Files with the `csv` Module

This example demonstrates how to read data from a CSV file into Python and how to write data from Python into a CSV file using the standard-library csv module. It covers basic read and write operations, including handling the header row and using delimiters other than a comma.

Introduction to CSV Files and the `csv` Module

CSV (Comma Separated Values) files are a common way to store tabular data. Each line in a CSV file represents a row, and values within each row are separated by a delimiter (usually a comma). Python's csv module provides tools to easily read and write CSV files. It handles complexities like quoting, escaping, and different delimiters.
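To make the point about quoting and escaping concrete, here is a minimal sketch. The sample row is hypothetical, chosen so that one field contains a comma and another contains double quotes; the default writer settings are assumed.

```python
import csv
import io

# Hypothetical sample row: the second field contains the delimiter,
# the third contains the quote character
row = ["Alice", "New York, NY", 'She said "hi"']

# Write the row to an in-memory buffer with the default dialect
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()
print(encoded)

# Reading the text back recovers the original fields unchanged
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded)
```

The writer wraps the two problematic fields in double quotes and doubles the embedded quote characters, which is why splitting lines on commas by hand is unreliable for this kind of data.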

Reading Data from a CSV File

This snippet demonstrates reading CSV data. We import the csv module and define sample CSV data as a string, which io.StringIO wraps in a file-like object so that it can stand in for an open file. csv.reader() returns a reader object that yields each row as a list of strings. Calling next(reader) consumes the first row, here the header, and the for loop then iterates over the remaining rows and prints each one.

import csv
import io

# Sample CSV data (in-memory)
csv_data = '''Name,Age,City
Alice,30,New York
Bob,25,London
Charlie,35,Paris'''

# Wrap the string in a file-like object so csv.reader can iterate over it
csv_file = io.StringIO(csv_data)

reader = csv.reader(csv_file)

# Read and print the header
header = next(reader)  # Get the first row as header
print(f"Header: {header}")

# Iterate over rows and print data
for row in reader:
    print(f"Row: {row}")
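As an alternative way to handle the header row, the csv module also provides csv.DictReader, which uses the first row as field names and yields each remaining row as a dictionary. The following is a minimal sketch that reuses the same sample data; the variable names are illustrative.

```python
import csv
import io

csv_data = '''Name,Age,City
Alice,30,New York
Bob,25,London
Charlie,35,Paris'''

# DictReader takes the field names from the first row automatically
reader = csv.DictReader(io.StringIO(csv_data))
records = list(reader)

# Each record maps a column name to its (string) value
for record in records:
    print(record["Name"], record["City"])
```

Accessing columns by name tends to be more robust than positional indexing when the column order of the input may change.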

Writing Data to a CSV File

This snippet demonstrates writing data in CSV format. We define a list of lists called data, where each inner list represents a row. csv.writer() creates a writer object, and its writerows() method writes every row in data to the target, here an in-memory io.StringIO buffer. Finally, getvalue() returns the generated CSV text, which is printed.

import csv
import io

# Data to be written to the CSV file
data = [
    ['Name', 'Age', 'City'],
    ['Alice', 30, 'New York'],
    ['Bob', 25, 'London'],
    ['Charlie', 35, 'Paris']
]

# Write data to a CSV file (in-memory)
csv_file = io.StringIO()

writer = csv.writer(csv_file)

# Write multiple rows at once
writer.writerows(data)

# Get the CSV content as a string
csv_content = csv_file.getvalue()
print(csv_content)

Specifying a Delimiter

The delimiter parameter in csv.reader() and csv.writer() allows you to specify a different delimiter than the default comma. This is useful for working with files that use tabs, semicolons, or other characters to separate values.

import csv
import io

# Sample data with a tab delimiter
tsv_data = '''Name\tAge\tCity
Alice\t30\tNew York
Bob\t25\tLondon'''

# Wrap the string in a file-like object
tsv_file = io.StringIO(tsv_data)

# Create a reader with a tab delimiter
reader = csv.reader(tsv_file, delimiter='\t')

# Iterate over rows and print data
for row in reader:
    print(row)
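The delimiter parameter works the same way on the writing side. The sketch below writes semicolon-separated output to an in-memory buffer; the sample rows and the choice of a semicolon are illustrative assumptions.

```python
import csv
import io

# Illustrative rows; the first row acts as the header
rows = [
    ["Name", "Age", "City"],
    ["Alice", 30, "New York"],
    ["Bob", 25, "London"],
]

out = io.StringIO()

# Create a writer that separates values with semicolons instead of commas
writer = csv.writer(out, delimiter=";")
writer.writerows(rows)

semicolon_text = out.getvalue()
print(semicolon_text)
```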

Real-Life Use Case

CSV files are commonly used for data exchange between different applications and systems. They are often used to export data from databases, spreadsheets, and other data storage systems. For example, a marketing team might export customer data from their CRM system into a CSV file for analysis in a data visualization tool.

Best Practices

  • Handle exceptions such as OSError and csv.Error when reading or writing files.
  • Specify the encoding (e.g., 'utf-8') and pass newline='' when opening files for the csv module, to avoid encoding errors and incorrect newline handling.
  • Be mindful of the delimiter used in the CSV file and specify it correctly when reading or writing.
  • Use a context manager (with open(...)) to ensure files are properly closed after use.
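The practices above can be combined in one short sketch. It writes to a temporary directory so that it stays self-contained; the file name people.csv and the sample rows are hypothetical. Passing newline='' to open() is what the csv documentation recommends for files used with the module.

```python
import csv
import os
import tempfile

# Illustrative rows; the first row acts as the header
rows = [
    ["Name", "Age", "City"],
    ["Alice", 30, "New York"],
    ["Bob", 25, "London"],
]

loaded = []
with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name inside a throwaway directory
    path = os.path.join(tmp_dir, "people.csv")
    try:
        # The context manager closes the file even if writing fails
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)

        # Read the file back with the same encoding and newline settings
        with open(path, "r", newline="", encoding="utf-8") as f:
            loaded = list(csv.reader(f))
    except (OSError, csv.Error) as exc:
        print(f"Could not process {path}: {exc}")

print(loaded)
```

Note that the numbers written as integers come back as strings, since a CSV file stores only text.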

When to Use CSV

Use CSV when:

  • Data is tabular and relatively simple.
  • Human readability and editability are important.
  • Interoperability between different applications is required.
  • You need a lightweight format for data exchange.

Alternatives

Alternatives to CSV include:

  • JSON: Suitable for more complex data structures and hierarchical data.
  • Parquet: Optimized for analytical queries and efficient storage.
  • Excel (XLSX): Suitable for data manipulation within spreadsheets but less suitable for programmatic data processing.

Pros

  • Simple and widely supported format.
  • Human-readable and easy to edit.
  • Lightweight and efficient for simple tabular data.

Cons

  • Limited support for complex data structures.
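  • Carries no data type information (by default the csv module returns every field as a string).
  • Can be inefficient for large datasets: the format is plain text with no built-in compression or indexing, so files are typically parsed from the start.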
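Because values arrive as strings, numeric columns have to be converted explicitly after reading. The following is a minimal sketch of that step; the parse_row helper is hypothetical and assumes the Age column always holds a valid integer.

```python
import csv
import io

csv_data = '''Name,Age,City
Alice,30,New York
Bob,25,London
Charlie,35,Paris'''


def parse_row(record):
    """Hypothetical helper: convert the Age field from str to int."""
    return {
        "Name": record["Name"],
        "Age": int(record["Age"]),
        "City": record["City"],
    }


people = [parse_row(r) for r in csv.DictReader(io.StringIO(csv_data))]

# With real integers, arithmetic on the column works as expected
average_age = sum(p["Age"] for p in people) / len(people)
print(average_age)
```

For input that may contain malformed numbers, the int() call would need to be wrapped in error handling.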

FAQ

  • How do I handle CSV files with different encodings?

    Specify the encoding parameter when opening the file, for example open('file.csv', 'r', encoding='utf-8', newline=''). Common encodings include 'utf-8', 'latin-1', and 'ascii'. For UTF-8 files that start with a byte order mark, 'utf-8-sig' strips it on reading.
  • How can I handle missing values in a CSV file?

    When reading, missing values appear as empty strings, so check each row for ''. When writing, csv.writer writes None as an empty string; to mark missing values differently, replace them with an explicit placeholder such as 'N/A' before writing.
  • Can I read and write CSV files directly from and to URLs?

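    Yes, for reading. urllib.request.urlopen() returns a stream of bytes, so decode it first (for example, by wrapping the response in io.TextIOWrapper with the appropriate encoding) and then pass it to csv.reader. Writing works differently: there is no remote file to write to directly, so you would generate the CSV text locally and send it, for example in a POST request, to a server that accepts it (though this is less common).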
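For the question about missing values, a minimal sketch of one possible approach follows. It assumes that an empty field means "missing" and maps it to None on reading; the sample data is hypothetical.

```python
import csv
import io

# Hypothetical data: Bob has no age and Charlie has no city
csv_data = '''Name,Age,City
Alice,30,New York
Bob,,London
Charlie,35,'''

# On reading, treat empty strings as missing and store them as None
rows = []
for record in csv.DictReader(io.StringIO(csv_data)):
    cleaned = {key: (value if value != "" else None) for key, value in record.items()}
    rows.append(cleaned)
print(rows)

# On writing, the csv module turns None back into an empty field
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Name", "Age", "City"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Whether an empty field should mean "missing" or a genuinely empty string depends on the data source, so this mapping is a design choice rather than a rule.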