Python DataFrame to CSV explained with 4 Code Examples

Python’s pandas library is a powerful tool for data manipulation and analysis. One common task in data handling is saving a DataFrame to a CSV (Comma Separated Values) file. This guide will walk you through the process, providing multiple code examples and explanations.

Introduction to Python DataFrames

A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It’s the most commonly used data structure in pandas. DataFrames can be thought of as an in-memory spreadsheet.

Saving Python DataFrames to CSV

Basic CSV Export

The most straightforward way to export a DataFrame to a CSV file is by using the to_csv() function provided by pandas.

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Jane', 'Bob', 'Emily'],
'Age': [28, 25, 32, 29],
'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)

# Export to CSV
df.to_csv('sample_data.csv', index=False)

In this example, we first import the Pandas library. Then, we create a sample DataFrame called df with columns ‘Name’, ‘Age’, and ‘City’. Finally, we use the to_csv() function to save the DataFrame to a CSV file named sample_data.csv. The index=False argument specifies that we don’t want to save the row indices.

Customizing CSV Export

The to_csv() function provides a range of options to customize the export process. Here are some common parameters:

  • sep: Specifies the delimiter to use (default is a comma).
  • header: Controls whether to write the header (column names).
  • columns: Allows you to select specific columns for export.
  • na_rep: String representation of NaN values.
  • date_format: Format for date columns.
# Customized CSV Export
df.to_csv('custom_data.csv', index=False, sep='\t', header=True, date_format='%Y-%m-%d')

In this example, we export the DataFrame df to a CSV file named custom_data.csv. We’ve customized the export by specifying a tab (\t) as the delimiter, including the header, and formatting the date columns according to the specified date format.

Examples with Outputs

Example 1: Basic Export

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Jane', 'Bob', 'Emily'],
'Age': [28, 25, 32, 29],
'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago']}

df = pd.DataFrame(data)

# Export to CSV
df.to_csv('sample_data.csv', index=False)

Output (sample_data.csv):

Name,Age,City
John,28,New York
Jane,25,San Francisco
Bob,32,Los Angeles
Emily,29,Chicago

Example 2: Custom Export

# Create a sample DataFrame with dates
data = {'Name': ['John', 'Jane', 'Bob', 'Emily'],
'Age': [28, 25, 32, 29],
'Birthdate': ['1993-02-15', '1996-08-23', '1990-11-10', '1988-05-03']}

df = pd.DataFrame(data)
df['Birthdate'] = pd.to_datetime(df['Birthdate'])

# Customized CSV Export
df.to_csv('custom_data.csv', index=False, sep='\t', header=True, date_format='%Y-%m-%d')

Output (custom_data.csv):

Name Age Birthdate
John 28 1993-02-15
Jane 25 1996-08-23
Bob 32 1990-11-10
Emily 29 1988-05-03

Example 3: Exporting Specific Columns

# Create a sample DataFrame
data = {'Name': ['John', 'Jane', 'Bob', 'Emily'],
'Age': [28, 25, 32, 29],
'City': ['New York', 'San Francisco', 'Los Angeles', 'Chicago'],
'Salary': [75000, 80000, 90000, 70000]}

df = pd.DataFrame(data)

# Export only 'Name' and 'Salary' columns to CSV
df[['Name', 'Salary']].to_csv('salary_data.csv', index=False)

Output (salary_data.csv):

Name,Salary
John,75000
Jane,80000
Bob,90000
Emily,70000

In this code example, we create a DataFrame with additional ‘Salary’ information. We then use double square brackets [['Name', 'Salary']] to select only the ‘Name’ and ‘Salary’ columns before exporting to a CSV file named salary_data.csv.

Example 4: Handling Missing Values

import pandas as pd
import numpy as np

# Create a sample DataFrame with missing values
data = {'Name': ['John', 'Jane', 'Bob', 'Emily'],
'Age': [28, np.nan, 32, 29],
'City': ['New York', 'San Francisco', np.nan, 'Chicago']}

df = pd.DataFrame(data)

# Export DataFrame with NaN values represented as 'Unknown'
df.to_csv('missing_data.csv', index=False, na_rep='Unknown')

Output (missing_data.csv):

Name,Age,City
John,28.0,New York
Jane,Unknown,San Francisco
Bob,32.0,Unknown
Emily,29.0,Chicago
In this example, we intentionally introduce missing values (NaN) in the ‘Age’ and ‘City’ columns. We then use the na_rep parameter to specify that these missing values should be represented as ‘Unknown’ in the exported CSV file.

Conclusion

Exporting a Python DataFrame to CSV file is a fundamental task in data handling. With pandas, this process is straightforward and provides various customization options. By following the examples and explanations in this guide, you should be well-equipped to save Python Dataframe to CSV. Remember to adjust parameters based on your specific needs and data requirements. Do visit our other articles on Python dataframes as well.

Thank you for reading.

Leave a Comment

Your email address will not be published. Required fields are marked *