Pandas DataFrame Concat Explained [6 Code Examples]

When working with data in Python, the Pandas library provides a wide range of data manipulation and analysis tools. One commonly used operation is concatenation with pd.concat, which combines multiple DataFrames along a specified axis. In this article, we’ll explore the concat function in Pandas and walk through it with code examples.

Introduction to Pandas Dataframe Concat

The pd.concat function in Pandas is used to concatenate multiple DataFrames. It provides a flexible way to combine data from different sources. The key parameter, axis, determines whether the concatenation happens along the rows (axis=0) or columns (axis=1).

Example 1: Concatenating Pandas DataFrames Along Rows

Concatenating DataFrames along rows means stacking them vertically. This is useful when you want to combine datasets with the same columns.

import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenate along rows
result = pd.concat([df1, df2], axis=0)

print(result)

Output

  A B
0 1 3
1 2 4
0 5 7
1 6 8

Example 2: Concatenating DataFrames Along Columns

Concatenating DataFrames along columns means combining them horizontally. This is useful when you want to add more columns to your existing DataFrame.

# Sample DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})

# Concatenate along columns
result = pd.concat([df1, df2], axis=1)

print(result)

Output

  A B C D
0 1 3 5 7
1 2 4 6 8
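
The example above works cleanly because both DataFrames share the same index. As a minimal sketch of what happens otherwise (the names left and right and their index values are illustrative assumptions, not part of the original example), column-wise concatenation matches rows by index label rather than by position:

```python
import pandas as pd

# Two frames whose index labels only partly overlap (label 1 is shared)
left = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
right = pd.DataFrame({'C': [5, 6]}, index=[1, 2])

# axis=1 aligns rows by index label; a label missing from one frame yields NaN there
aligned = pd.concat([left, right], axis=1)

print(aligned)
```

Only the row labeled 1 has values in both columns. Rows 0 and 2 each contain one NaN, so it is worth checking the indices before concatenating along columns.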

Example 3: Ignoring the Index

By default, the index of the resulting DataFrame retains the original indices of the concatenated DataFrames. To reset the index, you can use the ignore_index=True parameter.

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Concatenate and ignore index
result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)

Output

  A B
0 1 3
1 2 4
2 5 7
3 6 8
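
As a small sanity check, the sketch below compares ignore_index=True with a plain concatenation followed by reset_index(drop=True). The variable names are illustrative, and this is meant as a way to see what the parameter does, not as a recommendation of one style over the other:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Option 1: let concat renumber the rows
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

# Option 2: keep the original labels, then drop them afterwards
reset_later = pd.concat([df1, df2], axis=0).reset_index(drop=True)

# Both approaches should give the same rows labeled 0..3
same_result = renumbered.equals(reset_later)
print(same_result)
```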

Example 4: Handling Duplicate Indices

# Sample DataFrames with duplicate index
df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'A': [5, 6]}, index=[1, 2])

# Concatenate along rows without verifying integrity
res = pd.concat([df1, df2], axis=0)

print(res)

Output

  A
0 1
1 2
1 5
2 6

In this example, we’re concatenating the DataFrames along rows without verifying the integrity of the indices. As a result, the index 1 appears twice in the resulting DataFrame. If duplicate labels are a problem for your use case, you can renumber the rows with ignore_index=True (see Example 3) or clean up the indices before concatenating.

Note: To detect duplicate indices instead of silently keeping them, pass verify_integrity=True. pd.concat will then raise a ValueError if the indices overlap.
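
The following sketch shows verify_integrity=True in action on the same two DataFrames. The try/except wrapper and the helper variables overlap_detected and dup_labels are illustrative additions:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]}, index=[0, 1])
df2 = pd.DataFrame({'A': [5, 6]}, index=[1, 2])

# With verify_integrity=True, overlapping index labels raise a ValueError
try:
    pd.concat([df1, df2], axis=0, verify_integrity=True)
    overlap_detected = False
except ValueError as err:
    overlap_detected = True
    print(f"concat refused: {err}")

# Without the check, the duplicated labels can still be inspected afterwards
res = pd.concat([df1, df2], axis=0)
dup_labels = res.index[res.index.duplicated()].tolist()
print(dup_labels)
```

Here the label 1 is reported as the duplicate, which matches the output shown in Example 4.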

Example 5: Concatenating DataFrames with Different Columns

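When concatenating DataFrames with different columns along rows (axis=0), any column that is missing from one of the DataFrames is filled with NaN for that DataFrame's rows.

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})

# Concatenate along rows; the DataFrames share no columns
result = pd.concat([df1, df2], axis=0)

print(result)

Output

     A    B    C    D
0  1.0  3.0  NaN  NaN
1  2.0  4.0  NaN  NaN
0  NaN  NaN  5.0  7.0
1  NaN  NaN  6.0  8.0

Because NaN is a floating-point value, the integer columns are converted to floats.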
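
The NaN filling described in this example is easiest to see when stacking along rows. Here is a hedged sketch with DataFrames that share only one column (the column layout is an assumption chosen for illustration):

```python
import pandas as pd

# Only column 'B' exists in both frames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# Row-wise concat keeps the union of the columns (default join='outer')
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

print(stacked)
# Columns that received NaN are upcast to float; the shared column stays integer
print(stacked.dtypes)
```

Column B is complete because both inputs provide it. A and C each contain NaN for the rows that came from the other DataFrame.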

Example 6: Concatenating DataFrames with Inner Join

You can perform an inner join while concatenating by setting join='inner'.

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# Concatenate with inner join
result = pd.concat([df1, df2], axis=1, join='inner')
print(result)

Output

  A B B C
0 1 3 5 7
1 2 4 6 8

In this code, we’re using the Pandas library in Python. We create two sample DataFrames:

  1. df1 contains two columns, labeled ‘A’ and ‘B’, with the respective values of [1, 2] and [3, 4].
  2. df2 contains two columns, labeled ‘B’ and ‘C’, with the respective values of [5, 6] and [7, 8].
  3. We use the pd.concat function to concatenate the DataFrames. The axis=1 argument indicates that we’re concatenating along columns. The join='inner' parameter specifies an inner join, which means that only rows whose index labels appear in both DataFrames will be included.
  4. Now, let’s look at the indices and columns of df1 and df2:
    • df1 has indices [0, 1] and columns [A, B].
    • df2 has indices [0, 1] and columns [B, C].
  5. Since both DataFrames have the same indices ([0, 1]), the inner join keeps both rows. The resulting DataFrame (result) will contain columns [A, B, B, C].
  6. In this DataFrame, the columns are labeled ‘A’, ‘B’, ‘B’, and ‘C’. The first ‘B’ comes from df1, and the second ‘B’ comes from df2. Note that pd.concat does not rename duplicate column names. Suffixes such as ‘B_x’ and ‘B_y’ are added by pd.merge, not by pd.concat.
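
Because the two DataFrames above share identical indices, the inner join does not drop anything. The sketch below changes the index of df2 (an assumption made only for illustration) so the effect becomes visible. It also contrasts the result with pd.merge, which does add suffixes to overlapping column names:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=[1, 2])

# Inner join along columns keeps only index labels present in both frames (label 1)
inner = pd.concat([df1, df2], axis=1, join='inner')
print(inner)

# pd.merge on the index renames the clashing 'B' columns with suffixes
merged = pd.merge(df1, df2, left_index=True, right_index=True, how='inner')
print(list(merged.columns))
```

With concat the result has two columns both named ‘B’, so selecting result['B'] returns both. If distinct names are needed, renaming the columns beforehand or using pd.merge may be the more convenient route.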

Conclusion

In this article, we’ve covered pd.concat, a versatile tool for concatenating DataFrames. By understanding its parameters, you can combine data from various sources for more complex analysis tasks. Remember to consider the axis along which you want to concatenate and how the indices should be handled for your project.

Do have a look at my other articles on Pandas DataFrames as well. Thank you for reading this one.
