pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

Task | Example in R | Example in Python |
---|---|---|
Load a package | library(tidyverse) | import pandas as pd |
Read a CSV file into a dataframe | read.csv('./Data/Raw/File.csv') | pd.read_csv('./Data/Raw/File.csv') |
Reveal the structure of a df | str(df) | df.info() |
Reveal the dimensions of a df | dim(df) | df.shape |
Reveal a summary of the df | summary(df) | df.describe() |
Show the first few rows | head(df) | df.head() |
Show the data type of column "colX" | class(df$colX) | df['colX'].dtype |
Convert a string column to a date column | as.Date(df$Date,format='%m/%d/%y') | pd.to_datetime(df['Date'],format='%m/%d/%y') |
Separate date items into columns | separate(df,dateCol,c("Y", "m", "d")) | dateIdx = pd.DatetimeIndex(df['dateCol']); df['Y'] = dateIdx.year; df['m'] = dateIdx.month; df['d'] = dateIdx.day |
Select a slice of columns | select(df, col1:col3) | df.loc[:, 'col1':'col3'] |
Select a subset of columns | select(df, c(col1,col2,col4)) | df[['col1','col2','col4']] |
Create a new column with specified values | mutate(df, col2 = "valueX") | df['col2'] = "valueX" |
Write df to a CSV file | write.csv(df,'./Data/Processed/File.csv') | df.to_csv('./Data/Processed/File.csv') |
Combine data frames (by stacking them vertically: must have common column names) | rbind(df1,df2,df3) | pd.concat((df1,df2,df3),axis='rows') |
Combine data frames (by appending them horizontally: must have common row counts) | cbind(df1,df2,df3) | pd.concat((df1,df2,df3),axis='columns') |
Select rows matching a value | filter(df, colX == 'MyValue') | via "query": df.query("colX == 'MyValue'"); via Boolean mask: df[df['colX'] == 'MyValue'] |
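
Several of the Python entries above can be chained into one short script. The following is a minimal sketch, not a definitive recipe: the sample data frame and its values are made up for illustration, and the column names (`Date`, `col1` to `col4`) are placeholders standing in for the ones used in the table.

```python
import pandas as pd

# Hypothetical sample data; values and column names are placeholders.
df = pd.DataFrame({
    "Date": ["01/15/21", "02/20/21", "03/25/22"],
    "col1": [1, 2, 3],
    "col2": ["a", "b", "c"],
    "col3": [1.5, 2.5, 3.5],
    "col4": ["MyValue", "Other", "MyValue"],
})

# Convert the string column to datetime; the format must match the strings.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Separate the date items into their own columns.
df["Y"] = df["Date"].dt.year
df["m"] = df["Date"].dt.month
df["d"] = df["Date"].dt.day

# Select a slice of columns (label slices include both endpoints).
col_slice = df.loc[:, "col1":"col3"]

# Select a subset of columns by passing a list of names.
col_subset = df[["col1", "col2", "col4"]]

# Select rows in two equivalent ways.
by_query = df.query("col4 == 'MyValue'")
by_mask = df[df["col4"] == "MyValue"]

# Stack vertically (columns are matched by name) and renumber the rows.
stacked = pd.concat([df, df], axis="index", ignore_index=True)

# Append horizontally (rows are matched by index label).
side_by_side = pd.concat([col_slice, col_subset], axis="columns")
```

Note that `pd.concat` with `axis="columns"` aligns rows on their index labels rather than on position, so data frames with different indexes may need `reset_index(drop=True)` first to behave like `cbind`.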