Datetime parsing in Python with potential data inconsistency
Asked 9 months ago
0 Answers
40 Views
I have a large dataframe (approx. 100 million rows) that I need to load, and I need to parse its date columns in Python.
Below is an excerpt of my attempt:
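import pandas as pd

date_cols = ['A', 'B']  # do two formats exist in ABARawData?
for col in date_cols:
    try:
        # try month-first; raises ValueError if any value does not match
        df[col] = pd.to_datetime(df[col], format='%m/%d/%Y')
    except ValueError:
        # fall back to day-first for the whole column
        df[col] = pd.to_datetime(df[col], format='%d/%m/%Y')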
I know for sure the dates will come in one of two formats, '%m/%d/%Y' or '%d/%m/%Y', but some values are invalid because the data are collected and entered manually (some contain unwanted strings). My solution for now is to split each value on '/', exclude rows with non-numeric parts, and rejoin the rest, but it seems like I could run into other errors in the future. I'm looking for suggestions on how to handle this more robustly.
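One possible direction, as a minimal sketch rather than a definitive fix: let pandas turn unparseable values into NaT with errors='coerce' instead of cleaning the strings by hand, then keep whichever of the two formats parses the most values. The helper name parse_date_column is hypothetical, and the sketch assumes each column uses a single format throughout. A value such as 01/02/2020 is valid under both formats, so the choice is really made by the unambiguous values (day greater than 12); on a tie the sketch falls back to month-first.

```python
import pandas as pd

# The two formats named in the question.
FORMATS = ('%m/%d/%Y', '%d/%m/%Y')


def parse_date_column(values: pd.Series) -> pd.Series:
    """Parse a column with whichever format matches the most values.

    Hypothetical helper: assumes the whole column uses one format.
    Values that match neither format (stray text, blanks) become NaT.
    """
    # Normalise to stripped strings so surrounding whitespace is tolerated.
    text = values.astype(str).str.strip()
    # Parse once per format; errors='coerce' yields NaT instead of raising.
    candidates = [
        pd.to_datetime(text, format=fmt, errors='coerce') for fmt in FORMATS
    ]
    # Keep the candidate with the most successfully parsed values
    # (the first format wins on a tie).
    return max(candidates, key=lambda parsed: parsed.notna().sum())
```

It could be applied as df[col] = parse_date_column(df[col]) for each column in date_cols, after which df[col].isna().sum() gives the number of rows that need manual review. Each column is parsed twice, which may matter at 100 million rows.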