Datetime parsing in Python with potential data inconsistency

Asked 3 months ago

Answers

0 Answers

29 Views

I have a large dataframe (approx. 100 million rows) that I need to load, and I need to parse its datetime columns using Python.

Below is an excerpt of my attempt:

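import pandas as pd

date_cols = ['A', 'B']    # do 2 formats exist in ABARawData?
for col in date_cols:
  try:
    # first assume month/day/year for the whole column
    df[col] = pd.to_datetime(df[col], format='%m/%d/%Y')
  except ValueError:
    # fall back to day/month/year if any value fails to parse
    df[col] = pd.to_datetime(df[col], format='%d/%m/%Y')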

I know for sure the dates will come in one of two formats, '%m/%d/%Y' or '%d/%m/%Y', but the data was collected and entered manually, so there are some errors (some values contain unwanted strings). My current workaround is to split each value on '/', exclude rows with non-numeric parts, and rejoin the rest, but it seems like I could run into other errors in the future. I'm looking for suggestions on a more robust way to handle this.
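For illustration, here is a minimal sketch of one possible direction, not a tested solution for the real data. It assumes that each column uses a single format throughout, and it relies on `errors='coerce'` in `pd.to_datetime` so that malformed values become `NaT` instead of raising. The helper name `parse_date_column` is hypothetical. Note that a column containing only ambiguous dates (day and month both 12 or less) parses equally well under both formats, in which case this sketch simply keeps the first format listed.

```python
import pandas as pd

def parse_date_column(series, formats=("%m/%d/%Y", "%d/%m/%Y")):
    """Hypothetical helper: parse a column with each candidate format
    and keep the result that parses the most values.

    Assumes the whole column uses one format; values that match
    neither format (e.g. stray text) end up as NaT.
    """
    # Normalise to plain strings and trim whitespace from manual entry
    cleaned = series.astype(str).str.strip()
    # errors='coerce' turns unparseable values into NaT instead of raising
    candidates = [
        pd.to_datetime(cleaned, format=fmt, errors="coerce")
        for fmt in formats
    ]
    # max() returns the first maximal item, so ties go to the first format
    return max(candidates, key=lambda parsed: parsed.notna().sum())
```

Applied to the loop above, this would be `df[col] = parse_date_column(df[col])`. Rows that failed to parse can then be inspected with a mask such as `df[col].isna()` rather than being dropped silently.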
