How do I add information from one CSV file to another based off of matching values in Python?


I am trying to combine data from multiple CSV files into one CSV file. I have several files that have similar information. I want to add the information that I don't have to a single CSV file by matching similar cell data and appending the missing information to that corresponding row.

This is an image of the CSV file I want to add information to: table I want to add to

This is an image of the CSV file I want to parse for information, and then add missing information to joined CSV file: table I want to take information from

This is what I want my CSV file to look like in the end: desired result

I have simply copy and pasted here to demonstrate my goal, however I have a ton of data that I would like to add by checking it against my source table.

My strategy so far has been to use the CSV module and create a nested for loop that compares against the two rows. Right now my code only tries to see if there is a match between the two tables without trying to append anything. This is also proving difficult.

My output so far looks like this: current output (note that in this image I was trying to add superintendent info, that is why the output is showing a name instead of the other information)

My code is iterating over more than I wanted. I was hoping that it would just compare the dictionary values I had called to.

#Python 3.6.4    
import csv
import codecs

count = 0
original_csv = codecs.open("Texas School Districts.csv", 'r', encoding='utf-8', errors='ignore')
fieldnames1 = ['NCES LEA ID', 'State ID', 'District Name']
reader1 = csv.DictReader(original, fieldnames=fieldnames1)

additional_info = codecs.open("new_information.csv", 'r', encoding='utf-8', errors='ignore')
fieldnames2 = ['Name', 'District', 'Type', 'Description']
reader2 = csv.DictReader(additional_info, fieldnames=fieldnames2)

for row1 in reader1:
    for row2 in reader2:
        if row1['District Name'] == row2['Name']:
            print(row1['District Name'], ":", row2['Name'])
        else:
            pass
    count += 1  
    original.seek(count)
    append.seek(0)

What would you guys recommend I do to accomplish the task of merging new data to existing values? Am I on the right track or have I gone way off? Please let me know if you have any questions, or if I can provide any more information. I hope I made some sense. Thank you all.


Answer

If you want to continue doing this in native python, and not pandas, or a real database, you have to make this code faster:

for row1 in reader1:
    for row2 in reader2:
        if row1['District Name'] == row2['Name']:
            print(row1['District Name'], ":", row2['Name'])
        # seek to avoid reader2 to be EOF ...

no need for the second loop if you build a lookup set beforehand:

existing_names = {row["Name"] for row in reader2}

now use a single loop, and no need to rewind files:

for row1 in reader1:
    if row1['District Name'] in existing_names: # set lookup is faaast
            print(row1['District Name'])

Now code runs in O(n) (average) vs O(n**2)