I faced this problem many years ago when I first started writing programs to process data from multiple web sites. Here is how I solved it.
- Created a real team name file for each sport. It contains the standardized name I use for each team, along with an abbreviation for that team (e.g. CLE INDIANS,cle).
- Created an alias team name file that maps every variation of a team name I have encountered to the real name (e.g. Cleveland,CLE INDIANS).
- Wrote a find-team subroutine. Whenever I encounter a team name, I pass it to this routine, which searches the team name files and returns the standardized name and abbreviation.
- Created a GameID made up of the game date and the two team abbreviations (e.g. 20180728detcle0). The trailing 0 becomes 1 or 2 for doubleheaders.
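The steps above could be sketched roughly as follows in Python. This is only an illustration, not my actual code: the function names (`load_teams`, `find_team`, `make_game_id`), the inline file contents, the DET TIGERS entries, and the case-insensitive matching are all assumptions, and the order of the two abbreviations in the GameID is whatever convention you pick.

```python
import csv
import io
from datetime import date

# Hypothetical file contents; in practice these would be files on disk, one pair per sport.
REAL_TEAMS_CSV = "CLE INDIANS,cle\nDET TIGERS,det\n"
ALIASES_CSV = "Cleveland,CLE INDIANS\nCleveland Indians,CLE INDIANS\nDetroit,DET TIGERS\n"


def load_teams(real_csv, alias_csv):
    """Build two lookup tables: real name -> (name, abbr) and alias -> real name."""
    real = {}
    for row in csv.reader(io.StringIO(real_csv)):
        if row:  # skip blank lines
            name, abbr = row
            real[name.upper()] = (name, abbr)
    aliases = {}
    for row in csv.reader(io.StringIO(alias_csv)):
        if row:
            alias, name = row
            aliases[alias.strip().upper()] = name.upper()
    return real, aliases


def find_team(raw_name, real, aliases):
    """Return (standardized name, abbreviation) for any known spelling of a team."""
    key = raw_name.strip().upper()
    key = aliases.get(key, key)  # fall through if it is already the real name
    if key not in real:
        # An unknown spelling is the cue to add a line to the alias file.
        raise KeyError(f"unknown team name: {raw_name!r}")
    return real[key]


def make_game_id(game_date, first_abbr, second_abbr, game_no=0):
    """Date as YYYYMMDD, two abbreviations, then 0 (or 1/2 for a doubleheader)."""
    return f"{game_date:%Y%m%d}{first_abbr}{second_abbr}{game_no}"


REAL, ALIASES = load_teams(REAL_TEAMS_CSV, ALIASES_CSV)
_, first = find_team("Detroit", REAL, ALIASES)
_, second = find_team("Cleveland", REAL, ALIASES)
game_id = make_game_id(date(2018, 7, 28), first, second)
```

Raising on an unknown name, rather than guessing, means a new spelling from a new site shows up immediately and can be added to the alias file.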
I can then sort, merge, and compare on the GameIDs. Obviously some utility functions are needed to maintain these files. Yes, it was a lot of work initially, but it is very effective now.
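As a rough illustration of what sorting, merging, and comparing on GameIDs can look like, here is a small Python sketch. The two site dictionaries, their field names and values, and the helper names `merge_by_game_id` and `compare_game_ids` are made up for the example. Because the ID starts with the date as YYYYMMDD, a plain string sort is also a chronological sort.

```python
# Hypothetical records scraped from two sites, already keyed by GameID.
site_a = {
    "20180729detcle0": {"score": "placeholder-a2"},
    "20180728detcle0": {"score": "placeholder-a1"},
}
site_b = {
    "20180728detcle0": {"odds": "placeholder-b1"},
    "20180730detcle0": {"odds": "placeholder-b3"},
}


def merge_by_game_id(a, b):
    """Combine the fields from both sources for each GameID, in sorted ID order."""
    merged = {}
    for game_id in sorted(a.keys() | b.keys()):
        merged[game_id] = {**a.get(game_id, {}), **b.get(game_id, {})}
    return merged


def compare_game_ids(a, b):
    """Report which games appear in only one of the two sources."""
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    return only_a, only_b


merged = merge_by_game_id(site_a, site_b)
only_a, only_b = compare_game_ids(site_a, site_b)
```

Games listed in `only_a` or `only_b` are worth a look: they are either genuinely missing from one site or a sign that a team name was not matched.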