Help with excel data cleaning

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • sharpcat
    Restricted User
    • 12-19-09
    • 4516

    #1
    Help with excel data cleaning
    If I have a column in my database with the final score of the game grouped into one cell is there a function to split this into 3 columns easily?

    L 0-12
    L 6-23
    L 28-31
    L 20-23

    I would like to quickly change this to 3 columns without having to go through manually to eliminate both time consumption and human error.

    L 0 6
    L 6 23
    L 28 31
    L 20 23

    Any help would be greatly appreciated.
  • TomG
    SBR Wise Guy
    • 10-29-07
    • 500

    #2
    There are several ways to do this. The easiest way is to simply highlight the column and then go to Data -> Text to Columns and use delimiters.
    Comment
    • roasthawg
      SBR MVP
      • 11-09-07
      • 2990

      #3
      Yeah, what Tom said... text to columns then use space and "-" as the delimiters.
      Comment
      • dr_wolf
        SBR Sharp
        • 07-20-10
        • 417

        #4
        or make an macro and you separate what you want (you have to know macro programming) if you have questions give me a PM
        Comment
        • arwar
          SBR High Roller
          • 07-09-09
          • 208

          #5
          as I create scrapers (and associated data manipulators) on a pretty much full time basis, one trick that is often overlooked is using the 'search and replace' function. i am not an EXCEL fan but if load your data file in Notepad you can convert it to comma delimited (which will be columns) by doing the following:

          1. search and replace all <spaces> with <, >
          2. search and repalce all <-> with <, >

          so your data:
          L 0-12
          L 6-23
          L 28-31
          L 20-23

          becomes:
          L, 0, 12
          L, 6, 23
          L, 28, 31
          L, 20, 23

          you can also use this method to defeat some of the idiotic features of EXCEL like autocorrect and having scores that
          are formattted like 12-10 from being converted to dates.

          save the resulting Notepad file with an extension of .CSV rather than .TXT and it will be understandable to EXCEL as
          well as any other programs that use comma delimited file formats.
          Comment
          • Wrecktangle
            SBR MVP
            • 03-01-09
            • 1524

            #6
            For those of you not used to notepad, you can do similar as arwar mentions in MS Word.

            Also, I might mention when you import into MS Excel, you can control each column as to import character type. That way you can get around the auto-importing of data that looks like: 1-15, 3-2, 6-5 as dates.
            Comment
            • That Foreign Guy
              SBR Sharp
              • 07-18-10
              • 432

              #7
              Notepad is way better than word for data, it doesn't try any stupid autoformatting and handles special characters better.
              Comment
              • sharpcat
                Restricted User
                • 12-19-09
                • 4516

                #8
                Originally posted by arwar
                as I create scrapers (and associated data manipulators) on a pretty much full time basis, one trick that is often overlooked is using the 'search and replace' function. i am not an EXCEL fan but if load your data file in Notepad you can convert it to comma delimited (which will be columns) by doing the following:

                1. search and replace all <spaces> with <, >
                2. search and repalce all <-> with <, >

                so your data:
                L 0-12
                L 6-23
                L 28-31
                L 20-23

                becomes:
                L, 0, 12
                L, 6, 23
                L, 28, 31
                L, 20, 23

                you can also use this method to defeat some of the idiotic features of EXCEL like autocorrect and having scores that
                are formattted like 12-10 from being converted to dates.

                save the resulting Notepad file with an extension of .CSV rather than .TXT and it will be understandable to EXCEL as
                well as any other programs that use comma delimited file formats.
                Thank you very much.

                I will give this a try Tom's technique worked for me although like you said it changed many of the scores to dates. I found that it worked better for me to use the "fixed width" command in "text to columns" to split the scores and than just came back and selected "edit" "replace" and replaced all of the " - " with " ". This worked better than "delimited" in this scenario because of the auto correct to date feature.
                Comment
                • Insoluble
                  SBR Hustler
                  • 05-29-10
                  • 71

                  #9
                  You could also use the LEFT, MID, and RIGHT functions in Excel to help parse out the elements you need. This can be a little more difficult , especially with scores where you don't know whether they are single digit or double digit, but I think its a good tool to go over:

                  Suppose you had this in cell A1: "L 0-12"
                  If you wanted to separate the L, you could use the function "=LEFT(A1, 1)" This takes the 1 leftmost character from A1 and returns it in your cell. In order to get the number on the right side you could use "=RIGHT(A1,2)" and you would separate the 12 into your new cell. And finally to get the 0 separated you could use the mid function which goes a certain amount of characters from the left and returns the amount of character you specify. In this case we would need to go 3 over from the left and we only want to return 1 character so our formula would be "=MID(a1,3,1)" and this would return the 0.

                  The only catch with this is that if you are dealing with a ton of scores, you might not always have single digit data and you might need to return a 10 instead of a 0, in which case you would need to ask for 2 digits in your formula. The way I deal with this is I always try to take as many digits as I need and then I create an IF statement to clean up any extra "-" or "," I might accidentally get from getting too many characters returned. I can go into this more if you'd like but I don't want to get too confusing for ya unless you're interested.

                  Hope that helps, these functions have made it easy for me to parse tons of data really quickly into fields that I like.
                  Comment
                  • sharpcat
                    Restricted User
                    • 12-19-09
                    • 4516

                    #10
                    Originally posted by Insoluble
                    You could also use the LEFT, MID, and RIGHT functions in Excel to help parse out the elements you need. This can be a little more difficult , especially with scores where you don't know whether they are single digit or double digit, but I think its a good tool to go over:

                    Suppose you had this in cell A1: "L 0-12"
                    If you wanted to separate the L, you could use the function "=LEFT(A1, 1)" This takes the 1 leftmost character from A1 and returns it in your cell. In order to get the number on the right side you could use "=RIGHT(A1,2)" and you would separate the 12 into your new cell. And finally to get the 0 separated you could use the mid function which goes a certain amount of characters from the left and returns the amount of character you specify. In this case we would need to go 3 over from the left and we only want to return 1 character so our formula would be "=MID(a1,3,1)" and this would return the 0.

                    The only catch with this is that if you are dealing with a ton of scores, you might not always have single digit data and you might need to return a 10 instead of a 0, in which case you would need to ask for 2 digits in your formula. The way I deal with this is I always try to take as many digits as I need and then I create an IF statement to clean up any extra "-" or "," I might accidentally get from getting too many characters returned. I can go into this more if you'd like but I don't want to get too confusing for ya unless you're interested.

                    Hope that helps, these functions have made it easy for me to parse tons of data really quickly into fields that I like.
                    Very good info

                    I will play around with this I am sure I can find many applications to use this with.

                    I have been teaching myself all of this computer related stuff and have made tons of progress over the last year or two but it is very helpful to have a community to ask questions when you can not make sense of what you are reading in a book. Hopefully soon I can start understanding some programming languages.
                    Comment
                    • Wrecktangle
                      SBR MVP
                      • 03-01-09
                      • 1524

                      #11
                      Originally posted by That Foreign Guy
                      Notepad is way better than word for data, it doesn't try any stupid autoformatting and handles special characters better.
                      I agree, but I'm always surprised by how many folks have not used it.
                      Comment
                      • uva3021
                        SBR Wise Guy
                        • 03-01-07
                        • 537

                        #12
                        say you have "L 0-12" in A1, in B1 put "=LEFT(A1,1)", C1 put "=RIGHT(A1,LEN(A1)-FIND("-",A1))", D1 put "=TRIM(MID(A1,FIND("-",A1)-2,2))"

                        copy down to last cell, will save a lot of time and resources
                        Comment
                        • uva3021
                          SBR Wise Guy
                          • 03-01-07
                          • 537

                          #13
                          or actually reverse C1 and D1, my fault
                          Comment
                          SBR Contests
                          Collapse
                          Top-Rated US Sportsbooks
                          Collapse
                          Working...