Module 6: Importing Data from a Formatted Excel Report

Module 6: Importing Data from a Formatted Excel Report

“Module 6: Importing Data from a Formatted Excel Report … Advanced Text Query”
(Source URL)

Summaries

  • Module 6 > Importing Data from a Formatted Excel Report > Lecture

Module 6 > Importing Data from a Formatted Excel Report > Lecture

  • In some cases, you have access to a very nice data warehouse.
  • You’re a business analyst and you get access to the data warehouse.
  • You know the data warehouse and everything’s fine.
  • In other cases, you don’t have access to the data warehouse, but you still have access to some tool that will export the data you need into text files.
  • It has data for each month, you see January through December going across the rows.
  • You don’t have access to the data behind it.
  • Not only it’s challenging to make from this data some useful format or useful shape that you can use to create your own analyses, but also you are thinking about the number of times you will have to repeat the steps of creating from this a useful thing.
  • So let’s see how Excel, 2016 in this case, Power Query As is embedded in the previous version, can help you to use this and make it into a useful data to do your analysis.
  • I click on it, actually the query now reads this data, shows me there is something called Sheet1.
  • It reads the data, shows me the preview, which is not very encouraging because it has all kinds of things which I know I need to fix before I can use it.
  • It doesn’t matter because anyhow the data haven’t changed, but I can go ahead and refresh the preview.
  • So now I started to look at this data and identify what do I need to do.
  • I see that there are two rows here in the front that definitely seem to be, I don’t need these rows, so I want to remove them.
  • I go to the transformation and I see that there is one for removing rows.
  • I can do both top rows, bottom rows, here in this case top rows.
  • Now I’m actually at this row now that it’s going to be useful to use as the headers for the table that is going to be eventually created from that.
  • So now I can go and click on the transformation called Use First Row as Headers.
  • You see that it kind of promoted the first row, and uses it for headers.
  • Now, you see that because of the report, if you remember the way it was, the category is in row one.
  • What I can do, first of all, every row that contains the value, the word total, I want to filter it out.
  • Once I realized that there is a mistake here, that a lot of rows disappeared on me.
  • We saw before that we can go back, insert new rows in the steps, and I can also delete, this is like I’m doing, but I’m doing here is much simpler.
  • So first I want to actually fill the values for the categories on all these empty rows.
  • So that the value for Mix would be filled down on all the empty rows below it.
  • I will select the column and this column I want to do a Fill >Down.
  • I still have the totals, the rows that have the totals.
  • So I don’t want the rows that have the word Total in them.
  • So the first row of headers I’ve used as headers for my report.
  • So as an example of this header is this row number nine that has the word Category in it.
  • So again I can say I want to text filter where everything, I just want the rows that do not contain the word Category.
  • So we are gradually getting rid of all the rows we don’t need.
  • Actually there’s an extra column here that is empty so I’m going to remove it.
  • What I need is actually a column that will contain the name of the month, and another column that will contain the value.
  • Instead of having multiple columns for each month, I want each one of these columns to become a row.
  • So everything that used to be column now becomes row, so for each combination of category and manufacturer, now we have a row for January, February, March and so on.
  • So I do want a table, let’s say, and I don’t want it in the model, I just want a simple table in Excel.
  • This data, actually the total number is 437 rows, is the end result of the report that we showed before.
  • So it’s a simple table, it has rows and columns, it’s ready to go, it’s formatted correctly and I can use it for my reporting.
  • To summarize what we did in this module, we didn’t have a proper source of data from a database or from CSV file.
  • We started from there using a query, cleaned it, shaped it and actually created this data set that is useful to do further analysis.

Return to Summaries List.

(image source)
Print Friendly, PDF & Email