Module 4: Importing Data from Databases

Module 4: Importing Data from Databases

“Importing Data from Databases … Importing Data from Multiple Files … Creating Mash-ups of Data from Multiple Sources”
(Source URL)

Summaries

  • Module 4 > Importing Data from Databases > Lecture
  • Module 4 > Importing Data from Multiple Files > Lecture
  • Module 4 > Create a Date Table in Excel Data Model > Create a Date Table in Excel Data Model

Module 4 > Importing Data from Databases > Lecture

  • In this case we’re going to bring data from a database, in this case a SQL Server database which has multiple tables that I’m interested in, and I want to create a model made of this collection of tables.
  • There’s no data model, there are no tables, nothing.
  • So I’m going again to the data and in there I will start a new query.
  • If you browse further down you will see that they’re really a long list, an interesting list of data sources available.
  • There’s a checkbox here that allows me to select multiple tables, and actually I want to select all the tables.
  • Every time I’m in a table, as you can see, it shows me a preview of the content of this table in the database.
  • In some cases, it could be even be the entire table but mostly it’s just a subset of the rows of this database.
  • I’m not ready to use these tables as they are, so I’m going to click here the Edit, and it will bring me into the same editor window that we saw before.
  • On the left side I see the list of tables that I imported, and I can switch between each one of them and apply some steps to each one of these.
  • So what do I want to do? I want to go to the sales table, and first of all, I see here that I have a zip code and a country.
  • In this case the data is just for one country, the USA, but if I go to the locations table, the locations table have actually information about each one of these zips of this region.
  • So the problem with that is that I would like to create a relationship between this sales table and the locations table.
  • So it means that the zip code is not unique in the locations table.
  • It could potentially not be unique on the sales, well, obviously it’s not unique in the sales table because there’s multiple rows for each zip.
  • What else do I want to do? I have data here from 2008.
  • I don’t need all this data, this historic data.
  • I want to apply a filter that says, I just want data where the date is after 12/31/2010.
  • Everything I filter here is actually going to import just the data that I filter.
  • It’s not just a matter of showing me right here, the editor, less or more data.
  • It’s actually about going and later on, when it goes to input an entire dataset, it will input only the data which comply with this filter.
  • The other tables I don’t need to do anything to, so I can go back here and I say Close & Load. I know that my default is already to put the data into the model, so you know that I will actually get multiple tables into the model.
  • Now I see that it’s making progress on the sales table.
  • Even the subset of data from 2011 is almost 2 million rows, so we’re waiting a little bit to this data to finish.
  • Because I import from multiple tables, it will look to see if there are relationships that could be recognized from the database.
  • In the database we have something called referential integrity, relationship between tables, and let’s now see what indeed it was recognizing, so I can go to the manage table.
  • I see that the sales table in the bottom here, it said it has 1 million and 800-something rows 137 00:08:42,581 -> 00:08:46,270 If I switch to diagram view, you’ll see that the tables, some of them are connected and some of them are not.
  • The Products table is connected to a little table that has just information about the Manufacturers.
  • The Locations table is not automatically connected to the Sales table, and this is because the actual relationship exists only between the additional custom column that I created.
  • I can go to the Insert > Pivot Table, like we did a few times before.
  • So I have the data already coming from this table, so this will conclude the first part of this module.
  • In the next part we’re going to add another table that is actually required to do more sophisticated analysis, a calendar table.

Module 4 > Importing Data from Multiple Files > Lecture

  • Remember in the previous part, I imported data from SQL server, bringing in four tables, making sure that they have relationship between them.
  • In this part I will add some aspects to this model, bringing some additional data, combining it with existing data, and making it a more complete model.
  • So let me just first jump into the window, the manage window, and one of the things that in almost every model we do we need, is a calendar table.
  • Calendar tables are required we have a date In the cell’s table we do have a date column, but this is just a date.
  • For all these purposes, we do need a calendar table.
  • With previous versions you will need to either create the table in Excel and bring data in Excel.
  • By the way, I didn’t mention but you can actually create tables in Excel and bring them into the model.
  • So one of the sources available to you is just a table in Excel.
  • I see that there is a data table here, and I can say create me a new data table.
  • So this is just one click, and it added the table to my model, called calendar.
  • The only thing I still need to do is to create a relationship between this new table and the rest of the tables.
  • The new table, which is just hiding here because of the scale.
  • It actually created the necessary data in the data, in the calendar table for all the dates it found in any one of the tables.
  • It analyzes all the tables, found, in this case there is only one date column and it goes from 2011 to 2015.
  • I have data in the calendar table for all these years.
  • If I want to I can go to the data table and actually, I can extend it.
  • This query is actually, the reason that I need to do another query is that this company has the sales data for the US in this SQL server database but sales data from other countries are sent as text files to this guy who’s doing this analysis.
  • I am this guy in this case, and I need to combine the data from the text files to the data from the SQL server to be able to create some global analysis.
  • Here, It’s not a single CSV, but actually it’s a folder full of CSV data.
  • I can click on this icon here and get myself into the Editor with all the data in all these files combined together, as if they were one.
  • Because this will be actually the extension of the sales table which already have this column.
  • Because this is how, when we append this data to the previous data, each column here would find its matching in the existing data.
  • I can actually go from here, so for example I can say, in the zip column, it’s a text field and I can say I want to filter out all the rows which does not equal the word zip.
  • I would like to do it before, no actually now it’s, I didn’t filter the date yet so it’s actually a good time.
  • Remember, I did the same for the sales and there’s no point in bringing me the international data from 1999 when I’m actually interested in 2011.
  • All right so now we have these four files and I’m going to the close and load to dialogue, because I actually don’t want this query to be loaded to the data module as yet.
  • It’s not loaded to the data model, and also not loaded into Excel.
  • Those are very special columns that were added by the query, because it detected the relationship between this table and others.
  • There’s actually an option that I will not use here to select columns for these other tables, and add them automatically to here.
  • The end result of the sales table is going to be the entire content of the sales table.
  • Starting filtering to only 2011, plus the text files appended to it and all of them will become one big happy file, happy table in the model.
  • What happened? Actually, because now the sales table is not directly a table from the database, the tool recognized that the previously identified, detected relationship between this sales stable and the other tables are not valid anymore.
  • It’s actually very easy to fix because we have another, new for 2016, functionality to detect the relationships from the data.
  • So now when I clicked on it, it actually brought back the numbers to where they were and we now have the data from all countries.
  • If I go to the location and pick the country and make it into a slicer, I see all the countries from which I have data, and I can see the data from France and I can control click and see the data from France and Germany and so on.
  • So now we have a truly global analysis of the data for this company.
  • We added a calendar table using a functionality that was added to Power Pivot or to Excel, Power Pivot in Excel in this version of 2016.
  • We also created another query from a folder of text files, added all the contents of these text files to the sales table, and eventually we got a model that has data both from SQL and from the text files.

Module 4 > Create a Date Table in Excel Data Model > Create a Date Table in Excel Data Model

  • Date tables, in Excel data model, are essential for browsing and calculating data over time.
  • We yet to have a date table in the data model.
  • We want to create a date table and link it to the Sales table, so that we can perform analysis on the sales data over time.
  • In Excel 2016, creating a date table is very simple.
  • The New Date table provides the functionality to do this in a single click.
  • You can create a relationship between the newly created Calendar table and the Sales table, by linking the respective Date columns.
  • As you can see here, the Date table is made of all the dates available in the Sales table.
  • Excel automatically detects all the dates that are available in the Sales table, and create the Date column based on those dates.
  • The rest of the columns are calculated columns, created using formulas based on the Date column.
  • The Year column derives the year from the date in the row.
  • The Month column derives the month from the date in the row, and so on.
  • If you are running earlier version of Excel, there are several ways you can add a date table to your Data Model.
  • You can import a date table from a relational database, or other data source.
  • You can create a date table in Excel and copy or link it to a new table in Excel data model.
  • Creating a date table in Excel and copying it into a new table in the Data Model, is really quite easy to do, and it gives you a lot of flexibility.
  • When you create a date table in Excel, you begin with a single column with a contiguous range of dates.
  • Type Date as column header to identify a range of dates.
  • Click the fill handle and drag it down to a row number that includes an ending date.
  • Once you have imported the date table into your data model, you need to link it to the Sales table.

Return to Summaries List.

(image source)

Leave a Reply

Your email address will not be published. Required fields are marked *