Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
SQL Server 2012 Tutorials - Analysis Services Data Mining.pdf
Скачиваний:
38
Добавлен:
26.03.2016
Размер:
1.41 Mб
Скачать

Lesson 2: Building a Forecasting Scenario (Intermediate Data Mining Tutorial)

As the sales analyst for Adventure Works Cycles, you have been asked to forecast the sales of products for the next year. In particular, you have been asked to compare forecasts for the different regions and product lines. Additionally, you have been asked to determine whether sales of different products vary depending on the time of the year.

To find the requested information, in this lesson you will summarize the company's sales data at the monthly level, and you will also summarize sales figures by three regions: Europe, North America, and the Pacific.

After you complete the tasks in this lesson, you will be able to answer the following questions:

How do the sales of different bike models change over time?

Are there differences between the patterns for sales in the three regions?

Can we forecast sales peaks?

The lesson can be completed in two parts:

Part One introduces the basics of how to create and use a time series model.

Part Two walks you through creation of a general time series model, based on all regions, that can be used for cross-prediction.

To complete the tasks in this lesson, which are listed below, you will use the data source that you created in Lesson 1: Creating the Intermediate Data Mining Solution (Intermediate Data Mining Tutorial).

Warning

The dates in the Adventure Works Cycles sample database have been updated for this release. If you use an earlier version of Adventure Works Cycles, you can build the model following these steps, but you might see different results.

Creating a Simple Forecasting Model

Adding a Data Source View for Forecasting (Intermediate Data Mining Tutorial)

Creating a Forecasting Mining Model Structure (Data Mining Tutorial)

Modifying the Forecasting Structure (Data Mining Tutorial)

Customizing and Processing the Forecasting Model (Intermediate Data Mining Tutorial)

Exploring the Forecasting Model (Data Mining Tutorial)

Creating Time Series Predictions (Intermediate Data Mining Tutorial)

Creating a General Forecasting Model for Cross-Prediction

Adding an Aggregated Forecasting Model (Intermediate Data Mining Tutorial)

Understanding Trends in the Time Series Model (Intermediate Data Mining Tutorial)

47

Predicting using the Averaged Forecasting Model (Intermediate Data Mining Tutorial)

Comparing Predictions for Forecasting Models (Intermediate Data Mining Tutorial)

Next Task in Lesson

Adding a Data Source View for Forecasting (Intermediate Data Mining Tutorial)

All Lessons

Lesson 1: Creating the Intermediate Data Mining Solution

Lesson 2: Forecasting Scenario (Intermediate Data Mining Tutorial) Lesson 3: Market Basket Scenario (Intermediate Data Mining Tutorial)

Lesson 4: Sequence Clustering Scenario (Intermediate Data Mining Tutorial)

Lesson 5: Neural Network and Logistic Regression Scenario (Intermediate Data Mining Tutorial)

See Also

Data Mining Tutorial

Intermediate Data Mining Tutorial (Analysis Services - Data Mining) Microsoft Time Series Algorithm (Analysis Services - Data Mining)

Adding a Data Source View for Forecasting (Intermediate Data Mining Tutorial)

In this task, you add a data source view that will be used for the forecasting scenario. A forecasting model requires that the data contains a column that can be used to identify steps in a time series. If you plan to analyze multiple series of data, all series must end on the same date or time step.

Procedures

To add a data source view

1.In Solution Explorer, right-click Data Source Views, and then select New Data Source View.

2.On the Welcome to the Data Source View Wizard page, click Next.

3.On the Select a Data Source page, under Relational data sources, select the data source. Click Next.

Note

If you do not have this data source, you can find the steps to create the data source in the Basic Data Mining Tutorial.

4.On the Select Tables and Views page, select the table, vTimeSeries (dbo), and then click the right arrow to add it to the data source view.

48

5.Click Next.

6.On the Completing the Wizard page, by default the data source view is named

Adventure Works DW Multidimensional 2012 . Change the name to

SalesByRegion, and then click Finish.

Data Source View Designer opens and the SalesByRegion data source view appears.

Working with the Data Source View

After you have created the data source view, you can explore the data in the following ways:

Right-click the table vTimeSeries in the designer, and select Explore Data to open the selected table in a grid.

Click Sampling options and then use the Data Exploration Options dialog box to change the sampling method. Click Refresh to load data in the table using the new option settings. For example, you could specify the number of rows to output in the sample, or choose the top rows.

Right-click the table vTimeSeries and select Properties to assign a new name to the table. You can also select individual columns from the data source view, and the modify the column properties.

Click anywhere in the data source view design area to create a new query and assign a name to it, to create relationships between tables, or to change the layout of the design area.

Right-click a table and select New Named Calculation to create derived columns, including aggregations. You can also add new tables and views from the data source in this view.

In the next task, you will explore the time series data and determine the best column to use as the time series identifier. You will also learn how to handle gaps in time series data.

Next Task in Lesson

Understanding

See Also

Microsoft Time Series Algorithm (Analysis Services - Data Mining)

Understanding the Requirements for a Time Series Model (Intermediate Data Mining Tutorial)

When you are preparing data for use in a forecasting model, you must ensure that your data contains a column that can be used to identify the steps in the time series. That column will be designated as the Key Time column. Because it is a key, the column must contain unique numeric values.

49

Choosing the right unit for the Key Time column is an important part of analysis. For example, suppose your sales data is refreshed on a minute by minute basis. You would not necessarily use minutes as the unit for the time series; you might find it more meaningful to roll up sales data by the day, week, or even month. If you are unsure which unit of time to use, you can create a new data source view for each aggregation, and build related models, to see if different trends emerge at each level of aggregation.

For this tutorial, sales data is collected on a daily basis in the transactional sales database, but for data mining, the data has been pre-aggregated by the month, using a view.

Additionally, it is desirable for analysis that the data have as few gaps as possible. If you plan to analyze multiple series of data, all series should preferably start and end on the same date. If the data has gaps, but the gaps are not at the beginning or end of a series, you can use the MISSING_VALUE_SUBSTITUTION parameter to fill in the series. Analysis Services also provides several options for replacing missing data with values, such as using means or constants.

Warning

The PivotChart and PivotTable tools that were included in earlier versions of the data source view designer are no longer provided. We recommend that you identify gaps in time series data beforehand, by using tools such as the Data Profiler included in Integration Services.

Procedures

To identify the time key for the forecasting model

1.In the pane, SalesByRegion.dsv [Design], right-click the table vTimeSeries, and then select Explore Data.

A new tab opens, titled Explore vTimeSeries Table.

2.On the Table tab, review the data that is used in the TimeIndex and Reporting Date columns.

Both are sequences with unique values and can both be used as the time series key; however, the data types of the columns are different. The Microsoft Time Series algorithm does not require a datetime data type, only that the values be distinct and ordered. Therefore, either column can be used as the time key for the forecasting model.

3.In the data source view design surface, select the column, Reporting Date and select Properties. Next, click the column TimeIndex and select Properties.

The field TimeIndex has the data type System.Int32, whereas the field Reporting Date has the data type System.DateTime. Many data warehouses convert date/time values to integers and use the integer column as the key, to improve indexing performance. However, if you use this column, the Microsoft Time Series algorithm will make predictions using future values such as 201014, 201014, and

50

so forth. Because you want to represent your sales data forecast by using calendar dates, you will use the Reporting Date column as the unique series identifier.

To set the key in the data source view

1.In the pane SalesByRegion.dsv, select the vTimeSeries table.

2.Right-click the column, Reporting Date, and select Set Logical Primary Key.

Handling Missing Data (Optional)

If any series has missing data, you might get an error when you try to process the model. You have several ways to work around missing data:

You can have Analysis Services fill in missing values, either by calculating a mean, or by using a previous value. You do this by setting the MISSING_VALUE_SUBSTITUTION parameter on the mining model. For more information about this parameter, see Microsoft Time Series Algorithm Technical Reference (Analysis Services - Data Mining). For information about how to change parameters on an existing mining model, see How to Parameters.

You can alter the data source or filter the underlying view to eliminate ragged series or to replace values. You can do this in the relational data source, or you can modify the data source view by creating custom named queries or named calculations. For more information, see Designing Data Source Views (Analysis Services). A later task in this lesson provides an example of how to build both a named query and a custom calculation.

For this scenario, some data is missing at the beginning of one series: that is, there is no data for the T1000 product line until July 2007. Otherwise, all series end on the same date, and there are no missing values.

The requirement of the Microsoft Time Series algorithm is that any series that you include in a single model should have the same ending point. Because the T1000 bicycle model was introduced in 2007, the data for this series starts later than for other bicycle models, but the series ends on the same date; therefore the data is usable.

To close the data source view designer

Right-click the tab, Explore vTimeSeries Table, and select Close.

Next Task in Lesson

Creating a Forecasting Structure and Model (Intermediate Data Mining Tutorial)

See Also

Microsoft Time Series Algorithm (Analysis Services - Data Mining)

51

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]