Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный исследовательский университет «Высшая школа экономики»

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

SQL Server 2012 Tutorials - Analysis Services Data Mining.pdf

Скачиваний:

Добавлен:

26.03.2016

Размер:

1.41 Mб

Скачать

☆

<<< < Предыдущая 1 2 3 4 5 6 7 8 9 10 11 1213 / 4113 14 15 16 17 18 19 20 21 22 23 24 25 > Следующая >>>

Lesson 2: Building a Forecasting Scenario (Intermediate Data Mining Tutorial)

As the sales analyst for Adventure Works Cycles, you have been asked to forecast the sales of products for the next year. In particular, you have been asked to compare forecasts for the different regions and product lines. Additionally, you have been asked to determine whether sales of different products vary depending on the time of the year.

To find the requested information, in this lesson you will summarize the company's sales data at the monthly level, and you will also summarize sales figures by three regions: Europe, North America, and the Pacific.

After you complete the tasks in this lesson, you will be able to answer the following questions:

•How do the sales of different bike models change over time?

•Are there differences between the patterns for sales in the three regions?

•Can we forecast sales peaks?

The lesson can be completed in two parts:

•Part One introduces the basics of how to create and use a time series model.

•Part Two walks you through creation of a general time series model, based on all regions, that can be used for cross-prediction.

To complete the tasks in this lesson, which are listed below, you will use the data source that you created in Lesson 1: Creating the Intermediate Data Mining Solution (Intermediate Data Mining Tutorial).

Warning

The dates in the Adventure Works Cycles sample database have been updated for this release. If you use an earlier version of Adventure Works Cycles, you can build the model following these steps, but you might see different results.

Creating a Simple Forecasting Model

•Adding a Data Source View for Forecasting (Intermediate Data Mining Tutorial)

•Creating a Forecasting Mining Model Structure (Data Mining Tutorial)

•Modifying the Forecasting Structure (Data Mining Tutorial)

•Customizing and Processing the Forecasting Model (Intermediate Data Mining Tutorial)

•Exploring the Forecasting Model (Data Mining Tutorial)

•Creating Time Series Predictions (Intermediate Data Mining Tutorial)

Creating a General Forecasting Model for Cross-Prediction

•Adding an Aggregated Forecasting Model (Intermediate Data Mining Tutorial)

•Understanding Trends in the Time Series Model (Intermediate Data Mining Tutorial)

•Predicting using the Averaged Forecasting Model (Intermediate Data Mining Tutorial)

•Comparing Predictions for Forecasting Models (Intermediate Data Mining Tutorial)

Next Task in Lesson

Adding a Data Source View for Forecasting (Intermediate Data Mining Tutorial)

All Lessons

Lesson 1: Creating the Intermediate Data Mining Solution

Lesson 2: Forecasting Scenario (Intermediate Data Mining Tutorial) Lesson 3: Market Basket Scenario (Intermediate Data Mining Tutorial)

Lesson 4: Sequence Clustering Scenario (Intermediate Data Mining Tutorial)

Lesson 5: Neural Network and Logistic Regression Scenario (Intermediate Data Mining Tutorial)

See Also

Microsoft Time Series Algorithm (Analysis Services - Data Mining)

Understanding the Requirements for a Time Series Model (Intermediate Data Mining Tutorial)

When you are preparing data for use in a forecasting model, you must ensure that your data contains a column that can be used to identify the steps in the time series. That column will be designated as the Key Time column. Because it is a key, the column must contain unique numeric values.

Choosing the right unit for the Key Time column is an important part of analysis. For example, suppose your sales data is refreshed on a minute by minute basis. You would not necessarily use minutes as the unit for the time series; you might find it more meaningful to roll up sales data by the day, week, or even month. If you are unsure which unit of time to use, you can create a new data source view for each aggregation, and build related models, to see if different trends emerge at each level of aggregation.

For this tutorial, sales data is collected on a daily basis in the transactional sales database, but for data mining, the data has been pre-aggregated by the month, using a view.

Additionally, it is desirable for analysis that the data have as few gaps as possible. If you plan to analyze multiple series of data, all series should preferably start and end on the same date. If the data has gaps, but the gaps are not at the beginning or end of a series, you can use the MISSING_VALUE_SUBSTITUTION parameter to fill in the series. Analysis Services also provides several options for replacing missing data with values, such as using means or constants.

Warning

The PivotChart and PivotTable tools that were included in earlier versions of the data source view designer are no longer provided. We recommend that you identify gaps in time series data beforehand, by using tools such as the Data Profiler included in Integration Services.

Procedures

To identify the time key for the forecasting model

1.In the pane, SalesByRegion.dsv [Design], right-click the table vTimeSeries, and then select Explore Data.

A new tab opens, titled Explore vTimeSeries Table.

2.On the Table tab, review the data that is used in the TimeIndex and Reporting Date columns.

Both are sequences with unique values and can both be used as the time series key; however, the data types of the columns are different. The Microsoft Time Series algorithm does not require a datetime data type, only that the values be distinct and ordered. Therefore, either column can be used as the time key for the forecasting model.

3.In the data source view design surface, select the column, Reporting Date and select Properties. Next, click the column TimeIndex and select Properties.

The field TimeIndex has the data type System.Int32, whereas the field Reporting Date has the data type System.DateTime. Many data warehouses convert date/time values to integers and use the integer column as the key, to improve indexing performance. However, if you use this column, the Microsoft Time Series algorithm will make predictions using future values such as 201014, 201014, and

so forth. Because you want to represent your sales data forecast by using calendar dates, you will use the Reporting Date column as the unique series identifier.

To set the key in the data source view

1.In the pane SalesByRegion.dsv, select the vTimeSeries table.

2.Right-click the column, Reporting Date, and select Set Logical Primary Key.

Handling Missing Data (Optional)

If any series has missing data, you might get an error when you try to process the model. You have several ways to work around missing data:

•You can have Analysis Services fill in missing values, either by calculating a mean, or by using a previous value. You do this by setting the MISSING_VALUE_SUBSTITUTION parameter on the mining model. For more information about this parameter, see Microsoft Time Series Algorithm Technical Reference (Analysis Services - Data Mining). For information about how to change parameters on an existing mining model, see How to Parameters.

•You can alter the data source or filter the underlying view to eliminate ragged series or to replace values. You can do this in the relational data source, or you can modify the data source view by creating custom named queries or named calculations. For more information, see Designing Data Source Views (Analysis Services). A later task in this lesson provides an example of how to build both a named query and a custom calculation.

For this scenario, some data is missing at the beginning of one series: that is, there is no data for the T1000 product line until July 2007. Otherwise, all series end on the same date, and there are no missing values.

The requirement of the Microsoft Time Series algorithm is that any series that you include in a single model should have the same ending point. Because the T1000 bicycle model was introduced in 2007, the data for this series starts later than for other bicycle models, but the series ends on the same date; therefore the data is usable.

To close the data source view designer

•Right-click the tab, Explore vTimeSeries Table, and select Close.

Next Task in Lesson

Creating a Forecasting Structure and Model (Intermediate Data Mining Tutorial)