Chapter Contents
Chapter Contents
Introduction to Optimization

Model Formats: PROC LP and PROC NETFLOW

Model generation and maintainence are often difficult and expensive aspects of applying mathematical programming techniques. The flexible input formats for the optimization procedures in SAS/OR software simplify this task.

A small product mix problem serves as a starting point for a discussion of different types of model formats supported in SAS/OR software.

A candy manufacturer makes two products: chocolates and toffee. What combination of chocolates and toffee should be produced in a day in order to maximize the company's profit? Chocolates contribute $0.25 per pound to profit and toffee contributes $0.75 per pound. The variables chocolates and toffee are the decision variables.

Four processes are used to manufacture the candy:

  1. Process 1 combines and cooks the basic ingredients for both chocolates and toffee.
  2. Process 2 adds colors and flavors to the toffee, then cools and shapes the confection.
  3. Process 3 chops and mixes nuts and raisins, adds them to the chocolates, then cools and cuts the bars.
  4. Process 4 is packaging: chocolates are placed in individual paper shells; toffee are wrapped in cellophane packages.
During the day, there are 7.5 hours (27,000 seconds) available for each process.

Firm time standards have been established for each process. For Process 1, mixing and cooking take 15 seconds for each pound of chocolate and 40 seconds for each pound of toffee. Process 2 takes 56.25 seconds per pound of toffee. For Process 3, 18.75 seconds are required for each pound of chocolate. In packaging, a pound of chocolates can be wrapped in 12 seconds, whereas 50 seconds are required for a pound of toffee. These data are summarized below:

                      Available       Required per Pound
                        Time         chocolates     toffee
     Process            (sec)          (sec)         (sec)
   1 Cooking            27,000           15          40
   2 Color/Flavor       27,000                      56.25
   3 Condiments         27,000          18.75
   4 Packaging          27,000          126          50

The object is to

0.25(chocolates) + 0.75(toffee)
which is the company's total profit.

The production of the candy is limited by the time available for each process. The limits placed on production by Process 1 are expressed by the inequality
Process 1:
15(chocolates) + 40(toffee) \leq 27,000

Process 1 can handle any combination of chocolates and toffee that satisfies this inequality.

The limits on production by other processes generate constraints described by these inequalities:
Process 2:
56.25(toffee) \leq 27,000
Process 3:
18.75(chocolates) \leq 27,000
Process 4:
12(chocolates) + 50(toffee) \leq 27,000

This linear program illustrates the type of problem known as a product mix example. The mix of products that maximizes the objective without violating the constraints is the solution. There are two formats that can represent this model.

Dense format

The following DATA step creates a SAS data set for the preceding problem. Notice that the values of CHOCO and TOFFEE in the data set are the coefficients of those variables in the equations corresponding to the objective function and constraints. The variable _id_ contains a character string that names the rows in the problem data set. The variable _type_ is a character variable that contains keywords that describes the type of each row in the problem data set. And the variable _rhs_ contains the right-hand-side values.

   data factory;
      input _id_ $ CHOCO TOFFEE _type_ $ _rhs_;
object      0.25    0.75   MAX     .
process1   15.00   40.00   LE  27000
process2    0.00   56.25   LE  27000
process3   18.75    0.00   LE  27000
process4   12.00   50.00   LE  27000

To solve this using the Interior Point algorithm of PROC NETFLOW, specify:

   proc netflow arcdata=factory condata=factory;
However, this example will be solved by the LP procedure. Because the special variables _id_, _type_, and _rhs_ are used in the problem data set, there is no need to identify them to the LP procedure.

   proc lp;

The output from the LP procedure is in four sections.

Problem Summary

The first section of the output, PROBLEM SUMMARY, describes the problem by identifying the objective function (defined by the first observation in the data set used as input); the right-hand-side variable; the type variable; and the density of the problem. The problem density describes the relative number of elements in the problem matrix that are nonzero. The fewer zeros in the matrix, the higher the problem density. The PROBLEM SUMMARY describes the problem giving the number and type of variables in the model and the number and type of constraints. The types of variables in the problem are identified. Variables are either structural or logical. Structural variables are identified in the VAR statement, when the dense format is used. They are the unknowns in the equations defining the objective function and constraints. By default, PROC LP assumes that structural variables have the additional constraint that they must be nonnegative. Upper and lower bounds to structural variables can be defined.

The SAS System

The LP Procedure

Problem Summary
Objective Function Max object
Rhs Variable _rhs_
Type Variable _type_
Problem Density (%) 41.67
Variables Number
Non-negative 2
Slack 4
Total 6
Constraints Number
LE 4
Objective 1
Total 5


The Problem Summary shows, for example, that there are 2 nonnegative decision variables, namely CHOCO and TOFFEE. It also shows that there are 4 constraints of type LE.

After the procedure displays this information, it solves the problem, then produces the SOLUTION SUMMARY.

Solution Summary

This section (Figure 1.9) gives information about the solution that was found; whether the optimizer terminated successfully having found the optimum.

When PROC LP solves a problem, an iterative process is used. First, the procedure finds a feasible solution, one that satisfies the constraints. The second phase finds the optimum solution from the set of feasible solutions. The SOLUTION SUMMARY gives the number of iterations the procedure took in each of these phases, the number of variables in the initial feasible solution, the time the procedure used to solve the problem, and the number of matrix inversions necessary.

The LP Procedure

Solution Summary

Terminated Successfully
Objective Value 475
Phase 1 Iterations 0
Phase 2 Iterations 3
Phase 3 Iterations 0
Integer Iterations 0
Integer Solutions 0
Initial Basic Feasible Variables 6
Time Used (seconds) 0
Number of Inversions 2
Epsilon 1E-8
Infinity 1.797693E308
Maximum Phase 1 Iterations 100
Maximum Phase 2 Iterations 100
Maximum Phase 3 Iterations 99999999
Maximum Integer Iterations 100
Time Limit (seconds) 120


After performing 3 phase 2 iterations, the procedure terminated successfully finding an optimum with optimal objective value of 475.

Variable Summary

The next section of the output is the VARIABLE SUMMARY. For each variable, the variable summary (Figure 1.10) gives the value, objective function coefficient, status in the solution, and reduced cost.

The LP Procedure

Variable Summary
Col Variable Name Status Type Price Activity Reduced Cost
1 CHOCO BASIC NON-NEG 0.25 1000 0
3 process1   SLACK 0 0 -0.012963
4 process2 BASIC SLACK 0 10125 0
5 process3 BASIC SLACK 0 8250 0
6 process4   SLACK 0 0 -0.00463


The VARIABLE SUMMARY contains details about each variable in the solution. The Activity column shows that optimum profitability is achieved when 1000 pounds of chocolate and 300 pounds of toffee are produced. The variables process1, process2, process3, and process4 correspond to the four slack variables in the Process 1, Process 2, Process 3, and Process 4 constraints, respectively. Producing 1000 pounds of chocolate and 300 pounds of toffee a day leaves 10,125 seconds of slack time in Process 2 (where colors and flavors are added to the toffee) and 8,250 seconds of slack time in Process 3 (where nuts and raisins are mixed and added to the chocolate).

Constraint Summary

The last section of the output is the CONSTRAINT SUMMARY. This output (Figure 1.11) gives the value of the objective function, the value of each constraint, and the dual activities.

The ACTIVITY column gives the value of the right-hand side of each equation when the problem is solved using the information given in the previous table.

The DUAL ACTIVITY column reveals that each second in Process 1 (mixing-cooking) is worth approximately $.013, and each second in Process 4 (Packaging), is worth approximately $.005. These figures (called shadow prices) can be used to decide whether the total available time for Processes 1 and 4 should be increased. If a second can be added to the total production time in Processes 1 for less than $.013, it would be profitable to do so. The dual activities for Process 1 and 4 are zero since adding time to those processes does not increase profits. Keep in mind that the dual activity gives the marginal improvement to the objective and that adding time to Process 1 changes the original problem and solution.

The LP Procedure

Constraint Summary
Row Constraint Name Type S/S Col Rhs Activity Dual Activity
1 object OBJECTVE . 0 475 .
2 process1 LE 3 27000 27000 0.012963
3 process2 LE 4 27000 16875 0
4 process3 LE 5 27000 18750 0
5 process4 LE 6 27000 27000 0.0046296


For a complete description of the output from PROC LP, see the chapter on teh LP procedure.

Sparse Format

Typically, mathematical programming models are sparse. That is, few of the coefficients in the constraint matrix are nonzero. The dense problem format shown in the last section would be an inefficient way to represent sparse models. The LP procedure also accepts data in a sparse input format. Only the nonzero coefficients need be specified. It is consistent with the standard MPS sparse format and much more flexible so that models using the MPS format can be easily converted to the LP format. The appendix at the end of this book describes a SAS macro for conversion.

Although the factory example of the last section is not sparse, illustrated here is an example of the sparse input format. The sparse data set has four variables: a row type identifying variable, a row name variable, a column name variable, and a coefficient variable.

   data factory;
      format _type_ $8. _row_ $16. _col_ $16.;
      input _type_ $_row_ $ _col_ $ _coef_ ;
max      object     .            .
.        object     chocolate    .25
.        object     toffee       .75
le       process1   .            .
.        process1   chocolate    15
.        process1   toffee       40
.        process1   _RHS_        27000
le       process2   .            .
.        process2   toffee       56.25
.        process2   _RHS_        27000
le       process3   .            .
.        process3   chocolate    18.75
.        process3   _RHS_        27000
le       process4   .            .
.        process4   chocolate    12
.        process4   toffee       50
.        process4   _RHS_        27000

To solve this problem using the Interior Point algorithm of PROC NETFLOW, specify

   proc netflow sparsecondata arcdata=factory condata=factory;
However, this example will be solved by the LP procedure.

Notice that the _type_ variable contains keywords as for the dense format, the _row_ variable contains the row names in the model, the _col_ variable contains the column names in the model, and the _coef_ variable contains the coefficients for that particular row and column. Since the row and column names are the values of variables in a SAS data set, they are not limited to 8 characters. This feature, as well as the order independence of the format, simplify matrix generation.

The SPARSEDATA option in the PROC LP statement tells the LP procedure that the model in the problem data set is in the sparse format. This example also illustrates how the solution of the linear program is saved in two output data sets: the primal data set and the dual data set.

   proc lp 
      data=factory sparsedata 
      primalout=primal dualout=dual;

The primal data set (Figure 1.12) contains the information that is displayed in the VARIABLE SUMMARY plus additional information about the bounds on the variables.

1 object _RHS_ chocolate NON-NEG _BASIC_ 0 1000 1.7977E308 0.25 -0.000000
2 object _RHS_ toffee NON-NEG _BASIC_ 0 300 1.7977E308 0.75 0.000000
3 object _RHS_ process1 SLACK   0 0 1.7977E308 0.00 -0.012963
4 object _RHS_ process2 SLACK _BASIC_ 0 10125 1.7977E308 0.00 0.000000
5 object _RHS_ process3 SLACK _BASIC_ 0 8250 1.7977E308 0.00 0.000000
6 object _RHS_ process4 SLACK   0 0 1.7977E308 0.00 -0.004630
7 object _RHS_ PHASE_1_OBJECTIV OBJECT _DEGEN_ 0 0 0 0.00 0.000000
8 object _RHS_ object OBJECT _BASIC_ 0 475 1.7977E308 0.00 0.000000

Figure 1.12: Primal data set

The dual data set (Figure 1.13) contains the information that is displayed in the CONSTRAINT SUMMARY plus additional information about bounds on the rows.

1 object _RHS_ object OBJECT 475 475 475 475 .
2 object _RHS_ process1 LE 27000 -1.7977E308 27000 27000 0.012963
3 object _RHS_ process2 LE 27000 -1.7977E308 16875 27000 0.000000
4 object _RHS_ process3 LE 27000 -1.7977E308 18750 27000 0.000000
5 object _RHS_ process4 LE 27000 -1.7977E308 27000 27000 0.004630

Figure 1.13: Dual data set

Network Format

Network flow problems can be described by specifing the nodes in the network and their supplies and demands, and the arcs in the network and their costs and capacities and lower flow bounds. Consider the simple transshipment problem in Figure 1.14 as an illustration.

intromp1.gif (2357 bytes)

Figure 1.14: Transshipment Problem

Suppose the candy manufacturing company has two factories, two warehouses, and three customers for chocolate. The two factories each have a production capacity of 500 pounds per day. The three customers have demands of 100, 200, and 50 pounds per day, respectively.

The following data set describes the supplies (positive values for the supdem variable) and the demands (negative values for the supdem variable) for each of the customers and factories.

   data nodes;
      format node $16. ;
      input node $ supdem;
customer_1   -100
customer_2   -200
customer_3    -50
factory_1     500
factory_2     500

Suppose that there are two warehouses that are used to store the chocolate before shipment to the customers and that there are different costs for shipping between each factory, warehouse, and customer.

What is the minimum cost routing for supplying the customers?

Arcs are described in another data set. Each observation defines a new arc in the network and gives data about the arc. For example, there is an arc between the node factory_1 and the node warehouse_1. Each unit of flow on that arc costs 10. Although this example does not include it, lower and upper bounds on the flow across that arc can be listed here.

   data network;
      format from $16. to $16.;
      input from $ to $ cost ;
factory_1     warehouse_1  10 
factory_2     warehouse_1   5
factory_1     warehouse_2   7
factory_2     warehouse_2   9
warehouse_1   customer_1    3
warehouse_1   customer_2    4
warehouse_1   customer_3    4
warehouse_2   customer_1    5
warehouse_2   customer_2    5
warehouse_2   customer_3    6

Use PROC NETFLOW to find the minimum cost routing. This procedure takes the model as defined in the network and nodes data sets and finds the minimum cost flow.

   proc netflow arcout=arc_sav
                arcdata=network nodedata=nodes;
     node node;      /* node data set information */
     supdem supdem;
     tail from;      /* arc data set information */
     head to; 
     cost cost;


   proc print;
      var from to cost _capac_ _lo_ _supply_ _demand_  
          _flow_ _fcost_ _rcost_;
      sum _fcost_;

PROC NETFLOW produces the following on the SAS log:

NOTE: Number of nodes= 7 .
NOTE: Number of supply nodes= 2 .
NOTE: Number of demand nodes= 3 .
NOTE: Total supply= 1000 , total demand= 350 .
NOTE: Number of arcs= 10 .
NOTE: Number of iterations performed (neglecting any constraints)= 7 .
NOTE: Of these, 2 were degenerate.
NOTE: Optimum (neglecting any constraints) found.
NOTE: Minimal total cost= 3050 .
NOTE: The data set WORK.ARC_SAV has 10 observations and 13 variables.

The solution (Figure 1.15) saved in the arc_sav data set shows the optimal amount of chocolate to send across each arc (the amount to ship from each factory to each warehouse and from each warehouse to each customer) in the network per day.

Obs from to cost _CAPAC_ _LO_ _SUPPLY_ _DEMAND_ _FLOW_ _FCOST_ _RCOST_
1 warehouse_1 customer_1 3 99999999 0 . 100 100 300 .
2 warehouse_2 customer_1 5 99999999 0 . 100 0 0 4
3 warehouse_1 customer_2 4 99999999 0 . 200 200 800 .
4 warehouse_2 customer_2 5 99999999 0 . 200 0 0 3
5 warehouse_1 customer_3 4 99999999 0 . 50 50 200 .
6 warehouse_2 customer_3 6 99999999 0 . 50 0 0 4
7 factory_1 warehouse_1 10 99999999 0 500 . 0 0 5
8 factory_2 warehouse_1 5 99999999 0 500 . 350 1750 .
9 factory_1 warehouse_2 7 99999999 0 500 . 0 0 .
10 factory_2 warehouse_2 9 99999999 0 500 . 0 0 2

Figure 1.15: ARCOUT data set

Notice which arcs have positive flow (_FLOW_ is greater than 0). These arcs indicate the amount of chocolate that should be sent from factory_2 to warehouse_1 and from there on to the three customers. The model indicates no production at factory_1 and no use of warehouse_2.

intromp2.gif (2449 bytes)

Figure 1.16: Optimal Solution for the Transshipment Problem

Chapter Contents
Chapter Contents

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.