Health policy settings

Use this page to modify existing health policies. Health policies are used to maintain a healthy environment using prevention and detection methodologies.

To view this administrative console page, click Operational policies > Health policies > health_policy_name.

If you are a user with either a monitor or an operator role, you can view only health policy information. If you are a user with either a configurator or an administrator role, you have all configuration privileges for health policies.

This page has two tabs: Configuration and Local topology. On the Configuration tab, you can view and configure the settings for the health policy. On the Local topology tab, you can view the health policy memberships in a visual representation.

Name

Specifies the name of a health policy. The health policy name is required and must be unique among all the health policies in the cell.

The name cannot begin with a period (.) or a space. A leading space does not generate an error; leading and trailing spaces are automatically deleted. Use meaningful and consistent health policy names. For example, you can indicate age-based health policies by naming them AGE_20DAYS and AGE_15DAYS.
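
The naming rules above can be sketched as a small validation routine. This is a minimal illustration, not the product's implementation: the function name normalize_policy_name is hypothetical, and checking uniqueness case-sensitively is an assumption.

```python
def normalize_policy_name(name: str, existing_names: set[str]) -> str:
    """Return the name as it would be stored, or raise ValueError.

    Hypothetical helper: strips leading and trailing spaces, then enforces
    the required, no-leading-period, and unique-in-cell rules.
    """
    cleaned = name.strip(" ")  # leading and trailing spaces are removed silently
    if not cleaned:
        raise ValueError("a health policy name is required")
    if cleaned.startswith("."):
        raise ValueError("a health policy name cannot begin with a period")
    # Assumption: uniqueness is checked case-sensitively.
    if cleaned in existing_names:
        raise ValueError(f"health policy {cleaned!r} already exists in the cell")
    return cleaned


print(normalize_policy_name("  AGE_20DAYS ", {"AGE_15DAYS"}))  # AGE_20DAYS
```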

Description

Specifies an additional description of the health policy. The description is optional. You can edit the description when you are creating or editing a health policy. Consider using the optional description when you are using many health policies or when multiple administrators manage the same set of health policies.

Health condition

The health condition defines the specific policy that is implemented.

Some policies are prevention-based and some are detection-based. Prevention-based policies are used to avoid conditions that might lead to problems, while detection-based policies are used to identify existing problems so that they can be resolved. These policies can be used to perform health-based assessments on clusters, dynamic clusters, and application server instances that run on nodes. For dynamic clusters, the minimum number of dynamic cluster instances remains running regardless of the health policy that you use.

The age-based condition policy tracks how long a server has been running. When a server has been running longer than the user-defined age value, the associated health actions are executed. A restart action clears all cached data and memory that the server has acquired. If you select the age-based condition policy, you must define the age criteria. The age-based condition is supported for all server types.
The excessive request timeout condition policy tracks the percentage of requests that time out. When the percentage of timed-out requests exceeds the defined threshold, the associated health actions are executed. If you select the excessive request timeout condition, you must set the timed out requests percentage threshold. The excessive request timeout condition is supported for all server types.
Restriction: The excessive request timeout condition does not apply to Java Message Service (JMS) and Internet Inter-ORB Protocol (IIOP) traffic.
The excessive response time condition policy tracks the requests and the amount of time that they take to complete. When you select the excessive response time policy, you must define the response time threshold. If, over a specific time interval, the average response time for the requests exceeds the threshold value, the health policy is triggered. The excessive response time condition is supported for all server types.
The memory condition: excessive memory usage policy tracks the memory usage for a member. When the memory usage exceeds a percentage of the heap size for a specified time, the health actions are executed. If you select the excessive memory usage policy, you must define the memory used and the time-over-memory threshold. The excessive memory usage condition is supported only on application servers on nodes that run WebSphere® Application Server or WebSphere Application Server Community Edition. You cannot define the excessive memory usage condition for other middleware server types.
The memory condition: memory leak policy tracks consistent downward trends in the free memory that is available to a server in the Java heap. The detection level setting determines when these trends are detected. If you select the memory condition: memory leak policy, you must define a detection level. The slower detection level setting requires the most historical data. The standard and faster detection level settings require the same amount of historical data, but the faster setting allows analysis before the Java heap has expanded to its maximum configured size. This provides earlier detection but is also more prone to false positives. When this condition is breached, the health actions are executed. The memory leak condition is not supported for other middleware server types.
The storm drain condition policy tracks stuck requests. The health actions associated with this policy are executed when the specified detection level is reached. Storm drain detection relies on change point detection on time series data. The metrics that are used for detecting storm drain are the response times and dynamic workload manager weights that are observed for the server. The storm drain condition applies only to dynamic clusters and cells. If you select the storm drain condition policy, you must select the detection level.
To detect change points, the health controller calculates a left mean and a right mean for each point. For a given point, the left mean is the mean of the N samples that arrive before that point, and the right mean is the mean of the N samples that start at that point, including the point itself. The difference between the left and right means is stored and compared with the differences calculated for nearby points, within a window of N values, to determine whether it is a local maximum. If it is, the point that this difference corresponds to is declared a change point.
The storm drain condition is supported for all server types.
Restriction: The storm drain condition does not apply to JMS and IIOP traffic.
The workload condition policy executes the associated health actions when a user-defined number of requests has been serviced. If you select the workload policy, you must define the total request criteria. The workload condition is supported for all server types.
The garbage collection percentage condition policy monitors a Java virtual machine (JVM) or set of JVMs to determine whether they spend more than the specified percentage of time in garbage collection over the sampling time period. Because garbage collection increases processor usage, this policy can help you determine whether poor performance is caused by time spent in garbage collection.
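
The left mean/right mean calculation used by the storm drain condition can be written out as formulas. In this sketch, x_i denotes the i-th sample of one metric (a response time or a dynamic workload manager weight) and N is the window size; the symbols, the use of an absolute difference, and reading the comparison set as the differences within N positions of t are assumptions made for illustration.

```latex
\begin{aligned}
\bar{x}_L(t) &= \frac{1}{N}\sum_{i=t-N}^{t-1} x_i \\
\bar{x}_R(t) &= \frac{1}{N}\sum_{i=t}^{t+N-1} x_i \\
d(t) &= \left|\bar{x}_L(t) - \bar{x}_R(t)\right| \\
t^{*} &\text{ is a change point if } d(t^{*}) > 0 \text{ and } d(t^{*}) = \max_{|s - t^{*}| \le N} d(s)
\end{aligned}
```

Computing d(t) needs 2N samples in total, N on each side of t.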

Custom condition

Define a custom condition when the default conditions do not meet the needs of your environment.

With a custom condition, you define a subexpression that is evaluated against environment metrics.

Health condition properties

Specifies properties that are specific to the health condition.

Table 1. Age-based condition properties
Setting	Description
Maximum age	This field is available for the age-based policy. Acceptable values are positive whole numbers of days or hours, between `1` hour and `365` days. Because decimal numbers are not supported, express fractional days in hours; for example, to enter `1.5` days, use `36` hours.
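
Because the field accepts only whole numbers, a fractional number of days has to be converted to hours before it is entered (for example, 1.5 days is 36 hours). The following sketch shows that arithmetic and the range check. The function name max_age_in_hours is hypothetical; it illustrates the conversion rather than any product API.

```python
HOURS_PER_DAY = 24
MIN_HOURS = 1
MAX_HOURS = 365 * HOURS_PER_DAY  # 365 days expressed in hours


def max_age_in_hours(value: float, unit: str) -> int:
    """Convert a maximum age to whole hours, rejecting values the field would not accept."""
    if unit == "days":
        hours = value * HOURS_PER_DAY
    elif unit == "hours":
        hours = value
    else:
        raise ValueError(f"unsupported unit: {unit}")
    if hours != int(hours):
        # e.g. 1.2 days is 28.8 hours, which still is not a whole number
        raise ValueError(f"{value} {unit} is not a whole number of hours")
    hours = int(hours)
    if not MIN_HOURS <= hours <= MAX_HOURS:
        raise ValueError("maximum age must be between 1 hour and 365 days")
    return hours


print(max_age_in_hours(1.5, "days"))  # 36
```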

Table 2. Excessive request timeout condition properties
Setting	Description
Timed out requests	This field is available for the excessive request timeout condition. The excessive request timeout condition detects, for each server that is a member of the policy, the percentage of requests directed at that server which timed out (over a `60` second period) after being routed from the on demand router. Acceptable values for this field are whole numbers between `1` and `99`.
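
The per-server calculation described for this field can be sketched as follows. The RequestOutcome record and both function names are hypothetical; the sketch assumes that the timeout status of each request routed by the on demand router is available with a timestamp, and it treats "exceeds" as strictly greater than the threshold.

```python
from dataclasses import dataclass
from typing import Iterable

WINDOW_SECONDS = 60  # the condition looks at a 60-second period


@dataclass
class RequestOutcome:
    """Hypothetical record of one request routed to a server."""
    timestamp: float  # seconds since an arbitrary epoch
    timed_out: bool


def timeout_percentage(outcomes: Iterable[RequestOutcome], now: float) -> float:
    """Percentage of requests in the last 60 seconds that timed out."""
    recent = [o for o in outcomes if now - WINDOW_SECONDS < o.timestamp <= now]
    if not recent:
        return 0.0
    timed_out = sum(1 for o in recent if o.timed_out)
    return 100.0 * timed_out / len(recent)


def request_timeout_breached(outcomes: Iterable[RequestOutcome], now: float,
                             threshold_percent: int) -> bool:
    """True when the timed-out percentage exceeds the configured threshold."""
    if not 1 <= threshold_percent <= 99:
        raise ValueError("timed out requests must be between 1 and 99 percent")
    return timeout_percentage(outcomes, now) > threshold_percent
```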

Table 3. Excessive response time condition properties
Setting	Description
Response time	This field is available for the excessive response time condition policy. The excessive response time condition policy is breached when the average response time for the requests exceeds this value. Acceptable values for this field are between `1` millisecond and `60` minutes.
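
The comparison that this field drives can be sketched as follows. The function name is hypothetical, and the sketch assumes the response times for one monitoring interval have already been collected; the interval length is left to the caller.

```python
from statistics import mean
from typing import Sequence

MIN_THRESHOLD_MS = 1
MAX_THRESHOLD_MS = 60 * 60 * 1000  # 60 minutes in milliseconds


def response_time_breached(response_times_ms: Sequence[float], threshold_ms: float) -> bool:
    """True when the average response time for one interval exceeds the threshold."""
    if not MIN_THRESHOLD_MS <= threshold_ms <= MAX_THRESHOLD_MS:
        raise ValueError("response time must be between 1 millisecond and 60 minutes")
    if not response_times_ms:
        return False  # no requests completed in this interval
    return mean(response_times_ms) > threshold_ms
```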

Table 4. Memory condition: excessive memory usage properties
Setting	Description
JVM heap size	This field is available for the excessive memory usage condition policy. It specifies the percentage of the heap size that memory usage must exceed. The total memory used percentage is used with the offending time period value to determine when the health actions run. Acceptable values for this field are whole numbers between `1` and `99`.
Offending time period	This field is available for the excessive memory usage condition policy. It specifies how long memory usage must exceed the JVM heap size percentage before the condition is breached. Acceptable values for this field are between `1` second and `60` minutes.
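
One way to combine the two settings is sketched below. The function name and the sample format are hypothetical, and the sketch assumes that "over time" means heap usage stays above the percentage continuously for the whole offending time period; a single sample below the threshold restarts the clock.

```python
from typing import Iterable, Optional, Tuple


def memory_usage_breached(
    samples: Iterable[Tuple[float, float]],
    used_percent_threshold: int,
    offending_seconds: float,
) -> bool:
    """samples: (timestamp_seconds, heap_used_percent) pairs in time order.

    Returns True once heap usage has stayed above the threshold for at least
    the offending time period.
    """
    if not 1 <= used_percent_threshold <= 99:
        raise ValueError("JVM heap size percentage must be between 1 and 99")
    if not 1 <= offending_seconds <= 60 * 60:
        raise ValueError("offending time period must be between 1 second and 60 minutes")
    over_since: Optional[float] = None  # start of the current run above the threshold
    for timestamp, used_percent in samples:
        if used_percent > used_percent_threshold:
            if over_since is None:
                over_since = timestamp
            if timestamp - over_since >= offending_seconds:
                return True
        else:
            over_since = None  # usage dropped back below the threshold; start over
    return False
```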

Table 5. Memory condition: memory leak condition properties
Setting	Description
Detection level	You can choose from the following detection levels. Each level trades off the speed and accuracy of detecting suspected memory leaks. Faster detection, higher probability of false alarms: A faster detection level detects a potential memory leak quickly; however, it has a greater chance of falsely identifying a memory leak than the slower levels because the analysis is done before the Java heap expands to its maximum configured size. Standard detection, standard probability of false alarms: A standard detection level is more accurate than a faster one, but not as quick to identify a potential memory leak. The standard and faster settings require the same amount of historical data, but the standard setting analyzes data only after the Java heap has expanded to its maximum configured size. Slower detection, lower probability of false alarms: A slower detection level is the most accurate; however, it does not detect a potential memory leak as quickly as the faster detection level does. The slower setting requires the most historical data.
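
The leak analysis itself is not described on this page, so the following is only a simplified illustration of what "a consistent downward trend in free memory" can mean, not the product's algorithm: it fits a least-squares line to the most recent free-heap samples and flags a negative slope. The function names and the min_samples parameter are hypothetical; raising min_samples mirrors the trade-off between the detection levels, because waiting for more history delays detection but makes a few low samples less likely to cause a false alarm.

```python
from typing import Sequence


def least_squares_slope(values: Sequence[float]) -> float:
    """Slope of the best-fit line through (0, v0), (1, v1), ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def free_memory_trending_down(free_memory_samples: Sequence[float], min_samples: int) -> bool:
    """True when the last min_samples free-heap samples trend downward."""
    if min_samples < 2:
        raise ValueError("at least two samples are needed to fit a trend")
    if len(free_memory_samples) < min_samples:
        return False  # not enough historical data yet
    recent = free_memory_samples[-min_samples:]
    return least_squares_slope(recent) < 0
```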

Table 6. Storm drain condition properties
Setting	Description
Detection level	Standard detection, normal probability of false alarms: A standard detection policy is less accurate than a slower one, but quicker to identify a potential storm drain. This level uses fewer samples (N=10) for both response times and dynamic workload manager weights and detects a change point in each metric based on the sample set. As a result, this policy reaches a conclusion faster because it waits for only 20 samples, 10 for the left mean and 10 for the right mean, before it calculates a difference of means and looks for a local maximum. The samples are collected at 15-second intervals, so a storm drain can be detected within 5 minutes (20 × 15 seconds) of its occurrence. However, because there are fewer samples, multiple transient peaks or dips in the samples lead to a higher probability of false alarms. Slower detection, lower probability of false alarms: A slower detection policy is the most accurate; however, it does not detect a potential storm drain as quickly as the standard detection policy does. This level uses more samples (N=15) for both response times and dynamic workload manager weights. As a result, this policy reaches a conclusion more slowly because it must wait for 30 samples, 15 for the left mean and 15 for the right mean, before it calculates a difference of means. The detection time is 7 minutes and 30 seconds (30 × 15 seconds). Because there are more samples, transient peaks or dips do not overly affect the mean values, so the probability of false alarms is lower.
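
The change point procedure and the detection times for both levels can be sketched as follows. This illustrates the left mean/right mean technique described on this page; it is not the health controller's code. The function names are hypothetical, the absolute difference and the comparison window of N positions on either side are assumptions, and the sketch examines one metric at a time rather than combining response times and weights.

```python
from statistics import mean
from typing import Dict, List, Sequence

SAMPLE_INTERVAL_SECONDS = 15
WINDOW_BY_LEVEL = {"standard": 10, "slower": 15}  # N for each detection level


def mean_differences(samples: Sequence[float], n: int) -> Dict[int, float]:
    """|left mean - right mean| for every index t with N samples on each side.

    The left window is samples[t-N:t]; the right window is samples[t:t+N],
    so it includes the sample at t itself.
    """
    diffs = {}
    for t in range(n, len(samples) - n + 1):
        left = mean(samples[t - n:t])
        right = mean(samples[t:t + n])
        diffs[t] = abs(left - right)
    return diffs


def change_points(samples: Sequence[float], n: int) -> List[int]:
    """Indexes whose mean difference is a local maximum within +/- N positions."""
    diffs = mean_differences(samples, n)
    points = []
    for t, d in diffs.items():
        neighbours = [diffs[s] for s in range(t - n, t + n + 1) if s in diffs]
        if d > 0 and d == max(neighbours):
            points.append(t)
    return points


def detection_time_seconds(level: str) -> int:
    """2N samples (N per side) collected every 15 seconds."""
    return 2 * WINDOW_BY_LEVEL[level] * SAMPLE_INTERVAL_SECONDS


print(detection_time_seconds("standard"), detection_time_seconds("slower"))  # 300 450
```

For a response-time series that drops from 100 to 10 at index 20, change_points(series, 10) reports index 20, where the difference of means peaks.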

Table 7. Workload condition properties
Setting	Description
Total requests	The workload condition policy is breached when a user-defined number of requests has been serviced. The value must be a whole number greater than 1000.
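
The workload condition amounts to a request counter with a threshold, sketched below. The class name is hypothetical, and the sketch assumes the condition is breached as soon as the count reaches the configured value; resetting the counter (for example, after a restart) is not shown.

```python
class WorkloadCondition:
    """Hypothetical counter for the workload condition."""

    def __init__(self, total_requests: int) -> None:
        if not isinstance(total_requests, int) or total_requests <= 1000:
            raise ValueError("total requests must be a whole number greater than 1000")
        self.total_requests = total_requests
        self.serviced = 0

    def record_request(self) -> bool:
        """Count one serviced request; return True once the threshold is reached."""
        self.serviced += 1
        return self.serviced >= self.total_requests
```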

Table 8. Garbage collection percentage condition properties
Setting	Description
Percentage of time spent in garbage collection	The garbage collection percentage condition policy monitors a Java virtual machine (JVM) or set of JVMs to determine whether they spend more than this percentage of time in garbage collection over the sampling time period. Units are percentages. The default value is 10. Acceptable values for this field are whole numbers between 1 and 99.
Sampling time period	This field specifies the period of time over which garbage collection data is collected. The percentage of time spent in garbage collection during the sampling time period must exceed the threshold value before corrective action is taken. Units are minutes and hours. The default value is 2 minutes. Acceptable values for this field are between 1 minute and 24 hours.
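
The check that these two settings define can be sketched as follows, using the documented defaults of 10 percent and 2 minutes. The function names are hypothetical, and the sketch assumes the individual garbage collection pause durations within one sampling period are already available.

```python
from typing import Iterable

DEFAULT_THRESHOLD_PERCENT = 10
DEFAULT_SAMPLING_MINUTES = 2


def gc_time_percentage(gc_pause_ms: Iterable[float], sampling_period_ms: float) -> float:
    """Share of the sampling period spent in garbage collection, as a percentage."""
    return 100.0 * sum(gc_pause_ms) / sampling_period_ms


def gc_condition_breached(
    gc_pause_ms: Iterable[float],
    threshold_percent: int = DEFAULT_THRESHOLD_PERCENT,
    sampling_minutes: float = DEFAULT_SAMPLING_MINUTES,
) -> bool:
    """True when GC time over the sampling period is above the threshold."""
    if not 1 <= threshold_percent <= 99:
        raise ValueError("threshold must be a whole number between 1 and 99")
    if not 1 <= sampling_minutes <= 24 * 60:
        raise ValueError("sampling time period must be between 1 minute and 24 hours")
    sampling_period_ms = sampling_minutes * 60 * 1000
    return gc_time_percentage(gc_pause_ms, sampling_period_ms) > threshold_percent
```

With the defaults, 15 seconds of pauses in a 2-minute period is 12.5 percent and breaches the condition, while exactly 12 seconds (10 percent) does not.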

Table 9. Custom condition properties
Setting	Description
Run reaction plan when	Specifies a subexpression that represents the metrics that you are evaluating in your custom condition.

Health management monitor reaction

Specifies how Intelligent Management behaves when a defined health condition is reached.

Reaction mode

Specifies the reaction mode that defines the behavior of the health policy. The reaction mode can be Supervised or Automatic.

When the reaction mode is set to Supervised, health policies are active, and recommended actions are sent to the administrator as a runtime task. If the administrator approves a recommendation, the actions are taken automatically to improve the health condition.
When the reaction mode is set to Automatic, health policies are actively logging data, and the product automatically takes actions to improve the health conditions, without approval from the administrator.
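
The difference between the two reaction modes can be sketched as a single decision point in front of the ordered list of health actions. The names below are hypothetical; administrator_approves stands in for the administrator's response to the runtime task, and the actions are plain callables rather than real health actions.

```python
from enum import Enum
from typing import Callable, Sequence


class ReactionMode(Enum):
    SUPERVISED = "Supervised"
    AUTOMATIC = "Automatic"


def react_to_breach(
    mode: ReactionMode,
    actions: Sequence[Callable[[], None]],
    administrator_approves: Callable[[], bool],
) -> bool:
    """Run the health actions for a breach; return True if they ran.

    In supervised mode the actions run only if the administrator approves
    the recommendation; in automatic mode no approval is requested.
    """
    if mode is ReactionMode.SUPERVISED and not administrator_approves():
        return False  # recommendation not approved: nothing runs
    for action in actions:  # actions run in the order in which they are listed
        action()
    return True
```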

Take the following actions when the health condition breaches

You can define a specific set of actions to occur when the health condition breaches. These actions can be the existing default actions, or you can define custom actions to run an executable file.

A list of actions displays in the order that they are run when the health condition breaches. To add an action, click Add action.... You can choose an existing default health policy action, a custom action that you have created, or you can create a new custom action.

To remove an action, select the action and click Remove action. To change the order of the actions, select the action to move and click Move up or Move down.

Memberships

Specifies the members of the health policy. The health policy is activated for the members that you specify. Membership is not a one-to-one relationship; a member can be associated with multiple policies.

Edit the Membership field by selecting the appropriate member type from the list. The potential members of that type display in the Available for membership list. Select the appropriate members from that list. To select multiple members, hold down the Ctrl key while you click each member. Then click Add to add your selections to the membership of the health policy.

File name: hc_detail_main.html