


This file should be a detailed description of
the config files. Unfortunately it's not finished
yet ...


Common Features



Since release 0.93, most of the config files share
the following characteristics:

  - The config files are searched (in order) in
    the adm subdirectory, then in the etc directory,
    then in the root install directory. This lets
    you decide at your site which file goes into
    which directory.

  - Comments are lines starting with a '#' character.

  - Other files can be included via the 'include'
    directive: put the word 'include' at the start
    of a line followed by the file name.

    The file name may contain variables (starting
    with '$'). Currently only one variable is
    defined: "HOST".

    E.g.:

      include sample-file1

    will find sample-file1 in the adm, etc or install
    directory and include this file, while

      include etc/sample-file1

    will include sample-file1 located in the etc
    directory and

      include /usr/local/etc/sample-file1

    will include /usr/local/etc/sample-file1 and

      include $HOST-sample-file1

    will include the file myhost-sample-file1 if
    Big Sister is running on a machine called 'myhost'.
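The search order and the $HOST substitution can be sketched as follows. This is a Python illustration, not Big Sister's own Perl code; the function names are hypothetical, and treating a relative path such as etc/sample-file1 as just another name looked up in the same three places is an assumption that happens to reproduce the behavior described above.

```python
import os
import re

# Search order for relative include names, as described above;
# "" stands for the root install directory itself.
SEARCH_DIRS = ("adm", "etc", "")


def expand_vars(name, variables):
    """Replace $NAME references; unknown variables are left untouched."""
    return re.sub(r"\$(\w+)",
                  lambda m: variables.get(m.group(1), m.group(0)), name)


def resolve_include(name, root, host):
    """Return the file an 'include' line would read, or None if not found."""
    name = expand_vars(name, {"HOST": host})
    if os.path.isabs(name):
        return name if os.path.exists(name) else None
    for subdir in SEARCH_DIRS:
        candidate = os.path.join(root, subdir, name)
        if os.path.exists(candidate):
            return candidate
    return None
```

With files present in both etc/ and the root directory, the etc/ copy wins because it is found first.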




file adm/bb-display.cfg



This file contains the configuration for the Big
Sister Status Collector 'bbd'. It is named bb-display.cfg
because it originally only described which HTML
pages bbd was supposed to maintain.

The file contains a set of one-line statements
beginning with '%'. A few statements (e.g. %Groups)
expect multi-line argument data. In this case
all lines between this statement and the next
line starting with '%' are treated as arguments.
Lines starting with '#' are silently ignored.
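A minimal Python sketch of how such a file can be split into statements. The function name and the (keyword, args, body_lines) tuple are illustrative assumptions, as is skipping blank lines and any lines that appear before the first '%' statement.

```python
def parse_display_cfg(text):
    """Split bb-display.cfg text into (keyword, args, body_lines) tuples.

    A line starting with '%' opens a statement; the lines that follow, up
    to the next '%' line, are collected as its multi-line argument data.
    """
    statements = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments (and, in this sketch, blank lines) are skipped
        if line.startswith("%"):
            parts = line[1:].split()
            if parts:
                statements.append((parts[0], parts[1:], []))
        elif statements:
            statements[-1][2].append(line)  # argument line of the open statement
    return statements
```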

Statements:


    %Port portnumber

	Arguments:

	    portnumber: a port number (or service name)

	Set the port bbd listens on. Note that this has
	no influence on uxmon agents: if you set the
	port in bb-display.cfg to something other than
	1984, you will have to specify this port in
	uxmon-net too, e.g.

		myhost	port=3956 bsdisplay


    %Option [+|-]option1 ...

	Arguments:

	    optionN: option name

	Set Status Collector options. A '-' prefix
	switches the option off, a '+' prefix switches
	it on. If no prefix is given, '+' is assumed.

	Known options:

		DNS		(since 0.98: default: off)

			If enabled, bbd and bsmon will, under
			some circumstances, make heavy use of
			gethostby* calls to translate IP
			addresses into host names and vice
			versa. Since name resolution is done
			via DNS on many systems, this often
			degrades performance when the DNS
			server(s) cannot be reached, which is
			exactly the situation in which Big
			Sister must keep working reliably. It
			is a good idea not to use name
			resolution in critical parts of Big
			Sister at all. Switching the DNS option
			off restricts name resolution to
			non-critical Big Sister modules.

		ImmediateHTML	(default: off)

			Write HTML status files immediately
			on status receipt, whether or not the
			status text has changed.

		KeepGroups	(default: off)

			Read saved grouping information on
			startup and whenever the configuration
			changes, so that dynamic group
			information is not lost.

		BBLog		(default: on)

			Log incoming status messages to
			www/logs/*.* files ("BB compatible
			logging")

		StartOK		(default: off)

			When the display server receives the
			first status message for a new host/check,
			assume the status was green before.
			This affects alarming: if StartOK is
			off, no alarm is generated when a newly
			added check already reports "down". If
			StartOK is on, the alarm generator
			handles the status as if it had changed
			from green to the reported status.
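The '+'/'-' prefix rule and the StartOK behavior can be sketched in Python. The names are hypothetical and option names are assumed to be case-sensitive; first_report_is_change only decides whether the first report is handled as a change from green, not whether an alarm is actually sent.

```python
# Option defaults as listed above.
DEFAULT_OPTIONS = {"DNS": False, "ImmediateHTML": False, "KeepGroups": False,
                   "BBLog": True, "StartOK": False}


def apply_option_args(args, options=None):
    """Return a copy of `options` with '%Option [+|-]name ...' applied."""
    result = dict(DEFAULT_OPTIONS if options is None else options)
    for arg in args:
        if arg[0] in "+-":
            result[arg[1:]] = arg[0] == "+"
        else:
            result[arg] = True  # no prefix means '+'
    return result


def first_report_is_change(status, options):
    """Whether the first report for a new host/check counts as a status change.

    With StartOK on, the previous status is taken to be green, so any
    non-green first report is handled like a change from green.
    """
    return options["StartOK"] and status != "green"
```

For example, "%Option -BBLog StartOK" turns BBLog off and StartOK on while leaving the other defaults alone.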


    %Autoconn host1 ... hostn

	Arguments:

	    hostx: host name

	NOTE: the host name must be resolvable!

	Tell bbd to automatically set the status of
	host.conn to 'green' each time a connection
	comes in from this host. If no report has been
	received for more than 15 minutes, the status
	is set to 'red' (not to 'purple').
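The 15-minute rule reduces to a simple time comparison. This Python sketch is illustrative only; the function name and the use of Unix timestamps in seconds are assumptions.

```python
import time

AUTOCONN_TIMEOUT = 15 * 60  # seconds; more than this without a report -> red


def autoconn_status(last_seen, now=None):
    """host.conn status for an %Autoconn host, given the last connection time."""
    if now is None:
        now = time.time()
    return "green" if now - last_seen <= AUTOCONN_TIMEOUT else "red"
```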


    %Autojoin what group

	Arguments:

	    what: either 'new', 'all' or 'all_hosts'

	    group: a group name

	%Autojoin new GROUP

	    tells bbd to automatically put
	    any newly appearing host into
	    the group 'GROUP'
	    ("newly appearing" means that bbd
	    is receiving status messages for a
	    host not yet known)

	%Autojoin all_hosts GROUP

	    tells bbd to automatically put
	    every known host into group 'GROUP'
	    (where "host" means every object
	    bbd is receiving status messages
	    for)

	%Autojoin all GROUP

	    tells bbd to automatically put
	    every known object - hosts and
	    groups - into group 'GROUP'


    %Autojoin pattern regexp GROUP

	    tells Big Sister to automatically
	    put a host whose name matches "regexp"
	    into group GROUP as soon as it first
	    appears. Regexp is a Perl regular
	    expression.

	    Examples:

		%Autojoin pattern domain1\.com$ DOMAIN1
		%Autojoin pattern domain2\.com$ DOMAIN2
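The following Python sketch shows which groups a newly appearing host would join. The function name and rule representation are hypothetical. Python's re is used in place of Perl regular expressions (equivalent for simple patterns like the ones above), and case-sensitive, unanchored matching is an assumption, since the text does not say.

```python
import re


def groups_for_new_host(host, autojoin_rules):
    r"""Groups a newly appearing host joins, given the %Autojoin arguments.

    Each rule is the argument list of one %Autojoin line, e.g.
    ["new", "NEWHOSTS"] or ["pattern", r"domain1\.com$", "DOMAIN1"].
    """
    groups = []
    for rule in autojoin_rules:
        if rule[0] in ("new", "all_hosts", "all"):
            groups.append(rule[1])  # these apply to any new host
        elif rule[0] == "pattern" and re.search(rule[1], host):
            groups.append(rule[2])
    return groups
```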


    %Pager cmd

	Arguments:

	    cmd: the name of a program that should
		 be invoked when bbd gets a page
		 request ...

	The %Pager command is provided for compatibility
	with Big Brother. Big Brother clients send pages
	directly via their Display Server. In a Big
	Brother environment you will usually use %Pager
	like:

		%Pager /usr/where-bb-is/bin/bb-pager.sh


    %Groups

	Arguments: none

	The %Groups statement is followed by a
	number of lines with the syntax:

	    name(Display name)	GROUP1 ... GROUPN

	The meaning is: 'name' will be printed as
	'Display name' when appearing on a html
	page and belongs to the groups GROUP1 ... GROUPN.

	The group definitions are recursive.
	This means that e.g.:

	    host1(Computer 1)	GROUP1 GROUP2
	    GROUP1(Group of computers)	GROUP3

	is valid and means that the reported status
	values for host1 will influence not only
	GROUP1 and GROUP2 but also GROUP3.
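The recursive group definitions can be resolved by walking the parent groups until no new ones turn up. This is a Python sketch with hypothetical names; the regular expression is an assumed reading of the "name(Display name) GROUP1 ... GROUPN" syntax.

```python
import re

# name, optional "(Display name)", then the parent groups
GROUP_LINE = re.compile(r"([^\s(]+)(?:\(([^)]*)\))?\s*(.*)")


def parse_groups(lines):
    """Map each name to (display name, direct parent groups)."""
    table = {}
    for line in lines:
        m = GROUP_LINE.fullmatch(line.strip())
        if m:
            name, display, parents = m.groups()
            table[name] = (display or name, parents.split())
    return table


def all_groups(name, table):
    """Every group whose status `name` influences, following parents recursively."""
    seen = []
    todo = list(table.get(name, (name, []))[1])
    while todo:
        group = todo.pop(0)
        if group not in seen:  # also guards against cyclic definitions
            seen.append(group)
            todo.extend(table.get(group, (group, []))[1])
    return seen
```

For the two example lines above, all_groups("host1", ...) yields GROUP1, GROUP2 and GROUP3.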


    %Itemgroup groupname groupmember ...

	Arguments:

		groupname: name of the item group to be defined
		groupmember: item name

	The %Itemgroup statement defines a group containing
	a list of items (where an item is a check/column name).
	Item groups can be used in %select_items statements.

	Note: group members are always column/check names;
	      item groups cannot be group members.

	Example:

		%Itemgroup LOCAL disk cpu bak msgs
		%Itemgroup NETWORK net smtp ssh
		%Page top SomePage
		%select_items LOCAL
		%table ALL
		%select_items NETWORK
		%table ALL


    %Section title

	Arguments:

	    title: the section title

	When using a skin with a contents bar (frames), this
	emits a section title in the contents bar.


    %Page name title template

	Arguments:

	    name: the file name of the created page.
		  ".html" is automatically appended if
		  the name does not end in ".htm" or
		  ".html". The file is created in
		  the directory 'www'.

	    title: the title of the created page

	    template: the template file used (default
		  is 'template.proto'). The file is
		  expected to be in the 'www' directory.

	Note: Since release 0.29 'template' is obsolete
	      and should no longer be used. Use the
	      'skin' mechanism described below instead.

	The %Page statement is usually followed by
	one or more statements describing the contents
	of the page (e.g. %table).

	The template file is an HTML file containing
	variable references. When creating a page,
	'bbd' reads the template file, replaces the
	variables with their values and writes the
	result to the right file. Valid variable names
	are:

		@TITLE@		- The page title
		@BGROUND@	- the URL to the background
				  graphics file
		@EXPIRES@	- the time "now+5 minutes"
				  when the page contents
				  will expire
		@TIME@		- the current time in human
				  readable format
		@TEXT@		- the page text generated by
				  'bbd'
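The file name rule and the @VARIABLE@ substitution can be sketched in Python. The function names are hypothetical, and leaving unknown @NAME@ references untouched is a choice of this sketch, not documented bbd behavior.

```python
import re


def page_filename(name):
    """'.html' is appended unless the name already ends in .htm or .html."""
    return name if name.endswith((".htm", ".html")) else name + ".html"


def render_template(template, values):
    """Replace @NAME@ references; unknown names are left as they are."""
    return re.sub(r"@([A-Z]+)@",
                  lambda m: values.get(m.group(1), m.group(0)), template)
```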


    %skin skin1 ... skinN

	Arguments: list of skin names

	Use the skinset skin1 through skinN to describe the
	look of created pages. See the www/skins directory
	for valid skin names and the www/skins/*/README files
	for what each skin is meant to introduce.

	Most of the skins are incomplete, meaning they only add
	certain details to another skin. Therefore you can
	list more than one skin in one %skin statement.
	The skins are applied in order, with later listed skins
	taking precedence over earlier listed ones. The
	'default' skin is always the first skin included.

	E.g.:

		%skin default white_bg static_lamps

	will force Big Sister to create pages with the
	default look but with a white background rather
	than the variable one and static instead of
	blinking lights. Since the default skin is
	always implicitly included it's exactly the
	same as:

		%skin white_bg static_lamps


    %Logskin skin1 ... skinN

	Arguments: like %skin

	%Logskin is similar to %skin, but it defines
	the look of log pages (the pages where the current
	status of a monitor is shown).


    %title title

	Arguments: title

	This statement should only be used within %Page
	statements. It tells 'bbd' which title should
	be used for the tables generated by following
	%table statements. If 'title' is set to 'auto'
	the display names as defined in the %Groups
	section are used.


    %refto url

	(see also below)

	Arguments: url

	This statement should only be used within %Page
	statements. When creating tables 'bbd' usually
	creates hypertext links for the hosts or groups
	appearing in the table (left column). This is
	done by appending a "#" character plus the
	name of the host or group to the url given.

	Special pseudo-urls:

		none	- omit creating hypertext links
		self	- the host/group will be found
			  on the same page
		clear	- clear table with individual
			  urls (see below)


    %refto name url

	Arguments: name	- name of group or host displayed
		   url	- the URL which should appear in this group's
			  hrefs

	(see also above)
	This statement sets individual hypertext links
	for specific groups or hosts. If a group or
	host called 'name' appears in a table, the
	"%refto url" is overridden by the url given in the
	respective "%refto name url" command. You can set
	up a table of name/url pairs by using multiple
	%refto statements. The table is cleared with
	the statement "%refto clear".
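How a link for the left column could be computed from the two %refto forms, as a Python sketch. The function name is hypothetical; the text does not say whether an individual URL also gets "#name" appended or what exact href 'self' produces, so both are marked as assumptions in the code.

```python
def table_link(name, default_url, overrides):
    """href for a host/group in a table's left column, or None for no link.

    default_url comes from '%refto url'; overrides maps names to the URLs
    of '%refto name url' lines ('%refto clear' would empty it).
    """
    if name in overrides:
        return overrides[name]  # assumption: used verbatim
    if default_url == "none":
        return None
    if default_url == "self":
        return "#" + name  # assumption: an anchor on the same page
    return default_url + "#" + name
```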


    %sort criteria

	Arguments: criteria

	Sort hosts/groups within tables by "criteria".
	Known criteria so far are:

	  - severity: sort by status (most alarming status
	    first), within same status sort by name
	  - name: sort by name only


    %select [<|>|<=|>=]color

	Arguments: comparison, color

	Selects which hosts should appear in a table. Only
	hosts whose summarized status is <, >, <= or >= the
	given status color are displayed. E.g. you can make
	Big Sister display every host with status worse than
	green with

		%select <green
		%table ALL
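%select and %sort severity both depend on an ordering of status colors in which "worse" compares as smaller. The Python sketch below assumes red < yellow < green; the text does not give the rank of other colors (such as purple), so they are left out. The function names are hypothetical, and treating a bare color without a comparison as "equal to" is also an assumption.

```python
import operator

# Assumed severity ranking (lower = more alarming). The text only implies
# that red and yellow are worse than green; other colors are left out here.
RANK = {"red": 0, "yellow": 1, "green": 2}
COMPARISONS = {"<=": operator.le, ">=": operator.ge,
               "<": operator.lt, ">": operator.gt}


def parse_select(arg):
    """Split a %select argument such as '<green' into (compare, color)."""
    for symbol in ("<=", ">=", "<", ">"):
        if arg.startswith(symbol):
            return COMPARISONS[symbol], arg[len(symbol):]
    return operator.eq, arg  # assumption: a bare color selects just that color


def select_and_sort(hosts, select_arg, criteria="severity"):
    """hosts maps host name -> summarized color; returns the names to show."""
    compare, color = parse_select(select_arg)
    chosen = [h for h, c in hosts.items() if compare(RANK[c], RANK[color])]
    if criteria == "severity":
        return sorted(chosen, key=lambda h: (RANK[hosts[h]], h))
    return sorted(chosen)  # criteria "name"
```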


    %select_names pattern ...

	Arguments: pattern: Perl regular expression

	Selects which hosts should appear in a table. Every
	host name is matched against the pattern list (Perl
	regular expressions, case insensitive).

	Examples:

		%select_names s.* a.*
		%table WEST

	will display all the hosts in table WEST with
	names starting with 's' or 'a'.
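A Python sketch of this name filter. Anchoring each pattern at the start of the host name is an assumption inferred from the example ("s.* a.*" selects names starting with 's' or 'a'); Python's re stands in for Perl regular expressions, and the function name is hypothetical.

```python
import re


def select_names(hosts, patterns):
    """Hosts matching any %select_names pattern, case-insensitively.

    Assumption: patterns are anchored at the start of the name (re.match).
    """
    regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [h for h in hosts if any(r.match(h) for r in regexes)]
```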


    %select_items item|group ...

	Arguments: item: an item (column/check) to be
			 displayed in a status table
		   group: an item group (see %Itemgroup)
			 to be displayed in a status table

	Selects which columns should appear in a status
	table.

	Note: There is a special predefined group called
	      ALL (meaning "all items"). However, something
	      like "%select_items ALL -disk -cpu" will not
	      work!

	Examples:

		%select_items cpu disk bak
		%select_items ALL
		%Itemgroup SOME cpu disk bak ssh smtp
		%select_items SOME -ssh

	See also: %Itemgroup
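A Python sketch of how %select_items arguments could expand into columns, including the "-item" exclusion shown in the last example. The function name is hypothetical. Because the text only says that ALL combined with exclusions "will not work", this sketch simply rejects any line that mixes ALL with other arguments.

```python
def expand_items(args, itemgroups, all_items):
    """Expand %select_items arguments into an ordered list of columns."""
    if "ALL" in args:
        if args != ["ALL"]:
            # The text warns that ALL combined with exclusions does not work;
            # this sketch rejects such a line outright.
            raise ValueError("ALL cannot be combined with other items")
        return list(all_items)
    columns = []
    for arg in args:
        if arg.startswith("-"):
            columns = [c for c in columns if c != arg[1:]]
        else:
            for item in itemgroups.get(arg, [arg]):
                if item not in columns:
                    columns.append(item)
    return columns
```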


    %itemref url

	Arguments: url

	This statement should only be used within %Page
	statements. When creating tables 'bbd' usually
	creates hypertext links for each status lamp in
	the table. This is done by appending a '/'
	character, the name of the host/group, a '.'
	character plus the name of the status item to
	the url given.

	If a file with the suffix '.html' exists, then
	that file is used ...

	Special pseudo-urls:

		none	- omit creating hypertext links

	Reasonable %itemrefs are e.g.:

		%itemref logs

			point to the status message collected
			by the Status Collector

		%itemref html

			point to the HTML version of the
			status message



    %table GROUP1 ... GROUPN

	Arguments: list of group names

	This statement should only be used within %Page
	statements. It tells 'bbd' to create a table
	for each of the arguments containing all the
	hosts/groups contained in the respective group.

	Since 0.22beta each group may be prefixed by
	one or more "+" characters. In this case the
	table will not contain the groups/hosts in
	the respective group but all the groups/hosts
	found when descending the group tree.


    %image cfg-file

	Arguments: configuration file

	Inserts an image/HTML image map into the page.
	Please see below for a description of the config
	file (default: adm/display_map.cfg)

	NOTE: when using %image the GD Perl module must
	      be installed.


    %ref name

	Arguments: name

	This statement should only be used within %Page
	statements. It tells 'bbd' to create an HTML label
	<A NAME ...> at the current position in the HTML
	code. 'bbd' automatically generates labels for
	every table appearing on the page, so this
	statement is not commonly used.


    %Rsync host[:port] prefix GROUP1 ... GROUPN

	Arguments: host	- remote status collector's host name
		   port - remote status collector's port
			  (defaults to 1984)
		   prefix - the prefix prepended to each
			  host name reported to the remote
			  status collector
		   GROUP1 ... GROUPN - a list of groups that
			  must be reported to the remote status
			  collector

	This statement tells 'bbd' to regularly (every 5-minute
	cycle) build a list of the statuses known to it and report
	them to a remote status collector. Each host name is
	prefixed with the given prefix (prefix "none" means no
	prefix). Currently only host statuses can be synced; no
	group information is exchanged.


    %Frameset output-file initial-page title

	Arguments: output-file	the name of the generated html file
		   initial-page	the name of the html file initially
				displayed
		   title	the title of the page

	This creates a frameset framing the Big Sister status
	pages.

	NOTE: The frameset is only generated when bb-display.cfg
	      is re-read (e.g. after a configuration change or
	      on startup).

	NOTE: for %Frameset to work correctly a skinset must be
	      used that includes frame definitions (e.g. "frames")


	
    %include file1 ... filen

	Arguments: file		the name of a file to include (if
				no absolute path is given, the
				path is taken to be relative to
				the Big Sister root directory)

	This statement only works when used within %Page statements.
	It tells bbd to include a file whenever the respective page
	is rebuilt. The file is included at the 'current' position,
	so e.g.

		%Page ...
		%table TEST
		%include file

	will include the file 'file' after the table 'TEST'









file adm/permissions and adm/hosts.allow



Note: Release 0.36 introduced a new security model. Although
      the old adm/hosts.allow file and format are still
      supported for compatibility reasons, the
      adm/permissions file should now be used.

The permission file tells bbd which clients are allowed
to connect and which operations they may perform. The file
is read line by line. Each line contains both a pattern
and a list of operations accepted or rejected for the
matching clients. If a client matches multiple patterns,
the associated access lists are applied cumulatively, in
the order they appear in the file.

The format of each line is

	pattern	  =>  access list

Accepted patterns are:

	name hostname	client name matches 'hostname'
	ip ip		client IP address matches 'ip'
	anonymous	no user is logged in on this connection

NOTE: "name ..." patterns are ignored if the DNS option in
      bb-display.cfg (see above) is switched off, which is
      now the default!

NOTE: "user" and "member" patterns as described in former versions
      of this document are not supported at the moment.

The access list is a list of keywords, each associated
with an operation or a group of client operations. Each
keyword is preceded by either a "+" or a "-" character,
allowing or rejecting the corresponding request. Accepted
keywords are:

	all		all operations
	authenticate	client is permitted to send user
			authentication
	status		client is authorized to send
			status messages
	page		client is authorized to send page
			commands
	grouping	client is permitted to send group
			join/leave and name commands
	archiving	log file archiving operations
	alarm_acking	alarm acknowledging operations
	perf		performance data transmission
	remove		removal of status records

Empty lines and lines starting with '#' are ignored.

Example:

	name .*		=> -all
	name .*\.mydomain\.com	=> +status
	name archiver	=> -all +archiving
	name localhost	=> +all
	anonymous	=> -page
	name group1	=> +grouping
	name group2	=> +grouping
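The cumulative, in-order evaluation can be sketched in Python. The function names are hypothetical. Starting from an empty set of operations, matching "name" and "ip" patterns as regular expressions against the whole value, and ignoring unsupported pattern types are assumptions of this sketch; "name" rules are skipped when the DNS option is off, as noted above.

```python
import re

# Operation keywords from the list above; "all" stands for every one of them.
OPERATIONS = {"authenticate", "status", "page", "grouping", "archiving",
              "alarm_acking", "perf", "remove"}


def parse_permissions(text):
    """Return [(pattern_words, access_words)] for each 'pattern => access' line."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, access = line.split("=>", 1)
        rules.append((pattern.split(), access.split()))
    return rules


def allowed_operations(rules, name, ip, anonymous, dns=False):
    """Apply every matching rule in file order and return the allowed operations."""
    allowed = set()
    for pattern, access in rules:
        kind = pattern[0]
        if kind == "name":
            # 'name' rules are ignored when the DNS option is off.
            if not dns or name is None or not re.fullmatch(pattern[1], name):
                continue
        elif kind == "ip":
            if not re.fullmatch(pattern[1], ip):
                continue
        elif kind == "anonymous":
            if not anonymous:
                continue
        else:
            continue  # pattern types not supported by this sketch
        for entry in access:
            ops = OPERATIONS if entry[1:] == "all" else {entry[1:]}
            if entry.startswith("+"):
                allowed |= ops
            elif entry.startswith("-"):
                allowed -= ops
    return allowed
```

With the example file and DNS on, an anonymous client named web1.mydomain.com ends up with only "status", while "archiver" gets only "archiving".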







file adm/grouping



The Big Sister status collector allows clients to join/leave
groups on client request. The file adm/grouping is used by
uxmon (provided you are using the standard uxmon-rules.pl file)
and contains a list of hosts and the groups they should join.
The file is read line by line. Each line starts with a host
name followed by the list of groups uxmon should tell the
status collector to add this host to, e.g.:

	server1		EAST UNIX ALL
	server2		WEST NETWARE ALL
	hub1		WEST HUB ALL

NOTE: lines starting with '#' are treated as comments

NOTE: Using the 'grouping' feature means that uxmon will send
      the grouping information to the status collector each
      time status is sent (usually each 5 minutes).

NOTE: If uxmon is terminated normally (e.g. when an agent machine
      is halted intentionally) it sends a group leave command
      to the status collector and the status collector will forget
      about these hosts (and remove the status files from www/logs).
      If uxmon crashes, the groups are not left and the status
      collector keeps the hosts in its lists.







file adm/uxmon-asroot



If the file adm/uxmon-asroot exists, bb_start will start uxmon
with root privileges.







file adm/uxmon-net



When using the provided uxmon-rules.pl file, this file tells uxmon
which checks it should run on which hosts and where to send status
information. It is read line by line. If a '#' character is
found, everything after it is treated as a comment. Lines
ending in '\' are continued on the next line (as e.g. in
the shell). Each line has one of the following formats:

	hostname	check1 var1=content1 check2 check3 ... checkn

or

	hostname(alias)	check1 var1=content1 check2 check3 ... checkn

(quotes " and ' are allowed and interpreted)

Hostname is the name of the host uxmon should check. Both the name
returned by /bin/hostname and 'localhost' are recognized as the local
system. Alias is the name uxmon should use when reporting status to
the status collector. This allows e.g. the following check:

	server1-interface1(server1)	ping
	server1-interface2(server1)	ping http

(where server1 is meant to be a multihomed host with two network
 interfaces which should both be checked but should both be reported
 as being the same machine)

Var=content sets a variable named 'Var' to the value 'content'. Some of
the checks accept arguments passed this way. The variable space is
cleared after each line.
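The line format above can be parsed with Python's shlex module, which handles the " and ' quoting. This is a sketch with hypothetical names, not uxmon-rules.pl itself. It assumes each check sees the variables set earlier on the same line (as in "proto=udp ping proto=tcp ping"), strips comments before joining continuation lines, and, like the plain rule above, would also cut a '#' inside quotes or URLs.

```python
import shlex


def logical_lines(text):
    """Yield lines with '#' comments removed and '\\' continuations joined."""
    pending = ""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        line, pending = pending + line, ""
        if line.strip():
            yield line
    if pending.strip():
        yield pending


def parse_uxmon_net(text):
    """Return (host, alias, [(check, variables)]) for every entry."""
    entries = []
    for line in logical_lines(text):
        words = shlex.split(line)  # handles " and ' quoting
        host, alias = words[0], None
        if host.endswith(")") and "(" in host:
            host, alias = host[:-1].split("(", 1)
        variables, checks = {}, []  # variable space starts empty on each line
        for word in words[1:]:
            if "=" in word:
                key, value = word.split("=", 1)
                variables[key] = value
            else:
                # each check sees the variables set so far on this line
                checks.append((word, dict(variables)))
        entries.append((host, alias, checks))
    return entries
```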

Common variables are:

	frequency=xx

		tell uxmon how frequently to run a check. This is
		not really a frequency but rather an interval - the
		value is a number specifying the time in minutes
		between 2 checks, e.g.

			foobar	frequency=10 ping

		will ping foobar every 10 minutes

Checks are the names of checks uxmon should run for the respective host.
When interpreting a check, uxmon-rules.pl looks for a file carrying the
name of the check, first in adm/Config, then in uxmon/Config. If found,
it interprets the file as a Perl script setting up the check, runs it and
passes the (optional) arguments (var=...) to it. Currently implemented
checks are:


  bbdisplay

    not really a check: tells uxmon to send status
    reports to this host using Big Brother compatibility
    mode. Multiple hosts may be bbdisplays. Note that
    some functionality is lost when using 'bbdisplay':
    no dynamic grouping (see file adm/grouping) and no
    multiple status reports per TCP connection.

     Usage:

	mydisplay	port=1984 timeout=8 fqdn=no bbdisplay

     (port, timeout and fqdn are optional and default to 1984,
      8 and no)

     'fqdn' tells uxmon to report host names to the status collector
     either with the domain stripped (fqdn=no) or with the "." in
     host names replaced by "_" or "," (fqdn=yes). So e.g.:

	mydisplay	fqdn=yes bsdisplay
	foo.bar.com	ping

     will report foo_bar_com.conn to the status collector while

	mydisplay	fqdn=no bsdisplay
	foo.bar.com	ping

     will only report foo.conn

     'bbdisplay' uses a ',' to replace dots in FQDNs while
     bsdisplay uses '_'.
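The naming rule can be summarized in a few lines of Python. The function name and its protocol argument are hypothetical; the replacement characters and the domain stripping follow the text above.

```python
def reported_name(host, check, fqdn, protocol="bsdisplay"):
    """Name under which uxmon reports host.check, following the fqdn rule."""
    if fqdn == "yes":
        # bsdisplay replaces dots with '_', bbdisplay with ','
        name = host.replace(".", "_" if protocol == "bsdisplay" else ",")
    else:
        name = host.split(".", 1)[0]  # strip the domain part
    return name + "." + check
```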

     Since version 0.95 bbdisplay supports collecting
     performance data. Two new arguments have been added:

	options		a list of optional tasks bsdisplay
			has to perform. Defaults to
			"perf,status,group" (send performance,
			status and grouping messages to
			server). This argument has been introduced

	perfdata	the argument value is a file containing
			a list of regular expressions describing
			which performance variables should be
			sent to the Status Collector. If at least
			one perfdata argument is present uxmon
			writes out a file 'var/uxmon_variables'
			listing all the available variables.

      Example:

	host1	options=perf,status,group perfdata=adm/log_often bsdisplay
	host1	options=perf perfdata=adm/log_hourly frequency=60 bsdisplay

      This tells uxmon to send the variables listed in adm/log_hourly
      every 60 minutes and the variables listed in adm/log_often every
      5 minutes (the default). The usual status and grouping messages
      are sent every 5 minutes; there is no need to send them every hour
      too (since "every 5 minutes" includes "every hour" anyway).

      adm/log_often might look like:

	.*cpu.*
	myhost:.*disk.*
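A Python sketch of how a perfdata file could select variables. The "host:variable" naming in the example data is a hypothetical format suggested by the "myhost:.*disk.*" line; the real names are those uxmon writes to var/uxmon_variables. Matching each expression against the whole name is also an assumption.

```python
import re


def select_perf_variables(variable_names, perfdata_text):
    """Names matched by any regular expression in a perfdata file."""
    patterns = [re.compile(line.strip())
                for line in perfdata_text.splitlines() if line.strip()]
    return [name for name in variable_names
            if any(p.fullmatch(name) for p in patterns)]
```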


  bsdisplay

     same as bbdisplay, but uses the Big Sister protocol. This
     should be preferred to bbdisplay.


  cpuload

     Usage:

	localhost	cpu_yellow=10 cpu_red=20 cpuload

     check the CPU load as reported by the 'uptime' command.
     cpu_yellow defaults to 10, cpu_red to 20.


  statusfile

     Usage:

	localhost	file=adm/mystatus statusfile

     read status information from a file and report it to
     the Status Collector. This monitor is meant for
     interfacing with simple external monitors: they can
     write their status to a file rather than dealing with
     TCP connections to Status Collectors, e.g.:

	echo "status myhost.mytest green wow - everything is ok" > adm/mystatus
	echo "status myhost.other yellow something went wrong" >> adm/mystatus

     would do.
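An external monitor written in Python could produce the same "status host.check color text" lines as the echo example. The function name is hypothetical, and writing to a temporary file and renaming it is a precaution of this sketch (so uxmon never reads a half-written file), not something the text requires.

```python
import os


def write_status_file(path, reports):
    """Write one 'status host.check color text' line per report."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        for host, check, color, text in reports:
            f.write(f"status {host}.{check} {color} {text}\n")
    # Renaming last means a reader never sees a half-written file.
    os.replace(tmp_path, path)
```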


  bbscript

      Usage:

	localhost	env="LIMIT1=5;LIMIT2=10" file=adm/bb-oracle.sh bbscript

      Use a BB-style monitor script. "file=" must point to the script
      to execute every 5 minutes; "env=..." lists optional
      environment variables to be set in the script's environment.
      The common variables (such as BBHOME and the like) are set
      automatically.


  http

     checks the HTTP response. When used without arguments it
     connects to port 80 and tries to get the file "/". Other
     URLs/ports may be passed in either of the following two
     ways:

	server	url=http://host:port/somewhere/somefile.html http

     or simply:

	server	http://host:port/somewhere/somefile.html

     The column under which the check results appear on the
     status display can be specified via the "check" argument,
     as in

	server	check=proxy http://host:3128/whatever/

     The http check is based on the "tcp" check explained below.
     The same additional arguments (especially "timeout=") apply
     to http also.


  tcp

     checks whether the host responds to TCP connection requests.
     Some well known services (such as smtp, pop3, nntp, ica)
     are recognized and not only connected to but also
     checked against some expect/send pairs (e.g. when checking
     SMTP uxmon expects an answer starting with '22').

     Usage:

	server1		service=pop3,smtp,printer tcp
	server1		service=ftp timeout=20 tcp

     where service is a variable set to a comma-separated list
     of services 'tcp' should check. Timeout is the maximum time
     the tcp check waits for a response (default: 8s).

     Some well known services have their own aliases, so they
     can directly be listed without "service=... tcp", e.g.

	server1		pop3 smtp printer

     is ok.

     More sophisticated usage:

	server1		check=tcp service=ftp(21),smtp(3325) tcp

     You can apply a test for a well known service to a non-default
     port by using the syntax "service=service(port)". If the
     "check" argument is set, it specifies the column in which the
     test results appear on the status page.

     Experienced system administrators will probably find the
     following construct useful:

	server1		check=tcp service=custom(80) send="HEAD / HTTP/1.0\r\n\r\n" expect="HTTP" tcp

     This means: connect to port 80 of server1, send the string
     specified with the "send" argument, and check whether the
     data server1 sends back matches the regular expression
     specified with "expect".
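The "service(port)" syntax and the send/expect semantics can be sketched in Python. This is not uxmon's implementation; the function names are hypothetical. The sketch expects `send` with real CR/LF characters (the \r\n in uxmon-net is an escape sequence) and treats refused connections and timeouts as a failed check.

```python
import re
import socket


def parse_services(spec):
    """'ftp(21),smtp(3325)' -> [('ftp', 21), ('smtp', 3325)]; port None if absent."""
    services = []
    for item in spec.split(","):
        m = re.fullmatch(r"([\w-]+)(?:\((\d+)\))?", item.strip())
        if m is None:
            raise ValueError(f"bad service entry: {item!r}")
        port = int(m.group(2)) if m.group(2) else None
        services.append((m.group(1), port))
    return services


def send_expect(host, port, send, expect, timeout=8.0):
    """Connect, send `send`, and return True once the reply matches `expect`."""
    pattern = re.compile(expect.encode())
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(send.encode())
            reply = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    return False  # peer closed without a matching reply
                reply += chunk
                if pattern.search(reply):
                    return True
    except OSError:  # refused, unreachable or timed out
        return False
```

The default timeout of 8 seconds mirrors the tcp check's documented default.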



  ping

     does a ping. Note that ICMP pings are only possible when
     uxmon is running with root privileges, so by default 'ping'
     uses the UDP protocol. Ping supports icmp, udp and tcp,
     though. Most IP stacks implement both the ICMP and the
     UDP echo service.

     To run uxmon with root privileges, create the file
     adm/uxmon-asroot and bb_start will start uxmon as root.

     Usage:

	server1		ping

     or

	server1		proto=udp ping proto=tcp ping proto=icmp ping
	server2		proto=fping ping


     special protocols:

	If proto=external, an operating system command is
	executed instead of using the built-in ping methods. Use
	proto=external like:

		server1	proto=external pingcmd="ping -c1" ping

	proto=karl is another special implementation that uses the
	operating system ping. It is limited to Solaris, though.

	If the fping command is available on your system, you
	probably want to use

		proto=fping


  rpc

     does an 'rpc ping', i.e. it sends a 'NULL' remote procedure
     call to the respective program and checks for a correct answer.

     Usage:

	server1		rpc=mount,nlm rpc rpc=nfs version=3 rpc
		
     Note: Some of the checks have their own aliases, so you can
	   also write:

		server1		mount nfs nlm yp yppasswd

     Note: currently known programs are:

		mount, nfs, nlm, yp, yppasswd
		(list can be extended in Monitor::rpc_ping)

     Note: rpc needs a working portmapper on the remote system


  procs

     checks for running processes. On Win32 systems this monitor
     checks for running services.

     Usage:

	localhost	procs=nfsd(1-16),sendmail,lpd(1-40) procs

     ("there must be 1 to 16 nfsd processes, at least one sendmail
      process and 1 to 40 lpd processes")

	localhost	pscomm="ps cax" procs=nfsd(1-16),sendmail,lpd(1-40) procs

     (same as above but use the command "ps cax" for finding the running
      processes)

     The "ps" command used for determining the list of processes is
     guessed by the procs check. If available, it preferably uses
     "ps -e"; on systems without a working "ps -e" it tries
     "ps cax". Both "ps -e" and "ps cax" display the process name,
     i.e. argument 0 as passed to the program. If you prefer
     checking against the full command line, you can use
	localhost	alternate=yes procs=nfsd,oracleinst1 procs

     This will make "procs" use the "ps -ef" command.

     NOTE: On Win32 systems the "pscomm" argument is ignored.

     NOTE: On Win32 systems it is possible to monitor remote systems.
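A Python sketch of the procs specification and the range check. The function names are hypothetical, and obtaining the per-name process counts (from ps output or the Win32 service list) is left to the caller. An entry without a range is read as "at least one", matching the sendmail example.

```python
import re


def parse_procs(spec):
    """'nfsd(1-16),sendmail,lpd(1-40)' -> {name: (minimum, maximum or None)}."""
    limits = {}
    for item in spec.split(","):
        m = re.fullmatch(r"([^()]+)(?:\((\d+)-(\d+)\))?", item.strip())
        if m is None:
            raise ValueError(f"bad procs entry: {item!r}")
        name, low, high = m.groups()
        limits[name] = (int(low), int(high)) if low else (1, None)
    return limits


def procs_violations(limits, counts):
    """Names whose running-process count lies outside the configured range."""
    bad = []
    for name, (low, high) in limits.items():
        n = counts.get(name, 0)
        if n < low or (high is not None and n > high):
            bad.append(name)
    return bad
```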


  diskfree

     checks file systems for free space

     Usage:

	localhost	type=ufs fs=/(1000-5000),/var(10000-20000) diskfree

     ("status red if / is below 1MByte free, yellow if below 5MBytes,...,
       the file system type is ufs")

	localhost	fs=all-ext2(10%-20%),all-fat(5%-15%),/boot(500k-1000k) diskfree

     ("status red if /boot goes below 500k free, one of the 'fat' filesystems goes
       below 5% free or one of the ext2 filesystems (except /boot) go below 10%
       free, yellow if /boot goes below 1000k free or one of the fat ...., green
       otherwise")

     Note: the all-... syntax should be preferred to the "type=" argument

     Note: On Win32 systems only percentage limits are supported and Big Sister
	   does not distinguish between NTFS and FAT file systems.

     Note: If specific file systems are listed, a file system will
	   only be checked if it is of the type specified in the
	   "type=" argument or in one "fs=all-type" argument!
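A Python sketch of the fs= syntax and the resulting color. Assumptions, all marked in the code: bare numbers are kBytes (inferred from "/(1000-5000)" meaning 1 MByte), "below" is a strict comparison, and an explicit mount point takes precedence over an all-TYPE entry. The type filter implements the last note above. Function names are hypothetical.

```python
import re


def parse_limit(text):
    """'10%' -> ('%', 10.0); '500k' or a bare '1000' -> ('k', kBytes, assumed)."""
    if text.endswith("%"):
        return "%", float(text[:-1])
    return "k", float(text.rstrip("kK"))


def parse_fs(spec):
    """'all-ext2(10%-20%),/boot(500k-1000k)' -> [(target, red, yellow), ...]."""
    return [(m.group(1), parse_limit(m.group(2)), parse_limit(m.group(3)))
            for m in re.finditer(r"([^,()]+)\(([^-)]+)-([^)]+)\)", spec)]


def _below(limit, free_kb, size_kb):
    unit, value = limit
    threshold = value if unit == "k" else size_kb * value / 100
    return free_kb < threshold  # strict comparison assumed


def diskfree_status(mount, fstype, free_kb, size_kb, entries, type_arg=None):
    """Color for one file system, or None if it is not checked at all."""
    types = {target[4:] for target, _, _ in entries if target.startswith("all-")}
    if type_arg:
        types.add(type_arg)
    if fstype not in types:
        return None  # only listed file system types are checked
    # assumption: an explicit mount point wins over an all-TYPE entry
    rule = next((e for e in entries if e[0] == mount), None)
    if rule is None:
        rule = next((e for e in entries if e[0] == "all-" + fstype), None)
    if rule is None:
        return None
    _, red, yellow = rule
    if _below(red, free_kb, size_kb):
        return "red"
    if _below(yellow, free_kb, size_kb):
        return "yellow"
    return "green"
```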


  diskload

     check the average disk load (4 minutes) as reported by 'sar'.

     Usage:

	localhost	yellow=3 red=8 diskload

     ("report status yellow when load >3%, report status red when load >8%")


  load

     check for CPU idle time, I/O-wait, freeswap as reported by
     sar (4 minutes period)

     Usage:

	localhost	idle=10 wio=50 freeswap_red=100000 freeswap_yellow=200000 load

     ("report 'yellow' when %idle is below 10%, or %wio is greater than
       50% or freeswap is below 200000 blocks, report red when freeswap
       is below 100000 blocks")

     Note: defaults: idle=15, wio=50, freeswap_red=20000,
	   freeswap_yellow=60000


  dumpdates

     check for last backup if using dump/ufsdump.

     Usage:

	localhost	fs=.7(6-10),.0(30-40),/dev/rdsk/c0t1d0s0.7(1-2) dumpdates

     yellow: last level 7 backup older than 6 days, last level 0 backup
	     older than 30 days, last level 7 backup of
	     /dev/rdsk/c0t1d0s0 older than 1 day

     red: last level 7 backup older than 10 days, last level 0 backup
	  older than 40 days, last level 7 backup of
	  /dev/rdsk/c0t1d0s0 older than 2 days

     Note: the dumpdates check needs a working "mount" command to work
	   correctly
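A Python sketch of the fs= syntax for dumpdates, where each entry is "[device].level(yellow-red)" with ages in days. The function names are hypothetical. Reading an entry without a device as "all file systems" and letting a device-specific entry take precedence over a generic one are assumptions.

```python
import re


def parse_dumpdates(spec):
    """'.7(6-10),/dev/rdsk/c0t1d0s0.7(1-2)' -> [(device, level, yellow, red), ...].

    device is None for entries without a device name (all file systems).
    """
    rules = []
    for m in re.finditer(r"([^,]*?)\.(\d)\((\d+)-(\d+)\)", spec):
        device = m.group(1) or None
        rules.append((device, int(m.group(2)), int(m.group(3)), int(m.group(4))))
    return rules


def dump_status(device, level, age_days, rules):
    """Color for the last dump of `level` on `device` (None if unchecked)."""
    specific = [r for r in rules if r[0] == device and r[1] == level]
    generic = [r for r in rules if r[0] is None and r[1] == level]
    matching = specific or generic  # assumption: device entries win
    if not matching:
        return None
    _, _, yellow, red = matching[0]
    if age_days > red:
        return "red"
    if age_days > yellow:
        return "yellow"
    return "green"
```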


  syslog

     check system log files

     Usage:

	localhost	syslog   cfg=/etc/bs_syslog.cfg syslog

     Note: syslog needs its own configuration file. By default this
	   is etc/syslog. See below for a description of the file format

     Note: On startup syslog will re-read the last 15 minutes of the log
	   file but at most 30kBytes


  eventlog

     check event log on Win32 systems

     Usage:

	server1		eventlog

     Note: eventlog needs its own configuration file. By default this
	   is etc/eventlog. This config file has the same format as the
	   one for the syslog monitor. See below for a description of
	   the syslog config file.


  snmp_trap

     check the log file being generated by bstrapd (var/snmp_traplog).
     This is only useful if bstrapd is running (usually bstrapd is
     started up if its configuration file adm/bstrapd.cfg exists).

     Usage:

	server1		snmp_trap

     Note: snmp_trap has its own configuration file called etc/snmp_trap.
     See below for a description (same format as syslog config file).


  snmp

     remotely monitor hosts running SNMP agents

     Usage:

	server1		novell
	hub2		hub
	server3		type=ping,net,storage,nwusers,cpu,novell snmp
	router1		community=secret type=net,ping snmp

     The default community (when not passed with "community=...") is
     'public'. "type" may be a list of checks out of the following:

	ping	report in host.conn if snmp poll was successful
	net	check any network interface for InputErrors and OutputErrors
		and report a failure if device reports more than 2/s (yellow)
		or 6/s (red)
	storage	(currently only together with "novell" and "nt") use the 'hrStorage'
		oid for checking disk and memory usage
	cpu	uses 'hrProcessorLoad' to monitor CPU load. Reports "yellow"
		if load is >80%
	nwusers	(will only work with Netware servers) Checks 'nwMaxLogins'
		against 'nwLoginCount', reports 'yellow' if only 10 users are
		left, 'red' if fewer than 2 are left

	hub / nt / novell / caty / cds / notes / linux:
		these do not include any check but tell 'snmp' which
		type of machine is being checked. Some snmp checks depend
		on the type of host ...

			hub:	network hub
			nt:	Windows NT
			novell:	Novell Netware
			caty:	Cisco Catalyst
			cds:	Axis CD-Server
			linux:	Linux (with UCD SNMP)

      Note: snmp reads the file etc/mibs.txt. By default this contains
	    only the Internet MIB-II (ping/net). You need to add more mibs
	    from the contrib/mibs directory, e.g.

		cat host.txt mib-2.txt nwhostx.txt nwserver.txt > mibs.txt


  OV

     monitor the HP OpenView trapd.log

     Usage:

	localhost	cfg=/etc/bs_OV.cfg OV

     Note: OV needs its own configuration file. By default this
	   is etc/OV. See below for a description of the file format

     Note: On startup OV will re-read the last 24 hours of the log
	   file but at most 1MByte


  metastat

     monitor Solstice Disksuite metadevices
		
     Usage:

	localhost	stat=/usr/opt/SUNWmd/sbin/metastat metastat

     default value of 'stat' is /usr/opt/SUNWmd/sbin/metastat, so
     'stat=' can be omitted in most cases.


  FQDN / noFQDN

     (obsolete, use "fqdn=" with bsdisplay/bbdisplay instead)

     Usage:

	localhost	FQDN

     This is an option rather than a monitor. It tells uxmon to report
     host names to the status collector either with the domain stripped
     (noFQDN) or with "." in host names replaced by "_" (FQDN). So e.g.:

	localhost	FQDN
	foo.bar.com	ping

     will report foo_bar_com.conn to status collector while

	localhost	noFQDN
	foo.bar.com	ping

     will only report foo.conn


   ntp

     Usage:

	foo		ntp

     Check whether the machine is running an NTP server. This
     check uses the ntpdate command and therefore only works
     on systems with ntpdate installed.


   mrtg

     Usage:

	foo.bar.com	 prefix=10.1.1.253/10.1.1.253.1 column=mrtg maxlev=10485760 bits=1 mrtg

     The MRTG monitor retrieves data from an MRTG-style .log file.
     It checks the last value of the monitored data against the
     given thresholds and reports accordingly. In the example above,
     the log file describes a router interface. The log file is found
     in the mrtglog directory (see below).

     Here are the available arguments:

	column 		the column to which status messages are posted
			default: mrtg   (i.e. post to foo.bar.com.mrtg)

	prefix		the directory/file prefix of the logfile
			no default, required.

	bits		determines whether we are graphing bits/sec or
			bytes/sec (used with network interfaces)
			default: bytes

	maxlev		the maximum value the data may take on
			Only used to calculate the percentage reported
			and to calculate yellow/red levels.
			default: 10mb (bits=1), 1.25mB (bits=0)

	warnlev		These two options determine at which point the
	paniclev	monitor should report yellow and red statuses.
			(0 -> warnlev is green, warnlev-> paniclev is
			 yellow, >paniclev is red)
			You may specify a value to compare to the counter
			directly or a percentage of the maximum value.
			default: warnlev= 50%, paniclev= 75%
	
	units		This determines the units of the value. Aesthetic
			only.
			default: b/s (bits=1), Bytes/s (bits=0)

	intext		These are the labels for the data being graphed.
	outtext		For network interfaces, In and Out are the default
			and should be sufficient.


	resource variables:
	mrtglog		the directory with mrtg logs in it
	mrtgweb		the path for the url to link to the graph
	mrtgloghtml	1 to post html status, 0 to do text only

	So, it reads the file $mrtglog/$prefix.log, posts the status as
	a line of text, and optionally links to http://$mrtgweb/$prefix.html
	and http://$mrtgweb/$prefix.gif to show the graph and get more
	info.
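	A minimal Python sketch of how warnlev/paniclev (absolute or as a
	percentage of maxlev) and the file and URL names fit together,
	assuming the last value has already been read from the log file.
	The function names, the handling of a value exactly equal to
	warnlev, and the example paths (/var/mrtg, www.example.com/mrtg)
	are assumptions for illustration only.

```python
def resolve_level(level, maxlev):
    """A level is either absolute ("2000") or a percentage of maxlev ("50%")."""
    level = str(level).strip()
    if level.endswith("%"):
        return maxlev * float(level[:-1]) / 100.0
    return float(level)


def mrtg_status(value, maxlev, warnlev="50%", paniclev="75%"):
    """Return (colour, percent of maxlev) for the last value read from the log."""
    warn = resolve_level(warnlev, maxlev)
    panic = resolve_level(paniclev, maxlev)
    if value > panic:
        colour = "red"
    elif value >= warn:
        colour = "yellow"
    else:
        colour = "green"
    return colour, 100.0 * value / maxlev


def mrtg_paths(mrtglog, mrtgweb, prefix):
    """Log file plus the optional HTML page and graph URLs."""
    return ("%s/%s.log" % (mrtglog, prefix),
            "http://%s/%s.html" % (mrtgweb, prefix),
            "http://%s/%s.gif" % (mrtgweb, prefix))


print(mrtg_status(6000000, 10485760))   # about 57% of maxlev, i.e. yellow
print(mrtg_paths("/var/mrtg", "www.example.com/mrtg", "10.1.1.253/10.1.1.253.1"))
```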


   atmport
   etherport

     Usage:

	   fooswitch	switch=192.168.0.4 port=3A2 vpi=0 atmport
	   fooswitch	switch=192.168.0.4 port=5 etherport

     check operating status of specific ports.

     The atmport monitor is configured to work with Marconi ATM gear.  It
     has been tested with the ASX200BX, ASX1000, LE25, LE155, and ESX3000.
     It will use SNMP to query the switch specified for the status of the
     VPT given.  It checks that signalling is up.  The etherport module
     queries an ethernet switch for the link state of that port and
     reports accordingly.

	switch		the IP address of the host to SNMP query
	community	the SNMP community to use
			default: public
	port		(atmport) Symbolic name for port (Marconi
			convention) or the number used as the SNMP index
			(etherport) SNMP index number for the port.
	vpi		(atmport only) the Path number to monitor


     To find numeric port numbers for the etherport module, use snmpwalk:
	snmpwalk  10.100.101.61 public ifDescr

	interfaces.ifTable.ifEntry.ifDescr.2 = "FastEthernet0/1"
	interfaces.ifTable.ifEntry.ifDescr.3 = "FastEthernet0/2"

     For FastEthernet0/1 you would use port 2. Depending on the switch
     manufacturer, ifName or ifAlias may yield the desired result instead.
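     If you have many ports, the lookup can be scripted. This Python sketch
     assumes the quoted output format shown above (newer snmpwalk versions
     print e.g. "STRING:" instead, which this regex does not handle);
     port_index is a hypothetical helper name.

```python
import re

# Matches lines like: interfaces.ifTable.ifEntry.ifDescr.2 = "FastEthernet0/1"
IFDESCR = re.compile(r'ifDescr\.(\d+)\s*=\s*"([^"]*)"')


def port_index(snmpwalk_output, descr):
    """Return the SNMP index whose ifDescr equals descr, or None."""
    for line in snmpwalk_output.splitlines():
        m = IFDESCR.search(line)
        if m and m.group(2) == descr:
            return int(m.group(1))
    return None


sample = '''interfaces.ifTable.ifEntry.ifDescr.2 = "FastEthernet0/1"
interfaces.ifTable.ifEntry.ifDescr.3 = "FastEthernet0/2"'''
print(port_index(sample, "FastEthernet0/1"))   # 2
```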

   software

     Usage:

	 foo	type=asn9000 expected=ForeThought.*6\.0\.1 software
	 foo2	expected=Linux.*2\.0\.36 software

     Get the firmware release via SNMP and check it against
     a configured "expected" version. Report a yellow status
     if the version does not match.

     The "expected=" argument is a regular expression tested
     against the value of system.sysDescr.0.

     Some special "type=" values exist:

	type=asx	FORE ASX ATM Switches
			checks ASXSoftware.0 instead of sysDescr

	type=asn9000	FORE ASN9000 PowerHub
			checks ASNSoftware.0 instead of sysDescr

	type=es2810	FORE ES2810 Ethernet Switch


    realhttp

	Usage:

	     localhost	url=http://www.mybc.com realhttp
	     localhost	url=http://www.mybc.com/cgi-bin/mycgi item=cgi realhttp
	     localhost	url=https://secure.mybc.com:447/index.html item=secure realhttp
	     localhost	url=http://127.0.0.1/index.html greppos=success realhttp
	     localhost	url=http://127.0.0.1/t.html grepneg=test realhttp

	This module is used to execute an HTTP request for a specified
	URL. It uses the LWP libraries, so it can support HTTPS if the
	Crypt::SSLeay module has been installed on your system.
	
	Unless greppos or grepneg is given, a query is considered
	successful if LWP reports success; the contents of the
	returned page are not checked.

	This check will only work if you have the LWP::UserAgent perl
	module installed (see CPAN)

	Arguments:

	    url     The url to retrieve.
	    item    The item to report this test as.  Defaults to 'http'.
	    greppos A regular expression which should be found in the
	            contents returned from the server.
	    grepneg A regular expression which should *NOT* be found in
	            the contents returned from the server.
	    user    A username to supply in basic authentication
	    pass    A password to supply in basic authentication
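
	The following Python sketch shows the decision described above: the
	request must succeed, greppos must match and grepneg must not. It
	uses Python's urllib instead of Perl's LWP, reporting failures as
	"red" is an assumption, and the helper names are hypothetical.

```python
import base64
import http.client
import re
import urllib.request


def fetch(url, user=None, password=None, timeout=10):
    """Return (ok, body). Any error (bad URL, HTTP error, timeout) gives ok=False."""
    try:
        req = urllib.request.Request(url)
        if user is not None:
            # Basic authentication, like the user/pass arguments.
            token = base64.b64encode(("%s:%s" % (user, password or "")).encode()).decode()
            req.add_header("Authorization", "Basic " + token)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return True, resp.read().decode("utf-8", "replace")
    except (OSError, ValueError, http.client.HTTPException):
        return False, ""


def realhttp_status(ok, body, greppos=None, grepneg=None):
    """Colour for one probe: the fetch must succeed and the grep tests must pass."""
    if not ok:
        return "red"
    if greppos is not None and re.search(greppos, body) is None:
        return "red"
    if grepneg is not None and re.search(grepneg, body) is not None:
        return "red"
    return "green"
```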
	

  memory

       Report memory usage on NT and Linux systems.

       Syntax:
	    
	    localhost	memory


  oracle

      (see also: HOWTO)

      Syntax:

	    localhost	ORACLE_CONNECT=... ORACLE_HOME=... ORACLE_BASE=... ORA_USER=... ORA_PASS=... ORACLE_TNSADMIN=... ORACLE_NLS_LANG=... ORACLE_ORA_NLS33=... oracle

      Tests oracle database server.

      Arguments:

	  ORACLE_CONNECT:
	  Connect string of the form

	      [HOSTNAME]:[ORACLE_SID]:[BIG_SISTER_COLON_NAME] .

	  E.g. your database runs on "Asterix", the Oracle SID you want to
	  monitor is called TVD806, and you want the column to be called
	  "ORACLE DEMO". Then your ORACLE_CONNECT string looks like:

		"Asterix:TVD806:ORACLE DEMO".
	  
	  ORACLE_HOME:
	  Home directory of the Oracle product (where your Oracle version is installed)
	  
	  ORACLE_BASE:
	  Home of the oracle user.
	  
	  ORA_USER:
	  User to connect to the Oracle database (default with our prepared PL/SQL: check_db)

	  ORA_PASS:
	  Password for the Oracle user ORA_USER (default with our prepared PL/SQL: check_db)
	  
	  ORACLE_TNSADMIN:
	  Path where oracle will find the tnsnames.ora
	  
	  optional fields: ORACLE_NLS_LANG, ORACLE_ORA_NLS33


	  E.g.:
	  localhost       ORACLE_CONNECT="ds1skeys:DKMS2:oAPP2" ORACLE_HOME=/u00/app/oracle/product/8.1.6 ORACLE_BASE=/u00/app/oracle ORACLE_NLS_LANG=american_america.WE8ISO8859P1 ORACLE_ORA_NLS33=/u00/app/oracle/product/8.1.6/ocommon/nls/admin/data ORA_USER=TSMW ORA_PASS=TSMW ORACLE_TNSADMIN=/u00/app/oracle/network oracle
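
	  The ORACLE_CONNECT value is simply three colon-separated fields.
	  This Python sketch splits it; stripping surrounding quotes and
	  keeping any further colons in the column name are assumptions, and
	  parse_oracle_connect is a hypothetical name.

```python
def parse_oracle_connect(value):
    """Split "HOSTNAME:ORACLE_SID:COLUMN NAME" into its three parts."""
    # Split at most twice, so any further ':' stays in the column name.
    parts = value.strip().strip('"').split(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected HOSTNAME:ORACLE_SID:NAME, got %r" % value)
    return tuple(parts)


print(parse_oracle_connect('"ds1skeys:DKMS2:oAPP2"'))   # ('ds1skeys', 'DKMS2', 'oAPP2')
```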


  tripwire

      Syntax:
	  
	  localhost		tripwire
	  localhost	timeout=10 tripwire

      Check system consistency via tripwire.
      (see also HOWTO)


  ldap
  ldap_mozilla

      Description
 
      This module is used to query an LDAP server and attempt to
      retrieve an entry. The "ldap" check uses the Net::LDAP module,
      which is available from CPAN, while "ldap_mozilla" uses
      Mozilla::LDAP, available from Mozilla.org.
      
      Parameters
      
      dn    DN to check
      pass  UserPassword
      port  ldap tcp port (default: 389)
      item	The item to report (default: ldap)
      time	The number of seconds to wait for a response before
      		timing out.  (default: 5)
      
      Manuel de Vega Barreiro
      Madrid,Spain
      mbarreiro@red.madritel.es
      
      Based on Kevin O'Donnell's (kevin_odonnell@telus.net) Radius monitor

      Example:

	myhost		dn="cn=whoever,o=wherever,c=us" pass="test" ldap


  command

      Description

      This module is used to execute an arbitrary command. By default
      the command must return exit status 0 if the test is OK and a
      non-zero status if it fails.

      In order to allow interfacing to commands not explicitly
      written for use with Big Sister the default behaviour
      regarding exit codes and other error conditions is
      configurable (see "results" parameter below).

      All text the command writes to standard output/error is sent
      to bsdisplay as the status message.

      Note that commands are executed with "/" as the base directory.
      uxmon-net examples:
      
	  localhost exec="/opt/bs/otros/test.pl test1" time=10 item=test command

	  localhost exec="ntpdate -q localhost" results="0:green:ntp OK;stratum 16:yellow:ntp running but not synchronized;timeout:red:ntp not running;*:red:ntp FAILURE" command
      
      Parameters
      
            exec	command to execute
            item	The item to report (default: command)
            time	The number of seconds to wait for a response before
	      		timing out.  (default: 5)
	    results	patterns and resulting status to report when a pattern
			matches. Patterns are separated by semicolons and
			the results string looks like

				pattern1:color1:text1;pattern2:color2:....

			where pattern may be one of

				<number>: matches the commands exit code
				timeout : matches if command timed out
				failed  : matches if uxmon was not able
					  to even start the command
				<any regex>: matched against the stdout/
					  stderr output of the command

			The first matching pattern decides which rule
			will apply.
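
			To illustrate "the first matching pattern decides",
			here is a Python sketch of the "results" evaluation.
			Treating a purely numeric pattern as an exit code
			follows the list above; treating "*" as a catch-all
			is inferred from the ntp example and is an assumption,
			as is the naive splitting (a regex containing ":" or
			";" would break it). Python's re module stands in for
			Perl regexes, and the helper names are hypothetical.

```python
import re


def parse_results(spec):
    """Split 'pattern:color:text;...' into (pattern, color, text) tuples."""
    rules = []
    for rule in spec.split(";"):
        pattern, color, text = rule.split(":", 2)
        rules.append((pattern, color, text))
    return rules


def apply_results(rules, exit_code=None, output="", timed_out=False, failed=False):
    """Return (color, text) of the first matching rule, or None."""
    for pattern, color, text in rules:
        if pattern == "timeout":
            hit = timed_out
        elif pattern == "failed":
            hit = failed
        elif pattern.isdigit():
            hit = exit_code is not None and exit_code == int(pattern)
        elif pattern == "*":
            # Catch-all, as used by the ntp example above (assumed semantics).
            hit = True
        else:
            hit = re.search(pattern, output) is not None
        if hit:
            return color, text
    return None


ntp = parse_results("0:green:ntp OK;stratum 16:yellow:ntp running but not synchronized;"
                    "timeout:red:ntp not running;*:red:ntp FAILURE")
print(apply_results(ntp, timed_out=True))   # ('red', 'ntp not running')
```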

  ups
      
      Query an uninterruptible power supply supporting SNMP queries
      via the UPS MIB.

      Syntax:

	  myups		ups

      This test is very limited. It reports green unless

	  - the number of "Line Bads" increased during the last hour
	    => yellow
	  - the reported power source is something different from "normal"
	    => red

      In addition to this it collects some useful performance data
      like the Battery voltage (indicating battery quality), the
      output power and output load. The standard graphtemplates file
      includes graph definitions for these values.

  qmqueue

      monitor qmail's mail queue

      Syntax:

	  localhost	queue_dir=/var/qmail/queue/ queue_yellow=2000 queue_red=3500 method=direct qmqueue

	  localhost	queue_dir=/var/qmail/queue/ warnlevl=2000 paniclevl=2000 warnlevr=2000 paniclevr=2000 socket=/tmp/.qmail-qstat warnlevt=2000 paniclevt=2000 method=socket qmqueue

      Checks the number of messages in qmail's queue, raises
      a warning if message number passes the queue_yellow limit,
      an error if it passes the queue_red limit. Queue_yellow
      defaults to 2000, queue_red to 3500, queue_dir to
      "/var/qmail/queue".

      The monitor will check the remote, local, and todo
      directories against this limit.  The warnlevl and
      paniclevl directives will allow you to specify the
      thresholds for the local queue only.  The warnlevr
      and paniclevr directives work on the remote queue,
      and the warnlevt and paniclevt directives work on
      the todo queue.  If queue_yellow or queue_red is
      specified, then all warn levels default to
      queue_yellow and all panic levels default to
      queue_red for compatibility.
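
      The precedence of the qmqueue thresholds can be summarised in
      Python as follows. This is a sketch, assuming that queue-specific
      levels override queue_yellow/queue_red and that the documented
      defaults (2000/3500) apply when nothing is given; the function
      names are hypothetical.

```python
def queue_levels(args):
    """Resolve (warn, panic) thresholds per queue from qmqueue arguments."""
    yellow = int(args.get("queue_yellow", 2000))
    red = int(args.get("queue_red", 3500))
    levels = {}
    for queue, suffix in (("local", "l"), ("remote", "r"), ("todo", "t")):
        # warnlevl/paniclevl, warnlevr/paniclevr, warnlevt/paniclevt
        warn = int(args.get("warnlev" + suffix, yellow))
        panic = int(args.get("paniclev" + suffix, red))
        levels[queue] = (warn, panic)
    return levels


def queue_status(count, warn, panic):
    """Colour for one queue holding `count` messages."""
    if count > panic:
        return "red"
    if count > warn:
        return "yellow"
    return "green"


levels = queue_levels({"queue_yellow": "1000", "warnlevr": "500"})
print(levels["remote"], levels["local"])   # (500, 3500) (1000, 3500)
```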

      The method directive can be either direct or socket.
      Direct traverses /var/qmail/queue and counts files.
      Socket queries a UNIX domain socket as provided by
      Bruce Guenter's RPM, using the ucspi-unix package.
      The default socket is /tmp/.qmail-qstat.

 

file etc/syslog, etc/eventlog, etc/logfile, etc/OV etc.



Note: There are quite a lot of things you can do with the log file
      monitor. Unfortunately this makes its config files rather
      complex. It's always a good idea to have a look at some
      samples, e.g. the etc/OV or etc/syslog file.

Various monitors perform about the same function: They watch log
files. These monitors include syslog, eventlog, logfile, OV. Each
of them has its own config file usually called etc/syslog,
etc/eventlog and so on. The format of all of these files is identical.
There are only very few differences in the semantics.

Each file consists of one section per system log file that should be
watched. Empty lines and lines starting with '#' are ignored. Each
section starts with a line containing the file name of a log file
that should be watched followed by a ':', e.g.

	/var/adm/messages:

If you would like to use the same config file on multiple systems
and the log file has a different name on some of them, you might find
the following solution handy. You can list multiple comma-separated
file names on the same line. The monitor will then use the first
existing file out of this list:

	/var/adm/messages,/var/log/messages:

For the eventlog monitor the semantics are different. The 'file' here
must be the name of a log, e.g. "System" or "Security".

The lines following the file name are of the format (fields separated
by one or more tabs):

	pattern		status		minutes		text	topic

where 'pattern' is a regular expression (Perl style!) that is matched
against each line appearing in the log file, 'status' is the status
that should be reported if a matching line is found (e.g. "yellow" or
"red"), 'minutes' is the time in minutes the status should be reported
(remember that "a line appearing in the log file" is an event that
passes by very quickly, so we extend its lifetime a little :-)), 'text'
is the text that should appear in the status message and 'topic' is
the service type part of the status name (it defaults to "msgs", thus
'topic' is optional). Since 0.98 'topic' may also include a host name
(so that it optionally looks like 'host.item').

As in Perl, patterns may include sections in parentheses "(...)".
These can be referenced in text with $1 through $9. E.g. the following
rule:

	/var/log/syslog:
	to=([^,]+)	yellow		10	someone sent mail to $1	  funny

will report a status of

	machine.funny:	yellow someone sent mail to blabla

for 10 minutes if someone sends an email message to user 'blabla'.
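
The following Python sketch parses one rule line and applies it to a
log line, including the $1 through $9 substitution. It assumes Python's
re module behaves like Perl for such simple patterns; parse_rule and
apply_rule are hypothetical names, and the sample log line is invented.

```python
import re


def parse_rule(line):
    """Split 'pattern<TAB>status<TAB>minutes<TAB>text[<TAB>topic]' (one or more tabs)."""
    fields = [f.strip() for f in re.split(r"\t+", line.strip())]
    pattern, status, minutes, text = fields[:4]
    topic = fields[4] if len(fields) > 4 else "msgs"
    return pattern, status, minutes, text, topic


def apply_rule(rule, host, logline):
    """Return (status name, status, minutes, message) if logline matches, else None."""
    pattern, status, minutes, text, topic = rule
    m = re.search(pattern, logline)
    if m is None:
        return None

    def group(ref):
        # $1..$9 refer to the parenthesised groups; unknown or unset groups become "".
        n = int(ref.group(1))
        return (m.group(n) or "") if n <= m.re.groups else ""

    message = re.sub(r"\$([1-9])", group, text)
    # Since 0.98 the topic may already contain a host ("host.item").
    name = topic if "." in topic else "%s.%s" % (host, topic)
    return name, status, int(minutes), message


rule = parse_rule("to=([^,]+)\tyellow\t\t10\tsomeone sent mail to $1\t  funny")
print(apply_rule(rule, "machine", "sendmail[42]: to=blabla, delay=00:00:01"))
# ('machine.funny', 'yellow', 10, 'someone sent mail to blabla')
```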

Apart from a color, 'status' may be the word 'clear'. In this case the
log file monitor will not log the corresponding message but will
rather find any tracked message matching the text and remove it.
E.g.:

	host (.*) is down	red	20	host $1 down
	host (.*) is up		clear	0	host $1 down

will make the monitor report "host ... down" for 20 minutes after
seeing "host ... is down" in the watched log file. If the message
"host ... is up" is detected within these 20 minutes, the message is
cleared from memory immediately instead of waiting for the whole 20
minutes. In 'clear' patterns the message text is treated as a regular
expression if it is preceded by a '+' sign,
e.g.

	host (.*) was disposed	clear	0	+host.*$1

will clear any message containing host.*... if the message "host ...
was disposed" is detected.
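
A Python sketch of how reported and cleared messages interact, with time
measured in minutes. Matching non-'+' clear texts by exact equality is an
assumption (the text above only says "matching"), and MessageTracker is a
hypothetical class, not Big Sister code.

```python
import re


class MessageTracker:
    """Hypothetical model of the messages a log file monitor keeps alive."""

    def __init__(self):
        self.messages = []  # (expires_at_minute, status, text)

    def report(self, now, status, minutes, text):
        """A matching line was seen at minute `now`; keep it for `minutes`."""
        self.messages.append((now + minutes, status, text))

    def clear(self, text):
        """'clear' rule: drop messages equal to text, or matching it if it starts with '+'."""
        if text.startswith("+"):
            regex = re.compile(text[1:])
            self.messages = [m for m in self.messages if not regex.search(m[2])]
        else:
            self.messages = [m for m in self.messages if m[2] != text]

    def active(self, now):
        """Messages that have not yet expired at minute `now`."""
        self.messages = [m for m in self.messages if m[0] > now]
        return [(status, text) for _, status, text in self.messages]


t = MessageTracker()
t.report(0, "red", 20, "host gw1 down")    # "host gw1 is down" seen at minute 0
t.clear("host gw1 down")                   # "host gw1 is up" seen later
print(t.active(5))                         # []
```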

If pattern is 'default' the line is treated as the default status that
should be reported if no other event is pending. E.g.:

	default		green		hostname	everything looks fine

will report "green everything looks fine" to the Display Server for
the host 'hostname' if there's nothing else to report. Usually you
will use "*" for hostname, meaning: set the default for any host
known to the log file monitor.

This can be used in a less obvious way too. Consider the following
example:

	/var/log/dhcp:
	default		yellow		0	don't know what's going on		dhcp
	DHCPOFFER	green		30	we are still giving out addresses	dhcp
	DHCPACK		green		30	we are still ack'ing leases		dhcp
	no free leases	red		10	oops - we are out of leases		dhcp

This will report 'yellow don't know what's going on' if no log entries are
written, "green we are still giving out addresses" if an address was given
out during the last 30 minutes ('DHCPOFFER'), "green we are still
ack'ing leases" if a lease was acknowledged during the last 30 minutes
and "red oops - we are out of leases" if "no free leases" was logged
during the last 10 minutes.

Since you do not have much influence over what is written to log files,
host names appearing there may sometimes differ from the names you
want them to appear under in Big Sister. For this reason the pattern

	node logname=truename

has been introduced. Some monitors will automatically add 'node'
entries during runtime. E.g. the OV monitor will add one each time
it detects a "System name changed" message.

One log file can influence more than one column, e.g. you will find
messages concerning disks, CPU, etc. in /var/log/messages. By using
the 'topic' field (as seen above) you can distribute log file entries
to multiple columns. Note that you also need multiple "default" lines
in this case, e.g.

	default		green	0 	no errors	disk
	default		green	0 	no errors	msgs
	full		red	0	fs full		disk
	error		red	0	some error	msgs

Also be careful about re-using already reported columns. Multiple
checks (e.g. 'syslog' and 'diskfree') can only report to the same
column if they are running within the same uxmon instance!








file adm/display_map.cfg (%image)



Note: The file may be called as you like - adm/display_map.cfg is the
default.

This file describes how to build a graphical status display when using the
%image statement in bb-display.cfg. It consists of a series of one-line
statements. Empty lines and lines starting with "#" are ignored. All file
names are paths relative to the Big Sister root directory.

Note that the displaymap feature only works if you have a working
GD module installed. Big Sister <0.93 only works with old versions
of GD (the ones supporting the GIF image file format), while 0.93
works with both old and new (PNG-supporting) modules.

Known statements are:

	template filename

		read background graphic from file filename (must be GIF
		or PNG format depending on your version of the GD module).
		This is mandatory! No image without "template" ...

	name coord coordname

		remember the display position 'coord' under the name
		'coordname'. After this statement whenever a display
		position is expected 'coordname' can be used instead
		(e.g. in 'at' or 'line' statements).

		Example: name 100,150 NewYork
			 name 70,200 Dallas
			 line NewYork Dallas DALLASWAN
			 at NewYork NEW_YORK
			 at Dallas DALLAS

	red filename
	yellow filename
	purple filename
	green filename
	blue filename
	clear filename

		the graphic to be inserted for red/yellow/purple/green/
		blue/clear status (GIF or PNG).

	at coord group

		display status for group "group" (groups as configured in
		bb-display.cfg) at position "coord" (0,0 is at upper left
		corner). Since version 0.98b4 you can also directly address
		host.check or group.check pairs in place of groups.

		Example: at 100,150 NEW_YORK
			 at 30,80 WASHINGTON
			 at 50,50 mymachine.conn
			 at 15,10 WASHINGTON.disk

	line coord1 coord2 group

		draws a line from display position coord1 to display
		position coord2 with the color of the status of the
		group group

	link coord1 href coord2/image

		includes a hyperlink pointing to "href" in the image
		map at position coord1. If the 3rd argument is the
		name of an image file, the image is read and printed
		at coord1. Otherwise the 3rd argument is expected to
		be the size of the rectangular region associated with
		href.

		Example: link 100,150 http://wherever.com/bla adm/bla.png

	dump filename

		write the image to file filename. This must be the last
		statement. The file must be in the "www" directory,
		otherwise browsers will not find it.








file adm/bb_event_generator.cfg



Bb_event_generator.cfg is the configuration file for the alarm generator
(bb_event_generator). The file tells on what conditions alarms should
be sent to whom and with what priority. It consists of one-line-rules.
Like in other files empty lines and lines starting with '#' are ignored.
'\' at the end of a line will tell bb_event_generator that the statement
continues on the following line.

Each rule is composed of one or more patterns separated by ';' and a list
of variable settings. The variable settings are separated by spaces or
tabs.  So a rule looks like:

    pattern1;...;patternN	var1=text1	var2=text2 ... varN=textN

where pattern1 thru patternN are patterns, var1...varN are variable names
and text1...textN are variable values.
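
As an illustration of this format, the following Python sketch joins '\'
continuation lines and splits a rule into its patterns and variable
settings. It assumes that the pattern list ends at the first blank outside
"{...}" conditions (described below), that patterns never contain ';'
inside a condition, and that quoted values follow shell-like quoting; the
helper names are hypothetical.

```python
import shlex


def join_continuations(lines):
    """Join lines ending in a backslash; drop empty lines and '#' comments."""
    rules, buf = [], ""
    for raw in lines:
        raw = raw.rstrip("\n")
        if raw.endswith("\\"):
            buf += raw[:-1]
            continue
        buf += raw
        if buf.strip() and not buf.lstrip().startswith("#"):
            rules.append(buf)
        buf = ""
    return rules


def parse_event_rule(line):
    """Split a rule into (list of patterns, dict of variable settings)."""
    line = line.strip()
    depth, end = 0, len(line)
    # The pattern part ends at the first blank outside "{...}" conditions.
    for i, ch in enumerate(line):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch.isspace() and depth == 0:
            end = i
            break
    patterns = line[:end].split(";")
    variables = {}
    for token in shlex.split(line[end:]):
        name, _, value = token.partition("=")
        variables[name] = value
    return patterns, variables


patterns, settings = parse_event_rule('*.smtp\tdelay=5 check="$host.conn"')
print(patterns, settings)   # ['*.smtp'] {'delay': '5', 'check': '$host.conn'}
```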

Each pattern is composed of a host part and a check part:

    host.check

where 'host' may be one out of the following:

    - a host name as reported to Big Sister status collector
	    e.g. myserver, www, ...
    - an IP host or network address in square brackets "[" "]"
	    e.g. [139.79.159.1], [192.168.50]
      NOTE: this will only work for host names that can be
	    resolved into an IP address
      NOTE: this will not work if the DNS option in bb-display.cfg
            (see above) is switched off, which is currently the
	    default!
    - a group as known to the Big Sister status collector with
      prefix '@'
	    e.g. @ALL, @ROUTER
    - an asterisk "*" matching any host

and 'check' may be either an asterisk "*" matching any check or
a check as displayed in the columns of the status display.
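
A Python sketch of the host matching rules above. The groups mapping and
the resolve function (for which you might pass something based on
socket.gethostbyname) are assumed inputs, comparing network addresses
octet by octet is an assumption about how "[192.168.50]" is meant, and
{condition} parts are not handled here. Both function names are
hypothetical.

```python
def host_matches(spec, host, groups, resolve):
    """Does the host part of a pattern match the reporting host?"""
    if spec == "*":
        return True
    if spec.startswith("@"):
        return host in groups.get(spec[1:], ())
    if spec.startswith("[") and spec.endswith("]"):
        ip = resolve(host)
        if ip is None:
            return False  # host name cannot be resolved (or DNS is off)
        wanted = spec[1:-1].split(".")
        # Compare whole octets, so [192.168.5] does not match 192.168.50.x
        return ip.split(".")[:len(wanted)] == wanted
    return spec == host


def pattern_matches(pattern, host, check, groups, resolve):
    """Match 'host.check' (without a {condition} part) against a status."""
    # Check names contain no dots, so split at the last one.
    host_spec, _, check_spec = pattern.rpartition(".")
    return (host_matches(host_spec, host, groups, resolve)
            and check_spec in ("*", check))


groups = {"EAST": {"web1", "db1"}}
resolve = {"web1": "192.168.50.7", "gw": "10.0.0.1"}.get
print(pattern_matches("[192.168.50].*", "web1", "smtp", groups, resolve))   # True
```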

Since version 0.29, the pattern may be extended by an additional
condition. The syntax is

    host.check{condition}

where condition is a boolean expression, e.g.

    *.*{$mail == "test"}	mail="nobody"

Special functions are 'daytime' and 'weekday'; they can be used
like this:

    *.*{daytime 22:00-06:00 or weekday Sat,Sun}	postpone=30
    *.*{daytime 22:00-06:00}	postpone_to=06:00
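
How a daytime range like 22:00-06:00 wraps around midnight can be
sketched in Python as below. Treating the start as inclusive and the end
as exclusive is an assumption, and the helper names are hypothetical.

```python
from datetime import time


def parse_hhmm(text):
    hour, minute = map(int, text.split(":"))
    return time(hour, minute)


def in_daytime(now, spec):
    """True if time-of-day `now` lies in "HH:MM-HH:MM"; ranges may wrap midnight."""
    start_text, end_text = spec.split("-")
    start, end = parse_hhmm(start_text), parse_hhmm(end_text)
    if start <= end:
        return start <= now < end
    # Wrapping range, e.g. 22:00-06:00: late evening or early morning.
    return now >= start or now < end


def in_weekday(day, spec):
    """`day` is an abbreviation such as "Sat"; spec is e.g. "Sat,Sun"."""
    return day in spec.split(",")


# {daytime 22:00-06:00 or weekday Sat,Sun} at noon on a Saturday
print(in_daytime(time(12, 0), "22:00-06:00") or in_weekday("Sat", "Sat,Sun"))   # True
```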

Version 0.98 introduced special rules that allow us to modify the way
alarms are delivered. E.g. the rules

    PAGER{$mail eq "someaddress@somehost"} pager=myscript mail=someaddress
    PAGER{$pager eq "sendmail" and $mail eq "test"} mail=addr1,addr2,addr3

will re-direct alarms sent to someaddress@somehost to the address
someaddress and invoke myscript for sending the alarm (first rule).
If the pager equals sendmail and the target address is test the
alarm is redirected to the three addresses addr1,addr2 and addr3.

The PAGER rules are applied once per target address, thus the above
rules would also apply if the page originated e.g. from a

    *.* mail=someaddress@somehost,test pager=sendmail

(this would cause the event generator to apply the PAGER rules
twice: once for "someaddress@somehost", once for "test")

Note also that target addresses are split into "pager" and "mail"
before PAGER rules are processed. An address like

	mail=sendmail:test@somewhat.strange

will appear to the PAGER rules as

	mail=test@somewhat.strange pager=sendmail
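
The pager/address split can be sketched in Python like this; each
resulting pair corresponds to one target address, which is what the PAGER
rules see. Splitting at the first ':' and falling back to the "pager"
value (sendmail here) are assumptions, and split_targets is a hypothetical
name.

```python
def split_targets(mail, pager="sendmail"):
    """Split a mail= value into (pager, address) pairs, one per target."""
    targets = []
    for item in mail.split(","):
        item = item.strip()
        if not item:
            continue
        prefix, sep, address = item.partition(":")
        if sep:
            targets.append((prefix, address))   # explicit "pager:address"
        else:
            targets.append((pager, item))       # fall back to the pager variable
    return targets


print(split_targets("myscript:test,sendmail:me@somewhere.com"))
# [('myscript', 'test'), ('sendmail', 'me@somewhere.com')]
```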

Whenever a status change is detected, bb_event_generator goes
through the config file and looks for matching patterns. Each
variable associated with a matching pattern is then set as
listed. If multiple patterns match, the associated variables
are set in order.

Interpreted variables are:

    - mail	mail addresses where to send alarm (comma separated).
		Instead of a plain mail address you can also
		specify a pager (see "pager" variable) explicitly,
		e.g.

			mail=myscript:test,sendmail:me@somewhere.com

		will send alarms to both "test" via the myscript
		command and to "me@somewhere.com" via sendmail

    - upmail	mail addresses used when sending "up" messages -- this
		defaults to the same as "mail"

    - prio	priority level (0..100)

    - repeat	if set, bb_event_generator will send the alarm again every
		x minutes until the alarm condition has cleared

    - repeatprio	the priority level for repeated alarms (see "repeat")

    - keep	the duration in minutes for which the event generator
		does not clear the alarm after the alarm condition
		reports that everything is OK again

    - norepeat	the duration in minutes no alarm can be sent for
		the same condition

    - delay	the duration in minutes between when the alarm
		is raised and when it is sent to the user

    - check	a boolean expression that is checked during the
		'delay' time and forces the alarm to be aborted
		if the condition is not met once during this time

    - down	(one out of "green", "purple", "yellow", "red",
		"never") tells the event generator which status
		should be interpreted as "down". E.g. "yellow"
		means that the corresponding service is considered
		down if a status of "yellow" or worse ("red") is
		detected.

    - up	(like down) tells the event generator which
		status should be considered as "up". E.g.
		down=yellow up=green means that a service
		is considered as down from the time when it
		changes to yellow or red to the time when it
		goes to "green" again (but not if it's going
		to "purple"!)

    - maxmsg	a numeric value which is the maximum size of
		a message sent in the subject line of the alarm
		mail (e.g. if you send it through a pager gateway
		...)

    - postpone	if set alarms won't be sent for additional x minutes
		and rather stay in the queue. If during the postpone
		time period the alarm condition is cleared the alarm
		is silently thrown away. Postpone is meant to be used
		e.g. during night when you don't want to get an alarm.

    - postpone_to	same as postpone but the value is expected to be a
		daytime rather than an interval (e.g. "06:00").

    - pager	use an alternative pager program (by default Big Sister
		tries to use the system's 'sendmail' program; if mailhost
		is set, Big Sister's 'smtpmail' is the default)

    - skin	use the skin specified here for alarm messages

    - trap	if set bb_event_generator will raise an SNMP event
		for any alarm/acknowledgement. The contents of trap
		is a trap destination composed of a community and
		a host of the form community@host. If the community
		is missing "public" is assumed. See also SNMP_AGENT.

Examples:

Usually you will put a general rule with a pattern matching any
host/check and the default variable values as your first rule, e.g.:

    *.*	mail=alarm prio=50 norepeat=20 down=yellow up=green maxmsg=60

if you do not want to get an alarm about e.g. smtp being down when you
already know that the connection to the host is down then you could
use the following rule for instance:

    *.smtp	delay=5 check="$host.conn"

(semantics: if the "conn" goes down within 5 minutes after smtp down is
	    detected then throw away the smtp alarm, otherwise send it
	    after 5 minutes)

If your very important machines are in a group called "IMPORTANT" then
you may wish to do something like:

    @IMPORTANT.*	prio=100 repeat=30 repeatprio=60

(semantics: if a service of a machine in the group IMPORTANT goes down
	    then send an alarm with priority 100 and send a reminder
	    with priority 60 each 30 minutes ("yell for help"))

If the machines in a group EAST are all located in a network connected
to router "router-east" then you may get plenty of alarms when
"router-east" goes down since any machine behind is unreachable. You can
avoid this by e.g.:

    @EAST.conn	check=router-east.conn delay=5
    router-east.conn	check="1" delay=0

(semantics: if a host is in group EAST and the connection to it goes
	    down wait for five minutes and if within these five minutes
	    the connection to router-east is lost too then do not
	    send an alarm for this host. If the host is the router
	    itself send an alarm immediately)
or

    @EAST.*		router=router-east
    *.conn		check="($router.conn) or not $router" delay=5
    router-east.conn	check="1" delay=0

(semantics: if a host is in group EAST set the variable "router" to
	    "router-east". If the connection to any host is going
	    down then wait for five minutes and check if either there
	    is no router configured for this machine or the connection
	    to the router goes down as well. Discard the alarm if
	    the router goes down, except, of course, if the machine
	    is the router itself)

NOTE: you cannot use variables in patterns (yet), so e.g. the example above
      cannot be written as:

    @EAST.*		router=router-east
    $router.conn	check=1 delay=0


Postpone is used during times when system failures are less important,
e.g. during night. You can postpone alarms for a time interval:

    *.*{daytime 22:00-06:00}	postpone=60

This will tell the event generator to keep a raising alarm in the postpone
queue for 1h before sending an alarm mail. If during this time the alarm
condition clears, no alarm is sent at all. If you never want to be woken
up by alarms, then

    *.*{daytime 22:00-06:00}	postpone_to=06:00

might be what you want (semantics: when an alarm is detected during the night,
send it at 06:00)
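
The difference between postpone and postpone_to amounts to the following
date arithmetic, sketched in Python. The helper names are hypothetical,
and rolling a time that has already passed today over to the next day is
an assumption that matches the "send it at 06:00" semantics above.

```python
from datetime import datetime, timedelta


def postpone_for(now, minutes):
    """postpone=N: keep the alarm queued for N more minutes."""
    return now + timedelta(minutes=minutes)


def postpone_until(now, hhmm):
    """postpone_to=HH:MM: next moment at or after `now` with that time of day."""
    hour, minute = map(int, hhmm.split(":"))
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target < now:
        target += timedelta(days=1)  # already past today, so use tomorrow
    return target


print(postpone_until(datetime(2024, 1, 1, 23, 15), "06:00"))   # 2024-01-02 06:00:00
```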






file adm/notify.cfg



Notify was obsoleted in Big Sister 0.98 though it is still included in the
distribution. Please do not use notify any more. Instead make use of the
PAGER rules in adm/bb_event_generator.cfg (see above and see also the
HOWTO section about migrating from notify.cfg).