How to negate tests
How to migrate existing notify.cfg configurations
How to include host-specific, manually maintained text in the status display
How to get rid of unwanted status lights ("purples")
How to set up a Tripwire monitor
How to install and use the Oracle Big Sister module
How to set the path for Big Sister
How to set up performance data collection
How to set up SNMP trapping
How to tunnel Big Sister connections through ssh
How to use check frequencies other than the default
How to re-use your Big Brother monitoring scripts
How to centrally archive your history log files
How to make bbd rotate the display.history log file
How to set up an image map
How to interoperate with Big Brother
How to debug Big Sister
How to add your own "checks"
How to reduce network traffic in a large network
How to set up a Relay Status Collector
How to set up redundant Status Collectors

How to negate tests

Sometimes it is useful to negate a test, e.g. to treat it as a problem
if a service *is* actually running. Since 0.98b4 uxmon accepts a
syntax like:
myhost service=smtp !tcp
meaning: test whether myhost responds to SMTP requests and report red
if it does.
Note: This only works for new-style sensors. If you get log
messages like "check !tcp failed to start up", you probably need to
read the NEWAGENT document.
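The effect of the "!" prefix can be pictured with a small Python sketch. The helpers parse_uxmon_line and status_color are hypothetical and written only for illustration: they assume whitespace-separated uxmon-net entries with optional shell-style quoting and trailing "#" comments, and they do not reproduce uxmon's actual parser.

```python
import shlex


def parse_uxmon_line(line):
    """Split a uxmon-net style line into host, key=value args and bare words.

    Hypothetical helper: bare words are checks or reporters; a leading
    "!" marks a negated check. Not uxmon's real parser.
    """
    tokens = shlex.split(line, comments=True)
    host, args, checks = tokens[0], {}, []
    for token in tokens[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            args[key] = value
        else:
            checks.append((token.lstrip("!"), token.startswith("!")))
    return host, args, checks


def status_color(service_responds, negated):
    """Green if the outcome is the desired one, red otherwise."""
    ok = (not service_responds) if negated else service_responds
    return "green" if ok else "red"


host, args, checks = parse_uxmon_line("myhost service=smtp !tcp")
print(host, args, checks)  # myhost {'service': 'smtp'} [('tcp', True)]
print(status_color(service_responds=True, negated=True))  # red
```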

How to migrate existing notify.cfg configurations

Since version 0.98 the notify pager (alarming) and therefore notify.cfg
are no longer supported. Instead, the functionality of notify has been
added to the alarm generator and its configuration file
bb_event_generator.cfg. A short description of the new configuration
options can be found in the CONFIG documentation in the section about
bb_event_generator.cfg. Look out for PAGER rules.
Migrating from notify/notify.cfg to the alarm generator's built-in
notifier support is not too difficult. There are mainly two
new mechanisms in bb_event_generator.cfg:
1. When specifying mail addresses via the "mail=" setting you can
optionally specify the pager program as well as the mail
address, e.g.
mail=sendmail:me@somewhere.com,myscript:test
will make the alarm generator send alarms via the sendmail program
to me@somewhere.com and via the program myscript to test.
2. Every time an alarm is sent and for each alarm recipient the
alarm generator goes through a set of PAGER rules (of course
only if they exist :-)). A PAGER rule may specify additional
routing information for the message, e.g.
PAGER{$mail eq "test"} pager=myscript mail=012345
will redirect an alarm sent to "test" via the myscript program
to the address 012345 (whatever "myscript" uses this
address for).
Note that the PAGER rules are applied to each individual address
in "mail=", and that before the PAGER rules are applied each
address is split into its pager part (before the ":") and its
address part (after the ":"). So e.g.
mail=sendmail:abc,myscript:test
would make the alarm generator generate two alarm messages,
one matching PAGER{$mail eq "abc" and $pager eq "sendmail"},
the other matching PAGER{$mail eq "test" and $pager eq "myscript"}.
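The splitting and rule matching described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the alarm generator's code: split_recipients and apply_pager_rules are hypothetical names, entries without a colon get pager None (the real default is not described here), and rule conditions are tested against the original recipient.

```python
def split_recipients(mail_setting):
    """Split a "mail=" value into pager/address pairs (hypothetical helper)."""
    recipients = []
    for entry in mail_setting.split(","):
        pager, sep, address = entry.partition(":")
        if not sep:
            # No pager given: left unset in this sketch (assumption).
            pager, address = None, entry
        recipients.append({"pager": pager, "mail": address})
    return recipients


def apply_pager_rules(recipient, rules):
    """Apply every matching (condition, overrides) rule in order.

    Conditions see the original recipient, an assumption of this sketch.
    """
    routed = dict(recipient)
    for condition, overrides in rules:
        if condition(recipient):
            routed.update(overrides)
    return routed


# Python stand-in for: PAGER{$mail eq "test"} pager=myscript mail=012345
rules = [(lambda r: r["mail"] == "test", {"pager": "myscript", "mail": "012345"})]

for r in split_recipients("sendmail:abc,myscript:test"):
    print(apply_pager_rules(r, rules))
# {'pager': 'sendmail', 'mail': 'abc'}
# {'pager': 'myscript', 'mail': '012345'}
```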

How to include host-specific, manually maintained text in the status display

[Simon Clift]
I needed to include a piece of HTML, specifically with custom help
links, for each log page (where the monitor light details are given).
After some contemplation of the code I did the following.
1. I created a skin directory, for example, skins/logspecial
2. I copied the skins/default/loghtml_text.proto file to the
new skins/logspecial directory.
3. I edited skins/logspecial/loghtml_text.proto and put text
in it like this:
<hr />
<h3>Special Help for @HOST@.@ITEM@</h3>
@(@HOST@_@ITEM@_help.inc)@
4. I created my specific help files. For FooMach.BarTest, for
example, I have the file
skins/logspecial/FooMach_BarTest_help.inc
which contains my specific material.
5. I restarted Big Sister.
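To make the template mechanics in step 3 concrete, here is a rough Python model of the substitution: first @HOST@ and @ITEM@ are replaced, then @(file)@ is replaced by the contents of that file in the skin directory. render_log_proto is a hypothetical helper, not Big Sister's skin engine, and treating a missing include file as empty text is an assumption.

```python
import os
import re


def render_log_proto(template, host, item, skin_dir):
    """Expand @HOST@/@ITEM@ and @(file)@ includes (illustrative model only)."""
    text = template.replace("@HOST@", host).replace("@ITEM@", item)

    def include(match):
        path = os.path.join(skin_dir, match.group(1))
        try:
            with open(path, encoding="utf-8") as fh:
                return fh.read()
        except FileNotFoundError:
            return ""  # assumption: a missing help file adds nothing

    return re.sub(r"@\((.+?)\)@", include, text)


template = "<h3>Special Help for @HOST@.@ITEM@</h3>\n@(@HOST@_@ITEM@_help.inc)@"
# With skins/logspecial/FooMach_BarTest_help.inc present, its contents
# replace the @(...)@ line.
print(render_log_proto(template, "FooMach", "BarTest", "skins/logspecial"))
```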

How to get rid of unwanted status lights ("purples")

Whenever an agent stops reporting status messages for a certain
host/check, the Big Sister display server will still remember this
host/check and display a purple ("no report") status light. This
is the intended behaviour, since this way you notice which
checks are currently not working (e.g. due to a connection loss
with the respective agent).
However, sometimes you will remove or rename a host/check and want
those purples to simply disappear. Big Sister 0.96 introduces a
"remove" command for exactly this purpose. Agents can now send
a "remove host.check" to the display server, and the respective
status light will disappear.
If you want to get rid of status lights manually you can use the
"bsadmin" command, e.g.:
bin/bsadmin -d mydisplayserver remove host1.cpu host1.disk
bin/bsadmin -d myserver remove host1.\*
Use the adm/permissions file on the server to limit access to
this new removal command (see CONFIG).
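A Python sketch of both behaviours: a status turns purple once it has not been refreshed within the 15-minute timeout mentioned in the section on check frequencies, and "remove" drops matching entries. display_color and remove_statuses are hypothetical names; glob-style matching via fnmatch is an assumption based on the "host1.*" example, and bsadmin's real matching rules may differ.

```python
import fnmatch
import time

STATUS_TIMEOUT = 15 * 60  # seconds without a refresh before "no report"


def display_color(color, last_report, now=None):
    """Return the reported color, or purple once the status is stale."""
    now = time.time() if now is None else now
    return "purple" if now - last_report > STATUS_TIMEOUT else color


def remove_statuses(statuses, patterns):
    """Drop every host.check whose name matches one of the patterns.

    Glob-style matching is an assumption of this sketch.
    """
    return {
        name: entry
        for name, entry in statuses.items()
        if not any(fnmatch.fnmatchcase(name, p) for p in patterns)
    }


statuses = {"host1.cpu": "green", "host1.disk": "green", "host2.cpu": "red"}
print(remove_statuses(statuses, ["host1.*"]))  # {'host2.cpu': 'red'}
```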

How to set up a Tripwire monitor

This is the README file for the Tripwire monitor check for Big Sister.
It has been tested on Red Hat 7.0 and Solaris 7.0 with the commercial and free versions of Tripwire 2.3.
Tripwire 2.3 Portions copyright 2000 Tripwire, Inc. Tripwire is a registered
trademark of Tripwire, Inc. This software comes with ABSOLUTELY NO WARRANTY;
for details use --version. This is free software which may be redistributed
or modified only under certain conditions; see COPYING for details.
All rights reserved.
* Create or modify the Tripwire database as root as follows.
Tripwire package shipped with Red Hat 7.0:
- /etc/tripwire/twinstall.sh
- edit /etc/tripwire/twpol.txt to suit your needs
- cd /etc/tripwire/
- twadmin --create-polfile ./twpol.txt #Update the policy file
- tripwire --init #Create a tripwire DB
- chmod a+r tw.pol localhost-local.key site.key #Give read permission to ALL (BS user)
- chmod 777 /var/lib/tripwire/report #Give write permission to ALL
- tripwire -m c #Check if the report is OK
Commercial Tripwire installed from CD-ROM (e.g. on Solaris):
- cp /cdrom/cdrom0/install.cfg /tmp
- vi /tmp/install.cfg
TWROOT="/opt/TSS" #<--Modify as needed
TWMAILMETHOD=SMTP
TWSMTPHOST="mailhost.mydomain.com"
TWSMTPPORT=25
- cd /cdrom/cdrom0/
- ./install.sh /tmp/install.cfg
Enter the site keyfile passphrase:xxxxxxxxxxx
Enter the local keyfile passphrase:xxxxxxxxxxx
- vi /opt/TSS/policy/twpol.txt # Modify as needed
- cd /opt/TSS/policy
- /opt/TSS/bin/twadmin --create-polfile ./twpol.txt
- /opt/TSS/bin/tripwire --init
- chmod a+r tw.pol localhost-local.key site.key #Give read permission to ALL (BS user)
- chmod 777 /opt/TSS/report #Give write permission to ALL
- tripwire -m c #Check if the report is OK
* Add the tripwire monitor to the $BS/uxmon/Config directory and set its permissions to 755.
* Adjust the tripwire command path in $BS/uxmon/Config/tripwire as needed.
Defaults:
- /usr/sbin/tripwire -m c (Red Hat 7)
- a check is performed every 12 hours
* Add to the $BS/adm/uxmon-net an entry like this:
localhost tripwire
* That's all folks
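To illustrate how a monitor could turn the result of "tripwire -m c" into a status light, here is a Python sketch. It relies on the exit status of an integrity check being a bitmask (1 = added, 2 = removed, 4 = changed, 8 = errors) as described in the tripwire(8) manual page; please verify this for your Tripwire version. The color policy and the names tripwire_status and run_check are hypothetical and not necessarily what the Big Sister tripwire monitor does.

```python
import subprocess

# Exit status bits of "tripwire -m c" (check against your tripwire(8) page).
ADDED, REMOVED, CHANGED, ERROR = 1, 2, 4, 8


def tripwire_status(exit_code):
    """Map a tripwire exit status to (color, text); policy is hypothetical."""
    findings = [
        label
        for bit, label in ((ADDED, "added"), (REMOVED, "removed"),
                           (CHANGED, "changed"), (ERROR, "errors"))
        if exit_code & bit
    ]
    if not findings:
        return "green", "no integrity violations"
    # Any violation is red; errors alone are only yellow (assumption).
    color = "red" if exit_code & (ADDED | REMOVED | CHANGED) else "yellow"
    return color, "tripwire reported: " + ", ".join(findings)


def run_check(cmd=("/usr/sbin/tripwire", "-m", "c")):
    """Run the integrity check (default path as listed above for Red Hat 7)."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return tripwire_status(result.returncode)
```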

How to install and use the Oracle Big Sister module

Perl requirements:
DBI
DBD::Oracle
Oracle requirements:
Installed Oracle (tested with 8.0.[56] and 8.1.[56])
Special Oracle user and view
Perl: Install the two Perl modules. You can fetch them from CPAN or from http://www.perl.com
Create a special Oracle user and a special view to reduce security problems and to
enable Big Sister to fetch the database state. Of course you can also use a DBA user, but we
prefer not to put the username and password of highly privileged users in config files.
Edit the PL/SQL script (see sample/oracle/bs-user-view.sql in your distribution): set the
password for sys and change the username and password according to your needs:
Default: [username]/[password]
sys/change_on_install
check_db/check_db
Now you are ready to configure your uxmon-net configuration file. The oracle entry takes the following fields:
ORACLE_CONNECT:
Connect string of the form [HOSTNAME]:[ORACLE_SID]:[BIG_SISTER_COLUMN_NAME].
E.g. your database runs on "Asterix", the Oracle SID you want to monitor is called TVD806, and you want the column to be called "ORACLE DEMO". Then your ORACLE_CONNECT string looks like: "Asterix:TVD806:ORACLE DEMO".
ORACLE_HOME:
Home directory of the Oracle product (where the Oracle version you use is installed).
ORACLE_BASE:
Home directory of the oracle user.
ORA_USER:
User to connect to the Oracle database (the default in our prepared PL/SQL script is check_db).
ORA_PASS:
Password for the Oracle user ORA_USER (the default in our prepared PL/SQL script is check_db).
ORACLE_TNSADMIN:
Directory where Oracle will find tnsnames.ora.
oracle:
This is the name of the check, which tells Big Sister to use the Oracle module.
Optional fields: ORACLE_NLS_LANG, ORACLE_ORA_NLS33
Example:
localhost ORACLE_CONNECT="ds1skeys:DKMS2:oAPP2" ORACLE_HOME=/u00/app/oracle/product/8.1.6 ORACLE_BASE=/u00/app/oracle ORACLE_NLS_LANG=american_america.WE8ISO8859P1 ORACLE_ORA_NLS33=/u00/app/oracle/product/8.1.6/ocommon/nls/admin/data ORA_USER=TSMW ORA_PASS=TSMW ORACLE_TNSADMIN=/u00/app/oracle/network oracle
Big Sister reports three states for Oracle:
red DB is not available
yellow DB is up, but in restricted mode
green DB is ready
Philip Markwalder
<Philip.Markwalder@trivadis.com>
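The connect string and the three states can be sketched in Python. parse_oracle_connect and oracle_color are hypothetical helpers; the assumption that the state comes from the LOGINS column of Oracle's V$INSTANCE view ('ALLOWED' or 'RESTRICTED') is ours, and the Big Sister module may determine the state differently.

```python
def parse_oracle_connect(value):
    """Split ORACLE_CONNECT into host, SID and Big Sister column name."""
    # maxsplit=2 keeps any further colons inside the column name
    host, sid, column = value.split(":", 2)
    return host, sid, column


def oracle_color(logins):
    """Map the instance state to the colors listed above.

    `logins` is assumed to be V$INSTANCE.LOGINS, or None when no
    connection could be made (assumption of this sketch).
    """
    if logins is None:
        return "red"      # DB is not available
    if logins == "RESTRICTED":
        return "yellow"   # DB is up, but in restricted mode
    return "green"        # DB is ready


print(parse_oracle_connect("Asterix:TVD806:ORACLE DEMO"))
# ('Asterix', 'TVD806', 'ORACLE DEMO')
```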

How to set the path for Big Sister

By default Big Sister searches for commands it attempts
to execute in the search path set when Big Sister was
started. Due to the differences of the hosting systems this
might not always be the right choice. Since version 0.96
you can set the search path via an entry in the adm/resources
file:
- if the file adm/resources does not exist just create
it
- add an entry like
*.path=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin
(set the path for all Big Sister components)
or entries like
*.path=/sbin:/usr/sbin
uxmon.path=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin
(set a very limited path for all Big Sister components,
an extended path for uxmon only)
Note that Big Sister executes quite a few external commands,
e.g.
- df
- mount
- ps
- ping / fping
- nslookup
- ...
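The lookup order implied by the example above (a component-specific entry such as uxmon.path wins over the *.path entry) can be sketched in Python. parse_resources, lookup and find_command are hypothetical helpers; falling back to the inherited PATH when no entry exists mirrors the default behaviour described at the start of this section.

```python
import shutil


def parse_resources(text):
    """Parse "component.resource=value" lines such as those in adm/resources."""
    resources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        component, _, resource = name.rpartition(".")
        resources[(component, resource)] = value
    return resources


def lookup(resources, component, resource):
    """Prefer a component-specific entry, fall back to the "*" entry."""
    return resources.get((component, resource), resources.get(("*", resource)))


def find_command(resources, component, command):
    """Locate e.g. "df"; with no path entry, shutil.which uses $PATH."""
    return shutil.which(command, path=lookup(resources, component, "path"))


res = parse_resources(
    "*.path=/sbin:/usr/sbin\n"
    "uxmon.path=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin\n"
)
print(lookup(res, "bbd", "path"))  # /sbin:/usr/sbin
```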

How to set up performance data collection

Since release 0.94 Big Sister supports performance data
collection and visualization. This feature relies on Tobias
Oetiker's RRDTool. To enable performance collection you
will have to install RRDTool on the Big Sister server
system(s).
Before the server can start to process performance data
you have to tell the agents to submit the system parameters
you are interested in. You do so by adding one or more lines
like
myserver frequency=10 perfdata=etc/perf options=perf bsdisplay
to the respective uxmon-net files (the sample uxmon-net
file provides these lines; you just have to uncomment
them). The file etc/perf in the sample above specifies
which of the system parameters known to uxmon should be
submitted to the Big Sister server myserver. Once uxmon is
run with at least one 'perfdata' line it creates a list
of known parameters in the file var/uxmon-variables. The
file etc/perf should contain lines like
systemname:variablename
where both systemname and variablename are regular
expressions and therefore may contain wildcards. Two
'perf'-files containing lists of some basic parameters
like disk space usage, cpu usage, etc. are installed
by default (etc/perf and etc/perfslow). I suggest you
just uncomment the respective lines in uxmon-net and
see what is happening before you dive into the depths
of uxmon-variables.
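A Python sketch of how "systemname:variablename" lines could select parameters. load_perf_patterns and is_selected are hypothetical, and several details are assumptions: each line is split at its first colon, blank and "#" lines are ignored, and each regular expression must match the whole name. The names in the usage lines are made up, not taken from uxmon-variables.

```python
import re


def load_perf_patterns(lines):
    """Compile "systemname:variablename" lines from a perf file (sketch)."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        system, _, variable = line.partition(":")
        patterns.append((re.compile(system), re.compile(variable)))
    return patterns


def is_selected(patterns, system, variable):
    """True if some perf line selects this system parameter."""
    return any(s.fullmatch(system) and v.fullmatch(variable)
               for s, v in patterns)


# Hypothetical names, for illustration only.
patterns = load_perf_patterns(["# comment", "disk:.*", "cpu:(usr|sys)"])
print(is_selected(patterns, "cpu", "usr"))   # True
print(is_selected(patterns, "cpu", "idle"))  # False
```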
After uxmon starts submitting performance data the server
should immediately start creating the necessary databases.
Since the server does not really know the meanings of
the values submitted by the agent you have to tell it
what the graphs should look like (e.g. whether values should
be plotted as percentages, bandwidth or whatever).
For this purpose there is a file called etc/graphtemplates
containing definitions of all the graphs the server is
expected to create. Whenever the server gets a specific
system parameter the first time it looks it up in the
graphtemplates file, checks if this value is associated
with a graph and if yes creates the respective database.
Big Sister comes with a graphtemplates file containing
the basic graph definitions. So I suggest using this
pre-installed file for a start.
Note that the graphtemplates file is only consulted for
database creation - that's why it is called 'templates'.
Modifications of graphtemplates will not affect
already existing databases. However, you can purge and recreate
databases by simply removing the respective files in
the var/graphs directory and restarting the server.
Note that already collected data will be lost if you
do so.
After databases have been created the systems listed
on your status pages should get a little graph symbol.
If you click on this symbol you will get to an index
page listing all the available graphs for the respective
system. Note that it might take a while before the
graph symbols appear since the server will only update
status pages on status change - database creation will
not force an update. If you are in a hurry you can
force an update by restarting Big Sister on the server.
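The "templates are only consulted at creation time" rule can be illustrated with a Python sketch that builds an rrdtool create command only when no database file exists yet. rrd_create_command is hypothetical; the file naming, the single GAUGE data source and the archive sizes are illustrative choices, not Big Sister's graphtemplates semantics. Only the general rrdtool create syntax (--step, DS:..., RRA:...) is standard.

```python
import os


def rrd_create_command(graph_dir, system, variable, step=300):
    """Return an "rrdtool create" argv, or None if the database exists.

    File naming, DS type and RRA sizes are hypothetical.
    """
    path = os.path.join(graph_dir, f"{system}.{variable}.rrd")
    if os.path.exists(path):
        return None  # existing database: template changes have no effect
    return [
        "rrdtool", "create", path,
        "--step", str(step),
        f"DS:value:GAUGE:{2 * step}:U:U",  # one gauge, heartbeat = 2 steps
        "RRA:AVERAGE:0.5:1:600",           # 600 raw samples
        "RRA:AVERAGE:0.5:12:700",          # 700 hourly averages at 300 s steps
    ]


print(rrd_create_command("var/graphs", "myserver", "cpu"))
```

The argument list could then be passed to subprocess.run; removing the .rrd file makes the next call return a fresh create command, which matches the purge-and-recreate advice above.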

How to set up SNMP trapping

Since release 0.38 Big Sister supports
- receiving SNMP traps (basic support)
- sending SNMP traps whenever an alarm is raised
Currently, traps are only sent by the Event
Generator. By setting the variable "trap" in the
bb_event_generator.cfg config file (see CONFIG) you
indicate the trap destination, e.g.
*.* trap=myhost
will tell the Event Generator to send a trap for each
alarm event to myhost. Events currently covered are:
- raising an alarm
- clearing an alarm
- alarm reminders
- alarm acknowledgements
In the near future Big Sister will be able to send a
trap on every status change.
You will find a file bigsister.mib in the contrib
directory. This file contains the SNMP MIB Big Sister
applies. There is also a file named bigsister.fmt.
This file is an HP OpenView trap configuration file -
OpenView will learn Big Sister traps via the command
xnmevents -load bigsister.fmt
There is currently only very basic support for trap
reception in the uxmon agent. If a file adm/bstrapd.cfg
exists during uxmon startup, the boot script (bb_start)
will start up an additional daemon called 'bstrapd'.
This daemon will listen on the SNMP trap port, receive
traps and log them to var/snmp_traplog. There is a new
monitor called 'snmp_trap' (see CONFIG), similar to all
the other log file monitors (syslog, EventLog, etc.),
which will read the bstrapd log file and raise status
messages.
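Like the other log file monitors, snmp_trap has to pick up only what was appended to var/snmp_traplog since its last run. The Python sketch below shows that incremental reading in a generic way; read_new_lines and trap_color are hypothetical, restarting from the beginning when the file shrank is an assumption, and the yellow/green policy is invented for illustration.

```python
import os


def read_new_lines(path, offset):
    """Return complete lines appended since `offset`, plus the new offset.

    A partially written last line is left for the next call; if the file
    shrank (e.g. after rotation), reading starts over (assumption).
    """
    if os.path.getsize(path) < offset:
        offset = 0
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    data = data[:data.rfind(b"\n") + 1]  # keep complete lines only
    return data.decode("utf-8", "replace").splitlines(), offset + len(data)


def trap_color(new_lines):
    """Hypothetical policy: yellow if new traps arrived, green otherwise."""
    return "yellow" if new_lines else "green"
```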

How to tunnel Big Sister connections through ssh

Big Sister uses TCP connections for exchanging status
information between agents and servers as well as
between multiple servers. It is very simple to build
a secure (encrypted) tunnel for these status connections
using Secure Shell (ssh).
On the host running the agent start up ssh forwarding,
e.g.:
ssh -n displaymachine -L 10192:localhost:1984 sleep 600
(This forwards connections to local port 10192 to port 1984
on displaymachine for 10 minutes.)
In uxmon-net use the following server entry:
localhost port=10192 bsdisplay
To tunnel the connection between two status collectors, use ssh
the same way as above. In bb-display.cfg use the %Rsync statement,
e.g.:
%Rsync mastermachine:10192 prefix GROUP
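Because the forwarding shown above ends after 600 seconds, it has to be restarted regularly. A small Python sketch of such a wrapper: tunnel_command builds exactly the ssh command from this section, and keep_tunnel_up (a hypothetical helper) restarts it whenever ssh exits. It assumes non-interactive ssh authentication (e.g. keys) is already set up.

```python
import subprocess
import time


def tunnel_command(display_host, local_port=10192, remote_port=1984, seconds=600):
    """Build the ssh port-forwarding command shown above as an argv list."""
    return ["ssh", "-n", display_host,
            "-L", f"{local_port}:localhost:{remote_port}",
            "sleep", str(seconds)]


def keep_tunnel_up(display_host, retry_delay=5):
    """Restart the tunnel whenever ssh exits; runs until interrupted."""
    while True:
        subprocess.run(tunnel_command(display_host))
        time.sleep(retry_delay)  # avoid a tight loop if ssh fails at once
```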

How to use check frequencies other than the default

By default uxmon performs every configured check every 5
minutes. For various reasons you may
want to change this frequency: either the check
is CPU-intensive and you do not want it to be
performed too often, or the monitored service is
so important that you want to check it more often. Release
0.36 introduced the 'frequency' argument in
the uxmon-net config file. Using it, you can specify
individually for each check how often uxmon should
run it, e.g.:
host1 frequency=10 ping
host2 frequency=60 ping
will run a ping test against host1 every 10 minutes
and one against host2 every 60 minutes.
The frequency argument is also effective when used
together with reporters, e.g.:
display1 frequency=10 bsdisplay
display2 frequency=1 bsdisplay
will send a status report to display1 every 10
minutes while sending a status report to display2
every minute.
You should choose your testing intervals carefully,
though. Keep in mind:
- choosing a test interval smaller than the smallest
interval you use together with a reporter is
pointless, e.g.:
host1 frequency=1 ping
display1 frequency=5 bsdisplay
this will ping host1 every minute, but since
status reports are only sent every 5 minutes,
a failure of host1, though detected by uxmon
shortly after it occurs, will only be reported
to the status collector after up to 5 minutes.
In this case you would have to
use:
host1 frequency=1 ping
display1 frequency=1 bsdisplay
- choosing small intervals may significantly
increase your network or system load depending
on the test and/or the size of the status
reports sent to the status collector
- in case of service failures some tests will
take some time to perform (timeouts). Note
that if uxmon cycles repeatedly take too
much time (e.g. uxmon runs a number of tests
once a minute and they take more than one
minute), uxmon will first try its best to
perform the cycles whenever it finds the time,
and will finally leave out arbitrary
cycles and therefore skip tests
- the reporter intervals are limited by the
status collector. The status collector expects
every agent to refresh its status messages
regularly, whether or not they have changed.
After a timeout
(15 minutes) the status collector will declare
a status being invalid and set it to 'no status
report'. Therefore the reporters' interval
should be significantly below 15 minutes.
- the frequency is supposed to be a value with
the meaning 'every xx minutes'. Minute 0
is equal to January 1st, 1970 00:00 (Unix
time). One of the effects is that checks
with the same frequency value will always
run in the same minutes. On startup and
on configuration file change every check
is performed once no matter what its
frequency value is.
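Our reading of the scheduling rule can be written down in Python: a check with frequency N runs in every Unix minute divisible by N, plus once on startup or after a configuration change. unix_minute and is_due are hypothetical helpers for illustration, not uxmon's scheduler.

```python
def unix_minute(timestamp):
    """Whole minutes since January 1st, 1970 00:00 (Unix time)."""
    return int(timestamp // 60)


def is_due(timestamp, frequency, first_cycle=False):
    """True if a check with this frequency (in minutes) runs now."""
    return first_cycle or unix_minute(timestamp) % frequency == 0


t = 600 * 60  # minute 600 since the epoch
print(is_due(t, 10), is_due(t, 60), is_due(t + 5 * 60, 10))  # True True False
```

Since the decision depends only on the minute number, all checks with the same frequency fire together, as noted above.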

How to re-use your Big Brother monitoring scripts

Up to release 0.32 you had to run both Big Brother
and Big Sister (uxmon) clients to run all their
monitors. Since version 0.33 uxmon has basic support
for running Big Brother monitoring scripts: the
'bbscript' monitor. Using 'bbscript' you can tell
uxmon to create a minimal BB environment and run
BB commands in it, e.g. in uxmon-net:
localhost file=adm/bb-oracle.sh bbscript
will run adm/bb-oracle.sh and report its results
to the Status Collector.
Note: some BB scripts will try to include
$BBHOME/etc/bbdef.sh for setting defaults.
Usually you will be better off creating
an empty etc/bbdef.sh (in the Big Sister
etc directory) and passing the necessary
variables via env=, e.g.:
localhost env="LIMIT=100;WHATEVER=some text" file=myscript.sh bbscript
Note: the scripts must have the executable bit
set.
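A Python sketch of what running a BB script with an env= setting involves. parse_env_setting and run_bb_script are hypothetical; the assumption that BBHOME points to the Big Sister directory (so that $BBHOME/etc/bbdef.sh resolves to its etc/bbdef.sh) follows the note above, and the real bbscript monitor very likely sets up more variables than this.

```python
import os
import subprocess


def parse_env_setting(value):
    """Turn an env="A=1;B=some text" value into a dict."""
    env = {}
    for assignment in value.split(";"):
        if assignment:
            name, _, val = assignment.partition("=")
            env[name] = val
    return env


def run_bb_script(script, bs_home, env_setting=""):
    """Run a BB script in a minimal environment; return (rc, output).

    Only BBHOME and PATH are provided here (assumption of this sketch).
    """
    if not os.access(script, os.X_OK):
        raise PermissionError(f"{script} is not executable")
    env = {"BBHOME": bs_home, "PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    env.update(parse_env_setting(env_setting))
    result = subprocess.run([script], env=env, capture_output=True, text=True)
    return result.returncode, result.stdout
```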

How to centrally archive your history log files

If you are running multiple Big Sister Status
Collectors, you may need to archive your old log
files centrally, either for safekeeping or for later analysis.
Since release 0.32 bbd and bsadmin offer a simple
way of achieving this. Bbd offers a method of
saving old log information (see "savelogs" below)
and transferring the resulting files to a client
("sendlogs"). Bsadmin offers a command called
"archivelogs" that retrieves and archives remote
log files using the savelogs/sendlogs method. A
simple form of fault tolerance is included
in the algorithm: bsadmin keeps track of successful
transmissions, so calling bsadmin regularly
will automatically lead to failed transmissions being
retried.
To use this feature, you first need to create
an archive directory. Bsadmin will create files
with names composed of the name of the respective
display server with a date appended, e.g.
myhost.19990605. The best way for using archivelogs
is putting something like this in crontab:
4 1 * * * /usr/local/lib/bs/bin/bsadmin -d myhost archivelogs /var/archivedir
archivelogs takes an optional argument: the period
in days archivelogs will let history log files cumulate
until it archives them. The default value is "7".
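The file naming and the period rule can be sketched in Python. archive_name reproduces the myhost.19990605 pattern from above; archive_due is a hypothetical helper, and counting the period from the last successful transfer (so that a failed run is simply retried on the next invocation) is an assumption about how bsadmin behaves.

```python
from datetime import date, timedelta


def archive_name(display_server, day):
    """Archive file name, e.g. myhost.19990605."""
    return f"{display_server}.{day:%Y%m%d}"


def archive_due(last_success, today, period_days=7):
    """True if the logs should be archived now (period rule is assumed)."""
    return last_success is None or today - last_success >= timedelta(days=period_days)


print(archive_name("myhost", date(1999, 6, 5)))  # myhost.19990605
```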
NOTE: Although the archive period is well defined, bsadmin
may in some cases not archive files at exact period boundaries.
The display server may be unreachable
when bsadmin is called for the first time within a period.
In that case the logs are archived the next time
bsadmin is invoked.

How to make bbd rotate the display.history log file

"display.history" is a growing log file where bbd
stores all status changes. Since release 0.32 bbd
implements a "savelogs" command which does the
usual log file housekeeping. Each time savelogs
is invoked, the current log file is moved
to a file called "display.history.tag", where tag is
composed of the current date and time but may be
overridden by supplying it as an argument to "savelogs".
Saved log files older than 8 weeks are automatically
deleted (only when "savelogs" is invoked).
You can force bbd to execute savelogs e.g. weekly
by adding the following entry to your crontab file
on the Big Sister status collector host:
4 1 * * 6 /usr/local/lib/bs/bin/bsadmin -d localhost savelogs
The various tools accessing the log files (the Event
Generator, bshistory, etc.) are "savelogs"-aware,
therefore will work correctly after executing savelogs.
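The housekeeping described above can be modelled in Python. savelogs here is a hypothetical function, not bbd's implementation; the default tag format and the use of file modification times for the 8-week cut-off are assumptions. Old files are pruned before rotating, so the freshly saved file can never be removed by accident.

```python
import glob
import os
import time

KEEP_SECONDS = 8 * 7 * 24 * 3600  # saved logs are kept for 8 weeks


def savelogs(log_dir, tag=None, now=None):
    """Move display.history to display.history.<tag>; prune old saves."""
    now = time.time() if now is None else now
    base = os.path.join(log_dir, "display.history")
    for saved in glob.glob(glob.escape(base) + ".*"):
        if now - os.path.getmtime(saved) > KEEP_SECONDS:
            os.remove(saved)
    # Assumed default tag format: local date and time.
    tag = tag or time.strftime("%Y%m%d%H%M%S", time.localtime(now))
    target = f"{base}.{tag}"
    if os.path.exists(base):
        os.replace(base, target)
    return target
```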

How to set up an image map

Big Sister supports graphical image maps since version
0.22. To use this feature follow this checklist:
- make sure the Perl module GD.pm is installed
- create a background image you want to place your
status lights on (e.g. a geographical map) and
save it as a GIF, PNG or JPEG, e.g. adm/display_map.png
Note: Older versions of GD only support GIF while
newer versions support PNG and JPEG!
- think about what you'd like to display on the
image map. Note that you need to have a group
for any of the buttons that should appear on
the map. So configure the necessary groups in
adm/bb-display.cfg
- think about where to place the buttons in the
image map and get the display coordinates (sorry,
you have to use your own tools for that, but this
should not be too much work)
- create an image map config file, e.g. adm/display_map.cfg.
It could look like:
template adm/display_map.png
red www/skins/default/statred.png
yellow www/skins/default/statyellow.png
green www/skins/default/statgreen.png
purple www/skins/default/statpurple.png
clear www/skins/default/statclear.png
blue www/skins/default/statblue.png
at 15,308 GENF
at 80,238 LAUSANNE
dump www/map.png
NOTE: the generated image will be called www/mapxx.png,
where xx is a sequence number held in
adm/display_map.cfg.seq
- in adm/bb-display.cfg add the line
%image adm/display_map.cfg
to the %Page statement of the page the map should appear on.
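To show how the map configuration is structured, here is a Python parser for the keywords used in the example file. parse_map_config and numbered_output are hypothetical; the real file may accept more keywords, and how the sequence number is formatted (e.g. zero-padded) is not specified above, so plain digits are an assumption.

```python
import os


def parse_map_config(text):
    """Parse template/dump/at lines and color icon lines (sketch)."""
    config = {"template": None, "icons": {}, "buttons": [], "dump": None}
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0]
        if keyword == "template":
            config["template"] = words[1]
        elif keyword == "dump":
            config["dump"] = words[1]
        elif keyword == "at":
            x, y = (int(n) for n in words[1].split(","))
            config["buttons"].append((x, y, words[2]))  # position and group
        else:
            config["icons"][keyword] = words[1]  # color -> status light image
    return config


def numbered_output(dump_path, seq):
    """www/map.png with sequence number 7 becomes www/map7.png (assumed format)."""
    stem, ext = os.path.splitext(dump_path)
    return f"{stem}{seq}{ext}"
```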

How to interoperate with Big Brother

Both the agent and status collector part of Big Sister
are compatible with Big Brother. To use the agent
(uxmon) with Big Brother list the Big Brother Display
machine in adm/uxmon-net with the reporter 'bbdisplay'
(NOTE: bsdisplay and bbdisplay ARE different!), e.g.:
server1 ping pop3 bbdisplay # this is our Big Brother display
server2 ping bsdisplay # and this is our Big Sister display
To any Big Brother component, the status collector (bbd)
looks like the original one; it even uses
the same log directory structure.
Since version 0.37 the BB compatible log file mechanism
may be switched off (to save disk I/O). Make sure you put
%Option +BBLog
into bb-display.cfg if you need the www/logs/*.* file
structure (this option is on by default but future
versions might behave differently).
NOTE: not all of the functionality is available when
mixing Big Brother with Big Sister:
- Big Sister clients do not support paging
(Big Sister uses bb_event_generator instead)
- Big Brother does not support dynamic grouping
(the "join" and "leave" commands)
- Big Brother does not support syncing of
displays (though syncing from Big Sister displays
to Big Brother ones might be implemented soon)
- Big Brother does not allow clients to send
multiple status reports 'at once' (using only
one tcp connection)
- The Big Sister event generator will not be able
to use grouping (unless you manually create
the group file) when used together with Big Brother
bbd
NOTE: a few users wanted to run the Big Sister "bbd" only for
web page creation and use Big Brother for all the
rest. Since 0.24 this can be done by replacing
the following section in runbb.sh (if Big Sister is not
installed in /usr/local/lib/bs, change the paths as you
need):
if test "$BBDISPLAY" = TRUE
.
.
.
fi
with
if test "$BBDISPLAY" = TRUE
then
/usr/local/lib/bs/bin/bbd -b /usr/local/lib/bs -c
fi
and make /usr/local/lib/bs/www be a link to your
BB www directory.
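For orientation, a Python sketch of the kind of status message a Big Brother display receives. It is based on our understanding of the classic Big Brother client protocol (one "status host.test color text" message per TCP connection to port 1984, with dots in host names written as commas); bb_status_message and send_to_bb are hypothetical helpers, not part of Big Sister.

```python
import socket


def bb_status_message(host, test, color, text):
    """Format a BB "status" line; dots in host names become commas."""
    return f"status {host.replace('.', ',')}.{test} {color} {text}"


def send_to_bb(display_host, message, port=1984, timeout=10):
    """Send one message over its own TCP connection."""
    with socket.create_connection((display_host, port), timeout=timeout) as conn:
        conn.sendall(message.encode("ascii", "replace"))


print(bb_status_message("www.example.com", "cpu", "green", "load ok"))
# status www,example,com.cpu green load ok
```

Sending one message per connection is also why Big Brother cannot accept several status reports "at once", as listed among the limitations above.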

How to debug Big Sister

Each of the daemons (bb_event_generator, bbd and
uxmon) can be run in debug mode. Use the "-D" option
to start them without going into the background and
to print some debug information, e.g.:
bbd -D 5

How to add your own "checks"

For each check listed in uxmon-net, uxmon-rules.pl runs
the corresponding Perl code in either the adm/Config
or the uxmon/Config directory.
So this is the place to add your checks.
The Config/* code is only a front end to the modules
found in uxmon/Monitor, though. Therefore, if you want
to add a completely new check, you will have
to write a Perl module; it is best to have a look at
some of the existing modules first. Mandatory methods are
"new" and "check", "check" being called every
5 minutes (by default).
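The shape of such a module can be illustrated with a Python analogue. The real monitor modules are Perl and their exact interface is best taken from uxmon/Monitor; in this sketch __init__ plays the role of "new" and check() is what uxmon would call each cycle. DiskCheck, its thresholds and the injectable usage function are hypothetical.

```python
import shutil


class DiskCheck:
    """Python analogue of a uxmon monitor module (illustrative only)."""

    def __init__(self, path="/", warn=90.0, panic=95.0, usage=shutil.disk_usage):
        # Corresponds to "new": remember the configuration.
        self.path, self.warn, self.panic, self.usage = path, warn, panic, usage

    def check(self):
        # Corresponds to "check": measure and return (color, text).
        total, used, _free = self.usage(self.path)
        percent = 100.0 * used / total
        if percent >= self.panic:
            color = "red"
        elif percent >= self.warn:
            color = "yellow"
        else:
            color = "green"
        return color, f"{self.path} is {percent:.1f}% full"


print(DiskCheck(usage=lambda p: (100, 50, 50)).check())  # ('green', '/ is 50.0% full')
```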

How to reduce network traffic in a large network

Each Big Sister agent connects to its server(s)
every five minutes and sends its clients' status. This
is the minimum network traffic you cannot avoid.
However, there are a few hints:
- if using network monitors (such as ping or tcp
monitors), rather check status from an agent
which is near the watched client (near in
terms of network bandwidth) than from an agent in
a distant network
- if necessary you can set up Relay Big Sister Status
Collectors serving a whole lot of agents and
reporting consolidated status to a central
Status Collector. This will prevent the agents
from connecting to the central collector
individually. Instead the status of the whole
part of this network can be transmitted in only
one connection.

How to set up a Relay Status Collector

A status collector is able to regularly send the status messages
it receives to another status collector.
Therefore you can set up relay status collectors, each
serving a part of the agents and regularly sending its
consolidated status to a Central Status Collector,
e.g. this way:
- set up a central status collector on Host1
- set up a first relay status collector on Host2.
By using the "%Rsync" statement in bb-display.cfg
you can tell it to synchronize its status with
the central status collector, e.g.:
%Rsync Host1 relay1_ ALL
- set up a number of agents connecting to Host2, say
on Host3 through Host10. Tell them (uxmon-net)
to report to Host2
- set up a second relay status collector on Host11, use e.g.:
%Rsync Host1 relay2_ ALL
- set up a number of agents connecting to Host11, say
on Host12 through Host15. Tell them (uxmon-net)
to report to Host11
Now Host3 through Host10 will report to Host2, and Host12 through
Host15 will report to Host11. Host2 and Host11 will send their
whole status to Host1, prefixing each name with "relay1_" and "relay2_"
respectively. Thus Host1 will see statuses like relay1_Host3.cpu,
relay2_Host12.cpu and the like.
If you don't want to use a prefix then use
%Rsync Host1 none ALL
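The naming effect of the %Rsync prefix can be expressed in a few lines of Python. relay_name and relay are hypothetical helpers that only model how names appear on the central collector; they do not implement the synchronization itself.

```python
def relay_name(prefix, status_name):
    """Name of a relayed status on the central collector; "none" = no prefix."""
    return status_name if prefix == "none" else prefix + status_name


def relay(statuses, prefix):
    """Rename every status a relay collector forwards."""
    return {relay_name(prefix, name): entry for name, entry in statuses.items()}


print(relay({"Host3.cpu": "green"}, "relay1_"))  # {'relay1_Host3.cpu': 'green'}
```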

How to set up redundant Status Collectors

- set up two or more status collectors, let's say on Host1 and Host2
- tell your agents to send their reports to Host1 and Host2 (uxmon-net)
- tell your status collectors to send their reports to each other
(using %Rsync in bb-display.cfg)