Contents

- How to negate tests
- How to migrate existing notify.cfg configurations
- How to include host specific manually maintained text in status display
- How to get rid of unwanted status lights ("purples")
- How to set up tripwire monitor
- How to install and use Oracle Big Sister Module
- How to set the path for Big Sister
- How to setup performance data collection
- How to setup SNMP trapping
- How to tunnel Big Sister connection through ssh
- How to use check frequencies other than the default
- How to re-use your Big Brother monitoring scripts
- How to centrally archive your history log files
- How to make bbd rotate display.history log file
- How to set up an image map
- How to interoperate with Big Brother
- How to debug Big Sister
- How to add your own "checks"
- How to reduce network traffic in a large network
- How to set up a Relay Status Collector
- How to set up redundant Status Collectors

How to negate tests

Sometimes it is useful to negate tests, e.g. to consider it a problem if a service *is* actually running. Since 0.98b4 uxmon accepts a syntax like:

    myhost service=smtp !tcp

meaning: test whether myhost responds to SMTP requests and report red if it does.

Note: This only works for new-style sensors. If you get log messages like "check !tcp failed to start up" you probably need to read the NEWAGENT document.

How to migrate existing notify.cfg configurations

Since version 0.98 the notify pager (alarming) and therefore notify.cfg are no longer supported. Instead, the functionality of notify has been added to the alarm generator and its configuration file bb_event_generator.cfg. A short description of the new configuration options can be found in the CONFIG documentation in the section about bb_event_generator.cfg. Look out for PAGER rules ...

Migrating from notify/notify.cfg to the alarm generator's built-in notifier support is not too difficult. There are mainly two new mechanisms in bb_event_generator.cfg:

1. When specifying mail addresses via the "mail=" setting you can optionally specify the pager program as well as the mail address, e.g.

       mail=sendmail:me@somewhere.com,myscript:test

   will make the alarm generator send alarms via the sendmail program to me@somewhere.com and via the program myscript to test.

2. Every time an alarm is sent, and for each alarm recipient, the alarm generator goes through a set of PAGER rules (of course only if they exist :-)). A PAGER rule may specify additional routing information for the message, e.g.

       PAGER{$mail eq "test"} pager=myscript mail=012345

   will redirect an alarm sent to "test" via the myscript program to the address 012345 (no matter what "myscript" uses this address for).

Note that the PAGER rules are applied to each individual address in "mail=", and that before the PAGER rules are applied each address is split into its pager part (before the ":") and its address part (after the ":"). So e.g.

    mail=sendmail:abc,myscript:test

would make the alarm generator generate two alarm messages, one matching PAGER{$mail eq "abc" and $pager eq "sendmail"}, the other matching PAGER{$mail eq "test" and $pager eq "myscript"}.

How to include host specific manually maintained text in status display

[Simon Clift] I needed to include a piece of HTML, specifically with custom help links, for each log page (where the monitor light details are given). After some contemplation of the code I did the following.

1. I created a skin directory, for example skins/logspecial.

2. I copied the skins/default/loghtml_text.proto file to the new skins/logspecial directory.

3. I edited skins/logspecial/loghtml_text.proto and put text in it like this:

       <hr />
       <h3>Special Help for @HOST@.@ITEM@</h3>
       @(@HOST@_@ITEM@_help.inc)@

4. I created my specific help files. So for FooMach.BarTest I have the file skins/logspecial/FooMach_BarTest_help.inc, which contains my specific material.

5. I restarted Big Sister.
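As a rough illustration of steps 3 and 4 above, the following sketch shows how the @HOST@ and @ITEM@ placeholders determine which help file has to exist. The function name is made up for this example, and plain string replacement is only an assumption about how the skin placeholders get filled in; it is not taken from the Big Sister source.

```python
def help_include_name(host: str, item: str) -> str:
    """Expand the placeholders of the include line from loghtml_text.proto.

    Hypothetical helper: it only mimics the file naming shown in the steps
    above and assumes simple textual substitution of @HOST@ and @ITEM@.
    """
    template = "@HOST@_@ITEM@_help.inc"
    return template.replace("@HOST@", host).replace("@ITEM@", item)


# The FooMach.BarTest log page would pull in this file from the skin directory.
print(help_include_name("FooMach", "BarTest"))
```

If this assumption holds, every host/check pair whose log page is rendered with the new skin needs its own .inc file named this way.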
How to get rid of unwanted status lights ("purples")

Whenever an agent stops reporting status messages for a certain host/check, the Big Sister display server will still remember this host/check and display a purple ("no report") status light. This is the intended behaviour, since it lets you notice which checks are currently not working (e.g. due to a connection loss with the respective agent). However, sometimes you will remove or rename a host/check and want those purples to just disappear.

Big Sister 0.96 introduces a "remove" command for exactly this purpose. Agents can now send a "remove host.check" to the display server and the respective status light will disappear. If you want to get rid of status lights manually you can use the "bsadmin" command, e.g.:

    bin/bsadmin -d mydisplayserver remove host1.cpu host1.disk
    bin/bsadmin -d myserver remove host1.\*

Use the adm/permissions file on the server to limit access to this new removal command (see CONFIG).

How to set up tripwire monitor

This is the README file for the tripwire monitor check for Big Sister. It has been tested with RedHat 7.0 and Solaris 7.0 with the commercial/free version of Tripwire 2.3.

Tripwire 2.3 Portions copyright 2000 Tripwire, Inc. Tripwire is a registered trademark of Tripwire, Inc. This software comes with ABSOLUTELY NO WARRANTY; for details use --version. This is free software which may be redistributed or modified only under certain conditions; see COPYING for details. All rights reserved.
Big Sister Tripwire monitor checker

* Create/modify the tripwire database as root in the following way:

  RedHat 7.0 (as root)
  - /etc/tripwire/twinstall.sh
  - modify /etc/tripwire/twpol.txt as needed
  - cd /etc/tripwire/
  - twadmin --create-polfile ./twpol.txt    #Update the policy file
  - tripwire --init    #Create a tripwire DB
  - chmod a+r tw.pol localhost-local.key site.key    #Give read permission to ALL (BS user)
  - chmod 777 /var/lib/tripwire/report    #Give write permission to ALL
  - tripwire -m c    #Check if the report is OK

  Solaris 7 (as root)
  - cp /cdrom/cdrom0/install.cfg /tmp
  - vi /tmp/install.cfg

        TWROOT="/opt/TSS"    #<--Modify as needed
        TWMAILMETHOD=SMTP
        TWSMTPHOST="mailhost.mydomain.com"
        TWSMTPPORT=25

  - cd /cdrom/cdrom0/
  - ./install.sh /tmp/install.cfg

        Enter the site keyfile passphrase:xxxxxxxxxxx
        Enter the local keyfile passphrase:xxxxxxxxxxx

  - vi /opt/TSS/policy/twpol.txt    # Modify as needed
  - cd /opt/TSS/policy
  - /opt/TSS/bin/twadmin --create-polfile ./twpol.txt
  - /opt/TSS/bin/tripwire --init
  - chmod a+r tw.pol localhost-local.key site.key    #Give read permission to ALL (BS user)
  - chmod 777 /opt/TSS/report    #Give write permission to ALL
  - tripwire -m c    #Check if the report is OK

* Add the tripwire monitor to the $BS/uxmon/Config directory and set 755 permission.

* Correct the tripwire path in $BS/uxmon/Config/tripwire as needed.
  DEFAULT:
  - /usr/sbin/tripwire -m c; (RedHat 7)
  - A check is performed every 12 hours

* Add to $BS/adm/uxmon-net an entry like this:

      localhost tripwire

* That's all folks

How to install and use Oracle Big Sister Module

Perl Requirements:
- DBI
- DBD Oracle

Oracle Requirements:
- Installed Oracle (tested with 8.0.[56] and 8.1.[56])
- Special Oracle User and View

Installation

Perl: Install the two perl modules. You can fetch them from CPAN or from http://www.perl.com

Oracle Preparation: Create a special Oracle user and a special view to reduce security problems while still enabling Big Sister to fetch the state.
Of course you can also use a dba user, but we prefer not to keep the username and password of highly privileged users in config files. Edit the PL/SQL script (see sample/oracle/bs-user-view.sql in your distribution) to set the password for sys, and change the username and password according to your needs.

Default ([username]/[password]):

    sys/change_on_install
    check_db/check_db

Now you are ready to set up your uxmon-net configuration file:

ORACLE_CONNECT: Connect string looking like [HOSTNAME]:[ORACLE_SID]:[BIG_SISTER_COLON_NAME]. E.g. your database runs on "Asterix" and the Oracle SID you want to monitor is called TVD806. You want the column to be called "ORACLE DEMO". So your ORACLE_CONNECT string looks like "Asterix:TVD806:ORACLE DEMO".

ORACLE_HOME: Home directory of the Oracle product (where you have installed the Oracle version).

ORACLE_BASE: Home of the oracle user.

ORA_USER: User to connect to the Oracle DB (with our prepared PL/SQL script the default is check_db).

ORA_PASS: Password for the Oracle user ORA_USER (with our prepared PL/SQL script the default is check_db).

ORACLE_TNSADMIN: Path where Oracle will find tnsnames.ora.

oracle: This is the name of the check, which tells Big Sister to use the oracle module.

Optional fields: ORACLE_NLS_LANG, ORACLE_ORA_NLS33

E.g.:

    localhost ORACLE_CONNECT="ds1skeys:DKMS2:oAPP2" ORACLE_HOME=/u00/app/oracle/product/8.1.6 ORACLE_BASE=/u00/app/oracle ORACLE_NLS_LANG=american_america.WE8ISO8859P1 ORACLE_ORA_NLS33=/u00/app/oracle/product/8.1.6/ocommon/nls/admin/data ORA_USER=TSMW ORA_PASS=TSMW ORACLE_TNSADMIN=/u00/app/oracle/network oracle

Reporting: Big Sister reports three states for oracle:

    red     DB is not available
    yellow  DB is up, but in restricted mode
    green   DB is ready

Philip Markwalder <Philip.Markwalder@trivadis.com>

How to set the path for Big Sister

By default Big Sister searches for the commands it attempts to execute in the search path that was set when Big Sister was started. Due to the differences between hosting systems this might not always be the right choice.
Since version 0.96 you can set the search path via an entry in the adm/resources file:

- if the file adm/resources does not exist just create it
- add an entry like

      *.path=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin

  (set the path for all Big Sister components) or entries like

      *.path=/sbin:/usr/sbin
      uxmon.path=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin

  (set a very limited path for all Big Sister components, an extended path for uxmon only)

Note that Big Sister tries to execute quite a few commands, e.g.
- df
- mount
- ps
- ping / fping
- nslookup
- ...

How to setup performance data collection

Since release 0.94 Big Sister supports performance data collection and visualization. This feature relies on Tobias Oetiker's RRDTool. To enable performance collection you will have to install RRDTool on the Big Sister server system(s).

Before the server can start to process performance data you have to tell the agents to submit the system parameters you are interested in. You do so by adding one or more lines like

    myserver frequency=10 perfdata=etc/perf options=perf bsdisplay

to the respective uxmon-net files (the sample uxmon-net file provides these lines - you just have to uncomment them). The file etc/perf in the sample above specifies which of the system parameters known to uxmon should be submitted to the Big Sister server myserver. Once uxmon is run with at least one 'perfdata' line it creates a list of known parameters in the file var/uxmon-variables. The file etc/perf should contain lines like

    systemname:variablename

where both systemname and variablename are regular expressions and therefore may contain wildcards. Two 'perf' files containing lists of some basic parameters like disk space usage, cpu usage, etc. are installed by default (etc/perf and etc/perfslow). I suggest you just uncomment the respective lines in uxmon-net and see what happens before you dive into the depths of uxmon-variables.
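To get a feeling for how systemname:variablename lines select parameters, here is a small sketch of the matching idea. It is not uxmon's actual code: the function names, the variable names (cpu_load, disk_usage), the handling of '#' comment lines and the choice to match each regular expression against the whole name are all assumptions made for this illustration.

```python
import re


def parse_perf_lines(text):
    """Turn perf-file style text into (system_regex, variable_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        # Split at the first ':' into the system part and the variable part.
        system, variable = line.split(":", 1)
        rules.append((re.compile(system), re.compile(variable)))
    return rules


def is_selected(rules, system, variable):
    """True if any rule matches both the system name and the variable name."""
    return any(
        s.fullmatch(system) and v.fullmatch(variable) for s, v in rules
    )


# Hypothetical perf file: cpu_load from every system, disk_* only from web hosts.
rules = parse_perf_lines(".*:cpu_load\nweb[0-9]+:disk_.*\n")
print(is_selected(rules, "web1", "disk_usage"))
print(is_selected(rules, "db1", "disk_usage"))
```

The names that are really available on a given agent are the ones listed in var/uxmon-variables, so that file is the place to look when writing your own patterns.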
After uxmon starts submitting performance data the server should immediately start creating the necessary databases. Since the server does not really know the meaning of the values submitted by the agent, you have to tell it what the graphs should look like (e.g. whether values should be plotted as percentages, bandwidth or whatever). For this purpose there is a file called etc/graphtemplates containing definitions of all the graphs the server is expected to create. Whenever the server receives a specific system parameter for the first time it looks it up in the graphtemplates file, checks whether this value is associated with a graph and, if so, creates the respective database. Big Sister comes with a graphtemplates file containing the basic graph definitions, so I suggest using this pre-installed file for a start.

Note that the graphtemplates file is only consulted for database creation - that's why it is called 'templates'. Modifications of graphtemplates will not affect already existing databases. You can, however, purge/recreate databases by just removing the respective files in the var/graphs directory and restarting the server. Data that has already been collected will be lost if you do so.

After the databases have been created, the systems listed on your status pages should get a little graph symbol. If you click on this symbol you will get to an index page listing all the available graphs for the respective system. Note that it might take a while before the graph symbols appear, since the server will only update status pages on status change - database creation will not force an update. If you are in a hurry you can force an update by restarting Big Sister on the server.

How to setup SNMP trapping

Since release 0.38 Big Sister supports
- receiving SNMP traps (basic support)
- sending SNMP traps whenever an alarm is raised

Trap sending is currently done only by the Event Generator.
By setting the variable "trap" in the bb_event_generator.cfg config file (see CONFIG) you indicate the trap destination, e.g.

    *.* trap=myhost

will tell the Event Generator to send a trap for each alarm event to myhost. Events currently covered are:
- raising an alarm
- clearing an alarm
- alarm reminders
- alarm acknowledgements

In the near future Big Sister will be able to send a trap on every status change.

You will find a file bigsister.mib in the contrib directory. This file contains the SNMP MIB Big Sister uses. There is also a file named bigsister.fmt. This file is an HP OpenView trap configuration file - OpenView will learn Big Sister traps via the command

    xnmevents -load bigsister.fmt

There is currently only very basic support for trap reception in the uxmon agent. If a file adm/bstrapd.cfg exists during uxmon startup, the boot script (bb_start) will start up an additional daemon called 'bstrapd'. This daemon will listen on the SNMP trap port, receive traps and log them to var/snmp_traplog. There is a new monitor called 'snmp_trap' (see CONFIG), similar to all the other log file monitors (syslog, EventLog, etc.), which will read the bstrapd log file and raise status messages.

How to tunnel Big Sister connection through ssh

Big Sister uses TCP connections for exchanging status information between agents and servers as well as between multiple servers. It is very simple to build a secure (encrypted) tunnel for these status connections using Secure Shell (ssh).

Agent to Server

On the host running the agent start up ssh forwarding, e.g.:

    ssh -n displaymachine -L 10192:localhost:1984 sleep 600

(This will forward connections to port 10192 to displaymachine port 1984 for 10 minutes.) In uxmon-net use the following server entry:

    localhost port=10192 bsdisplay

Server to Server

Use ssh the same way as above.
In bb-display.cfg use the Rsync statement, e.g.:

    %Rsync mastermachine:10192 prefix GROUP

How to use check frequencies other than the default

Uxmon will perform any configured check every 5 minutes by default. For various reasons you may want to change this frequency: either the check is CPU consuming and you do not want it to be performed too often, or the monitored service is so important that you want to check it more often. Release 0.36 introduced the 'frequency' argument in the uxmon-net config file. Using it you can specify individually for each check how often uxmon should run it, e.g.:

    host1 frequency=10 ping
    host2 frequency=60 ping

will run a ping test against host1 every 10 minutes and one against host2 every 60 minutes. The frequency argument is also effective when used together with reporters, e.g.:

    display1 frequency=10 bsdisplay
    display2 frequency=1 bsdisplay

will send a status report to display1 every 10 minutes while sending a status report to display2 every minute.

You should choose your testing intervals carefully, though. Keep in mind:

- choosing a test interval smaller than the smallest interval you use together with a reporter is pointless, e.g.:

      host1 frequency=1 ping
      display1 frequency=5 bsdisplay

  This will ping host1 every minute. However, since status reports are only sent every 5 minutes, a failure of host1 will be detected by uxmon shortly after it occurs but may be reported to the status collector only up to 5 minutes later. In this case you would have to use:

      host1 frequency=1 ping
      display1 frequency=1 bsdisplay

- choosing small intervals may significantly increase your network or system load, depending on the test and/or the size of the status reports sent to the status collector

- in case of service failures some tests will take some time to perform (timeouts). Note that if uxmon cycles repeatedly take too much time (e.g. uxmon runs a number of tests once a minute and they take up more than one minute) uxmon will first try to do its best, performing cycles whenever it finds time to perform them, and finally will leave out arbitrary cycles and therefore skip tests

- the reporter intervals are limited by the status collector. The status collector expects every agent to refresh its status messages every now and then, no matter whether they are repeatedly the same or not. After a timeout (15 minutes) the status collector will declare a status invalid and set it to 'no status report'. Therefore the reporters' interval should be significantly below 15 minutes.

- the frequency is supposed to be a value with the meaning 'every xx minutes'. Minute 0 is equal to January 1st, 1970 00:00 (Unix time). One of the effects is that checks with the same frequency value will always run in the same minutes.

On startup and on configuration file change every check is performed once, no matter what its frequency value is.

How to re-use your Big Brother monitoring scripts

Up to release 0.32 you had to run both Big Brother and Big Sister (uxmon) clients in order to run all their monitors. Since version 0.33 uxmon has basic support for running Big Brother monitoring scripts: the 'bbscript' monitor. Using 'bbscript' you can tell uxmon to create a minimal BB environment and run BB commands in it, e.g. in uxmon-net:

    localhost file=adm/bb-oracle.sh bbscript

will run adm/bb-oracle.sh and report its results to the Status Collector.

Note: some BB scripts will try to include $BBHOME/etc/bbdef.sh for setting defaults.
Usually you will be better off creating an empty etc/bbdef.sh (in the Big Sister etc directory) and passing the necessary variables like this:

    localhost env="LIMIT=100;WHATEVER=some text" file=myscript.sh bbscript

Note: the scripts must have the executable bit set.

How to centrally archive your history log files

If you are running multiple Big Sister Status Collectors you may need to centrally archive your old log files, either for archiving or for interpretation. Since release 0.32 bbd + bsadmin offer a simple way of achieving this. Bbd offers a method of saving old log information (see "savelogs" below) and transferring the resulting files to a client ("sendlogs"). Bsadmin offers a command called "archivelogs" which retrieves and archives remote log files using the savelogs/sendlogs method. A simple implementation of fault tolerance is included in the algorithm: bsadmin keeps track of successful transmissions, and therefore calling bsadmin regularly will automatically lead to failed transmissions being retried.

To use this feature you first need to create an archive directory. Bsadmin will create files with names composed of the name of the respective display server with a date appended, e.g. myhost.19990605. The best way of using archivelogs is to put something like this in crontab:

    4 1 * * * /usr/local/lib/bs/bin/bsadmin -d myhost archivelogs /var/archivedir

archivelogs takes an optional argument: the period in days for which archivelogs will let history log files accumulate before it archives them. The default value is "7".

NOTE: Though the archive period is well-defined, bsadmin may in some cases not archive files with exact boundaries. It may happen that the display server is unreachable when bsadmin is called for the first time within a period. This will lead to the logs being archived the next time bsadmin is invoked.

How to make bbd rotate display.history log file

"display.history" is a growing log file where bbd stores any status changes.
Since release 0.32 bbd implements a "savelogs" command which will do the usual log file housekeeping. Each time savelogs is invoked the current log file is moved to a file called "display.history.tag", where tag is composed of the current date and time but may be overridden by supplying it as an argument to "savelogs". Saved log files older than 8 weeks are automatically deleted (only when "savelogs" is invoked). You can force bbd to execute savelogs e.g. weekly by adding the following entry to your crontab file on the Big Sister status collector host:

    4 1 * * 6 /usr/local/lib/bs/bin/bsadmin -d localhost savelogs

The various tools accessing the log files (the Event Generator, bshistory, etc.) are "savelogs"-aware and will therefore work correctly after savelogs has been executed.

How to set up an image map

Big Sister supports graphical image maps since version 0.22. To use this feature follow this checklist:

- check that you have installed the Perl module GD.pm

- create a background image you want to place your status lights on (e.g. a geographical map) and save it as a GIF, PNG or JPEG, e.g. adm/display_map.png
  Note: Older versions of GD only support GIF while newer versions support PNG and JPEG!

- think about what you'd like to display on the image map. Note that you need to have a group for each of the buttons that should appear on the map. So configure the necessary groups in adm/bb-display.cfg

- think about where to place the buttons in the image map and get the display coordinates (sorry, you have to use your own tools for that, but this should not be too much work)

- create an image map config file, e.g. adm/display_map.cfg. It could look like:

      template adm/display_map.png
      red www/skins/default/statred.png
      yellow www/skins/default/statyellow.png
      green www/skins/default/statgreen.png
      purple www/skins/default/statpurple.png
      clear www/skins/default/statclear.png
      blue www/skins/default/statblue.png
      at 15,308 GENF
      at 80,238 LAUSANNE
      dump www/map.png

  NOTE: the generated image will be called www/mapxx.png, where xx is a sequence number held in adm/display_map.cfg.seq

- in adm/bb-display.cfg add the line

      %image adm/display_map.cfg

  to the %Page statement you want the map to appear in.

How to interoperate with Big Brother

Both the agent and the status collector part of Big Sister are compatible with Big Brother. To use the agent (uxmon) with Big Brother, list the Big Brother Display machine in adm/uxmon-net with the reporter 'bbdisplay' (NOTE: bsdisplay and bbdisplay ARE different!), e.g.:

    server1 ping pop3 bbdisplay    # this is our Big Brother display
    server2 ping bsdisplay         # and this is our Big Sister display

To any Big Brother component the status collector (bbd) will look like the original one - it even uses the same log directory structure ... Since version 0.37 the BB compatible log file mechanism may be switched off (to save disk I/O). Make sure you put

    %Option +BBLog

into bb-display.cfg if you need the www/logs/*.* file structure (this option is on by default but future versions might behave differently).
NOTE: not all of the functionality is available when mixing Big Brother with Big Sister:

- Big Sister clients do not support paging (Big Sister uses bb_event_generator instead)
- Big Brother does not support dynamic grouping (the "join" and "leave" commands)
- Big Brother does not support syncing of displays (though syncing from Big Sister displays to Big Brother ones might be implemented soon)
- Big Brother does not allow clients to send multiple status reports 'at once' (using only one tcp connection)
- The Big Sister event generator will not be able to use grouping (unless you manually create the group file) when used together with the Big Brother bbd

NOTE: a few users wished to run Big Sister "bbd" only for web page creation and use Big Brother for all the rest. Since 0.24 this can be done by replacing the following section in runbb.sh (if Big Sister is not installed in /usr/local/lib/bs then change the paths as needed):

    if test "$BBDISPLAY" = TRUE
    .
    .
    .
    fi

with

    if test "$BBDISPLAY" = TRUE
    then
        /usr/local/lib/bs/bin/bbd -b /usr/local/lib/bs -c
    fi

and making /usr/local/lib/bs/www a link to your BB www directory.

How to debug Big Sister

Any of the daemons (bb_event_generator, bbd and uxmon) can be run in debug mode. Use the "-D" option to start them without going into the background and to print some debug information, e.g.:

    bbd -D 5

How to add your own "checks"

For each check it finds listed in uxmon-net, uxmon-rules.pl runs the corresponding perl code in either the adm/Config or the uxmon/Config directory. So this is the place where you want to add your checks. The Config/* code is only a frontend to the modules found in uxmon/Monitor, though. Therefore, if you want to add a completely new check you'll have to write a perl module ... It is best to have a look at some of the existing modules. Mandatory methods are "new" and "check", "check" being called every 5 minutes.
How to reduce network traffic in a large network

Each Big Sister agent connects to its server(s) every five minutes and sends the status of its clients. This is the minimum network traffic, which you can't avoid. Still, there are a few hints:

- if using network monitors (such as ping or tcp monitors), check status from an agent which is near the watched client (near in terms of network bandwidth) rather than from an agent in a distant network

- if necessary you can set up Relay Big Sister Status Collectors serving a whole lot of agents and reporting consolidated status to a central Status Collector. This will prevent the agents from connecting to the central collector individually. Instead the status of that whole part of the network can be transmitted in only one connection.

How to set up a Relay Status Collector

A status collector is able to regularly send the statuses it receives to another status collector. Therefore you can set up relay status collectors, each serving a part of the agents and regularly sending its consolidated status to a Central Status Collector, e.g. this way:

- set up a central status collector on Host1

- set up a first relay status collector on Host2. By using the "%Rsync" statement in bb-display.cfg you can tell it to synchronize its status with the central status collector, e.g.:

      %Rsync Host1 relay1_ ALL

- set up a number of agents connecting to Host2, say on Host3 through Host10. Tell them (uxmon-rules.pl) to report to Host2

- set up a second relay status collector on Host11, use e.g.:

      %Rsync Host1 relay2_ ALL

- set up a number of agents connecting to Host11, say on Host12 through Host15. Tell them (uxmon-net) to report to Host11

Now Host3 through Host10 will report to Host2, and Host12 through Host15 will report to Host11. Host2 and Host11 will send their whole status to Host1, prefixing each name with "relay1_" and "relay2_" respectively. Thus Host1 will see statuses like relay1_Host3.cpu, relay2_Host12.cpu and the like.
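The naming effect of the prefixes can be pictured with a few lines of code. This is only a toy model of the setup just described, assuming that the prefix is simply prepended to each status name; the function name and the data layout are invented for the illustration and have nothing to do with how bbd is implemented.

```python
def central_view(relays):
    """Return the status names the central collector would see.

    `relays` maps the prefix configured in a relay's %Rsync statement to the
    status names that relay has collected from its own agents (toy model).
    """
    seen = []
    for prefix, statuses in relays.items():
        for status in statuses:
            # Assumption: the relay prefix is prepended to the status name.
            seen.append(prefix + status)
    return sorted(seen)


relays = {
    "relay1_": ["Host3.cpu", "Host4.cpu"],  # collected by the relay on Host2
    "relay2_": ["Host12.cpu"],              # collected by the relay on Host11
}
print(central_view(relays))
```

Distinct prefixes keep the origin of each status visible on the central collector, which is handy when the same check name shows up behind several relays.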
If you don't want to use a prefix then use

    %Rsync Host1 none ALL

How to set up redundant Status Collectors

- set up two or more status collectors, let's say on Host1 and Host2
- tell your agents to send their reports to Host1 and Host2 (uxmon-net)
- tell your status collectors to send their reports to each other (using %Rsync in bb-display.cfg)