StartWatch Hang Protection

Hi,

I wonder if someone can tell me how StartWatch hang protection works, please?

I have a Windows console application (essentially a DOS program) that does not take any user input: it is configured through a configuration file and then runs unattended. In normal operation, it does some com port activity and then goes back to sleep for a period (~1 minute) using the Windows Sleep() function. Occasionally, the com routine can hang, and it may or may not recover.

The question is: can StartWatch hang protection detect scenarios like this, or would it get confused by the normal “sleep” mode?

I know the obvious thing is to test it, but stopping and starting the program is a bit awkward, so if anyone (Steve, SoftWx?) knows the answer, I’d appreciate it.

regards
Dave

Let me do a quick test to make sure what I tell you is correct, and I’ll give you an answer shortly.

Steve
SoftWx

StartWatch uses two methods to see if a program is alive. One is to see whether the application responds to a do-nothing Windows message. For Windows applications, this is normally a pretty good indicator. However, a program sometimes becomes unresponsive while it spends time doing work and does not bother to service its Windows message queue during that processing. So StartWatch also monitors the CPU usage of the application. If the app doesn’t respond to Windows messages AND has no CPU usage, StartWatch will fairly quickly flag the program as hung. If it doesn’t respond to Windows messages but there is CPU activity, StartWatch will allow more time to elapse in that state before it declares it hung.
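A minimal sketch of the two checks as plain C, assuming the monitor has already measured whether the window answered and how much CPU time the app used since the previous check (on Windows, calls such as SendMessageTimeout and GetProcessTimes could supply these; they are not shown). The names and grace periods are made-up example values, not StartWatch's real code or settings.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative grace periods only; NOT StartWatch's real values. */
#define IDLE_GRACE_SECONDS  30u  /* no reply, no CPU: flag fairly quickly */
#define BUSY_GRACE_SECONDS 300u  /* no reply, but busy: allow more time   */

typedef enum {
    APP_ALIVE,         /* answered the do-nothing window message */
    APP_SUSPECT_IDLE,  /* no answer and no CPU use               */
    APP_SUSPECT_BUSY   /* no answer, but still using CPU         */
} AppState;

/* Classify one check. cpu_ms is the CPU time (milliseconds) the app
   used since the previous check; the unit is an assumption. */
AppState classify_check(bool answered_message, unsigned long cpu_ms)
{
    if (answered_message)
        return APP_ALIVE;
    return cpu_ms > 0 ? APP_SUSPECT_BUSY : APP_SUSPECT_IDLE;
}

/* True once the app has stayed in a suspect state long enough. */
bool declare_hung(AppState state, unsigned seconds_in_state)
{
    switch (state) {
    case APP_SUSPECT_IDLE: return seconds_in_state >= IDLE_GRACE_SECONDS;
    case APP_SUSPECT_BUSY: return seconds_in_state >= BUSY_GRACE_SECONDS;
    default:               return false;  /* alive apps are never hung */
    }
}
```

The same 30 seconds of silence is enough to declare an idle app hung but not a busy one, which is the asymmetry described above.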

For console applications, things don’t work quite as well. StartWatch starts console applications within a command window, and that DOS command window will generally always be responsive, independent of the state of the console application it is wrapping. So it’s unlikely StartWatch will ever flag it as hung. This is actually a bug, which I discovered while running some tests before answering. StartWatch has code to handle windowless console apps by using only CPU usage to determine whether the app is hung. But since StartWatch always runs console programs with a window, there never are windowless console apps. I need to address this in two ways. The first is to give the option to run a program without a window if the user wants that. The second is to also use CPU usage alone as one of the triggers for “hungness”. For a Windows application, that won’t change anything (which is good), because if it responds to a Windows message then it used some CPU responding. But for console apps wrapped by command windows, that change will cover the case where the window is responsive but the underlying application has no CPU activity (the window messages are handled by the command window, not by the application).
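The proposed change can be written as a small predicate. This is an illustrative sketch with hypothetical names, not StartWatch's code: the first function is the rule as it stands, the second adds zero CPU use as a trigger on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Rule as it stands: only an unanswered window message makes a program
   look hung. A console app wrapped in a command window always
   "answers", so this never fires for it. */
bool looks_hung_current(bool window_answered, unsigned long app_cpu_ms)
{
    (void)app_cpu_ms;  /* CPU use only affects how long to wait */
    return !window_answered;
}

/* Proposed rule: zero CPU use by the application is also a trigger.
   A GUI app that answers the message used some CPU doing so, so its
   result is unchanged. */
bool looks_hung_proposed(bool window_answered, unsigned long app_cpu_ms)
{
    return !window_answered || app_cpu_ms == 0;
}
```

A console app blocked in Sleep() also uses no CPU, so under the proposed rule it looks hung for the length of each sleep; the grace period before it is declared hung is what keeps that from causing a restart.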

To answer your question: yes, a sleeping program will be looked on as a hanging program. However, when a program first looks like it might be hung, StartWatch flags it with an intermediate state. The program has to remain in that state for some period of time before it is treated as really hung. That amount of time can be controlled roughly with a slider control for hang “sensitivity”. Low sensitivity means a program would have to look hung for a longer time before being declared hung; high sensitivity means it would not have to look hung very long. You can precisely control how long the program must look hung by editing the ini file. Eventually I’ll have an advanced options screen for this sort of fine control. So if you know the sleep period of the program, you can set the hang trigger to be just a little beyond that.
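A sketch of the intermediate state as a timer that resets on any sign of life. The struct and field names are hypothetical; the threshold stands in for the ini-file setting, whose real name is not given in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the "looks hung" intermediate state. */
typedef struct {
    unsigned hang_after_seconds; /* set a little beyond the app's sleep */
    unsigned suspect_seconds;    /* time it has looked hung so far      */
} HangTracker;

/* Record one check covering elapsed_seconds. Returns true once the
   program has looked hung continuously for hang_after_seconds. */
bool tracker_update(HangTracker *t, bool looks_hung, unsigned elapsed_seconds)
{
    assert(t != NULL);
    if (!looks_hung) {
        t->suspect_seconds = 0;  /* any sign of life resets the clock */
        return false;
    }
    t->suspect_seconds += elapsed_seconds;
    return t->suspect_seconds >= t->hang_after_seconds;
}
```

For example, with a 60-second Sleep() and a check every 5 seconds, a 75-second threshold lets every normal sleep pass, while a com routine that stays stuck with no CPU activity is declared hung after 75 seconds.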

For now though, until I fix the bug I mentioned, StartWatch very likely won’t detect your console app is hung.

If you have any more questions about this, or some input on how you’d like it to behave for your purposes, please let me know.

Steve
SoftWx

Steve,

thanks a lot for the fast and comprehensive reply!

You’ve really clarified the situation, so thanks for that. It looks like I’ll need to wait for an upgraded version to really cover the cases I need. In the meantime, can you point me to the ini file you mentioned, please? There does not seem to be one in the StartWatch directory or in my application directory.

In terms of the “wish list”, I’m not sure what you had in mind, but I’d like to be able to configure the time (in seconds) that the application is allowed to “sleep” before being flagged as hung, sort of a calibrated slide bar. So, for example, StartWatch could check the app every 5 minutes, say, but allow the app to be “unresponsive” for a different period.
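One way to think about the wish-list setting is sketched below. It assumes (hypothetically) that the monitor can only notice a hang at its check times, so the allowed unresponsive period gets rounded up to whole check intervals. The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-program settings for the wish-list feature. */
typedef struct {
    unsigned check_interval_s;    /* how often the monitor looks, e.g. 300 */
    unsigned max_unresponsive_s;  /* how long the app may "sleep", e.g. 90 */
} WatchSettings;

/* Consecutive "looks hung" checks needed before the app has been
   unresponsive for at least max_unresponsive_s, rounded up to whole
   check intervals. */
unsigned checks_before_hung(const WatchSettings *w)
{
    assert(w != NULL && w->check_interval_s > 0);
    unsigned n = (w->max_unresponsive_s + w->check_interval_s - 1)
                 / w->check_interval_s;
    return n > 0 ? n : 1;  /* always need at least one bad check */
}
```

With checks every 300 seconds, a 90-second allowance becomes one check, so the effective limit is really 300 seconds; with checks every 20 seconds it becomes 5 checks (100 seconds). The shorter the check interval, the closer the real limit tracks the configured one.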

regards
Dave

The location of the ini file depends on whether you’re running Vista or XP. In Vista it’s under Users\{username}\AppData\Local\SoftWx\StartWatch. In XP it’s in Documents and Settings\{username}\Local Settings\Application Data\SoftWx\StartWatch.
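Both locations are the same SoftWx\StartWatch subfolder under the user's local application-data folder, so a program could build the path as sketched here. The function name is hypothetical, and the caller is assumed to supply the base folder; on Windows it could come from a call such as SHGetFolderPath with CSIDL_LOCAL_APPDATA, which is not shown. The sketch uses C99 snprintf; older compilers such as Visual Studio 2005 offer _snprintf_s instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the folder holding StartWatch's ini file from the user's local
   application-data folder. Returns 0 on success, or -1 if out is too
   small (the output is then truncated). */
int startwatch_ini_dir(const char *local_appdata, char *out, size_t out_size)
{
    assert(local_appdata != NULL && out != NULL && out_size > 0);
    int n = snprintf(out, out_size, "%s\\SoftWx\\StartWatch", local_appdata);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```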

Steve

Ah !

Thanks Steve - found it !

btw, isn’t it the middle of the night where you are at the moment?

regards
Dave

Do you have any control over this app? Maybe you could add better error handling?

Yup, I do have control of it, since I wrote it!

However, it talks to a 1-Wire network using “C” source from Maxim (Dallas Semiconductor). The 1-Wire network error handling is part of the Maxim source, and I’ve not quite got to the bottom of how the 1-Wire errors are handled yet. I am working on it, though!

The errors are few and pretty far between; StartWatch seemed like a good fallback in case the program stalls while I’m away from home.

regards
Dave

Ah, when you wrote “com port” I was thinking it was something simpler. It’s a DOS program?

Yes, it was the middle of the night :)

There’s one caveat you should be aware of. There is a “feature” of Windows where certain parts of the calls that I/O drivers make into the Windows kernel run at an elevated level. When these calls hang, usually because of bugs or improperly written timeout handling in the driver, Windows will not allow the process to be terminated until that call completes (which it isn’t doing, because that’s where it’s hung!). WeatherLink has been known to do this. I worked with a user who had this problem and wrote a special version of StartWatch for him that would try every technique I could find for terminating a process, even some very esoteric techniques from the black hat hacker community. None of these techniques would terminate the WeatherLink process when it hung during I/O. Microsoft is aware of this issue and was supposed to have fixed it for Vista, but I haven’t actually tested whether that’s the case. This Windows issue is why I added the option to have StartWatch reboot the computer if a hung process can’t be terminated.
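The escalation described above can be sketched as a pure decision function. The action names, the exit timeout, and the restart step are assumptions for illustration, not StartWatch's actual logic. On Windows the inputs would plausibly come from TerminateProcess followed by WaitForSingleObject on the process handle with a timeout, because TerminateProcess returns before the process has actually exited and a process stuck in a kernel-mode I/O call may never exit; those calls are not shown.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    ACTION_TERMINATE,  /* ask Windows to end the hung process          */
    ACTION_WAIT,       /* termination requested, process not gone yet  */
    ACTION_RESTART,    /* process is gone: start the program again     */
    ACTION_REBOOT,     /* it never went away and rebooting is allowed  */
    ACTION_GIVE_UP     /* it never went away and rebooting is disabled */
} RecoveryAction;

/* Decide the next step for a program that has already been declared hung. */
RecoveryAction next_action(bool terminate_sent, bool process_gone,
                           unsigned secs_since_terminate,
                           unsigned exit_timeout_s, bool reboot_enabled)
{
    assert(exit_timeout_s > 0);
    if (!terminate_sent)
        return ACTION_TERMINATE;
    if (process_gone)
        return ACTION_RESTART;
    if (secs_since_terminate < exit_timeout_s)
        return ACTION_WAIT;  /* a kernel-mode I/O call may still complete */
    return reboot_enabled ? ACTION_REBOOT : ACTION_GIVE_UP;
}
```

Keeping the reboot behind an explicit option matches the behavior described above: rebooting is the last resort, used only when a terminated process refuses to go away.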

When your program hangs, if you are able to terminate it using Task Manager, then StartWatch will be able to terminate it too. But if Task Manager can’t terminate it, StartWatch may not be able to either. It’s hard to say for sure, though, because StartWatch tries some extra things that can terminate processes that Task Manager can’t or won’t. But in the rare case I’ve described above, nothing will be able to terminate the process.

Since you’re a programmer, if you’re curious to know more about this Windows feature, I may be able to find the link to the Microsoft documentation acknowledging it.

Steve
SoftWx

Hi Steve,

glad to see that you’re forgoing sleep to deliver a quality service to your users :)

The program is a “Win32 Console Application”, written with Visual Studio 2005, but I guess that makes it pretty much a DOS application. I have not got around to learning the nitty-gritty of “proper” Windows programming; I’m just about able to do what I need to do with my console/DOS apps.

When the program hangs, I can Ctrl-C out of it, I think! The reason I’m not 100% sure is that I’ve not actually been around when the program hangs in the com routine. Up to now it has eventually recovered, even though that might take a number of hours.

Yes, I would be curious to know about the Windows “feature” you talked about, but don’t spend a lot of time trying to find it: if you have it handy, let me know, but don’t worry about it if you can’t put your hands on it quickly.

regards
Dave

http://support.microsoft.com/kb/270117

If you google

windows terminateprocess “kernel mode”

you’ll find some discussions of the topic

Thanks Steve !

regards
Dave