...Help!!! - Memory leak (or more like flood!)

Hello everybody,

Not been posting much lately, that work thing :frowning:
Background:
I was running 4.6 for absolutely ages, and it was fine - ran for weeks on end.
I am running Debian, and as such I do an apt-get dist-upgrade every now and then. In the last batch, I got soooo sc***ed.
Now WDisplay will eat up all my memory in less than 12 hours 8O I will admit that I was casual about upgrading, as it has never given me any real problem before now (and yes, I was running unstable as well).
I did try 5.5, but that was just as bad.
I am now moving to the testing distro… (pinning it)

I suspect that this is related to some of the libs used by WD. Can anyone help me out with a list of the lib versions that they run WD with on a stable setup?

Or is there something else that I have missed?

rgds

Johan

there have been lots of changes since 4.6
mainly with the graphing…
and the custom tags too

how do you or can you monitor the memory use of weatherd in linux?
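
On Linux, one way to watch this without any extra tools is to read /proc/<pid>/status, which reports the resident memory (VmRSS) and, on kernels that provide it, the thread count. A minimal sketch in Python, assuming you look up the PID of the WD process yourself (for example from top); the function names are made up for illustration:

```python
import re
import time


def parse_status(status_text):
    """Pull VmRSS (in kB) and the thread count out of /proc/<pid>/status text."""
    rss = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    threads = re.search(r"^Threads:\s+(\d+)", status_text, re.MULTILINE)
    return (int(rss.group(1)) if rss else None,
            int(threads.group(1)) if threads else None)


def watch(pid, interval_s=60, samples=None):
    """Print one line per sample; samples=None keeps going until the process exits."""
    taken = 0
    while samples is None or taken < samples:
        try:
            with open(f"/proc/{pid}/status") as f:
                rss_kb, threads = parse_status(f.read())
        except FileNotFoundError:
            print("process has gone away")
            return
        print(f"{time.strftime('%H:%M:%S')} rss={rss_kb} kB threads={threads}", flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval_s)
```

Calling watch(1234) with the real PID in place of 1234 prints one line a minute; a steadily climbing rss figure is the leak showing itself.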

anyone else noticed this?

From what I can gather from the message, there is something wrong in the config or in Linux, because back when I was running 4.6 the only thing crashing was the FTP sending, not the program. My 5.6 has been running for 50 hrs now. I am running Mandrake 10.1 and I am going to upgrade today to a newer version.
Coyote :smiley:

Yup, there is something wrong with my Linux… Not sure what tho, and the only thing that is going bonkers is WD; everything else is fine.
It came in with a recent upgrade that I went through, hence the question of what WD uses. If I could figure that out, then maybe I could sort the rest out… ahh well, feels like a rebuild to me!
:angryfire: as it falls over in about 9 hrs…

All I needed!!

rgds

Johan

are there any errors under view, program error log though jabul?
also what happens when you go back to 4.6?
(do you have that?)

It was such a mess, there was no way out… I did the deed. Luckily I had thought about it when I built it in the first place, so, in all, it was not too bad…
One observation I made, though, was that when WD runs OK, it has 3 processes in top; when it ran bad it had only the one… (that was eating all my memory). I also noticed that on the same build, if I went back to my 2.4.x kernel, it was also OK, but on the 2.6.x it was broken. Not sure how to sum this one up now, but if anything, I am glad it is all working again.

rgds

Johan

…nope

It is back again, and I don’t think it is going to go away again… When I did an upgrade, it broke again. It now only lists one task in ‘top’ and it is leaking memory by the second - I have created a script to restart the program every two hours (although not sure that is often enough! - yes it is that bad!!)
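
A restart workaround along those lines could look roughly like the sketch below. It is not the script mentioned above, just a minimal Python illustration: it starts a command, stops it after a fixed time (SIGTERM first, SIGKILL only if that is ignored) and starts it again. The function name and parameters are hypothetical, and whether WD copes well with being stopped this way is an assumption to test.

```python
import subprocess
import time


def run_with_periodic_restart(command, restart_every_s, cycles=None,
                              grace_s=10.0, pause_s=1.0):
    """Start command, stop it after restart_every_s seconds, then start it again.

    cycles=None loops forever; a number limits how many runs are started.
    Returns the number of runs started.
    """
    started = 0
    while cycles is None or started < cycles:
        proc = subprocess.Popen(command)
        started += 1
        try:
            # Returns early if the program exits or crashes on its own.
            proc.wait(timeout=restart_every_s)
        except subprocess.TimeoutExpired:
            proc.terminate()              # polite SIGTERM first
            try:
                proc.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                proc.kill()               # SIGKILL as a last resort
                proc.wait()
        time.sleep(pause_s)               # avoid a tight loop if it dies at once
    return started
```

For a two-hour cycle the call would be something like run_with_periodic_restart(["/path/to/wd"], 2 * 60 * 60), where the path is a placeholder for wherever the WD binary lives.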

I suspect that there have been changes to some of the underlying packages that WD relies upon… I say this because this is the 3rd time it has happened, and it is when I try to go up to or close to the 2.6.12 kernel (and some of the other programs need that, etc. etc.). By the looks of it, only WD has issues with it.

Windy, is this something that can be resolved, if so, what info would you need from me? (or anyone else that can help)

Still running a Debian distro with kernel 2.6.8 (pretty much an early sarge:testing)

I am running Debian Sarge, but 2.4.x kernel, and I can’t say I am having those problems (which you already suspected)

maybe if you turned things off / disabled things until it stopped, to find out what is causing it
otherwise yes it sounds like some problem with some other drivers, etc, beyond my control

Well, there is not that much to turn off really. The server runs DNS, DHCP, Samba, Netatalk, and a bit of NFS (kernel server). I’ll give you that it is not dedicated to WD - but this does not stretch the unit (0.06 is the avg load). I think the key to the problem is that with my current config (that is broken) WD shows up only once in top; when it was working OK, it had spawned 3 threads, so I could see 3 listings in top - anyone got a clue why this would be? Threads being spawned for reuse but dying on each instance would explain where the memory is going, though they obviously live long enough for the program not to crash. Just a thought.
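
One possible explanation for the 3-versus-1 listings, offered as an assumption rather than anything confirmed in this thread: with the older LinuxThreads library typical of 2.4 systems, every thread has its own PID and so gets its own line in top, whereas with NPTL on 2.6 all threads share one PID and top shows a single line unless thread display is toggled with H. If that is the case, the single entry would not by itself mean the threads are dying. A minimal sketch to check how many threads a process really has, by listing /proc/<pid>/task (available on 2.6 kernels); the helper names are hypothetical:

```python
import os


def list_thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs found under <proc_root>/<pid>/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())


def thread_count(pid, proc_root="/proc"):
    """Number of threads the kernel currently knows about for this process."""
    return len(list_thread_ids(pid, proc_root))
```

If thread_count() on the WD process still reports 3 while top shows one line, the display has changed and not the program, and the leak has to be coming from somewhere else.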

Vergil, could you send me some more info on your build?

(BTW in 50min WD chews up 45% of all memory - that is of 512MB RAM - not sure if top counts the swap space or not but there is another 1GB of that)
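
For scale, those figures can be turned into a rough leak rate. As far as I know, the %MEM column in top is the resident share of physical RAM, so swap is not counted in it. A back-of-the-envelope sketch, assuming the 45% is top's %MEM and that the growth is roughly linear (both are assumptions):

```python
def leak_rate_mb_per_min(ram_mb, fraction_used, minutes):
    """Average growth in MB per minute, assuming usage started near zero."""
    return ram_mb * fraction_used / minutes


def minutes_to_fill(capacity_mb, rate_mb_per_min):
    """Minutes until capacity_mb is used up at a constant rate."""
    return capacity_mb / rate_mb_per_min


rate = leak_rate_mb_per_min(512, 0.45, 50)          # figures from the post above
ram_only = minutes_to_fill(512, rate)               # physical RAM only
ram_plus_swap = minutes_to_fill(512 + 1024, rate)   # RAM plus the 1GB of swap

print(f"leak rate: {rate:.1f} MB/min")
print(f"RAM full after about {ram_only:.0f} min")
print(f"RAM and swap full after about {ram_plus_swap / 60:.1f} h")
```

That works out to roughly 4.6 MB per minute, so the RAM alone would be gone in under two hours and RAM plus swap in around five and a half, if the rate stayed constant.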

rgds

Johan

what i mean is turn off things that wd does

Huh, clearly a case of not seeing the trees for the forest… (clearly not quite woken up yet - teaches me for posting before breakfast!)

I will do that, it currently generates WEB and the files for the WDL. This eve, I will turn the whole lot off, and just let it sit there and record, we will see what effect that has.

My observation is that the leak starts straight away (Web files are generated every 15 mins), and the only functions that run that frequently are the logging of current data to the files and the txt files for WDL…

I’ll keep posting any updates.

rgds

Johan

ok, now everything except the program itself is turned off, and it is still leaking like a sieve…

I have built the current unit on the Online distribution (testing), now downloading the DVD of the stable 3.1r0a - with a bit of luck that might be better…
coming down overnight so I will try this tomorrow… Still, this has me stumped!

Vergil, what did you build your server on?

rgds

Johan

ok, I can confirm that something may be happening here. I just checked, and WD was dead. It had died over 36 hours ago (after like 24 hours of being up). I hadn’t checked it for a couple of days (since the actual station is a foot away from me). I started it back up to see how long it lasts.

Johan, I started with sarge (when it was testing) back in Feb. But when sarge went stable, I changed the server to stable. It should be running 3.1r0a and kernel 2.4.27

Vergil,

Do you keep your box up-to-date with apt-get, or are you happy that it is stable with what you've got?
(I recall I did not like the 2.4 kernel because of its lacking ACPI support)
I will now move this box to the stable distro. It really bugs me, this: it was running fine on Sarge Unstable since last year (pre release), and then bang…

The question is whether the thread management has changed in any parts that WD relies upon, or whether there is some other change that we have not spotted. Moving back to ‘older’ code may only be buying some time; there might be a change in the air.

Windy,
I’ll see if I get time to try one of the memory management/debugging tools to see if I can find the offending bit/call/process - not sure how possible that is, as I am not sure how ‘clean’ the Kylix binary code is. As an aside, I also noticed that WD does not respond to a HUP (hang up) signal, but then why would it - it is a graphical front end program. It is just that I feel a little worried about terminating the program every hour, so I was trying to see if I could ask the program to quit by itself. But alas no… This would be needed for proper remote control of the process.
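
For stopping the program from a script, the usual pattern is to send SIGTERM first and only fall back to SIGKILL if the process is still there after a grace period. Whether WD shuts down cleanly on SIGTERM is not something this thread establishes, so treat the sketch below as an assumption to test; the function names are made up:

```python
import os
import signal
import time


def is_alive(pid):
    """True if a process with this PID exists (signal 0 only checks, sends nothing).

    Note: a zombie child that its parent has not yet reaped still counts as alive.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def stop_process(pid, grace_seconds=10.0, poll_seconds=0.2):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns the name of the last signal that was sent.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "SIGTERM"
        time.sleep(poll_seconds)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "SIGTERM"  # it exited just before the fallback
    return "SIGKILL"
```

A return value of "SIGKILL" means the polite request was ignored, which would be worth knowing before relying on hourly restarts.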

I will keep battling this…

More posts tonight (BST)

Regards

Johan

The plot thickens…


More strace.txt (70.4 KB)

I will check that out when I get home tonight (currently away from the computer and I don’t have external ssh access permitted).

Now, also ran this through ‘Valgrind’ and the output is in the attached file.

the command used was:
valgrind

what is a futex?
something to do with the com port?
or memory?
what happens if you set wd to stationless mode, and then restart it (so it does not access the com port) as a test?

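Neither, as far as I can tell - it is a "fast userspace mutex", a locking mechanism that the threading library uses under the hood, rather than anything to do with the com port…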

This is what the Man section says:

NAME