[boost] [interprocess] Semaphore cleanup after crash

Discussion:

Sachin Garg

2008-07-29 09:23:01 UTC

When using semaphores to synchronize separate processes, everything
works fine when each process exits nicely (closing its semaphores
before exit). But things get really messy when a process might crash.
I am unable to figure out how to recover from such a crash which
leaves semaphores in inconsistent state.

If a semaphore is not in-use (open) by any process, in this case (in
my application) I can safely 'remove' it and start afresh. Is there
some way to find out if any process is using a semaphore at a time so
that I can call 'remove'?

When I just add a 'remove' on process start, this works great on
Windows (as remove just fails if another process has the semaphore
open), but on Linux sem_unlink is used, which deletes the semaphore
even if it's in use.

What is the general practice when it comes to cleaning up semaphores
after process crashes? Maybe some way to ensure that 'post' and
'close' are always called even when application has otherwise crashed?
Is there some way to use boost's windows style semaphores on linux
instead of native posix style?

I tried looking and many have asked this question (in the context of
recovering from POSIX semaphores, which are used by Boost on Linux),
but I couldn't find any answers. Lars had asked this here also, almost
a year ago, but there were no answers in that thread either. This seems
like a basic issue, but I am totally lost on how to even approach it.

Sachin Garg

Bob Wilkinson

2008-07-29 15:55:42 UTC

Hi Sachin

ipcs -s -p will show a list of semaphores and the
associated pids of the processes which created them.

Using the pids obtained from the above, you can check the
process table to see whether each process is still alive.

e.g.
(N.B. I use the -m, rather than the -s option to ipcs for
illustration, since I have no semaphores, but do have
shared memory used).

***@spain:~$ ipcs -m -p

------ Shared Memory Creator/Last-op --------
shmid owner cpid lpid
327680 bob 7192 7238
360449 bob 7233 7284
393218 bob 7273 7187
425987 bob 7235 7187
458756 bob 7279 7187
491525 bob 7235 7187
524294 bob 7284 29961
557063 bob 7227 7187
589832 bob 7227 7187
622601 bob 7287 7187
655370 bob 7309 7187
688139 bob 7347 7187
720908 bob 16117 16124
753677 bob 16117 16124

***@spain:~$ ps ax | grep 7192
7192 tty2 Sl 0:04 /usr/bin/gnome-session
19670 pts/3 R+ 0:00 grep --colour=auto 7192
***@spain:~$

A little perl script could be written to do this.
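
The liveness check done with ps above can also be made from code. A minimal sketch, assuming a POSIX system: kill() with signal 0 delivers nothing and only reports whether the pid exists. The helper name process_alive is made up for illustration. Two caveats: pids are recycled, so a positive answer may refer to an unrelated newer process, and ipcs lists System V objects, which are a separate API from the POSIX named semaphores discussed in this thread.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns true if a process with this pid exists.
// kill() with signal 0 runs the existence and permission checks
// without delivering any signal.
bool process_alive(pid_t pid) {
    if (pid <= 0) return false;          // 0 and negative values address process groups
    if (kill(pid, 0) == 0) return true;  // process exists and we may signal it
    return errno == EPERM;               // process exists but belongs to another user
}
```

A caller would obtain the pid from wherever the application records it (for example a pid file written by the process that created the semaphore) and remove the semaphore only when process_alive returns false.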

Bob

--
To make tax forms true they should read "Income Owed Us" and "Incommode You".

Sachin Garg

2008-07-29 17:38:04 UTC

On Tue, Jul 29, 2008 at 9:25 PM, Bob Wilkinson

Post by Bob Wilkinson

ipcs -s -p will show a list of semaphores and the
associated pids of the process which created them.
Using the pids obtained from the above, you can check the
process table to check whether the process is still alive?
e.g.
(N.B. I use the -m, rather than the -s option to ipcs for
illustration, since I have no semaphores, but do have
shared memory used).

[snip]

Post by Bob Wilkinson
A little perl script could be written to do this.

I have a named semaphore, so can I do all this only for that one named
semaphore? And can this be done programmatically without relying on
external executables?

If it's possible (I hope it is), would it make sense to add such a
smart_remove to Boost.Interprocess? The idea behind using Boost was to
make my code portable and hide the platform-related intricacies. Doing
all this myself squarely defeats that purpose (but for now, I would
still love to know the solution, if any :-).

Thanks,
Sachin Garg

Ion Gaztañaga

2008-07-29 17:02:46 UTC

Post by Sachin Garg
If a semaphore is not in-use (open) by any process, in this case (in
my application) I can safely 'remove' it and start afresh. Is there
some way to find out if any process is using a semaphore at a time so
that I can call 'remove'?

Interprocess is modeled after POSIX primitives, so there is no way to
know if someone is attached. Think of the semaphore as if it were a
file. What would you do if two processes were communicating through a
file and one process crashed? I think you should have some keepalive
mechanism to detect that a process has died and recreate the IPC
mechanisms on failure.
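
One possible keepalive, sketched under the assumption of a Unix system that provides flock() (Linux, BSD, macOS): an advisory file lock is dropped by the kernel when its holder exits or crashes, so it can serve as an "is the owner alive" flag. The helper name try_become_owner and the lock-file path are hypothetical, and this is not something Interprocess provides.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical helper: tries to take an exclusive advisory lock on lock_path.
// Returns an open descriptor that holds the lock, or -1 if the file cannot
// be opened or another open file description (normally another live
// process) already holds the lock.
int try_become_owner(const char* lock_path) {
    int fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {  // lock is held: an owner is alive
        close(fd);
        return -1;
    }
    return fd;  // keep this descriptor open for the lifetime of the process
}
```

A process that obtains the lock knows that no previous owner is still running, so it can remove and recreate the named semaphore before anyone else attaches. This fits a single-owner (server-style) layout; a scheme with several equal peers would need more care.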

Post by Sachin Garg
When I just add a 'remove' on process start this works great on
windows (as remove just fails if another process has the semaphore
open), but on linux sem_unlink is used which has the behavior of
deleting it even if its in use.

This same problem happens with std::remove(const char *filename): the
Windows version fails if the file is in use, but the Unix version calls
unlink and removes the file from the filesystem without failing, while
attached processes still write to that phantom file. This is a
difference I don't know how to solve.
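
The Unix half of this difference can be observed directly. A small sketch, assuming a POSIX system; the function name is made up for illustration:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/types.h>
#include <unistd.h>

// Creates a temporary file, removes its name while the descriptor is still
// open, then writes and reads through the descriptor. Returns true if the
// removal succeeded, the name is gone, and the data is still reachable.
bool unlinked_file_stays_usable() {
    char path[] = "/tmp/unlink_demo_XXXXXX";
    int fd = mkstemp(path);                      // creates and opens the file
    if (fd < 0) return false;

    bool removed = (std::remove(path) == 0);     // succeeds although fd is open
    bool name_gone = (access(path, F_OK) != 0);  // the name no longer resolves

    const char msg[] = "still here";
    bool wrote = (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg);

    char buf[sizeof msg] = {};
    bool read_back = (pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf) &&
                     std::strcmp(buf, msg) == 0;

    close(fd);                                   // storage is released only now
    return removed && name_gone && wrote && read_back;
}
```

sem_unlink follows the same model: the name disappears at once, while processes that already have the semaphore open keep using the old object.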

Post by Sachin Garg
What is the general practice when it comes to cleaning up semaphores
after process crashes? Maybe some way to ensure that 'post' and
'close' are always called even when application has otherwise crashed?
Is there some way to use boost's windows style semaphores on linux
instead of native posix style?
I tried looking and many have asked this question (in context of
recovering from posix semaphores, which are used by boost on linux),
but I couldn't find any answers. Lars had asked this here also, almost
an year ago but no answers in that thread either. This seems like a
basic issue but am totally lost on how to even approach it.

I see no general solution. You can't register cleanup actions
for when a process crashes (well, the OS can, but user code can't). If
anyone has any idea about this, I would be glad to hear it.

Regards,

Ion

Sachin Garg

2008-07-29 18:03:12 UTC

Post by Ion Gaztañaga
Interprocess is modeled after posix primitives, so there is no way to know if
someone is attached. Think about this as if the semaphore was a file. What
would you do if you are communicating two processes with a file and one
process crashes? I think you should have some keepalive mechanism to detect
that a process has died and recreate ipc mechanisms on failure.

Yep, I understand this is the posix way of removing everything, be it
semaphores or other stuff. By keepalive do you mean having an umbrella
process to take care of recovering from such crashes? Or is it some
other standard mechanism that I am not aware of?
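
An "umbrella" process of that kind could look roughly like this. A sketch only, assuming POSIX fork() and waitpid(); run_supervised and both callbacks are hypothetical names, and the cleanup callback is where an application would remove and recreate its IPC objects:

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs worker() in a child process and waits for it to finish.
// Returns true if the child was terminated by a signal (a crash), in which
// case cleanup() has been called in the parent. Returns false for a normal
// exit and also when fork() or waitpid() fails.
bool run_supervised(int (*worker)(), void (*cleanup)()) {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) _exit(worker());  // child: report worker's result as exit code

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    if (WIFSIGNALED(status)) {      // killed by SIGSEGV, SIGABRT, SIGKILL, ...
        cleanup();                  // e.g. remove and recreate the semaphore
        return true;
    }
    return false;
}
```

The parent never touches the shared state itself, so it is unlikely to crash together with the worker, and the kernel reports the worker's death to it reliably.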

Post by Ion Gaztañaga
This same problem happens with std::remove(const char *filename) (windows
version fails if the file is in use but unix version calls unlink and
removes that file from the filesystem without failing while attached
processes still write to that phantom file) but this is a difference I don't
know how to solve.

Yep. I tried forcing the use of Interprocess' Cygwin and Windows
implementations of named_semaphore on Linux (just as an experiment), as
these are done differently. The Windows one fails to compile, and the
Cygwin implementation fails because it uses shm_unlink, which works the
same as sem_unlink, the POSIX way :-)

Post by Ion Gaztañaga
In general I see no general solution. You can't register cleanup actions
when a process crashes (well, the OS can, but not the user code). If anyone
has any idea about this, I would be glad to hear it.

Does the method discussed with Bob (in the same thread) make sense?
That is, doing programmatically what he proposes with commands.

I am not aware of system calls for this, but it seems possible (ipcs
does this 'somehow') to find which process last used a semaphore and
then check whether that process id is still alive, calling sem_unlink
only if it is not. All this can be abstracted in Boost as a
smart_remove or a safe_remove, the idea being to sem_unlink only when
no other process is using the semaphore.

If it doesn't look like something of much general value (though I
think it would be), I would at least like to do this in my code, so any
pointers to relevant system calls will be really helpful.

Thanks for all the great work done in interprocess.

Sachin Garg

Sachin Garg

2008-07-29 19:20:55 UTC

ps. I figured something can be done using semctl/semget etc., but they
need the semaphore's set id as a parameter. I haven't yet figured out
how to find that id for a POSIX named semaphore.
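
For context: semget and semctl belong to the System V API, whose semaphore sets are kernel objects separate from POSIX named semaphores, so a POSIX semaphore presumably has no set id to look up. System V semaphores do offer kernel-assisted crash recovery through the SEM_UNDO flag. A Linux-oriented sketch with made-up helper names; it would replace named_semaphore rather than repair it:

```cpp
#include <signal.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// On Linux the caller has to define the argument union for semctl().
union sem_arg { int val; struct semid_ds* buf; unsigned short* array; };

// Creates a private one-element System V semaphore set initialised to 1.
int make_binary_sem() {
    int id = semget(IPC_PRIVATE, 1, 0600);
    if (id < 0) return -1;
    sem_arg arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) { semctl(id, 0, IPC_RMID); return -1; }
    return id;
}

// Decrements the semaphore with SEM_UNDO, so the kernel reverts the
// decrement if this process terminates without posting.
bool acquire_with_undo(int id) {
    sembuf op;
    op.sem_num = 0;
    op.sem_op = -1;
    op.sem_flg = SEM_UNDO;
    return semop(id, &op, 1) == 0;
}

// Demonstration: a child acquires the semaphore and is killed while holding
// it. Returns the value the parent reads afterwards, or -1 on setup failure.
int value_after_child_crash() {
    int id = make_binary_sem();
    if (id < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { semctl(id, 0, IPC_RMID); return -1; }
    if (pid == 0) {
        if (!acquire_with_undo(id)) _exit(1);
        raise(SIGKILL);  // simulate a crash while the semaphore is held
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    int value = semctl(id, 0, GETVAL);  // read the count after the child died
    semctl(id, 0, IPC_RMID);            // remove the set again
    return value;
}
```

The undo restores the count, not the data the semaphore was protecting, so the surviving processes still have to decide whether the shared state is usable.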

Sachin Garg

Sachin Garg

2008-07-31 20:31:10 UTC

Ion,

I was wondering what your take on this is. Is it something that
can/should be added to Boost, or would you prefer that I just hack it
into my own code?

Sachin Garg

Ion Gaztañaga

2008-08-01 14:08:48 UTC

Post by Sachin Garg
Ion,
I was wondering whats your take on this. Is it something that
can/should be added to boost or would you prefer that I just hack it
in my code only?
Sachin Garg

My opinion is that there is no solution without kernel help. The
original Interprocess library (Shmem) emulated Windows behaviour on Unix,
and it was a nightmare to get consistent behaviour. This was changed in
Interprocess. The relationship between System V and POSIX resources is
quite obscure and I don't see a proper way to solve this.

The same problem exists with files, and I hadn't seen any clue on how to
make Unix and Windows behavior identical until today:

http://mg.to/2004/09/30/file_share_delete-in-shell-extension

According to this, adding FILE_SHARE_DELETE to the shared memory
emulation functions would allow UNIX-like behavior for Windows files. I
haven't had time to test this.

This would not solve your problem, because you want Windows behavior
(failure when the resource is in use) in UNIX.

Regards,

Ion

Ion Gaztañaga

2008-08-02 08:18:19 UTC

Post by Ion Gaztañaga
The same problem with files and I haven't seen any clue to make Unix and
Windows behavior identical until today:
http://mg.to/2004/09/30/file_share_delete-in-shell-extension
According to this, adding FILE_SHARE_DELETE to the shared memory
emulation functions would allow, UNIX-like behavior for Windows files. I
haven't had time to test this.

I've just checked this, but it does not behave like Unix. If you specify
FILE_SHARE_DELETE, DeleteFile returns success even when the file is in
use (but the file is still there in Explorer), and opening the file
after deletion fails. However, if you try to create another file
with the same name, this also fails. So you can't just call "remove" and
recreate the file with the same name. That's a pity.

Regards,

Ion

Sachin Garg

2008-08-02 08:40:46 UTC

If you are going to make the behavior identical, I would much prefer the
Windows way over the POSIX way. It's not a preference for either
platform, just that the Windows way seems to make more sense. Or maybe
offer two 'removes': one which works as it does now and the other the
smart_remove, in case someone out there does prefer the POSIX way.

Of course, I don't have any answer as to how to get either done.

I have been trying to hack Bob's solution into my code, as it can work
at least for me. But it's painful, as the internal implementation of
named_semaphore is different on Windows/Linux/Mac, so Linux and Mac will
each need separate hacks, and all of this will need to be carefully
re-examined every time Boost is updated, as internal implementations
may change in future. Sometimes I just wish things were easier :-)

Sachin Garg

p***@agilent.com

2008-07-29 21:00:14 UTC

Microsoft has a lot of faults, but when comparing the Windows API to the Linux API I take Windows every day.

I remember having to clean up shared memory segments every now and then under some UNIX OS -- I guess it was Solaris.

I also remember that mmap()/munmap() (on Solaris) did not behave in the manner I would prefer as a C++ programmer, since multiple calls to mmap() could be undone by a single call to munmap(). I hope that this problem was considered when writing the memory-mapped I/O features of Boost. I guess you would need some static container of all pointers returned by mmap() and a reference count indicating how often each pointer was returned by mmap():

static std::map<void*, size_t> s_sMmap2RefCount;

I also remember the amount of code I wrote to hack around the UNIX feature of killing processes which write to a dead pipe.

I also remember the amount of code I wrote to get the errno from execvp() into the process which called fork().

I also remember having to write code in C instead of C++, since the code was supposed to be linked into a shared library which was intended to be dlopen-ed by some third-party executable, which in turn may or may not have been loading the correct C++ library -- consider that UNIX knows only a single namespace for all symbols in a process.

I also remember that I could not write C++ code in a post-C++-Exception-Handling style, since the matching compiler did not implement C++ exception handling correctly for a couple of years after this feature was already working on Windows and OS/2. The UNIX compiler called destructors for memory locations for which no constructor had been called and, vice versa, forgot to call destructors for initialized temporary objects.

I hate the UNIX API because I'm a C++ programmer.

Sachin Garg

2008-07-29 23:37:47 UTC

Well, that at least confirms I am not the only one banging my head on
the wall over this :-)

I don't think the reference count solution can work here, as process
crashes can leave the reference count invalid. Is there any other
solution you might have implemented?

Sachin Garg
