Thread race condition on Windows port?

I’m having a lot of trouble getting my application to run on Windows. The threads go haywire or deadlock, and the result differs depending on whether or not I look at them in Sysinternals Process Explorer. Very bizarre.

Anyway, while debugging I did find what I think is a potential bug. In prvProcessSimulatedInterrupts(), there is a piece of code to switch threads:

~~~~~
/* Suspend the old thread. */
pxThreadState = ( xThreadState *) *( ( size_t * ) pvOldCurrentTCB );
SuspendThread( pxThreadState->pvThread );

/* Obtain the state of the task now selected to enter the Running state. */
pxThreadState = ( xThreadState * ) ( *( size_t *) pxCurrentTCB );
ResumeThread( pxThreadState->pvThread );
~~~~~

I think that between these two calls there needs to be a call to GetThreadContext, as described at the URL given in the comment:

~~~~~
CONTEXT context;
BOOL success;

/* Suspend the old thread. */
pxThreadState = ( xThreadState *) *( ( size_t * ) pvOldCurrentTCB );
SuspendThread( pxThreadState->pvThread );

// Ensure the thread is really suspended by calling an operation on it
// that only returns once it is suspended.
// See http://blogs.msdn.com/b/oldnewthing/archive/2015/02/05/10591215.aspx
context.ContextFlags = CONTEXT_INTEGER; // Just ask some dummy register data
success = GetThreadContext( pxThreadState->pvThread, &context );
configASSERT( success );

/* Obtain the state of the task now selected to enter the Running state. */
pxThreadState = ( xThreadState * ) ( *( size_t *) pxCurrentTCB );
ResumeThread( pxThreadState->pvThread );
~~~~~

Thread race condition on Windows port?

Interesting – thanks for this nugget of information and link. We have used the Windows port to run very comprehensive tests over many days without any issues (see caveat below), but there are so many possible combinations of operating system, CPU models, caches, number of cores, etc. that it is not surprising that issues could arise that we have not seen.

The caveat mentioned above applies when Windows IO system calls are used. For example, if you call printf() from a FreeRTOS task (when using the FreeRTOS Windows port), or attempt to use the Windows TCP/IP stack, etc., then eventually under heavy load it will fall flat on its face. It’s possible your post is highlighting the reason for this.

You may find the standard FreeRTOS Windows demo is calling printf() directly, and under low load it gets away with it. The FreeRTOS+TCP and FreeRTOS+FAT demos, however, generate a much higher volume of logging information, and cannot output this directly without causing trouble. In the FreeRTOS+TCP demo there are options to log to a UDP port using the FreeRTOS+UDP stack itself, to a disk file, or to stdout. The first is no problem, as it only uses FreeRTOS calls, but the disk file and stdout will cause an issue if these outputs are accessed directly from a FreeRTOS task.

To get around this we have the FreeRTOS tasks send the log messages to a thread-safe circular buffer, and have a standard Windows thread (one that is not under the control of the FreeRTOS scheduler) read the log messages from the buffer and perform the actual output (printf() or write to disk). That way the log messages get output without the FreeRTOS tasks generating any Windows IO system calls themselves.

Regards.
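As a rough illustration of the circular-buffer approach described above, here is a minimal sketch of a single-producer, single-consumer ring of log messages. It is not the code from the FreeRTOS demos: the names `LogRing`, `log_ring_push()` and `log_ring_pop()`, and the sizes `LOG_SLOTS` and `LOG_MSG_MAX`, are hypothetical, and the sketch assumes a compiler with C11 `<stdatomic.h>`. The point is that neither side makes a Windows or FreeRTOS blocking call to use the buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define LOG_SLOTS   8u   /* hypothetical capacity; must be a power of two */
#define LOG_MSG_MAX 64u  /* hypothetical maximum message size, including the NUL */

typedef struct {
    char slots[LOG_SLOTS][LOG_MSG_MAX];
    atomic_uint head;  /* count of messages written; advanced only by the producer */
    atomic_uint tail;  /* count of messages read; advanced only by the consumer */
} LogRing;

static void log_ring_init(LogRing *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side (the FreeRTOS task). Never blocks: returns 0 and drops
 * the message when the ring is full, 1 when the message was stored. */
static int log_ring_push(LogRing *r, const char *msg)
{
    assert(msg != NULL);
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == LOG_SLOTS) {
        return 0; /* full */
    }
    char *slot = r->slots[head % LOG_SLOTS];
    strncpy(slot, msg, LOG_MSG_MAX - 1u);
    slot[LOG_MSG_MAX - 1u] = '\0';
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return 1;
}

/* Consumer side (the plain Windows thread). Returns 0 when the ring is
 * empty, 1 when a message was copied into out. */
static int log_ring_pop(LogRing *r, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0u);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return 0; /* empty */
    }
    strncpy(out, r->slots[tail % LOG_SLOTS], out_size - 1u);
    out[out_size - 1u] = '\0';
    /* Release the slot back to the producer only after copying it out. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return 1;
}
```

In this sketch the Windows thread would call `log_ring_pop()` in a loop and pass each message to printf() or a file write, sleeping briefly when the ring is empty. With more than one FreeRTOS task logging, the push would presumably need to be wrapped in taskENTER_CRITICAL() / taskEXIT_CRITICAL() so that a task switch cannot interleave two producers.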

Thread race condition on Windows port?

Thank you for your feedback. Actually there are two issues in my original post, so I’ll try to keep them apart for clarity.
  1. The crashes. So far I’m not even doing any printfs from the cross-compiled embedded FreeRTOS program. I’m just trying to run an existing embedded application that writes to an LCD. So far on Windows I have only modelled the LCD as a data buffer without any observer. Still, the thread mechanism is doing bizarre stuff, and I wonder if it’s an interaction with the MSVC debugger. Anyway, I’ll have to sort this out by myself, and it is not the main reason I posted. For info: I’m using a Win7 64-bit quad-core machine. Update: now that I think of it, there is a file system layer somewhere that might be reading a data file, so thanks for the tip, I’ll see if this might be the culprit!
  2. The proposed patch to the source code. During my debugging of 1. I think I did find a real issue with the Windows port.c code that you might want to backport into the source tree. See the URL for a description of why SuspendThread alone might not be enough. Unless some other scheduler internal condition with which I’m not familiar renders it safe to resume the next thread while SuspendThread might still leave the previous thread briefly running.

Thread race condition on Windows port?

We are using very similar machines then. Let me know if the file system layer turns out to be the cause of your problem or not. Regards.

Thread race condition on Windows port?

Hi Hendrik,

It is not only the use of system I/O functions (files, streams, TCP/IP) that may lead to instability: any Windows system call that may block will cause problems.

> Unless some other scheduler internal condition with which I’m not familiar renders it safe to resume the next thread while SuspendThread might still leave the previous thread briefly running.

I haven’t tried out your patch yet; it might help. But might it also disturb the timing of the scheduler in case GetThreadContext() blocks for a long time?

In ./demo_logging.c you will see that regular stdio functions are used. Also, networkInterface.c is doing system calls. That is possible because the system calls originate from normal Windows threads:
DWORD WINAPI prvWin32LoggingThread( void *pvParam )
DWORD WINAPI prvWinPcapRecvThread( void *pvParam )
DWORD WINAPI prvWinPcapSendThread( void *pvParam )
The communication/synchronisation between these Windows threads and FreeRTOS tasks is done with care. A FreeRTOS task may call SetEvent( pvSendEvent ), but it may not call WaitForSingleObject( pvSendEvent, time ), because the latter may block. Likewise, a real Windows thread shall not make use of FreeRTOS queues or FreeRTOS blocking calls. Regards.

Thread race condition on Windows port?

Is there any news on this problem? I seem to have stumbled upon a similar problem, and the patch above does not seem to work. By now I have added a Windows Critical Section around the SuspendThread() and ResumeThread() statements mentioned above. The Critical Section is also entered for any memory allocation / release as well as any Windows TCP call performed by FreeRTOS threads. I have also disabled all printf() and std::cerr/cout in FreeRTOS threads. However, the application still stops responding pretty soon after I start to do I/O… What else might cause problems? What about malloc() and C++ ‘new / delete’, is it actually necessary to protect those? Has actually ANYBODY ever successfully coupled FreeRTOS code with additional Windows threads?

Thread race condition on Windows port?

Yes – we do very comprehensive development and testing under Windows – including the FreeRTOS+TCP stack. However, to do this, all Windows IO system calls have to be banished from threads that are under the control of FreeRTOS. Presumably as described above (I’ve not read through the thread again), this is achieved by keeping IO in separate Windows threads, then using circular buffers to have the FreeRTOS and non-FreeRTOS threads communicate.

Thread race condition on Windows port?

In addition to what Richard commented, maybe some more information to get a feeling of the problem:
> Has actually ANYBODY ever successfully coupled FreeRTOS code with additional Windows threads?
Have you looked in detail at how the FreeRTOS tasks are implemented? A task is in fact a (Windows) thread. Here in port.c you see a task switch:

~~~~
SuspendThread( pxThreadState->pvThread );
/* change the current ‘pvThread’ */
ResumeThread( pxThreadState->pvThread );
~~~~

The scheduler has decided that a new task (thread) must become active. It calls SuspendThread() for the old task and it calls ResumeThread() for the new task. But imagine what happens if the old task was in the middle of a system call. It won’t get suspended. Hence, two tasks will be running simultaneously, something that is fatal for FreeRTOS.

One of the consequences of the above is that it is very hard to synchronise FreeRTOS tasks with Windows threads. We tested and tried for a long time to see if there is any solution for this, but I’m afraid there isn’t.

If your task must do any W32 system call, implement it as a normal thread. If your task is part of the FreeRTOS application, don’t let it call any W32 system call. And if a task must cooperate with a W32 thread, use sync methods that are neither W32 nor FreeRTOS. WaitForMultipleObjects() (etc.) is in the W32 domain; xSemaphoreTake() (and all the others) are in the FreeRTOS domain.

Have a look at this file:
Demo\FreeRTOS_Plus_TCP_and_FAT_Windows_Simulator\demo_logging.c
to see what I mean. In this module, all FreeRTOS tasks can send logging to the console. Of course they cannot call the normal printf(). Instead, the module queues the logging strings and hands them over to a genuine Windows thread.

If you know of ways to improve the cooperation between the W32 threads and FreeRTOS tasks, please tell.

Regards.
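To make the idea of a sync method that is "neither W32 nor FreeRTOS" a little more concrete, here is a minimal sketch of a one-slot mailbox built only on a C11 atomic state variable. It is an assumption about how such a handoff could look, not code from port.c or demo_logging.c; the names `Mailbox`, `mailbox_post()`, `mailbox_take()`, `mailbox_complete()` and `mailbox_collect()` are hypothetical. Every function returns immediately, so neither side ever waits inside the other side's domain.

```c
#include <stdatomic.h>

/* Mailbox states: who owns the slot at the moment. */
enum { MB_IDLE = 0, MB_REQUESTED = 1, MB_DONE = 2 };

typedef struct {
    atomic_int state;
    int request;  /* written by the task before MB_REQUESTED is published */
    int result;   /* written by the worker before MB_DONE is published */
} Mailbox;

static void mailbox_init(Mailbox *m)
{
    atomic_init(&m->state, MB_IDLE);
    m->request = 0;
    m->result = 0;
}

/* FreeRTOS task side: hand a request to the worker.
 * Returns 0 if a previous request is still in flight. */
static int mailbox_post(Mailbox *m, int request)
{
    if (atomic_load_explicit(&m->state, memory_order_acquire) != MB_IDLE) {
        return 0;
    }
    m->request = request;
    atomic_store_explicit(&m->state, MB_REQUESTED, memory_order_release);
    return 1;
}

/* Windows thread side: poll for a pending request.
 * Returns 0 if there is nothing to do. */
static int mailbox_take(Mailbox *m, int *request)
{
    if (atomic_load_explicit(&m->state, memory_order_acquire) != MB_REQUESTED) {
        return 0;
    }
    *request = m->request;
    return 1;
}

/* Windows thread side: publish the result once the system call has finished. */
static void mailbox_complete(Mailbox *m, int result)
{
    m->result = result;
    atomic_store_explicit(&m->state, MB_DONE, memory_order_release);
}

/* FreeRTOS task side: poll for the result.
 * Returns 0 if the worker has not finished yet. */
static int mailbox_collect(Mailbox *m, int *result)
{
    if (atomic_load_explicit(&m->state, memory_order_acquire) != MB_DONE) {
        return 0;
    }
    *result = m->result;
    atomic_store_explicit(&m->state, MB_IDLE, memory_order_release);
    return 1;
}
```

Under these assumptions, the FreeRTOS task would call vTaskDelay() between polls of `mailbox_collect()`, while the Windows thread would call Sleep() between polls of `mailbox_take()` and perform the actual W32 system call before `mailbox_complete()`. Each side waits only with primitives from its own domain, at the cost of some polling latency.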