| 
  |  |  
  
  | 
      
NOTE:This is a read only archive of threads posted to the FreeRTOS support forum.  Use these archive pages to search previous posts.  New forum support threads can be started at the FreeRTOS forums .   FreeRTOS Support ArchiveThe FreeRTOS support forum can be used for active support both from Amazon Web Services and the community. In return for using our software for free, we request you play fair and do your bit to help others! Sign up for an account and receive notifications of new support topics then help where you can. 
   
 
This is a read only archive of threads posted to the FreeRTOS support forum.  Use these archive pages to search previous posts.  New forum support threads can be started at the FreeRTOS forums.     
    [FreeRTOS Home]  
      [Live FreeRTOS Forum]  
      [FAQ]  
              [Archive Top]  
                    [January 2018 Threads]
 BufferAllocation_1.c xFreeBuffersList memory corruption issuePosted by tc-cadscan  on January 26, 2018I’ve written a zero-copy network driver for FreeRTOS+TCP on STM32F4. For the most part it works well, but every once in a while I get an issue with pxGetNetworkBufferDescriptor returning NULL. I’m calling it with xBlockTimeTicks set to portMAX_DELAY, so in theory this should never happen. Looking through the source code, the buffer descriptors are kept in a list called xFreeBuffersList, and are fetched by this line:
~~~
pxReturn = ( NetworkBufferDescriptort * ) listGET OWNEROF HEAD_ENTRY( &xFreeBuffersList );
~~~
Following this, some validation is performed on pxReturn to make sure it is a valid descriptor, and if not, return NULL. This seems to be where it’s failing:
~~~
if( ( bIsValidNetworkDescriptor( pxReturn ) != pdFALSEUNSIGNED ) &&
                    listIS CONTAINED_WITHIN( &xFreeBuffersList, &( pxReturn->xBufferListItem ) ) )
~~~
So it’s basically looking up the ListItemt associated with the network descriptor, and verifying that it is part of xFreeBuffersList. listIS CONTAINEDWITHIN just compares the pvContainer field of the ListItem t to the list pointer (xFreeBuffersList in this case). The problem seems to be that pvContainer is set to NULL. As this should get set when the descriptor is inserted into the list (i.e. when it was last released), and the descriptor is definitely getting inserted into the list correctly (it must be, given that we just got it from the list), this makes me think the list structure is getting corrupted in memory somehow.
A couple of times I’ve also seen a different issue, where listGETOWNER OFHEAD ENTRY returns some random value which isn’t a buffer descriptor. Whenever this happened, I tracked it down to fields in the list structure being invalid (sometimes NULL or sometimes random values), which also suggests memory corruption to me.
There are only 5 places in my code where I call into BufferAllocation_1.c:
 
During xNetworkInterfaceInitialise, I call pxGetNetworkBufferDescriptor multiple times to allocate buffers to the receive DMA descriptors.During xNetworkInterfaceOutput, I assign the buffer passed in by the TCP/IP stack to a transmit DMA descriptor. If this fails (due to a DMA descriptor not being available within a given timeout), I call vReleaseNetworkBufferAndDescriptor to discard the buffer.During the receive deferred interrupt processing task, I attempt to send the received buffer to the TCP/IP stack. If this fails, I call vReleaseNetworkBufferAndDescriptor to discard the buffer.Also in the deferred processing task, I call pxGetNetworkBufferWithDescriptor to replace the receive buffer that was just used.In the transmit complete interrupt handler. I call vNetworkBufferReleaseFromISR to release the transmit buffer. 
In all of these cases, I’m calling the buffer allocation functions exactly as described in the documentation, so I don’t think this is the issue. Whenever I have to deal with a buffer in an ISR, I make sure to only call the interrupt-safe functions. Checking in BufferAllocation1.c, all of the functions that interact with the descriptor list use ipconfigBUFFER ALLOCLOCK and ipconfigBUFFER ALLOC_UNLOCK (or the interrupt-safe equivalents), so I’m not sure where the problem is happening.
I’ve written a simple test application to reproduce the issue – it basically just listens for UDP packets on port 10000 and echoes them back to the sender. I’m setting the FREERTOSZERO COPY flag to reduce unnecessary buffer allocation/deallocation, to rule that out. This is the only task on the system which uses the TCP stack:
~~~
static void echoThread(void *pvParameters)
{
    const uint16_t port = 10000;
 Socket_t sock = FreeRTOS_socket(FREERTOS_AF_INET, FREERTOS_SOCK_DGRAM, FREERTOS_IPPROTO_UDP);
CAD_ASSERT(sock != FREERTOS_INVALID_SOCKET);
struct freertos_sockaddr hostAddr;
hostAddr.sin_port = FreeRTOS_htons(port);
FreeRTOS_bind(sock, &hostAddr, sizeof(hostAddr));
struct freertos_sockaddr clientAddr;
uint32_t clientAddrLength = sizeof(clientAddr);
uint8_t *buffer;
int32_t length;
while (1)
{
    length = FreeRTOS_recvfrom(sock, &buffer, 0, FREERTOS_ZERO_COPY, &clientAddr, &clientAddrLength);
    if (length > 0)
    {
        length = FreeRTOS_sendto(sock, buffer, length, FREERTOS_ZERO_COPY, &clientAddr, clientAddrLength);
    }
    if (length <= 0)
    {
        FreeRTOS_ReleaseUDPPayloadBuffer(buffer);
    }
}
 
}
~~~
The issue only shows up very rarely – probably once in every 100,000 to a million packets. The only reliable way I’ve found to reproduce it is to flood the network interface with traffic, at a rate of several thousand packets/second. Even then it’s difficult to reproduce, sometimes taking several minutes. If it helps at all, I’ve found that it tends to show up most often just after starting to send packets – stopping and then restarting the packet generator is more likely to trigger it than just leaving it running.
I did find one other post on here (https://freertos.org/FreeRTOSSupport ForumArchive/August 2016/freertosTCP BufferAllocation1 xFreeBuffersListcorruption bab400aej.html) with a similar issue. In that case it was tracked down to stack overflow (identified by the 0xA5A5A5A5 sequence), but in my case I don’t see that sequence anywhere so I doubt it’s stack overflow. Also if it was stack overflow I’d expect it to happen a lot more frequently and less randomly than it is doing.
At this point I’m not sure how I can debug the issue any further, as the problem seems to be happening within the TCP stack itself. Any ideas?
Thanks,
Tom
BufferAllocation_1.c xFreeBuffersList memory corruption issuePosted by rtel  on January 29, 2018Hmm, tricky to see in this as you seem to have done a good level of debugging already.
Is the driver code all yours?  Or are you also relying on some STM32 library/cube functions?  If you are using STM32 library functions, do any of them mess with the basepri register?  That could potentially effect critical sections.
Also, how are you passing received buffers into the TCP/IP stack?  Is it from an interrupt?  If so, are you then trying to allocated a new buffer from the interrupt too?
Are you using a recent version of FreeRTOS with configASSERT() defined?  FreeRTOS V10.0.1 would be recommended as that has asserts for just about all the interrupt priority configuration errors people make.
BufferAllocation_1.c xFreeBuffersList memory corruption issuePosted by tc-cadscan  on January 29, 2018We’re not using any of the STM32 library code, all the driver code has been written from scratch. Nothing in my code touches BASEPRI.
We’re using FreeRTOS v9 as we started work on the project just before v10 was released and haven’t got round to upgrading yet, but the TCP/IP stack code was taken from the v10 release. configASSERT is defined but isn’t failing anywhere.
Receive buffers get passed back to the stack from a deferred processing task, not from the interrupt. The only buffer management that happens in an ISR is releasing the transmit buffers after the transmit complete interrupt. One thing I’ve just noticed – vNetworkBufferReleaseFromISR calls vListInsertEnd to put the buffer back into the free buffers list. Is this safe to call from an ISR? (It’s wrapped inside portSETINTERRUPTMASKFROMISR/portCLEARINTERRUPTMASKFROMISR, but I’m not sure if this is adequate to make a non-interrupt safe function interrupt safe).
I’ll try releasing the transmit buffers from the deferred processing task instead of directly from the interrupt, and see if that makes any difference.
BufferAllocation_1.c xFreeBuffersList memory corruption issuePosted by rtel  on January 29, 2018
  One thing I’ve just noticed – vNetworkBufferReleaseFromISR 
  calls vListInsertEnd to put the buffer back into the free buffers list. 
  Is this safe to call from an ISR? (It’s wrapped inside 
  portSETINTERRUPTMASKFROMISR/portCLEARINTERRUPTMASKFROMISR, but 
  I’m not sure if this is adequate to make a non-interrupt safe function 
  interrupt safe).
 
That should be ok.
 
  I’ll try releasing the transmit buffers from the deferred processing 
  task instead of directly from the interrupt, and see if that makes any 
  difference.
 
Let us know if that solves the problem.
BufferAllocation_1.c xFreeBuffersList memory corruption issuePosted by heinbali01  on January 29, 2018Hi Tom,
 
  I’ve written a zero-copy network driver for FreeRTOS+TCP on STM32F4
 
That is very brave of you, but there was a zero-copy driver already.  It is not in the official distribution yet, but it was published in this list.  I will post it as an attachment here below ( STM32Fxx.zip).
 
  but every once in a while I get an issue with pxGetNetworkBufferDescriptor returning NULL.
 
I solved this problem by dropping a packet.  When a packet has been received, then you need to provide a new empty network buffer.  If that is impossible, I would drop the received packet, and leave the buffer in place so it will be used to receive a next package.
 
  I’m calling it with xBlockTimeTicks set to portMAX_DELAY
 
Are you sure it is wise such a long time?  Does this happen from  prvEMACHandlerTask() ?
When the function is called from the IP-task ( e.g. from  xNetworkInterfaceOutput() ), you should use a time-out of zero. If not, the IP-task will probably be waiting for the IP-task to release some network buffer.
 
     During xNetworkInterfaceInitialise, I call pxGetNetworkBufferDescriptor multiple
       times to allocate buffers to the receive DMA descriptors.
 
Comment : during initialisation, there must be enough network buffers.  You can use  configASSERT() for this.  If it is triggered, the is a configuration error.
 
     During xNetworkInterfaceOutput, I assign the buffer passed in by the TCP/IP stack
       to a transmit DMA descriptor. If this fails (due to a DMA descriptor not being
       available within a given timeout), I call vReleaseNetworkBufferAndDescriptor to
       discard the buffer.
 
This function has lead to some confusion in the past.
To summarise:
If  ipconfigZERO_COPY_TX_DRIVER = 1, the parameter  bReleaseAfterSend is always true
If  bReleaseAfterSend is true, it means that the driver will own the  pxDescriptor.
So yes, if sending fails, you must release the network buffer.
If sending succeeds, the releasing of the buffer is suspended.
In my zero-copy driver, the releasing is indeed done within  prvEMACHandlerTask().
I stop commenting here.
Please have a look at the attached driver.  It has been tested on both STM32F4 and STM32F7.
Regards.
BufferAllocation_1.c xFreeBuffersList memory corruption issuePosted by tc-cadscan  on January 30, 2018
  That is very brave of you, but there was a zero-copy driver already. It is not in the official distribution yet, but it was published in this list. I will post it as an attachment here below (STM32Fxx.zip).
 
Thanks Hein, I’ll take a look at that and see how it differs from my code. From a quick glance, it seems like the buffer management is mostly the same, but I’m doing some initial filtering in the ISR to reduce context switches from dropped packets.
 
  I solved this problem by dropping a packet. When a packet has been received, then you need to provide a new empty network buffer. If that is impossible, I would drop the received packet, and leave the buffer in place so it will be used to receive a next package.
 
I don’t think this will solve my issue, which I think is down to the buffer list becoming corrupted, rather than a timeout (as noted, I’m calling pxGetNetworkBufferAndDescriptor with portMAX_DELAY). Once it gets in this state, every call to pxGetNetworkBufferAndDescriptor returns NULL immediately (i.e. not due to timeout), so without fixing the underlying issue this would just cause the receive process to drop packets forever.
 
  Are you sure it is wise such a long time? Does this happen from prvEMACHandlerTask() ?
  When the function is called from the IP-task ( e.g. from xNetworkInterfaceOutput() ), you should use a time-out of zero. If not, the IP-task will probably be waiting for the IP-task to release some network buffer.
 
Yes, this is only done from the receive task, never from anything called by the IP task. I don’t ever allocate buffers in xNetworkInterfaceOutput, I only use the buffer that’s passed in as a parameter.
My reason for using an infinite block time is that, once the receive buffer has been given back to the IP task, the new buffer must  be allocated to the receive DMA, regardless of how long it takes. Otherwise the receive DMA will be gradually depleted of buffers. However I agree with you that it’s better to try to get the replacement buffer first, and if that fails then drop the packet and reuse the existing buffer – I’ll change my code to do that.
BufferAllocation_1.c xFreeBuffersList memory corruption issuePosted by heinbali01  on January 30, 2018Hi Tom,
 
  but I’m doing some initial filtering in the ISR to reduce context switches from dropped packets.
 
Maybe you want to share some code?
 
  I don’t think this will solve my issue, which I think is down to the buffer list becoming corrupted
 
I don’t know where the corruption comes from, but I doubt that there is a problem with  BufferAllocation_1.c.
 
  My reason for using an infinite block time is that, once the receive buffer has been
  given back to the IP task, the new buffer must be allocated to the receive DMA,
  regardless of how long it takes.
 
Here I do not agree.  My choice was to drop the packet received ( do not forward it to the IP-task ), and leave the buffer in places.
 
  Otherwise the receive DMA will be gradually depleted of buffers.
 
True, that is why I would not swap the buffer.
 
  I’ll change my code to do that.
 
Yes good.  But please check if it really happens and how often.
Regards, Hein
         
    Copyright (C) Amazon Web Services, Inc. or its affiliates. All rights reserved. |  |