Discussion:
PJSIP for high scale SIP server
Matt Williams
2013-07-05 16:31:22 UTC
Permalink
Hi,

I'm working on Project Clearwater (http://www.projectclearwater.org/), an open source highly-scalable IMS (IP Multimedia Subsystem) implementation.

We're using PJSIP as our SIP stack. Most of the trails I've seen on the mailing list have been about using PJSIP for SIP clients, but is anyone using it (like us) server-side, e.g. for proxies or B2BUAs? Each instance of our "bono" edge proxy server supports 50000 incoming SIP/TCP connections (and the limitation we then hit is with Amazon AWS EC2, not the software itself), but we're unable to have more than one transport thread (i.e. running pjsip_endpt_handle_events). If we have more than one, we see crashes that seem to be related to concurrent accesses to shared data structures from multiple threads.

Does anyone have any experience of running multiple transport threads, or any pointers for using PJSIP at high scale? I'm happy to investigate more (and share crash dumps if that's useful), but wanted to check whether anyone else had seen this first.

Thanks,

Matt
Saúl Ibarra Corretgé
2013-07-08 10:21:37 UTC
Permalink
The Asterisk project is now using PJSIP as their SIP stack. It's right now in trunk and will be part of Asterisk 12.
_______________________________________________
Visit our blog: http://blog.pjsip.org
pjsip mailing list
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
--
Saúl Ibarra Corretgé
AG Projects
Matt Williams
2013-07-09 18:36:28 UTC
Permalink
Saul,

Thanks for the pointer!

I've pulled down and investigated the Asterisk code, but unfortunately it only uses a single transport thread for PJSIP. I might ask on their mailing list and see if they've looked at this aspect of scalability.

Thanks again,

Matt

Nishant Rodrigues
2013-07-15 19:50:18 UTC
Permalink
Not done it myself, but I think you need to have multiple threads call
the "pjsip_endpt_handle_events" function.

See: http://www.pjsip.org/pjsip/docs/html/group__PJSIP__ENDPT.htm#ga2fc6fbb56b269712776f22d02edb2f6c

Gang Liu
2013-07-31 03:00:44 UTC
Permalink
Four years ago, I had a class 4 routing demo project that was required to
handle 1000 CPS. I spent about one month working with pjsip 0.9.0 to
implement a B2BUA that could handle more than 2000 call legs per second
over UDP transport. The initial design also used multiple pjsip worker
threads. It worked very well in the lab, but it hit race conditions and
deadlocks when handling real traffic. I remember one deadlock case where
one thread was handling an INVITE retransmission timer timeout while, at
the same time, another thread received a 100 Trying packet from the
network. My solution was to offload all CPU/IO-bound processing logic to
other threads and use only one thread to call pjsip_endpt_handle_events()
and all other pjsip functions. I would have liked to spend more time
tracing the problem, but that project ended soon afterwards for business
reasons.

regards,
Gang
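
The workaround described above (one thread owning the SIP stack, with CPU/IO-bound work handed to other threads) can be sketched with plain pthreads. This is only an illustrative sketch, not code from PJSIP, Clearwater or Gang's project: work_item, work_queue, worker_main and run_offload_demo are hypothetical names, and the producer loop merely stands in for the place where the single stack thread would call pjsip_endpt_handle_events().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 64
#define MAX_WORKERS 16

/* Hypothetical work item standing in for a cloned received message. */
typedef struct { int msg_id; } work_item;

/* Bounded queue shared by the single stack thread and the workers. */
typedef struct {
    work_item items[QUEUE_CAP];
    size_t head, tail, count;
    bool closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} work_queue;

typedef struct { work_queue *queue; long processed; } worker_ctx;

static void queue_init(work_queue *q) {
    q->head = q->tail = q->count = 0;
    q->closed = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Producer side: blocks while the queue is full. */
static void queue_push(work_queue *q, work_item item) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: returns false once the queue is closed and drained. */
static bool queue_pop(work_queue *q, work_item *out) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return true;
}

/* Marks the queue closed and wakes every waiting worker. */
static void queue_close(work_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker: application logic only; it never calls into the SIP stack. */
static void *worker_main(void *arg) {
    worker_ctx *ctx = arg;
    work_item item;
    while (queue_pop(ctx->queue, &item))
        ctx->processed++;   /* CPU/IO-bound processing would go here */
    return NULL;
}

/* The calling thread plays the single stack thread: in a real server this
 * loop is where pjsip_endpt_handle_events() would be called. Returns the
 * number of items the workers processed. */
long run_offload_demo(int n_workers, int n_msgs) {
    work_queue q;
    pthread_t threads[MAX_WORKERS];
    worker_ctx ctx[MAX_WORKERS];
    long total = 0;
    if (n_workers < 1) n_workers = 1;
    if (n_workers > MAX_WORKERS) n_workers = MAX_WORKERS;
    queue_init(&q);
    for (int i = 0; i < n_workers; i++) {
        ctx[i].queue = &q;
        ctx[i].processed = 0;
        pthread_create(&threads[i], NULL, worker_main, &ctx[i]);
    }
    for (int id = 0; id < n_msgs; id++) {
        work_item item = { id };
        queue_push(&q, item);
    }
    queue_close(&q);
    for (int i = 0; i < n_workers; i++) {
        pthread_join(threads[i], NULL);
        total += ctx[i].processed;
    }
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return total;
}
```

Calling run_offload_demo(4, 1000) pushes 1000 items through four workers and returns the total they processed. Because only the producer thread would ever touch the stack, the workers need no locking beyond the queue itself.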
Matt Williams
2013-08-01 08:04:32 UTC
Permalink
Gang,

Thanks for your response.

Your project sounded interesting - it's a shame it didn't continue. It's good to hear (in some ways) that we're not the only ones to hit this issue, and that you resolved it in the same way as we have.

I'll keep digging on the multi-threading issue - it would be good to be able to run multiple transport threads.

Thanks,

Matt

Dennis Guse
2013-08-01 12:29:39 UTC
Permalink
Asterisk is switching towards PJSIP with the next version 12 (tbd October).
Probably there is some experience with this kind of problem.

https://wiki.asterisk.org/wiki/display/AST/New+SIP+channel+driver
http://lists.digium.com/pipermail/asterisk-dev/2012-December/057997.html

---
Dennis Guse
Matt Williams
2013-08-01 13:13:39 UTC
Permalink
Dennis,

Thanks for your email.

Yes, I'd noticed that Asterisk was switching to PJSIP. Unfortunately, it only uses a single transport thread too - it seems that's the approach everyone uses.

Thanks again,

Matt

Gang Liu
2013-08-02 03:17:22 UTC
Permalink
For small and medium-scale projects, a single transport thread is enough.

It might be beneficial to use multiple transport threads to handle 50000
TLS connections per pjsip endpoint.

regards,
Gang

Matt Williams
2013-08-02 10:21:44 UTC
Permalink
Gang,

Yes, we're definitely looking at high-scale here - we currently run with 50k TCP connections on one EC2 m1.small (single core). We're looking to scale up to 25M TCP connections total.

Because our architecture is stateless, we smoothly scale horizontally but having 500 nodes to manage is a bit of a headache, so the option to run on fewer larger (multi-core) machines would be nice. Unfortunately, we can't take advantage of multi-core machines because the transport thread itself uses a significant proportion of the total CPU (the process is a simple edge proxy, so the worker thread is fairly lightly-loaded).

Cheers,

Matt
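
The node count quoted above follows from the two figures in the message: 25M connections at 50k connections per node gives 500 nodes. A small sketch of that arithmetic is below. The second function rests on an assumption that is not established anywhere in this thread, namely that per-node capacity would scale linearly with the number of transport threads; both function names are hypothetical.

```c
#include <stdint.h>

/* Ceiling division: nodes needed to hold total_conns connections. */
uint64_t nodes_needed(uint64_t total_conns, uint64_t conns_per_node) {
    if (conns_per_node == 0)
        return 0;   /* guard against division by zero */
    return (total_conns + conns_per_node - 1) / conns_per_node;
}

/* ASSUMPTION: capacity scales linearly with transport threads per node.
 * The thread reports that multiple transport threads currently crash,
 * so this is a what-if estimate only. */
uint64_t nodes_needed_multithreaded(uint64_t total_conns,
                                    uint64_t conns_per_thread,
                                    uint64_t threads_per_node) {
    return nodes_needed(total_conns, conns_per_thread * threads_per_node);
}
```

With the figures from the message, nodes_needed(25000000, 50000) gives 500. Under the linear-scaling assumption, eight transport threads per node would bring that down to 63 nodes.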

Gang Liu
2013-08-07 04:54:37 UTC
Permalink
Matt,
Based on my understanding of the sprout source code, only bono
instances need to handle many TCP connections, because there are TCP
connection pools between bono and sprout.

I saw that there are some worker threads, managed by the STACK module,
which process rx messages from a queue of cloned messages. The pjsip
thread calls pjsip_endpt_handle_events() (polling the timer heap and
ioqueue). By "transport thread", did you mean the pjsip thread defined by
static int pjsip_thread(void *p)
in stack.cpp?

If so, this transport thread/pjsip thread polls the ioqueue and the
timer heap. Because the STACK module clones rx messages onto a queue that
the worker threads process later, the transport thread is actually only
working on network I/O events (epoll), SIP message parsing (transport
manager layer) and the timer heap.

I am wondering how many messages per second or transactions per
second bono (the edge proxy) needs to handle with 50k concurrent TCP
connections. Which is the bottleneck: network events, the SIP parser or
the timer heap?

Any guidelines on using bono as an edge proxy in front of
kamailio/opensips would be helpful. That would make it easier to stress
test multiple transport threads.

regards,
Gang
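
One way to approach the bottleneck question above is to accumulate wall-clock time per stage inside the transport thread and compare the totals. The sketch below is purely illustrative: the stage names mirror the three candidates in the message, but stage_stats, stage_record and stage_dominant are hypothetical helpers and are not part of PJSIP or Clearwater. A profiler such as perf would be an alternative that needs no code changes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* The three candidate bottlenecks named in the message. */
enum stage { STAGE_NETWORK, STAGE_PARSER, STAGE_TIMER, STAGE_COUNT };

/* Accumulated time and call count per stage (single-threaded use). */
typedef struct {
    uint64_t total_ns[STAGE_COUNT];
    uint64_t calls[STAGE_COUNT];
} stage_stats;

/* Monotonic timestamp in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Adds one measured interval to the given stage. */
void stage_record(stage_stats *s, enum stage st,
                  uint64_t start_ns, uint64_t end_ns) {
    s->total_ns[st] += end_ns - start_ns;
    s->calls[st]++;
}

/* Returns the stage with the largest accumulated time
 * (the lowest-numbered stage wins a tie). */
enum stage stage_dominant(const stage_stats *s) {
    enum stage best = STAGE_NETWORK;
    for (int i = 1; i < STAGE_COUNT; i++) {
        if (s->total_ns[i] > s->total_ns[best])
            best = (enum stage)i;
    }
    return best;
}

/* Human-readable label for log output. */
const char *stage_name(enum stage st) {
    switch (st) {
    case STAGE_NETWORK: return "network events";
    case STAGE_PARSER:  return "sip parser";
    case STAGE_TIMER:   return "timer heap";
    default:            return "unknown";
    }
}
```

In use, each stage would be bracketed by two now_ns() calls and the pair passed to stage_record(). After a load test, stage_dominant() shows where the single transport thread spends most of its time.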

On Fri, Aug 2, 2013 at 6:21 PM, Matt Williams
Gang,****
** **
Yes, we're definitely looking at high-scale here - we currently run with
50k TCP connections on one EC2 m1.small (single core). We're looking to
scale up to 25M TCP connections total.****
** **
Because our architecture is stateless, we smoothly scale horizontally but
having 500 nodes to manage is a bit of a headache, so the option to run on
fewer larger (multi-core) machines would be nice. Unfortunately, we can't
take advantage of multi-core machines because the transport thread itself
uses a significant proportion of the total CPU (the process is a simple
edge proxy, so the worker thread is fairly lightly-loaded).****
** **
Cheers,****
** **
Matt****
Matt Williams
2013-08-09 11:57:56 UTC
Permalink
Gang,

Thanks for your response.

Yes, you're right that only bono instances need to handle lots of TCP connections. It's not clear whether the TCP connections are the limiting factor, though, or whether it's the message rate. sprout nodes don't need to handle as many messages as bono, because:

* sprout nodes are only in the signaling path once, while bono nodes are in the signaling path twice (once on the calling party side, and once on the called party side)

* sprout drops out of the signaling path once the dialog is established, while bono stays in

* sprout nodes have a lot more non-transport work to do (e.g. querying the HSS, doing ENUM lookups), so the transport thread load is a smaller proportion of the total load.

Yes, the transport thread we run is the pjsip_thread you found - good spot!

With 50k TCP connections, I think we're looking at ~1.2k messages per second, but this is based on some rough calculations rather than metrics (which I'd like to add). I also haven't delved too far into where on this thread the bottleneck is. I've been approaching it from the perspective of whether we could run multiple transport threads, though I appreciate that, depending on where the bottleneck is, adding more threads might not solve the problem.

I'll do some more investigation - thanks for your input!

Cheers,

Matt

From: pjsip [mailto:pjsip-***@lists.pjsip.org] On Behalf Of Gang Liu
Sent: 07 August 2013 05:55
To: pjsip list
Subject: Re: [pjsip] PJSIP for high scale SIP server

Matt,
Based on my understanding of the sprout source code, only bono instances need to handle many TCP connections, because there are TCP connection pools between bono and sprout.

I saw that there are some worker threads managed by the STACK module, which process rx messages from a queue of cloned messages. The pjsip thread calls pjsip_endpt_handle_events() (polling the timer heap and ioqueue).
Did you mean that the transport thread is the pjsip thread defined by
static int pjsip_thread(void *p) in stack.cpp?

If yes, this transport thread/pjsip thread polls the ioqueue and timer heap. Because the STACK module clones rx messages onto a queue that the worker threads process later, this transport thread actually works only on network I/O events (epoll), SIP message parsing (transport manager layer) and the timer heap.
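
The split described above (one thread doing I/O and parsing, then handing cloned messages to workers through a queue) can be sketched in plain C++ as follows. This is only an illustration using the standard library: `RxMessage`, `MessageQueue` and `run_pipeline` are hypothetical names, not PJSIP or Clearwater types, and the "transport" thread here just generates strings instead of polling a real ioqueue.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cloned rx message; not a PJSIP type.
struct RxMessage {
    std::string raw;
};

// Minimal blocking queue: one producer (transport), several consumers (workers).
class MessageQueue {
public:
    void push(RxMessage msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        cond_.notify_one();
    }

    // Blocks until a message arrives; returns false once closed and drained.
    bool pop(RxMessage& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cond_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<RxMessage> queue_;
    bool closed_ = false;
};

// Runs one transport thread and `workers` worker threads.
// Returns the number of messages the workers handled.
int run_pipeline(int messages, int workers) {
    MessageQueue queue;
    std::atomic<int> handled(0);
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&queue, &handled] {
            RxMessage msg;
            while (queue.pop(msg)) {
                ++handled;  // real code would route or proxy the message here
            }
        });
    }
    // Stand-in for the single thread that polls the network and parses messages.
    std::thread transport([&queue, messages] {
        for (int i = 0; i < messages; ++i) {
            queue.push(RxMessage{"MESSAGE " + std::to_string(i)});
        }
        queue.close();
    });
    transport.join();
    for (auto& t : pool) t.join();
    return handled.load();
}
```

In this shape the transport thread never blocks on application logic, so its cost is limited to whatever happens before `push()`, which in the real stack would be the network events, parsing and timers asked about below.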

I am wondering how many messages per second or transactions per second bono (the edge proxy) needs to handle with 50k concurrent TCP connections. Which is the bottleneck: network events, the SIP parser or the timer heap?

Any guidelines on using bono as an edge proxy in front of kamailio/opensips would be helpful. That would make it easier to stress test multiple transport threads.

regards,
Gang
On Fri, Aug 2, 2013 at 6:21 PM, Matt Williams <***@metaswitch.com> wrote:
Gang,

Yes, we're definitely looking at high-scale here - we currently run with 50k TCP connections on one EC2 m1.small (single core). We're looking to scale up to 25M TCP connections total.

Because our architecture is stateless, we scale horizontally smoothly, but having 500 nodes to manage (25M connections at 50k per node) is a bit of a headache, so the option to run on fewer, larger (multi-core) machines would be nice. Unfortunately, we can't take advantage of multi-core machines because the transport thread itself uses a significant proportion of the total CPU (the process is a simple edge proxy, so the worker thread is fairly lightly loaded).
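
The node count follows directly from the figures above. As a small sketch (the helper name is hypothetical, and the idea that a larger machine could hold proportionally more connections is an assumption, not something measured in this thread):

```cpp
#include <cstdint>

// Nodes needed to hold `total_connections` when each node holds `per_node`,
// rounded up so that a partly filled node still counts as a whole node.
std::uint64_t nodes_needed(std::uint64_t total_connections, std::uint64_t per_node) {
    return (total_connections + per_node - 1) / per_node;
}
```

`nodes_needed(25000000, 50000)` gives 500. If a four-core machine could hold 200000 connections, which would require the transport work to spread across cores, the same total would need 125 nodes.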

Cheers,

Matt

From: pjsip [mailto:pjsip-***@lists.pjsip.org] On Behalf Of Gang Liu
Sent: 02 August 2013 04:17
To: pjsip list
Subject: Re: [pjsip] PJSIP for high scale SIP server

For small and medium-scale projects, a single transport thread is enough.

Maybe it would be beneficial to use multiple transport threads to handle 50000 TLS connections per pjsip endpoint.

regards,
Gang
On Thu, Aug 1, 2013 at 9:13 PM, Matt Williams <***@metaswitch.com> wrote:
Dennis,

Thanks for your email.

Yes, I'd noticed that Asterisk was switching to PJSIP. Unfortunately, it only uses a single transport thread too - it seems that's the approach everyone uses.

Thanks again,

Matt

From: pjsip [mailto:pjsip-***@lists.pjsip.org] On Behalf Of Dennis Guse
Sent: 01 August 2013 13:30
To: pjsip list
Subject: Re: [pjsip] PJSIP for high scale SIP server

Asterisk is switching towards PJSIP with the next version 12 (tbd October).
Probably there is some experience with this kind of problem.

https://wiki.asterisk.org/wiki/display/AST/New+SIP+channel+driver
http://lists.digium.com/pipermail/asterisk-dev/2012-December/057997.html

---
Dennis Guse

On Thu, Aug 1, 2013 at 10:04 AM, Matt Williams <***@metaswitch.com> wrote:
Gang,

Thanks for your response.

Your project sounded interesting - it's a shame it didn't continue. It's good to hear (in some ways) that we're not the only ones to hit this issue, and that you resolved it in the same way as we have.

I'll keep digging on the multi-threading issue - it would be good to be able to run multiple transport threads.

Thanks,

Matt

From: pjsip [mailto:pjsip-***@lists.pjsip.org] On Behalf Of Gang Liu
Sent: 31 July 2013 04:01
To: pjsip list
Subject: Re: [pjsip] PJSIP for high scale SIP server

Four years ago, I had a class 4 routing demo project which was required to handle 1000 CPS. I spent about one month working with pjsip 0.9.0 to implement a B2BUA which could handle more than 2000 call legs per second over UDP transport. The initial design also used multiple pjsip worker threads. It worked very well in the lab, but it had some race conditions/deadlocks when handling real traffic. I remember one deadlock case where one thread was handling an INVITE retransmission timer timeout while, at the same time, another thread received a 100 Trying packet from the network. My solution was to offload all CPU/IO-bound processing logic to other threads and use only one thread to call pjsip_endpt_handle_events() and all other pjsip functions. I would have liked to spend more time tracing it, but that project ended soon afterwards for business reasons.
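
One way to arrange "heavy work on other threads, stack calls on a single thread" is for workers to post closures that only the stack thread runs. The following is a minimal sketch of that idea in standard C++, not the code from the project described above: `StackThreadTasks` and `demo_single_stack_thread` are hypothetical names, and in a real application `drain()` would presumably be called from the same loop that calls pjsip_endpt_handle_events().

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper, not part of PJSIP: collects closures from any thread
// and runs them only on the thread that calls drain().
class StackThreadTasks {
public:
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(task));
    }

    // Called from the single stack thread, e.g. once per event-loop pass.
    // Returns how many tasks were run.
    std::size_t drain() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            batch.swap(pending_);
        }
        for (auto& task : batch) task();  // run outside the lock
        return batch.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> pending_;
};

// Demo: `workers` threads each post `per_worker` tasks, and the calling
// thread plays the stack thread. Returns true if every task ran there.
bool demo_single_stack_thread(int workers, int per_worker) {
    StackThreadTasks tasks;
    const std::thread::id stack_thread = std::this_thread::get_id();
    int ran = 0;              // touched only by the stack thread, so no lock
    bool same_thread = true;

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_worker; ++i) {
                // Heavy CPU/IO work would happen here, off the stack thread.
                tasks.post([&] {
                    ++ran;
                    if (std::this_thread::get_id() != stack_thread) {
                        same_thread = false;
                    }
                });
            }
        });
    }
    for (auto& t : pool) t.join();
    tasks.drain();
    return same_thread && ran == workers * per_worker;
}
```

Because every stack-facing action runs on one thread, the timer-versus-network race described above cannot occur between two such actions. The cost is that this one thread becomes the ceiling on throughput.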

regards,
Gang
On Sat, Jul 6, 2013 at 12:31 AM, Matt Williams <***@metaswitch.com> wrote:
Hi,

I'm working on Project Clearwater (http://www.projectclearwater.org/), an open source highly-scalable IMS (IP Multimedia Subsystem) implementation.

We're using PJSIP as our SIP stack. Most of the trails I've seen on the mailing list have been about using PJSIP for SIP clients, but is anyone using it (like us) server-side, e.g. for proxies or B2BUAs? Each instance of our "bono" edge proxy server supports 50000 incoming SIP/TCP connections (and the limitation we then hit is with Amazon AWS EC2, not the software itself), but we're unable to have more than one transport thread (i.e. running pjsip_endpt_handle_events). If we have more than one, we see crashes that seem to be related to concurrent accesses to shared data structures from multiple threads.

Does anyone have any experience of running multiple transport threads, or any pointers for using PJSIP at high scale? I'm happy to investigate more (and share crash dumps if that's useful), but wanted to check whether anyone else had seen this first.

Thanks,

Matt


_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
***@lists.pjsip.org
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org

