[OpenSIPS-Users] usrloc restart persistency on seed node

Alexey Vasilyev alexei.vasilyev at gmail.com
Thu Jan 3 06:23:32 EST 2019


Hi everybody,

I like the approach, but here are some thoughts.

I think the X-second delay should not pause all OpenSIPS processing. The
sync should just run asynchronously, so that requests can be processed even
before the data has been synced.
For example, I trigger the sync from a systemd "ExecStartPost" script, so it
runs once OpenSIPS has already started.
(And, by the way, John, be careful: don't run "ul_cluster_sync" when you
start the "seed" node first, with no other node running. It puts the
cluster into the "Not synced" state.)
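To illustrate what I mean by "asynchronously", here is a rough Python sketch. It is only a toy model, not OpenSIPS code: the Node class, the find_donor callback and the rule that registrations received while waiting override the donor's copy are all my own assumptions. The sync attempt runs in a background thread for at most X seconds, and requests are accepted the whole time.

```python
import threading
import time


class Node:
    """Toy model of a "seed" node; every name here is hypothetical."""

    def __init__(self, find_donor, max_wait):
        self.find_donor = find_donor  # callable returning a dict of contacts, or None
        self.max_wait = max_wait      # the "X seconds" from the proposal
        self.state = "not synced"
        self.contacts = {}
        self.lock = threading.Lock()

    def start(self):
        # The sync runs in the background, so handle_register() works right away.
        worker = threading.Thread(target=self._sync, daemon=True)
        worker.start()
        return worker

    def _sync(self):
        deadline = time.monotonic() + self.max_wait
        while time.monotonic() < deadline:
            data = self.find_donor()
            if data is not None:
                with self.lock:
                    # Assumption: registrations received while waiting win
                    # over the donor's copy of the same AoR.
                    merged = dict(data)
                    merged.update(self.contacts)
                    self.contacts = merged
                    self.state = "synced"
                return
            time.sleep(0.05)
        # No donor within X seconds: fall back to "synced" with whatever
        # has been registered locally so far (possibly nothing).
        with self.lock:
            self.state = "synced"

    def handle_register(self, aor, contact):
        with self.lock:
            self.contacts[aor] = contact
```

With a donor callback that always returns None, the node still stores a registration immediately after start() and only flips to "synced" once the X seconds have passed.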

Let's imagine the "seed" node starts and finds 2 nodes (or more): which one
should it choose for syncing? And if they have different data (they were not
synced with each other), what should it do?
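To make the question concrete, here is a tiny Python sketch of the situation. The data and the find_conflicts helper are made up for illustration and do not reflect how usrloc stores contacts. It only shows how two candidate nodes can disagree about the same AoR; it does not answer which one to trust.

```python
def find_conflicts(data_a, data_b):
    """Return the AoRs known to both nodes whose contacts differ."""
    return sorted(
        aor for aor in data_a.keys() & data_b.keys()
        if data_a[aor] != data_b[aor]
    )


# Hypothetical datasets: "alice" re-registered on node A only.
node_a = {"alice": "sip:alice@10.0.0.5", "bob": "sip:bob@10.0.0.2"}
node_b = {"alice": "sip:alice@10.0.0.1", "bob": "sip:bob@10.0.0.2"}

print(find_conflicts(node_a, node_b))  # ['alice']
```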

Thanks.

On Thu, Jan 3, 2019 at 11:33, Liviu Chircu <liviu at opensips.org> wrote:

> Happy New Year John, Alexey and everyone else!
>
> I just finished catching up with this thread, and I must admit that I now
> concur with John's distaste of the asymmetric nature of cluster node
> restarts!
>
> Although it is correct and gets the job done, the 2.4 "seed" mechanism
> forces the admin to conditionally add a "opensipsctl fifo ul_cluster_sync"
> command into the startup script of all "seed" nodes.  I think we can do
> better :)
>
> What if we kept the "seed" concept, but tweaked it such that instead of
> meaning:
>
> "following a restart, always start in 'synced' state, with an empty
> dataset"
>
> ... it would now mean:
>
> "following a restart or cluster sync command, fall back to a 'synced'
> state, with an empty dataset if and only if we are unable to find a
> suitable sync candidate within X seconds"
>
> This solution seems to fit all requirements that I've seen posted so
> far.  It is:
>
> * correct (a cluster with at least 1 "seed" node will still never deadlock)
> * symmetric (with the exception of cluster bootstrapping, all node
> restarts are identical)
> * autonomous (users need not even know about "ul_cluster_sync" anymore!
>   Not saying this is necessarily good, but it brings down the learning
>   curve)
>
> The only downside could be that any cluster bootstrap will now last at
> least X seconds.  But that seems such a rare event (in production, at
> least) that we need not worry about it.  Furthermore, the X seconds will
> be configurable.
>
> What do you think?
>
> PS: by "cluster bootstrap" I mean (re)starting all nodes simultaneously.
>
> Best regards,
>
> Liviu Chircu
> OpenSIPS Developer
> http://www.opensips-solutions.com
>
> On 02.01.2019 12:24, John Quick wrote:
> > Alexey,
> >
> > Thanks for your feedback. I acknowledge that, in theory, a situation may
> > arise where a node is brought online and all the previously running nodes
> > were not fully synchronised so it is then a problem for the newly started
> > node to know which data set to pull. In addition to the example you give -
> > lost interconnection - I can also foresee difficulties when several nodes
> > all start at the same time. However, I do not see how arbitrarily setting
> > one node as "seed" will help to resolve either of these situations unless
> > the seed node has more (or better) information than the others.
> >
> > I am trying to design a multi-node solution that is scalable. I want to be
> > able to add and remove nodes according to current load. Also, to be able to
> > take one node offline, do some maintenance, then bring it back online. For
> > my scenario, the probability of any node being taken offline for maintenance
> > during the year is 99.9% whereas I would say the probability of partial loss
> > of LAN connectivity (causing the split-brain issue) is less than 0.01%.
> >
> > If possible, I would really like to see an option added to the usrloc module
> > to override the "seed" node concept. Something that allows any node
> > (including seed) to attempt to pull registration details from another node
> > on startup. In my scenario, a newly started node with no usrloc data is a
> > major problem - it could take it 40 minutes to get close to having a full
> > set of registration data. I would prefer to take the risk of it pulling data
> > from the wrong node rather than it not attempting to synchronise at all.
> >
> > Happy New Year to all.
> >
> > John Quick
> > Smartvox Limited
> >
> >
> >> Hi John,
> >>
> >> What follows is just my opinion, and I haven't explored the OpenSIPS
> >> source code for data syncing.
> >> The problem is a little deeper. As we have a cluster, we potentially
> >> have split-brain.
> >> We could disable the seed node entirely and just let nodes work after a
> >> disaster/restart. But that means we can't guarantee consistency of the
> >> data, so nodes must indicate this with a <Not in sync> state.
> >> Usually clusters rely on a quorum for trust. But for OpenSIPS I think
> >> this approach is too expensive. And of course a quorum needs a minimum
> >> of 3 hosts.
> >> For 2 hosts, after losing and restoring the interconnection, it is
> >> impossible to say which host has consistent data. That's why OpenSIPS
> >> uses the seed node as an artificial trust point. I think the <seed>
> >> node doesn't solve syncing problems, but it simplifies the overall work.
> >> Let's imagine 3 nodes A, B, C. A is Active. A and B lose their
> >> interconnection. C is down. Then C comes up and has 2 hosts to sync
> >> from. But A already has 200 phones re-registered for some reason. So we
> >> have 200 conflicts (node B still has the same phones in memory). Where
> >> to sync from? A <seed> host will answer this question in 2 cases (if it
> >> is A or B). Of course, if C is the <seed>, it will just be happy from
> >> the start. And I actually don't know what happens if we now run
> >> <ul_cluster_sync> on C. Will it get all the contacts from A and B or not?
> >> We operate on specific data, which is temporary, so the syncing policy
> >> can be more relaxed. Maybe it's a good idea to somehow tie the <seed>
> >> node to the Active role in the cluster. But again, if the Active node
> >> restarts and is still Active, we will have a problem.
> >> -----
> >> Alexey Vasilyev
>


-- 
Best regards
Alexey Vasilyev