Forum is extremely slow at the moment.

Bill Stuntz

Registered User
Apr 6, 2012
5,022
68
48
74
Columbus. OH
Country
Region
I also confirm: Down 3:19, Up at 3:38 but I was out at the time. It was back up by the time I saw the messages.

I haven't been able to get anything interesting out of pathping. It seems to crash out as soon as it tries to get past my T-Mobile Home Internet 5G cellular modem/router. Tracert also gives a bunch of timeout messages as soon as it gets past my router, without telling me where it's trying to go.
 


zedric

NAWCC Member
Aug 8, 2012
2,078
484
83
Country
Region
So to summarise,

there is a regular fault that prevents access to the forums. It happens the same time every day and has been happening for weeks.

It does not appear at first sight to be due to the server hosting the forum, and so probably relates to a network issue.

The expert that the NAWCC brought in didn’t see any issue (otherwise we’d have been told about it) which is odd.

And Comcast recommend a fibre connection, which may or may not fix the problem.

The problem does not seem to be congestion unless, for example, off-site backup is swamping the forum traffic. So the consensus is that the fibre fix might work if they route traffic for the fibre differently to traffic from the T1 connection.

Does that seem about right?
 

Bill Stuntz

there is a regular fault that prevents access to the forums. It happens the same time every day and has been happening for weeks.
It's mostly around the same time, but not EVERY day. And there is sometimes more than one outage in a day. They seem to be more consistent recently, but back in October, most of them went down for a few minutes between 7 & 10am instead of the 3:30ish we're seeing now.
 

Dave Coatsworth

Senior Administrator
Staff member
NAWCC Business
NAWCC Fellow
Sponsor
Feb 11, 2005
9,078
4,033
113
63
Camarillo, CA
www.daveswatchparts.com
Country
Region
So to summarise,

there is a regular fault that prevents access to the forums. It happens the same time every day and has been happening for weeks.

It does not appear at first sight to be due to the server hosting the forum, and so probably relates to a network issue.

The expert that the NAWCC brought in didn’t see any issue (otherwise we’d have been told about it) which is odd.

And Comcast recommend a fibre connection, which may or may not fix the problem.

The problem does not seem to be congestion unless, for example, off-site backup is swamping the forum traffic. So the consensus is that the fibre fix might work if they route traffic for the fibre differently to traffic from the T1 connection.

Does that seem about right?
I don't think this particular problem has been happening for 'weeks'. It was first reported by Graham on Saturday in post #47. This was well after consulting with the expert and with Comcast, so their input is not relevant to this problem. I'm not sure I would expect the move to fiber to resolve this current outage problem. I agree that this problem does not seem to be congestion in the Comcast network. Otherwise, iMIS and the Gift Shop should have also been hit.
 

Bill Stuntz

I don't think this particular problem has been happening for 'weeks'.
I disagree. I just selected all the uptimerobot "MB Down" messages in my inbox - 21 in January, 7 in December, 7 in November, 17 in October, 14 in September. It's become significantly worse recently.
 

Dave Coatsworth

Bill,
You are lumping together all causes of downtime, including those that we have confirmed are caused by inadequacy in the Comcast residential network. I am looking at this latest problem, which began sometime last week, as a discrete issue.
 

zedric

I don't think this particular problem has been happening for 'weeks'. It was first reported by Graham on Saturday in post #47. This was well after consulting with the expert and with Comcast, so their input is not relevant to this problem. I'm not sure I would expect the move to fiber to resolve this current outage problem. I agree that this problem does not seem to be congestion in the Comcast network. Otherwise, iMIS and the Gift Shop should have also been hit.
Hi Dave

I had been noticing this problem for at least a week or so before Graham mentioned it - that is, this specific problem of the forum being down for about 10 minutes from around 7:10am my time. I had assumed it might be a problem on my end, so didn't think to post.
 

Bill Stuntz

Bill,
You are lumping together all causes of downtime, including those that we have confirmed are caused by inadequacy in the Comcast residential network. I am looking at this latest problem, which began sometime last week, as a discrete issue.
True... to an extent. I have no idea WHY it goes down. All I know is that I get emails when it does - which has happened pretty much every day this month. But I also know that I don't get those emails when the "slows" attack. And I have no idea whether the "slows" and the "downs" are related. Are the "downs" caused by extended "super-slows"? If the information I've been able to supply helps diagnose the problem... GREAT. If not... I can't think of anything more I can do.
 

gmorse

NAWCC Member
Jan 7, 2011
14,556
3,675
113
Breamore, Hampshire, UK
Country
Region
Hi zedric,
I had been noticing this problem for at least a week or so before Graham mentioned it - that is, this specific problem of the forum being down for about 10 minutes from around 7:10am my time. I had assumed it might be a problem on my end, so didn't think to post.
Yes, my experience is similar; I had attributed it to some local network problem at my end until a pattern began to emerge.

Regards,

Graham
 

John Matthews

NAWCC Member
Sep 22, 2015
4,081
2,190
113
France
Country
Region
My background is in IT Technical Support and Systems Architecture and I have to say that I have been very disappointed by the response given on this thread.

I am forced to the conclusion that the data required to identify the cause of the slow response is not being collected, and that the data which is being collected does not appear to be interpreted correctly.

There is definitely an issue with the Comcast network and from the data I have collected it is a very significant factor. It may be the only factor. It has been present for many months. I am systematically monitoring the network on an hourly basis and have done so since the 9th December. I attach an extract of the data I have collected showing the afternoon period (EST). I have highlighted those times when I am confident that response issues probably occurred.

On public networks, where there are devices working close to capacity, small variation in load will have significant impacts on response time and loads on public networks vary rapidly.

Where there is a * it means the response timed out (response >1000ms). A ? indicates that pathping was terminated because of the time out.

I don't think this particular problem has been happening for 'weeks'. It was first reported by Graham on Saturday in post #47. This was well after consulting with the expert and with Comcast, so their input is not relevant to this problem. I'm not sure I would expect the move to fiber to resolve this current outage problem. I agree that this problem does not seem to be congestion in the Comcast network. Otherwise, iMIS and the Gift Shop should have also been hit.
Dave, I believe you are mistaken. As demonstrated in the attached, the problem has been present for some time. The iMIS and Gift Shop systems on net.nawcc.org are unlikely to be accessed across the Comcast network by heavy users - these being the only users who are likely to persistently observe slow response.

John
 


Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
85,198
2,937
113
86
Boston
awco.org
Country
Region
No business that is our size has the resources to manage these services in house. There are a number of agents who do this kind of work and do it well. None of them is willing to work on our system on a contract basis.

If our system was deployed with any of a number of service facilities, we could contract for the services we need.

We have already sacrificed the Advanced Search feature, Image proxy servers, and content delivery network because we cannot maintain them in house.

I think this discussion is addressing the wrong problem.

This is last year's performance summary on our Forum Server that I posted earlier.


 

Jim Haney

NAWCC Member
Sponsor
Donor
Sep 21, 2002
7,466
2,745
113
73
Decatur, TN.
Country
Region
Tom,
What was the cost of XenForo taking over control according to your exchange with Brent in 2017?

I hope the BOD has done some work on this; I am sure you have requested this in the past.

If we expect a reliable MB and everyone would agree that poor service is a turnoff for users, we should budget the funds to do it.
 

Dick C

NAWCC Member
Oct 14, 2009
2,165
165
63
Country
My background is in IT Technical Support and Systems Architecture and I have to say that I have been very disappointed by the response given on this thread.

I am forced to the conclusion that the data required to identify the cause of the slow response is not being collected, and that the data which is being collected does not appear to be interpreted correctly.

There is definitely an issue with the Comcast network and from the data I have collected it is a very significant factor. It may be the only factor. It has been present for many months. I am systematically monitoring the network on an hourly basis and have done so since the 9th December. I attach an extract of the data I have collected showing the afternoon period (EST). I have highlighted those times when I am confident that response issues probably occurred.

On public networks, where there are devices working close to capacity, small variation in load will have significant impacts on response time and loads on public networks vary rapidly.

Where there is a * it means the response timed out (response >1000ms). A ? indicates that pathping was terminated because of the time out.



Dave I believe you are mistaken. As demonstrated in the attached, the problem has been present for some time. The iMIS and Gift Shop systems on net.nawcc.org are unlikely to be accessed across the ComCast network by heavy users - these being the only users who are likely to persistently observe slow response.

John
John, thanks again for your analysis. There could well be two different issues that need addressing. The first is why the system becomes totally unavailable between 2:13 pm and around 2:30 pm Texas time.

The two questions that I would like to see answered by the staff at the Columbia HQ are: 1. When you attempt to access mb.nawcc.org is the wide area network used OR are you connected via a Local Area Network? 2. During the stated outage time beyond 2:13pm Texas Time (3:13pm Columbia time) are you able to connect to mb.nawcc.org?
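Since the outage is being reported in several local times, a small conversion sketch may help confirm that everyone is describing the same event. It assumes fixed standard-time offsets (no daylight saving, which holds in January), and the date used is only illustrative; just the 2:13 pm Texas time comes from the reports here.

```python
from datetime import datetime, timedelta, timezone

# Fixed standard-time UTC offsets: an assumption that is only valid in winter.
ZONES = {
    "Texas (CST)": timezone(timedelta(hours=-6)),
    "Columbia, PA (EST)": timezone(timedelta(hours=-5)),
    "California (PST)": timezone(timedelta(hours=-8)),
    "UTC": timezone.utc,
}

def outage_in_all_zones(local_time, zone_name):
    """Return the same instant as an HH:MM string for every zone in ZONES."""
    stamped = local_time.replace(tzinfo=ZONES[zone_name])
    return {name: stamped.astimezone(tz).strftime("%H:%M")
            for name, tz in ZONES.items()}

# Illustrative date; only the time of day is taken from the reports.
times = outage_in_all_zones(datetime(2022, 1, 27, 14, 13), "Texas (CST)")
for name, hhmm in times.items():
    print(f"{name}: {hhmm}")
```

If the offsets are right, 2:13 pm in Texas and 3:13 pm in Columbia are the same moment, so reports quoted in either zone can be lined up directly.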

The second is the increased error rates within the Comcast network that you are showing. If Columbia puts in place a T3, is this going to provide any relief, given that a great number of the lost packets are in the PA routers?

Regards,
Dick
 

Dick C

System unavailable 2:13 pm Texas time....back operational at 2:33pm

Pathping run during this time shows 4 errors, each on one of 4 nodes.
 

Bill Stuntz

I have no idea whether this could help or not. At 3:13 I tried to access the MB & got "server not found". Ran tracert - it reached the MB. Got an uptimerobot message - MB down at 3:19. Tried every few minutes & always got "server not found", but several tracerts always reached the MB. MB came back online about 3:35; got uptimerobot msg MB up at 3:36. I don't understand how tracert could reach the MB while FF never found it and the robot said it was down. I just can't seem to wrap my brain around it. This is the first time I've been actively trying to use the MB when the problem was actually happening. Usually I don't see the messages until the MB is back up.
 

Tom McIntyre

I was working on the system today when it went down. The failure was in cPanel/WHM on the forum server, which also serves the nawccinfo (Waltham) facility and our forum testing site, which I was working on.

I do not know what caused the failure, but I suspect it is related to SSL certificate checking and the system's attempt to address failures there.

The only additional observations I have are that the nawccinfo site returned after about 5 to 8 minutes, the test forum site returned a few minutes later, and finally the live forum area came back.

I believe Seth has contacted our systems consultant about checking the problem with installing our certificates and, with a bit of luck, this annoying problem can be resolved.

This will not fix our problems, but it should eliminate this recurring annoyance.

With respect to the general system upgrade and servicing, I have not found anyone willing to provide the level of services described by Brent on the XenForo site for an in-house system like ours. It is too much overhead for the vendors in this marketplace to learn the idiosyncrasies of each of the physical facilities. They will only bid on standardized configurations, i.e. those provided by commercial service centers and/or the cloud.
 

Dick C

Looking at the active users, and specifically the robots: during the outage period it appears that PROXIMIC was very active against different threads.

I know nothing about robots or whether you can set parameters for them, but why were they active when others could not get at the site?
 

John Matthews

Here is the network data for this afternoon

1643325130199.png

In view of this snapshot of the network, particularly the timeouts being experienced at nodes 14, 15 & 16 in Pennsylvania, the only location where it would be possible to reliably assess how the system and applications were performing between 14:00 & 16:00 is at headquarters. You would have to be accessing the system directly on a LAN and not across the Comcast network, to eliminate the wide area network impacts, which clearly exist.

I am not saying that system and application issues are not present. I am simply saying that if they exist, they can only be analysed reliably at headquarters, not remotely across the Comcast network.

John
 

Dick C

I haven't been on much recently. But my site monitor sends me emails several times per day that the site is down & back up.
Bill,

Do you have an uptime report that might show the date that these outages started around the 3 pm onwards EST? Perhaps it can be used to determine if a piece of application software or systems or network underwent changes?
 

Bill Stuntz

Here's a screenshot of the emails - December & January. Filtered on MB Down messages.
Down.jpg
 

Dick C

Here's a screen shot of the emails - december & january. Filtered on MB Down messages.
Can you add to this, showing the uptime following each downtime, so we can see the duration? I like patterns and would discount small outages vs the ones we are now seeing. Then maybe, and that is a big maybe, someone can recognize when changes might have been made.
 

Dick C

Bill,
Thank you. For those of you who are looking at the times, please note that if a time is shown as 3:07, the actual event could be up to 5 minutes either side of that, as the capture of the info is based on 5 minute intervals.
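To put numbers on that uncertainty, here is a minimal sketch of how the 5 minute capture interval widens each reported outage. It assumes (my assumption, not something the monitor documents here) that a change is reported at the first check that sees it, so each true transition happened at most one interval before its report. The down/up pairs are ones already quoted in this thread.

```python
from datetime import datetime, timedelta

def outage_bounds(down, up, poll_minutes=5):
    """Return (shortest, longest) plausible outage length in minutes.

    Assumes each reported time can lag the true event by up to poll_minutes,
    so the real duration is the reported one plus or minus one interval.
    """
    fmt = "%H:%M"
    reported = datetime.strptime(up, fmt) - datetime.strptime(down, fmt)
    slack = timedelta(minutes=poll_minutes)
    shortest = max(reported - slack, timedelta(0))
    longest = reported + slack
    return shortest.total_seconds() / 60, longest.total_seconds() / 60

# Down/up pairs reported earlier in this thread.
for down, up in [("3:19", "3:38"), ("3:19", "3:36"), ("3:20", "3:33")]:
    low, high = outage_bounds(down, up)
    print(f"down {down}, up {up}: between {low:.0f} and {high:.0f} minutes")
```

On that assumption, a reported 17 minute outage could really have lasted anywhere from roughly 12 to 22 minutes, which is worth keeping in mind before comparing durations too closely.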

However, if one looks closely at the time spreads, for the most part the largest spread is in the afternoon hours. And around the 6th of January for the most part the outages are focused on the afternoon.

So, if anything, what might have changed in the servers?

P.S. John's question about whether the higher speed line will bypass the routers that are showing significant error rates needs to be answered. A fiber link between Columbia and Lancaster may not solve any problems if Lancaster traffic will also be routed as it is today.
 

John Matthews

Nothing in the error logs, machine was up the whole time.
As the server shows no downtime or errors, it is important to know whether the forum software can be accessed at headquarters without using the Comcast network.

If the forum software is accessed directly at headquarters, does it function normally at times when remote users are unable to access it?

I suspect it is related to SSL certificate checking
Tom, are you implying that there is an application verification process that runs daily at the time when remote access is down?

Is there any scheduled process at this time that could be overloading the network locally, or that requires Comcast resources to access the corporate server from the forum server as part of system integration?

John
 

Tom McIntyre

John, since I get 5 or 10 emails each day from the NAWCC computers about expiring SSL certificates, I think the WHM system that manages our Forum server is monitoring for expiration and validity.

The machine was running VMware to define virtual machines the last time I looked (about 18 months ago), and those virtual machines are using the WHM/cPanel product as a wrapper around the CentOS Linux variant.

Almost all the software has automatic updates turned on, so we need someone who can check that nothing goes awry.

Seth is our only IT employee. He is responsible for all the software and machinery in house. He has a list of consultants he can call when necessary.

Steve Humphrey was a computer systems expert as well as being our Executive Director. Kevin Osborne had been working with Steve for a number of years before coming to work for the NAWCC shortly after Steve was hired. We are still regrouping from all the staff changes and the impact on the staff of the Corona pandemic.
 

Bill Stuntz

Tom McIntyre, back when I was an admin & we were developing the XF MB, I had set an uptimerobot monitor on the test XF MB in addition to the "live" MB. You mentioned in a recent post that you were working on "our forum testing site" when the MB went down. I've noticed that sometimes I get messages that both the live MB & the "test" MB went down, but most of the time I only get emails about the "live" site. For THAT outage, I did not get a message about the test site. Did the test MB go down too, or was it just the live site that went down? If it was only the live MB, does that give us any info that might help localize the problem? I also have a monitor on googleapis, which I understand our MB uses. I occasionally get messages that googleapis is down, but they're never associated with MB outages.
Update: I just re-checked my messages. On the 27th only the live MB went down. On the 26th & 25th, both were down. Googleapis was down on the 26th at about 2am.
 

Jim Haney

It was down from 3:20 to 3:30. It may have gone down before 3:20; that is just when I started trying to get on.
 

Bill Stuntz

Tried to access the MB at 3:14 - site not found. It came back up at 3:31. Ran tracert at 2:00 and again at 3:15. These results make me believe that the problem is internal at the MB. It also took nearly a minute to update the thread after I hit the "Post Reply" button. And again, when I edited my post to include that info. The second edit "Save" was almost instant. Uptimerobot said down at 3:20, up at 3:33, and the test MB down at 3:25 and up at 3:34.
2pm_tracert.jpg

3.15_tracert.jpg
 

Dick C

Error occurred exactly at 2:13 pm Texas Time (CST)

I attempted to reconnect each and every minute

Successful at 2:31 pm Texas Time

During the outage ran Pathping - 1 error in 1 node out of 35

Last Robot logged before the outage was shown as Majestic-12
 

John Matthews

Network data for this afternoon for both the forum and corporate site.

1643412969738.png 1643412850982.png

For access to nawcc.org the routers at nodes 20 & 21 were changed between 12:00 & 13:00 EST.

On both network routes there was degradation in performance between 13:00 and 16:00 EST. More problems accessing mb.nawcc.org with significant time outs ~14:15 EST. The highlighted times on both routes correspond to when the response time is likely to have been increased by ~50%.

John
 

Bill Stuntz

Looking at those charts makes me wonder... MOST of the time that there was a lot of packet loss, the MB response was "normal" - doesn't that imply that the outages are due to something internal to the MB systems? My tracerts before & during the outage looked fine.
 

Tom McIntyre

The test site where we are working on the Single Sign On is a subdirectory of the active forum site, and the nawc-info site is located in a different account on the same hardware.

I was actively working on all 3 when it went down yesterday. The forum stopped immediately; the other two sites were in the middle of operations and probably not trying to serve new connections.

With no real evidence except my gut feeling, it felt like a software problem in the OS, most likely related to a failure in the credential management.
 

Dave Coatsworth

MOST of the time that there was a lot of packet loss, the MB response was "normal" - doesn't that imply that the outages are due to something internal to the MB systems?
Exactly right, Bill. I am convinced that we are dealing with two separate problems here. We have the ongoing Comcast residential line problems that cause random (sometimes severe) slowing due to packet loss, and we have this 12:13 hard failure on mb.nawcc.org. As I said earlier, I have a commitment from HQ that they will bring in our systems consultant to diagnose the daily failure. Hopefully this will happen next week.
 

Bill Stuntz

I am convinced that we are dealing with two separate problems here.
Me too! I don't know much about the nuts&bolts of networking layers, but I'm guessing that pings are handled earlier in the process than https/etc. - basically before the traffic actually gets sent to the server for processing. So the server being out in never-never land can't stop the ping responses. The server itself may not even know it was pinged - kinda like your knee-jerk reflex. Your knee jerks before the brain even knows the nerve in your knee was tapped by your doctor's reflex hammer.
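A rough way to test that guess is to probe each layer separately. This is only a sketch: the two helpers use standard Python library calls, the ping result has to be supplied by hand (a real ICMP ping needs elevated privileges), and the wording of each diagnosis is my own interpretation rather than anything confirmed by the admins.

```python
import http.client
import socket
import urllib.error
import urllib.request

def tcp_port_open(host, port=443, timeout=5.0):
    """True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_responds(url, timeout=10.0):
    """True if the web server returns any HTTP response, even an error page."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a success status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False

def diagnose(ping_ok, tcp_ok, http_ok):
    """Map the three layer checks onto a rough description of the fault."""
    if http_ok:
        return "web service is answering"
    if tcp_ok:
        return "port open but no HTTP response: web server process may be stuck"
    if ping_ok:
        return "host reachable but web port closed: web service may be down"
    return "host unreachable: network path or whole machine may be down"

# Example call during an outage (ping result entered by hand):
# print(diagnose(True, tcp_port_open("mb.nawcc.org"),
#                http_responds("https://mb.nawcc.org/")))
```

If pings succeed while the last two checks fail, that would point at the web service on the machine rather than the route to it.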

P.S. Only the fact that the doctor was standing off to the side keeps my brain from asking "Why did I just kick him in the nuts?" :D
 

John Matthews

Remote access data for net.nawcc.org and watchnews.nawcc.org

1643468630376.png 1643468664410.png
Remote access data for mb.nawcc.org Saturday 11 December, 2021
1643468929847.png 1643468983549.png
Remote access data for mb.nawcc.org Monday 10 January, 2022
1643469163515.png 1643469202249.png

John
 

Bill Stuntz

I don't think we'll get much more useful data until we get something from the consultant and/or the dedicated line switchover happens.
I'm not convinced that the dedicated line will cure the problem.

uptimerobot: down 3:16 up 3:34
Pings during/after outage:

100 pings, started at 3:20 from Columbus, OH via T-Mobile 5G home gateway - during outage
Ping statistics for 50.244.233.125:
Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 81ms, Maximum = 239ms, Average = 138ms

3:25
Ping statistics for 50.244.233.125:
Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 72ms, Maximum = 240ms, Average = 132ms

3:35 after MB came back up
Ping statistics for 50.244.233.125:
Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 70ms, Maximum = 251ms, Average = 120ms
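
For anyone who wants to compare runs like these without reading them by eye, here is a small sketch that pulls the numbers out of Windows-style ping summaries. The regular expression is written against the output format pasted above and may need adjusting for other ping versions.

```python
import re

SUMMARY = re.compile(
    r"Sent = (?P<sent>\d+), Received = (?P<received>\d+), Lost = (?P<lost>\d+)"
    r".*?Minimum = (?P<min>\d+)ms, Maximum = (?P<max>\d+)ms, Average = (?P<avg>\d+)ms",
    re.DOTALL,
)

def parse_ping_summaries(text):
    """Return one dict of integers per 'Ping statistics' block found in text."""
    return [{key: int(value) for key, value in match.groupdict().items()}
            for match in SUMMARY.finditer(text)]

# The three summaries posted above (two during the outage, one after).
posted = """
Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 81ms, Maximum = 239ms, Average = 138ms
Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 72ms, Maximum = 240ms, Average = 132ms
Packets: Sent = 100, Received = 100, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 70ms, Maximum = 251ms, Average = 120ms
"""

runs = parse_ping_summaries(posted)
for label, run in zip(["3:20 (down)", "3:25 (down)", "3:35 (up)"], runs):
    loss_pct = 100 * run["lost"] / run["sent"]
    print(f"{label}: loss {loss_pct:.0f}%, avg {run['avg']} ms")
```

Packet loss is zero in all three runs and the averages differ by less than 20 ms, so ping alone does not distinguish "down" from "up" here.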
 

Dick C

I look forward to what the consultant finds. My strong leaning for the outage problem lies with the server system.
 

Tom McIntyre

The service that operates the Internet is failing. I get notices of failures but I get so many that are irrelevant that I do not always notice the details. However last night I saw notices of httpd failure which is the code for the low level internet driver/listener.
 

Bill Stuntz

Any idea why it should be failing so consistently at the same time each day? Is that driver up to date?
 

Tom McIntyre

I hope our consultant can tell us. It could be a resource issue while CentOS is trying to run an update.

We stopped trying to run Elasticsearch because of similar behavior. We cannot just stop the Internet connection.
 

Dick C

The service that operates the Internet is failing. I get notices of failures but I get so many that are irrelevant that I do not always notice the details. However last night I saw notices of httpd failure which is the code for the low level internet driver/listener.
I assume that these notices are date and time stamped. Do they occur just before the mb goes down at 2:13?

You indicated in another message that perhaps Centos is trying to update. Are the Centos updates scheduled at 2:13?

What version of Centos is installed?
 

Tom McIntyre

I do not manage any of this, but I have access to the area where the information is shown.

CentOS v7.9.2009 vmware [mb2] v100.0.7

Load Averages: 6.72 5.54 5.31
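
For context, Linux load averages count tasks that are running or waiting over the last 1, 5 and 15 minutes, so they only mean something relative to the number of CPU cores. The sketch below divides the figures above by a few candidate core counts; the real core count of this virtual machine is not something I know, so the values tried are guesses.

```python
def load_per_core(load_averages, cpu_cores):
    """Divide 1-, 5- and 15-minute load averages by the number of CPU cores."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return tuple(round(load / cpu_cores, 2) for load in load_averages)

posted = (6.72, 5.54, 5.31)  # the figures shown above

# The core count is not stated, so try a few guesses.
for cores in (2, 4, 8):
    print(cores, "cores ->", load_per_core(posted, cores))
```

A per-core figure that stays above 1.0 would suggest work is queueing for CPU; well below 1.0 would suggest the machine has headroom.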

Change Log for 100.0.7
Entry: 2022-01-20
  • Fixed case CPANEL-39775: Update MySQL GPG Key url for 2022.
Change Log for 100.0.5
Entry: 2021-11-30
  • Fixed case CPANEL-39775: Update MySQL GPG Key url for 2022.
  • Fixed case CPANEL-38994: Ensure Dovecot Solr starts on systems where iptables reports warnings with it's output.
  • Fixed case CPANEL-39118: Update cpanel-php73-services-weather to 1.4.7-2.cp1198.
  • Fixed case CPANEL-39166: Update cpanel-perl-532-dns-unbound to 0.27-1.cp1198.
  • Fixed case CPANEL-39227: Fix Internal Server Error when accessing WHM > phpMyAdmin on a server with a trial license.
  • Fixed case CPANEL-39240: CRTs created with /scripts/gencrt include attributes input by the user.
  • Fixed case CPANEL-39272: Update cpanel-roundcubemail to 1.4.12-1.cp1198.
  • Fixed case CPANEL-39281: Fix bug in importing CSV files when the account name is "excel", "office", or anything that can be confused for a file type.
  • Fixed case CPANEL-39292: Update cpanel-php73 to 7.3.33-1.cp1198.
  • Fixed case CPANEL-39314: Update cpanel-mailman to 2.1.37-1.cp1198.
  • Fixed case CPANEL-39321: Adjust hostname SSL certs' DCV for ancestor/implicit DCV change.
 

John Matthews

Tom

On the server hosting the forum, is VMware ESXi being used to create multiple virtual machines, and is it CentOS linux which is hosting the XenForo forum software?

If so, are there other VMs and if so, do you know what other applications are running?

John
 

John Matthews

The test site where we are working on the Single Sign On is a subdirectory of the active forum site and the nawc-info site is located in a different account on the same hardware..

I was actively working on all 3 when it went down yesterday. The forum stopped immediately, the other two sites were in the middle of operations and probably not trying to serve new connections.

With no real evidence except my gut feeling it felt like a software problem in the OS and most likely related to a failure in the credential management.
Tom, let me check that I am interpreting your recent posts correctly.

Common hardware is supporting a number of virtual machines. The forum (mb.nawcc.org) has the dedicated IP address of 50.244.233.125. This is a VM running CentOS Linux, supported by the bare-metal hypervisor VMware ESXi. VMware ESXi is supporting additional VMs served by the IP address 50.244.233.124 (info.nawcc.org, natcon.nawcc.org and watchnews.nawcc.org) - I assume these use DHCP functionality of the hypervisor to control access.

The IP address 23.31.242.4 supports net.nawcc.org (business directory & the shop) and also test.nawcc.org.
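
The grouping above can be reproduced with a short sketch that resolves each hostname and collects the names sharing an address. To keep it runnable offline it uses the mapping as I have described it in place of a live DNS lookup; live DNS may return different addresses, and the `resolve` parameter is just a convenience of this sketch.

```python
import socket
from collections import defaultdict

def group_hosts_by_ip(hostnames, resolve=socket.gethostbyname):
    """Return {ip: [hostnames]}; names that fail to resolve go under 'unresolved'."""
    groups = defaultdict(list)
    for host in hostnames:
        try:
            ip = resolve(host)
        except OSError:
            ip = "unresolved"
        groups[ip].append(host)
    return dict(groups)

# Mapping as described in this post, used as a stand-in resolver.
described = {
    "mb.nawcc.org": "50.244.233.125",
    "info.nawcc.org": "50.244.233.124",
    "natcon.nawcc.org": "50.244.233.124",
    "watchnews.nawcc.org": "50.244.233.124",
    "net.nawcc.org": "23.31.242.4",
    "test.nawcc.org": "23.31.242.4",
}

groups = group_hosts_by_ip(described, resolve=described.__getitem__)
for ip, hosts in groups.items():
    print(ip, hosts)
```

Calling `group_hosts_by_ip(list(described))` without the `resolve` argument would query DNS instead, which is one way to check whether the addresses are still current.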

Is the latter the test site to which you refer?

If it is, your first sentence implies to me that net.nawcc.org and test.nawcc.org are VMs supported by VMware ESXi - is that correct?

When you say software problem in the OS, I assume you mean VMware ESXi (as it runs directly on the hardware) rather than the OS of the VM, i.e. CentOS Linux.

John
 

Bernhard J.

NAWCC Member
Sponsor
Jan 10, 2022
1,033
1,085
113
Berlin, Germany
Country
Region
"There is nothing to complain about" is the highest praise of a resident of Berlin (= me, presently)
 
