Forum is extremely slow at the moment.

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
Thanks for the data, John. Are you sure the data is for www.nawcc.org (which is EZ's URL for us) and not net.nawcc.org, which also resides inside the Columbia campus (behind the firewall) with mb.nawcc.org?
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
Hi zedric,


I don't know if anyone else has noticed it, but every evening recently at around 8pm GMT, the whole board has stopped responding for 5 to 10 minutes before going back to normal. Images and all content load normally all the time apart from this brief but annoying hiatus.

Regards,

Graham
I have not noticed the time; however, I do know that this afternoon, Texas time, the message board was not available.
 

zedric

NAWCC Member
Aug 8, 2012
1,956
404
83
Country
Region
Hi zedric,


I don't know if anyone else has noticed it, but every evening recently at around 8pm GMT, the whole board has stopped responding for 5 to 10 minutes before going back to normal. Images and all content load normally all the time apart from this brief but annoying hiatus.

Regards,

Graham
Hi Graham

I wasn't online early today (my time), but have noticed a general slow down around that time most days, as John Matthews has posted - the forum generally seems to become unworkable for a few minutes and then the problem resolves itself - sometimes I get timeouts, other times various error messages. Let's hope that the consultant who was used has come up with some workable recommendations....
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
Are you sure the data is for www.nawcc.org (which is EZ's URL for us) and not net.nawcc.org, which also resides inside the Columbia campus (behind the firewall) with mb.nawcc.org?
Yes, I'm certain.

1642917597543.png

vs

1642917756135.png

The host name of net.nawcc.org is static.hfc.comcastbusiness.net for which the ISP is Comcast and the geolocation is given as Columbia or Lancaster, depending upon which site you use. For nawcc.org geolocation is Lansing and the ISP Liquid Web. For mb.nawcc.org geolocation is in the Lancaster area and ISP is Comcast.

John
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
OK. What I was trying to get at is that nawcc.org, located in Lansing, Michigan, is not related to any possible problems with Comcast except when it is displaying content from net.nawcc.org in an iframe.

It is all part of the same general issues and the upgrade to the wire and possible reconfiguration of our Columbia firewall could address that.

I would really like to have an image proxy server and an Elasticsearch server outside the campus to avoid the obvious bottlenecks. Both of those are causing severe problems for the forums and we have not been able to use our integrated search services like similar threads because of the reliability and performance issues.

The software supports serving the static content from a Content Delivery Network, and in my opinion it would all be much easier if all of our interactive resources were on an external server. I personally think it should also be cloud based, but I have no great stake in that.
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
It just happened again... I posted an item at 2:12 p.m. Texas time, 20:12 GMT. I went to find another item, came back shortly before 2:18 and could not edit the post, so I attempted to bring up another version of the message board. At 2:29 p.m. I was able to get access to the second version. During this outage the "...minutes ago" at the top left of my post was updating every minute; however, the rest of the site was unusable.

During this outage I was able to get the homepage of NAWCC.ORG.

So why does this happen at this time?
 
  • Like
Reactions: zedric and gmorse

Kevin W.

NAWCC Member
Apr 11, 2002
23,443
644
113
64
Nepean, Ontario, Canada
Country
Region
It went down at 3:14 pm today for me. I could not get on the site at all; I tried 3 times.
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
You may like to read this if you need some background on the packet loss that is recorded in the logs I have posted.

I posted this in November when the response was really bad. The site was virtually unusable.

1642978206221.png

Starting with the bottom line:
  • 26% of packets destined for mb.nawcc.org from my PC in southern France were being lost
  • 8% of the packets were being lost between the Comcast router '96' & mb.nawcc.org
  • 18% of packets were being lost between routers '158' & '96'
If you compare those losses with those we are currently experiencing (post 48), you can see that the losses are smaller but still significant: 10%, 4% and 0% for mb.nawcc.org, and 7%, 3% and 0% for nawcc.org.

Losses along the links probably indicate that the links are congested, and losses at the routers that the routers are being overloaded. I infer from the data that the local area Comcast network (routers and links) is overloaded.
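As a rough check on the arithmetic, the per-segment losses can be combined in two ways. This is only a sketch with hypothetical function names: the November figures above add up directly (18% + 8% = 26%), while the compounded figure assumes each segment drops packets independently, which is an assumption and not something the logs establish.

```python
def additive_loss(segment_losses):
    """Sum per-segment loss fractions (how the quoted figures add up)."""
    return sum(segment_losses)


def compounded_loss(segment_losses):
    """End-to-end loss if every segment drops packets independently."""
    delivered = 1.0
    for loss in segment_losses:
        delivered *= 1.0 - loss
    return 1.0 - delivered


# November figures quoted above: 18% between routers '158' and '96',
# 8% between router '96' and mb.nawcc.org.
november = [0.18, 0.08]
print(f"additive:   {additive_loss(november):.1%}")
print(f"compounded: {compounded_loss(november):.1%}")
```

The two results differ only slightly at these loss levels, so the simple sum is a fair reading of the logs.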

Determining why this is happening at a particular time of day requires the source, destination and volume of the packets being transmitted to be analysed. The performance of NAWCC devices and the network traffic they generate should be known and routinely monitored, so that loading and headroom across 24 hours is understood. The equivalent analysis of the loading of the Comcast network outside NAWCC can, realistically, only be done by Comcast.

John
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
What I was trying to get at is that nawcc.org, located in Lansing, Michigan, is not related to any possible problems with Comcast except when it is displaying content from net.nawcc.org in an iframe.
Tom I am confused by this in relation to the logs I have posted.

The logs relate to the network performance between my PC and the servers that host the NAWCC applications.

They take no account of the NAWCC software applications. If you are implying that in order to respond to a software request on one server, there is a need to request the transfer of data from another server, then the impact on overall response will include the performance of the link between the two servers. If the servers are in different physical locations, and particularly if they have to use the public network, that is almost certainly going to cause an unacceptable degradation in response. This would be in addition to any network issues that the logs show.

The NAWCC servers that are visible on the corporate and forum sites are:
  • nawcc.org - the main corporate site
    • 67.225.191.105
    • ISP: Liquid Web
    • Location Lansing, Michigan
  • mb.nawcc.org - forums
    • 50.244.235.125
    • ISP: Comcast Cable
    • Location: Lancaster, Pennsylvania
  • net.nawcc.org
    • 23.31.242.4
    • ISP: Comcast Cable
    • Location: Lancaster, Pennsylvania
  • natcon/watchnews.nawcc.org
    • 50.244.233.124
    • ISP: Comcast Cable
    • Location: Lancaster, Pennsylvania
I assume the last three servers are located at headquarters, but their precise physical location and the internal network are not known to me.

All I can do from my location in France is see the network route to the different servers. The route from other locations will be different and only if there are performance issues with the host server or with the local network in the vicinity of the server, will all remote users experience common response issues. I can only observe the network issues. While it is true that the majority of response problems have been experienced on the forums, there have been response issues on the corporate site hosted in Michigan. It is not true to say that Comcast issues have not contributed to the response degradation of the corporate site.

Here are the logs for Saturday 15 January at 22:00 GMT (16:00 East Coast). Hop 9 is when my traffic reaches the USA.

Forum

1643023323461.png

Corporate

1643023372338.png

As can be seen, the traffic from Paris (location of the previous node 8) enters the Comcast network at Ashburn in Virginia on both routes, but is serviced by different routers that direct the packets on to their final destination. A significant part of the route to both servers is managed by Comcast, and at the time of this log the corporate site was experiencing Comcast and Liquid Web network issues.

John
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
My theory is that all the problems are associated with the connection to our campus in Columbia.

Services requested from the server in Lansing Michigan only run into problems if they reference the machines in Columbia.

The standard responses from the Internet in general do not suffer from the problems we see with Columbia. If all of the machines were in Lansing (or almost anywhere else) we would not be seeing these problems.

The iMIS membership management system located inside the HQ building in Columbia is the basic system for staff operations and probably cannot be moved. On the other hand its traffic is quite a bit less than the other systems. It services our Event functions and our museum store and NAWCC Member account management. None of the traffic associated with those functions is heavy.
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
Services requested from the server in Lansing Michigan only run into problems if they reference the machines in Columbia.
Tom - this is not true if there are network problems on the route to the nawcc.org server in Lansing and as I have just demonstrated they do occur. If users are accessing /publications or /local-chapters directly that are stored on nawcc.org, at times when there are network problems, the service will be degraded. It is true that the network problems local to Lansing occur less frequently and the fibre link to EZSolutions contributes to greater reliability, but it is untrue to say that they don't occur.

John
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
John, the global stats are interesting but are unlikely to mean much in terms of users of the forum complaining about service speed. Well over 90% of our traffic is to the mb.nawcc.org server.

More importantly, we have disabled services that would improve matters because we cannot operate them reliably in the facility there.
 

gmorse

NAWCC Member
Jan 7, 2011
13,712
2,882
113
Breamore, Hampshire, UK
Country
Region
Hi,

Is the fact that the slowdown occurs at roughly the same time each day and has the same duration, pertinent?

Regards,

Graham
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
2:08 Texas time OK
2:13 p.m. cannot load mb.nawcc.org
Between these times I looked once per minute until the following occurred.
2:24 p.m. mb.nawcc.org is available

So why is this consistent? I cannot believe that the network is congested at the same time every day. More likely there is an issue on the server that leaves it unable to service requests, or someone is fussing with network connectivity, or some piece of software has locked the system, or the great ghost is at work, etc.

How will a faster link solve this problem? I cannot see how. If the link is too slow then I would expect slowerrrrrrrrrrrrrrrr response, not a total denial of service for a 10+ minute interval during the same time frame every day.

Has anyone taken the time to take a serious look into the MB system to see what tasks are running during this period of outages, etc.?

Are there any backups scheduled at that time?
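To compare observations like these from day to day, it may help to turn a log of once-a-minute checks into outage windows. The following is a minimal sketch: the function name and the sample date are assumptions, and only the times come from the post above.

```python
from datetime import datetime


def outage_windows(samples):
    """Group consecutive failed probes into (first_failure, recovery) pairs.

    `samples` is a time-ordered list of (timestamp, ok) tuples. An outage
    that is still open at the end of the list has recovery set to None.
    """
    windows = []
    start = None
    for when, ok in samples:
        if not ok and start is None:
            start = when
        elif ok and start is not None:
            windows.append((start, when))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


# Times from the post above (Texas time); the date is a placeholder.
fmt = "%Y-%m-%d %H:%M"
samples = [
    (datetime.strptime("2022-01-25 14:08", fmt), True),
    (datetime.strptime("2022-01-25 14:13", fmt), False),
    (datetime.strptime("2022-01-25 14:24", fmt), True),
]
for start, end in outage_windows(samples):
    print(start.time(), "->", end.time() if end else "ongoing")
```

If the windows line up at the same clock time on several days, that points towards something scheduled and away from random congestion.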
 
  • Like
Reactions: gmorse

Bill Stuntz

Registered User
Apr 6, 2012
4,958
45
48
73
Columbus. OH
Country
Region
I haven't been on much recently. But my site monitor sends me emails several times per day that the site is down & back up. Monitor.jpg
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
I am really not sure how much clearer I can state the problem ...

All I can do from my location in France is see the network route to the different servers. The route from other locations will be different and only if there are performance issues with the host server or with the local network in the vicinity of the server, will all remote users experience common response issues. I can only observe the network issues.
and obviously the timing is significant ...

Is the fact that the slowdown occurs at roughly the same time each day and has the same duration, pertinent?
I think we are all becoming frustrated by the lack of information that is forthcoming.

Even if there is limited in-house IT resource, and there is a lack of routine monitoring, surely it is not difficult to determine whether it is the server or the wide area network? If you have a problem with just a limited number of possible causes, you investigate what happens if you remove one of the possible causes from the situation.

How do the staff working at headquarters access the forums? I assume their interaction can be across the LAN (the in-house network) and they do not have to use the Comcast network? If so, have they been explicitly asked to use the forum during the periods when remote response is degraded and report their experience?

John
 
  • Like
Reactions: Dick C and gmorse

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
Here's something you may wish to try.

For those of you who are noticing periods of degraded performance, working on a Windows platform and are familiar with using the command prompt, I would suggest that when you experience slow response you try using the utility pathping. This gives the output I have posted above. Try it initially when the response is good and compare the results.

Access the command prompt by right-clicking the Windows symbol on the task bar and selecting Command Prompt. This will bring up a window like so ...

1643102428872.png
Type pathping mb.nawcc.org at the prompt. The command will take a little time to complete - a matter of minutes depending on the number of hops that have to be analysed. From France it can take 8 or 9 minutes. Check the data towards the end of the output and compare with the results I have just obtained at 10:25 GMT in a period of good response.

1643103241748.png

This will help you interpret the results ...

1643102721691.png

It would be helpful to screen capture the output and post on this thread. This will identify at which point in the network the routes from various locations converge - it is between that point and the server where any analysis should be focused.

Someone in headquarters should use it - this will check that they are not being directed out to the Comcast network and back in again. I have known it happen!

I believe pathping is not available on Apple platforms, but you can use traceroute, which is included in the GUI tool Network Utility. It gives a subset of the information provided by pathping.
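For anyone who would rather save the output to a file than take screenshots, here is a hedged sketch in Python. The function names and file naming are my own invention; it simply runs pathping on Windows and falls back to traceroute elsewhere, and it assumes the chosen tool is installed and on the PATH.

```python
import platform
import subprocess
from datetime import datetime, timezone


def build_trace_command(host, system=None):
    """Return the route-tracing command for the given platform.

    pathping ships with Windows; elsewhere this falls back to traceroute,
    which reports less detail.
    """
    system = system or platform.system()
    if system == "Windows":
        return ["pathping", host]
    return ["traceroute", host]


def capture_trace(host, out_dir="."):
    """Run the trace and save its output in a file stamped with the UTC time."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%SZ")
    path = f"{out_dir}/trace-{host}-{stamp}.txt"
    result = subprocess.run(
        build_trace_command(host), capture_output=True, text=True, check=False
    )
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
    return path


# Example (takes several minutes with pathping):
# capture_trace("mb.nawcc.org")
```

Running it once when response is good and once during a slow period gives two text files that can be compared line by line.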

John
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
Thank you for pointing out this utility.

Now, I might be missing something. During the outage I used the PING command to ping mb.nawcc.org and the results were successful. Doesn't that mean that I was able to get to the server with the message board software?
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
On the MB server are there any tasks scheduled to kick off around 2:12 or thereabouts? If so, should they be set to 2:12 a.m. rather than p.m.?
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
Yes - if life is rosy you should get something like this

1643128273821.png

that's sending 10 echo requests each of 1KB.

At times when there are performance issues one or more of the requests will time out and the times will increase and be less consistent.

1643128871948.png

The average time will depend on the length of the route and the speeds of the links.

John
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
Yes - if life is rosy you should get something like this

View attachment 692062

that's sending 10 echo requests each of 1KB.

At times when there are performance issues one or more of the requests will time out and the times will increase and be less consistent.

View attachment 692064

The average time will depend on the length of the route and the speeds of the links.

John
So I was successful with ping when the outage was occurring. Does this not point to something in the operating system or mb software?
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
No - you cannot draw that conclusion.

All it tells you is that 4 echo requests, each of a mere 32 bytes (default) were received and responded to.

John
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
No - you cannot draw that conclusion.

All it tells you is that 4 echo requests, each of a mere 32 bytes (default) were received and responded to.

John
So the operating system is responding by echoing my packets. Then shouldn't I be looking further into the OS or the MB software to see what is going on if Ping can reach the system; yet the MB is not responding to my subsequent requests?
 

Jerry Treiman

NAWCC Member
Golden Circle
Aug 25, 2000
7,102
4,091
113
Los Angeles, CA
Country
Region
About 10 minutes ago (20:22 GMT) I was unable to access these forums; it was unavailable for at least 5 minutes. NAWCC.org opened for me.
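The outage reports in this thread are given in different local times. A small sketch, assuming fixed winter offsets (Texas on US Central time, Ontario on US Eastern time) and a placeholder date, puts them on a common GMT clock. The names and the date are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed winter offsets, assumed because the reports were made in January.
US_CENTRAL = timezone(timedelta(hours=-6))  # Texas
US_EASTERN = timezone(timedelta(hours=-5))  # Ontario
GMT = timezone.utc


def to_gmt(hour, minute, tz, day=(2022, 1, 25)):
    """Convert a local wall-clock report to GMT. The date is a placeholder."""
    local = datetime(*day, hour, minute, tzinfo=tz)
    return local.astimezone(GMT)


reports = {
    "Dick C (Texas, 2:12 p.m.)": to_gmt(14, 12, US_CENTRAL),
    "Kevin W. (Ontario, 3:14 p.m.)": to_gmt(15, 14, US_EASTERN),
    "Jerry Treiman (20:22 GMT, less about 10 minutes)": to_gmt(20, 12, GMT),
}
for who, when in reports.items():
    print(f"{when:%H:%M} GMT  {who}")
```

On those assumptions all three reports fall within a few minutes of 20:12 GMT, which matches the "around 8pm GMT" observation earlier in the thread.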
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
I normally only use ping because pathping gives you a lot of detail about how the ISPs organize their networks, which is essentially of no interest to me.

My ISP at home is Spectrum, which is a recent merger of Charter and Time Warner. In this instance only Spectrum and Comcast are involved in the connection to Columbia. My home connection is a broadband cable and it is not surprising that a packet was lost in establishing the connection.

After that, hops 3 to 7 were on Spectrum/Charter and hops 8 to 17 were Comcast and finally our campus in Columbia.

I then did the same thing to our main server with similar results but no Comcast in the path, just Spectrum to Liquid Web.

Microsoft Windows [Version 10.0.22000.434]
(c) Microsoft Corporation. All rights reserved.

C:\Users\tom>pathping mb.nawcc.org

Tracing route to mb.nawcc.org [50.244.233.125]
over a maximum of 30 hops:
0 McSpectre [192.168.123.55]
1 192.168.123.254
2 22.15.66.1
3 096-034-141-104.biz.spectrum.com [96.34.141.104]
4 be32.crr02oxfrma.netops.charter.com [96.34.80.238]
5 bbr02slidla-bue-2.slid.la.charter.com [96.34.2.156]
6 bbr01blvlil-tge-0-0-0-11.blvl.il.charter.com [96.34.0.137]
7 bbr02ashbva-bue-4.ashb.va.charter.com [96.34.3.163]
8 be-205-pe04.ashburn.va.ibone.comcast.net [23.30.207.5]
9 be-2304-cs03.ashburn.va.ibone.comcast.net [96.110.37.137]
10 be-1312-cr12.ashburn.va.ibone.comcast.net [96.110.32.210]
11 be-302-cr11.pittsburgh.pa.ibone.comcast.net [96.110.32.102]
12 be-1211-cs02.pittsburgh.pa.ibone.comcast.net [96.110.38.133]
13 96.110.42.166
14 be-34-ar01.lancaster.pa.pitt.comcast.net [69.139.168.142]
15 96.110.25.18
16 162.151.69.158
17 c-174-54-64-96.hsd1.pa.comcast.net [174.54.64.96]
18 mb.nawcc.org [50.244.233.125]

Computing statistics for 450 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           McSpectre [192.168.123.55]
                                0/ 100 =  0%   |
  1    5ms     0/ 100 =  0%     0/ 100 =  0%  192.168.123.254
                                0/ 100 =  0%   |
  2    ---   100/ 100 =100%   100/ 100 =100%  22.15.66.1
                                0/ 100 =  0%   |
  3   14ms     0/ 100 =  0%     0/ 100 =  0%  096-034-141-104.biz.spectrum.com [96.34.141.104]
                                0/ 100 =  0%   |
  4   18ms     0/ 100 =  0%     0/ 100 =  0%  be32.crr02oxfrma.netops.charter.com [96.34.80.238]
                                0/ 100 =  0%   |
  5   25ms     0/ 100 =  0%     0/ 100 =  0%  bbr02slidla-bue-2.slid.la.charter.com [96.34.2.156]
                                0/ 100 =  0%   |
  6   29ms     0/ 100 =  0%     0/ 100 =  0%  bbr01blvlil-tge-0-0-0-11.blvl.il.charter.com [96.34.0.137]
                                0/ 100 =  0%   |
  7   26ms     0/ 100 =  0%     0/ 100 =  0%  bbr02ashbva-bue-4.ashb.va.charter.com [96.34.3.163]
                                0/ 100 =  0%   |
  8   28ms     0/ 100 =  0%     0/ 100 =  0%  be-205-pe04.ashburn.va.ibone.comcast.net [23.30.207.5]
                                0/ 100 =  0%   |
  9   26ms     0/ 100 =  0%     0/ 100 =  0%  be-2304-cs03.ashburn.va.ibone.comcast.net [96.110.37.137]
                                0/ 100 =  0%   |
 10   26ms     0/ 100 =  0%     0/ 100 =  0%  be-1312-cr12.ashburn.va.ibone.comcast.net [96.110.32.210]
                                0/ 100 =  0%   |
 11   31ms     0/ 100 =  0%     0/ 100 =  0%  be-302-cr11.pittsburgh.pa.ibone.comcast.net [96.110.32.102]
                                0/ 100 =  0%   |
 12   33ms     0/ 100 =  0%     0/ 100 =  0%  be-1211-cs02.pittsburgh.pa.ibone.comcast.net [96.110.38.133]
                                0/ 100 =  0%   |
 13   35ms     0/ 100 =  0%     0/ 100 =  0%  96.110.42.166
                                0/ 100 =  0%   |
 14   42ms     0/ 100 =  0%     0/ 100 =  0%  be-34-ar01.lancaster.pa.pitt.comcast.net [69.139.168.142]
                                0/ 100 =  0%   |
 15   40ms     0/ 100 =  0%     0/ 100 =  0%  96.110.25.18
                                0/ 100 =  0%   |
 16   40ms     0/ 100 =  0%     0/ 100 =  0%  162.151.69.158
                                0/ 100 =  0%   |
 17   51ms     0/ 100 =  0%     0/ 100 =  0%  c-174-54-64-96.hsd1.pa.comcast.net [174.54.64.96]
                                0/ 100 =  0%   |
 18   55ms     0/ 100 =  0%     0/ 100 =  0%  mb.nawcc.org [50.244.233.125]

Trace complete.

Tracing route to nawcc.org [67.225.191.105]
over a maximum of 30 hops:
0 McSpectre [192.168.123.55]
1 192.168.123.254
2 22.15.66.1
3 096-034-141-104.biz.spectrum.com [96.34.141.104]
4 be32.crr02oxfrma.netops.charter.com [96.34.80.238]
5 bbr02slidla-bue-2.slid.la.charter.com [96.34.2.156]
6 bbr01blvlil-tge-0-0-0-11.blvl.il.charter.com [96.34.0.137]
7 prr01ashbva-bue-6.ashb.va.charter.com [96.34.3.89]
8 eqix-dc2.liquidweb.com [206.126.238.138]
9 lw-dc3-core1.rtr.liquidweb.com [209.59.157.16]
10 lw-dc3-storm2-po5.rtr.liquidweb.com [69.167.128.137]
11 host.ezsolution.com [67.225.191.105]

Computing statistics for 275 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           McSpectre [192.168.123.55]
                                0/ 100 =  0%   |
  1    3ms     0/ 100 =  0%     0/ 100 =  0%  192.168.123.254
                                0/ 100 =  0%   |
  2    ---   100/ 100 =100%   100/ 100 =100%  22.15.66.1
                                0/ 100 =  0%   |
  3   14ms     0/ 100 =  0%     0/ 100 =  0%  096-034-141-104.biz.spectrum.com [96.34.141.104]
                                0/ 100 =  0%   |
  4   19ms     0/ 100 =  0%     0/ 100 =  0%  be32.crr02oxfrma.netops.charter.com [96.34.80.238]
                                0/ 100 =  0%   |
  5   22ms     0/ 100 =  0%     0/ 100 =  0%  bbr02slidla-bue-2.slid.la.charter.com [96.34.2.156]
                                0/ 100 =  0%   |
  6   29ms     0/ 100 =  0%     0/ 100 =  0%  bbr01blvlil-tge-0-0-0-11.blvl.il.charter.com [96.34.0.137]
                                0/ 100 =  0%   |
  7   28ms     0/ 100 =  0%     0/ 100 =  0%  prr01ashbva-bue-6.ashb.va.charter.com [96.34.3.89]
                                0/ 100 =  0%   |
  8   47ms     0/ 100 =  0%     0/ 100 =  0%  eqix-dc2.liquidweb.com [206.126.238.138]
                                0/ 100 =  0%   |
  9   54ms     0/ 100 =  0%     0/ 100 =  0%  lw-dc3-core1.rtr.liquidweb.com [209.59.157.16]
                                0/ 100 =  0%   |
 10   52ms     0/ 100 =  0%     0/ 100 =  0%  lw-dc3-storm2-po5.rtr.liquidweb.com [69.167.128.137]
                                1/ 100 =  1%   |
 11   51ms     1/ 100 =  1%     0/ 100 =  0%  host.ezsolution.com [67.225.191.105]

Trace complete.

 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
I captured what John Matthews suggested.

Here is the Pathping output when the system was running correctly at 1:42 pm. It is in 3 pages and shows no errors that I can find:

Pathping Jan 25 2022 1 42 pm page 1.jpg Pathping Jan 25 2022 1 42 pm page 2.jpg Pathping Jan 25 2022 1 42 pm page 3.jpg

Between 2:12 and 2:13 pm the system became inoperable. At 2:13 pm I started the Pathping. Notice that it stopped after 25 nodes. I do not know why it would. I attempted it twice with the same results before moving on. Here is the output:

Pathping Jan 25 2022 2 13 pm page 1.jpg Pathping Jan 25 2022 2 13 pm page 2.jpg Pathping Jan 25 2022 2 13 pm page 3.jpg

Around 2:15 pm, the system still not available, I started up a new cmd.exe file and captured the full Pathping as follows:

Pathping Jan 25 2022 2 15 pm page 1.jpg Pathping Jan 25 2022 2 15 pm page 2.jpg Pathping Jan 25 2022 2 15 pm page 3.jpg

Between 2:29 and 2:30 pm the system was operational.
 

gmorse

NAWCC Member
Jan 7, 2011
13,712
2,882
113
Breamore, Hampshire, UK
Country
Region
Hi Dick,
Then shouldn't I be looking further into the OS or the MB software to see what is going on if Ping can reach the system
Ping uses ICMP, which is handled at the Network Layer (layer 3 of the OSI model); it doesn't need to get to the Application Layer (layer 7 in OSI, layer 4 in the four-layer TCP/IP model).

Regards,

Graham
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
Hi Dick,


Ping uses ICMP, which is handled at the Network Layer (layer 3 of the OSI model); it doesn't need to get to the Application Layer (layer 7 in OSI, layer 4 in the four-layer TCP/IP model).

Regards,

Graham
I cannot get my hands around this....if ping gets to layer 3 then my assumption is that the network is running ok; thus, something in the application layer or above or the application or operating system is not responding. Is this correct?
 

gmorse

NAWCC Member
Jan 7, 2011
13,712
2,882
113
Breamore, Hampshire, UK
Country
Region
Hi Dick,
if ping gets to layer 3 then my assumption is that the network is running ok; thus, something in the application layer or above or the application or operating system is not responding. Is this correct?
Ping uses a very simple protocol, and yes, it means that it can communicate using that method, but that isn't how useful data transfers happen, so Dave's comment is valid; the problem could still be internal or external to our servers. Also, as John M mentioned, the default packet length for the ping command is only 32 bytes.
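To make the distinction concrete, here is a rough sketch in Python of checking the higher layers separately from ping. The function names, port 443 and the timeouts are assumptions chosen for illustration. Note that a successful ping followed by a failed web request does not prove the network is healthy, since small ICMP packets can get through a congested link that larger transfers cannot.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_port_open(host, port=443, timeout=5.0):
    """True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_responds(url, timeout=10.0):
    """True if the web server returns any HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an error page still shows the application answered
    except (OSError, http.client.HTTPException):
        return False


def interpret(ping_ok, tcp_ok, http_ok):
    """Map the three probe results to the most likely point of failure."""
    if http_ok:
        return "application is answering"
    if tcp_ok:
        return "TCP connects but the web application is not answering"
    if ping_ok:
        return "host answers ICMP but TCP connections fail"
    return "host unreachable: network path or host down"


# Example during an outage, with ping having succeeded from the command line:
# print(interpret(True, tcp_port_open("mb.nawcc.org"),
#                 http_responds("https://mb.nawcc.org/")))
```

Recording all three results at the moment of an outage narrows down where to look without settling the question on its own.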

Regards,

Graham
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
Hi Dick,


Ping uses a very simple protocol, and yes, it means that it can communicate using that method, but that isn't how useful data transfers happen, so Dave's comment is valid; the problem could still be internal or external to our servers. Also, as John M mentioned, the default packet length for the ping command is only 32 bytes.

Regards,

Graham
thanks...I am going away....if and when someone figures it out please post it in this thread....

Regards,
Dick
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
I have spent quite a bit of time in Columbia and have worked through three different provider setups. We currently have a T1 connection to our facility that was a big step up when we installed it.

A year or two later Columbia was hit by very severe lightning storms and the local facility was severely damaged. It took a long time to get it all back up and my impression was that our connection was rebuilt from scrap parts.

If you look at the information posted you will see lots of references to Lancaster but few if any to Columbia.

When they install our promised T3 link from the Lancaster hub we may have adequate reliability and will certainly have more bandwidth than we need.

However, we will still have a modest machine room on the second floor of the museum and very few resources to monitor and maintain it. If anything goes awry with our computers, we have troubles. If anything goes wrong with our system software, we have troubles. I would have preferred to move our public-serving systems off our campus to a managed facility where keeping it all tuned was the primary responsibility.

I am hopeful that our new wires will solve our problems, but I continue to be nervous.
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
I did dig into the back end and get some information on the overall performance of the system from December 2021.

I am not sure what we can learn from this.

1643153169157.png
1643152830022.png
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
I have spent quite a bit of time in Columbia and have worked through three different provider setups. We currently have a T1 connection to our facility that was a big step up when we installed it.

A year or two later Columbia was hit by very severe lightning storms and the local facility was severely damaged. It took a long time to get it all back up and my impression was that our connection was rebuilt from scrap parts.

If you look at the information posted you will see lots of references to Lancaster but few if any to Columbia.

When they install our promised T3 link from the Lancaster hub we may have adequate reliability and will certainly have more bandwidth than we need.

However, we will still have a modest machine room on the second floor of the museum and very few resources to monitor and maintain it. If anything goes awry with our computers, we have troubles. If anything goes wrong with our system software, we have troubles. I would have preferred to move our public-serving systems off our campus to a managed facility where keeping it all tuned was the primary responsibility.

I am hopeful that our new wires will solve our problems, but I continue to be nervous.
Sorry, I had to answer... if/when you get the T3 installed and then find that the performance issues stay the same, then what?

There has to be some link to the specific time periods when we see the system go away, and until the server OS and the MB software are eliminated as possible causes they will always be suspect. When are the MB databases backed up, and does XenForo lock the system during the backup? What is the impact of memory, buffer allocation, etc.? Is there any way to monitor the software during the outage period to see what is going on in the system? What tasks are running during the outage period? And so on... Is the server logging statistics on its own operation?

If someone has information showing that all of these, and other possible causes, have been ruled out, then please let us know and we will not continue down this path.

Regards,
Dick
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
I did dig into the back end and get some information on the overall performance of the system from December 2021.

I am not sure what we can learn from this.

View attachment 692176
View attachment 692175
Performance related info during the outage period of 12 to 15 minutes or so would be helpful. If this cannot be measured in the outage period then some software should be purchased/rented to do it. You may find that the T3 upgrade, although nice, may not be needed at this time. What is so special about this outage time period?

***IMPORTANT - Another question that may be pertinent. Assuming that the administrative systems in Columbia have access to the message board, what is their connection and do they lose access to the MB during this outage period?
 

Bill Stuntz

Registered User
Apr 6, 2012
4,958
45
48
73
Columbus. OH
Country
Region
About 10 minutes ago (20:22 GMT) I was unable to access these forums; it was unavailable for at least 5 minutes. NAWCC.org opened for me.
Between 2:12 and 2:13 pm the system became inoperable.
My site monitor (uptimerobot) shows the MB down from 3:17 to 3:30 EST. It's a free monitor and only checks at 5-minute intervals. The paid version would check at 1-minute intervals, but I can't imagine how that would help track down the problem.
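
For anyone who wants finer resolution than the free monitor without paying for it, here is a minimal sketch of a do-it-yourself poller. It is only an illustration and not how uptimerobot works internally; the function names (`check_once`, `find_outages`, `monitor`) and the default interval are my own assumptions.

```python
import time
import urllib.request
from datetime import datetime, timezone


def check_once(url, timeout=10.0):
    """Return (is_up, seconds_taken) for a single HTTP GET of `url`."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except Exception:
        # Timeouts, refused connections and HTTP errors all count as "down".
        ok = False
    return ok, time.monotonic() - start


def find_outages(samples):
    """Collapse [(timestamp, is_up), ...] into [(first_down, first_up_again), ...].

    An outage still in progress at the end of the samples has None as its end.
    """
    outages, down_since = [], None
    for ts, is_up in samples:
        if not is_up and down_since is None:
            down_since = ts
        elif is_up and down_since is not None:
            outages.append((down_since, ts))
            down_since = None
    if down_since is not None:
        outages.append((down_since, None))
    return outages


def monitor(url, interval=60, checks=30):
    """Poll `url` every `interval` seconds, `checks` times, and report outages."""
    samples = []
    for _ in range(checks):
        ok, took = check_once(url)
        now = datetime.now(timezone.utc)
        print(f"{now:%H:%M:%S} UTC  {'up' if ok else 'DOWN'}  {took:.1f}s")
        samples.append((now, ok))
        time.sleep(interval)
    return find_outages(samples)
```

Calling `monitor("https://mb.nawcc.org/", interval=60, checks=30)` a few minutes before the expected outage would give start and end times to within about a minute. The reported start and end are only as precise as the polling interval, which is the same limitation the 5-minute monitor has.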
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
From the data provided by Tom, Dick and my own logging, the point in the network where routes converge en route to the forum server is at the IP 69.139.168.142, which is a Comcast router located in Pennsylvania in the vicinity of Lancaster.

I log the network activity hourly on the hour. When the response is good the RTT (round trip time - pc-server-pc) is 130 milliseconds. This will vary from user to user: for Tom in Massachusetts 55 ms, for Dick in Texas 80 ms. When the response is slow the RTT will increase. If there is severe congestion at a point on the network then the pathping utility will terminate prematurely, as happened to Dick. The final output from pathping is obtained by sending packets specifically to the individual nodes progressively along the route - so you may find the time at, say, node 12 is smaller than at node 10. In Dick's second trace the utility timed out attempting to send and receive packets from node 25 (IP 96.110.32.132). This is a Comcast router in Georgia and, because it is also a node on Tom's route, I infer this may be a significant hub in the Comcast network. The timeout may have occurred due to congestion at any point on the network up to node 25, but it was most likely due to congestion on the line between nodes 24 and 25 or the router at 25 being overloaded.

In post #76 Jerry indicated that he was suffering problems 20:20 GMT. I have a log of pathping that finished at 20:10 ...

1643151920973.png

As can be seen the RTT has increased from 130ms to 181ms and the trace is showing lost packets in the critical part of the network which, from the evidence we have, is common to all users trying to access the server from off-site locations. The information shown for node 17 records an RTT of 182ms then 4/100 = 4%, which indicates that 4% of the data packets sent directly to this node were lost. The value "0/100 = 0%" shows that no packets that passed through the hop were discarded. The line below indicates that on the path to the next hop (IP address: mb.nawcc.org), 6% of the data packets were discarded. Of the packets sent to the server (node 18) 10% were lost - 100 packets were sent and only 90 arrived and were acknowledged, hence 10 had to be resent - this results in an 11% increase in network traffic to serve any request to the server. The data for nodes 13, 15 and 16 show that these routers discarded 2%, 3% and 1%, respectively, of the packets that should have been forwarded to the following nodes in the network.
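
The 10% loss versus 11% extra traffic figures can be checked with a few lines of Python. This is only a sketch of the arithmetic under a simple assumption (every lost packet is resent, and the extra traffic is measured against the packets that actually got through); the function name `retransmit_overhead` is my own.

```python
def retransmit_overhead(sent, acknowledged):
    """Return (loss_fraction, extra_traffic_fraction).

    loss_fraction  = lost / sent          (what pathping reports)
    extra_traffic  = lost / acknowledged  (resends relative to delivered packets)
    """
    if not 0 < acknowledged <= sent:
        raise ValueError("need 0 < acknowledged <= sent")
    lost = sent - acknowledged
    return lost / sent, lost / acknowledged


loss, extra = retransmit_overhead(100, 90)
print(f"loss {loss:.0%}, extra traffic {extra:.0%}")  # loss 10%, extra traffic 11%
```

The second figure is p / (1 - p) for a loss rate p, which is also the expected number of extra transmissions per delivered packet if each attempt is lost independently with probability p.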

We are looking into this. Unfortunately, there is not a simple, obvious answer.
Dave, I continue to be mystified why this is so difficult to analyse.

How do the staff working at headquarters access the forums? I assume their interaction can be across the LAN (the in-house network) and they do not have to use the Comcast network? If so, have they been explicitly asked to use the forum during the periods when remote response is degraded and report their experience?
and in reference to pathping ...

Someone in headquarters should use it - this will check that they are not being directed out to the Comcast network and back in again. I have known it happen!
The results from these observations and monitoring would be so helpful. If there is no degradation of service when the external network is excluded from the path, this will provide the data necessary to dispel the view that it is the server and the in-house network that are the problem. It will not of course rule out the on-site hardware that links to the Comcast network.

John
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
My site monitor (uptimerobot) shows the MB down from 3:17 to 3:30 EST. It's a free monitor and only checks at 5-minute intervals. The paid version would check at 1-minute intervals, but I can't imagine how that would help track down the problem.
Thanks... my unpaid version is the poor version, me... I attempted to reload the MB every minute from 2:04 until 2:30, and this is how I tracked the outage shown in post 78.

I do love problems!
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region
The issue is not the backup schedule. We use a Barracuda backup system, which backs up in real time.

We may still be backing up the iMIS system to coordinate with the financial batch management. I do not know that answer.

Please do not interpret my comments on the new connection to imply that I think it will solve the problems. I am only willing to concede it is possible because Comcast sales has said it will.

As I have said several times now, I think it should all be moved to a better service area, of which there are at least 1 or 2 hundred. The software we are running is much more than adequate. It operates other discussion sites with over 100 times more activity than ours.
 

Bill Stuntz

Registered User
Apr 6, 2012
4,958
45
48
73
Columbus. OH
Country
Region
Thanks... my unpaid version is the poor version, me... I attempted to reload the MB every minute from 2:04 until 2:30, and this is how I tracked the outage shown in post 78.

I do love problems!
Hi Dick. I have no idea exactly what times uptimerobot checks for down, or how often it checks for "is it back up yet?" I only included that to give an idea of the maximum resolution/precision of the up/down times I expect to see. Note that your 2:12 CST and uptimerobot's 3:17 EST are only 5 minutes apart. Obviously, we both detected the same outage, within the expected 5-minute resolution. Also, note that the 3:30 EST up time isn't a multiple of 5 minutes from the time it went down by either your or my down detections. It's close enough to give us some ideas. Nothing more.
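
For anyone who wants to check the time-zone arithmetic, here is a small sketch using fixed UTC offsets (CST = UTC-6, EST = UTC-5). The calendar date is an assumption chosen only so the script runs; the clock times are the ones quoted above.

```python
from datetime import datetime, timedelta, timezone

# Fixed winter offsets; no daylight saving in January.
CST = timezone(timedelta(hours=-6), "CST")
EST = timezone(timedelta(hours=-5), "EST")


def gap_minutes(earlier, later):
    """Minutes between two timezone-aware datetimes."""
    return (later - earlier).total_seconds() / 60


day = (2022, 1, 25)  # assumed date, for illustration only
dick_down = datetime(*day, 14, 12, tzinfo=CST)   # Dick's manual reloads
robot_down = datetime(*day, 15, 17, tzinfo=EST)  # uptimerobot "down"
robot_up = datetime(*day, 15, 30, tzinfo=EST)    # uptimerobot "up"

print(dick_down.astimezone(timezone.utc).strftime("%H:%M UTC"))  # 20:12 UTC
print(gap_minutes(dick_down, robot_down))  # 5.0
print(gap_minutes(robot_down, robot_up))   # 13.0
print(gap_minutes(dick_down, robot_up))    # 18.0
```

Both down detections fall within one 5-minute polling interval of each other, and neither 13 nor 18 minutes is a multiple of 5.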

Tom McIntyre, how confident are we that the new optical connection will actually avoid the trouble spots in the route? If the faster connection doesn't route around the trouble spots, it might not correct the problem.
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
As I have said several times now, I think it should all be moved to a better service area, of which there are at least 1 or 2 hundred. The software we are running is much more than adequate. It operates other discussion sites with over 100 times more activity than ours.
However, the hardware/OS configurations are different than what you have described as less than optimum.

Can you get someone in Columbia to attempt to reach the MB when the outage occurs?
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
We use a Barracuda backup system, which backs up in real time.
I have just been reading about Barracuda, for the first time.

So it is possible that I have missed something or that the NAWCC implementation is different from that described in the documentation I have just read.

Who is familiar with the way it has been implemented? Is the backup device on-site or off-site?

From my first pass of the documentation, it appears that it is little more than an incremental backup system with a real time agent on the server. There is still a scheduled backup when data is transferred to the backup device which consumes system resources.

Is this the way the backup of mb.nawcc.org has been implemented?

After the initial full backup when all targeted files are transferred to the backup device, a background agent continually runs on the server and identifies changes to any targeted file. This information is stored on the server in a Barracuda database. The agent also performs 'deduplication' and compression to ensure that only unique compressed data blocks, those identified in its database, will be transferred to the backup device according to a backup schedule.

As to the resource consumption:

The Barracuda Backup Agent runs as a service on the client system that it is protecting. The resources consumed by the Barracuda Backup Agent on the client system it is installed on are minimal. The amount of memory (RAM) consumed should be no more than 1-2 GB maximum and, in most cases, will never exceed 1 GB.
During a backup window, CPU and disk I/O usage will increase. It is recommended that you schedule backup windows during non-peak hours or during a less busy time for the client system to reduce the impact on normal operations, especially on older systems or systems with heavy I/O usage.
The Barracuda Backup Agent requires some hard disk storage for the tracking database. The size of the database is dependent on the number of files being protected on the client system. As the number of files increases, so does the size of the agent tracking database. Barracuda recommends 1 GB of disk space for every one million files protected.

So the question asked about the times of the backup schedule is relevant if this is the way the Barracuda backup for the forum server has been implemented.

John
 

John Matthews

NAWCC Member
Sep 22, 2015
3,703
1,829
113
France
Country
Region
I should make the point that although it is essential to check all NAWCC on-site system activities, such as the backup schedules for all the servers, particularly if NAWCC is using off-site backup devices, I do think that the connection to, and performance of, the Comcast network in the vicinity of headquarters is the most likely cause of the degraded service to remote users.

In this context I note that Comcast appear to have tested a different routing on 21 January at around 02:00 EST. It was transient, only in use for < 2 hrs. I thought it was to perform routine maintenance, and the old route was restored from 04:00 EST. However, last night the route that was tested on the 21st was reinstated from 23:00 EST. Since then there have been a couple of short periods when the forum server has not been reachable, but the new route was in place until 07:00 EST with few errors. In the last hour it has reverted back to the old routing.

There could be a number of reasons for this extended period of re-routing. It is possible that it was to fix a problem; perhaps faulty hardware has been replaced. It will be interesting to see if the degradation in service has been fixed.

John
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
I should make the point that although it is essential to check all NAWCC on-site system activities, such as the backup schedules for all the servers, particularly if NAWCC is using off-site backup devices, I do think that the connection to, and performance of, the Comcast network in the vicinity of headquarters is the most likely cause of the degraded service to remote users.

In this context I note that Comcast appear to have tested a different routing on 21 January at around 02:00 EST. It was transient, only in use for < 2 hrs. I thought it was to perform routine maintenance, and the old route was restored from 04:00 EST. However, last night the route that was tested on the 21st was reinstated from 23:00 EST. Since then there have been a couple of short periods when the forum server has not been reachable, but the new route was in place until 07:00 EST with few errors. In the last hour it has reverted back to the old routing.

There could be a number of reasons for this extended period of re-routing. It is possible that it was to fix a problem; perhaps faulty hardware has been replaced. It will be interesting to see if the degradation in service has been fixed.

John
John, Thank you.

I will attempt to follow the possible outage this afternoon and am awaiting Tom's possible checking to see what else is going on in the system during the outage period.

Is it also an extreme possibility that malware has infected the server and it takes control of the system at that time pushing data out of the system via the network? When it is finished it releases the lock on the system. I assume that someone has checked for it.
 

Tom McIntyre

Technical Admin
Staff member
NAWCC Star Fellow
NAWCC Ruby Member
Sponsor
Golden Circle
Aug 24, 2000
84,737
2,385
113
85
Boston
awco.org
Country
Region

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
I bit my tongue hard on this one!

Start of the outage: 2:13pm Texas time. End of the outage: 2:35pm. Yesterday's started at the same time but finished at about 2:30pm.

I started PATHPING -h 35 mb.nawcc.org at the beginning of the outage. The results showed 0 packets lost across all of the links.

I did have to start the capture twice; the first stopped at 25 nodes and I did not let it finish. The second, error-free run covered the full 32 nodes.

If anyone is interested in looking at the detail, send me a private message and I will take the time to capture the screen images and send them to you.
 

Dave Coatsworth

Senior Administrator
Staff member
NAWCC Business
NAWCC Fellow
Sponsor
Feb 11, 2005
8,711
3,451
113
62
Camarillo, CA
www.daveswatchparts.com
Country
Region
I can confirm Dick's outage times.

I also confirmed that, during this period, other servers in Columbia were functioning normally - namely those that handle iMIS and the Gift Shop.

Unfortunately, the IT staff, who was watching mb.nawcc.org during the outage, reports nothing out of the ordinary. Nothing in the error logs, machine was up the whole time.
 

Dick C

Registered User
Oct 14, 2009
2,037
132
63
Country
I can confirm Dick's outage times.

I also confirmed that, during this period, other servers in Columbia were functioning normally - namely those that handle iMIS and the Gift Shop.

Unfortunately, the IT staff, who was watching mb.nawcc.org during the outage, reports nothing out of the ordinary. Nothing in the error logs, machine was up the whole time.
Did they look at the tasks that were running on the system, the cpu usage, the disk i/o, etc.? The machine could be running; however, running a task that was eating all of the resources. Is there a task running that is scheduled for between 2:12 and 2:13 pm which is when all of this outage occurs?
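
As a rough illustration of the kind of logging I mean, here is a minimal sketch that records a timestamped snapshot every few seconds across the outage window, using only the Python standard library. It is an assumption-laden example, not a description of what the IT staff run: the function names, log file name and sampling interval are all made up, the load average is only available on Unix-like systems, and it does not capture per-task CPU or disk I/O, for which the operating system's own tools would be needed.

```python
import os
import shutil
import time
from datetime import datetime, timezone


def take_sample(path=os.path.abspath(os.sep)):
    """One coarse snapshot of host health: time, 1-minute load, free disk."""
    try:
        # os.getloadavg() exists on Unix-like systems only.
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        load1 = None
    usage = shutil.disk_usage(path)
    return {
        "utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "load_1min": load1,
        "disk_free_gb": round(usage.free / 1e9, 1),
    }


def log_window(minutes=30, every=15, out="mb_health.log"):
    """Append one sample every `every` seconds for `minutes` minutes."""
    end = time.monotonic() + minutes * 60
    with open(out, "a") as fh:
        while time.monotonic() < end:
            fh.write(f"{take_sample()}\n")
            fh.flush()  # keep the log current even if the machine stalls
            time.sleep(every)
```

Starting `log_window()` a few minutes before the usual outage time and comparing the log against the outage times reported here would show whether the machine itself was under unusual load during that window.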

By stating that the machine was up the whole time, do you mean that the Columbia staff were able to reach mb.nawcc.org and did not experience the outage? Is their connectivity via the LAN? If they were running on the LAN connected to that server and no external network was involved, does that rule out the network being the issue (unless the network activity is so high that the system couldn't respond)?

Do the iMIS and the Gift Shop applications run on the same server as the Message Board?
 