Konebhar6 Posted September 18

On 9/11/2025 at 6:53 PM, jefferson1 said:
> jenkins is sending file at the right time... issue is after that only

Got it — Jenkins timing is fine, so focus on the fan-out from NFS → each Tomcat. Here's a tight troubleshooting playbook plus an easy way to see exactly which servers picked up which build.

0) Establish a single "build id"

If you already embed a build number/commit in the WAR (e.g., Implementation-Version in MANIFEST.MF or a /version.txt), great. If not, compute one from the WAR on NFS right now:

```
# On the NFS share
SRC=/nfs/releases/myapp.war
sha256sum "$SRC"                            # => note the checksum; call it SRC_SHA
stat -c "SRC_MTIME=%y SRC_SIZE=%s" "$SRC"
```

1) Quick triage on a "slow" server vs a "fast" server

Run these on two contrasting servers to compare:

```
APP=myapp
WAR=/opt/tomcat/webapps/${APP}.war
EXPL=/opt/tomcat/webapps/${APP}   # exploded dir, if present

echo "---- Mount options"
nfsstat -m | sed -n '1,200p'

echo "---- WAR on server"
ls -lh --time-style=full-iso "$WAR"
sha256sum "$WAR" || true

echo "---- Exploded dir (stale code culprit if left behind)"
ls -ld --time-style=full-iso "$EXPL" || echo "(no exploded dir)"

echo "---- Tomcat deploy logs around 10:00"
grep -iE "deploy|undeploy|reload|war|$APP" /opt/tomcat/logs/catalina.out | tail -n 100
```

What you're looking for:
- The sha256sum of the WAR on the host should match SRC_SHA. If it doesn't, you copied an older WAR or the copy began before the file settled.
- If EXPL exists and is older than the WAR, Tomcat may still be serving stale classes/JSPs.
- nfsstat -m differences: "slow" servers may show large attribute-cache values (e.g., acregmax, acdirmax) → they "see" new files late.

2) Verify NFS visibility & attribute caching

On a "slow" server, check whether NFS is delaying what it sees:

```
MOUNT=/nfs/releases          # adjust to your mount
FILE=$MOUNT/myapp.war

echo "Client sees:"
stat -c "MTIME=%y SIZE=%s" "$FILE"
nfsstat -m | grep -A2 "$MOUNT"
mount | grep "$MOUNT"
```

If you see options like actimeo=600 (10 minutes), acdirmax=600, or an unusually high acregmax, that explains the ~10-minute lag. For a quick experiment (not a permanent fix), remount one "slow" host with smaller caches:

```
# Example: lower attribute cache to 1s (test on one host)
sudo mount -o remount,actimeo=1 <nfs-server>:/export/releases /nfs/releases
```

(Permanent fix = fstab or a consistent automount map. Don't use noac unless desperate.)

3) Prove or rule out "copy while still writing"

Even if Jenkins started at 10:00, your consumer may copy before the WAR is fully visible or stable. On a slow host, watch the NFS file size for 30–60 s:

```
watch -n1 stat -c "SIZE=%s MTIME=%y" /nfs/releases/myapp.war
```

If the size changes after your script started copying, you have a race. (Fix: gate on a "ready" marker or check for a stable size before copying.)

4) Tomcat serving old code (common!)

Two quick checks/fixes on any server where devs see old code:

```
# See what Tomcat thinks is deployed
grep -i "$APP" /opt/tomcat/logs/catalina.out | tail -n 100

# Nuke the stale exploded dir and restart (safe test)
sudo systemctl stop tomcat
sudo rm -rf "/opt/tomcat/webapps/$APP"
sudo rm -f  "/opt/tomcat/webapps/$APP.war"
sudo cp /nfs/releases/myapp.war "/opt/tomcat/webapps/$APP.war"
sudo systemctl start tomcat
```

If that fixes it, your normal flow wasn't removing the exploded directory, or Tomcat didn't redeploy cleanly.

5) Exactly which servers got the new build?

Pick one (or both) of these — both are simple and definitive.
A) Read the version (or checksum) from each server

If you expose a version endpoint/file (recommended):

```
# If your app serves /version or /version.txt:
for h in $(cat servers.txt); do
  echo -n "$h: "
  curl -sf "http://$h:8080/$APP/version.txt" || echo "(no version endpoint)"
done
```

If you don't have an endpoint, compare WAR checksums:

```
SRC_SHA=$(sha256sum /nfs/releases/myapp.war | awk '{print $1}')
for h in $(cat servers.txt); do
  echo -n "$h: "
  ssh "$h" "sha256sum /opt/tomcat/webapps/${APP}.war 2>/dev/null | awk '{print \$1}'" | \
    awk -v s="$SRC_SHA" '{print (($1==s) ? "OK (new WAR)" : "MISMATCH (old WAR)")}'
done
```

B) Create a one-line "deployment ledger" on NFS

Add this to the end of your copy script on each server:

```
# after copying and before/after the Tomcat restart:
SRC_SHA=$(sha256sum /nfs/releases/myapp.war | awk '{print $1}')
HOST=$(hostname -f)
NOW=$(date -Is)
{
  flock -x 9
  echo "$NOW $HOST $SRC_SHA copied_ok"
} 9>>/nfs/releases/deploy-ledger.log
```

Now you can tail /nfs/releases/deploy-ledger.log to see who deployed which checksum and when.

6) What usually causes your 10-minute lag + "old code"

- NFS attribute-caching inconsistency across servers (some mounts use bigger ac* values).
- The copy started before the file was stable/visible on NFS (race).
- Tomcat still serving an existing exploded directory while a new WAR sits next to it.
- Same WAR filename with an unchanged mtime → Tomcat decides "nothing to do".

7) Minimal hardening you can add today (no big redesign)

- Gate the copy: wait for a .ready marker (or a stable size for 5–10 s) before copying from NFS.
- Atomic local replace: copy to app.war.tmp, then mv to app.war (see the sketch at the end of this post).
- Always remove the exploded dir before restart, or use versioned WAR names (myapp##2025-09-17-1000.war) so Tomcat does a clean parallel deploy.
- Normalize NFS mount options across all servers (nfsstat -m should look the same everywhere).

If you paste a snippet of nfsstat -m from a "fast" and a "slow" server (and one server's catalina.out around 10:00), I'll pinpoint whether it's caching vs Tomcat redeploy.
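For reference, here is a minimal sketch of the "gate the copy + atomic replace" idea from item 7. It assumes the Jenkins job drops a `.ready` marker next to the WAR (that marker, the paths, and the `tomcat` service name are assumptions based on the examples in this thread, not your actual setup), and falls back to a stable-size check if the marker is missing:

```bash
#!/usr/bin/env bash
# Sketch only: gate the copy on a ready marker (or stable size), then replace the WAR atomically.
set -euo pipefail

SRC=/nfs/releases/myapp.war
DST=/opt/tomcat/webapps/myapp.war
MARKER="$SRC.ready"        # hypothetical: written by Jenkins after the WAR is fully uploaded

# 1) Wait for the marker; if there is none, wait until the size stops changing for ~10s
if [[ ! -f "$MARKER" ]]; then
  prev=-1
  while :; do
    size=$(stat -c %s "$SRC")
    [[ "$size" -eq "$prev" ]] && break
    prev=$size
    sleep 10
  done
fi

# 2) Stop Tomcat, clear the stale exploded dir, then replace the WAR atomically
sudo systemctl stop tomcat
sudo rm -rf /opt/tomcat/webapps/myapp
sudo cp "$SRC" "$DST.tmp"                      # copy to a temp name first...
src_sha=$(sha256sum "$SRC" | awk '{print $1}')
dst_sha=$(sha256sum "$DST.tmp" | awk '{print $1}')
[[ "$src_sha" == "$dst_sha" ]] || { echo "copy mismatch, aborting"; exit 1; }
sudo mv -f "$DST.tmp" "$DST"                   # ...then rename; rename on the same fs is atomic
sudo systemctl start tomcat
```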
Konebhar6 Posted September 18

39 minutes ago, jefferson1 said:
> tomcat is exploding the war file correctly, the code is visible on the server, but in the browser, when we run the test, we are not seeing the new code changes

Sounds like the deploy is fine on at least one box, but your tests are hitting an old node or a cache. Here's a tight plan to prove it and to see exactly which server/version each request hits.

Quick proof (5–10 min)

1. Expose the release ID in every response. Add a tiny servlet filter (or a Spring OncePerRequestFilter) that sets a header from your WAR's MANIFEST.MF:

```
import javax.servlet.*;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;

// Stamps every response with the build version read from the WAR's MANIFEST.MF.
@WebFilter("/*")   // Servlet 3.0+ auto-registration; otherwise map it in web.xml
public class ReleaseHeaderFilter implements Filter {
    private String releaseId;

    public void init(FilterConfig cfg) {
        try (java.io.InputStream in =
                 getClass().getClassLoader().getResourceAsStream("META-INF/MANIFEST.MF")) {
            java.util.jar.Manifest mf = new java.util.jar.Manifest(in);
            releaseId = mf.getMainAttributes().getValue("Implementation-Version");
        } catch (Exception e) {
            releaseId = "unknown";
        }
    }

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws java.io.IOException, ServletException {
        ((HttpServletResponse) res).setHeader("X-Release-Id", releaseId);
        chain.doFilter(req, res);
    }

    public void destroy() {}
}
```

(Or serve /version.txt with the same value.)

2. Have the load balancer add a backend header. Nginx example:

```
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
add_header X-Served-By $upstream_addr always;
```

(HAProxy: http-response add-header X-Served-By %[srv_addr])

3. Sample the VIP and see who's stale.

```
URL="https://app.example.com/health"   # any endpoint served by your app
for i in {1..40}; do
  curl -sI "$URL" | awk '/X-Release-Id|X-Served-By|Date/'
done | sort | uniq -c
```

You should see something like:

```
20  X-Release-Id: 1.2.3-abcd   X-Served-By: 10.0.0.12:8080
20  X-Release-Id: 1.2.2-zzzz   X-Served-By: 10.0.0.13:8080   <-- stale node
```

4. Hit each server directly to confirm.

```
# Bypass the LB: pin the VIP host to each node IP in turn
curl -sI --resolve app.example.com:443:10.0.0.12 https://app.example.com/
curl -sI --resolve app.example.com:443:10.0.0.13 https://app.example.com/
```

Now you know which host(s) still serve the old build.

Why your tests don't see the new code (common culprits)

- LB stickiness / uneven rollout: test traffic pinned to an old pool member.
- Cache in front (CDN/proxy): stale objects; check the Age/Cache-Control headers.
- DNS caching on the test runner: pointing at an old VIP/DC.
- Multiple context paths/versions: tests call a path that maps to an older context.
- Long client/server cache for HTML/JS: tests reusing cached UI; ensure HTML has Cache-Control: no-cache and assets are fingerprinted.

Fixes you can apply now

- Drain/refresh the stale node(s): remove from the pool, redeploy, verify headers, re-add.
- Gate tests on rollout completion: only start tests after every node reports the expected X-Release-Id. A simple gate:

```
EXPECTED="1.2.3-abcd"
NODES=(10.0.0.12 10.0.0.13 10.0.0.14)
while :; do
  ok=1
  for ip in "${NODES[@]}"; do
    r=$(curl -sI --resolve app.example.com:443:"$ip" https://app.example.com/ \
        | awk '/X-Release-Id/{print $2}' | tr -d '\r')
    [[ "$r" == "$EXPECTED" ]] || ok=0
  done
  [[ $ok -eq 1 ]] && break
  echo "waiting..."; sleep 5
done
echo "All nodes ready."
```

- Purge/bust caches: for HTML, set Cache-Control: no-cache, no-store, must-revalidate; for assets, use hashed filenames.
- Normalize Tomcat & LB config: same context, same autoDeploy settings, same LB stickiness across nodes.
- Add a deployment ledger: append HOST, DATE, and X-Release-Id to a shared log from each node after a successful start—your single source of truth (a sketch follows at the end of this post).

To your question: "since the code is on several servers, how to identify which server it went to?" Use one (or both):

- Runtime headers: X-Release-Id (app) + X-Served-By (LB). Your tests log these per request; the mapping is immediate.
- Out-of-band check: SSH in and compare the WAR checksum on each node to the release checksum on NFS:

```
SRC_SHA=$(sha256sum /nfs/releases/myapp.war | awk '{print $1}')
for h in $(cat servers.txt); do
  echo -n "$h "
  ssh "$h" "sha256sum /opt/tomcat/webapps/myapp.war 2>/dev/null | awk '{print \$1}'"
done | awk -v s="$SRC_SHA" '{print $0, (($2==s) ? "OK" : "MISMATCH")}'
```
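A minimal sketch of that ledger bullet, assuming the X-Release-Id filter above is deployed and reusing the /nfs/releases/deploy-ledger.log path from the earlier post (both are assumptions; adjust to your app's context path and shared log location):

```bash
# Sketch only: run on each node once Tomcat reports the app as started.
APP=myapp
REL=$(curl -s -o /dev/null -D - "http://localhost:8080/$APP/" \
      | awk '/X-Release-Id/{print $2}' | tr -d '\r')
{
  flock -x 9                                   # serialize writers from many nodes
  echo "$(date -Is) $(hostname -f) ${REL:-unknown} started_ok"
} 9>>/nfs/releases/deploy-ledger.log
```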
Konebhar6 Posted September 18

8 minutes ago, jefferson1 said:
> Everything is asking me to restart Tomcat, bro. I'm trying to see what I can do without restarting Tomcat; there is other code on the server, so this will affect others too.

Here's what ChatGPT suggests without restarting the server...

Below is a fast, no-Tomcat-restart checklist you can run right now to isolate the problem. It focuses on: (1) which backend your tests are hitting, (2) whether that backend has the new bits, and (3) whether a cache/sticky session is masking the change.

A) Prove which server your tests are hitting (no restart)

From the test runner:

```
URL="https://app.example.com/your-endpoint"   # any endpoint your test calls

# 1) Bypass stickiness & sample different backends
for i in {1..20}; do
  curl -skI -H 'Cache-Control: no-cache' -H 'Pragma: no-cache' "${URL}?cb=$(date +%s%N)" \
    | awk '/Date:|Server:|X-Served-By|Set-Cookie|Age:/'
done | sort | uniq -c
```

Look for a header like X-Served-By (if your LB sets it). If you don't have it, force a specific backend (no restart needed):

```
# Replace IPs with your backend nodes
curl -skI --resolve app.example.com:443:10.0.0.12 "${URL}?cb=1"
curl -skI --resolve app.example.com:443:10.0.0.13 "${URL}?cb=1"
```

This tells you exactly which node serves which response.

On each Tomcat, check whether the test runner actually hit it:

```
# If access logs are enabled (common default)
grep -E 'TEST_RUNNER_IP|/your-endpoint' $CATALINA_BASE/logs/localhost_access_log.*.txt | tail
```

B) Verify the version on each server (no restart)

On each backend node:

```
APP=myapp
WAR="$CATALINA_BASE/webapps/$APP.war"
EXP="$CATALINA_BASE/webapps/$APP"

echo "== $(hostname -f)"

# 1) Extract the version from the WAR MANIFEST (no deploy/restart needed)
unzip -p "$WAR" META-INF/MANIFEST.MF | grep -Ei 'Implementation-Version|Build|Commit' || true

# 2) Check the exploded app's MANIFEST (what Tomcat is actually serving for JSP/static)
grep -Ei 'Implementation-Version|Build|Commit' "$EXP/META-INF/MANIFEST.MF" 2>/dev/null || true

# 3) Compare checksums to the canonical artifact on NFS (adjust path)
sha256sum "$WAR"
```

If any node shows an older version or checksum than your "source of truth" on NFS, that's the stale node.

C) Eliminate cache/stickiness (client/LB/CDN) without restarting Tomcat

- Bypass client & proxy caches: add headers to your test call (Cache-Control: no-cache, no-store, must-revalidate) plus a unique query string (?cb=timestamp). Inspect the Age: header in responses; if it's >0, a proxy/CDN is serving stale content.
- Avoid sticky sessions for the test: don't send Cookie: JSESSIONID=... (most LBs pin on it). With curl, don't reuse a -c/-b cookie jar, or explicitly send an empty cookie header: -H 'Cookie:'. If your LB uses a route cookie (e.g., SRV_ID), clear it too.
- Purge Tomcat's compiled JSP cache for just this app (safe while running):

```
# This does NOT restart Tomcat. It forces JSP recompile on the next hit.
rm -rf "$CATALINA_BASE/work/Catalina/localhost/$APP"
```

If tests now see the new UI/views while the WAR/classes were already new, it was the JSP cache.

D) Reload just the webapp (not the server) if needed

Still stale on a node but you can't restart the service? Use the Tomcat Manager to reload the context only (a quick way to list what each node is actually running is sketched at the end of this post):

```
# Manager text API (adjust creds, host, and path)
curl -s -u admin:password "http://localhost:8080/manager/text/reload?path=/$APP"
```

This does not restart Tomcat, only your app, and it often fixes a half-updated context.
If autoDeploy=true and the WAR has the same name, you can also "touch" it to trigger a redeploy without a server restart:

```
touch "$CATALINA_BASE/webapps/$APP.war"
```

(That's still an app redeploy, not a Tomcat restart.)

E) Confirm LB/DNS aren't pointing tests elsewhere

From the test runner:

```
# Which VIP/IP is actually used?
dig +short app.example.com

# What route did TLS pick (SNI/resolve)?
curl -skI --trace-ascii - "${URL}" | head -n 40
```

Sometimes CI runners use a different DNS view/VIP than your browser.

F) One-shot script to summarize each node (run from a jump host)

```
NODES=(app1 app2 app3); APP=myapp
for h in "${NODES[@]}"; do
  echo "==== $h ===="
  ssh "$h" "
    set -e
    WAR=\$CATALINA_BASE/webapps/$APP.war
    EXP=\$CATALINA_BASE/webapps/$APP
    echo 'WAR:'; unzip -p \"\$WAR\" META-INF/MANIFEST.MF | egrep -i 'Implementation-Version|Build|Commit' || true
    echo 'EXP:'; egrep -i 'Implementation-Version|Build|Commit' \"\$EXP/META-INF/MANIFEST.MF\" 2>/dev/null || true
    echo 'SHA:'; sha256sum \"\$WAR\" | awk '{print \$1}'
    echo 'JSP cache size:'; du -sh \"\$CATALINA_BASE/work/Catalina/localhost/$APP\" 2>/dev/null || echo '(none)'
  "
done
```

What typically fixes "browser shows new code, tests show old" without restarting Tomcat

- Tests were pinned to a stale node → identify with --resolve or headers and reload that app via the Manager (or touch the WAR).
- Proxy/CDN cache in front → call with no-cache headers and a cache-busting param; purge if necessary.
- JSP/work cache stale → delete $CATALINA_BASE/work/Catalina/localhost/<app>; hit the endpoint again.
- Same-name WAR + autoDeploy off → Tomcat won't redeploy automatically; do a context reload via the Manager.

If you paste one test response header block and the output of the mini node-summary script above for two nodes, I can point to the exact culprit and the next command to run—still no Tomcat restart needed.
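As referenced in section D, here is a small sketch for confirming what each node is actually running via the Tomcat Manager text API's list command (no restart, no reload). It assumes a Manager user with the manager-script role; the node names and credentials are placeholders:

```bash
# Sketch only: list deployed contexts per node.
# Output lines look like "/myapp:running:12:myapp" (path:state:active sessions:docBase),
# so versioned WAR names (myapp##...) show up immediately in the docBase column.
NODES=(app1 app2 app3)          # placeholders, same as the jump-host script above
for h in "${NODES[@]}"; do
  echo "==== $h ===="
  curl -s -u admin:password "http://$h:8080/manager/text/list"
done
```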
jefferson1 (Author) Posted September 18

47 minutes ago, Konebhar6 said:
> Here's what ChatGPT suggests without restarting the server... [full no-restart checklist quoted above]

will try, thank you
jpismahatma Posted September 18

On 9/11/2025 at 9:15 PM, jefferson1 said:
> jenkins is deploying code to an nfs mount, let's say at 10:00 am. A script starts copying the file immediately from the mount to the tomcat webapps on the destination servers. For some servers the war file is placed there only after 10 mins (this is where the issue is). In some cases, when the devs test the code they are seeing old code. How do I troubleshoot this?

In the copy script from NFS to local, add a timestamp, so you know exactly when the copy is triggered.
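A minimal sketch of that idea, assuming the copy is a plain shell script and reusing the example paths from earlier in the thread (the log path is illustrative):

```bash
# Sketch only: wrap the existing copy with timestamps so the per-host lag becomes visible.
LOG=/var/log/myapp-deploy.log          # hypothetical local log file
SRC=/nfs/releases/myapp.war
DST=/opt/tomcat/webapps/myapp.war

echo "$(date -Is) $(hostname -s) copy_start src_mtime=$(stat -c %y "$SRC")" >> "$LOG"
cp "$SRC" "$DST"
echo "$(date -Is) $(hostname -s) copy_done  sha=$(sha256sum "$DST" | awk '{print $1}')" >> "$LOG"
```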
jpismahatma Posted September 18

14 minutes ago, psycopk said:
> Invalidate cache

This !!!!
jefferson1 (Author) Posted September 18

1 hour ago, psycopk said:
> Invalidate cache

How?
psycopk Posted September 18

1 minute ago, jefferson1 said:
> How?

You can flush the cache... you need to know where it is stored, how you are maintaining the cache, etc.
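For illustration only, since the thread has not established where this app's cache lives: if it happens to be an external cache like Redis or memcached (an assumption), flushing looks like the sketch below. If it's an in-JVM cache (Ehcache, Caffeine, etc.), you generally need an admin endpoint or JMX hook that the app itself exposes, and a CDN/proxy layer in front needs its own purge.

```bash
# Sketch only: these assume an external cache, which this thread has not confirmed.

# Redis: flush only the DB the app uses (FLUSHALL would clear every DB on the instance)
redis-cli -h cache-host -n 0 FLUSHDB

# memcached: flush all items
printf 'flush_all\r\nquit\r\n' | nc cache-host 11211
```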