Commit f3626a5

mkutsevol authored and facebook-github-bot committed
Fix fbkpatch override reference script too early
Summary:
Fixes the order of creation for the fbkpatch script and the override that uses it. Example failures in https://fburl.com/scuba/chef/rykzxxbw

Moves the script to fb_kpatch, so it's closer to the override.

Differential Revision: D67226415

fbshipit-source-id: 0dce8b171ed8656f0416be0453cba85324fe090a
1 parent 183ef44 commit f3626a5

2 files changed: 119 additions and 5 deletions
Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+#!/bin/bash
+#
+# Execute the kpatch script to load the next kernel live patch hotfix,
+# and log the output of the kpatch script to scribe.
+# This also execute /usr/local/bin/klp_netcons.sh to populate the netconsole
+# dictionary
+
+# You probably want to call kpatch directly
+if [ -z "$1" ] || [ "$1" == "--help" ] || [ "$1" == "help" ] || [ "$1" == "-h" ] ; then
+  echo "$0 is a wrapper for kpatch, used from chef to log results to"
+  echo "kernel_livepatch logview. Please use kpatch directly from the shell."
+  exit 0
+fi
+
+function report_to_scuba {
+  # Logs can be found in `journalctl -t kpatch.service`. Beware of the aggressive journal
+  # rotation settings. It might throw away these logs pretty quickly.
+  local JSON=$1
+  local TIMEOUT=${2:-600}
+  local START_TIME
+  START_TIME=$(/bin/date +%s)
+  echo Attempt to submit data to scribe | systemd-cat -t kpatch.service
+  until echo "$JSON" | /usr/local/bin/scribe_cat --sync --check-non-ok-result errorlog_kpatch; do
+    trying_for_seconds=$(($(/bin/date +%s)-START_TIME))
+    echo Failed to submit data, sleeping. | systemd-cat -t kpatch.service
+    sleep 1;
+    if [ "${trying_for_seconds}" -gt "$TIMEOUT" ]; then
+      echo Timeout reached for data submission. Failing. | systemd-cat -t kpatch.service
+      break
+    fi
+  done
+
+}
+
+# Gather some basic information
+HOSTNAME=$(hostname)
+KVER=$(uname -r)
+TIME=$(/bin/date +%s)
+KLPNETCONS="/usr/local/bin/klp_netcons.sh"
+
+# Get the hardware type.
+. /etc/fbwhoami
+if [ -n "${MODEL_NAME}" ] ; then
+  HARDWARE=${MODEL_NAME}
+else
+  HARDWARE="UNKNOWN"
+fi
+
+# Attempt to load the KLP and get the status of that attempt
+KPATCHOUT=$(/usr/sbin/kpatch "$@" 2>&1)
+STATUS=${PIPESTATUS[0]}
+
+OUTESCAPED=$(echo "$KPATCHOUT" | jq -asR)
+
+HOTFIXES=""
+MODULESDIR="/var/lib/kpatch/${KVER}"
+if [ -d "${MODULESDIR}" ]
+then
+  HOTFIXES=$(modinfo -Fname "${MODULESDIR}"/*ko | sed 's/^.*_hotfix/hotfix/')
+fi
+
+MESSAGE=$(cat << EOF
+{
+  "command": "kpatch $@",
+  "exit_status": "$STATUS",
+  "hardware": "$HARDWARE",
+  "hostname": "$HOSTNAME",
+  "kernel": "$KVER",
+  "time": "$TIME",
+  "output": $OUTESCAPED,
+  "hotfixes": "$HOTFIXES"
+}
+EOF
+)
+
+JSON=$(jq -c -n "$MESSAGE")
+
+# Scuba submission should run in background detached from this process.
+# Because this starts very early during boot and network/scribed is not yet
+# available.
+( # () makes a group (manual: https://fburl.com/n84qccbx). This gives us 2 things we need
+  # here. A separate process to run and io redirection of the whole group.
+  trap '' HUP INT # ignore these signals. We will get them if the parent exits first
+  report_to_scuba "$JSON"
+) </dev/null 2>&1 1>/dev/null & # throw out in/out/err. & makes it run in the background
+
+if [ "${STATUS}" -ne "0" ]; then
+  echo kpatch failed with exit status "${STATUS}"
+  # exit "${STATUS}"
+  # Pretend the KLP load was successful. If it was not we will
+  # try again at the next chef run. KLP load success & failure
+  # are monitored separately, and the kernel team is working on
+  # reducing the failure rate to the point where we can pass
+  # errors to chef again without breaking the fleet.
+fi
+
+# Update the netconsole cmdline dictionary after every successful
+# operation
+
+if [ -x ${KLPNETCONS} ]
+then
+  ${KLPNETCONS}
+fi
+
+exit 0
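
Review note on the background subshell in the script: its comment says the redirections "throw out in/out/err", but bash applies redirections left to right, so `2>&1 1>/dev/null` first points stderr at whatever stdout currently is and only then sends stdout to `/dev/null`. The sketch below is a minimal, standalone illustration of that ordering. The `emit` helper and the variable names are hypothetical and not part of the commit, and command substitution is used only to make the surviving stream observable.

```shell
#!/bin/bash
# Hypothetical helper: writes one line to stdout and one line to stderr.
emit() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

# Same order as the script: stderr is duplicated onto the current stdout
# (here, the command-substitution pipe), then stdout alone goes to /dev/null.
as_written=$( ( emit ) </dev/null 2>&1 1>/dev/null )

# Reversed order: stdout goes to /dev/null first, then stderr follows it there.
reordered=$( ( emit ) </dev/null >/dev/null 2>&1 )

printf 'as written: [%s]\n' "$as_written"
printf 'reordered:  [%s]\n' "$reordered"
```

If the intent is to discard all three streams, the reversed form `</dev/null >/dev/null 2>&1` is the one that does so. With the order used in the script, anything the subshell writes to stderr still reaches the wrapper's original stdout.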

cookbooks/fb_kpatch/recipes/default.rb

Lines changed: 14 additions & 5 deletions
@@ -27,11 +27,6 @@
   action :upgrade
 end
 
-service 'kpatch' do
-  only_if { node['fb_kpatch']['enable'] }
-  action [:enable, :start]
-end
-
 service 'disable kpatch' do
   not_if { node['fb_kpatch']['enable'] }
   service_name 'kpatch'
@@ -47,6 +42,15 @@
     },
   })
 end
+
+# Script to log kpatch results to scribe
+cookbook_file '/usr/local/bin/fbkpatch' do
+  source 'fbkpatch'
+  owner node.root_user
+  group node.root_group
+  mode '0755'
+end
+
 fb_systemd_override 'fbkpatch' do
   unit_name 'kpatch.service'
   content({
@@ -57,3 +61,8 @@
     },
   })
 end
+
+service 'kpatch' do
+  only_if { node['fb_kpatch']['enable'] }
+  action [:enable, :start]
+end
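
The reordering in this recipe puts the `cookbook_file` resource that installs the script ahead of the override and the service start that depend on it. As a rough illustration of why that order matters, the sketch below models the steps as shell functions. It is a simplified stand-in, not the Chef or systemd behavior itself, and every name in it (`install_script`, `start_service`, the temporary path) is hypothetical.

```shell
#!/bin/bash
# Hypothetical stand-ins for the recipe steps; none of these names come
# from the cookbook.
workdir=$(mktemp -d)
script="$workdir/fbkpatch"

# Models the cookbook_file resource: drop the wrapper script in place.
install_script() {
    printf '#!/bin/bash\nexit 0\n' > "$script"
    chmod 0755 "$script"
}

# Models starting a unit whose command points at the wrapper script.
start_service() {
    [ -f "$script" ] && bash "$script"
}

# Old order: the dependent step runs before the script exists.
if start_service; then old_order=ok; else old_order=failed; fi

# New order: install the script first, then run the dependent step.
install_script
if start_service; then new_order=ok; else new_order=failed; fi

echo "old order: $old_order"
echo "new order: $new_order"
rm -rf "$workdir"
```

In this model the first attempt fails only because the file is missing, which is the class of failure the commit summary describes.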
