Changes in 4.9.242 SUNRPC: ECONNREFUSED should cause a rebind. scripts/setlocalversion: make git describe output more reliable powerpc/powernv/opal-dump : Use IRQ_HANDLED instead of numbers in interrupt handler efivarfs: Replace invalid slashes with exclamation marks in dentries. ravb: Fix bit fields checking in ravb_hwtstamp_get() tipc: fix memory leak caused by tipc_buf_append() arch/x86/amd/ibs: Fix re-arming IBS Fetch fuse: fix page dereference after free p54: avoid accessing the data mapped to streaming DMA mtd: lpddr: Fix bad logic in print_drs_error ata: sata_rcar: Fix DMA boundary mask fscrypt: return -EXDEV for incompatible rename or link into encrypted dir fscrypto: move ioctl processing more fully into common code fscrypt: use EEXIST when file already uses different policy mlxsw: core: Fix use-after-free in mlxsw_emad_trans_finish() powerpc/powernv/smp: Fix spurious DBG() warning sparc64: remove mm_cpumask clearing to fix kthread_use_mm race f2fs: add trace exit in exception path f2fs: fix to check segment boundary during SIT page readahead um: change sigio_spinlock to a mutex ARM: 8997/2: hw_breakpoint: Handle inexact watchpoint addresses xfs: fix realtime bitmap/summary file truncation when growing rt volume video: fbdev: pvr2fb: initialize variables ath10k: fix VHT NSS calculation when STBC is enabled media: tw5864: check status of tw5864_frameinterval_get mmc: via-sdmmc: Fix data race bug printk: reduce LOG_BUF_SHIFT range for H8300 kgdb: Make "kgdbcon" work properly with "kgdb_earlycon" cpufreq: sti-cpufreq: add stih418 support USB: adutux: fix debugging arm64/mm: return cpu_all_mask when node is NUMA_NO_NODE drivers/net/wan/hdlc_fr: Correctly handle special skb->protocol values bus/fsl_mc: Do not rely on caller to provide non NULL mc_io power: supply: test_power: add missing newlines when printing parameters by sysfs md/bitmap: md_bitmap_get_counter returns wrong blocks clk: ti: clockdomain: fix static checker warning net: 9p: initialize sun_server.sun_path to have addr's value only when addr is valid drivers: watchdog: rdc321x_wdt: Fix race condition bugs ext4: Detect already used quota file early gfs2: add validation checks for size of superblock memory: emif: Remove bogus debugfs error handling ARM: dts: s5pv210: remove DMA controller bus node name to fix dtschema warnings ARM: dts: s5pv210: move PMU node out of clock controller ARM: dts: s5pv210: remove dedicated 'audio-subsystem' node md/raid5: fix oops during stripe resizing perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count() perf/x86/amd/ibs: Fix raw sample data accumulation leds: bcm6328, bcm6358: use devres LED registering function fs: Don't invalidate page buffers in block_write_full_page() NFS: fix nfs_path in case of a rename retry ACPI / extlog: Check for RDMSR failure ACPI: video: use ACPI backlight for HP 635 Notebook ACPI: debug: don't allow debugging when ACPI is disabled acpi-cpufreq: Honor _PSD table setting on new AMD CPUs w1: mxc_w1: Fix timeout resolution problem leading to bus error scsi: mptfusion: Fix null pointer dereferences in mptscsih_remove() btrfs: reschedule if necessary when logging directory items btrfs: cleanup cow block on error btrfs: fix use-after-free on readahead extent after failure to create it usb: dwc3: core: add phy cleanup for probe error handling usb: dwc3: core: don't trigger runtime pm when remove driver usb: host: fsl-mph-dr-of: check return of dma_set_mask() vt: keyboard, simplify vt_kdgkbsent vt: keyboard, extend func_buf_lock to readers dmaengine: dma-jz4780: Fix race in jz4780_dma_tx_status iio:light:si1145: Fix timestamp alignment and prevent data leak. iio:adc:ti-adc12138 Fix alignment issue with timestamp iio:gyro:itg3200: Fix timestamp alignment and prevent data leak. powerpc: Warn about use of smt_snooze_delay powerpc/powernv/elog: Fix race while processing OPAL error log event. ubifs: dent: Fix some potential memory leaks while iterating entries ubi: check kthread_should_stop() after the setting of task state ia64: fix build error with !COREDUMP ceph: promote to unsigned long long before shifting libceph: clear con->out_msg on Policy::stateful_server faults 9P: Cast to loff_t before multiplying ring-buffer: Return 0 on success from ring_buffer_resize() vringh: fix __vringh_iov() when riov and wiov are different rtc: rx8010: don't modify the global rtc ops tty: make FONTX ioctl use the tty pointer they were actually passed arm64: berlin: Select DW_APB_TIMER_OF cachefiles: Handle readpage error correctly hil/parisc: Disable HIL driver when it gets stuck ARM: samsung: fix PM debug build with DEBUG_LL but !MMU ARM: s3c24xx: fix missing system reset device property: Keep secondary firmware node secondary by type device property: Don't clear secondary pointer for shared primary firmware node KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR staging: comedi: cb_pcidas: Allow 2-channel commands for AO subdevice staging: octeon: repair "fixed-link" support staging: octeon: Drop on uncorrectable alignment or FCS error xen/events: don't use chip_data for legacy IRQs tipc: fix use-after-free in tipc_bcast_get_mode gianfar: Replace skb_realloc_headroom with skb_cow_head for PTP gianfar: Account for Tx PTP timestamp in the skb headroom Fonts: Replace discarded const qualifier ALSA: usb-audio: Add implicit feedback quirk for Qu-16 kthread_worker: prevent queuing delayed work from timer_fn when it is being canceled ftrace: Fix recursion check for NMI test ftrace: Handle tracing when switching between context tracing: Fix out of bounds write in get_trace_buf ARM: dts: sun4i-a10: fix cpu_alert temperature x86/kexec: Use up-to-dated screen_info copy to fill boot params of: Fix reserved-memory overlap detection scsi: core: Don't start concurrent async scan on same host vsock: use ns_capable_noaudit() on socket create ACPI: NFIT: Fix comparison to '-ENXIO' vt: Disable KD_FONT_OP_COPY fork: fix copy_process(CLONE_PARENT) race with the exiting ->real_parent serial: 8250_mtk: Fix uart_get_baud_rate warning serial: txx9: add missing platform_driver_unregister() on error in serial_txx9_init USB: serial: cyberjack: fix write-URB completion race USB: serial: option: add LE910Cx compositions 0x1203, 0x1230, 0x1231 USB: serial: option: add Telit FN980 composition 0x1055 USB: Add NO_LPM quirk for Kingston flash drive ARC: stack unwinding: avoid indefinite looping Revert "ARC: entry: fix potential EFA clobber when TIF_SYSCALL_TRACE" Linux 4.9.242 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I594296d57790eb8b7fa737119346d2b60572e5fd
274 lines
7.2 KiB
C
274 lines
7.2 KiB
C
/*
|
|
* SMP support for PowerNV machines.
|
|
*
|
|
* Copyright 2011 IBM Corp.
|
|
*
|
|
* This program is free software; you can redistribute it and/or
|
|
* modify it under the terms of the GNU General Public License
|
|
* as published by the Free Software Foundation; either version
|
|
* 2 of the License, or (at your option) any later version.
|
|
*/
|
|
|
|
#include <linux/kernel.h>
|
|
#include <linux/module.h>
|
|
#include <linux/sched.h>
|
|
#include <linux/smp.h>
|
|
#include <linux/interrupt.h>
|
|
#include <linux/delay.h>
|
|
#include <linux/init.h>
|
|
#include <linux/spinlock.h>
|
|
#include <linux/cpu.h>
|
|
|
|
#include <asm/irq.h>
|
|
#include <asm/smp.h>
|
|
#include <asm/paca.h>
|
|
#include <asm/machdep.h>
|
|
#include <asm/cputable.h>
|
|
#include <asm/firmware.h>
|
|
#include <asm/vdso_datapage.h>
|
|
#include <asm/cputhreads.h>
|
|
#include <asm/xics.h>
|
|
#include <asm/opal.h>
|
|
#include <asm/runlatch.h>
|
|
#include <asm/code-patching.h>
|
|
#include <asm/dbell.h>
|
|
#include <asm/kvm_ppc.h>
|
|
#include <asm/ppc-opcode.h>
|
|
|
|
#include "powernv.h"
|
|
|
|
#ifdef DEBUG
|
|
#include <asm/udbg.h>
|
|
#define DBG(fmt...) udbg_printf(fmt)
|
|
#else
|
|
#define DBG(fmt...) do { } while (0)
|
|
#endif
|
|
|
|
static void pnv_smp_setup_cpu(int cpu)
|
|
{
|
|
if (cpu != boot_cpuid)
|
|
xics_setup_cpu();
|
|
|
|
#ifdef CONFIG_PPC_DOORBELL
|
|
if (cpu_has_feature(CPU_FTR_DBELL))
|
|
doorbell_setup_this_cpu();
|
|
#endif
|
|
}
|
|
|
|
static int pnv_smp_kick_cpu(int nr)
|
|
{
|
|
unsigned int pcpu = get_hard_smp_processor_id(nr);
|
|
unsigned long start_here =
|
|
__pa(ppc_function_entry(generic_secondary_smp_init));
|
|
long rc;
|
|
uint8_t status;
|
|
|
|
BUG_ON(nr < 0 || nr >= NR_CPUS);
|
|
|
|
/*
|
|
* If we already started or OPAL is not supported, we just
|
|
* kick the CPU via the PACA
|
|
*/
|
|
if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPAL))
|
|
goto kick;
|
|
|
|
/*
|
|
* At this point, the CPU can either be spinning on the way in
|
|
* from kexec or be inside OPAL waiting to be started for the
|
|
* first time. OPAL v3 allows us to query OPAL to know if it
|
|
* has the CPUs, so we do that
|
|
*/
|
|
rc = opal_query_cpu_status(pcpu, &status);
|
|
if (rc != OPAL_SUCCESS) {
|
|
pr_warn("OPAL Error %ld querying CPU %d state\n", rc, nr);
|
|
return -ENODEV;
|
|
}
|
|
|
|
/*
|
|
* Already started, just kick it, probably coming from
|
|
* kexec and spinning
|
|
*/
|
|
if (status == OPAL_THREAD_STARTED)
|
|
goto kick;
|
|
|
|
/*
|
|
* Available/inactive, let's kick it
|
|
*/
|
|
if (status == OPAL_THREAD_INACTIVE) {
|
|
pr_devel("OPAL: Starting CPU %d (HW 0x%x)...\n", nr, pcpu);
|
|
rc = opal_start_cpu(pcpu, start_here);
|
|
if (rc != OPAL_SUCCESS) {
|
|
pr_warn("OPAL Error %ld starting CPU %d\n", rc, nr);
|
|
return -ENODEV;
|
|
}
|
|
} else {
|
|
/*
|
|
* An unavailable CPU (or any other unknown status)
|
|
* shouldn't be started. It should also
|
|
* not be in the possible map but currently it can
|
|
* happen
|
|
*/
|
|
pr_devel("OPAL: CPU %d (HW 0x%x) is unavailable"
|
|
" (status %d)...\n", nr, pcpu, status);
|
|
return -ENODEV;
|
|
}
|
|
|
|
kick:
|
|
return smp_generic_kick_cpu(nr);
|
|
}
|
|
|
|
#ifdef CONFIG_HOTPLUG_CPU
|
|
|
|
static int pnv_smp_cpu_disable(void)
|
|
{
|
|
int cpu = smp_processor_id();
|
|
|
|
/* This is identical to pSeries... might consolidate by
|
|
* moving migrate_irqs_away to a ppc_md with default to
|
|
* the generic fixup_irqs. --BenH.
|
|
*/
|
|
set_cpu_online(cpu, false);
|
|
vdso_data->processorCount--;
|
|
if (cpu == boot_cpuid)
|
|
boot_cpuid = cpumask_any(cpu_online_mask);
|
|
xics_migrate_irqs_away();
|
|
return 0;
|
|
}
|
|
|
|
static void pnv_smp_cpu_kill_self(void)
|
|
{
|
|
unsigned int cpu;
|
|
unsigned long srr1, wmask;
|
|
u32 idle_states;
|
|
|
|
/* Standard hot unplug procedure */
|
|
local_irq_disable();
|
|
idle_task_exit();
|
|
current->active_mm = NULL; /* for sanity */
|
|
cpu = smp_processor_id();
|
|
DBG("CPU%d offline\n", cpu);
|
|
generic_set_cpu_dead(cpu);
|
|
smp_wmb();
|
|
|
|
wmask = SRR1_WAKEMASK;
|
|
if (cpu_has_feature(CPU_FTR_ARCH_207S))
|
|
wmask = SRR1_WAKEMASK_P8;
|
|
|
|
idle_states = pnv_get_supported_cpuidle_states();
|
|
|
|
/* We don't want to take decrementer interrupts while we are offline,
|
|
* so clear LPCR:PECE1. We keep PECE2 (and LPCR_PECE_HVEE on P9)
|
|
* enabled as to let IPIs in.
|
|
*/
|
|
mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
|
|
|
|
/*
|
|
* Hard-disable interrupts, and then clear irq_happened flags
|
|
* that we can safely ignore while off-line, since they
|
|
* are for things for which we do no processing when off-line
|
|
* (or in the case of HMI, all the processing we need to do
|
|
* is done in lower-level real-mode code).
|
|
*/
|
|
hard_irq_disable();
|
|
local_paca->irq_happened &= ~(PACA_IRQ_DEC | PACA_IRQ_HMI);
|
|
|
|
while (!generic_check_cpu_restart(cpu)) {
|
|
/*
|
|
* Clear IPI flag, since we don't handle IPIs while
|
|
* offline, except for those when changing micro-threading
|
|
* mode, which are handled explicitly below, and those
|
|
* for coming online, which are handled via
|
|
* generic_check_cpu_restart() calls.
|
|
*/
|
|
kvmppc_set_host_ipi(cpu, 0);
|
|
|
|
ppc64_runlatch_off();
|
|
|
|
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
|
srr1 = power9_idle_stop(pnv_deepest_stop_state);
|
|
else if (idle_states & OPAL_PM_WINKLE_ENABLED)
|
|
srr1 = power7_winkle();
|
|
else if ((idle_states & OPAL_PM_SLEEP_ENABLED) ||
|
|
(idle_states & OPAL_PM_SLEEP_ENABLED_ER1))
|
|
srr1 = power7_sleep();
|
|
else
|
|
srr1 = power7_nap(1);
|
|
|
|
ppc64_runlatch_on();
|
|
|
|
/*
|
|
* If the SRR1 value indicates that we woke up due to
|
|
* an external interrupt, then clear the interrupt.
|
|
* We clear the interrupt before checking for the
|
|
* reason, so as to avoid a race where we wake up for
|
|
* some other reason, find nothing and clear the interrupt
|
|
* just as some other cpu is sending us an interrupt.
|
|
* If we returned from power7_nap as a result of
|
|
* having finished executing in a KVM guest, then srr1
|
|
* contains 0.
|
|
*/
|
|
if (((srr1 & wmask) == SRR1_WAKEEE) ||
|
|
((srr1 & wmask) == SRR1_WAKEHVI) ||
|
|
(local_paca->irq_happened & PACA_IRQ_EE)) {
|
|
if (cpu_has_feature(CPU_FTR_ARCH_300))
|
|
icp_opal_flush_interrupt();
|
|
else
|
|
icp_native_flush_interrupt();
|
|
} else if ((srr1 & wmask) == SRR1_WAKEHDBELL) {
|
|
unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
|
|
asm volatile(PPC_MSGCLR(%0) : : "r" (msg));
|
|
}
|
|
local_paca->irq_happened &= ~(PACA_IRQ_EE | PACA_IRQ_DBELL);
|
|
smp_mb();
|
|
|
|
if (cpu_core_split_required())
|
|
continue;
|
|
|
|
if (srr1 && !generic_check_cpu_restart(cpu))
|
|
DBG("CPU%d Unexpected exit while offline !\n", cpu);
|
|
}
|
|
|
|
/* Re-enable decrementer interrupts */
|
|
mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
|
|
DBG("CPU%d coming online...\n", cpu);
|
|
}
|
|
|
|
#endif /* CONFIG_HOTPLUG_CPU */
|
|
|
|
static int pnv_cpu_bootable(unsigned int nr)
|
|
{
|
|
/*
|
|
* Starting with POWER8, the subcore logic relies on all threads of a
|
|
* core being booted so that they can participate in split mode
|
|
* switches. So on those machines we ignore the smt_enabled_at_boot
|
|
* setting (smt-enabled on the kernel command line).
|
|
*/
|
|
if (cpu_has_feature(CPU_FTR_ARCH_207S))
|
|
return 1;
|
|
|
|
return smp_generic_cpu_bootable(nr);
|
|
}
|
|
|
|
static struct smp_ops_t pnv_smp_ops = {
|
|
.message_pass = smp_muxed_ipi_message_pass,
|
|
.cause_ipi = NULL, /* Filled at runtime by xics_smp_probe() */
|
|
.probe = xics_smp_probe,
|
|
.kick_cpu = pnv_smp_kick_cpu,
|
|
.setup_cpu = pnv_smp_setup_cpu,
|
|
.cpu_bootable = pnv_cpu_bootable,
|
|
#ifdef CONFIG_HOTPLUG_CPU
|
|
.cpu_disable = pnv_smp_cpu_disable,
|
|
.cpu_die = generic_cpu_die,
|
|
#endif /* CONFIG_HOTPLUG_CPU */
|
|
};
|
|
|
|
/* This is called very early during platform setup_arch */
|
|
void __init pnv_smp_init(void)
|
|
{
|
|
smp_ops = &pnv_smp_ops;
|
|
|
|
#ifdef CONFIG_HOTPLUG_CPU
|
|
ppc_md.cpu_die = pnv_smp_cpu_kill_self;
|
|
#endif
|
|
}
|