Kernel Interpolation in Python: A Complete Beginner’s Guide to Gaussian RBF Kernels and RKHS


Figure: Kernel Interpolation creates smooth curves through data points

Keywords: Kernel Interpolation, Gaussian RBF Kernel, RKHS, Kernel Ridge Regression, Python, Machine Learning, Spatial Interpolation

Table of Contents

1. What is Kernel Interpolation?

2. Understanding the Gaussian (RBF) Kernel

3. The Kernel Matrix Explained

4. What is RKHS? (Reproducing Kernel Hilbert Space)

5. Kernel Ridge Regression vs Kernel Interpolation

6. Step-by-Step Algorithm with Visualization

7. Effect of Bandwidth Parameter (σ)

8. Real-World Applications

9. Complete Python Implementation

10. Summary and Key Takeaways

1. What is Kernel Interpolation?

1.1 The Simple Explanation

Imagine you have a few data points on a graph, and you want to draw a smooth curve that passes exactly through every single point. That’s kernel interpolation in a nutshell.

Think of it like connecting dots, but instead of straight lines, you use a smooth, elegant curve. The “kernel” is like a magic brush that determines how smooth the curve should be.

Figure 1.1: Kernel Interpolation creates a smooth curve through all data points

1.2 A Real-World Analogy

Imagine you’re a weather scientist with temperature readings from 5 weather stations across a city:

  • Station A (downtown): 75°F
  • Station B (park): 72°F
  • Station C (industrial): 78°F
  • Station D (residential): 73°F
  • Station E (waterfront): 70°F

Now someone asks: “What’s the temperature at the shopping mall between stations A and B?”

Kernel interpolation creates a smooth temperature map that:

  • Passes exactly through all 5 known temperatures
  • Gives reasonable estimates for any location in between
  • Is as “smooth” as possible (no wild temperature jumps)

1.3 Why “Kernel”?

The word “kernel” refers to a mathematical function that measures similarity between two points. Points that are close together have high similarity (kernel value close to 1), while distant points have low similarity (kernel value close to 0).

The kernel acts like a “zone of influence” around each data point. When predicting at a new location, nearby training points have more influence than far-away ones.

2. Understanding the Gaussian (RBF) Kernel

2.1 The Bell Curve Connection

The Gaussian kernel (also called RBF – Radial Basis Function) is named after the famous “bell curve” or normal distribution. It measures similarity using this formula:

k(x₁, x₂) = exp(-||x₁ – x₂||² / (2σ²))

In plain English: “The similarity between two points decreases as they get farther apart, following a bell-curve shape.”

2.2 Understanding the Formula

Let’s break down each part:

Symbol | Meaning | Layman's Explanation
k(x₁, x₂) | Kernel value | How similar are points x₁ and x₂? (0 to 1)
||x₁ – x₂|| | Distance | How far apart are the two points?
σ (sigma) | Bandwidth | How quickly does similarity drop off?
exp(…) | Exponential | Ensures output is always between 0 and 1
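
To make the formula concrete, here is a minimal NumPy sketch (the function name gaussian_kernel is just for illustration):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    # Similarity between two 1-D points under the Gaussian (RBF) kernel
    return np.exp(-(x1 - x2) ** 2 / (2 * sigma ** 2))

print(gaussian_kernel(0.0, 0.0))  # 1.0: a point is perfectly similar to itself
print(gaussian_kernel(0.0, 1.5))  # ~0.325: moderately similar
print(gaussian_kernel(0.0, 6.0))  # ~1.5e-08: effectively unrelated
```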

Figure 2.1: The Gaussian kernel measures similarity – closer points are more similar

2.3 The Bandwidth Parameter (σ)

The bandwidth σ is the most important setting. Think of it as a “reach” or “radius of influence”:

  • Small σ (e.g., 0.3): Each point only influences its immediate neighbors → wiggly, spiky curve
  • Medium σ (e.g., 1.0): Balanced influence → smooth but responsive curve
  • Large σ (e.g., 5.0): Points influence distant areas → very smooth, almost flat curve

It’s like adjusting the “blur” on a photo – small σ keeps details sharp (maybe too sharp), large σ smooths everything out (maybe too much).

3. The Kernel Matrix Explained

3.1 What is the Kernel Matrix?

If we have n training points, we create an n×n table (matrix) where:

K[i,j] = similarity between point i and point j

This matrix captures “who is similar to whom” in our training data.

3.2 Key Properties

  • Diagonal is always 1 (each point is perfectly similar to itself)
  • Symmetric: K[i,j] = K[j,i] (similarity works both ways)
  • All values between 0 and 1

Figure 3.1: Kernel matrices with different bandwidths – darker means more similar

3.3 Reading the Matrix

Example with 5 points at x = [0, 1.5, 3.0, 4.5, 6.0] and σ = 1:

  • K[0,0] = 1.000 → Point 0 is identical to itself
  • K[0,1] = 0.325 → The points at x = 0 and x = 1.5 are somewhat similar
  • K[0,4] = 0.000 → The points at x = 0 and x = 6.0 are very far apart, almost unrelated
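
You can reproduce these numbers with a few lines of NumPy; here is a minimal sketch:

```python
import numpy as np

x = np.array([0.0, 1.5, 3.0, 4.5, 6.0])
sigma = 1.0

# Pairwise squared distances via broadcasting, then the Gaussian kernel
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))

print(np.round(K, 3))  # K[0,0] = 1.0, K[0,1] = 0.325, K[0,4] = 0.0
```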

4. What is RKHS? (Reproducing Kernel Hilbert Space)

4.1 Don’t Be Scared by the Name!

RKHS sounds intimidating, but the concept is actually quite intuitive. Let’s break it down:

Term | Simple Meaning
Space | A collection of all possible functions we might use
Hilbert | The space has a way to measure "size" and "distance" between functions
Reproducing Kernel | A special property that makes calculations tractable

4.2 The Key Insight

Think of RKHS as a “function neighborhood” – a specific set of smooth functions that our solution must belong to. The Gaussian kernel defines which functions are “allowed” in this neighborhood.

Figure 4.1: RKHS selects the smoothest function that passes through all points

4.3 The RKHS Norm

The RKHS norm ||f||_H measures the “complexity” or “wiggliness” of a function:

  • Low RKHS norm = Simple, smooth function
  • High RKHS norm = Complex, wiggly function

4.4 The Representer Theorem

This is the magic that makes kernel methods practical:

“Among the infinitely many functions that could interpolate your data, the best one (the one with minimum complexity) can be written as a simple weighted sum of kernel functions centered at your training points.”

This transforms an impossible infinite-dimensional problem into a simple linear algebra problem!

5. Kernel Ridge Regression vs Kernel Interpolation

5.1 The Problem with Perfect Interpolation

Kernel interpolation (λ=0) forces the curve to pass exactly through every training point. But what if your data has noise or measurement errors?

Example with a noisy thermometer:

  • True temperature: 72°F
  • Measured: 74°F (thermometer error)
  • Interpolation: Forces curve through 74°F (wrong!)

5.2 Enter Regularization (λ > 0)

Kernel Ridge Regression adds a “penalty” for complexity. Instead of perfectly fitting the data, it finds a balance between:

  • Fitting the data well (low training error)
  • Keeping the function simple (low RKHS norm)

5.3 The Mathematical Difference

Both methods solve a linear system, but with a small difference:

Kernel Interpolation (λ=0): K·α = y

Kernel Ridge Regression: (K + λI)·α = y
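
In code, the two methods differ by a single diagonal term. A minimal sketch (the helper name fit_coefficients is illustrative):

```python
import numpy as np

def fit_coefficients(K, y, lam=0.0):
    # lam = 0 gives exact kernel interpolation: solve K alpha = y.
    # lam > 0 gives kernel ridge regression: solve (K + lam*I) alpha = y.
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)
```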

Figure 5.1: With noisy data, regularization (λ>0) prevents overfitting

5.4 When to Use What

λ Value | Behavior | Best For
λ = 0 | Exact interpolation | Perfect data with no noise
λ small (0.01) | Nearly interpolates | Low-noise data
λ medium (0.1) | Balanced fit | Moderate noise
λ large (1.0+) | Very smooth fit | High-noise data

6. Step-by-Step Algorithm with Visualization

6.1 Overview

The complete algorithm has just 3 main steps:

  • Step 1: Compute the kernel matrix K (pairwise similarities)
  • Step 2: Solve the linear system Kα = y (find coefficients)
  • Step 3: Predict using f(x) = Σᵢ αᵢ · k(x, xᵢ)

Figure 6.1: The complete kernel interpolation algorithm, step by step

6.2 Understanding Each Step

Step 1: Compute Kernel Matrix

For each pair of training points, calculate their similarity using the Gaussian kernel. This gives us an n×n matrix where n is the number of training points.

Step 2: Solve for Coefficients

We solve the linear system Kα = y to find the coefficients. Each coefficient αᵢ tells us how much the i-th training point contributes to the final prediction.

  • Positive αᵢ: This point pulls the curve UP
  • Negative αᵢ: This point pulls the curve DOWN

Step 3: Make Predictions

For any new point x*, compute:

f(x*) = α₁·k(x*,x₁) + α₂·k(x*,x₂) + … + αₙ·k(x*,xₙ)

Each training point “votes” on the prediction, weighted by its coefficient and its similarity to the new point.
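
Here is the whole algorithm end to end on toy data, as a compact sketch; Section 9 builds a fuller version:

```python
import numpy as np

# Toy training data: noiseless samples of a smooth function
x_train = np.array([0.0, 1.5, 3.0, 4.5, 6.0])
y_train = np.sin(x_train)
sigma = 1.0

# Step 1: kernel matrix of pairwise similarities
K = np.exp(-(x_train[:, None] - x_train[None, :]) ** 2 / (2 * sigma ** 2))

# Step 2: solve K alpha = y for the coefficients
alpha = np.linalg.solve(K, y_train)

# Step 3: predict at a new point x* as a weighted sum of kernel values
x_star = 2.0
k_star = np.exp(-(x_star - x_train) ** 2 / (2 * sigma ** 2))
print(k_star @ alpha)  # a smooth estimate between the points at 1.5 and 3.0
```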

7. Effect of Bandwidth Parameter (σ)

7.1 The Most Important Hyperparameter

The bandwidth σ controls the smoothness of your interpolant. Choosing it well is crucial for good results.

Figure 7.1: Different values of σ produce very different interpolations

7.2 Guidelines for Choosing σ

σ Value | Characteristic | When to Use
Very small (0.1–0.3) | Spiky, each point isolated | Rarely (usually overfits)
Small (0.5–0.7) | Responds to local patterns | High-frequency data
Medium (1.0–2.0) | Balanced smoothness | Most applications (start here)
Large (3.0+) | Very smooth, global trends | When you want simple fits

7.3 Practical Tips

  • Start with σ equal to the median pairwise distance between your training points (see the snippet after this list)
  • Use cross-validation to find the optimal value
  • If the fit is too wiggly, increase σ
  • If the fit misses local patterns, decrease σ
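
The median-distance heuristic from the first tip takes only a couple of lines; here is a sketch on the toy grid from Section 3:

```python
import numpy as np

x_train = np.array([0.0, 1.5, 3.0, 4.5, 6.0])

# Median of all distinct pairwise distances as an initial guess for sigma
dists = np.abs(x_train[:, None] - x_train[None, :])
sigma0 = np.median(dists[np.triu_indices_from(dists, k=1)])
print(sigma0)  # 3.0 for this grid
```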

8. Real-World Applications

Figure 8.1: Kernel methods are used across many domains

8.1 Spatial Interpolation

Application: Estimating values at unmeasured locations.

  • Weather forecasting: Temperature maps from weather stations
  • Mining/geology: Estimating mineral concentrations from soil samples
  • Pollution monitoring: Air quality maps from sensor readings
  • Real estate: Property value estimation based on nearby sales

Figure 8.2: Temperature interpolation from weather stations

8.2 Time Series Prediction

Application: Predicting future values based on past observations.

  • Stock price forecasting
  • Energy demand prediction
  • Medical monitoring (heart rate trends)

8.3 Machine Learning & AI

Application: Classification and regression in high-dimensional spaces.

  • Support Vector Machines (SVM)
  • Gaussian Processes
  • Drug discovery (molecular property prediction)

8.4 Computer Graphics

Application: Creating smooth surfaces and shapes.

  • 3D surface reconstruction from point clouds
  • Medical imaging (reconstructing organ shapes from scans)
  • Animation (smooth motion interpolation)

9. Complete Python Implementation

9.1 The Kernel Function

First, we define the Gaussian kernel:
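
A minimal version for one-dimensional inputs might look like this (for vector inputs, replace the squared difference with a squared Euclidean distance):

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    """Gaussian (RBF) kernel: k(x1, x2) = exp(-(x1 - x2)^2 / (2*sigma^2))."""
    return np.exp(-(x1 - x2) ** 2 / (2 * sigma ** 2))
```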

9.2 The Kernel Matrix

Compute pairwise similarities:
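
One way to compute it with NumPy broadcasting (the name kernel_matrix is illustrative):

```python
import numpy as np

def kernel_matrix(X1, X2, sigma=1.0):
    """n x m matrix of Gaussian kernel values between two 1-D point sets."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    sq_dists = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-sq_dists / (2 * sigma ** 2))
```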

9.3 Fitting the Interpolator

Solve the linear system:
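
Fitting reduces to a single call to a linear solver. A sketch that reuses kernel_matrix from above and handles both λ = 0 and λ > 0:

```python
import numpy as np

def fit(x_train, y_train, sigma=1.0, lam=0.0):
    """Solve (K + lam*I) alpha = y; lam = 0 is exact interpolation."""
    K = kernel_matrix(x_train, x_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
```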

9.4 Making Predictions

Predict at new points:
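
Prediction is a matrix-vector product between the new-versus-training kernel values and the coefficients, reusing kernel_matrix from above:

```python
def predict(x_new, x_train, alpha, sigma=1.0):
    """f(x*) = sum_i alpha_i * k(x*, x_i), evaluated for every new point."""
    return kernel_matrix(x_new, x_train, sigma) @ alpha
```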

9.5 Complete Example
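
Putting the pieces together on synthetic data, a sketch using the functions defined above:

```python
import numpy as np

# Noiseless samples of a smooth function
x_train = np.linspace(0.0, 6.0, 7)
y_train = np.sin(x_train)

sigma, lam = 1.0, 0.0                      # lam = 0 -> exact interpolation
alpha = fit(x_train, y_train, sigma, lam)

# Predict on a dense grid of query points
x_new = np.linspace(0.0, 6.0, 121)
y_new = predict(x_new, x_train, alpha, sigma)

# The interpolant passes (numerically) exactly through the training data
print(np.allclose(predict(x_train, x_train, alpha, sigma), y_train))  # True
```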


10. Summary and Key Takeaways

10.1 Quick Reference Table

Concept | What It Does | Key Parameter
Gaussian Kernel | Measures similarity between points | σ (bandwidth)
Kernel Matrix K | Table of all pairwise similarities | Size: n × n
Coefficients α | Weights for each training point | Solve Kα = y
RKHS | Space of allowed functions | Determined by the kernel
RKHS Norm | Measures function complexity | Lower = smoother
λ (regularization) | Trade-off: fit vs. smoothness | λ = 0: exact interpolation

10.2 The Algorithm in 3 Steps

Step 1: Compute kernel matrix K where K[i,j] = k(xᵢ, xⱼ)

Step 2: Solve Kα = y to find coefficients

Step 3: Predict using f(x) = Σᵢ αᵢ · k(x, xᵢ)

10.3 When to Use

Use Kernel Interpolation (λ=0) when:

  • Your data has no noise (exact measurements)
  • You need the curve to pass exactly through known points

Use Kernel Ridge Regression (λ>0) when:

  • Your data has measurement noise
  • You want better generalization

10.4 Final Thoughts

Kernel interpolation is a powerful technique that lets you create smooth functions through any set of points. The key insight is that the Representer Theorem transforms an infinite-dimensional problem into simple linear algebra.

“Kernel interpolation turns an infinite-dimensional function-finding problem into simple linear algebra.”


I hope this tutorial gives you a good foundation. If you would like a tutorial on another topic, or if you have any questions, please send an email to contact@spatial-dev.guru.
