Title: Functions, apps, exercises and other R-related stuff used in "AI - Aalborg Intelligence"
Description: Functions, apps, exercises and other R-related stuff used in "AI - Aalborg Intelligence". The project (2020-2026) is supported by the Novo Nordisk Foundation to develop teaching material for Danish high schools, strengthening the understanding of AI while explaining how basic maths is used in some popular AI methods.
Authors: Ege Rubak, Torben Tvedebrink, Mikkel Meyer Andersen, Lisbeth Fajstrup
Maintainer: Ege Rubak <[email protected]>
License: MIT + file LICENSE
Version: 0.2.0
Built: 2024-11-12 04:56:41 UTC
Source: https://github.com/aalborg-intelligence/aai
A data frame containing the length and weight of SOMETHING. In 'classification_train_data' the class is also given.
classification_test_data
A data frame with 10 rows and 2 variables:
Length
Weight
A data frame containing the length and weight of SOMETHING. In 'classification_train_data' the class is also given.
classification_train_data
A data frame with 150 rows and 3 variables:
Length
Weight
Class
Simple function for DT output
dt_simple(tab, ...)
tab | The table to format
... | Arguments passed to 'DT::datatable'
Short function for DT output
dt_table(tab, ...)
tab | The table to format
... | Arguments passed to 'DT::datatable'
Simple function for kable output
kable_(tab, ...)
tab | The table to format
... | Arguments passed to 'knitr::kable'
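Since the arguments are passed straight on to 'knitr::kable', the underlying call can be sketched directly (shown on the built-in 'iris' data; how 'kable_' post-processes the result is not documented here and is not assumed):

```r
library(knitr)

# kable() renders a data frame as a plain-text/markup table;
# kable_(tab, ...) forwards 'tab' and '...' to a call like this one.
out <- kable(head(iris, 3), format = "pipe")
cat(out, sep = "\n")
```

The return value is a character vector of table lines, which knits directly into a document.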
Data for plotting a grid based on the mean of the K nearest neighbors
kMD_plot(K = 3, .train, response = "Type", grid = 100)
K | Number of neighbors
.train | Training data
response | Name of the class variable
grid | Resolution of the grid - higher values give a finer grid
Wrapper around 'class::knn'
kNN(K = 3, .test, .train, response = "Type")
K | Number of nearest neighbors to use
.test | Data that should be classified based on the training data
.train | Annotated training data used to classify the test data
response | Name of the response variable
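As 'kNN' is a thin wrapper, the underlying 'class::knn' call can be sketched on the built-in 'iris' data (an illustration of the wrapped function only; the wrapper's own column handling via 'response' is assumed, not shown):

```r
library(class)

# Interleaved training/test split of iris (75 rows each)
train <- iris[seq(1, 150, by = 2), ]
test  <- iris[seq(2, 150, by = 2), ]

# class::knn classifies each test row by majority vote
# among its K nearest training rows (Euclidean distance)
pred <- knn(train = train[, 1:4], test = test[, 1:4],
            cl = train$Species, k = 3)
acc <- mean(pred == test$Species)
```

The result 'pred' is a factor with one predicted class per test row.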
Visualise a kNN trainer
kNN_plot(K = 3, .train, response = "Type", grid = 100)
K | Number of neighbors to use
.train | The training data
response | The name of the response/class variable
grid | The resolution of the grid. Larger numbers give higher resolution (and slower performance).
k <- 3
kNN_plot(.train = classification_train_data, K = k) %>%
  ggplot() +
  labs(title = paste("K =", k)) +
  geom_rect(aes(xmin = Længde_0, xmax = Længde_1,
                ymin = Vægt_0, ymax = Vægt_1, fill = Type),
            alpha = 0.3) +
  geom_point(data = classification_train_data,
             aes(x = Længde, y = Vægt, colour = Type))
Actual cross-validation function for kNN.
kNN.cv(K = 3, .train, response = "Type", fold = 10)
K | Vector of nearest neighbor values (the k in kNN)
.train | The data to use kNN on
response | The variable name of the response
fold | The number of folds to use in cross-validation
data(classification_train_data)
K_LOO <- tibble(
  K = 1:15,
  LOO = kNN.loo(K, .train = classification_train_data)
) %>%
  rowwise() %>%
  mutate(CV = list(kNN.cv(K, .train = classification_train_data)))
K_LOO %>%
  ggplot(aes(x = factor(K))) +
  geom_boxplot(data = unnest(K_LOO, CV), aes(y = CV)) +
  geom_point(aes(y = LOO), colour = "#999999") +
  labs(x = "K", y = "Accuracy")
Wrapper around 'class::knn.cv' which does Leave-One-Out (LOO) cross-validation
kNN.loo(K = 3, .train, response = "Type")
K | Number of nearest neighbors to use (can be a vector)
.train | Annotated training data to cross-validate
response | Name of the response variable
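The wrapped 'class::knn.cv' call itself can be sketched on the built-in 'iris' data (illustrating only the underlying function; the wrapper's handling of a vector-valued K is assumed, not shown):

```r
library(class)

# class::knn.cv classifies every row using all *other* rows
# as training data - i.e. leave-one-out cross-validation
loo <- knn.cv(train = iris[, 1:4], cl = iris$Species, k = 3)
loo_acc <- mean(loo == iris$Species)
```

'loo' holds one leave-one-out prediction per training row, so 'loo_acc' is the LOO accuracy.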
Wrapper around 'class::knn1'
kNN1(.test, .train, response = "Type")
.test | Data that should be classified based on the training data
.train | Annotated training data used to classify the test data
response | Name of the response variable
Mean distance to k nearest
meandist_to_k_nearest(K = 3, .test, .train, response = "Type", dist = FALSE, info = TRUE)
K | Number of nearest neighbors
.train | The training data
return_all | Logical. Should the distance to each of the nearest K be returned, or just their mean distance?
If 'return_all = FALSE' a data frame of the mean distance to each class of 'response' is returned. If 'return_all = TRUE' a list is returned - 'top_K' is as above, 'all' contains the closest neighbors from each class.
data(classification_train_data)
meandist_to_k_nearest_(K = 3, .train = classification_train_data) %>%
  mutate(same_Type = ifelse(obs_Type == Type, "Y", "N")) %>%
  ggplot(aes(x = obs_Type, y = Distance, fill = Type, colour = same_Type)) +
  labs(x = "Type of the observation", fill = "Type of the nearest points") +
  theme(legend.position = "top") +
  guides(colour = "none") +
  scale_colour_manual(values = c("Y" = "#666666", "N" = "#000000")) +
  geom_boxplot() +
  coord_flip()
Mean distance to k nearest
meandist_to_k_nearest_(K = 5, .train, response = "Type", return_all = FALSE)
K | Number of nearest neighbors
.train | The training data
return_all | Logical. Should the distance to each of the nearest K be returned, or just their mean distance?
If 'return_all = FALSE' a data frame of the mean distance to each class of 'response' is returned. If 'return_all = TRUE' a list is returned - 'top_K' is as above, 'all' contains the closest neighbors from each class.
A data frame containing the responses to two fictitious questions on the scale -2, -1, 0, 1, 2 together with a classification color.
perceptron31
A data frame with 31 rows and 3 variables:
Answer to first question.
Answer to second question.
Class
Helper function for making a predictive grid
pred_grid(data, step = 10, response = "Type", pred_var = "Prediction", center = 0)
data | Dataset
step | Step size in each data variable
response | The name of the response variable
center | If not through zero, then through 'center'
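The kind of grid such a helper produces can be sketched with base R's 'expand.grid()' (hypothetical variable names and ranges; 'pred_grid' itself derives these from 'data' and may label cells differently):

```r
# A 2-D prediction grid over hypothetical Length/Weight ranges,
# discretised into 'step' cells per variable; each row is one
# grid cell at which a class prediction can later be attached.
step <- 10
grd <- expand.grid(
  Length = seq(0, 30, length.out = step),
  Weight = seq(0, 500, length.out = step)
)
nrow(grd)  # one row per grid cell: step * step = 100
```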
Method for predicting the majority vote, or "?" in case of ties
pred_max(n, x)
n | Counts
x | Data vector
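The majority-vote-with-ties idea can be sketched in base R (a hypothetical re-implementation for illustration, not the package's own code, which works from precomputed counts):

```r
# Return the most frequent value in x, or "?" when the top count is tied
majority_vote <- function(x) {
  counts <- table(x)
  winners <- names(counts)[counts == max(counts)]
  if (length(winners) > 1) "?" else winners
}

majority_vote(c("a", "a", "b"))       # "a"
majority_vote(c("a", "a", "b", "b"))  # "?"
```

Returning "?" instead of breaking ties at random keeps the grid plots honest about ambiguous cells.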
Helper function for plotting a predictive grid
pred_plot_grid(pred_grd, pred_var = "Prediction", remove = TRUE)
pred_grd | Output from the YY function
remove | Is passed to the 'remove' argument of 'tidyr::separate()'
Create grid for new data
predict_grid(pred_grd, newdata)
pred_grd | Returned from ?
newdata | New data to be used in prediction
Makes print return all rows in a tibble
Print(...)
... | Arguments passed to 'knitr::kable'
Create a discretised version with some pretty labels
seq_cut(x, step, center, breaks = FALSE)
x | Data variable
step | Step size
center | If not through zero, then through 'center'
breaks | Logical. Should break labels be returned?
seq_cut(rnorm(100), step = 2, center = 0, breaks = TRUE)
Create breaks for 'seq_cut'
seq_zero(x, step, center)
x | Data variable
step | Step size
center | If not through zero, then through 'center'
seq_zero(rnorm(100), step = 2, center = 0)
Plot of data for an exercise by Jan B Sørensen on classification
xy_plot(train, x, y, colour, test = NULL, selected = NULL)
train | Training data set
x, y, colour | Parameters controlling the x and y axes and point colours
test | Test data set
selected | Points to highlight
data(classification_train_data)
data(classification_test_data)
type_cols <- c("1" = "#E41A1C", "2" = "#377EB8", "3" = "#4DAF4A", "?" = "#444444")
xy_plot(train = classification_train_data, x = Længde, y = Vægt, colour = Type) +
  scale_colour_manual(values = type_cols)