
Random Forest in R


Random Forest in R: a random forest is built by aggregating many decision trees, and it can be used for both classification and regression. One of its major benefits is that it helps avoid overfitting.

Random forest can handle a large number of features and helps identify the most important attributes.

Random forest has two main user-facing parameters: ntree and mtry.

ntree: the number of trees to grow; the default is 500.

mtry: the number of variables randomly sampled as candidates at each split.

Random Forest Steps

1. Draw ntree bootstrap samples from the original data.

2. For each bootstrap sample, grow an un-pruned tree, choosing the best split at each node from a random sample of mtry predictors.

3. Predict new data using the majority vote across the ntree trees for classification, or their average for regression.
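To make these steps concrete, here is a minimal sketch of the bagging idea in plain R. It uses rpart for the individual trees and simple majority voting; it illustrates the procedure above rather than the actual randomForest internals, and the per-split mtry sampling is omitted for brevity.

# Illustration only: bagging by hand with rpart trees and majority voting
library(rpart)

set.seed(1)
ntree <- 25
trees <- lapply(seq_len(ntree), function(i) {
  boot <- iris[sample(nrow(iris), replace = TRUE), ]  # step 1: bootstrap sample
  rpart(Species ~ ., data = boot)                     # step 2: grow a tree
})

# Step 3: majority vote across the ntree trees
votes <- sapply(trees, function(tr) as.character(predict(tr, iris, type = "class")))
pred <- apply(votes, 1, function(v) names(which.max(table(v))))
mean(pred == iris$Species)  # accuracy of the hand-rolled ensemble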

Load Libraries

library(randomForest)

library(datasets)

library(caret)
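If any of these packages are missing, they can be installed once from CRAN:

# One-time installation, if needed
install.packages(c("randomForest", "caret"))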

Getting Data

data <- iris

str(data)

The dataset contains 150 observations and 5 variables. Species is the response variable, and it should be a factor.

data$Species <- as.factor(data$Species)

table(data$Species)

    setosa versicolor  virginica
        50         50         50

From the above results, we can see that the dataset is balanced, with 50 observations in each class.


Data Partition

Let's set a random seed so that the results are reproducible, then split the data into train and test sets.

set.seed(222)

ind <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))

train <- data[ind==1,]

test <- data[ind==2,]

This gives 106 observations in the train set and 44 observations in the test set.
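Note that the sample() split above is not stratified, so the class proportions in train and test can drift slightly. Since caret is already loaded, a stratified alternative would be the following sketch (train2 and test2 are illustrative names):

# Stratified 70/30 split that preserves the class proportions
set.seed(222)
idx <- createDataPartition(data$Species, p = 0.7, list = FALSE)
train2 <- data[idx, ]
test2 <- data[-idx, ]
table(train2$Species)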

Random Forest in R

rf <- randomForest(Species~., data=train, proximity=TRUE)

print(rf)

Call:
randomForest(formula = Species ~ ., data = train, proximity = TRUE)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2

OOB estimate of error rate: 2.83%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         35          0         0  0.00000000
versicolor      0         35         1  0.02777778
virginica       0          2        33  0.05714286

The out-of-bag (OOB) error is 2.83%, so the model's accuracy on the training data is around 97%.
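The OOB error can also be read directly from the fitted object: rf$err.rate stores the cumulative OOB and per-class error after each tree, so the last row gives the final estimate.

# Final OOB and per-class error rates after all 500 trees
tail(rf$err.rate, 1)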


ntree is 500 and mtry is 2, which are the defaults for a classification problem.
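For classification, the default mtry is the square root of the number of predictors, rounded down, which is easy to verify:

# Default mtry for classification: floor(sqrt(p)), with p predictors
floor(sqrt(ncol(train) - 1))  # 4 predictors in iris, so mtry = 2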

Prediction and Confusion Matrix – train data

p1 <- predict(rf, train)

confusionMatrix(p1, train$Species)

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         35          0         0
  versicolor      0         36         0
  virginica       0          0        35

Overall Statistics

Accuracy : 1

95% CI : (0.9658, 1)

No Information Rate : 0.3396

P-Value [Acc > NIR] : < 2.2e-16

Kappa : 1

Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            1.0000           1.0000
Specificity                 1.0000            1.0000           1.0000
Pos Pred Value              1.0000            1.0000           1.0000
Neg Pred Value              1.0000            1.0000           1.0000
Prevalence                  0.3302            0.3396           0.3302
Detection Rate              0.3302            0.3396           0.3302
Detection Prevalence        0.3302            0.3396           0.3302
Balanced Accuracy           1.0000            1.0000           1.0000

Train data accuracy is 100%, which shows that every observation was classified correctly. This is expectedly more optimistic than the OOB estimate, since the trees have already seen these observations.


Prediction and Confusion Matrix – test data

p2 <- predict(rf, test)

confusionMatrix(p2, test$Species)

Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         11         1
  virginica       0          3        14

Overall Statistics

Accuracy : 0.9091

95% CI : (0.7833, 0.9747)

No Information Rate : 0.3409

P-Value [Acc > NIR] : 5.448e-15

Kappa : 0.8634

Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.7857           0.9333
Specificity                 1.0000            0.9667           0.8966
Pos Pred Value              1.0000            0.9167           0.8235
Neg Pred Value              1.0000            0.9062           0.9630
Prevalence                  0.3409            0.3182           0.3409
Detection Rate              0.3409            0.2500           0.3182
Detection Prevalence        0.3409            0.2727           0.3864
Balanced Accuracy           1.0000            0.8762           0.9149

Test data accuracy is about 91%.
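The same figure can be confirmed without the full caret summary:

# Proportion of correct test predictions (40 of 44)
mean(p2 == test$Species)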

Error rate of Random Forest

plot(rf)
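plot(rf) draws the OOB error and the per-class errors against the number of trees, but it does not add a legend. Assuming the default matplot colours and line types used by the plot method, one can be attached manually:

# Run right after plot(rf): label the OOB and per-class error curves
legend("topright", legend = colnames(rf$err.rate),
       col = 1:ncol(rf$err.rate), lty = 1:ncol(rf$err.rate), cex = 0.8)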

The model already predicts with high accuracy, so further tuning is not strictly required. Still, we can tune the number of trees and mtry with the function below.


Tune mtry

t <- tuneRF(train[,-5], train[,5],  # predictors, response
            stepFactor = 0.5,       # factor by which mtry is changed at each step
            plot = TRUE,            # plot OOB error against mtry
            ntreeTry = 150,         # number of trees used at each trial
            trace = TRUE,           # print progress
            improve = 0.05)         # minimum relative OOB improvement to continue
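tuneRF returns a matrix of the mtry values it tried together with their OOB errors, so the best value can be pulled out and used to refit the forest (rf_tuned is an illustrative name):

# Refit with the mtry that gave the lowest OOB error
best_mtry <- t[which.min(t[, 2]), 1]  # column 1 = mtry, column 2 = OOB error
rf_tuned <- randomForest(Species ~ ., data = train,
                         mtry = best_mtry, ntree = 500)
print(rf_tuned)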

No. of Nodes for the Trees

hist(treesize(rf),
     main = "No. of Nodes for the Trees",
     col = "green")
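The histogram summarizes how many terminal nodes each of the 500 trees has. An individual tree can also be inspected directly:

# Structure of the first tree: split variables, split points, predictions
head(getTree(rf, k = 1, labelVar = TRUE))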

Variable Importance

varImpPlot(rf,
           sort = TRUE,
           n.var = 10,
           main = "Top 10 - Variable Importance")

importance(rf)

             MeanDecreaseGini
Sepal.Length         7.170376
Sepal.Width          1.318423
Petal.Length        32.286286
Petal.Width         29.117348

Petal.Length is the most important attribute, followed by Petal.Width.
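Note that importance(rf) shows only MeanDecreaseGini here because the forest was fit without importance = TRUE. Refitting with that flag also produces the permutation-based MeanDecreaseAccuracy (rf_imp is an illustrative name):

# Refit with importance = TRUE to get permutation importance as well
rf_imp <- randomForest(Species ~ ., data = train, importance = TRUE)
importance(rf_imp)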

Partial Dependence Plot

partialPlot(rf, train, Petal.Width, "setosa")

The interpretation: when the petal width is below roughly 1.5, an observation has a higher chance of being classified as setosa.
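The same kind of plot can be drawn for every class to compare where each one responds; a small sketch that loops over the species levels:

# Partial dependence of Petal.Width for all three classes, side by side
par(mfrow = c(1, 3))
for (cls in levels(train$Species)) {
  partialPlot(rf, train, Petal.Width, cls, main = cls)
}
par(mfrow = c(1, 1))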

Multi-dimensional Scaling Plot of Proximity Matrix

A multi-dimensional scaling (MDS) plot can also be produced from the random forest model; it is based on the proximity matrix we requested with proximity = TRUE.

MDSplot(rf, train$Species)
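Finally, the fitted forest can score a completely new observation; the measurements below are made up for illustration:

# Predict the species of a hypothetical new flower
new_flower <- data.frame(Sepal.Length = 5.0, Sepal.Width = 3.5,
                         Petal.Length = 1.4, Petal.Width = 0.2)
predict(rf, new_flower)  # small petals, so setosa is expected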