Principal Component Analysis (PCA) is one of the most useful data analysis and machine learning methods out there. It can be used to identify patterns in highly complex datasets, it can tell you which variables in your data are the most important, and it can tell you how accurate your new understanding of the data actually is.

In this video, I go through PCA one step at a time, along with the method used to solve it, Singular Value Decomposition (SVD). I take it nice and slow so that the simplicity of the method is revealed and clearly explained.
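For anyone who wants to see the idea in code, here is a minimal sketch of PCA via SVD in Python with numpy (the toy data and variable names are my own, not from the video):

```python
import numpy as np

# Toy data: 6 samples, 2 variables (made up for illustration)
X = np.array([[10.0, 6.0], [11.0, 4.0], [8.0, 5.0],
              [3.0, 3.0], [2.0, 2.8], [1.0, 1.0]])

# 1. Center the data so each variable has mean 0
Xc = X - X.mean(axis=0)

# 2. Singular Value Decomposition of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal component directions (unit vectors),
# and the squared singular values give the variation along each PC
scores = Xc @ Vt.T                 # coordinates of each sample on the PCs
var_explained = s**2 / np.sum(s**2)

print(var_explained)               # fraction of variation per PC
```

This mirrors the steps in the video: center, find the best-fit directions, then project the samples onto them.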

There is a minor error at 1:47: Points 5 and 6 are not in the right location

If you are interested in doing PCA in R see:

If you are interested in learning more about how to determine the number of principal components, see:

For a complete index of all the StatQuest videos, check out:

If you’d like to support StatQuest, please consider…

Patreon:

…or…

YouTube Membership:

…a cool StatQuest t-shirt or sweatshirt (USA/Europe):

(everywhere):

…buying one or two of my songs (or go large and get a whole album!)

…or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:

0:00 Awesome song and introduction

0:30 Conceptual motivation for PCA

3:23 PCA worked out for 2-Dimensional data

5:03 Finding PC1

12:08 Singular vector/value, Eigenvector/value and loading scores defined

12:56 Finding PC2

14:14 Drawing the PCA graph

15:03 Calculating percent variation for each PC and scree plot

16:30 PCA worked out for 3-Dimensional data

#statquest #PCA #ML


The best website for learning fundamental concepts in the simplest manner

First, great video, and this helped tremendously in my understanding of how PCA works. My question is about how one begins to calculate PCs 4 and higher, since there is not a good visual analogy for these higher-dimensional components. Does this require some sort of matrix algebra? Thanks again for the great video!

The sum of all the percent variations has to be 100, right?
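Right: each PC's percent variation is its variation (eigenvalue) divided by the total variation, so the percentages sum to 100 by construction. A quick numeric check (a sketch with made-up eigenvalues, not the video's numbers):

```python
import numpy as np

# Made-up variation (eigenvalues) for three PCs
eigenvalues = np.array([15.0, 3.0, 2.0])

# Percent variation = each PC's share of the total variation
percent_variation = 100 * eigenvalues / eigenvalues.sum()

print(percent_variation)        # [75. 15. 10.]
print(percent_variation.sum())  # 100.0
```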

If the number of PCs is reduced from 3 to 2 because of their variation, then our plot changes from 3D to 2D. From this, can we conclude that we are removing gene 3 from the representation of our data?

So how can PCA be used for dimensionality reduction?

Hi! Can I just say: please make more videos on topics that you understand, because your way of teaching is super rare. Really easy to understand for an otherwise elusive concept. (Yeah, I know, I'm not that bright.)

Thanks for the simplified explanation 😊 I have a doubt: how are the covariance matrix and eigenvectors related in PCA?
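The connection: the principal components are the eigenvectors of the covariance matrix of the centered data, and the SVD of the centered data matrix produces the same directions. A small sketch showing the equivalence (toy random data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))             # toy data: 50 samples, 3 variables
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # columns of eigvecs are the PCs

# Route 2: SVD of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The top eigenvector (largest eigenvalue; eigh sorts ascending) matches
# the first right singular vector, up to a sign flip
top_eig = eigvecs[:, -1]
pc1 = Vt[0]
print(np.allclose(np.abs(top_eig @ pc1), 1.0))  # True: same direction
```

The eigenvalues and singular values are also linked: each eigenvalue of the covariance matrix equals the corresponding squared singular value divided by (n − 1).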

great video. clear explanation, thank you!

Just wanted to say your vids helped me a bunch in my intro to ML class. Despite being an intro class they kinda just throw equations up there and call it a day. These explanations are very intuitive. Thanks.

Dr. Starmer: please make a video on factor analysis! in general and in machine learning!! thanks!!!

Thank you for this wonderful video – turning a very abstract concept into something that we can interpret with (biological) meaning!

Josh, please help me understand. I have 4 variables (columns) (X1, X2, X3, X4) and 8 entries (rows), and then I find PC1, which has an eigenvector. How is the PC1 column filled in? What are the 8 values of the PC1 column? How do we get them from the eigenvector or eigenvalue? Please help, eagerly waiting.
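One way to see the answer in code (a sketch with random data, not Josh's own example): the 8 values of the PC1 column are the projections of each centered row onto the unit eigenvector, i.e. one dot product per row.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 4))        # 8 entries (rows), 4 variables (columns)
Xc = X - X.mean(axis=0)            # center each variable

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_direction = Vt[0]              # unit eigenvector for PC1 (4 loading scores)

# The PC1 "column": one score per row, obtained by projecting each
# centered sample onto the PC1 direction
pc1_scores = Xc @ pc1_direction    # shape (8,)
print(pc1_scores)
```

So the eigenvector holds one loading score per variable, and each of the 8 rows contributes one score to the PC1 column.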

I loved your explanation. Keep up the good work. BAM! 🙂

Awesome! Thanks a lot for the great video. You nailed it!

Loved the video

please do insert small tunnel at starting sir, your channel was good

You are a savior

Wish there was a Video about robust PCA, that would be great.

and so talented musically

Hi there, can someone help me understand what he means by "1, 2, 3 are more similar to each other than 4, 5, 6" at 1:40? Thank you and stay safe!

I am a non-CS student, and after watching your videos I bet I will be ahead of most CS students taking ML as a course. Thank you!

East or west

Statquest is the best

Thank you very much! I so much appreciate your work. I have a question for you, sir.

Q1) When calculating PC2, must the line pass through the origin? Or should we calculate the mean of PC1?

Q2) How do we calculate the positions of points in the new PCA plot using only the distances between the origin and the projected points?

Josh, if you added a little animated character for your intro song I think we could make a video around the entire song, and you too could become a one-hit wonder!

How do we know the slope is 0.25?

absolutely fantastic! Thank you

Best explanation of PCA I have ever seen! Congrat 🙂

great explanation !

The most fantastic video I have ever seen in my life, but I didn't get the 4D part. My bad.

I like how well you explain this. Thank you.

One question: in the 3-dimensional example, how do you know what orientation to rotate PC1 to begin with?

In 2 dimensions it's pretty obvious, but in 3 it seems really complicated. Is it constrained to rotating between two axes?

The best tutorial on PCA ever. All I can say is thank you.

Thank you very much, you are the best !

BAM!!!!

I have watched many of your videos and liked them a lot.

I have one query. With 4 variables, for example, we obtain 4 PCs, and 3 are found sufficient. In that case, the dimension reduces from 4 to 3. Now, my question is: does it mean removing one variable among the 4? Does it mean we are left with the same variables, but only 3 of them? Or does it mean we end up with 3 totally new variables?

Awesome work as always, Josh. Just one question: in regression, we try to minimize the sum of the squared vertical distances between the line and the actual points, but here we try to minimize the sum of the squared perpendicular distances from the line to the actual points. Why do we minimize different distances in regression vs. PCA?