
StatQuest: Principal Component Analysis (PCA), Step-by-Step

Principal Component Analysis (PCA) is one of the most useful data analysis and machine learning methods out there. It can be used to identify patterns in highly complex datasets, it can tell you which variables in your data are the most important, and it can tell you how accurate your new understanding of the data actually is.

In this video, I go through PCA one step at a time, along with the method used to solve it, Singular Value Decomposition (SVD). I take it nice and slow so that the simplicity of the method is revealed and clearly explained.
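The SVD-based recipe the video walks through can be sketched in a few lines of NumPy. This is a minimal illustration with made-up toy numbers (not the data from the video): center the data, run SVD, read off the PC directions and loading scores, project the samples, and compute the percent variation each PC accounts for.

```python
import numpy as np

# Toy data: 6 samples measured on 2 variables (numbers are made up).
X = np.array([[10.0, 6.0], [11.0, 4.0], [8.0, 5.0],
              [3.0, 3.0], [2.0, 2.8], [1.0, 1.0]])

# Step 1: center the data so each variable has mean 0.
Xc = X - X.mean(axis=0)

# Step 2: Singular Value Decomposition of the centered data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are the principal components (unit-length singular vectors);
# their entries are the loading scores for each original variable.
pc1 = Vt[0]

# Step 3: project the samples onto the PCs to get the PCA-plot coordinates.
scores = Xc @ Vt.T

# Step 4: percent variation per PC (eigenvalue = singular value^2 / (n - 1)).
eigvals = s**2 / (len(X) - 1)
pct_var = 100 * eigvals / eigvals.sum()
print(pct_var)
```

The percent-variation values are what the scree plot in the video displays, and by construction they sum to 100.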

There is a minor error at 1:47: points 5 and 6 are not in the right location.

If you are interested in doing PCA in R see:

If you are interested in learning more about how to determine the number of principal components, see:

For a complete index of all the StatQuest videos, check out:

If you’d like to support StatQuest, please consider…
YouTube Membership:

…a cool StatQuest t-shirt or sweatshirt (USA/Europe):

…buying one or two of my songs (or go large and get a whole album!)

…or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:

0:00 Awesome song and introduction
0:30 Conceptual motivation for PCA
3:23 PCA worked out for 2-Dimensional data
5:03 Finding PC1
12:08 Singular vector/value, Eigenvector/value and loading scores defined
12:56 Finding PC2
14:14 Drawing the PCA graph
15:03 Calculating percent variation for each PC and scree plot
16:30 PCA worked out for 3-Dimensional data
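The covariance-matrix route gives the same PCs as SVD, which is handy for the 3-dimensional case covered at the end of the video: the eigenvectors of the covariance matrix are the PC directions and the eigenvalues are their variances. The sketch below uses hypothetical random 3-variable data (not the video's) to compute the scree-plot percentages and count how many PCs cover roughly 90% of the variation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-variable data for 10 samples; the mixing matrix just
# makes the variables correlated so one PC dominates.
X = rng.normal(size=(10, 3)) @ np.array([[3.0, 0.5, 0.1],
                                         [0.0, 1.0, 0.2],
                                         [0.0, 0.0, 0.3]])
Xc = X - X.mean(axis=0)

# Eigen-decomposition of the covariance matrix: eigenvectors are the PC
# directions, eigenvalues are the variances along each PC.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort largest first

pct_var = 100 * eigvals / eigvals.sum()             # scree-plot bar heights
n_keep = np.searchsorted(np.cumsum(pct_var), 90.0) + 1  # PCs to reach ~90%
print(pct_var.round(1), n_keep)
```

If the first two PCs already account for most of the variation, the 3-D data can be drawn as a 2-D PCA plot with little loss, which is the dimension-reduction step the video describes.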

#statquest #PCA #ML

Source: https://scoutdawson.com/

See more articles: https://scoutdawson.com/tong-hop/


Comments


  1. Best website to learn fundamental concepts in the simplest manner

  2. First, great video and this helped tremendously in my understanding of how PCA works. My question is about how one begins to calculate PC levels 4 and higher since there is not a good visual analogy to these higher dimensional components. Does this require some sort of matrix algebra? Thanks again for the great video!

  3. The sum of all percentage variations has to be 100, right?

  4. If the number of PCs is reduced from 3 to 2 because of their variation, then our plot changes from 3D to 2D. From this, can we conclude that we are removing Gene 3 from the representation of our data?

  5. So how can PCA be used for dimensionality reduction?

  6. Hi! Can I just say: please make more videos on topics that you understand, because your way of teaching is super rare. Really easy to understand for an otherwise elusive concept (yeah, I know, I'm not that bright).

  7. Thanks for the simplified explanation 😊 I have a doubt: how are the covariance matrix and eigenvectors related in PCA?

  8. great video. clear explanation, thank you!

  9. Just wanted to say your vids helped me a bunch in my intro to ML class. Despite being an intro class they kinda just throw equations up there and call it a day. These explanations are very intuitive. Thanks.

  10. Dr. Starmer: please make a video on factor analysis! in general and in machine learning!! thanks!!!

  11. Thank you for this wonderful video – turning a very abstract concept into something that we can interpret with (biological) meaning!

  12. Josh, please help me understand. I have 4 variables (columns) (X1, X2, X3, X4) and 8 entries (rows), and then I find PC1, which has an eigenvector. How is this PC1 column filled in? What are the 8 values of the PC1 column? How do we get those from the eigenvector or eigenvalue? Please help, eagerly waiting.

  13. I loved your explanation. Keep up the good work. BAM! 🙂

  14. Awesome! Thanks a lot for the great video. You nailed it!

  15. Loved the video

  16. please do insert small tunnel at starting sir, your channel was good

  17. You are a savior

  18. Wish there was a Video about robust PCA, that would be great.

  19. and so talented musically

  20. Hi there, can someone help me understand what he means by "1, 2, 3 are more similar to each other than 4, 5, 6" at 1:40? Thank you and stay safe!

  21. I am a non-CS student, and after watching your video I bet I will be ahead of most CS students taking ML as a course. Thank you…

  22. East or west
    Statquest is the best

  23. Thank you very much! I so much appreciate your work. I have a question for you, sir.
    q1) When calculating PC2, must the line pass through the origin, or should we calculate the mean of PC1?
    q2) How do we calculate the position of points in the new PCA plot using only the distance between the origin and the projected points?

  24. Josh, if you added a little animated character for your intro song I think we could make a video around the entire song, and you too could become a one-hit wonder!

  25. How do we know the slope is 0.25?

  26. absolutely fantastic! Thank you

  27. Best explanation of PCA I have ever seen! Congrat 🙂

  28. great explanation !

  29. The most fantastic video I have ever seen in my life, but I didn't get 4D. My bad.

  30. I like how well you explain this. Thank you.

    One question: in the 3-dimensional example, how do you know what orientation to rotate PC1 to begin with?

    In 2 dimensions it's pretty obvious, but in 3 it seems really complicated. Is it constrained to rotating between two axes?

  31. The best tutorial on PCA ever. All I can say is thank you.

  32. Thank you very much, you are the best !

  33. BAM!!!!

  34. I have watched many of your videos and liked them a lot.
    I have one query. With 4 variables, for example, we obtain 4 PCs and 3 are found sufficient. In that case, the dimension reduces from 4 to 3. Now, my question is: does it mean removing one variable among the 4? Does it mean we are left with the same variable set but only 3 variables? Or does it mean we end up with 3 variables but totally new data?

  35. Awesome work as always, Josh. Just one question: in regression, we try to minimize the sum of the squared vertical distances between the line and the actual points. But here we try to minimize the sum of the squared projected distances from the line to the actual points. Why do we minimize different distances in regression vs. PCA?
