Introduction to Text Analytics with R Part 1 | Overview | Video

June 3, 2020 Edgar Reiss World 43

This data science series introduces the viewer to the exciting world of text analytics with R programming. As exemplified by the popularity of blogging and social media, textual data is far from dead – it is increasing exponentially! Not surprisingly, knowledge of text analytics is a critical skill for data scientists if this wealth of information is to be harvested and incorporated into data products. This data science training provides introductory coverage of the following tools and techniques:

– Tokenization, stemming, and n-grams
– The bag-of-words and vector space models
– Feature engineering for textual data (e.g. cosine similarity between documents)
– Feature extraction using singular value decomposition (SVD)
– Training classification models using textual data
– Evaluating accuracy of the trained classification models
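
To make the first few items concrete, here is a minimal base-R sketch of tokenization, a bag-of-words matrix, and cosine similarity. It is not the code from the series (the comments below suggest the videos use quanteda); the toy documents and the helper names `tokenize` and `cosine` are assumptions made up for illustration.

```r
# Toy documents (invented for illustration; not taken from the spam dataset)
docs <- c(d1 = "Free prize! Claim your free prize now",
          d2 = "Claim your prize now",
          d3 = "Are we still meeting for lunch today")

# Tokenization: lowercase, replace non-letters with spaces, split on spaces
tokenize <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z ]", " ", x)
  tokens <- strsplit(x, " +")[[1]]
  tokens[tokens != ""]
}
tokens <- lapply(docs, tokenize)

# Bag-of-words: one row per document, one column per vocabulary term
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens,
                function(tk) as.integer(table(factor(tk, levels = vocab))),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Cosine similarity between two term-count vectors
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

sim_12 <- cosine(dtm["d1", ], dtm["d2", ])  # documents sharing several terms
sim_13 <- cosine(dtm["d1", ], dtm["d3", ])  # documents sharing no terms
print(round(c(sim_12, sim_13), 3))
```

Each row of `dtm` is a document expressed as a vector of term counts, which is the vector space model in its simplest form. Documents with no terms in common get a cosine similarity of zero.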

The overview of this video series provides an introduction to text analytics as a whole and what is to be expected throughout the instruction. It also includes specific coverage of:

– Overview of the spam dataset used throughout the series
– Loading the data and initial data cleaning
– Some initial data analysis, feature engineering, and data visualization
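
As a rough sketch of what these steps can look like in base R, the example below reads a tiny inline stand-in for `spam.csv`, checks for incomplete rows, and adds a text-length feature. The three rows are invented, and the column names `Label` and `TextLength` are assumptions for this sketch; only `read.csv(..., stringsAsFactors = FALSE)` and the `Text` column name appear in the comments below.

```r
# Inline stand-in for spam.csv (rows are invented for illustration)
csv_text <- 'v1,v2
ham,"Ok, see you at lunch"
spam,"WINNER! Claim your free prize now"
ham,"Running late, sorry"'

spam.raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Give the columns readable names (names are assumptions for this sketch)
names(spam.raw) <- c("Label", "Text")

# Initial cleaning check: how many rows have missing values?
n_incomplete <- sum(!complete.cases(spam.raw))

# Treat the label as a factor and look at the class proportions
spam.raw$Label <- as.factor(spam.raw$Label)
print(prop.table(table(spam.raw$Label)))

# Simple engineered feature: message length in characters
spam.raw$TextLength <- nchar(spam.raw$Text)
print(summary(spam.raw$TextLength))
```

With the real file, replacing `text = csv_text` with the path to the downloaded CSV should be the only change needed, though the actual file may have extra columns to drop first.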

Kaggle Dataset:
https://www.kaggle.com/uciml/sms-spam-collection-dataset

The data and R code used in this series are available here:
https://code.datasciencedojo.com/datasciencedojo/tutorials/tree/master/Introduction%20to%20Text%20Analytics%20with%20R
—
Learn more about Data Science Dojo here:
https://datasciencedojo.com/data-science-bootcamp/

Watch the latest video tutorials here:
https://tutorials.datasciencedojo.com/

See what our past attendees are saying here:
https://datasciencedojo.com/bootcamp/reviews/#videos
—
Like Us: https://www.facebook.com/datasciencedojo
Follow Us: https://twitter.com/DataScienceDojo
Connect with Us: https://www.linkedin.com/company/datasciencedojo

Also find us on:
Instagram: https://www.instagram.com/data_science_dojo
Vimeo: https://vimeo.com/datasciencedojo

#rprogramming #textanalytics #rtutorial

Vansh Jauhari says:

June 3, 2020 at 9:21 am

Hi Dave, my data is showing 2 missing values??
Aditya Raj says:

June 3, 2020 at 9:21 am

hey @Dave getting this as error
package or namespace load failed for ‘quanteda’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):

namespace ‘rlang’ 0.4.2 is already loaded, but >= 0.4.3 is required
please help me out here please
AA AA says:

June 3, 2020 at 9:21 am

Thank you for your contribution for the world David! you are amazing and your parents are proud of you.
مشاعل says:

June 3, 2020 at 9:21 am

what if i want to exclude stop words from stop_words() list how can i do it? i tried to to make custom stopwords but it didn't work.
slk slk says:

June 3, 2020 at 9:21 am

To the MALAYALI data scientist who noticed something at 23:42
مشاعل says:

June 3, 2020 at 9:21 am

Hi.
what should i learn first? natural language processing or text analysis?
Vipul Gupta says:

June 3, 2020 at 9:21 am

Great video…to start with..nice job
Mohammed Asadi says:

June 3, 2020 at 9:21 am

Very helpful, thank you so much!
Alexandre Chaves says:

June 3, 2020 at 9:21 am

Dave, I've done several trainings during my career, both online and in-person, and I can assure you that your teaching style is the best I've ever known. Congratulations, you have the gift. Well done!
shun peng says:

June 3, 2020 at 9:21 am

this is a great video for me!
bedanta madhab gogoi says:

June 3, 2020 at 9:21 am

Thank you !!
raj kumar says:

June 3, 2020 at 9:21 am

Excellent video, thanks for this. Can you make some video with a multi-label classification problem?
Arunabh Lala says:

June 3, 2020 at 9:21 am

Does this series of videos give me the path to learn about extracting complex data in PDF file and then analysing them?

Sir, please do reply
Onimisi Esho says:

June 3, 2020 at 9:21 am

pls i am getting this error "Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection" when i run "cl <- makeCluster(3, type = "SOCK")" on my laptop
Kristy Burns says:

June 3, 2020 at 9:21 am

Love love love the way you teach! Thank you!
Rohit Nagal says:

June 3, 2020 at 9:21 am

Here how we deal with corpus. And idf is negative in this case also ?
Rohit Nagal says:

June 3, 2020 at 9:21 am

If we have to deal with 1 lakh articles then tfidf is relevant. Basically i am working on question answering algorithm
evry1loveronica says:

June 3, 2020 at 9:21 am

what's the difference between text count and length of text? Thank you so much for the awesome tutorial
mhjrt says:

June 3, 2020 at 9:21 am

Great video, thanks!
Pradeep Velavali says:

June 3, 2020 at 9:21 am

Nice explanation. But one small question " how does it differentiate spam & ham data? , because everything we took is raw data & all are messages only here" . Thanks in advance.
David Currie says:

June 3, 2020 at 9:21 am

Kaggle has a download link for the spam.csv file – github doesn't seem to have a download option. Part 1 is great.
Sometimes Manic says:

June 3, 2020 at 9:21 am

This might be a silly question, but if you've installed ggplot or dplyr or any other package in a previous analysis (on the same machine), do you have to reinstall it EVERY time you want to use it? Or can you just install a bunch of packages once and then never have to do it again? Thanks for your videos, btw. I landed a pretty significant interview by watching these!
Kyle Nash says:

June 3, 2020 at 9:21 am

the GitHub is not working. thanks Dave
Avijit Nandy says:

June 3, 2020 at 9:21 am

Hello Sir, I have a request. Can I use your code and the learning to demonstrate this whole series in the Bengali language and upload it? Am I allowed to do that? There are a lot of people who use this language, and it might help them understand. As you know, helping someone understand in their mother tongue is the best way to teach.

Thank You
Ronaldo Sperman says:

June 3, 2020 at 9:21 am

Hi

if i want search one word in one location in twitter, how i do??

i used this code
cand <- searchTwitter('WORD', n = 100 ,since="2014-05-06", until="2018-05-06" , geocode="-23.55052,-46.63331,km")

It runs and searches for the word, but not in one specific location….
Ka Bian says:

June 3, 2020 at 9:21 am

Hi Would you treat it as an imbalanced data set?
Dan Reznik says:

June 3, 2020 at 9:21 am

length(which(!complete.cases(df))) can be written as sum(!complete.cases(df))
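
A quick way to check that equivalence is to run both forms on a small data frame. This is a sketch with made-up data; `df` here is a hypothetical example, not the spam dataset.

```r
# Hypothetical data frame with two incomplete rows (rows 2 and 3)
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c("a", "b", NA, "d"),
                 stringsAsFactors = FALSE)

# Form 1: find the indices of incomplete rows, then count them
n_which <- length(which(!complete.cases(df)))

# Form 2: sum the logical vector directly (TRUE counts as 1)
n_sum <- sum(!complete.cases(df))

print(c(n_which = n_which, n_sum = n_sum))
```

Both forms count the rows that contain at least one missing value. The `sum()` version skips building the intermediate index vector.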
shubham rai says:

June 3, 2020 at 9:21 am

like ur style : Don't hesitate… Good one
Daud Khan says:

June 3, 2020 at 9:21 am

1.
install.packages(c("ggplot2","e1071","caret","quanteda","irlba","randomForest"))

2.
spam.raw <- read.csv("spam.csv",stringsAsFactors = FALSE)
View(spam.raw)
Sibi J says:

June 3, 2020 at 9:21 am

Thank u so much! It's very much helpful!
Hailah AlArifi says:

June 3, 2020 at 9:21 am

Mr.Dave
I work on a project using text mining and I have some problem
H.AlArifi.1995@gamil.com
Can you send me an email so I can explain the problem I faced
Shweta Patil says:

June 3, 2020 at 9:21 am

It's really amazing!!
thank u so much.
W Sophia says:

June 3, 2020 at 9:21 am

Hi Dave, thank you so much for the great content. It really helps me a lot with data analysis.
I recommend everyone else watch Dave's other videos especially the introduction to data science series. Very informative and easy to understand. Happy holiday.
Cheers
alexandre gouvea says:

June 3, 2020 at 9:21 am

Thanks a lot for sharing your knowledge!
Sonila Kar says:

June 3, 2020 at 9:21 am

everything stops working as soon as I run:

"spam.raw <- read.csv("spam.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-16")"

anyone else facing same problem?
Marco Anelli says:

June 3, 2020 at 9:21 am

Very, very good. If they can teach a 58 year old physician, they can teach anybody… 🙂
Morton Wakeland says:

June 3, 2020 at 9:21 am

OK, so what is R, say vs Voyant? or is R something else. Coming into this cold, not apparent. Thanks.
Sonja Wap says:

June 3, 2020 at 9:21 am

R won't separate rows that contain quotation marks when I use read.csv. How do I solve this?
Alexandre M. Batista says:

June 3, 2020 at 9:21 am

Great video. I'm from Brazil, and my level of English is beginner, but his teaching is very good, and I understood the idea and examples well. Thank you. I subscribed to the channel and left my like! A "hello" from Brazil to you!
Smriti Kalra says:

June 3, 2020 at 9:21 am

Hi, I am facing an issue. When I pass this command, spam.raw$TestLength <- nchar(spam.raw$Text), the entire row with text length reads 1000 for some weird reason. Plus summary says Length 0, Class Null, Mode Null. Can someone explain how to get this running?
Karthikeyan Ganesan says:

June 3, 2020 at 9:21 am

Wonderful video sir.. Do you have any lecture or material on unsupervised text analytics as well? unsupervised when I mean it is I have lots of server log and want to make some sense of out it.
Daniele Maccari says:

June 3, 2020 at 9:21 am

Wonderful content. Just what I need to get up to speed for my university project about text mining 😀
Kumar Kumar says:

June 3, 2020 at 9:21 am

Excellent Explanation, clear understanding about the topic. if anyone want to learn text analytics this is the best one .