4. Data on the Web
Some data will be given to you.
But most of the world’s data is uncollected and unorganized (despite Google’s best efforts).
Finding and using that data: that’s where we make our money.
Now that we can use Python and Pandas to see(born) the data better (ouch), the only thing stopping us is a lack of (“big”) data. So, you know what we need to do…
It’s time to hack stuff.[1]
Over the next two weeks*, we are going to cover(ish):
- Scraping data from the web (finding, downloading, and saving tables and webpages)
- Accessing the tables and text within HTML documents
- How to (start to) harness the raw power of all that text
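
To make "accessing the tables within HTML documents" a little more concrete, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the `SAMPLE_HTML` string, the tickers, and the `TableParser` and `extract_table_rows` names are hypothetical, and the tools we actually use in the coming weeks may well be different (pandas, for instance, has its own `read_html` function). A real scrape would first download the page instead of using a hard-coded string.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scrape would download this from a URL.
SAMPLE_HTML = """
<html><body>
<h1>Firms</h1>
<table>
  <tr><th>ticker</th><th>price</th></tr>
  <tr><td>AAA</td><td>10.5</td></tr>
  <tr><td>BBB</td><td>20.0</td></tr>
</table>
<p>Some free text about the firms.</p>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the cell text of every table row as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row being built, or None when outside <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_table_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

The cells come back as plain strings, so turning `"10.5"` into a number (or loading the rows into a DataFrame) is a separate step. This sketch also ignores nested tables and merged cells, which real webpages often contain.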
And when we are done? OH BOY! Then we will be:
[1] “Lehigh University does not condone the use of hacking, even as a joke about our students’ very impressive skills” - Our general counsel, probably