4. Data on the Web

Some data will be given to you.

But most of the world’s data is uncollected and unorganized (despite Google’s best efforts).

Finding and using that data is where we make our money.

Now that we can use Python and Pandas to see(born) the data better (ouch), the only thing stopping us is a lack of (“big”) data. So, you know what we need to do…

It’s time to hack stuff.[1]

Over the next two weeks*, we are going to cover (ish):

  • Scraping data from the web (finding, downloading, and saving tables and webpages)

  • Accessing the tables and text within HTML documents

  • How to (start to) harness the raw power of all that text
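To make the second bullet a little more concrete, here is a minimal sketch of pulling the rows out of an HTML table using only Python's standard library. The `SAMPLE_HTML` string, the `TableParser` class, and the `extract_rows` helper are all made up for illustration; a real scrape would download the page text first, and the tools we actually use later may look different.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded webpage (not real data).
SAMPLE_HTML = """
<html><body>
<h1>Example page</h1>
<table>
  <tr><th>ticker</th><th>price</th></tr>
  <tr><td>AAA</td><td>10.5</td></tr>
  <tr><td>BBB</td><td>20.0</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

From here, `pd.DataFrame(records, columns=header)` would turn the rows into a Pandas table (the cells are still strings at that point, so numeric columns need converting).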

And when we are done? OH BOY! Then we will be:

[1] “Lehigh University does not condone the use of hacking, even as a joke about our students’ very impressive skills” - Our general counsel, probably