Scraping Source in Safari

Sun, 2008 Mar 30, 6:19pm

Here is an AppleScript solution for grabbing the HTML source of a page online. It is particularly handy if you are trying to grab the source of a page that you need to log in to: the script reads the source from Safari itself, so it works with whatever session you are already logged in to there. I am sure there is a much better solution out there, but this one seems to work OK for me.

CODE:
  -- Define where to save the page source and which URL to load
  set pageFile to "/Users/yourUserNameHere/Desktop/safariSource.html"
  set pageUrl to "http://www.plasticstare.com/"

  -- Open the URL in a new Safari document
  tell application "Safari"
     activate
     make new document at end of documents
     set URL of document 1 to pageUrl
  end tell

  -- Wait until the status bar stops showing "Contacting" or "Loading".
  -- This relies on GUI scripting and on Safari's status bar being visible.
  repeat
     delay 0.5
     tell application "System Events" to tell application process "Safari"
        set statusText to (name of static text 1 of group 1 of window 1) as text
     end tell
     if not (statusText begins with "Contacting" or statusText begins with "Loading") then exit repeat
  end repeat

  -- Grab the source from Safari
  tell application "Safari"
     set siteSource to the source of document 1 as text
  end tell

  -- Write the source to the file, replacing any previous contents
  set theFile to open for access (POSIX file pageFile) with write permission
  set eof of theFile to 0
  write siteSource to theFile
  close access theFile
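
The status bar check in the script above depends on Safari's window layout, so it can break when the status bar is hidden or the interface changes. As a rough alternative, here is a sketch that polls document.readyState through Safari's "do JavaScript" command instead. It is not part of the original script: maxWaitSeconds and waitedSeconds are hypothetical names added for illustration, and newer Safari versions are assumed to need "Allow JavaScript from Apple Events" turned on in the Develop menu before this will run. It would stand in for the repeat block only. Like the original check, it can be fooled if it runs before Safari has actually started loading the new URL.

```applescript
-- Sketch only: wait for the front document to finish loading by polling
-- document.readyState through Safari's "do JavaScript" command.
-- maxWaitSeconds is a hypothetical timeout, not part of the original script.
set maxWaitSeconds to 30
set waitedSeconds to 0

tell application "Safari"
   repeat
      delay 0.5
      set waitedSeconds to waitedSeconds + 0.5
      try
         set readyState to (do JavaScript "document.readyState" in document 1)
      on error
         -- JavaScript may be unavailable while the page is still being set up
         set readyState to ""
      end try
      if readyState is "complete" then exit repeat
      if waitedSeconds >= maxWaitSeconds then error "Page did not finish loading in time."
   end repeat
end tell
```

The timeout is there so the script stops with an error instead of looping forever if the page never reports "complete".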

Entry Filed under: apple, applescript, coding, downloadable, geek, osx, technology
