Saving an offline copy of the Google Python tutorial using Python




I am trying to write Python code that saves an offline copy of the "Google Python tutorial" so that I can access the files when I am not connected to the internet. After importing the following libraries (urllib, re, BeautifulSoup, os), I thought I would identify the URLs under the navigation path (class gc-toc), loop through each URL, and save the HTML file locally. My code is below.

My questions are:

The downloaded HTML files try to access their CSS and JS files online. How can I download these files through the program as well?

The whole program seems cumbersome at the moment. Can you suggest ways to improve it? For example, avoiding re and using BeautifulSoup alone to extract the links under the 'gc-toc' class.

    import urllib
    import re
    from BeautifulSoup import *
    import os

    # The URL from which the tags are scraped
    url = 'https://developers.google.com/edu/python/'
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    # The scraped tags contain relative paths; join them with the base URL before downloading
    base_url = 'https://developers.google.com'

    # Save path
    save_path = r'd:\my local directory'

    urllist = list()

    # Retrieve the anchor tags inside the navigation element
    tags = soup.findAll('nav', {'class': 'gc-toc'})
    for tag in re.findall('a href="(.+?)" title="', str(tags)):
        urllist.append(tag)

    print 'The number of links extracted is', len(urllist)
    print '----------printing urls---------------'

    for url in urllist:
        full_url = urllib.basejoin(base_url, url)
        # Skip links that point to YouTube videos
        if url.find('youtube') > 0:
            continue
        # Open the webpage and read the HTML
        print 'Opening webpage file: ', full_url
        response = urllib.urlopen(full_url)
        response_html = response.read()
        # Save the HTML file offline
        print 'Saving html file ', url.split('/')[-1] + '.htm'
        output_file = open(os.path.join(save_path, url.split('/')[-1] + '.htm'), 'w')
        output_file.write(response_html)
        output_file.close()
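On the second question, the regular expression can probably be dropped by asking BeautifulSoup for the anchors directly. The following is only a sketch under some assumptions: it uses the newer bs4 package on Python 3 rather than the BeautifulSoup 3 import shown in the question, the function name extract_toc_links is made up for illustration, and SAMPLE_HTML is a hypothetical stand-in for the tutorial page, whose real markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup  # assumption: bs4 is installed (pip install beautifulsoup4)

# Base URL taken from the question's code.
BASE_URL = 'https://developers.google.com'


def extract_toc_links(html, base_url=BASE_URL):
    """Return absolute URLs for every <a href> found inside <nav class="gc-toc">."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for nav in soup.find_all('nav', class_='gc-toc'):
        for anchor in nav.find_all('a', href=True):
            href = anchor['href']
            # Skip video links, which is what the question's loop intends.
            if 'youtube' in href:
                continue
            links.append(urljoin(base_url, href))
    return links


# Hypothetical markup standing in for the real navigation block.
SAMPLE_HTML = '''
<nav class="gc-toc">
  <ul>
    <li><a href="/edu/python/set-up" title="Set up">Set up</a></li>
    <li><a href="/edu/python/strings" title="Strings">Strings</a></li>
    <li><a href="http://www.youtube.com/watch?v=abc" title="Video">Video</a></li>
  </ul>
</nav>
<a href="/elsewhere">outside the nav</a>
'''

if __name__ == '__main__':
    for link in extract_toc_links(SAMPLE_HTML):
        print(link)
```

Because the anchors are read from the parsed tree, the result no longer depends on the exact attribute order (href followed by title) that the regular expression assumed.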

You might not want to use Python for this. If you only want the HTML of a page, use wget: running wget http://my.url gets you the HTML of the page, if that is all you want. Alternatively, you can use the excellent requests library, something like this:

    import requests

    # Write the raw response bytes so non-ASCII characters are saved unchanged
    with open('page', 'wb') as f:
        f.write(requests.get(url).content)
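Neither snippet above fetches the CSS and JS files that the first question asks about. One possible approach is to scan each saved page for stylesheet and script references and download those as well. The sketch below is an assumption-laden illustration, not a tested mirror tool: it is written for Python 3 using only the standard library, the names AssetCollector, find_assets and download_assets are hypothetical, and SAMPLE_PAGE is invented markup rather than the real tutorial page.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


class AssetCollector(HTMLParser):
    """Collect absolute URLs of the stylesheets and scripts a page references."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'link' and 'stylesheet' in (attrs.get('rel') or '').lower():
            ref = attrs.get('href')
        elif tag == 'script':
            ref = attrs.get('src')  # inline scripts have no src and are skipped
        else:
            ref = None
        if ref:
            # Resolve relative references against the page's own URL.
            self.assets.append(urljoin(self.page_url, ref))


def find_assets(html, page_url):
    """Return the CSS and JS URLs referenced by the given HTML text."""
    collector = AssetCollector(page_url)
    collector.feed(html)
    return collector.assets


def download_assets(asset_urls, save_dir):
    """Save each asset into save_dir, named after the last path segment of its URL."""
    os.makedirs(save_dir, exist_ok=True)
    for asset_url in asset_urls:
        name = os.path.basename(urlparse(asset_url).path) or 'asset'
        urlretrieve(asset_url, os.path.join(save_dir, name))


# Hypothetical page used only to show what find_assets returns.
SAMPLE_PAGE = '''
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="js/app.js"></script>
<script>var inline = true;</script>
</head><body></body></html>
'''

if __name__ == '__main__':
    for asset in find_assets(SAMPLE_PAGE, 'https://developers.google.com/edu/python/'):
        print(asset)
```

Downloading the files is only half of the job: the saved HTML still points at the online URLs, so the href and src attributes would also need to be rewritten to the local paths. The sketch also ignores name collisions between assets that share a file name.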

python python-2.7 beautifulsoup
