Date: 2023-06-11 02:39:02 | Source: Website Operations
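An open-source and very handy website cloner: HTTrack

We often use Ctrl + C / Ctrl + V to lift content from a web page. I am not encouraging this kind of site scraping, but as long as the material is open source and not used commercially, I think it is fine to borrow from one another; after all, reinventing the wheel is a waste of time. Copying with Ctrl + C / Ctrl + V is tedious, though. Most images today are protected against hotlinking, or their paths have been changed from an image host to local relative paths, so plain copy and paste rarely captures a site's content cleanly. That raises the question: how can we clone a website's content completely?

One option is to view the page source via view-source:https://xxx.xxx.xxx, create a new index.html file, and paste the source into it; fetching the page directly with wget also works. But as noted above, this does not copy everything on the page. Later, as I learned more and came across Python crawlers, I saw examples of this kind, but the results were not very satisfying either.

There are also tools such as WebZip that are said to work well; I have not tried them, so I cannot say for sure (I have not used Windows in years). Today I will introduce httrack, an open-source and very handy website cloner.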
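To see why saving only the page source falls short, here is a minimal sketch that lists the resources a saved HTML file still depends on. It uses only the Python standard library. The sample markup and the base URL https://example.com/docs/ are made up for illustration and do not come from any real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect URLs of images, scripts and linked files referenced by a page."""

    # (tag, attribute) pairs that point at external resources
    ASSET_ATTRS = {("img", "src"), ("script", "src"), ("link", "href")}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in self.ASSET_ATTRS:
                # Resolve relative paths against the page URL
                self.assets.append(urljoin(self.base_url, value))


def list_assets(html, base_url):
    """Return absolute URLs of the resources that `html` refers to."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return collector.assets


# Hypothetical page content, for illustration only
SAMPLE_HTML = """
<html><head><link rel="stylesheet" href="css/site.css"></head>
<body><img src="../img/logo.png"><script src="/js/app.js"></script></body></html>
"""

if __name__ == "__main__":
    for url in list_assets(SAMPLE_HTML, "https://example.com/docs/"):
        print(url)
```

Each URL printed is a separate file that a copied index.html would still need, which is the work a site mirroring tool automates.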
    # Install on Debian/Ubuntu
    sudo apt install httrack

    # Install on CentOS/Fedora
    sudo yum install httrack

    # Install on Gentoo
    sudo emerge httrack

    # Install on macOS with MacPorts
    sudo port install httrack
    # or with Homebrew
    brew install httrack

    # Build from source
    git clone https://github.com/xroche/httrack.git --recurse
    cd httrack
    ./configure --prefix=$HOME/usr && make -j8 && make install
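As a rough sketch, a small script can pick the matching install command from the ones listed in this article. The function name suggest_install_command and the injectable which parameter are my own illustrative choices; the script only prints a suggestion and installs nothing.

```python
import shutil

# Install commands as listed in this article, keyed by package manager executable
INSTALL_COMMANDS = {
    "apt": ["sudo", "apt", "install", "httrack"],
    "yum": ["sudo", "yum", "install", "httrack"],
    "emerge": ["sudo", "emerge", "httrack"],
    "port": ["sudo", "port", "install", "httrack"],
    "brew": ["brew", "install", "httrack"],
}


def suggest_install_command(which=shutil.which):
    """Return None if httrack is already on PATH, else the first matching command.

    `which` is injectable so the lookup can be exercised without touching the system.
    Raises LookupError when no known package manager is found.
    """
    if which("httrack"):
        return None
    for manager, command in INSTALL_COMMANDS.items():
        if which(manager):
            return command
    raise LookupError("no known package manager found; consider building from source")


if __name__ == "__main__":
    try:
        command = suggest_install_command()
    except LookupError as error:
        print(error)
    else:
        print("httrack is already installed" if command is None else " ".join(command))
```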
For details, see http://www.httrack.com/page/2/en/index.html

To see the option list, run httrack --help
The session below uses https://progit.bootcss.com/ as the example to demonstrate the process.

    Welcome to HTTrack Website Copier (Offline Browser) 3.49-2
    Copyright (C) 1998-2017 Xavier Roche and other contributors
    To see the option list, enter a blank line or try httrack --help

    # 1. Enter the name of the project to create
    Enter project name :progit

    # 2. Enter the path where the project will be saved
    Base path (return=/Users/apple/websites/) :/Users/apple/Desktop

    # 3. Enter the URL of the website to clone
    Enter URLs (separated by commas or blank spaces) :https://progit.bootcss.com/

    Action:
    (enter) 1 Mirror Web Site(s)
            2 Mirror Web Site(s) with Wizard
            3 Just Get Files Indicated
            4 Mirror ALL links in URLs (Multiple Mirror)
            5 Test Links In URLs (Bookmark Test)
            0 Quit
    :

    # 4. Press Enter unless you have special requirements
    Proxy (return=none) :

    You can define wildcards, like: -*.gif +www.*.com/*.zip -*img_*.zip
    # 5. Press Enter unless you have special requirements
    Wildcards (return=none) :

    You can define additional options, such as recurse level (-r<number>), separated by blank spaces
    To see the option list, type help
    # 6. Press Enter unless you have special requirements
    Additional options (return=none) :

    ---> Wizard command line: httrack https://progit.bootcss.com/ -O "/Users/apple/Desktop/progit" -%v

    Ready to launch the mirror? (Y/n) :Y

    Mirror launched on Thu, 15 Aug 2019 11:54:40 by HTTrack Website Copier/3.49-2 [XR&CO'2014]
    mirroring https://progit.bootcss.com/ with the wizard help..
    Done.
    Thanks for using HTTrack!
    *
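The wizard ends by printing an equivalent command line, so the same mirror can be started without the interactive prompts. Below is a minimal sketch of that idea. The helper names build_httrack_command and mirror are hypothetical, and placing wildcard filters before -%v is my assumption rather than something the session shows.

```python
import shlex
import subprocess


def build_httrack_command(url, base_path, project, filters=()):
    """Build an argument list that mirrors the wizard's generated command line."""
    output_dir = f"{base_path.rstrip('/')}/{project}"
    # -O sets the output directory; -%v is the flag the wizard added in the session.
    # Filter placement before -%v is an assumption made for this sketch.
    return ["httrack", url, "-O", output_dir, *filters, "-%v"]


def mirror(url, base_path, project, filters=()):
    """Run httrack non-interactively; needs httrack on PATH and network access."""
    command = build_httrack_command(url, base_path, project, filters)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call mirror(...) instead to start the download
    print(shlex.join(build_httrack_command(
        "https://progit.bootcss.com/", "/Users/apple/Desktop", "progit")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path or filter contains spaces or wildcard characters.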