Java crawler: how do you scrape a page whose data is generated dynamically by JS? - ITeye Q&A
Many sites generate their data with JS or jQuery: the data is fetched from the backend and written into the page with document.write() or $("#id").html(...), so when you view the page source in a browser the data is not there. HttpClient can't handle this. Posts online suggest HtmlUnit, saying it can return the complete page after the backend JS has finished loading, but I wrote the code the way those articles describe and it just doesn't work:
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

try {
    String url = "/admin/main/flrpro.do";
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_10);
    // configure the WebClient
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setCssEnabled(false);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    //webClient.getOptions().setTimeout(50000);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    // open the target URL in the simulated browser
    HtmlPage rootPage = webClient.getPage(url);
    System.out.println("sleeping so the JS has time to run and load its data");
    Thread.sleep(3000); // the key step: the JS load itself takes time
    System.out.println("done sleeping");
    String html = rootPage.asText();
    System.out.println(html);
} catch (Exception e) {
    e.printStackTrace();
}
This code simply doesn't work. Any ideas? A typical case is the page at this link: how can a Java program get at its data? /admin/main/flrpro.do
Accepted answer
I ran into this problem before too. There are plenty of suggestions online, but none of them solved it for me. I then wondered whether there was some way to list the extra requests that get fired alongside a request to a given URL; supposedly a packet sniffer can do that, but I never got it working.
In the end the only approach that worked was the most primitive one: simulate the request yourself. Piece together the URL of the AJAX call from the page's JS and issue that request directly; just take care to match the request method, POST or GET.
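In code, that "most primitive" approach is just an ordinary HTTP request against the URL the JS would have called. A minimal sketch, assuming a hypothetical endpoint /admin/main/loadData.do and made-up form parameters (take the real values from the page's JS or the browser's network panel):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class AjaxFetch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and parameters; copy the real ones from the page's JS.
        URL url = new URL("http://example.com/admin/main/loadData.do");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");  // match whatever the JS uses: POST or GET
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write("page=1&size=20".getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        System.out.println(body); // usually JSON or an HTML fragment
    }
}

If the JS uses GET, drop setDoOutput/getOutputStream and append the parameters to the URL as a query string instead.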
Brother, did you ever get this working? I have a similar page I can't scrape either. Help me!
http://credit./XYXX/admin_client/form_designer/special/index.html?id=集友银行有限公司福州分行
Look at what asynchronous request the page's JS executes and scrape that request's URL directly.
Give HtmlUnit a try.
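On the HtmlUnit route specifically: instead of the unconditional Thread.sleep in the question, recent HtmlUnit releases can wait for pending background JS explicitly via waitForBackgroundJavaScript. A sketch under that assumption (the URL is a placeholder):

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitWait {
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_10);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(false);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        HtmlPage page = webClient.getPage("http://example.com/some/page.do"); // placeholder URL
        // Wait up to 10s for background JS jobs instead of sleeping blindly.
        webClient.waitForBackgroundJavaScript(10000);
        System.out.println(page.asText());
    }
}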
51CTO Recommended Post
Someone in a group chat today brought up how web crawlers work, so I wrote a simple one here. It was done in a hurry, so there is plenty in it that isn't up to standard, please bear with me; the basic principle, though, is all there in the code. Happy learning.
import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class SocketScan {

    private static final int MAX_SIZE = 5;

    public static List<String> httpContextList = new ArrayList<String>();

    public static void main(String[] args) {
        // Fetch the start page, then crawl the links found inside it.
        searchHttpContexts("http://10.125.2.36:8080/FileUpload/test.html");
        System.out.println("httpContext size: " + httpContextList.size());

        for (String string : httpContextList) {
            System.out.println(string);
            System.out.println();
            System.out.println("separator==============================================================================");
            System.out.println();
        }
    }

    // Extract every href="..." value from the page source.
    private static List<String> GetURLByHttpContext(String httpContext) {
        List<String> urlList = new ArrayList<String>();
        String mark = "href=\"";
        int len = mark.length();
        int start = 0;
        int end = 0;
        while ((start = httpContext.indexOf(mark, start)) != -1) {
            start = start + len;
            end = httpContext.indexOf("\"", start);
            urlList.add(httpContext.substring(start, end));
        }
        return urlList;
    }

    // Download a page, remember its content, then recurse into its links
    // until MAX_SIZE pages have been collected.
    private synchronized static String searchHttpContexts(String urlPath) {
        try {
            if (httpContextList.size() > MAX_SIZE) {
                return null; // enough pages collected, stop recursing
            }
            String page = getHttpContext(urlPath);
            httpContextList.add(page);

            List<String> urlList = GetURLByHttpContext(page);
            for (String subUrl : urlList) {
                if (httpContextList.size() > MAX_SIZE) {
                    break;
                }
                searchHttpContexts(subUrl); // the recursive call stores its own page
            }
            return page;
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    // Read the raw bytes of a URL and return them as a String.
    private static String getHttpContext(String urlPath)
            throws MalformedURLException, IOException {
        URL url = new URL(urlPath);
        URLConnection conn = url.openConnection();
        BufferedInputStream input = new BufferedInputStream(conn.getInputStream());
        byte[] b = new byte[1024];
        int temp;
        StringBuilder sb = new StringBuilder();
        while ((temp = input.read(b)) != -1) {
            sb.append(new String(b, 0, temp)); // only the bytes actually read
        }
        input.close();
        return sb.toString();
    }
}
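One remark on GetURLByHttpContext above: the manual indexOf scan throws a StringIndexOutOfBoundsException if a closing quote is ever missing, and it silently skips single-quoted href attributes. A regex-based sketch of the same extraction, kept separate so the original stays untouched:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {
    // Matches href="..." or href='...' and captures the URL.
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']");

    public static List<String> extract(String html) {
        List<String> urls = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }
}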
A simple Java web crawler (spider)
A simple Java web crawler; for lack of time, no further explanation.
Download the required htmlparser.jar from the official site.
---------------------------------------------Spider.java-----------------------------------------------------------------
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.RemarkNode;
import org.htmlparser.StringNode;
import org.htmlparser.filters.StringFilter;
import org.htmlparser.tags.*;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class Spider implements Runnable {

    boolean search_key_words = false;
    int count = 0;
    int limitsite = 10;
    int countsite = 1;
    String keyword = "中国";
    Parser parser = new Parser();
    String startsite = "";
    SearchResultBean srb;
    List resultlist = new ArrayList();
    List searchedsite = new ArrayList();
    Queue linklist = new LinkedList();
    HashMap<String, ArrayList<String>> disallowListCache = new HashMap<String, ArrayList<String>>();

    public Spider(String keyword, String startsite) {
        this.keyword = keyword;
        this.startsite = startsite;
        linklist.add(startsite);
        srb = new SearchResultBean();
    }

    public void run() {
        search(linklist);
    }

    // Crawl loop: take a URL off the queue, honour robots.txt,
    // process the page, repeat until the queue is empty.
    public void search(Queue queue) {
        String url = "";
        while (!queue.isEmpty()) {
            try {
                url = queue.peek().toString();
                if (!isSearched(searchedsite, url)) {
                    if (isRobotAllowed(new URL(url)))
                        processHtml(url);
                    else
                        System.out.println("this page is disallowed to search");
                }
            } catch (Exception ex) {
                ex.printStackTrace();
            }
            queue.remove();
        }
    }

    // Parse one page, count keyword hits, collect its links into the queue.
    public void processHtml(String url) throws ParserException, Exception {
        searchedsite.add(url);
        count = 0;
        srb = new SearchResultBean(); // one result bean per page
        System.out.println("searching ... :" + url);
        parser.setURL(url);
        parser.setEncoding("GBK");
        URLConnection uc = parser.getConnection();
        uc.connect();
        NodeIterator nit = parser.elements();
        while (nit.hasMoreNodes()) {
            Node node = nit.nextNode();
            parserNode(node);
        }
        srb.setKeywords(keyword);
        srb.setUrl(url);
        srb.setCount_key_words(count);
        resultlist.add(srb);
        System.out.println("count keywords is :" + count);
        System.out.println("----------------------------------------------");
    }

    public void dealTag(Tag tag) throws Exception {
        NodeList list = tag.getChildren();
        if (list != null) {
            NodeIterator it = list.elements();
            while (it.hasMoreNodes()) {
                Node node = it.nextNode();
                parserNode(node);
            }
        }
    }

    public void parserNode(Node node) throws Exception {
        if (node instanceof StringNode) {
            StringNode sNode = (StringNode) node;
            StringFilter sf = new StringFilter(keyword, false);
            search_key_words = sf.accept(sNode);
            if (search_key_words) {
                count++;
            }
        } else if (node instanceof Tag) {
            Tag atag = (Tag) node;
            if (atag instanceof TitleTag) {
                srb.setTitle(atag.getText());
            }
            if (atag instanceof LinkTag) {
                LinkTag linkatag = (LinkTag) atag;
                checkLink(linkatag.getLink(), linklist);
            }
            dealTag(atag);
        } else if (node instanceof RemarkNode) {
            // HTML comment: nothing to do
        }
    }

    // Normalise a link and add it to the queue if it is not already there
    // (with or without a trailing slash).
    public void checkLink(String link, Queue queue) {
        if (link != null && !link.equals("") && link.indexOf("#") == -1) {
            if (!link.startsWith("http://") && !link.startsWith("ftp://")
                    && !link.startsWith("www.")) {
                link = "file:///" + link;
            } else if (link.startsWith("www.")) {
                link = "http://" + link;
            }
            if (queue.isEmpty()) {
                queue.add(link);
            } else {
                String link_end_ = link.endsWith("/") ? link.substring(0, link.lastIndexOf("/")) : (link + "/");
                if (!queue.contains(link) && !queue.contains(link_end_)) {
                    queue.add(link);
                }
            }
        }
    }

    public boolean isSearched(List list, String url) {
        String url_end_ = "";
        if (url.endsWith("/")) {
            url_end_ = url.substring(0, url.lastIndexOf("/"));
        } else {
            url_end_ = url + "/";
        }
        if (list.size() > 0) {
            if (list.indexOf(url) != -1 || list.indexOf(url_end_) != -1) {
                return true;
            }
        }
        return false;
    }

    // Fetch and cache /robots.txt for the host, then test the URL's path
    // against the Disallow entries.
    private boolean isRobotAllowed(URL urlToCheck) {
        String host = urlToCheck.getHost().toLowerCase();
        ArrayList<String> disallowList = disallowListCache.get(host);
        if (disallowList == null) {
            disallowList = new ArrayList<String>();
            try {
                URL robotsFileUrl = new URL("http://" + host + "/robots.txt");
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(robotsFileUrl.openStream()));
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.indexOf("Disallow:") == 0) {
                        String disallowPath = line.substring("Disallow:".length());
                        int commentIndex = disallowPath.indexOf("#");
                        if (commentIndex != -1) {
                            disallowPath = disallowPath.substring(0, commentIndex);
                        }
                        disallowPath = disallowPath.trim();
                        disallowList.add(disallowPath);
                    }
                }
                for (Iterator it = disallowList.iterator(); it.hasNext();) {
                    System.out.println("Disallow is :" + it.next());
                }
                disallowListCache.put(host, disallowList);
            } catch (Exception e) {
                return true; // no readable robots.txt, assume allowed
            }
        }
        String file = urlToCheck.getFile();
        for (int i = 0; i < disallowList.size(); i++) {
            String disallow = disallowList.get(i);
            if (file.startsWith(disallow)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        try {
            Spider ph = new Spider("英超", "");
            Thread search = new Thread(ph);
            search.start();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
--------------------------------------SearchResultBean.java---------------------------------------------------------
public class SearchResultBean {
    String url = "";
    String title = "";
    String keywords = "";
    int count_key_words = 0;

    public int getCount_key_words() {
        return count_key_words;
    }
    public void setCount_key_words(int count_key_words) {
        this.count_key_words = count_key_words;
    }
    public String getKeywords() {
        return keywords;
    }
    public void setKeywords(String keywords) {
        this.keywords = keywords;
    }
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }
    public String getUrl() {
        return url;
    }
    public void setUrl(String url) {
        this.url = url;
    }
}
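The startsite value in Spider.main() was evidently lost when the post was copied around; any reachable start URL will do. A hypothetical launch (the URL is a placeholder):

Spider spider = new Spider("英超", "http://news.example.com/");
new Thread(spider).start();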