- ljuba973
- New MyCity citizen
- Joined: 05 Nov 2007
- Posts: 21
- Location: Malta
|
Hello everyone,
I would like to parse an HTML page and extract some data from it.
The address is: google.com/search?q=link:www.elitesecurity.org&hl=en&num=42
For every item found in the result set, the page contains a block such as:
<h2 class="r"><a href="http://www.pungas.com/?id=1481&prikaz=vijest" class="l" onmousedown="return rwt(this,'','','res','3','AFQjCNHOKd-B2TrSznDjSfLAvl2kbcz_pA','&sig2=AzDRw-xTiIILH42wpajfgA')">Pungas.Com - - AutoMoto vijesti, sport, Formula 1</a></h2>
For this particular example I would like to extract 5 pieces of data:
pungas.com/?id=1481&prikaz=vijest
3
AFQjCNHOKd-B2TrSznDjSfLAvl2kbcz_pA
AzDRw-xTiIILH42wpajfgA
Pungas.Com - - AutoMoto vijesti, sport, Formula 1
I wrote this PHP script:
<?php
$url = "http://www.google.com/search?q=link:www.elitesecurity.org&hl=en&num=42";
$v = file_get_contents($url);
preg_match_all('/\<h2 class="r"\>\<a href="(.*?)" class="l" onmousedown="return rwt(this,\'\',\'\',\'res\',\'(.*?)\',\'(.*?)\',\'(.*?)\')"\>(.*?)\<\/a\>\<\/h2\>/si',$v,$r);
$i = 0;
while ($i < 42) {
$adresa1 = ($r[$i][0]) ? $r[$i][0] : '0';
$adresa2 = ($r[$i][1]) ? $r[$i][1] : '0';
$adresa3 = ($r[$i][2]) ? $r[$i][2] : '0';
$adresa4 = ($r[$i][3]) ? $r[$i][3] : '0';
$adresa5 = ($r[$i][4]) ? $r[$i][4] : '0';
echo $i+1 . ":<br>" . $adresa1 . "<br>" .$adresa2 . "<br>" .$adresa3 . "<br>" .$adresa4 . "<br>" .$adresa5 . "<hr>";
$i++;
};
?>
But it keeps returning 0 for all 42 results.
Can anyone help and explain where the mistake is, so that I finally understand regular expressions and don't have to keep bothering people?
Thanks in advance,
Aleksandar
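A note on two things that can produce this symptom with preg_match_all. The sketch below is not taken from the original script: the short subject strings and the names $subject, $m1 and $m2 are made up for illustration. It shows (1) that unescaped parentheses in a pattern form a group instead of matching literal "(" and ")", and (2) how the result array is laid out in the default PREG_PATTERN_ORDER mode.

```php
<?php
// Pitfall 1: "(" and ")" are grouping metacharacters inside a pattern.
$subject = "return rwt(this,'','','res','3')";

// Unescaped: the pattern looks for the text "rwtthis,..." and finds nothing.
$unescaped = preg_match("/rwt(this,'','','res','(.*?)')/", $subject, $m1);

// Escaped: "\(" and "\)" match the literal parentheses of the JavaScript call.
$escaped = preg_match("/rwt\\(this,'','','res','(.*?)'\\)/", $subject, $m2);

echo "unescaped matches: $unescaped, escaped matches: $escaped\n";

// Pitfall 2: by default (PREG_PATTERN_ORDER) the result is indexed
// [group][match], and group 0 is the whole matched text.
preg_match_all('/(\d)=(\w)/', '1=a 2=b', $r);

// $r[0] holds the full matches, $r[1] the first group, $r[2] the second.
echo $r[1][0], ' ', $r[2][0], "\n"; // groups 1 and 2 of the first match
echo $r[1][1], ' ', $r[2][1], "\n"; // groups 1 and 2 of the second match
```

In the script above, the pattern contains rwt(this, ...) with bare parentheses, and the loop reads $r[$i][0] through $r[$i][4], so both points may apply.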
Addendum: 05 Nov 2007 14:33
So, the link mentioned above, google.com/search?q=link:www.elitesecurity.org&hl=en&num=42,
returns this results page:
some not important code1
<h2 class="r"><a href="http://asp-cyber.law.harvard.edu/filtering/list.html" class="l" onmousedown="return rwt(this,'','','res','1','AFQjCNFVCAwOP070r0f0EZkBm5Yfj9r5yQ','&sig2=LBWIL-caH6ZEWzrkho21VQ')">Sites Inaccessible in China - Documentation of Internet Filtering <b>...</b></a></h2>
some not important code2
<h2 class="r"><a href=" pungas.com/index.php?prikaz=vijest&id=2986" class="l" onmousedown="return rwt(this,'','','res','2','AFQjCNEcA--vhR4SXd7w3L4ApxnJkTXPaA','&sig2=kaiXF-17P-9weejPPVRigA')"> Pungas.Com - U helikopterskoj nesreći poginuo Colin McRae <b>...</b></a></h2>
some not important code3
<h2 class="r"><a href="http://www.pungas.com/?id=1481&prikaz=vijest" class="l" onmousedown="return rwt(this,'','','res','3','AFQjCNHOKd-B2TrSznDjSfLAvl2kbcz_pA','&sig2=gVXoUEcID10ZW954tA6g1A')">Pungas.Com - - AutoMoto vijesti, sport, Formula 1</a></h2>
some not important code4
<h2 class="r"><a href="http://www.pungas.com/forum/potrebne-naocale-t-19770.html" class="l" onmousedown="return rwt(this,'','','res','4','AFQjCNG9e4ZZFF7YNxUfscCxB746gzW8tg','&sig2=fElR-J06DwZ7E9Gcq8KqnA')">Potrebne naocale!!!</a></h2>
some not important code5
etc. ... up to a maximum of 42
some not important codeX
So the repeating group I am targeting is <h2...> </h2>
My script should return:
1:
asp-cyber.law.harvard.edu/filtering/list.html
1
AFQjCNFVCAwOP070r0f0EZkBm5Yfj9r5yQ
&sig2=LBWIL-caH6ZEWzrkho21VQ
Sites Inaccessible in China - Documentation of Internet Filtering
2:
pungas.com/index.php?prikaz=vijest&id=2986
2
AFQjCNEcA--vhR4SXd7w3L4ApxnJkTXPaA
&sig2=kaiXF-17P-9weejPPVRigA
Pungas.Com - U helikopterskoj nesreći poginuo Colin McRae
3:
pungas.com/?id=1481&prikaz=vijest
3
AFQjCNHOKd-B2TrSznDjSfLAvl2kbcz_pA
&sig2=gVXoUEcID10ZW954tA6g1A
Pungas.Com - - AutoMoto vijesti, sport, Formula 1
4:
pungas.com/forum/potrebne-naocale-t-19770.html
4
AFQjCNG9e4ZZFF7YNxUfscCxB746gzW8tg
&sig2=fElR-J06DwZ7E9Gcq8KqnA
Potrebne naocale!!!
But I only get zeros.
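For comparison, a minimal sketch of the extraction applied to two of the blocks quoted above, assuming the markup looks exactly as shown there. The helper plain_title and the variables $pattern, $found and $rows are hypothetical names, not part of the original script. The page text is hard-coded here instead of being fetched, and the link is printed as captured, since the post does not make clear whether the http:// prefix should be removed.

```php
<?php
// Sample input: two result blocks copied from the post. In the real script
// this string would come from file_get_contents($url).
$v = <<<'HTML'
some not important code1
<h2 class="r"><a href="http://asp-cyber.law.harvard.edu/filtering/list.html" class="l" onmousedown="return rwt(this,'','','res','1','AFQjCNFVCAwOP070r0f0EZkBm5Yfj9r5yQ','&sig2=LBWIL-caH6ZEWzrkho21VQ')">Sites Inaccessible in China - Documentation of Internet Filtering <b>...</b></a></h2>
some not important code4
<h2 class="r"><a href="http://www.pungas.com/forum/potrebne-naocale-t-19770.html" class="l" onmousedown="return rwt(this,'','','res','4','AFQjCNG9e4ZZFF7YNxUfscCxB746gzW8tg','&sig2=fElR-J06DwZ7E9Gcq8KqnA')">Potrebne naocale!!!</a></h2>
HTML;

// "~" as delimiter avoids escaping "/"; the parentheses of rwt(...) are escaped.
$pattern = <<<'RE'
~<h2 class="r"><a href="(.*?)" class="l" onmousedown="return rwt\(this,'','','res','(.*?)','(.*?)','(.*?)'\)">(.*?)</a></h2>~si
RE;

// PREG_SET_ORDER makes $r[$i] one complete match:
// [0] is the whole block, [1]..[5] are the five captured groups.
$found = preg_match_all($pattern, $v, $r, PREG_SET_ORDER);

// Hypothetical helper: drop inline tags such as <b>...</b> and a trailing "...".
function plain_title($html) {
    $text = trim(strip_tags($html));
    return trim(preg_replace('~\s*\.\.\.$~', '', $text));
}

// Collect the five values per result; loop over what was found, not a fixed 42.
$rows = array();
foreach ($r as $m) {
    $rows[] = array(trim($m[1]), $m[2], $m[3], $m[4], plain_title($m[5]));
}

foreach ($rows as $i => $row) {
    echo ($i + 1) . ":<br>" . implode("<br>", $row) . "<hr>\n";
}
```

Looping over the matches that were actually found avoids reading undefined entries when the page returns fewer than 42 results. A pattern this strict stops matching as soon as the markup changes, so an empty result is worth checking for before printing.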
|