How can I generate a random yet valid website link, regardless of language? In fact, the more diverse the languages of the generated websites, the better.
I’ve been using other people’s random-site forwarding scripts on their webpages. How can I stop relying on those scripts and make my own? Here is what I do now:
```python
import webbrowser
from random import choice

random_page_generator = ['http://www.randomwebsite.com/cgi-bin/random.pl',
                         'http://www.uroulette.com/visit']
webbrowser.open(choice(random_page_generator), new=2)
```
There are two ways to do this:
- Create your own spider that amasses a huge collection of websites, and pick from that collection.
- Access some pre-existing collection of websites, and pick from that collection. For example, DMOZ/ODP lets you download their entire database;* Google used to have a customized random site URL;** etc.
There is no other way around it (short of randomly generating strings of arbitrary characters and testing whether they are valid, which would be a ridiculously bad idea).
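Whichever way you build the collection, the selection step is the same: pick one entry uniformly at random. A minimal sketch, assuming the collection is a plain text file with one URL per line (the file name `urls.txt` and the function name `pick_random_line` are hypothetical), uses reservoir sampling so that even a very large dump never has to fit in memory:

```python
import random
from typing import Iterable, Optional


def pick_random_line(lines: Iterable[str], rng=random) -> Optional[str]:
    """Reservoir sampling with k=1: choose one non-blank line uniformly in a single pass."""
    chosen = None
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        count += 1
        # Replace the current choice with probability 1/count.
        if rng.randrange(count) == 0:
            chosen = line
    return chosen  # None if there were no non-blank lines
```

Call it with an open file, e.g. `with open("urls.txt", encoding="utf-8") as f: url = pick_random_line(f)`, then pass `url` to `webbrowser.open(url, new=2)` as in the question. For a small list, `random.choice(lines)` is simpler; reservoir sampling only matters when you don't want to load the whole file.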
Building a web spider for yourself can be a fun project. Link-driven scraping libraries like Scrapy can do a lot of the grunt work for you, leaving you to write the part you care about.
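To show the idea without Scrapy, here is a minimal spider sketched with only the standard library (`urllib.request` and `html.parser`): start from a few seed pages you choose, follow links breadth-first, and record each distinct site root. The names `crawl_sites`, `fetch_page`, `LinkParser`, and `pick_random_site` are hypothetical, and a real crawler should also respect robots.txt and rate-limit itself, which this sketch does not do.

```python
import random
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url, timeout=5.0):
    """Download a page and decode it using its declared charset, else UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl_sites(seeds, max_pages=50, fetcher=fetch_page):
    """Breadth-first crawl from `seeds`; return the set of site roots linked to."""
    queue = deque(seeds)
    visited = set()
    sites = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetcher(url)
        except Exception:  # network errors, HTTP errors, bad URLs: skip the page
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(absolute)
            if parts.scheme in ("http", "https") and parts.netloc:
                sites.add(f"{parts.scheme}://{parts.netloc}/")
                queue.append(absolute)
    return sites


def pick_random_site(sites, rng=random):
    """Choose one collected site root uniformly at random."""
    return rng.choice(sorted(sites))
```

Seeding with pages written in several languages should give a more linguistically diverse pool, which is what the question asks for. The `fetcher` parameter also makes it easy to test the crawler without touching the network.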
* Note that ODP is a pretty small database compared to something like Google’s or Yahoo’s, because it’s primarily a human-edited collection of significant websites rather than an auto-generated collection of everything anyone has put on the web.
** Google’s random site feature was driven by both popularity and your own search history. However, by feeding it an empty search history, you could remove that part of the equation. Anyway, I don’t think it exists anymore.
A conceptual explanation, not a code one.
Their scripts are likely very large and comprehensive. If it’s a random website selector, they have a huge list of websites, one per line, and the script just picks one. If it’s a random URL generator, it probably generates a string of letters (e.g. “asljasldjkns”), turns it into a domain name ending in .com, checks whether that is a valid, reachable URL, and if it is, sends you there.
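A sketch of that generate-and-test approach, assuming that “valid” simply means the name resolves in DNS (a stricter check would also try an HTTP request). The function names `random_domain`, `resolves`, and `random_valid_url` are hypothetical. As noted above, this is inefficient: most long random strings are unregistered, so short labels give much better hit rates.

```python
import random
import socket
import string


def random_domain(length=6, tld="com", rng=random):
    """Build a candidate hostname such as 'qkzbva.com' from random lowercase letters."""
    label = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{label}.{tld}"


def resolves(hostname):
    """Return True if DNS resolves the hostname to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def random_valid_url(max_tries=500, length=4, tld="com"):
    """Keep generating candidates until one resolves; give up after max_tries."""
    for _ in range(max_tries):
        host = random_domain(length, tld)
        if resolves(host):
            return f"http://{host}/"
    return None
```

Be aware that some ISP resolvers answer even for nonexistent names (to show a search page), which would make every candidate look valid; using a resolver that returns proper errors avoids that.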
The easiest way to design your own might be to ask the site owners to let you look at theirs, though I’m not certain how much success you’d have there.
As a programmer, the best approach is to understand how URLs are structured. Practice building candidate strings and testing them, or compile a large database of URLs yourself.
As a hybrid, you might try building two things: one script that, while you’re away, searches for and tests URLs and adds the valid ones to a database, and another script that randomly selects an entry from this database to send you on your way. The longer you run the first, the better the second becomes.
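The two scripts can share a small SQLite database. A minimal sketch, assuming a table named `urls` in a file `urls.db` (both names, like the functions below, are hypothetical), with one function for the collector script and one for the picker:

```python
import sqlite3


def open_db(path="urls.db"):
    """Open (or create) the shared database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    return conn


def add_url(conn, url):
    """Collector side: store a URL that passed your validity check."""
    with conn:  # commits the transaction on success
        # The PRIMARY KEY plus OR IGNORE silently skips duplicates.
        conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))


def pick_random_url(conn):
    """Picker side: return one stored URL at random, or None if the table is empty."""
    row = conn.execute("SELECT url FROM urls ORDER BY RANDOM() LIMIT 1").fetchone()
    return row[0] if row else None
```

The collector calls `add_url` for each URL that passes its check; the picker opens the result with `webbrowser.open(url, new=2)` after checking it isn't `None`. `ORDER BY RANDOM() LIMIT 1` is fine for modest tables, though it scans every row on each call.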
EDIT: Use Abarnert’s spider approach instead; it’s much better than my answer.
The other answers suggest building large databases of URLs. There is another method, which I’ve used in the past and documented here:
The idea is to generate a random IP address and then try to fetch a page from port 80 at that address. This method is not perfect with modern virtual-hosted sites (many domains share one IP, so you may get a default page rather than a particular site), and of course it only fetches the top-level page, but it can be an easy and effective way of getting random sites. The code linked above is C, but it should be easy to call from Python, or the method could be adapted to Python.
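A Python adaptation of the random-IP idea, sketched with the standard `ipaddress` and `socket` modules. This is not the linked C code, and the function names are hypothetical. It skips private, loopback, multicast, and reserved ranges, then checks whether anything accepts a TCP connection on port 80:

```python
import ipaddress
import random
import socket


def random_public_ipv4(rng=random):
    """Draw random IPv4 addresses until one is globally routable."""
    while True:
        ip = ipaddress.IPv4Address(rng.getrandbits(32))
        # is_global excludes private/loopback/link-local; multicast and
        # reserved ranges are excluded explicitly as well.
        if ip.is_global and not ip.is_multicast and not ip.is_reserved:
            return ip


def port_80_open(ip, timeout=2.0):
    """Return True if a TCP connection to port 80 on `ip` succeeds."""
    try:
        with socket.create_connection((str(ip), 80), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def find_random_site(max_tries=1000, timeout=2.0):
    """Probe random public addresses until one answers on port 80."""
    for _ in range(max_tries):
        ip = random_public_ipv4()
        if port_80_open(ip, timeout):
            return f"http://{ip}/"
    return None
```

Expect many misses: most addresses have nothing listening on port 80, so each attempt may wait up to `timeout` seconds. Once `find_random_site()` returns a URL, you can open it with `webbrowser.open(url, new=2)` as in the question.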