How can I code the following project in Java to scrape the results of all students from a website and save them to a text file on my computer?
I am planning to do this project out of my own interest. The project automates getting the university results of all students based on their register number (reg. no.) and date of birth (dob) and saving them to a text file on my computer. So I plan to write a Java program that increments the registration number from 10000 to 99999 and the date of birth from 01-01-1980 to 31-12-1990 and feeds these two fields to the webpage so that it returns the result.

As I know only Java, I am stuck on the next steps:
1) How should my Java program feed the data to the webpage text fields (reg. no. and dob)?
2) How should my Java program receive the result from the webpage and save it to a text file?

I can think of the Java logic, but since I have little or no idea about webpages, I am stuck on the above two problems. Please help me.

PS: The website doesn't do a CAPTCHA check, so that is not a problem for me. I know this can be done easily in Python, but since I want to build expertise in Java, I would like to do it in Java.
Answer:
I hate to be a spoilsport, but I need to ask: do you have the right to this information? It sounds like you're planning to run every combination of registration number and date of birth and capture any results that return valid data. If this data is protected by any kind of security, or there is a disclaimer on the site indicating that the data is only for use by the student, you are actually committing a crime by writing such a program (see http://www.law.cornell.edu/uscode/text/18/1030). And since it involves a date of birth, this can be construed as identity theft as well.

By the way, if you hit the site that repetitively, it may be considered a denial-of-service attack, especially if there is a robots.txt file present and you're not honoring it. Sites can and do react badly to such things, like cutting off your IP at their firewall and calling the authorities to figure out who's trying to hack their system.

As to a hypothetical how: you need to figure out the structure of the web form, duplicate its GET/POST request with HttpClient, capture the response and check its MIME type. It will most likely be HTML. Examine the response to find the markup containing the data you're looking for. Find a good HTML parser for Java, load the response, walk the tree and extract the relevant data. Format the data and either hold it in memory or write it to an open PrintWriter or OutputStream linked to the local text file you want to create.

What I have just described is a general web scraper. Pulling data this way is not uncommon, and it is technically straightforward. Just please make sure you have the legal right to do this. I suspect you don't, and you'd be better served doing something similar on a public website that looks up product information or something.
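The request-building step described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses the java.net.http API built into Java 11 and later rather than Apache HttpClient, and the URL https://example.com/lookup and the field names productId and q are hypothetical placeholders for a public product lookup form. The request is built but never sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormPostSketch {

    // Encode form fields the way a browser does for a standard HTML form
    // (application/x-www-form-urlencoded).
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) a POST request that mimics submitting the form.
    static HttpRequest buildRequest(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical field names; the real ones come from the form's HTML.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("productId", "A-100");
        fields.put("q", "usb cable");

        HttpRequest request = buildRequest("https://example.com/lookup", fields);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formEncode(fields));
    }
}
```

To actually send it, the request would be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), whose body() is the returned page as a string. The real field names are the name attributes of the form's input elements, visible in the page source.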
Matt Pickering at Quora
Other answers
I think Selenium WebDriver for Java would be the best match. http://docs.seleniumhq.org/projects/webdriver/
Varun Bilurkar
There is an HttpClient class that is part of Apache HttpComponents (http://hc.apache.org/). With it you can build a request-response model. From what I understood, there is an external educational website that returns the students' results when given request data (student reg. number and student DOB), in a particular format (text/xml, application/atom+xml or similar). You have to parse the response according to the format you receive. For example, XML data can be parsed with XPath expressions, a DOM parser or the like. Similarly, Atom feeds can be parsed with the Apache Abdera parser.

So there are two parts to this:
1. Obtaining data from the webpage using HttpClient, HttpGet, HttpPost and other such APIs.
2. Parsing the response data using a suitable parser.

Thanks for the A2A.
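As a rough illustration of the parsing step for an XML response, here is a small sketch using the XPath support that ships with the JDK (javax.xml.xpath). The XML document and the element names product, name and price are made up for the example; a real response would have its own structure.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XmlResponseSketch {

    // Evaluate an XPath expression against an XML string and return the
    // matched text (an empty string when nothing matches).
    static String extract(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or malformed XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical response body; in practice this is the HTTP response text.
        String xml = "<product><name>USB cable</name>"
                + "<price currency=\"USD\">4.99</price></product>";

        System.out.println(extract(xml, "/product/name"));
        System.out.println(extract(xml, "/product/price/@currency"));
    }
}
```

XPath suits well-formed XML. An ordinary HTML page is often not well-formed, so an HTML parser is usually the safer choice for that case.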
Radhikaa Bhaskaran
I did a similar project in Java to get the results from my university's page. You can use the jsoup library, which parses the webpage easily, and then a FileWriter to write the results to a text file.
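The file-writing half of that approach might look like the sketch below. It uses only the JDK (FileWriter with an explicit charset needs Java 11 or later) and leaves out the jsoup parsing step, since the selectors depend entirely on the page. The file name results.txt, the sample lines and the readBack helper are illustrative assumptions; readBack exists only so the demo can show what was written.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class SaveResultsSketch {

    // Write one extracted result per line, replacing any existing file.
    static void save(Path file, List<String> lines) {
        try (BufferedWriter out = new BufferedWriter(
                new FileWriter(file.toFile(), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the file back as a list of lines (demo helper only).
    static List<String> readBack(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical output file and already-extracted text.
        Path file = Path.of("results.txt");
        save(file, List.of("USB cable, 4.99 USD", "HDMI cable, 7.50 USD"));
        readBack(file).forEach(System.out::println);
    }
}
```

The try-with-resources block closes the writer even if a write fails, so the file is not left open.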
Raghav Guru