Manipal Dot Net Pvt Ltd has openings for self-motivated, team-oriented, hard-working Web Scraping Template Development Engineers to work on an exciting project extracting and ingesting data from websites using web crawling tools.
Responsibilities:
- Extract structured/unstructured data from multiple retail and e-commerce websites.
- Identify a diverse and representative set of product pages from various retailer domains for testing and development.
- Develop and test XPath templates to extract various attributes from product pages.
- Use labelling tools to generate training data for Machine Learning engines.
- Fix bugs and maintain already developed templates.
- Respond to urgent client requirements – bug fixing for priority websites when requested by the client.
- Maintain documentation, spreadsheets, and daily reports with clean, precise information and statistics about the websites and parameters extracted.
- Guide and mentor other engineers – applicable for experienced engineers.
- Perform code reviews and suggest design changes – applicable for experienced engineers.
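To give candidates a feel for the XPath template work described above, here is a minimal sketch of what such a template might look like. It is an illustration only, not the project's actual tooling: the page fragment, the attribute names, and the `extract` helper are all hypothetical, and it uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical product-page fragment. Real retail pages are rarely
# well-formed XML and would normally need an HTML-tolerant parser.
PAGE = """
<html>
  <body>
    <div id="product">
      <h1 class="title">Example Kettle</h1>
      <span class="price">1,299.00</span>
      <span class="sku">SKU-0001</span>
    </div>
  </body>
</html>
"""

# Hypothetical template: attribute name -> XPath expression
# (kept within the XPath subset that ElementTree understands).
TEMPLATE = {
    "title": ".//h1[@class='title']",
    "price": ".//span[@class='price']",
    "sku": ".//span[@class='sku']",
    "brand": ".//span[@class='brand']",  # not present on this page
}


def extract(page, template):
    """Apply each XPath in the template and return the matched text.

    Attributes whose XPath matches nothing are reported as None, so a
    broken or outdated template is easy to spot during testing.
    """
    root = ET.fromstring(page)
    record = {}
    for name, xpath in template.items():
        node = root.find(xpath)
        has_text = node is not None and node.text
        record[name] = node.text.strip() if has_text else None
    return record


record = extract(PAGE, TEMPLATE)
print(record)
```

Reporting missing attributes as `None` instead of raising an error is one possible design choice: it lets a single test run show every attribute that a template fails to extract from a given page.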
Desirable Skill Set:
- Good knowledge of, and experience with, the Linux operating system and Python/Bash scripting.
- Strong foundation in applying XPath to process XML/HTML.
- Solid grasp of web technologies and protocols (HTML, XPath, JSON, HTTP, CSS, etc.).
- Experience with version control and code-sharing repositories, e.g. Git/GitHub.
- Familiarity with Regular Expressions (regex).
- Experience with browser-based debuggers, e.g. Chrome DevTools.
- Keen attention to detail.
- Ability to thoroughly follow written instructions.
- Excellent written and verbal communication skills.
- Eager to learn new technical skills and grow.
- Team-oriented, with good interpersonal skills.
- Must be self-motivated and demonstrate a 'can do' attitude.
Qualifications and Experience:
- Bachelor's Degree (BSc/BE/BCA) in Computer Science/IT/Electronics.
- Master's Degree (MSc/MCA/MTech) preferred.
- 0-3 years of experience in web-based application development.