Amazon Data Extraction Engine
Crawl, Scrape and Parse Amazon Content
Mass Data Extraction
The Data Extraction Engine is designed for high-performance mass data extraction. Parallelized algorithms run multiple extraction processes at the same time through a rotating-proxy platform.
On a dedicated server with a rotating proxy account, it can sync hundreds of thousands of Amazon products every day.
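To illustrate the idea of parallel extraction through rotating proxies, here is a minimal Python sketch. It is not the Engine's actual code: the function name sync_products, the fetch callback, and the round-robin proxy assignment are all assumptions made for the example. The real download step is passed in as a function, so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Callable, Dict, Iterable, List


def sync_products(
    asins: Iterable[str],
    proxies: List[str],
    fetch: Callable[[str, str], dict],
    max_workers: int = 8,
) -> Dict[str, dict]:
    """Fetch every ASIN in parallel, assigning proxies round-robin.

    `fetch(asin, proxy)` is a hypothetical callback that performs the
    actual request and returns the parsed product data.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    asin_list = list(asins)
    # Pair each ASIN with the next proxy in the rotation before dispatching,
    # so the assignment does not depend on thread scheduling.
    jobs = list(zip(asin_list, cycle(proxies)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the submitted jobs.
        results = pool.map(lambda job: fetch(*job), jobs)
        return dict(zip(asin_list, results))


if __name__ == "__main__":
    # Stand-in fetcher: a real one would issue an HTTP request via the proxy.
    def fake_fetch(asin: str, proxy: str) -> dict:
        return {"asin": asin, "proxy": proxy}

    print(sync_products(["A1", "A2", "A3"], ["proxy-1", "proxy-2"], fake_fetch))
```

Threads suit this kind of work because each job mostly waits on the network. The worker count and the rotation strategy would need tuning against the proxy provider's limits.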
Rich Configuration Instructions
It is easy to configure the Engine for specific needs. For example, you can scrape by seller or Prime status and exclude international merchants. You can also exclude products such as Add-ons, which are not sold alone or are sold with additional fees.
Rich configuration instructions let you build a proper workflow and accommodate HTML structure changes quickly.
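The filtering rules described above can be pictured with a small Python sketch. The Engine's real configuration format is not shown on this page, so the Offer and ScrapeFilter classes and all of their field names are hypothetical and only mirror the options mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Offer:
    # Hypothetical, simplified view of one scraped offer.
    asin: str
    merchant_id: str
    price: float
    is_prime: bool = False
    is_addon: bool = False
    is_international: bool = False


@dataclass
class ScrapeFilter:
    # None means "accept any seller"; a set restricts scraping to those sellers.
    allowed_sellers: Optional[Set[str]] = None
    banned_sellers: Set[str] = field(default_factory=set)
    prime_only: bool = False
    exclude_addons: bool = True
    exclude_international: bool = True

    def accepts(self, offer: Offer) -> bool:
        """Return True if the offer passes every configured rule."""
        if self.allowed_sellers is not None and offer.merchant_id not in self.allowed_sellers:
            return False
        if offer.merchant_id in self.banned_sellers:
            return False
        if self.prime_only and not offer.is_prime:
            return False
        if self.exclude_addons and offer.is_addon:
            return False
        if self.exclude_international and offer.is_international:
            return False
        return True


if __name__ == "__main__":
    rules = ScrapeFilter(banned_sellers={"SELLER_X"}, prime_only=True)
    print(rules.accepts(Offer("A1", "SELLER_A", 9.99, is_prime=True)))   # passes all rules
    print(rules.accepts(Offer("A2", "SELLER_X", 5.00, is_prime=True)))   # banned seller
```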
Custom Data Flow
Each package is tailored to your needs.
The advanced dataflow provides tools for status tracking, importing, cleaning and preparing data, so that it can be easily and properly queried and analyzed in analytics tools.
This makes it possible to manipulate your data and perform advanced calculations using data mining and machine learning algorithms.
Our platform deploys quickly and scales easily. Integrate the Data Extraction platform with your enterprise systems while satisfying stringent data security and privacy requirements. We offer flexible Private Deployments that can run in a private cloud or on-premises. Your developers get the utmost flexibility in automating sophisticated data flows end-to-end via the API plus rich XML configuration.
The private deployment and rich toolset help users create, rapidly experiment with, fully automate, and manage data workflows to power intelligent applications.
What Amazon Data You Can Extract
- Quantity, price, description, images, title, reviews
Both the offers page and the product page are scraped.
- Certain sellers
You can set the system to scrape by certain sellers and get most of the data from them. You can define a specific seller for all products or ban some sellers.
- Shipping price
The shipping price is location-specific. It is not available in the product page HTML; it is delivered by a separate additional request.
- Shipping weight
The database has a weight field. By default it holds the shipping weight, or just the item weight. Some products do not have that value.
- Price changes
In the database, there is a DateTime of the last price/qty change.
- Prime offers
It is possible to get Prime offers: the parser will take the Prime offer with the lowest price.
- Add-on items
It is possible to see whether the product is an Add-on item.
- EAN, MPN, UPC
In most cases, EAN, MPN and UPC are not available in the HTML.
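Two of the behaviors listed above, taking the lowest-priced Prime offer and recording when price or quantity last changed, can be sketched in a few lines of Python. This is only an illustration: the dictionary keys (is_prime, price, qty, last_change) and both function names are assumptions, not the Engine's actual schema.

```python
from datetime import datetime
from typing import Iterable, Optional


def pick_prime_offer(offers: Iterable[dict]) -> Optional[dict]:
    """Return the Prime offer with the lowest price, or None if there is none."""
    prime = [o for o in offers if o.get("is_prime")]
    if not prime:
        return None
    return min(prime, key=lambda o: o["price"])


def apply_update(record: dict, price: float, qty: int, now: datetime) -> dict:
    """Store new price/qty and stamp last_change only when a value differs."""
    if record.get("price") != price or record.get("qty") != qty:
        record["price"] = price
        record["qty"] = qty
        record["last_change"] = now  # hypothetical column name
    return record


if __name__ == "__main__":
    offers = [
        {"merchant": "A", "price": 12.50, "is_prime": True},
        {"merchant": "B", "price": 10.00, "is_prime": False},
        {"merchant": "C", "price": 11.75, "is_prime": True},
    ]
    best = pick_prime_offer(offers)
    print(best)
    print(apply_update({"price": 12.50, "qty": 3}, best["price"], 3, datetime.now()))
```

Note that the cheapest offer overall (merchant B) is skipped because it is not a Prime offer.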
The demo is set up to extract the following data:
- offer(url) / offers_data
- merchantId / merchantName
- stock / StockString
- mpn / ean / upc
- brand / made_by / manufacturer
- dimension / dimension_data
- delivery / delivery_data
- sync_speed / curl_code / sync_flag / sync_log
- created / modified / updated_date
- web_hierarchy_location_codes / web_hierarchy_location_name
Update Price and Stock as Fast as You Need
How many Amazon products can the system scrape per hour? The software is designed to update price/stock as fast as you need. The speed depends entirely on server resources. For example, 500k products are kept up to date on a regular $100/mo dedicated server.
How often does the system sync products with Amazon? The data is synced by a cron script, so the product data is updated automatically. Only the ASIN is needed at first; the system populates the other data after the first synchronization.
You may add more cron commands to the crontab to speed up the synchronization process. This way more processes start every minute.
Can it schedule automatic CSV downloads? We can create a controller that is triggered by cron and sends data to a server. Alternatively, you can call the link that generates CSV files.
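As a rough way to estimate how many cron commands are needed, the Python sketch below works through the arithmetic. The number of products a single sync process handles per run is not stated on this page, so the value of 50 used here is purely an assumption for illustration, as is the function name.

```python
import math


def processes_per_minute(products: int, sync_window_hours: float,
                         products_per_process: int) -> int:
    """Estimate how many sync processes must start each minute.

    products_per_process is an assumed figure: how many products one
    cron-started process manages to update in a single run.
    """
    minutes = sync_window_hours * 60
    needed_per_minute = products / minutes
    return math.ceil(needed_per_minute / products_per_process)


if __name__ == "__main__":
    # 500k products refreshed once per day, assuming 50 products per run.
    print(processes_per_minute(500_000, 24, 50))
```

With these assumed numbers, 500,000 products over 1,440 minutes means about 347 products per minute, which rounds up to 7 processes per minute at 50 products each.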
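For illustration, a CSV export step might look like the Python sketch below. It is not the Engine's controller: the function name and the chosen columns are assumptions, with a few column names borrowed from the demo field list above.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence

# Illustrative column selection; a real export would list the fields you need.
FIELDS = ["asin", "price", "stock", "merchantName"]


def products_to_csv(rows: Iterable[Mapping[str, object]],
                    fields: Sequence[str] = FIELDS) -> str:
    """Render product rows as CSV text with a header line."""
    buffer = io.StringIO(newline="")
    # extrasaction="ignore" drops keys that are not in the selected columns;
    # missing keys are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"asin": "B000TEST01", "price": 19.99, "stock": 5,
               "merchantName": "Acme", "sync_flag": 1}]
    print(products_to_csv(sample))
```

A scheduled job could write this text to a file or send it to another server.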
Deploying and Running The Data Extraction Engine
Dedicated server + rotating proxies
Since the system is deployed on your own servers, it provides privacy and compliance. You control what data is parsed and saved on your server.
A $40/mo proxy account may be enough for 500k ASINs, and a $100-200/mo proxy account for 1 million products. Please check Stormproxies.
What server size do you need? It depends on your requirements. For example, if 100k products have to be synced every day, you may need a machine with 4 GB of RAM and 2 CPUs.
From an overall IT management standpoint, you would have a parsing tool, built-in API, and all you need on the same platform. This is a complete, integrated solution designed for small businesses.
What is included in the $590 price?
The price of $590 is without installation and without setup. This means you will have to install and set up the system on your own.
Installation involves copying files, importing the database, setting up proxies, and making sure everything works.
Since the software is not simple, we strongly recommend purchasing the installation service.