You have heard the buzzword. Everyone keeps saying, "Use our SaaS and stay competitive!" You've weighed your business goals against a scraping SaaS and the reduced operational control that comes with it, but you've also learned about its ease of use, minimal up-front cost, and on-demand scalability. You've possibly considered Service Level Agreements and the reputations of the SaaS providers. What's left when all of the dust settles?
Here, we're going to look at the pros and cons of SaaS and stand-alone web scraping solutions in more detail, to help you make an informed decision. Read on for a clearer view of the realities of software as a service.
In both variants, designing and building the extraction process is one of the most time-consuming tasks in data extraction.
Software-as-a-Service (SaaS)
There is a proliferation of techniques and technologies for automated data capture, each suited to a particular type or source of data. Let's be clear here: we're talking about an application that the business subscribes to and accesses over the Internet.
Pros |
Cons |
Infrastructure independence: Theoretically, a “cloud-based” solution can run on any OS and in any browser. In practice, though, today's cloud-based solutions require you to download a file to your personal computer (or local storage) or install a browser extension. In short, they are not platform or machine independent.
You don’t have to host anything yourself: Accessing your data is fast and easy from your computer or from others on your network. All the page scraping, data extraction, and parsing is handled on someone else’s server. You don't manage web proxies or captcha solvers. If you don't need to extract more than 2k pages at a time, the browser extension version is good enough. |
Restrictions on available data: There may be restrictions on which websites you can download data from. Sites that use AJAX or complex JavaScript scenarios cannot be handled by most cloud solutions. A forum with multiple pages is one example.
Performance limitations: "Data extraction takes over 10 hours without ending." Performance problems are among the questions SaaS customers ask most often, and they can be a serious limitation.
No options to fine-tune results: Most cloud solutions offer no way to sort or fine-tune your search results. Usually there is no easy way to feed your data directly into a database. Finally, you can have difficulty opening or manipulating the delivered files (2M+ rows) in Excel.
UI annoyances: The robot editor usually has various UI annoyances. If you need a "business volume" of data extraction, setting up your accounts will not be easy: it is done manually by the SaaS support team and can take a couple of weeks. If the SaaS team constantly pushes out updates and UI redesigns, the platform becomes confusing to use, because it is always changing and you constantly have to relearn it. You also have to keep deleting and reinstalling their plugins after updates, as the old ones no longer work.
SaaS may intrude on your data privacy: You might remember the privacy scandal in which a data-mining company exposed voter information. The data was leaked accidentally by the company responsible for compiling it. Moreover, who would have thought that a misconfigured spambot could leak over 700 million email addresses and phone numbers in one of the biggest data breaches ever? Overall, those scandals brought to light how exposed data held in third-party services can be. If the SaaS provider shuts down, your business operations can break down too, and your data can be lost. For instance, the Kimono web service was shut down in 2016 and its cloud service was discontinued. |
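The database and Excel problems mentioned above can often be worked around on your own side once you have the exported file. An Excel worksheet holds at most 1,048,576 rows, so a 2M-row export will not open in full. The following is a minimal sketch, not a definitive implementation, that streams a CSV export into a local SQLite database using only the Python standard library. The function name `load_csv_into_sqlite`, the table name `scraped_rows`, and the file names are hypothetical, and the sketch assumes the first row of the export holds the column names and every row has the same number of fields.

```python
import csv
import sqlite3
from itertools import islice


def _quote(name):
    # Quote an identifier for SQLite, escaping embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def load_csv_into_sqlite(csv_path, db_path, table="scraped_rows", batch_size=50_000):
    """Stream a large CSV export into a SQLite table in batches.

    Assumes the first CSV row contains the column names (hypothetical layout).
    Returns the number of data rows inserted.
    """
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(_quote(name) for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            conn.execute(f"CREATE TABLE IF NOT EXISTS {_quote(table)} ({columns})")
            while True:
                # Read a bounded batch so memory use stays flat for huge files.
                batch = list(islice(reader, batch_size))
                if not batch:
                    break
                conn.executemany(
                    f"INSERT INTO {_quote(table)} VALUES ({placeholders})", batch
                )
                conn.commit()
                total += len(batch)
        finally:
            conn.close()
    return total
```

Calling `load_csv_into_sqlite("export.csv", "scrape.db")` on a hypothetical export would then let you query the rows with ordinary SQL instead of opening the file in a spreadsheet.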
Security concerns for SaaS adopters
People assume that SaaS application providers have all their data protection covered, including backup and restore. In the US, 79% of SaaS users, and in the UK 65%, were confident that their SaaS providers could easily restore their cloud-based data. In reality, SaaS vendors cannot protect their customers from data loss due to hacking, human error, or other causes. Security, data loss, and compliance remain top concerns for SaaS adopters.
Stand-alone Web Data Capture Solution
Pros |
Cons |
Control over data extraction software: If you build your own infrastructure, you get security and confidentiality of your data, high availability, and complete transparency. Dedicated servers offer ultimate flexibility. It’s your machine: you can install whatever you like whenever you need it.
High performance: Custom-designed scraper software offers high performance for mass data extraction.
Proper workflow: Developing your own software lets you build a proper workflow, so you can manage the entire data retrieval process. You get your data in the format most useful to you, and the tool is completely customizable.
No limits on available data: A custom scraper can intelligently extract structured data from any interactive website and update it in a timely manner, including sites that use anti-scraping technology. JavaScript, AJAX, complex JavaScript scenarios, or any dynamic website: your custom software can cover them all. |
Infrastructure management: You take full responsibility for managing and monitoring your own infrastructure, including servers, proxies, and captcha solvers.
Technical expertise: It is harder for a beginner to set up alone. Technical expertise is necessary to connect, install, and configure the software your system requires. Hosts often provide some level of assistance, but you’re largely on your own. You might incur additional staff costs to install, monitor, and maintain your server. |
Choosing the Right Solution
Everyone has their own needs, and you'll need to take stock of what each option can offer you before you decide. The more you know about your data extraction requirements and what you need to run your system efficiently, the easier it will be to make an informed choice.
Cost of SaaS Scraping Solution |
Cost of Custom Web Data Extraction Tool |
First, you will need to pay the total amount up front before getting your data.
SaaS fees depend on the amount of data: Ongoing monthly SaaS fees depend on the amount of data you extract. SaaS providers often publish pricing for websites without any anti-scraping mechanisms and fewer than 25K pages. Keep in mind that each sub-page costs a credit, so extracting data from many sub-pages can quickly get expensive. As the volume of data you scrape increases, subscription payments can become very high and can exceed the cost of developing and setting up your own software. For example, scraping around 150k Amazon products daily will cost $4k per month. Check the terms very carefully: is there an account query limit that will prevent you from exceeding your plan?
Practice of charging an extra month: There is a shady practice of charging the customer for an extra month. SaaS providers usually state in their terms and conditions that cancellation takes 31 days, meaning that in reality you have to pay for the next 2 months. Often the terms and conditions are filled with hidden tricks: if you don't cancel 7 days in advance, they charge you again, with no possibility of requesting a cancellation.
You cannot put your account on pause: If you do not need your data for a specific month, you cannot put your account on pause. As soon as you stop paying for your SaaS account, all of your structured queries permanently disappear. |
One-time job: If you need a one-time scraping job, there are many freelancers at upwork.com. Prices start at $100.
Regularly scraping a huge number of products: If you need to scrape a huge number of products regularly, a stand-alone automated web data capture solution has a high initial price because of the technology purchase. But as the project proceeds, it significantly lowers operating costs because it needs little manpower. It requires a rotating proxy account ($50/mo) and a dedicated server ($100/mo), and it can sync millions of links every day. Further, with the majority of web data today existing in electronic form, the cost of using such automated technology has also fallen. |
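The cost comparison above can be turned into a rough break-even estimate. The sketch below uses the monthly figures quoted in the tables ($4k per month for the SaaS example, and $50 plus $100 per month for a proxy account and dedicated server). The up-front development cost is not given in this article, so the $20,000 used here is purely a hypothetical placeholder, as is the function name `break_even_months`. Staff costs are ignored for simplicity.

```python
import math


def break_even_months(saas_monthly, custom_upfront, custom_monthly):
    """Return the first month in which the cumulative cost of a custom tool
    is no higher than the cumulative SaaS cost, or None if it never is."""
    monthly_saving = saas_monthly - custom_monthly
    if monthly_saving <= 0:
        # The custom tool costs at least as much to run, so it never catches up.
        return None
    return max(1, math.ceil(custom_upfront / monthly_saving))


saas_monthly = 4_000        # SaaS example from the table above
custom_monthly = 50 + 100   # rotating proxy account + dedicated server
custom_upfront = 20_000     # hypothetical development cost, NOT from the article

print(break_even_months(saas_monthly, custom_upfront, custom_monthly))  # prints 6
```

Under these assumed numbers the custom tool pays for itself in the sixth month. With a lower SaaS bill or a higher development quote the answer changes quickly, so it is worth rerunning the estimate with your own figures.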