follow up,
shiva nagaraj, January 21, 2017 - 6:59 pm UTC
Thanks for sharing the articles. The first option looks good, and the Sqoop example you provided is probably what we are already trying.
Our biggest challenge is bringing the data from hundreds of Oracle databases into HDFS. Our goal is to load 25 databases per week, and they range in size from a few hundred gigabytes to 4-5 TB. We prefer an automated approach, and Sqoop fits that so far, but our main problem is the JDBC "connection reset" error and having to rerun every table that hit it.
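To illustrate the rerun part, here is a rough sketch of the kind of wrapper that could drive this: it runs one Sqoop import per table and retries only the tables whose import failed. This is an assumption about how the automation might look, not what we run today. The function names, retry counts, wait times, and target directory layout are all hypothetical. The Sqoop options used (--connect, --username, --password-file, --table, --target-dir, --delete-target-dir) are standard import arguments, but the values passed to them are placeholders.

```python
import subprocess
import time


def build_sqoop_command(jdbc_url, username, password_file, table, target_root):
    """Build a 'sqoop import' command line for one table (all values are placeholders)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", f"{target_root}/{table}",
        # Remove partial output from an earlier failed attempt so the rerun starts clean.
        "--delete-target-dir",
    ]


def run_command(command):
    """Run the command and return True if it exits with status 0."""
    return subprocess.run(command).returncode == 0


def import_with_retries(tables, make_command, run=run_command,
                        max_attempts=3, wait_seconds=60, sleep=time.sleep):
    """Import every table, retrying only the failures.

    Returns the list of tables that still failed after max_attempts passes.
    """
    pending = list(tables)
    for attempt in range(1, max_attempts + 1):
        # Keep only the tables whose import did not succeed on this pass.
        failed = [table for table in pending if not run(make_command(table))]
        if not failed:
            return []
        pending = failed
        if attempt < max_attempts:
            # Wait longer after each pass to give the connection time to recover.
            sleep(wait_seconds * attempt)
    return pending
```

A driver would pass the table list plus a small function that calls build_sqoop_command with the connection details for that database, then log whatever import_with_retries returns for manual follow-up. Note that this treats any non-zero exit as retryable; it does not distinguish a connection reset from other failures.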
Thanks,