ahimsavaadhi1 Posted September 17, 2013 [quote name='Search_engine' timestamp='1379435236' post='1304260591'] Hi guys, I'm planning to learn Hadoop. Presently I'm working on a reporting tool. How are the opportunities and billing rates in this technology? Is there anyone here working on this technology? Please PM me..... [/quote] Hadoop is mainly hosted on Unix/Linux platforms... from a career perspective, it can be broadly classified into 4 categories:
1) Hadoop Administrator: requires Unix/Linux admin + Hadoop admin skills + a little Java ($70 starting)
2) Hadoop Developer: Java experience + Hadoop understanding ($60 starting)
3) Hadoop DBA: HBase + Cassandra + Hive + Pig, etc. ($70 starting)
4) Hadoop Architect: hands-on experience in the above three skill sets ($100+ starting)
[b]What Is Apache Hadoop?[/b] The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. The project includes these modules:[list]
[*][b]Hadoop Common[/b]: The common utilities that support the other Hadoop modules.
[*][b]Hadoop Distributed File System (HDFS™)[/b]: A distributed file system that provides high-throughput access to application data.
[*][b]Hadoop YARN[/b]: A framework for job scheduling and cluster resource management.
[*][b]Hadoop MapReduce[/b]: A YARN-based system for parallel processing of large data sets.
[/list] Other Hadoop-related projects at Apache include:[list]
[*][url="http://incubator.apache.org/ambari/"][b]Ambari™[/b][/url]: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, with support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a dashboard for viewing cluster health (such as heatmaps) and the ability to view MapReduce, Pig, and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
[*][url="http://avro.apache.org/"][b]Avro™[/b][/url]: A data serialization system.
[*][url="http://cassandra.apache.org/"][b]Cassandra™[/b][/url]: A scalable multi-master database with no single points of failure.
[*][url="http://incubator.apache.org/chukwa/"][b]Chukwa™[/b][/url]: A data collection system for managing large distributed systems.
[*][url="http://hbase.apache.org/"][b]HBase™[/b][/url]: A scalable, distributed database that supports structured data storage for large tables.
[*][url="http://hive.apache.org/"][b]Hive™[/b][/url]: A data warehouse infrastructure that provides data summarization and ad hoc querying.
[*][url="http://mahout.apache.org/"][b]Mahout™[/b][/url]: A scalable machine learning and data mining library.
[*][url="http://pig.apache.org/"][b]Pig™[/b][/url]: A high-level data-flow language and execution framework for parallel computation.
[*][url="http://zookeeper.apache.org/"][b]ZooKeeper™[/b][/url]: A high-performance coordination service for distributed applications.
[/list]
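To make the MapReduce module above concrete, here is a minimal word-count sketch of the map -> shuffle -> reduce model, simulated locally in plain Python rather than run on an actual cluster. (On a real cluster, Hadoop Streaming would run a mapper and reducer like these as separate processes over HDFS blocks; the local simulation below is just an illustration of the programming model.)

```python
# Word-count sketch of the MapReduce model (mapper -> shuffle/sort -> reducer),
# simulated locally in plain Python.
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Sum all the counts emitted for a single word."""
    return (word, sum(counts))

def run_job(lines):
    """Simulate map -> shuffle (sort by key) -> reduce over a list of lines."""
    mapped = [kv for line in lines for kv in mapper(line)]
    mapped.sort(key=itemgetter(0))  # the "shuffle" phase groups equal keys together
    return dict(reducer(word, [c for _, c in group])
                for word, group in groupby(mapped, key=itemgetter(0)))

print(run_job(["hadoop runs jobs", "hadoop scales jobs"]))
# -> {'hadoop': 2, 'jobs': 2, 'runs': 1, 'scales': 1}
```

The key point of the model is that `mapper` and `reducer` never see the whole data set, so Hadoop can run many copies of each in parallel across the cluster.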
siritptpras Posted September 17, 2013 [quote name='ahimsavaadhi1' timestamp='1379446214' post='1304261860'] Hadoop is mainly hosted on Unix/Linux platforms... from a career perspective, it can be broadly classified into 4 categories... [snipped for brevity] [/quote] 1) Hadoop Administrator: Unix/Linux admin + Hadoop admin skills + a little Java ($70 starting) 2) Hadoop Developer: Java experience + Hadoop understanding ($60 starting) 3) Hadoop DBA: HBase + Cassandra + Hive + Pig ($70 starting) 4) Hadoop Architect: hands-on experience in the above three skill sets ($100+ starting). Where did you get these billing numbers?
People are saying 100+ for Hadoop.. the rates above are quite normal, like any other technology..
ahimsavaadhi1 Posted September 17, 2013 [quote name='CharlieSheen' timestamp='1379448389' post='1304262064'] Where did you get these billing numbers? People are saying 100+ for Hadoop.. the rates above are quite normal, like any other technology.. [/quote] These are all career-starting numbers... I know a colleague of mine who gets 300+ per hour, but he has about 6 years in Hadoop and 15 years of overall experience.... When you are starting your career, I think these are good rates. Once you do one or two projects, the figures will go north of 100...
siritptpras Posted September 17, 2013 I mean, our starting experience would be 7 years on paper, right? So we could expect 150 per hour.. the rates you quoted are for entry level.. we start any tech with 7 yrs, right? OK, as long as it crosses 100+, that's best.. just asking for clarification.. nobody in DB is doing Hadoop consulting, so this is mostly friends' info.. I'm looking for first-hand info.. your response is close to accurate, though 300+ sounds like too much.. seems we can invest time in it if we get to 100+ in 1-2 yrs..
ahimsavaadhi1 Posted September 17, 2013 [quote name='CharlieSheen' timestamp='1379451087' post='1304262201'] I mean, our starting experience would be 7 years on paper, right? So we could expect 150 per hour... [snipped for brevity] [/quote] You guys are missing some points... If you start your career at 100+, imagine what the clients will expect.... they will squeeze you dry...... At least the first 2-3 projects will be short stints. Why? Because unlike Java or .Net, where rates are 50-60 and they will at least give you time to settle down and deliver, in Hadoop at those billing rates you are on the front line, under fire... My suggestion for Hadoop starters: don't load your resume with 7+ years... just keep it a simple 5+, and within that claim 3.5 to 4 years of Hadoop... Also, don't quote 100+.... around 70 is appropriate, so that once you get into a project you will have time to settle down and learn things.. otherwise, within a week of joining you will have to go looking for another project.... Hadoop as a technology matured and came into the mainstream only in the last 3-4 years...... There is no outsized billing for Hadoop... mostly around 120-150 max for architects, and 110 max for admins and developers... here and there, some exceptional people are north of 200....
panthulu Posted September 18, 2013 [quote name='ahimsavaadhi1' timestamp='1379462051' post='1304262663'] You guys are missing some points... [snipped for brevity] [/quote] Bro, are there any consultancies you can recommend for in-class training? And how long would it take to learn?
ahimsavaadhi1 Posted September 18, 2013 [quote name='panthulu' timestamp='1379480615' post='1304263893'] Bro, are there any consultancies you can recommend for in-class training? And how long would it take to learn? [/quote] As I said, Hadoop requires 3 prerequisites, depending on the role: 1) Java + SQL skills for Hadoop Developer 2) Unix/Linux administration for Hadoop Administrator 3) DBA + SQL skills for Hadoop DB technologies... I'm not sure I know of any consultancies for Hadoop training. Some people post ads here... watch a few demos and attend about 15 days of classes; you'll quickly figure out where you stand. Once the Hadoop training is done, if you are comfortable diving into Hadoop, try for the Hadoop certifications from Cloudera. They will maximize your career prospects....... Apart from that, there are lots of tech materials and self-learning guides on the Internet.. don't read 10-15 books; read only one authentic textbook and stick to it for some time..... For your information, I'm not a Hadoop person, but I closely follow tech trends and am sharing my info and understanding of the space......
BUGZ Posted September 18, 2013 Nice [quote name='ahimsavaadhi1' timestamp='1379509853' post='1304264690'] As I said, Hadoop requires 3 prerequisites, depending on the role... [snipped for brevity] [/quote]
siritptpras Posted September 18, 2013 [quote name='ahimsavaadhi1' timestamp='1379509853' post='1304264690'] As I said, Hadoop requires 3 prerequisites, depending on the role... [snipped for brevity] [/quote] Nice info..
panthulu Posted September 19, 2013 [quote name='ahimsavaadhi1' timestamp='1379509853' post='1304264690'] As I said, Hadoop requires 3 prerequisites, depending on the role... [snipped for brevity] [/quote] Thanks bro... very useful info
ahimsavaadhi1 Posted September 19, 2013 I came across this article on a tech blog: [url="http://www.pythian.com/blog/tag/big-data/"]http://www.pythian.com/blog/tag/big-data/[/url]
[b]Hadoop FAQ – But What About the DBAs?[/b] Jan 24, 2013 / By [url="http://www.pythian.com/blog/author/shapira"]Gwen Shapira[/url] Tags: [url="http://www.pythian.com/blog/tag/big-data/"]Big Data[/url], [url="http://www.pythian.com/blog/tag/dba-lounge/"]DBA Lounge[/url], [url="http://www.pythian.com/blog/tag/hadoop/"]Hadoop[/url]
There is one question I hear every time I make a presentation about [url="http://www.pythian.com/services/big-data/hadoop/"]Hadoop[/url] to an audience of DBAs. This question was also recently asked in LinkedIn's DBA Manager forum, so I finally decided to answer it in writing, once and for all. [i]"As we all see, there is a lot happening in [url="http://www.pythian.com/services/big-data/"]Big Data[/url] using Hadoop etc.... Can you let me know where normal DBAs fit into this: DBAs supporting normal OLTP databases using Oracle or SQL Server; DBAs who support day-to-day issues in data warehouse environments. Do DBAs need to learn Java or storage administration (like SAN technology) to get into Big Data?"[/i] I hear a few questions here: [list]
[*]Do DBAs have a place at all in the Big Data and Hadoop world? If so, what is that place?
[*]Do they need new skills? Which ones?
[/list] Let me start by introducing everyone to a new role that now exists in many organizations: [b]Hadoop Cluster Administrator[/b]. Organizations that have not yet adopted Hadoop sometimes imagine it as a developer-only system. I think this is why I get so many questions about whether or not we need to learn Java every time I mention Hadoop.
Even within Pythian, when I first introduced the idea of Hadoop services, my managers asked whether we would need to learn Java or hire developers. Organizations that did adopt Hadoop found out that any production cluster larger than 20-30 nodes requires a full-time admin. This admin's job is surprisingly similar to a DBA's job: he is responsible for the performance and availability of the cluster, the data it contains, and the jobs that run there. The list of tasks is almost endless and also strangely familiar: deployment, upgrades, troubleshooting, configuration, tuning, job management, installing tools, architecting processes, monitoring, backups, recovery, etc. I have not seen a single organization with a production Hadoop cluster that didn't have a full-time admin, but if you don't believe me, note that Cloudera offers a Hadoop Administrator certification and that O'Reilly sells a book called "Hadoop Operations". [b]So you are going to need a Hadoop admin.[/b] Who are the candidates for the position? The best option is to hire an experienced Hadoop admin. In 2-3 years, no one will even consider doing anything else. But right now there is an extreme shortage of Hadoop admins, so we need to consider less perfect candidates. The usual suspects tend to be junior Java developers, sysadmins, storage admins, and DBAs. Junior Java developers tend not to do well in a cluster admin role, just like PL/SQL developers rarely make good DBAs. Operations and dev are two different career paths that tend to attract different types of personalities. Among the operations personnel, storage admins are usually out of consideration because their skillset is too unique and valuable to other parts of the organization. I've never seen a storage admin who became a Hadoop admin, or any place where it was even seriously considered. I've seen both DBAs and sysadmins become excellent Hadoop admins.
In my highly biased opinion, DBAs have some advantages: [list]
[*]Everyone knows DBA stands for "Default Blame Acceptor". Since the database is always blamed, DBAs typically have great troubleshooting skills, processes, and instincts. All of these are critical for good cluster admins.
[*]DBAs are used to managing systems with millions of knobs to turn, all of which have a critical impact on the performance and availability of the system. Hadoop is similar to databases in this sense: tons of configurations to fine-tune.
[*]DBAs, much more than sysadmins, are highly skilled at keeping developers in check and making sure no one accidentally causes critical performance issues on an entire system. This skill is critical when managing Hadoop clusters.
[*]DBA experience with DWH (especially Exadata) is very valuable. There are many similarities between DWH workloads and Hadoop workloads, and similar principles guide the management of the system.
[*]DBAs tend to be really good at writing their own monitoring jobs when needed. Every production database system I've seen has a crontab full of customized monitors and maintenance jobs. This skill continues to be critical for Hadoop systems.
[/list] To be fair, sysadmins also have important advantages: [list]
[*]They typically have more experience managing huge numbers of machines (much more so than DBAs).
[*]They have experience working with configuration management and deployment tools (Puppet, Chef), which is absolutely critical when managing large clusters.
[*]They are more comfortable digging into the OS and network when configuring and troubleshooting systems, which is an important part of Hadoop administration.
[/list] Note that in both cases I'm talking about good, experienced admins, not those who can just click their way through the UI: those who really understand their systems and much of what is going on outside the specific system they are responsible for.
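The "write your own monitoring jobs" point above can be sketched in a few lines. This is a toy cron-style check, not a real Hadoop tool: it parses the kind of capacity summary that `hdfs dfsadmin -report` prints (the exact report format varies between Hadoop versions, so the sample text and field names here are assumptions) and flags clusters above a usage threshold.

```python
# Toy monitoring check in the spirit of a DBA's customized cron jobs:
# parse a capacity summary (format modeled loosely on `hdfs dfsadmin -report`;
# exact field names vary by Hadoop version) and flag high disk usage.
import re

def parse_capacity(report_text):
    """Extract 'Configured Capacity' and 'DFS Used' byte counts from report text."""
    fields = {}
    for line in report_text.splitlines():
        m = re.match(r"\s*(Configured Capacity|DFS Used):\s*(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def usage_alert(report_text, threshold=0.80):
    """Return (used_fraction, should_alert) for the cluster in report_text."""
    f = parse_capacity(report_text)
    used = f["DFS Used"] / f["Configured Capacity"]
    return used, used >= threshold

sample = """\
Configured Capacity: 1000000000 (1 GB)
DFS Used: 850000000 (850 MB)
"""
# In a real cron job, report_text would come from running the actual command,
# e.g. subprocess.run(["hdfs", "dfsadmin", "-report"], capture_output=True).
print(usage_alert(sample))
```

With the sample text above this reports 85% usage and trips the default 80% threshold; wiring it into cron and email/paging is left out of the sketch.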
You need DBAs who care about the OS, who understand how hardware choices impact performance, and who understand workload characteristics and how to tune for them. There is another important role for DBAs in the Hadoop world: Hadoop jobs often get data from databases or output data to databases. Good DBAs are very useful in making sure this doesn't cause issues. (Even small Hadoop clusters can easily bring down an Oracle database by starting too many full-table scans at once.) In this role, the DBA doesn't need to be part of the Hadoop team as long as there is good communication between the DBA and the Hadoop developers and admins. [b]What about Java?[/b] Hadoop is written in Java, and a fairly large share of Hadoop jobs will be written in Java too. Hadoop admins need to be able to read Java error messages (because that is typically what you get from Hadoop), understand the concepts of Java virtual machines and a bit about tuning them, and write small Java programs that can help in troubleshooting. On the other hand, most admins don't need to write huge amounts of Hadoop code (you have developers for that), and for what they do write, non-Java solutions such as Streaming, Hive, and Pig (and Impala!) can be enough. My experience has taught me that good admins learn enough Java to work on a Hadoop cluster within a few days. There's really not that much to know. [b]What about SAN technology?[/b] Hadoop's storage system is very different from SAN and generally uses local disks (JBOD), not storage arrays and not even RAID. Hadoop admins will need to learn about HDFS, Hadoop's file system, but not about traditional SAN systems. However, if they are DBAs or sysadmins, I suspect they already know far too much about SAN storage. [b]So what skills do Hadoop administrators need?[/b] First and foremost, Hadoop admins need general operational expertise such as good troubleshooting skills and an understanding of the system's capacity and bottlenecks, plus the basics of memory, CPU, OS, storage, and networks.
I will assume that any good DBA has these covered. Second, good knowledge of Linux is required, especially for DBAs who have spent their lives working with Solaris, AIX, and HP-UX. Hadoop runs on Linux. They need to learn Linux security, configuration, tuning, troubleshooting, and monitoring. Familiarity with open-source configuration management and deployment tools such as Puppet or Chef can help. Linux scripting (Perl/bash) is also important; they will need to build a lot of their own tools here. Third, they need Hadoop skills. There's no way to avoid this. They need to be able to deploy a Hadoop cluster, add and remove nodes, figure out why a job is stuck or failing, configure and tune the cluster, find the bottlenecks, monitor critical parts of the cluster, configure NameNode high availability, pick a scheduler and configure it to meet SLAs, and sometimes even take backups. So yes, there's a lot to learn. But very little of it is Java, and there is no reason DBAs can't do it. However, with Hadoop administrator being one of the hottest jobs on the market (judging by my LinkedIn inbox), they may not stay DBAs for long after they become Hadoop admins...
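One concrete piece of the capacity-planning side of the admin job deserves a quick worked example. Because HDFS replicates every block (the default replication factor is 3, and JBOD means no RAID doing that job for you), usable space is roughly raw disk space divided by the replication factor. The 25% headroom for temporary and intermediate job data below is a rule-of-thumb assumption, not a Hadoop constant:

```python
# Back-of-the-envelope HDFS capacity planning. With replication factor 3
# (the HDFS default), every block lives on three DataNodes, so usable space
# is roughly raw space / 3. The headroom fraction reserved for temporary and
# intermediate data is a rule-of-thumb assumption, not a Hadoop constant.
def usable_capacity_tb(nodes, disks_per_node, tb_per_disk,
                       replication=3, headroom=0.25):
    raw = nodes * disks_per_node * tb_per_disk
    return raw * (1 - headroom) / replication

# 20 nodes x 12 disks x 4 TB = 960 TB raw; minus 25% headroom = 720 TB;
# divided by 3-way replication = 240 TB of usable HDFS space.
print(usable_capacity_tb(20, 12, 4))  # -> 240.0
```

The point of the arithmetic: a cluster that sounds huge on paper holds only about a quarter of its raw capacity as user data, which is exactly the kind of estimate a DBA-turned-Hadoop-admin is asked for.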
ahimsavaadhi1 Posted September 19, 2013 [media=]http://www.youtube.com/watch?feature=player_embedded&v=pnmH_ktl5EI[/media] [media=]http://www.youtube.com/watch?feature=player_embedded&v=Wt7w0AEvIZo[/media]