Sunday, January 21, 2018

Kumar

• Around 9 years of experience in the IT industry covering the complete software development life cycle (SDLC), including business requirements gathering, system analysis and design, data modeling, development, testing, and implementation of projects.
• Around 5 years of experience in the development, implementation, and configuration of Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Oozie, Sqoop, NiFi, Kafka, ZooKeeper, Elasticsearch, Knox, Ranger, Cassandra, HBase, MongoDB, Spark Core, Spark Streaming, Spark DataFrames, and Spark MLlib.
• Experienced in configuring, deploying, and managing different Hadoop distributions such as Cloudera (CDH4 & CDH5) and Hortonworks (HDP).
• Experience importing/exporting data with Sqoop between the Hadoop Distributed File System and relational database systems.
• Experience handling various file formats such as Avro, SequenceFile, text, XML, JSON, and Parquet with different compression codecs such as gzip, LZO, and Snappy.
• Experienced with the Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, and Spark MLlib.
• Imported data from HDFS into Spark DataFrames for in-memory computation, generating optimized output and better visualizations.
• Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data; performed data transformations using Spark Core and converted RDDs to DataFrames.
• Experienced in collecting real-time streaming data and building pipelines for raw data from different sources using Kafka, storing the data into HDFS and NoSQL stores using Spark.
• Extended Hive core functionality with custom User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs).
• Implemented a POC using Impala for data processing on top of Hive to better utilize its C++ execution engine.
• Experience with the NoSQL databases HBase and Cassandra and their integration with the Hadoop cluster.
• Implemented an HBase cluster as part of a POC to address HBase limitations.
• Explored the Spark beta-version API to improve performance and optimize existing algorithms, testing different deployment modes such as YARN, Mesos, and standalone for a POC.
• Expertise in using the ETL tool Informatica PowerCenter (Designer, Workflow Manager, Repository Manager), Informatica Data Quality, and ETL concepts.
• Experienced with NiFi for automating data movement between different Hadoop systems.
• Worked with Hadoop security tools such as Knox and Ranger, integrating an LDAP store with a Kerberos KDC.
• Good understanding of Hadoop security requirements and integration with Kerberos authentication and authorization infrastructure.
• Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2, and Redshift, as well as Microsoft Azure.
• Experienced with relational database management systems such as Teradata, PostgreSQL, DB2, Oracle, and SQL Server.
• Experienced in scheduling and monitoring production jobs using Oozie and Azkaban.
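The Sqoop import/export workflow mentioned above can be sketched as a command fragment. This is illustrative only — the JDBC URL, table names, and directories are placeholders, and a real invocation depends on the cluster's configuration:

```
# Import an RDBMS table into HDFS as Snappy-compressed Parquet
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl -P \
  --table orders \
  --target-dir /data/raw/orders \
  --as-parquetfile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  -m 4

# Export processed results from HDFS back to the RDBMS
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl -P \
  --table orders_summary \
  --export-dir /data/out/orders_summary
```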
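As a minimal illustration of the MapReduce model listed above, here is a word-count sketch in plain Python — no cluster involved; the map, shuffle, and reduce phases are simulated locally, and all function names are illustrative, not from any Hadoop API:

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework would between
    # the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle_phase(pairs))

counts = word_count(["hello world", "hello hadoop"])
```

On a real cluster these phases run distributed across nodes, but the data flow — map to key/value pairs, group by key, reduce per key — is the same.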
