
Amar Prakash Pandey - ᕦ(ò_óˇ)ᕤ
https://amarpandey.me/
Recent content on Amar Prakash Pandey - ᕦ(ò_óˇ)ᕤ
Hugo en-us
Tue, 15 Oct 2024 00:00:00 +0000

The CAP Theorem Explained: Balancing the Big Three in Distributed Databases
https://amarpandey.me/blog/the-cap-theorem-explained-balancing-the-big-three-in-distributed-databases/
Tue, 15 Oct 2024 00:00:00 +0000
The CAP theorem, also known as Brewer’s theorem (named after computer scientist Eric Brewer), defines a fundamental trade-off in distributed systems: any distributed data store can provide only two out of three guarantees at any time:
C: Consistency
A: Availability
P: Partition Tolerance
What Do These Terms Mean?
Consistency (C): Every read receives the most recent write or an error. This means that the data you access is guaranteed to be the latest version, or the system will notify you that something went wrong.

Fine-Tuning Shuffle Partitions in Apache Spark for Maximum Efficiency
https://amarpandey.me/blog/fine-tuning-shuffle-partitions-in-apache-spark-for-maximum-efficiency/
Fri, 24 May 2024 00:00:00 +0000
Apache Spark’s shuffle partitions play a critical role in data processing, especially during operations like joins and aggregations. Properly configuring these partitions is essential for optimizing performance.
Default Shuffle Partition Count
By default, Spark sets the shuffle partition count to 200. While this may work for small datasets (less than 20 GB), it is usually inadequate for larger data sizes. Besides, who would work with just 20 GB of data on Spark?
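To see why the default of 200 shuffle partitions stops fitting as data grows, here is a minimal sizing sketch in plain Python. It assumes a target of roughly 128 MiB per shuffle partition, which is a common rule of thumb and not a figure taken from the post. The helper name `suggest_shuffle_partitions` is hypothetical and is not part of Spark.

```python
import math

# Spark's default value for spark.sql.shuffle.partitions.
DEFAULT_SHUFFLE_PARTITIONS = 200

# Assumption: aim for about 128 MiB of shuffle data per partition.
TARGET_PARTITION_BYTES = 128 * 1024**2


def suggest_shuffle_partitions(shuffle_bytes: int,
                               target_partition_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Return a partition count that keeps each partition near the target size."""
    if shuffle_bytes <= 0:
        raise ValueError("shuffle_bytes must be positive")
    # Round up so that no partition exceeds the target size on average.
    return max(1, math.ceil(shuffle_bytes / target_partition_bytes))


if __name__ == "__main__":
    gib = 1024**3
    for size_gib in (20, 200):
        suggested = suggest_shuffle_partitions(size_gib * gib)
        verdict = "fits within" if suggested <= DEFAULT_SHUFFLE_PARTITIONS else "exceeds"
        print(f"{size_gib} GiB -> {suggested} partitions ({verdict} the default of 200)")
```

Under this assumption, about 20 GiB of shuffle data still fits within the default, while 200 GiB calls for several times more partitions. In a Spark session the resulting value would be applied with `spark.conf.set("spark.sql.shuffle.partitions", n)`.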
Handling Large Broadcast Joins in Apache Spark
https://amarpandey.me/blog/handling-large-broadcast-joins-in-apache-spark/
Wed, 22 May 2024 00:00:00 +0000
In Apache Spark, efficient data processing often relies on the use of broadcast joins. However, when the broadcast dataset exceeds 8GB, you may encounter the following error:
Caused by: org.apache.spark.SparkException: Cannot broadcast the table that is larger than 8GB: 13 GB
This error arises because Spark is attempting to broadcast a dataset that is larger than the maximum size allowed for broadcast joins. Spark caps the size of a broadcast table at 8GB; this limit is separate from the automatic broadcast threshold (spark.sql.autoBroadcastJoinThreshold), which defaults to 10 MB.

Symptoms of Bad Code
https://amarpandey.me/blog/symptoms-of-bad-code/
Tue, 19 Jul 2022 00:00:00 +0000
1. Rigidity
Rigidity is the tendency of the system to be hard to change. The code has dependencies that snake out in so many directions that you cannot make an isolated change without changing everything around it. Rigidity causes compile-time errors.
2. Fragility
A system is fragile when a small change in one module causes other unrelated modules to misbehave. It is the tendency of the code to break in many places even when you make a change in only one place.

Docker - the right way
https://amarpandey.me/blog/docker---the-right-way/
Sun, 23 Jan 2022 00:00:00 +0000
Docker is a software framework for building, running, and managing containers on servers and the cloud. Here are several best practices for using Docker in production to improve security, optimize image size, and write cleaner and more maintainable Dockerfiles.
1. Use Official Docker Image as Base Image
Always use the official or verified base image when writing the Dockerfile. Let’s say you are developing a Java application and want to build it and run it as a Docker image.
GitOps - the easy way
https://amarpandey.me/blog/gitops---the-easy-way/
Tue, 04 Jan 2022 00:00:00 +0000
What is GitOps?
Treat the infrastructure as code the same way as application code.
Separate repository for Infrastructure as Code.
DevOps pipeline.
How does GitOps work?
Infrastructure as Code hosted in a Git repository.
Version controlled.
Team collaboration.
Use a branching strategy to merge code in the Git repository.
With a CI pipeline to test the code.
With a CD pipeline to apply the changes to the infrastructure.
With the above steps we achieve:
Automated Process.

Finger Detection and Tracking using OpenCV and Python
https://amarpandey.me/blog/finger-detection-and-tracking-using-opencv-and-python/
Sat, 28 Jul 2018 00:00:00 +0000
TL;DR. Code is here. Finger detection is an important feature of many computer vision applications. In this application, a histogram-based approach is used to separate the hand from the background frame. Thresholding and filtering techniques are used for background cancellation to obtain optimum results. One of the challenges that I faced in detecting fingers is differentiating a hand from the background and identifying the tip of a finger.

What is Google Summer of Code? How to prepare for it?
https://amarpandey.me/blog/what-is-google-summer-of-code---how-to-prepare-for-it/
Sun, 02 Jul 2017 00:00:00 +0000
We will talk about Google Summer of Code, but before that let’s talk about what open source development is. Yes, it’s very important.
What is open source development?
Open-source software development is the process by which open-source software, or similar software whose source code is publicly available, is developed.
These are software products made available with their source code under an open-source license, so that anyone can study, change, and improve their design.

About
https://amarpandey.me/about/
Hello, I’m Amar Prakash Pandey :) I’m a developer and hacker, driven by genuine curiosity and an optimistic attitude. I’m passionate about software development and writing simple, readable code. I love to learn new things and grow. I’ve spent the last 6 years growing as a developer through experience and education. I prefer to think of myself as an open-minded, down-to-earth, and outgoing person. I care very much for the people I come across in my life and try to make their lives easier.

Projects
https://amarpandey.me/projects/
Here are some of the projects that I have worked on. You can find the source code and demo link for each project below.
geohashviz.com - A web application that lets you visualize geohashes in bulk. (code, demo)
Chat Rooms - Real-time public/private chat application using Spring Boot WebSockets. (code)