
Everything posted by Blogger
-
SmartStudi Sidebar
by: aiparabellum.com Tue, 24 Dec 2024 02:33:06 +0000 https://chromewebstore.google.com/detail/smartstudi-sidebar-ai-det/hcbkeogkclchohipphaajhjhdcpnejko?pli=1 SmartStudi Sidebar is a versatile Chrome extension designed for content creators, researchers, and writers who require advanced AI tools. This extension integrates seamlessly into your workflow, offering features like AI detection, paraphrasing, grammar checking, and more. With its compact sidebar design, SmartStudi enhances productivity and ensures the creation of high-quality, undetectable AI-generated content. Whether you’re a student, professional, or creative writer, this tool is tailored to meet diverse content-related needs. Features SmartStudi Sidebar comes packed with powerful features to streamline your content creation and editing process: AI and Plagiarism Detection: Check your content for AI-generated text and plagiarism to maintain originality. Paraphrasing Tool: Rephrase your content to bypass AI detectors while preserving the original meaning. AI Essay Generation: Effortlessly generate undetectable AI-written essays. Citation Generator: Create accurate citations in various formats, including APA, MLA, and Chicago. Text Summarization: Summarize lengthy texts into concise versions for better understanding. Grammar Checker: Identify and correct grammatical errors to polish your writing. How It Works Using SmartStudi Sidebar is straightforward and efficient. Here’s how it works: Install the Extension: Add the SmartStudi Sidebar extension to your Chrome browser. Sign Up or Log In: Create an account or log in to your existing account on the SmartStudi platform. Access Features: Open the sidebar to access tools like AI detection, paraphrasing, and more. Input Content: Paste your text or upload files to utilize the chosen feature. Generate Results: View results instantly, be it a paraphrased version, a summary, or AI detection insights. Benefits SmartStudi Sidebar offers numerous advantages, making it an essential tool for content creators: Enhanced Productivity: Perform multiple tasks within a single tool, saving time and effort. Improved Content Quality: Detect and refine AI-written or plagiarized content with ease. User-Friendly Interface: The sidebar design ensures quick access to all features without disrupting your workflow. Versatile Applications: Suitable for academic, professional, and creative writing needs. Accurate Citations: Generate error-free citations to support your research and writing. Pricing The SmartStudi Sidebar extension requires users to create an account on the SmartStudi website to access its features. Specific pricing details for premium or advanced functionalities are available through the SmartStudi platform. Users can explore free basic features or opt for paid plans for a comprehensive experience. Review Although the SmartStudi Sidebar is a relatively new tool, it boasts a robust set of features that cater to diverse writing and content creation needs. With no current user reviews yet on the Chrome Web Store, it remains an untested gem among other AI-driven tools. Its focus on undetectable AI content and user-friendly design positions it as a promising choice for professionals and students alike. Conclusion SmartStudi Sidebar is a valuable Chrome extension offering advanced AI tools in a compact, accessible format. From detecting AI-generated content to creating polished, undetectable essays, it simplifies complex tasks for writers and researchers. 
Whether you’re looking to refine your writing, generate citations, or ensure originality, this tool is a reliable companion in your content creation journey. Sign up today to explore its full potential and elevate your productivity. Visit Website The post SmartStudi Sidebar appeared first on AI Parabellum.
-
A CSS Wishlist for 2025
by: Juan Diego Rodríguez Mon, 23 Dec 2024 15:07:41 +0000 2024 has been one of the greatest years for CSS: cross-document view transitions, scroll-driven animations, anchor positioning, animate to height: auto, and many others. It seems out of touch to ask, but what else do we want from CSS? Well, many things! We put our heads together and came up with a few ideas… including several of yours. Geoff’s wishlist I’m of the mind that we already have a BUNCH of wonderful CSS goodies these days. We have so many wonderful — and new! — things that I’m still wrapping my head around many of them. But! There’s always room for one more good thing, right? Or maybe room for four new things. If I could ask for any new CSS features, these are the ones I’d go for. 1. A conditional if() statement It’s coming! Or it’s already here if you consider that the CSS Working Group (CSSWG) resolved to add an if() conditional to the CSS Values Module Level 5 specification. That’s a big step forward, even if it takes a year or two (or more?!) to get a formal definition and make its way into browsers. My understanding about if() is that it’s a key component for achieving Container Style Queries, which is what I ultimately want from this. Being able to apply styles conditionally based on the styles of another element is the white whale of CSS, so to speak. We can already style an element based on what other elements it :has(), so this would expand that magic to include conditional styles as well. 2. CSS mixins This is more of a “nice-to-have” feature because I feel it’s squarely in CSS Preprocessor Territory and believe it’s nice to have some tooling for light abstractions, such as writing functions or mixins in CSS. But I certainly wouldn’t say “no” to having mixins baked right into CSS if someone was offering it to me. That might be the straw that breaks the CSS preprocessor’s back and allows me to write plain CSS 100% of the time because right now I tend to reach for Sass when I need a mixin or function. I wrote up a bunch of notes about the mixins proposal and its initial draft in the specifications to give you an idea of why I’d want this feature. 3. // inline comments Yes, please! It’s a minor developer convenience that brings CSS up to par with writing comments in other languages. I’m pretty sure that writing JavaScript comments in my CSS should be in my list of dumbest CSS mistakes (even if I didn’t put it in there). 4. font-size: fit I just hate doing math, alright?! Sometimes I just want a word or short heading sized to the container it’s in. We can use things like clamp() for fluid typesetting, but again, that’s math I can’t be bothered with. You might think there’s a possible solution with Container Queries and using container query units for the font-size, but that doesn’t work any better than viewport units. Ryan’s wishlist I’m just a simple, small-town CSS developer, and I’m quite satisfied with all the new features coming to browsers over the past few years; what more could I ask for? 5. Anchor positioning in more browsers! I don’t need any more convincing on CSS anchor positioning, I’m sold! After spending much of the month of November learning how it works, I went into December knowing I won’t really get to use it for a while. As we close out 2024, only Chromium-based browsers have support, and fallbacks and progressive enhancements are not easy, unfortunately.
There is a polyfill available (which is awesome); however, that does mean adding another chunk of JavaScript, which runs counter to what anchor positioning solves. I’m patient though; I waited a long time for :has() to come to browsers, which has been “newly available” in Baseline for a year now (can you believe it?). 6. Promoting elements to the #top-layer without popover? I like anchor positioning, I like popovers, and they go really well together! The neat thing with popovers is how they appear in the #top-layer, so you get to avoid stacking issues related to z-index. This is probably all most would need with it, but having some other way to move an element there would be interesting. Also, now that I know that the #top-layer exists, I want to do more with it — I want to know what’s up there. What’s really going on? Well, I probably should have started at the spec. As it turns out, the CSS Position Layout Module Level 4 draft talks about the #top-layer, what it’s useful for, and ways to approach styling elements contained within it. Interestingly, the #top-layer is controlled by the user agent and seems to be a byproduct of the Fullscreen API. Dialogs and popovers are the way to go for now but, optimistically speaking, these features existing might mean it’s possible to promote elements to the #top-layer in future ways. This very well may be a coyote/roadrunner-type situation, as I’m not quite sure what I’d do with it once I get it. 7. Adding a layer attribute to <link> tags Personally speaking, Cascade Layers have changed how I write CSS. One thing I think would be ace is if we could include a layer attribute on a <link> tag. Imagine being able to include a CSS reset in your project like: <link rel="stylesheet" href="https://cdn.com/some/reset.css" layer="reset"> Or, depending on the page visited, dynamically add parts of CSS, blended into your cascade layers: <!-- Global styles with layers defined, such as: @layer reset, typography, components, utilities; --> <link rel="stylesheet" href="/styles/main.css"> <!-- Add only to pages using card components --> <link rel="stylesheet" href="/components/card.css" layer="components"> This feature was proposed over on the CSSWG’s repo, and like most things in life: it’s complicated. Browsers are especially finicky with attributes they don’t know, plus definite concerns around handling fallbacks. The topic was also brought over to the W3C Technical Architecture Group (TAG) for discussion as well, so there’s still hope! Juandi’s Wishlist I must admit this: I wasn’t around when the web was wild and people had hit counters. In fact, I think I am pretty young compared to your average web connoisseur. While I do know how to make a layout using float (the first web course I picked up was pretty outdated), I didn’t have to suffer long before using things like Flexbox or CSS Grid and never ground my teeth against IE and browser support. So, the following wishes may seem like petty requests compared to the really necessary features the web needed in the past — or even some in the present. Regardless, here are my three petty requests I would wish to see in 2025: 8. Get the children count and index as an integer This is one of those things that you swear should already be possible with just CSS. The situation is the following: I find myself wanting to know the index of an element between its siblings or the total number of children. I can’t use the counter() function since sometimes I need an integer instead of a string.
The current approach is either hardcoding an index on the HTML: <ul> <li style="--index: 0">Milk</li> <li style="--index: 1">Eggs</li> <li style="--index: 2">Cheese</li> </ul> Or alternatively, write each index in CSS: li:nth-child(1) { --index: 0; } li:nth-child(2) { --index: 1; } li:nth-child(3) { --index: 2; } Either way, I always leave with the feeling that it should be easier to reference this number; the browser already has this info, it’s just a matter of exposing it to authors. It would make prettier and cleaner code for staggering animations, or simply changing the styles based on the total count. Luckily, there is already a proposal in the Working Draft for sibling-count() and sibling-index() functions. While the syntax may change, I do hope to hear more about them in 2025. ul > li { background-color: hsl(sibling-count() 50% 50%); } ul > li { transition-delay: calc(sibling-index() * 500ms); } 9. A way to balance flex-wrap I’m stealing this one from Adam Argyle, but I do wish for a better way to balance flex-wrap layouts. When elements wrap one by one as their container shrinks, they are either left alone with empty space (which I don’t dislike) or grow to fill it (which hurts my soul). I wish for a more native way of balancing wrapping elements; it’s definitely annoying. 10. An easier way to read/research CSSWG discussions I am a big fan of the CSSWG and everything they do, so I spent a lot of time reading their working drafts, GitHub issues, or notes about their meetings. However, as much as I love jumping from link to link in their GitHub, it can be hard to find all the issues related to a specific discussion. I think this raises the barrier of entry to giving your opinion on some topics. If you want to participate in an issue, you should have the big picture of all the discussion (what has been said, why some things don’t work, others to consider, etc.), but it’s usually scattered across several issues or meetings. While issues can be lengthy, that isn’t the problem (I love reading them), but rather not knowing part of a discussion existed somewhere in the first place. So, while it isn’t directly a CSS wish, I wish there was an easier way to get the full picture of the discussion before jumping in. What’s on your wishlist? We asked! You answered! Here are a few choice selections from the crowd: Rotate direct background-images, like background-rotate: 180deg CSS random(), with params for range, spread, and type A CSS anchor position mode that allows targeting the mouse cursor, pointer, or touch point positions A string selector to query a certain word in a block of text and apply styling every time that word occurs A native .visually-hidden class. position: sticky with a :stuck pseudo Wishing you a great 2025… CSS-Tricks’ trajectory hasn’t been the smoothest these last few years, so our biggest wish for 2025 is to keep writing and sparking discussions about the web. Happy 2025! A CSS Wishlist for 2025 originally published on CSS-Tricks, which is part of the DigitalOcean family. You should get the newsletter.
-
Performance Optimization for Django-Powered Websites on Shared Hosting
by: Musfiqur Rahman Sat, 21 Dec 2024 10:54:44 GMT Running a Django site on shared hosting can be really agonizing. It's budget-friendly, sure, but it comes with strings attached: sluggish response times and unexpected server hiccups. It kind of makes you want to give up. Luckily, with a few fixes here and there, you can get your site running way smoother. It may not be perfect, but it gets the job done. Ready to level up your site? Let’s dive into these simple tricks that’ll make a huge difference. Know Your Limits, Play Your Strengths But before we dive deeper, let's do a quick intro to Django. A website that is built on the Django web framework is called a Django-powered website. Django is an open-source framework written in Python. It can easily handle spikes in traffic and large volumes of data. Platforms like Netflix, Spotify, and Instagram have a massive user base, and they have Django at their core. Shared hosting is a popular choice among users when it comes to Django websites, mostly because it's affordable and easy to set up. But since you're sharing resources with other websites, you are likely to struggle with: Limited resources (CPU, storage, etc.) Noisy neighbor effect However, that's not the end of the world. You can achieve a smoother run by: Reducing server load Regular monitoring Contacting your hosting provider These tricks help a lot, but shared hosting can only handle so much. If your site is still slow, it might be time to think about cheap dedicated hosting plans. But before you start looking for a new hosting plan, let's make sure your current setup doesn't have any loose ends. Flip the Debug Switch (Off!) Once your Django site goes live, the first thing you should do is turn DEBUG off. This setting shows detailed error pages and makes troubleshooting a lot easier. That's helpful during development, but it backfires in production because it can reveal sensitive information to anyone who notices an error. To turn DEBUG off, simply set it to False in your settings.py file. DEBUG = False Next, don’t forget to configure ALLOWED_HOSTS. This setting controls which domains can access your Django site. Without it, your site might be vulnerable to unwanted traffic. Add your domain name to the list like this: ALLOWED_HOSTS = ['yourdomain.com', 'www.yourdomain.com'] With DEBUG off and ALLOWED_HOSTS locked down, your Django site is already more secure and efficient. But there’s one more trick that can take your performance to the next level. Cache! Cache! Cache! Imagine every time someone visits your site, Django processes the request and renders a response. What if you could save those results and serve them instantly instead? That’s where caching comes in. Caching is like putting your site’s most frequently used data in the fast lane. You can use tools like Redis to keep your data in RAM. If it's just about API responses or database query results, in-memory caching can prove to be a game changer for you. To be more specific, there's also Django's built-in caching: Queryset caching: If your system repeatedly runs the same database queries, cache the query results. Template fragment caching: This feature caches the parts of your page that almost always remain the same (headers, sidebars, etc.) to avoid unnecessary rendering. Optimize Your Queries Your database is the backbone of your Django site. Django makes database interactions easy with its ORM (Object-Relational Mapping). But if you’re not careful, those queries can become a bone in your kebab.
Use .select_related() and .prefetch_related() When querying related objects, Django can make multiple database calls without you even realizing it. These can pile up and slow your site. Instead of this: posts = Post.objects.all() for post in posts: print(post.author.name) # Multiple queries for each post's author Use this: posts = Post.objects.select_related('author') for post in posts: print(post.author.name) # One query for all authors Avoid the N+1 Query Problem: The N+1 query problem happens when you unknowingly run one query for the initial data and an additional query for each related object. Always check your queries using tools like Django Debug Toolbar to spot and fix these inefficiencies. Index Your Database: Indexes help your database find data faster. Identify frequently searched fields and ensure they’re indexed. In Django, you can add indexes like this: class Post(models.Model): title = models.CharField(max_length=200, db_index=True) Query Only What You Need: Fetching unnecessary data wastes time and memory. Use .only() or .values() to retrieve only the fields you actually need. Static Files? Offload and Relax Static files (images, CSS, and JavaScript) can put a heavy load on your server. But have you ever thought of offloading them to a Content Delivery Network (CDN)? A CDN is a network of servers that delivers your files from locations close to your users. The steps are as follows: Set Up a CDN (e.g., Cloudflare, AWS CloudFront): A CDN will cache your static files and serve them from locations closest to your clients. Use Dedicated Storage (e.g., AWS S3, Google Cloud Storage): Store your files in a service designed for static content. Use the django-storages library. Compress and Optimize Files: Minify your CSS and JavaScript files and compress images to reduce file sizes. Use tools like django-compressor to automate this process. By offloading static files, you’ll free up server storage and improve your site’s speed. It’s one more thing off your plate! Lightweight Middleware, Heavyweight Impact Middleware sits between your server and your application. It processes every request and response. Check your MIDDLEWARE setting and remove anything you don’t need. Use Django’s built-in middleware whenever you can because it’s faster and more reliable. If you create custom middleware, make sure it’s simple and only does what’s really necessary. Keeping middleware lightweight reduces server strain and uses fewer resources. Frontend First Aid Your frontend is the first thing users see, so a slow, clunky interface can leave a bad impression. Optimizing your frontend the right way can dramatically improve the user experience. Minimize HTTP Requests: Combine CSS and JavaScript files to reduce the number of requests. Optimize Images: Use tools like TinyPNG or ImageOptim to compress images without losing quality. Lazy Load Content: Delay loading images or videos until they’re needed on the screen. Enable Gzip Compression: Compress files sent to the browser to reduce load times. Monitor, Measure, Master In the end, the key to maintaining a Django site is constant monitoring. By using tools like Django Debug Toolbar or Sentry, you can quickly identify performance issues. Once you have a clear picture of what’s happening under the hood, measure your site’s performance. Use tools like New Relic or Google Lighthouse. These tools will help you prioritize where to make improvements. With this knowledge, you can optimize your code, tweak settings, and ensure your site runs smoothly.
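To tie the caching and query advice together, here is a minimal sketch of what that might look like in practice. It assumes Django 4.0+ with the built-in Redis cache backend, a local Redis instance, and a hypothetical Post model with an author foreign key; adjust the names and timeouts to your own project.

# settings.py -- minimal cache configuration (assumes a Redis server on localhost)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}

# views.py -- per-view caching combined with an optimized queryset
from django.core.cache import cache
from django.shortcuts import render
from django.views.decorators.cache import cache_page

from .models import Post  # hypothetical model with an 'author' foreign key

@cache_page(60 * 15)  # cache the rendered response for 15 minutes
def post_list(request):
    posts = cache.get("post_list")
    if posts is None:
        # select_related() pulls each post's author in the same query,
        # and only() limits the columns fetched from the database.
        posts = list(Post.objects.select_related("author").only("title", "author__name"))
        cache.set("post_list", posts, timeout=300)  # low-level cache for 5 minutes
    return render(request, "posts/list.html", {"posts": posts})

Caching both the rendered response and the queryset is belt-and-suspenders; in a real project you would usually pick whichever level matches how often the underlying data changes.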
-
Top 7 Retailers Offering the Best Temp Work Opportunities This Christmas Season
Blogger posted a post in a topic in Women in Enterprise, Professional, and Business Careers' Job Referrals: Looking for flexible work this festive season? Temporary jobs peak during Christmas, offering great opportunities for job seekers to earn competitive wages, gain valuable skills, and explore new career paths. Discover the top 7 retailers for temp work this year, based on research from Oriel Partners, and see why seasonal roles are more rewarding than ever. View the full list of employers and perks to make the most of this year’s hiring boom! Career Attraction Team
-
Chris’ Corner: Element-ary, My Dear Developer
by: Chris Coyier Mon, 16 Dec 2024 18:00:56 +0000 I coded a thingy the other day and I made it a web component because it occurred to me that was probably the correct approach. Not to mention they are on the mind a bit with the news of React 19 dropping with full support. My component is content-heavy HTML with a smidge of dynamic data and interactivity. So: I left the semantic, accessible, content-focused HTML inside the custom element. Server-side rendered, if you will. If the JavaScript executes, the dynamic/interactive stuff boots up. That’s a fine approach if you ask me, but I found a couple of other things kind of pleasant about the approach. One is that the JavaScript structure of the web component is confined to a class. I used LitElement for a few little niceties, but even it fairly closely mimics the native structure of a web component class. I like being nudged into how to structure code. Another is that, even though the component is “Light DOM” (e.g. style-able from the regular ol’ page), it’s still nice to have the name of the component to style under (with native CSS nesting), which acted as CSS scoping and some implied structure. The web component approach is nice for little bits, as it were. I mentioned I used LitElement. Should I have? On one hand, I’ve mentioned that going vanilla is what will really make a component last over time. On the other hand, there is an awful lot of boilerplate that way. A “7 KB landing pad” can deliver an awful lot of DX, and you might never need to “rip it out” when you change other technologies, like we felt like we had to with jQuery and even more so with React. Or you could bring your own base class which could drop that size even lower and perhaps keep you a bit closer to that vanilla hometown. I’m curious if there is a good public list of base class examples for web components. The big ones are Lit and Fast, but I’ve just seen a new one, Reactive Mastro, which has a focus on using signals for dynamic state and re-rendering. That’s an interesting focus, and it makes me wonder what other base class approaches focus on. Other features? Size? Special syntaxes? This one is only one KB. You could even write your own reactivity system if you wanted a fresh crack at that. I’m generally a fan of going Light DOM with web components and skipping all the drama of the Shadow DOM. But one of the things you give up is <slot /> which is a pretty nice feature for composing the final HTML of an element. Stencil, which is actually a compiler for web components (yet another interesting approach), makes slots work in the Light DOM, which I think is great. If you do need to go Shadow DOM (and I get it if you do; the natural encapsulation could be quite valuable for a third-party component), you’ll be pleased to know I’m 10% less annoyed with the styling story lately. You can take any CSS you have a reference to from “the outside” and provide it to the Shadow DOM as an “adopted stylesheet”. That’s a “way in” for styles that seems pretty sensible and opt-in.
-
How to Change DPI: Adjusting Image Resolution
By: Joshua Njiru Wed, 11 Dec 2024 13:49:42 +0000 What is DPI and Why Does It Matter? DPI, or Dots Per Inch, is a critical measurement in digital and print imaging that determines the quality and clarity of your images. Whether you’re a photographer, graphic designer, or just someone looking to print high-quality photos, understanding how to change DPI is essential for achieving the best possible results. What are the Basics of DPI DPI refers to the number of individual dots that can be placed within a one-inch linear space. The higher the DPI, the more detailed and crisp your image will appear. Most digital images range from 72 DPI (standard for web) to 300 DPI (ideal for print). Top Methods to Change DPI in Linux 1. ImageMagick: The Command-Line Solution ImageMagick is a powerful, versatile tool for image manipulation in Linux. Here’s how to use it:

# Install ImageMagick
sudo apt-get install imagemagick   # For Debian/Ubuntu
sudo dnf install ImageMagick       # For Fedora

# Change DPI of a single image
convert input.jpg -density 300 output.jpg

# Batch convert multiple images
for file in *.jpg; do
  convert "$file" -density 300 "modified_${file}"
done

2. GIMP: Graphical Image Editing For those who prefer a visual interface, GIMP offers an intuitive approach: Open your image in GIMP Go to Image > Print Size Adjust the X and Y resolution Save the modified image 3. ExifTool: Precise Metadata Manipulation ExifTool provides granular control over image metadata:

# Install ExifTool
sudo apt-get install libimage-exiftool-perl   # Debian/Ubuntu

# View current DPI
exiftool image.jpg | grep "X Resolution"

# Change DPI
exiftool -XResolution=300 -YResolution=300 image.jpg

4. Python Scripting: Automated DPI Changes For developers and automation enthusiasts:

from PIL import Image
import os

def change_dpi(input_path, output_path, dpi):
    with Image.open(input_path) as img:
        img.save(output_path, dpi=(dpi, dpi))

# Batch process images
input_directory = './images'
output_directory = './modified_images'
os.makedirs(output_directory, exist_ok=True)

for filename in os.listdir(input_directory):
    if filename.endswith(('.jpg', '.png', '.jpeg')):
        input_path = os.path.join(input_directory, filename)
        output_path = os.path.join(output_directory, filename)
        change_dpi(input_path, output_path, 300)

Important Considerations When Changing DPI Increasing DPI doesn’t automatically improve image quality Original image resolution matters most For printing, aim for 300 DPI For web use, 72-96 DPI is typically sufficient Large increases in DPI can result in blurry or pixelated images DPI Change Tips for Different Purposes Print Requirements Photos: 300 DPI Magazines: 300-600 DPI Newspapers: 200-300 DPI Web and Digital Use Social media: 72 DPI Website graphics: 72-96 DPI Digital presentations: 96 DPI When Should You Change Your DPI?
When Preparing Images for Print It is important to always check your printer’s specific requirements Use high-quality original images Resize before changing DPI to maintain quality When Optimizing for Web Reduce DPI to decrease file size Balance between image quality and load time Use compression tools alongside DPI adjustment How to Troubleshoot Issues with DPI Changes Blurry Images: Often result from significant DPI increases Large File Sizes: High DPI can create massive files Loss of Quality: Original image resolution is key Quick Fixes Use professional resampling methods Start with high-resolution original images Use vector graphics when possible for scalability More Articles from Unixmen. The post How to Change DPI: Adjusting Image Resolution appeared first on Unixmen.
-
Forum 2024 Role model blog: Lilly Vasanthini, Infosys
by: Tatiana P Lilly Vasanthini VP and Delivery Head – Eastern Europe, NORDICS and Switzerland, Infosys Even a tiny little thing that my teams win or do is a celebration for me, and this is how I stay prepared and not get scared. “Twenty-eight years ago, I embarked on a journey with Infosys that has been nothing short of extraordinary. As the VP and Delivery Head for Eastern Europe, Nordics, and Switzerland, I’ve been blessed with countless opportunities to learn and evolve. I’m truly grateful for this incredible experience.” The beginnings in the field of technology Technology emerged as both a choice and an opportunity. In December 1984, I officially embarked on a career in Electronics and Communication Engineering. Upon graduation, I gained valuable experience in India’s prestigious defense sector, working on state-of-the-art telecommunications technology. This role provided an ideal blend of technical expertise and business acumen, aligning perfectly with my career aspirations. Two years later, I was fortunate to join a leading telecom R&D organization in India. This early exposure to cutting-edge research and development was a significant boost to my career. The unwavering support of my family, and in particular my husband, while raising a young son, was instrumental in my success. Joining Infosys My career took a significant turn in 1997 when I joined Infosys. Starting as a Telekom technical training prime, I progressed to management training and eventually became a program manager. In this role, I led implementations for clients across geographies for close to seven years. My career at Infosys has been marked by a constant drive for change and innovation. Change brings both disruption and new opportunities Change is a catalyst for growth. Every technological advancement disrupts the status quo, presenting both challenges and opportunities. While traditional methods may be challenged, new products, work processes, and business models emerge. For example, the rise of e-commerce transformed retail, but it also spawned countless new opportunities. I embrace technological advancement as a positive challenge. As technology evolves, we’re compelled to think critically and build teams with the necessary skills. This continuous adaptation journey fosters innovation and accelerates progress, especially when we approach it with curiosity. Lilly’s strategy to adapt to a constantly changing field “Change” has never been something to fear. To navigate it effectively, I’ve focused on three key aspects: 1. Embrace Learning: Infosys is a dynamic organization that prioritizes continuous learning. By leveraging internal platforms and partnerships with renowned institutions like Stanford and Kellogg, I’ve cultivated a mindset of curiosity and a commitment to staying updated. This enables me to anticipate industry trends, adapt to evolving technologies, and empower my teams to excel. 2. Foster Strong Relationships: Building and nurturing a strong network is crucial. By connecting with colleagues, mentors, and industry experts, I gain diverse perspectives, receive valuable support, and collaborate effectively. This collaborative approach enhances my problem-solving abilities and fosters innovation. 3. Focus on Core Strengths and Celebrate Success: While adapting to change is essential, it’s equally important to build upon my core strengths. By honing my leadership skills and empowering my teams, I ensure we deliver exceptional results for our clients.
Additionally, celebrating milestones, no matter how small, keeps me motivated and fosters a positive work environment. Ultimately, a positive mindset and a belief in one’s own abilities are paramount. By embracing change, building strong relationships, and focusing on core strengths, we can thrive in an ever-evolving landscape. Find out more: Lilly Vasanthini: https://www.linkedin.com/in/lilly-vasanthini-882553/ Infosys: www.infosys.com/nordics The post Forum 2024 Role model blog: Lilly Vasanthini, Infosys first appeared on Women in Tech Finland.
-
NovelAI
by: aiparabellum.com Thu, 05 Dec 2024 04:40:38 +0000 NovelAI stands out as a revolutionary tool in the realm of digital storytelling, combining the power of advanced artificial intelligence with the creative impulses of its users. This platform is not just a simple writing assistant; it is an expansive environment where stories come to life through text and images. NovelAI offers unique features that cater to both seasoned writers and those who are just beginning to explore the art of storytelling. With its promise of no censorship and the freedom to explore any narrative, NovelAI invites you to delve into the world of creative possibilities. Features of NovelAI NovelAI provides a host of exciting features designed to enhance the storytelling experience: AI-Powered Storytelling: Utilize cutting-edge AI to craft stories with depth, maintaining your personal style and perspective. Image Generation: Bring characters and scenes to life with powerful image models, including the leading Anime Art AI. Customizable Editor: Tailor the writing space to your preferences with adjustable fonts, sizes, and color schemes. Text Adventure Module: For those who prefer structured gameplay, this feature adds an interactive dimension to your storytelling. Secure Writing: Ensures that all your stories are encrypted and private. AI Modules: Choose from various themes or emulate famous authors like Arthur Conan Doyle and H.P. Lovecraft. Lorebook: A feature to keep track of your world’s details and ensure consistency in your narratives. Multi-Device Accessibility: Continue your writing seamlessly on any device, anywhere. How It Works Using NovelAI is straightforward and user-friendly: Sign Up for Free: Start by signing up for a free trial to explore the basic features. Select a Subscription Plan: Choose from various subscription plans to unlock more features and capabilities. Customize Your Experience: Set up your editor and select preferred AI modules to tailor the AI to your writing style. Start Writing: Input your story ideas and let the AI expand upon them, or use the Text Adventure Module for a guided narrative. Visualize and Expand: Use the Image Generation feature to visualize scenes and characters. Save and Secure: All your work is automatically saved and encrypted for your eyes only. Benefits of NovelAI The benefits of using NovelAI are numerous, making it a versatile tool for any writer: Enhanced Creativity: Overcome writer’s block with AI-driven suggestions and scenarios. Customization: Fully customizable writing environment and AI behavior. Privacy and Security: Complete encryption of stories ensures privacy. Flexibility: Write anytime, anywhere, on any device. Interactive Storytelling: Engage with your story actively through the Text Adventure Module. Diverse Literary Styles: Experiment with different writing styles and genres. Visual Storytelling: Complement your narratives with high-quality images. Pricing NovelAI offers several pricing tiers to suit various needs and budgets: Paper (Free Trial): Includes 100 free text generations, 6144 tokens of memory, and basic features. Tablet ($10/month): Unlimited text generations, 3072 tokens of memory, and includes image generation and advanced AI TTS voices. Scroll ($15/month): Offers all Tablet features plus double the memory and monthly Anlas for custom AI training. Opus ($25/month): The most comprehensive plan with 8192 tokens of memory, unlimited image generations, and access to experimental features. 
NovelAI Review Users have praised NovelAI for its versatility and user-friendly interface. It’s been described as a “swiss army knife” for writers, providing tools that spark creativity and make writing more engaging. The ability to tailor the AI and the addition of a secure, customizable writing space are highlighted as particularly valuable features. Moreover, the advanced image generation offers a quick and effective way to visualize elements of the stories being created. Conclusion NovelAI redefines the landscape of digital storytelling by blending innovative AI technology with user-driven customization. Whether you’re a hobbyist looking to dabble in new forms of writing or a professional writer seeking a versatile assistant, NovelAI offers the tools and freedom necessary to explore the vast expanse of your imagination. With its flexible pricing plans and robust features, NovelAI is well worth considering for anyone passionate about writing and storytelling. The post NovelAI appeared first on AI Parabellum.
-
Cybersecurity Awareness Month: Protecting Our Youth in the Digital Age
by: Girls Who Code Tue, 29 Oct 2024 16:19:25 GMT As we wrap up October’s spooky season, let’s remember: the only things that should be creeping up on you are witches and vampires, not cyber threats lurking in the shadows! As many of you know, October is also Cybersecurity Awareness Month, which makes sense, because what could be scarier than having your personal information spread without your permission? At Girls Who Code, we’ve spent the last few weeks providing our students with resources, tools, and tricks to keep themselves safe online. But, we’re also committed to helping our community build a secure world all year long. Because cybersecurity is about more than making sure they have the strongest password possible (though, that’s extremely important, too). It’s also about making sure they have all the protection and knowledge they need to keep malicious actors from slithering into their digital world. Let’s be honest, all our lives are becoming more and more online. By the time our students reach high school, they’re using the internet for homework, for research, and for communicating with teachers and classmates. Hundreds of seemingly basic tasks are automated through apps, and social media has made students visible to millions of people around the world. While this has made the lives of so many young people easier, more exciting, and more expansive, it’s also made them vulnerable in ways we may not even realize. That’s why we were so excited to work with The Achievery, created by AT&T, to roll out some essential cybersecurity Learning Units for 9th-10th grade students. In today’s tech-driven environment, understanding cybersecurity isn’t just a nice-to-have — it’s essential. Our students are diving into practical tips, like keeping software up to date and spotting phishing emails, while also learning the importance of visiting secure websites (you know, those with https:// instead of http://). We also want them to feel empowered to share this knowledge within their communities. Plus, they get useful checklists for adjusting browser settings on their devices. With units like “Online Privacy,” “Defend Against Malware and Viruses!,” and “DNS (Domain Name System) Uncovered,” we’re not just teaching them about cybersecurity; we’re helping them build a safer online future for themselves and others. We encourage our community to check out these, and so many other free and accessible tools, on The Achievery, which works to make digital learning more entertaining, engaging, and inspiring for K-12 students everywhere. As Cybersecurity Awareness Month wraps up, let’s keep empowering our students to embrace the internet’s benefits while confidently navigating its challenges. All young people deserve to protect themselves while enjoying a safer online experience that inspires them to thrive in the digital world.
-
Understanding Malware: A Guide for Software Developers and Security Professionals
by: Zainab Sutarwala Tue, 15 Oct 2024 17:25:10 +0000 Malware, or malicious software, poses significant threats to both individuals and organisations. Understanding malware is critical for software developers and security professionals, as it helps them protect systems, safeguard sensitive information, and maintain effective operations. In this blog, we will provide detailed insights into malware, its impacts, and prevention strategies. Stay with us till the end. What is Malware? Malware refers to software designed intentionally to cause damage to a computer, server, client, or computer network. The term includes a range of harmful software types including worms, viruses, Trojan horses, spyware, ransomware, and adware. Common Types of Malware Malware comes in different types, each with its own features and characteristics: Viruses: Code that attaches itself to clean files and infects them, thus spreading to other files and systems. Worms: Malware that replicates itself and spreads to other computer systems, often by exploiting network vulnerabilities. Trojan Horses: Malicious and dangerous code disguised as legitimate software, often tricking users into installing it. Ransomware: These programs encrypt the user’s files and demand payment to unlock them. Spyware: Software that monitors and gathers user information secretly. Adware or Scareware: Software that serves unwanted ads on the user’s computer, mostly as pop-ups and banners. Scareware is an aggressive and deceptive version of adware that “informs” users of upcoming cyber threats it offers to “mitigate” for a fee. How Does Malware Spread? Malware spreads through different methods, including: Phishing emails Infected hardware devices Malicious downloads Exploiting software vulnerabilities How Does Malware Attack Software Development? Malware can attack the software development process in many ways, including: Supply Chain Attacks: These attacks target third-party vendors and compromise software that will later be used to attack their customers. Software Vulnerabilities: Malware exploits known and unknown weaknesses in software code to gain unauthorized access and execute malicious code. Social Engineering Attacks: These attacks trick developers into installing malware or revealing sensitive information. Phishing Attacks: Phishing attacks involve sending fraudulent messages or emails that trick developers into clicking malicious links or downloading attachments. Practices to Prevent Malware Attacks Given below are some of the best practices that will help prevent malware attacks: Use Antimalware Software: Installing antimalware software is important for protecting network devices and computers from malware infections. Use Email with Caution: Malware can be prevented by practising safe behaviour on computers and other personal devices. For example, do not open email attachments from unknown addresses, as they may carry malware disguised as legitimate attachments. Network Firewalls: Firewalls on routers connected to the open internet allow data in and out only under defined rules, keeping malicious traffic away from the network. System Updates: Malware takes advantage of system vulnerabilities, which are patched over time as they are discovered. “Zero-day” exploits take advantage of unknown vulnerabilities, so updating and patching all known vulnerabilities keeps systems more secure. This applies to computers, mobile devices, and routers. How to Know You Have Malware?
There are several signs that your system may be infected with malware: Changes to your search engine or homepage: Malware may change your homepage and search engine without your permission. Unusual pop-up windows: Malware may display annoying pop-up windows and alerts on your system. Strange programs and icons on the desktop. Sluggish computer performance. Trouble shutting down or starting up the computer. Frequent and unexpected system crashes. If you find these issues on your devices, they may be infected with malware. How To Respond to Malware Attacks? The most effective security practices combine the right technology with the right expertise to detect and respond to malware. Given below are some tried and proven methods: Security Monitoring: Certain tools are used to monitor network traffic and system activity for signs of malware. Intrusion Detection System (IDS): Detects suspicious activity and raises alerts. Antivirus Software: Protects against known malware threats. Incident Response Plan: Having a proper plan in place to respond to malware attacks efficiently. Regular Backups: Regular backups of important data reduce the impact of attacks. Conclusion The malware threat is evolving constantly, and software developers and security experts need to stay well-informed and take proactive measures. By understanding the different kinds of malware, the ways they attack software development, and the best practices for prevention and detection, you will be able to help protect your data and systems from attack and harm. FAQs What’s malware vs virus? A virus is one kind of malware, while malware refers to almost any class of code used to harm or disrupt your computing systems. How does malware spread? There are a lot of malware attack vectors: installing infected programs, clicking infected links, opening malicious email attachments, and using corrupted external devices like a virus-infected USB drive. What action should you take if your device gets infected by malware? Use a reputable malware removal tool to scan your device, look for malware, and clean the infection. Restart your system and scan again to ensure the infection is removed completely. The post Understanding Malware: A Guide for Software Developers and Security Professionals appeared first on The Crazy Programmer.
-
Project Tazama, A Project Hosted by LF Charities With Support From the Gates Foundation, Receives Digital Public Good Designation.
By: Linux.com Editorial Staff Tue, 08 Oct 2024 13:50:45 +0000 Exciting news! The Tazama project is officially a Digital Public Good, having met the criteria to be accepted to the Digital Public Goods Alliance! Tazama is a groundbreaking open source software solution for real-time fraud prevention, and offers the first-ever open source platform dedicated to enhancing fraud management in digital payments. Historically, the financial industry has grappled with proprietary and often costly solutions that have limited access and adaptability for many, especially in developing economies. This challenge is underscored by the Global Anti-Scam Alliance, which reported that nearly $1 trillion was lost to online fraud in 2022. Tazama represents a significant shift in how financial monitoring and compliance have been approached globally, challenging the status quo by providing a powerful, scalable, and cost-effective alternative that democratizes access to advanced financial monitoring tools that can help combat fraud. Tazama addresses key concerns of government, civil society, end users, industry bodies, and the financial services industry, including fraud detection, AML compliance, and the cost-effective monitoring of digital financial transactions. The solution’s architecture emphasizes data sovereignty, privacy, and transparency, aligning with the priorities of governments worldwide. Hosted by LF Charities, which will support the operation and function of the project, Tazama showcases the scalability and robustness of open source solutions, particularly in critical infrastructure like national payment switches. We are thrilled to be counted alongside many other incredible open source projects working to achieve the United Nations Sustainable Development Goals. For more information, visit the Digital Public Goods Alliance Registry. The post Project Tazama, A Project Hosted by LF Charities With Support From the Gates Foundation, Receives Digital Public Good Designation. appeared first on Linux.com.
-
Securing Your Email Sending With Python: Authentication and Encryption
by: Ivan Djuric Thu, 19 Sep 2024 02:29:13 GMT Email encryption and authentication are modern security techniques that you can use to protect your emails and their content from unauthorized access. Everyone, from individuals to business owners, uses emails for official communication, which may contain sensitive information. Therefore, securing emails is important, especially when cyberattacks like phishing, smishing, etc. are soaring high. In this article, I'll discuss how to send emails in Python securely using email encryption and authentication. Setting Up Your Python Environment Before you start creating the code for sending emails, set up your Python environment first with the configurations and libraries you'll need. You can send emails in Python using: Simple Mail Transfer Protocol (SMTP): This application-level protocol simplifies the process since Python offers an in-built library or module (smtplib) for sending emails. It's suitable for businesses of all sizes as well as individuals to automate secure email sending in Python. We're using the Gmail SMTP service in this article. An email API: You can leverage a third-party API like Mailtrap Python SDK, SendGrid, Gmail API, etc., to dispatch emails in Python. This method offers more features and high email delivery speeds, although it requires some investment. In this tutorial, we're opting for the first choice - sending emails in Python using SMTP, facilitated by the smtplib library. This library implements the SMTP protocol (defined in RFC 821) and interacts with mail servers to streamline email dispatch from your applications. Additionally, you should install packages to enable Python email encryption, authentication, and formatting. Step 1: Install Python Install the Python programming language on your computer (Windows, macOS, Linux, etc.). You can visit the official Python website and download and install it from there. If you've already installed it, run this code to verify it: python --version Step 2: Install Necessary Modules and Libraries smtplib: This handles SMTP communications. Use the code below to import 'smtplib' and connect with your email server: import smtplib email module: This provides classes to construct and parse emails, including headers like Subject, To, and From. It also facilitates email encoding and decoding with Multipurpose Internet Mail Extensions (MIME). MIMEText: It's used for formatting the text body of your emails, in plain text or HTML. Import it using the code below: from email.mime.text import MIMEText MIMEMultipart: Use this class to combine text sections and attachments like images or videos in a single email. from email.mime.multipart import MIMEMultipart ssl: It provides Secure Sockets Layer (SSL) encryption. Step 3: Create a Gmail Account To send emails using the Gmail SMTP email service, I recommend creating a test account to develop the code. Delete the account once you've tested the code. The reason is, you'll need to modify the security settings of your Gmail account to enable access from the Python code for sending emails. This might expose the login details, compromising security. In addition, it will flood your account with too many test emails. So, instead of using your own Gmail account, create a new one for creating and testing the code. Here's how to do this: Create a fresh Gmail account Set up your app password: Google Account > Security > Turn on 2-Step Verification > Security > Set up an App Password Next, define a name for the app password and click on "Generate".
You'll get a 16-character password after following some instructions on the screen. Store the password safely. Use this password while sending emails in Python. Here, we're using Gmail SMTP, but if you want to use another mail service provider, follow the same process. Alternatively, contact your company's IT team to seek support in accessing your SMTP server. Email Authentication With Python Email authentication is a security mechanism that verifies the sender's identity, ensuring the emails from a domain are legitimate. If you have no email authentication mechanism in place, your emails might land in spam folders, or malicious actors can spoof or intercept them. This could affect your email delivery rates and the sender's reputation. This is the reason you must enable Python email authentication mechanisms and protocols, such as: SMTP authentication: If you're sending emails using an SMTP server like Gmail SMTP, you can use this method of authentication. It verifies the sender's authenticity when sending emails via a specific mail server. SPF: Stands for Sender Policy Framework and checks whether the IP address of the sending server is among the servers authorized to send email for that domain, as published in the domain's DNS records (a short sketch for inspecting these records follows the encryption overview below). DKIM: Stands for DomainKeys Identified Mail and is used to add a digital signature to emails to ensure no one can alter the email's content while it's in transmission. The receiver's server will then verify the digital signature. Thus, all your emails and their content stay secure and unaltered. DMARC: Stands for Domain-based Message Authentication, Reporting, and Conformance. DMARC instructs mail servers what to do if an email fails authentication. In addition, it provides reports upon detecting any suspicious activities on your domain. How to Implement Email Authentication in Python To authenticate your email in Python using SMTP, the smtplib library is useful. Here's how Python SMTP security works: import smtplib server = smtplib.SMTP('smtp.domain1.com', 587) server.starttls() # Start TLS for secure connection server.login('my_email@domain1.com', 'my_password') message = "Subject: Test Email." server.sendmail('my_email@domain1.com', 'receiver@domain2.com', message) server.quit() Implementing email authentication will add an additional layer of security to your emails and protect them from attackers or from being marked as spam. Encrypting Emails With Python Encrypting emails enables you to protect your email's content so that only authorized senders and receivers can access or view the content. Encrypting emails with Python is done using encryption techniques to encode the email message and transform it into a secure and unreadable format (also known as ciphertext). This way, email encryption secures the message from unauthorized access or attackers even if they intercept the email. Here are different types of email encryption: SSL: This stands for Secure Sockets Layer, one of the most popular and widely used encryption protocols. SSL ensures email confidentiality by encrypting data transmitted between the mail server and the client. TLS: This stands for Transport Layer Security and is a common email encryption protocol today. Many consider it the modern successor to SSL. It encrypts the connection between an email client and the mail server to prevent anyone from intercepting the email during its transmission. E2EE: This stands for end-to-end encryption, ensuring only the intended recipient with valid credentials can decrypt the email content and read it. It aims to prevent email interception and secure the message.
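Before moving on to the encryption implementations, here is an optional sketch tying back to the SPF and DMARC records described in the authentication section above. It shows one way to inspect the policies a domain publishes in DNS; it assumes the third-party dnspython package (not part of the standard library), and example.com is only a placeholder domain.

# Requires the third-party dnspython package: pip install dnspython
import dns.resolver

def txt_records(name):
    """Return the TXT records published for a DNS name, or an empty list if none exist."""
    try:
        return [record.to_text().strip('"') for record in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []

domain = "example.com"  # placeholder; replace with your sending domain
spf = [r for r in txt_records(domain) if r.startswith("v=spf1")]
dmarc = [r for r in txt_records(f"_dmarc.{domain}") if r.startswith("v=DMARC1")]

print("SPF policy:", spf[0] if spf else "not published")
print("DMARC policy:", dmarc[0] if dmarc else "not published")

These are the records receiving servers consult when they apply SPF and DMARC checks; the code only reads them and changes nothing, so it is safe to run against any domain you send from.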
How to Implement Email Encryption in Python If your mail server requires SSL encryption, here's how to send an email in Python: import smtplib import ssl context = ssl.create_default_context() server = smtplib.SMTP_SSL('smtp.domain1.com', 465, context=context) # This is for SSL connections, requiring port number 465 server.login('my_email@domain1.com', 'my_password') message = "Subject: SSL Encrypted Email." server.sendmail('my_email@domain1.com', 'receiver@domain2.com', message) server.quit() For TLS connections, you'll need the smtplib library: import smtplib server = smtplib.SMTP('smtp.domain1.com', 587) # TLS requires 587 port number server.starttls() # Start TLS encryption server.login('my_email@domain1.com', 'my_password') message = "Subject: TLS Encrypted Email." server.sendmail('my_email@domain1.com', 'receiver@domain2.com', message) server.quit() For end-to-end encryption, you'll need more advanced libraries or tools such as GnuPG, OpenSSL, Signal Protocol, and more. Combining Authentication and Encryption Email Security with Python requires both encryption and authentication. This ensures that mail servers find the email legitimate and it stays safe from cyber attackers and unauthorized access during transmission. For email encryption, you can use either SSL or TLS and combine it with SMTP authentication to establish a robust email connection. Now that you know how to enable email encryption and authentication in your emails, let's examine some complete code examples to understand how you can send secure emails in Python using Gmail SMTP and email encryption (SSL). Code Examples 1. Sending a Plain Text Email import smtplib from email.mime.text import MIMEText subject = "Plain Text Email" body = "This is a plain text email using Gmail SMTP and SSL." sender = "sender1@gmail.com" receivers = ["receiver1@gmail.com", "receiver2@gmail.com"] password = "my_password" def send_email(subject, body, sender, receivers, password): msg = MIMEText(body) msg['Subject'] = subject msg['From'] = sender msg['To'] = ', '.join(receivers) with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp_server: smtp_server.login(sender, password) smtp_server.sendmail(sender, receivers, msg.as_string()) print("The plain text email is sent successfully!") send_email(subject, body, sender, receivers, password) Explanation: sender: This contains the sender's address. receivers: This contains email addresses of receiver 1 and receiver 2. msg: This is the content of the email. sendmail(): This is the SMTP object's instance method. It takes three parameters - sender, receiver, and msg and sends the message. with: This is a context manager that is used to properly close an SMTP connection once an email is sent. MIMEText: This holds only plain text. 2. Sending an Email with Attachments To send an email in Python with attachments securely, you will need some additional libraries like MIMEBase and encoders. Here's the code for this case: import smtplib from email import encoders from email.mime.base import MIMEBase from email.mime.multipart import MIMEMultipart from email.mime.text import MIMEText sender = "sender1@gmail.com" password = "my_password" receiver = "receiver1@gmail.com" subject = "Email with Attachments" body = "This is an email with attachments created in Python using Gmail SMTP and SSL." 
with open("attachment.txt", "rb") as attachment: part = MIMEBase("application", "octet-stream") # Adding the attachment to the email part.set_payload(attachment.read()) encoders.encode_base64(part) part.add_header( "Content-Disposition", # This header indicates that the file is an attachment. "attachment; filename=attachment.txt", ) message = MIMEMultipart() message['Subject'] = subject message['From'] = sender message['To'] = receiver text_part = MIMEText(body) message.attach(text_part) # To attach the file message.attach(part) with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server: server.login(sender, password) server.sendmail(sender, receiver, message.as_string()) Explanation: MIMEMultipart: This class lets you combine text sections and attachments in a single email. 'rb': Opens the attachment in binary mode so its content can be read as bytes. MIMEBase: This object is applicable to any file type. encoders.encode_base64(): The file is encoded in Base64 so it can be sent safely over email. Sending an HTML Email in Python To send an HTML email in Python using Gmail SMTP, you need a class - MIMEText. Here's the full code for sending an HTML email in Python: import smtplib from email.mime.text import MIMEText sender = "sender1@gmail.com" password = "my_password" receiver = "receiver1@gmail.com" subject = "HTML Email in Python" body = """ <html> <body> <p>HTML email created in Python with SSL and Gmail SMTP.</p> </body> </html> """ message = MIMEText(body, 'html') # To attach the HTML content to the email message['Subject'] = subject message['From'] = sender message['To'] = receiver with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server: server.login(sender, password) server.sendmail(sender, receiver, message.as_string()) Testing Your Email With Authentication and Encryption Testing your emails before sending them to the recipients is important. It enables you to discover any issues or bugs in sending emails or with the formatting, content, etc. Thus, always test your emails on a staging server before delivering them to your target recipients, especially when sending emails in bulk. Testing emails provides the following advantages: Ensures the email sending functionality is working fine Confirms emails have proper formatting and no broken links or attachments Prevents flooding the recipient's inbox with a large number of test emails Enhances email deliverability and reduces spam rates Ensures the email and its contents stay protected from attacks and unauthorized access To test this combined setup of sending emails in Python with authentication and encryption enabled, use an email testing server like Mailtrap Email Testing. This will capture all the SMTP traffic from the staging environment, so you can detect and debug issues in your emails before sending them. It will also analyze the email content, validate CSS/HTML, and provide a spam score so you can improve your email sending. To get started: Open Mailtrap Email Testing Go to 'My Inbox' Click on 'Show Credentials' to get your test credentials - login and password details Here's the Full Code Example for Testing Your Emails: import smtplib from socket import gaierror port = 2525 # Mailtrap testing port smtp_server = "sandbox.smtp.mailtrap.io" # Define the SMTP server separately login = "xyz123" # Paste your Mailtrap login details password = "abc$$" # Paste your Mailtrap password sender = "test_sender@test.com" receiver = "test_receiver@example.com" message = f"""\ Subject: Hello There! 
To: {receiver} From: {sender} This is a test email.""" try: with smtplib.SMTP(smtp_server, port) as server: # Use Mailtrap-generated credentials for port, server name, login, and password server.login(login, password) server.sendmail(sender, receiver, message) print('Sent') except (gaierror, ConnectionRefusedError): # In case of connection errors print('Unable to connect to the server.') except smtplib.SMTPServerDisconnected: print('Server connection failed!') except smtplib.SMTPException as e: print('SMTP error: ' + str(e)) If there's no error, you should see this message in the receiver's inbox: This is a test email. Best Practices for Secure Email Sending Consider the following Python email best practices for secure email sending: Protect data: Take appropriate security measures to protect your sensitive data such as SMTP credentials, API keys, etc. Store them in a secure, private place like config files or environment variables, ensuring no one can access them publicly (the sketch at the end of this article shows one way to do this). Encryption and authentication: Always use email encryption and authentication so that only authorized individuals can access your emails and their content. For authentication, you can use advanced methods like API keys, two-factor authentication, single sign-on (SSO), etc. Similarly, use advanced encryption techniques like SSL, TLS, E2EE, etc. Error handling: Manage network issues, authentication errors, and other problems by handling errors effectively using try/except blocks in your code. Rate-limiting: Maintain high email deliverability by rate-limiting the email sending functionality to prevent exceeding your service limits. Validate emails: Validate email addresses from your list and remove invalid ones to enhance email deliverability and prevent your domain from getting marked as spam. You can use an email validation tool to do this. Educate: Keep your team updated with secure email practices and cybersecurity risks. Monitor your spam score and email deliverability rates, and work to improve them. Wrapping Up Secure email sending in Python relies on advanced email encryption methods like SSL, TLS, and end-to-end encryption, as well as authentication protocols and techniques such as SPF, DMARC, 2FA, and API keys. By combining these security measures, you can protect your confidential email information, improve email deliverability, and maintain trust with your target recipients. In this way, only individuals with the appropriate credentials can access your email content. This will help prevent unauthorized access, data breaches, and other cybersecurity attacks.
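To tie the best practices above together, here is a minimal sketch that combines credentials stored in environment variables, SSL encryption, try/except error handling, and a simple delay between messages as a crude form of rate limiting. The environment variable names (EMAIL_ADDRESS and EMAIL_APP_PASSWORD) and the recipient addresses are hypothetical; adjust them to your own setup and your provider's sending limits.

import os
import smtplib
import ssl
import time
from email.mime.text import MIMEText

# Hypothetical environment variable names - set them before running the script.
sender = os.environ["EMAIL_ADDRESS"]
password = os.environ["EMAIL_APP_PASSWORD"]
receivers = ["receiver1@example.com", "receiver2@example.com"]

context = ssl.create_default_context()

for receiver in receivers:
    msg = MIMEText("This is a secure email sent with credentials loaded from the environment.")
    msg['Subject'] = "Secure Email"
    msg['From'] = sender
    msg['To'] = receiver
    try:
        with smtplib.SMTP_SSL('smtp.gmail.com', 465, context=context) as server:
            server.login(sender, password)
            server.sendmail(sender, receiver, msg.as_string())
        print(f"Sent to {receiver}")
    except smtplib.SMTPAuthenticationError:
        print("Login failed - check the credentials in your environment variables.")
        break
    except smtplib.SMTPException as e:
        print(f"Failed to send to {receiver}: {e}")
    time.sleep(2)  # simple rate limiting between messages

Opening a new connection for every message keeps the sketch short; for larger batches you would normally log in once and reuse the connection, and back off if the server starts rejecting messages.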
-
Using Proxies in Web Scraping – All You Need to Know
by: Leonardo Rodriguez Thu, 12 Sep 2024 13:23:00 GMT Introduction Web scraping typically refers to an automated process of collecting data from websites. On a high level, you're essentially making a bot that visits a website, detects the data you're interested in, and then stores it into some appropriate data structure, so you can easily analyze and access it later. However, if you're concerned about your anonymity on the Internet, you should probably take a little more care when scraping the web. Since your IP address is public, a website owner could track it down and, potentially, block it. So, if you want to stay as anonymous as possible and avoid being blocked from visiting a certain website, you should consider using proxies when scraping the web. Proxies, also referred to as proxy servers, are specialized servers that let you avoid accessing the websites you're scraping directly. Rather, you'll be routing your scraping requests via a proxy server. That way, your IP address gets "hidden" behind the IP address of the proxy server you're using. This can help you stay as anonymous as possible and avoid being blocked, so you can keep scraping as long as you want. In this comprehensive guide, you'll get a grasp of the basics of web scraping and proxies, and you'll see an actual, working example of scraping a website using proxies in Node.js. Afterward, we'll discuss why you might consider using existing scraping solutions (like ScraperAPI) over writing your own web scraper. At the end, we'll give you some tips on how to overcome some of the most common issues you might face when scraping the web. Web Scraping Web scraping is the process of extracting data from websites. It automates what would otherwise be a manual process of gathering information, making the process less time-consuming and less error-prone. That way you can collect a large amount of data quickly and efficiently. Later, you can analyze, store, and use it. The primary reason you might scrape a website is to obtain data that is either unavailable through an existing API or too vast to collect manually. It's particularly useful when you need to extract information from multiple pages or when the data is spread across different websites. There are many real-world applications that utilize the power of web scraping in their business model. The majority of apps helping you track product prices and discounts, find the cheapest flights and hotels, or even collect job posting data for job seekers, use web scraping to gather the data that provides you value. Web Proxies Imagine you're sending a request to a website. Usually, your request is sent from your machine (with your IP address) to the server that hosts a website you're trying to access. That means that the server "knows" your IP address and it can block you based on your geo-location, the amount of traffic you're sending to the website, and many more factors. But when you send a request through a proxy, it routes the request through another server, hiding your original IP address behind the IP address of the proxy server. This not only helps in maintaining anonymity but also plays a crucial role in avoiding IP blocking, which is a common issue in web scraping. By rotating through different IP addresses, proxies allow you to distribute your requests, making them appear as if they're coming from various users. This reduces the likelihood of getting blocked and increases the chances of successfully scraping the desired data. 
Types of Proxies Typically, there are five main types of proxy servers - datacenter, residential, rotating, mobile, and ISP. Each of them has its pros and cons, and based on that, you'll use them for different purposes and at different costs. Datacenter proxies are the most common and cost-effective proxies, provided by third-party data centers. They offer high speed and reliability but are more easily detectable and can be blocked by websites more frequently. Residential proxies route your requests through real residential IP addresses. Since they appear as ordinary user connections, they are less likely to be blocked but are typically more expensive. Rotating proxies automatically change the IP address after each request or after a set period. This is particularly useful for large-scale scraping projects, as it significantly reduces the chances of being detected and blocked. Mobile proxies use IP addresses associated with mobile devices. They are highly effective for scraping mobile-optimized websites or apps and are less likely to be blocked, but they typically come at a premium cost. ISP proxies are a newer type that combines the reliability of datacenter proxies with the legitimacy of residential IPs. They use IP addresses from Internet Service Providers but are hosted in data centers, offering a balance between performance and detection avoidance. Example Web Scraping Project Let's walk through a practical example of a web scraping project, and demonstrate how to set up a basic scraper, integrate proxies, and use a scraping service like ScraperAPI. Setting up Before you dive into the actual scraping process, it's essential to set up your development environment. For this example, we'll be using Node.js since it's well-suited for web scraping due to its asynchronous capabilities. We'll use Axios for making HTTP requests, and Cheerio to parse and manipulate the HTML contained in the response of the HTTP request. First, ensure you have Node.js installed on your system. If you don't have it, download and install it from nodejs.org. Then, create a new directory for your project and initialize it: $ mkdir my-web-scraping-project $ cd my-web-scraping-project $ npm init -y Finally, install Axios and Cheerio since they are necessary for you to implement your web scraping logic: $ npm install axios cheerio Simple Web Scraping Script Now that your environment is set up, let's create a simple web scraping script. We'll scrape a sample website to gather famous quotes and their authors. So, create a JavaScript file named sample-scraper.js and write all the code inside it. Import the packages you'll need to send HTTP requests and manipulate the HTML: const axios = require('axios'); const cheerio = require('cheerio'); Next, create a wrapper function that will contain all the logic you need to scrape data from a web page. 
It accepts the URL of a website you want to scrape as an argument and returns all the quotes found on the page: // Function to scrape data from a webpage async function scrapeWebsite(url) { try { // Send a GET request to the webpage const response = await axios.get(url); // Load the HTML into cheerio const $ = cheerio.load(response.data); // Extract all elements with the class 'quote' const quotes = []; $('div.quote').each((index, element) => { // Extracting text from span with class 'text' const quoteText = $(element).find('span.text').text().trim(); // Assuming there's a small tag for the author const author = $(element).find('small.author').text().trim(); quotes.push({ quote: quoteText, author: author }); }); // Output the quotes console.log("Quotes found on the webpage:"); quotes.forEach((quote, index) => { console.log(`${index + 1}: "${quote.quote}" - ${quote.author}`); }); } catch (error) { console.error(`An error occurred: ${error.message}`); } } Note: All the quotes are stored in a separate div element with a class of quote. Each quote has its text and author - text is stored under the span element with the class of text, and the author is within the small element with the class of author. Finally, specify the URL of the website you want to scrape - in this case, https://quotes.toscrape.com, and call the scrapeWebsite() function: // URL of the website you want to scrape const url = 'https://quotes.toscrape.com'; // Call the function to scrape the website scrapeWebsite(url); All that's left for you to do is to run the script from the terminal: $ node sample-scraper.js Integrating Proxies To use a proxy with axios, you specify the proxy settings in the request configuration. The axios.get() method can include the proxy configuration, allowing the request to route through the specified proxy server. The proxy object contains the host, port, and optional authentication details for the proxy: // Send a GET request to the webpage with proxy configuration const response = await axios.get(url, { proxy: { host: proxy.host, port: proxy.port, auth: { username: proxy.username, // Optional: Include if your proxy requires authentication password: proxy.password, // Optional: Include if your proxy requires authentication }, }, }); Note: You need to replace these placeholders with your actual proxy details. 
Other than this change, the entire script remains the same: // Function to scrape data from a webpage async function scrapeWebsite(url) { try { // Send a GET request to the webpage with proxy configuration const response = await axios.get(url, { proxy: { host: proxy.host, port: proxy.port, auth: { username: proxy.username, // Optional: Include if your proxy requires authentication password: proxy.password, // Optional: Include if your proxy requires authentication }, }, }); // Load the HTML into cheerio const $ = cheerio.load(response.data); // Extract all elements with the class 'quote' const quotes = []; $('div.quote').each((index, element) => { // Extracting text from span with class 'text' const quoteText = $(element).find('span.text').text().trim(); // Assuming there's a small tag for the author const author = $(element).find('small.author').text().trim(); quotes.push({ quote: quoteText, author: author }); }); // Output the quotes console.log("Quotes found on the webpage:"); quotes.forEach((quote, index) => { console.log(`${index + 1}: "${quote.quote}" - ${quote.author}`); }); } catch (error) { console.error(`An error occurred: ${error.message}`); } } Using Headless Browsers for Advanced Scraping For websites with complex JavaScript interactions, you might need to use a headless browser instead of simple HTTP requests. Tools like Puppeteer or Playwright allow you to automate a real browser, execute JavaScript, and interact with dynamic content. Here's a simple example using Puppeteer: const puppeteer = require('puppeteer'); async function scrapeWithPuppeteer(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url, { waitUntil: 'networkidle2' }); // Extract data using page.evaluate const quotes = await page.evaluate(() => { const results = []; document.querySelectorAll('div.quote').forEach(quote => { results.push({ text: quote.querySelector('span.text').textContent, author: quote.querySelector('small.author').textContent }); }); return results; }); console.log(quotes); await browser.close(); } Headless browsers can also be configured to use proxies, making them powerful tools for scraping complex websites while maintaining anonymity. Integrating a Scraping Service Using a scraping service like ScraperAPI offers several advantages over manual web scraping since it's designed to tackle all of the major problems you might face when scraping websites: Automatically handles common web scraping obstacles such as CAPTCHAs, JavaScript rendering, and IP blocks. Automatically handles proxies - proxy configuration, rotation, and much more. Instead of building your own scraping infrastructure, you can leverage ScraperAPI's pre-built solutions. This saves significant development time and resources that can be better spent on analyzing the scraped data. ScraperAPI offers various customization options such as geo-location targeting, custom headers, and asynchronous scraping. You can personalize the service to suit your specific scraping needs. Using a scraping API like ScraperAPI is often more cost-effective than building and maintaining your own scraping infrastructure. The pricing is based on usage, allowing you to scale up or down as needed. ScraperAPI allows you to scale your scraping efforts by handling millions of requests concurrently. To implement the ScraperAPI proxy into the scraping script you've created so far, there are just a few tweaks you need to make in the axios configuration. 
First of all, ensure you have created a free ScraperAPI account. That way, you'll have access to your API key, which will be necessary in the following steps. Once you get the API key, use it as a password in the axios proxy configuration from the previous section: // Send a GET request to the webpage with ScraperAPI proxy configuration axios.get(url, { method: 'GET', proxy: { host: 'proxy-server.scraperapi.com', port: 8001, auth: { username: 'scraperapi', password: 'YOUR_API_KEY' // Paste your API key here }, protocol: 'http' } }); And, that's it, all of your requests will be routed through the ScraperAPI proxy servers. But to use the full potential of a scraping service you'll have to configure it using the service's dashboard - ScraperAPI is no different here. It has a user-friendly dashboard where you can set up the web scraping process to best fit your needs. You can enable proxy or async mode, JavaScript rendering, set a region from where the requests will be sent, set your own HTTP headers, timeouts, and much more. And the best thing is that ScraperAPI automatically generates a script containing all of the scraper settings, so you can easily integrate the scraper into your codebase. Best Practices for Using Proxies in Web Scraping Not every proxy provider and its configuration are the same. So, it's important to know what proxy service to choose and how to configure it properly. Let's take a look at some tips and tricks to help you with that! Rotate Proxies Regularly Implement a proxy rotation strategy that changes the IP address after a certain number of requests or at regular intervals. This approach can mimic human browsing behavior, making it less likely for websites to flag your activities as suspicious. Handle Rate Limits Many websites enforce rate limits to prevent excessive scraping. To avoid hitting these limits, you can: Introduce Delays: Add random delays between requests to simulate human behavior. Monitor Response Codes: Track HTTP response codes to detect when you are being rate-limited. If you receive a 429 (Too Many Requests) response, pause your scraping for a while before trying again. Implement Exponential Backoff: Rather than using fixed delays, implement exponential backoff that increases wait time after each failed request, which is more effective at handling rate limits. Use Quality Proxies Choosing high-quality proxies is crucial for successful web scraping. Quality proxies, especially residential ones, are less likely to be detected and banned by target websites. That's why it's crucial to understand how to use residential proxies for your business, enabling you to find valuable leads while avoiding website bans. Using a mix of high-quality proxies can significantly enhance your chances of successful scraping without interruptions. Quality proxy services often provide a wide range of IP addresses from different regions, enabling you to bypass geo-restrictions and access localized content. Reliable proxy services can offer faster response times and higher uptime, which is essential when scraping large amounts of data. As your scraping needs grow, having access to a robust proxy service allows you to scale your operations without the hassle of managing your own infrastructure. Using a reputable proxy service often comes with customer support and maintenance, which can save you time and effort in troubleshooting issues related to proxies. 
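The rotation and rate-limit advice above is language-agnostic, so here is a rough sketch of the idea in Python rather than the Node.js used in this article's examples: it cycles through a list of proxies and backs off exponentially when it gets a 429 response. It assumes the third-party requests package, and the proxy URLs are placeholders for whatever your provider gives you.

import itertools
import random
import time

import requests  # third-party package

# Placeholder proxy URLs - replace them with your provider's addresses.
PROXIES = itertools.cycle([
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
])

def fetch_with_backoff(url, max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        proxy = next(PROXIES)  # rotate to the next proxy on every attempt
        try:
            response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
        except requests.RequestException as exc:
            print(f"Request failed via {proxy}: {exc}")
        else:
            if response.status_code != 429:
                return response
            print(f"Rate limited (429); waiting {delay:.1f}s before retrying")
        time.sleep(delay + random.uniform(0, 0.5))  # jitter makes the traffic look less bot-like
        delay *= 2  # exponential backoff
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")

The same pattern translates directly to axios: rotate the proxy option per request, watch for 429 status codes, and double the wait time after each failure.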
Handling CAPTCHAs and Other Challenges CAPTCHAs and anti-bot mechanisms are some of the most common obstacles you'll encounter while scraping the web. Websites use CAPTCHAs to prevent automated access by trying to differentiate between real humans and automated bots. They do this by prompting users to solve various kinds of puzzles, identify distorted objects, and so on. That can make it really difficult for you to automatically scrape data. Even though there are many manual and automated CAPTCHA solvers available online, the best strategy for handling CAPTCHAs is to avoid triggering them in the first place. Typically, they are triggered when non-human behavior is detected. For example, a large amount of traffic, sent from a single IP address, using the same HTTP configuration is definitely a red flag! So, when scraping a website, try mimicking human behavior as much as possible: Add delays between requests and spread them out as much as you can. Regularly rotate between multiple IP addresses using a proxy service. Randomize HTTP headers and user agents. Maintain and use cookies appropriately, as many websites track user sessions. Consider implementing browser fingerprint randomization to avoid tracking. Beyond CAPTCHAs, websites often use sophisticated anti-bot measures to detect and block scraping. Some websites use JavaScript to detect bots. Tools like Puppeteer can simulate a real browser environment, allowing your scraper to execute JavaScript and bypass these challenges. Websites sometimes add hidden form fields or links that only bots will interact with. So, avoid clicking on hidden elements or filling out forms with invisible fields. Advanced anti-bot systems go as far as tracking user behavior, such as mouse movements or time spent on a page. Mimicking these behaviors using browser automation tools can help bypass these checks. But the simplest and most efficient way to handle CAPTCHAs and anti-bot measures is to use a service like ScraperAPI. Sending your scraping requests through ScraperAPI's API will ensure you have the best chance of not being blocked. When the API receives the request, it uses advanced machine learning techniques to determine the best request configuration to prevent triggering CAPTCHAs and other anti-bot measures. Conclusion As websites have become more sophisticated in their anti-scraping measures, the use of proxies has become increasingly important for keeping your scraping projects successful. Proxies help you maintain anonymity, prevent IP blocking, and enable you to scale your scraping efforts without getting obstructed by rate limits or geo-restrictions. In this guide, we've explored the fundamentals of web scraping and the crucial role that proxies play in this process. We've discussed how proxies can help maintain anonymity, avoid IP blocks, and distribute requests to mimic natural user behavior. We've also covered the different types of proxies available, each with its own strengths and ideal use cases. We demonstrated how to set up a basic web scraper and integrate proxies into your scraping script. We also explored the benefits of using a dedicated scraping service like ScraperAPI, which can simplify many of the challenges associated with web scraping at scale. In the end, we covered the importance of carefully choosing the right type of proxy, rotating them regularly, handling rate limits, and leveraging scraping services when necessary. 
That way, you can ensure that your web scraping projects will be efficient, reliable, and sustainable. Remember that while web scraping can be a powerful data collection technique, it should always be done responsibly and ethically, with respect for website terms of service and legal considerations.
-
Using Proxies in Web Scraping – All You Need to Know
by: Leonardo Rodriguez Thu, 12 Sep 2024 13:23:00 GMT Introduction Web scraping typically refers to an automated process of collecting data from websites. On a high level, you're essentially making a bot that visits a website, detects the data you're interested in, and then stores it into some appropriate data structure, so you can easily analyze and access it later. However, if you're concerned about your anonymity on the Internet, you should probably take a little more care when scraping the web. Since your IP address is public, a website owner could track it down and, potentially, block it. So, if you want to stay as anonymous as possible, and prevent being blocked from visiting a certain website, you should consider using proxies when scraping the web. Proxies, also referred to as proxy servers, are specialized servers that enable you not to directly access the websites you're scraping. Rather, you'll be routing your scraping requests via a proxy server. That way, your IP address gets "hidden" behind the IP address of the proxy server you're using. This can help you both stay as anonymous as possible, as well as not being blocked, so you can keep scraping as long as you want. In this comprehensive guide, you'll get a grasp of the basics of web scraping and proxies, you'll see the actual, working example of scraping a website using proxies in Node.js. Afterward, we'll discuss why you might consider using existing scraping solutions (like ScraperAPI) over writing your own web scraper. At the end, we'll give you some tips on how to overcome some of the most common issues you might face when scraping the web. Web Scraping Web scraping is the process of extracting data from websites. It automates what would otherwise be a manual process of gathering information, making the process less time-consuming and prone to errors. That way you can collect a large amount of data quickly and efficiently. Later, you can analyze, store, and use it. The primary reason you might scrape a website is to obtain data that is either unavailable through an existing API or too vast to collect manually. It's particularly useful when you need to extract information from multiple pages or when the data is spread across different websites. There are many real-world applications that utilize the power of web scraping in their business model. The majority of apps helping you track product prices and discounts, find cheapest flights and hotels, or even collect job posting data for job seekers, use the technique of web scraping to gather the data that provides you the value. Web Proxies Imagine you're sending a request to a website. Usually, your request is sent from your machine (with your IP address) to the server that hosts a website you're trying to access. That means that the server "knows" your IP address and it can block you based on your geo-location, the amount of traffic you're sending to the website, and many more factors. But when you send a request through a proxy, it routes the request through another server, hiding your original IP address behind the IP address of the proxy server. This not only helps in maintaining anonymity but also plays a crucial role in avoiding IP blocking, which is a common issue in web scraping. By rotating through different IP addresses, proxies allow you to distribute your requests, making them appear as if they're coming from various users. This reduces the likelihood of getting blocked and increases the chances of successfully scraping the desired data. 
Types of Proxies Typically, there are four main types of proxy servers - datacenter, residential, rotating, and mobile. Each of them has its pros and cons, and based on that, you'll use them for different purposes and at different costs. Datacenter proxies are the most common and cost-effective proxies, provided by third-party data centers. They offer high speed and reliability but are more easily detectable and can be blocked by websites more frequently. Residential proxies route your requests through real residential IP addresses. Since they appear as ordinary user connections, they are less likely to be blocked but are typically more expensive. Rotating proxies automatically change the IP address after each request or after a set period. This is particularly useful for large-scale scraping projects, as it significantly reduces the chances of being detected and blocked. Mobile proxies use IP addresses associated with mobile devices. They are highly effective for scraping mobile-optimized websites or apps and are less likely to be blocked, but they typically come at a premium cost. ISP proxies are a newer type that combines the reliability of datacenter proxies with the legitimacy of residential IPs. They use IP addresses from Internet Service Providers but are hosted in data centers, offering a balance between performance and detection avoidance. Example Web Scraping Project Let's walk through a practical example of a web scraping project, and demonstrate how to set up a basic scraper, integrate proxies, and use a scraping service like ScraperAPI. Setting up Before you dive into the actual scraping process, it's essential to set up your development environment. For this example, we'll be using Node.js since it's well-suited for web scraping due to its asynchronous capabilities. We'll use Axios for making HTTP requests, and Cheerio to parse and manipulate HTML (that's contained in the response of the HTTP request). First, ensure you have Node.js installed on your system. If you don't have it, download and install it from nodejs.org. Then, create a new directory for your project and initialize it: $ mkdir my-web-scraping-project $ cd my-web-scraping-project $ npm init -y Finally, install Axios and Cheerio since they are necessary for you to implement your web scraping logic: $ npm install axios cheerio Simple Web Scraping Script Now that your environment is set up, let's create a simple web scraping script. We'll scrape a sample website to gather famous quotes and their authors. So, create a JavaScript file named sample-scraper.js and write all the code inside of it. Import the packages you'll need to send HTTP requests and manipulate the HTML: const axios = require('axios'); const cheerio = require('cheerio'); Next, create a wrapper function that will contain all the logic you need to scrape data from a web page. 
It accepts the URL of a website you want to scrape as an argument and returns all the quotes found on the page: // Function to scrape data from a webpage async function scrapeWebsite(url) { try { // Send a GET request to the webpage const response = await axios.get(url); // Load the HTML into cheerio const $ = cheerio.load(response.data); // Extract all elements with the class 'quote' const quotes = []; $('div.quote').each((index, element) => { // Extracting text from span with class 'text' const quoteText = $(element).find('span.text').text().trim(); // Assuming there's a small tag for the author const author = $(element).find('small.author').text().trim(); quotes.push({ quote: quoteText, author: author }); }); // Output the quotes console.log("Quotes found on the webpage:"); quotes.forEach((quote, index) => { console.log(`${index + 1}: "${quote.quote}" - ${quote.author}`); }); } catch (error) { console.error(`An error occurred: ${error.message}`); } } Note: All the quotes are stored in a separate div element with a class of quote. Each quote has its text and author - text is stored under the span element with the class of text, and the author is within the small element with the class of author. Finally, specify the URL of the website you want to scrape - in this case, https://quotes.toscrape.com, and call the scrapeWebsite() function: // URL of the website you want to scrape const url = 'https://quotes.toscrape.com'; // Call the function to scrape the website scrapeWebsite(url); All that's left for you to do is to run the script from the terminal: $ node sample-scraper.js Integrating Proxies To use a proxy with axios, you specify the proxy settings in the request configuration. The axios.get() method can include the proxy configuration, allowing the request to route through the specified proxy server. The proxy object contains the host, port, and optional authentication details for the proxy: // Send a GET request to the webpage with proxy configuration const response = await axios.get(url, { proxy: { host: proxy.host, port: proxy.port, auth: { username: proxy.username, // Optional: Include if your proxy requires authentication password: proxy.password, // Optional: Include if your proxy requires authentication }, }, }); Note: You need to replace these placeholders with your actual proxy details. 
Other than this change, the entire script remains the same: // Function to scrape data from a webpage async function scrapeWebsite(url) { try { // Send a GET request to the webpage with proxy configuration const response = await axios.get(url, { proxy: { host: proxy.host, port: proxy.port, auth: { username: proxy.username, // Optional: Include if your proxy requires authentication password: proxy.password, // Optional: Include if your proxy requires authentication }, }, }); // Load the HTML into cheerio const $ = cheerio.load(response.data); // Extract all elements with the class 'quote' const quotes = []; $('div.quote').each((index, element) => { // Extracting text from span with class 'text' const quoteText = $(element).find('span.text').text().trim(); // Assuming there's a small tag for the author const author = $(element).find('small.author').text().trim(); quotes.push({ quote: quoteText, author: author }); }); // Output the quotes console.log("Quotes found on the webpage:"); quotes.forEach((quote, index) => { console.log(`${index + 1}: "${quote.quote}" - ${quote.author}`); }); } catch (error) { console.error(`An error occurred: ${error.message}`); } } Using Headless Browsers for Advanced Scraping For websites with complex JavaScript interactions, you might need to use a headless browser instead of simple HTTP requests. Tools like Puppeteer or Playwright allow you to automate a real browser, execute JavaScript, and interact with dynamic content. Here's a simple example using Puppeteer: const puppeteer = require('puppeteer'); async function scrapeWithPuppeteer(url) { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto(url, { waitUntil: 'networkidle2' }); // Extract data using page.evaluate const quotes = await page.evaluate(() => { const results = []; document.querySelectorAll('div.quote').forEach(quote => { results.push({ text: quote.querySelector('span.text').textContent, author: quote.querySelector('small.author').textContent }); }); return results; }); console.log(quotes); await browser.close(); } Headless browsers can also be configured to use proxies, making them powerful tools for scraping complex websites while maintaining anonymity. Integrating a Scraping Service Using a scraping service like ScraperAPI offers several advantages over manual web scraping since it's designed to tackle all of the major problems you might face when scraping websites: Automatically handles common web scraping obstacles such as CAPTCHAs, JavaScript rendering, and IP blocks. Automatically handles proxies - proxy configuration, rotation, and much more. Instead of building your own scraping infrastructure, you can leverage ScraperAPI's pre-built solutions. This saves significant development time and resources that can be better spent on analyzing the scraped data. ScraperAPI offers various customization options such as geo-location targeting, custom headers, and asynchronous scraping. You can personalize the service to suit your specific scraping needs. Using a scraping API like ScraperAPI is often more cost-effective than building and maintaining your own scraping infrastructure. The pricing is based on usage, allowing you to scale up or down as needed. ScraperAPI allows you to scale your scraping efforts by handling millions of requests concurrently. To implement the ScraperAPI proxy into the scraping script you've created so far, there are just a few tweaks you need to make in the axios configuration. 
First of all, ensure you have created a free ScraperAPI account. That way, you'll have access to your API key, which will be necessary in the following steps. Once you get the API key, use it as a password in the axios proxy configuration from the previous section: // Send a GET request to the webpage with ScraperAPI proxy configuration axios.get(url, { method: 'GET', proxy: { host: 'proxy-server.scraperapi.com', port: 8001, auth: { username: 'scraperapi', password: 'YOUR_API_KEY' // Paste your API key here }, protocol: 'http' } }); And, that's it, all of your requests will be routed through the ScraperAPI proxy servers. But to use the full potential of a scraping service you'll have to configure it using the service's dashboard - ScraperAPI is no different here. It has a user-friendly dashboard where you can set up the web scraping process to best fit your needs. You can enable proxy or async mode, JavaScript rendering, set a region from where the requests will be sent, set your own HTTP headers, timeouts, and much more. And the best thing is that ScraperAPI automatically generates a script containing all of the scraper settings, so you can easily integrate the scraper into your codebase. Best Practices for Using Proxies in Web Scraping Not every proxy provider and its configuration are the same. So, it's important to know what proxy service to choose and how to configure it properly. Let's take a look at some tips and tricks to help you with that! Rotate Proxies Regularly Implement a proxy rotation strategy that changes the IP address after a certain number of requests or at regular intervals. This approach can mimic human browsing behavior, making it less likely for websites to flag your activities as suspicious. Handle Rate Limits Many websites enforce rate limits to prevent excessive scraping. To avoid hitting these limits, you can: Introduce Delays: Add random delays between requests to simulate human behavior. Monitor Response Codes: Track HTTP response codes to detect when you are being rate-limited. If you receive a 429 (Too Many Requests) response, pause your scraping for a while before trying again. Implement Exponential Backoff: Rather than using fixed delays, implement exponential backoff that increases wait time after each failed request, which is more effective at handling rate limits. Use Quality Proxies Choosing high-quality proxies is crucial for successful web scraping. Quality proxies, especially residential ones, are less likely to be detected and banned by target websites. That's why it's crucial to understand how to use residential proxies for your business, enabling you to find valuable leads while avoiding website bans. Using a mix of high-quality proxies can significantly enhance your chances of successful scraping without interruptions. Quality proxy services often provide a wide range of IP addresses from different regions, enabling you to bypass geo-restrictions and access localized content. A proxy extension for Chrome also helps manage these IPs easily through your browser, offering a seamless way to switch locations on the fly. Reliable proxy services can offer faster response times and higher uptime, which is essential when scraping large amounts of data. As your scraping needs grow, having access to a robust proxy service allows you to scale your operations without the hassle of managing your own infrastructure. 
Using a reputable proxy service often comes with customer support and maintenance, which can save you time and effort in troubleshooting issues related to proxies. Handling CAPTCHAs and Other Challenges CAPTCHAs and anti-bot mechanisms are some of the most common obstacles you'll encounter while scraping a web. Websites use CAPTCHAs to prevent automated access by trying to differentiate real humans and automated bots. They're achieving that by prompting the users to solve various kinds of puzzles, identify distorted objects, and so on. That can make it really difficult for you to automatically scrape data. Even though there are many both manual and automated CAPTCHA solvers available online, the best strategy for handling CAPTCHAs is to avoid triggering them in the first place. Typically, they are triggered when non-human behavior is detected. For example, a large amount of traffic, sent from a single IP address, using the same HTTP configuration is definitely a red flag! So, when scraping a website, try mimicking human behavior as much as possible: Add delays between requests and spread them out as much as you can. Regularly rotate between multiple IP addresses using a proxy service. Randomize HTTP headers and user agents. Maintain and use cookies appropriately, as many websites track user sessions. Consider implementing browser fingerprint randomization to avoid tracking. Beyond CAPTCHAs, websites often use sophisticated anti-bot measures to detect and block scraping. Some websites use JavaScript to detect bots. Tools like Puppeteer can simulate a real browser environment, allowing your scraper to execute JavaScript and bypass these challenges. Websites sometimes add hidden form fields or links that only bots will interact with. So, try avoiding clicking on hidden elements or filling out forms with invisible fields. Advanced anti-bot systems go as far as tracking user behavior, such as mouse movements or time spent on a page. Mimicking these behaviors using browser automation tools can help bypass these checks. But the simplest and most efficient way to handle CAPTCHAs and anti-bot measures will definitely be to use a service like ScraperAPI. Sending your scraping requests through ScraperAPI's API will ensure you have the best chance of not being blocked. When the API receives the request, it uses advanced machine learning techniques to determine the best request configuration to prevent triggering CAPTCHAs and other anti-bot measures. Conclusion As websites became more sophisticated in their anti-scraping measures, the use of proxies has become increasingly important in maintaining your scraping project successful. Proxies help you maintain anonymity, prevent IP blocking, and enable you to scale your scraping efforts without getting obstructed by rate limits or geo-restrictions. In this guide, we've explored the fundamentals of web scraping and the crucial role that proxies play in this process. We've discussed how proxies can help maintain anonymity, avoid IP blocks, and distribute requests to mimic natural user behavior. We've also covered the different types of proxies available, each with its own strengths and ideal use cases. We demonstrated how to set up a basic web scraper and integrate proxies into your scraping script. We also explored the benefits of using a dedicated scraping service like ScraperAPI, which can simplify many of the challenges associated with web scraping at scale. 
In the end, we covered the importance of carefully choosing the right type of proxy, rotating them regularly, handling rate limits, and leveraging scraping services when necessary. That way, you can ensure that your web scraping projects will be efficient, reliable, and sustainable. Remember that while web scraping can be a powerful data collection technique, it should always be done responsibly and ethically, with respect for website terms of service and legal considerations.
-
Using Proxies in Web Scraping – All You Need to Know
by: Leonardo Rodriguez Thu, 12 Sep 2024 13:23:00 GMT Introduction Web scraping typically refers to an automated process of collecting data from websites. On a high level, you're essentially making a bot that visits a website, detects the data you're interested in, and then stores it into some appropriate data structure, so you can easily analyze and access it later. However, if you're concerned about your anonymity on the Internet, you should probably take a little more care when scraping the web. Since your IP address is public, a website owner could track it down and, potentially, block it. So, if you want to stay as anonymous as possible, and prevent being blocked from visiting a certain website, you should consider using proxies when scraping the web. Proxies, also referred to as proxy servers, are specialized servers that enable you not to directly access the websites you're scraping. Rather, you'll be routing your scraping requests via a proxy server. That way, your IP address gets "hidden" behind the IP address of the proxy server you're using. This can help you both stay as anonymous as possible, as well as not being blocked, so you can keep scraping as long as you want. In this comprehensive guide, you'll get a grasp of the basics of web scraping and proxies, you'll see the actual, working example of scraping a website using proxies in Node.js. Afterward, we'll discuss why you might consider using existing scraping solutions (like ScraperAPI) over writing your own web scraper. At the end, we'll give you some tips on how to overcome some of the most common issues you might face when scraping the web. Web Scraping Web scraping is the process of extracting data from websites. It automates what would otherwise be a manual process of gathering information, making the process less time-consuming and prone to errors. That way you can collect a large amount of data quickly and efficiently. Later, you can analyze, store, and use it. The primary reason you might scrape a website is to obtain data that is either unavailable through an existing API or too vast to collect manually. It's particularly useful when you need to extract information from multiple pages or when the data is spread across different websites. There are many real-world applications that utilize the power of web scraping in their business model. The majority of apps helping you track product prices and discounts, find cheapest flights and hotels, or even find a job, use the technique of web scraping to gather the data that provides you the value. Web Proxies Imagine you're sending a request to a website. Usually, your request is sent from your machine (with your IP address) to the server that hosts a website you're trying to access. That means that the server "knows" your IP address and it can block you based on your geo-location, the amount of traffic you're sending to the website, and many more factors. But when you send a request through a proxy, it routes the request through another server, hiding your original IP address behind the IP address of the proxy server. This not only helps in maintaining anonymity but also plays a crucial role in avoiding IP blocking, which is a common issue in web scraping. By rotating through different IP addresses, proxies allow you to distribute your requests, making them appear as if they're coming from various users. This reduces the likelihood of getting blocked and increases the chances of successfully scraping the desired data. 
Types of Proxies Typically, there are four main types of proxy servers - datacenter, residential, rotating, and mobile. Each of them has its pros and cons, and based on that, you'll use them for different purposes and at different costs. Datacenter proxies are the most common and cost-effective proxies, provided by third-party data centers. They offer high speed and reliability but are more easily detectable and can be blocked by websites more frequently. Residential proxies route your requests through real residential IP addresses. Since they appear as ordinary user connections, they are less likely to be blocked but are typically more expensive. Rotating proxies automatically change the IP address after each request or after a set period. This is particularly useful for large-scale scraping projects, as it significantly reduces the chances of being detected and blocked. Mobile proxies use IP addresses associated with mobile devices. They are highly effective for scraping mobile-optimized websites or apps and are less likely to be blocked, but they typically come at a premium cost. Example Web Scraping Project Let's walk through a practical example of a web scraping project, and demonstrate how to set up a basic scraper, integrate proxies, and use a scraping service like ScraperAPI. Setting up Before you dive into the actual scraping process, it's essential to set up your development environment. For this example, we'll be using Node.js since it's well-suited for web scraping due to its asynchronous capabilities. We'll use Axios for making HTTP requests, and Cheerio to parse and manipulate HTML (that's contained in the response of the HTTP request). First, ensure you have Node.js installed on your system. If you don't have it, download and install it from nodejs.org. Then, create a new directory for your project and initialize it: $ mkdir my-web-scraping-project $ cd my-web-scraping-project $ npm init -y Finally, install Axios and Cheerio since they are necessary for you to implement your web scraping logic: $ npm install axios cheerio Simple Web Scraping Script Now that your environment is set up, let's create a simple web scraping script. We'll scrape a sample website to gather famous quotes and their authors. So, create a JavaScript file named sample-scraper.js and write all the code inside of it. Import the packages you'll need to send HTTP requests and manipulate the HTML: const axios = require('axios'); const cheerio = require('cheerio'); Next, create a wrapper function that will contain all the logic you need to scrape data from a web page. 
It accepts the URL of a website you want to scrape as an argument and returns all the quotes found on the page: // Function to scrape data from a webpage async function scrapeWebsite(url) { try { // Send a GET request to the webpage const response = await axios.get(url); // Load the HTML into cheerio const $ = cheerio.load(response.data); // Extract all elements with the class 'quote' const quotes = []; $('div.quote').each((index, element) => { // Extracting text from span with class 'text' const quoteText = $(element).find('span.text').text().trim(); // Assuming there's a small tag for the author const author = $(element).find('small.author').text().trim(); quotes.push({ quote: quoteText, author: author }); }); // Output the quotes console.log("Quotes found on the webpage:"); quotes.forEach((quote, index) => { console.log(`${index + 1}: "${quote.quote}" - ${quote.author}`); }); } catch (error) { console.error(`An error occurred: ${error.message}`); } } Note: All the quotes are stored in a separate div element with a class of quote. Each quote has its text and author - text is stored under the span element with the class of text, and the author is within the small element with the class of author. Finally, specify the URL of the website you want to scrape - in this case, https://quotes.toscrape.com, and call the scrapeWebsite() function: // URL of the website you want to scrape const url = 'https://quotes.toscrape.com'; // Call the function to scrape the website scrapeWebsite(url); All that's left for you to do is to run the script from the terminal: $ node sample-scraper.js Integrating Proxies To use a proxy with axios, you specify the proxy settings in the request configuration. The axios.get() method can include the proxy configuration, allowing the request to route through the specified proxy server. The proxy object contains the host, port, and optional authentication details for the proxy: // Send a GET request to the webpage with proxy configuration const response = await axios.get(url, { proxy: { host: proxy.host, port: proxy.port, auth: { username: proxy.username, // Optional: Include if your proxy requires authentication password: proxy.password, // Optional: Include if your proxy requires authentication }, }, }); Note: You need to replace these placeholders with your actual proxy details. 
Other than this change, the entire script remains the same: // Function to scrape data from a webpage async function scrapeWebsite(url) { try { // Send a GET request to the webpage with proxy configuration const response = await axios.get(url, { proxy: { host: proxy.host, port: proxy.port, auth: { username: proxy.username, // Optional: Include if your proxy requires authentication password: proxy.password, // Optional: Include if your proxy requires authentication }, }, }); // Load the HTML into cheerio const $ = cheerio.load(response.data); // Extract all elements with the class 'quote' const quotes = []; $('div.quote').each((index, element) => { // Extracting text from span with class 'text' const quoteText = $(element).find('span.text').text().trim(); // Assuming there's a small tag for the author const author = $(element).find('small.author').text().trim(); quotes.push({ quote: quoteText, author: author }); }); // Output the quotes console.log("Quotes found on the webpage:"); quotes.forEach((quote, index) => { console.log(`${index + 1}: "${quote.quote}" - ${quote.author}`); }); } catch (error) { console.error(`An error occurred: ${error.message}`); } } Integrating a Scraping Service Using a scraping service like ScraperAPI offers several advantages over manual web scraping since it's designed to tackle all of the major problems you might face when scraping websites: Automatically handles common web scraping obstacles such as CAPTCHAs, JavaScript rendering, and IP blocks. Automatically handles proxies - proxy configuration, rotation, and much more. Instead of building your own scraping infrastructure, you can leverage ScraperAPI's pre-built solutions. This saves significant development time and resources that can be better spent on analyzing the scraped data. ScraperAPI offers various customization options such as geo-location targeting, custom headers, and asynchronous scraping. You can personalize the service to suit your specific scraping needs. Using a scraping API like ScraperAPI is often more cost-effective than building and maintaining your own scraping infrastructure. The pricing is based on usage, allowing you to scale up or down as needed. ScraperAPI allows you to scale your scraping efforts by handling millions of requests concurrently. To implement the ScraperAPI proxy into the scraping script you've created so far, there are just a few tweaks you need to make in the axios configuration. First of all, ensure you have created a free ScraperAPI account. That way, you'll have access to your API key, which will be necessary in the following steps. Once you get the API key, use it as a password in the axios proxy configuration from the previous section: // Send a GET request to the webpage with ScraperAPI proxy configuration axios.get(url, { method: 'GET', proxy: { host: 'proxy-server.scraperapi.com', port: 8001, auth: { username: 'scraperapi', password: 'YOUR_API_KEY' // Paste your API key here }, protocol: 'http' } }); And that's it - all of your requests will be routed through the ScraperAPI proxy servers. But to use the full potential of a scraping service you'll have to configure it using the service's dashboard - ScraperAPI is no different here. It has a user-friendly dashboard where you can set up the web scraping process to best fit your needs. You can enable proxy or async mode, JavaScript rendering, set a region from where the requests will be sent, set your own HTTP headers, timeouts, and much more. And the best thing is that ScraperAPI automatically generates a script containing all of the scraper settings, so you can easily integrate the scraper into your codebase. Best Practices for Using Proxies in Web Scraping Not all proxy providers and configurations are equal. So, it's important to know what proxy service to choose and how to configure it properly. Let's take a look at some tips and tricks to help you with that! Rotate Proxies Regularly Implement a proxy rotation strategy that changes the IP address after a certain number of requests or at regular intervals. This approach can mimic human browsing behavior, making it less likely for websites to flag your activities as suspicious. Handle Rate Limits Many websites enforce rate limits to prevent excessive scraping. To avoid hitting these limits, you can: Introduce Delays: Add random delays between requests to simulate human behavior. Monitor Response Codes: Track HTTP response codes to detect when you are being rate-limited. If you receive a 429 (Too Many Requests) response, pause your scraping for a while before trying again. Implement Exponential Backoff: Rather than using fixed delays, increase the wait time after each failed request - this is more effective at handling rate limits (see the sketch below).
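To make the delay and backoff advice concrete, here is a minimal sketch of a request helper built around the same axios setup used earlier; the retry count and base delay are arbitrary illustrative values, not recommendations from the article:

// Hypothetical helper: random delays between requests plus exponential backoff on 429 responses
const axios = require('axios');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeGet(url, config = {}, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    // Small random pause before every request to avoid a fixed, bot-like rhythm
    await sleep(500 + Math.random() * 1500);
    try {
      return await axios.get(url, config);
    } catch (error) {
      const status = error.response && error.response.status;
      if (status === 429 && attempt < maxRetries) {
        // Exponential backoff: wait 1s, 2s, 4s, ... before retrying
        await sleep(baseDelayMs * 2 ** attempt);
        continue;
      }
      throw error; // give up on non-rate-limit errors or after the last retry
    }
  }
}

You could then call politeGet(url, { proxy: ... }) wherever the earlier examples call axios.get() directly.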
Use Quality Proxies Choosing high-quality proxies is crucial for successful web scraping. Quality proxies, especially residential ones, are less likely to be detected and banned by target websites. That's why it's important to understand how to use residential proxies for your business, enabling you to find valuable leads while avoiding website bans. Using a mix of high-quality proxies can significantly enhance your chances of successful scraping without interruptions. However, avoid open proxies - proxies that are publicly accessible without authentication. They are often slow, easily detected and banned, and may pose security risks, since many originate from hacked devices or misconfigured servers. Quality proxy services often provide a wide range of IP addresses from different regions, enabling you to bypass geo-restrictions and access localized content. Reliable proxy services can offer faster response times and higher uptime, which is essential when scraping large amounts of data. As your scraping needs grow, having access to a robust proxy service allows you to scale your operations without the hassle of managing your own infrastructure. Using a reputable proxy service often comes with customer support and maintenance, which can save you time and effort in troubleshooting issues related to proxies. Handling CAPTCHAs and Other Challenges CAPTCHAs and anti-bot mechanisms are some of the most common obstacles you'll encounter while scraping the web. Websites use CAPTCHAs to prevent automated access by trying to differentiate between real humans and automated bots. They do that by prompting users to solve various kinds of puzzles, identify distorted objects, and so on, which can make it really difficult for you to automatically scrape data. Even though many manual and automated CAPTCHA solvers are available online, the best strategy for handling CAPTCHAs is to avoid triggering them in the first place. Typically, they are triggered when non-human behavior is detected. For example, a large amount of traffic, sent from a single IP address, using the same HTTP configuration is definitely a red flag! So, when scraping a website, try mimicking human behavior as much as possible: Add delays between requests and spread them out as much as you can. Regularly rotate between multiple IP addresses using a proxy service. Randomize HTTP headers and user agents. Maintain and use cookies appropriately, as many websites track user sessions. Consider randomizing browser fingerprints to avoid tracking. Beyond CAPTCHAs, websites often use sophisticated anti-bot measures to detect and block scraping. Some websites use JavaScript to detect bots. Tools like Puppeteer can simulate a real browser environment, allowing your scraper to execute JavaScript and bypass these challenges.
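Here is a minimal Puppeteer sketch that collects the same quotes as the axios/cheerio script above; it assumes Puppeteer has been installed with npm install puppeteer:

const puppeteer = require('puppeteer');

// Scrape quotes with a real (headless) browser so any JavaScript on the page runs first
async function scrapeWithPuppeteer(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Run the extraction code inside the page context
  const quotes = await page.evaluate(() => {
    const results = [];
    document.querySelectorAll('div.quote').forEach((quote) => {
      results.push({
        text: quote.querySelector('span.text').textContent,
        author: quote.querySelector('small.author').textContent,
      });
    });
    return results;
  });

  console.log(quotes);
  await browser.close();
}

scrapeWithPuppeteer('https://quotes.toscrape.com');

Headless browsers can also be pointed at a proxy (for example, via Chromium's --proxy-server launch argument), so the anonymity benefits discussed above still apply.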
Websites sometimes add hidden form fields or links that only bots will interact with, so avoid clicking hidden elements or filling out forms with invisible fields. Advanced anti-bot systems go as far as tracking user behavior, such as mouse movements or time spent on a page; mimicking these behaviors with browser automation tools can help bypass such checks. But the simplest and most efficient way to handle CAPTCHAs and anti-bot measures is to use a service like ScraperAPI. Sending your scraping requests through ScraperAPI's API gives you the best chance of not being blocked: when the API receives a request, it uses machine learning techniques to determine the request configuration least likely to trigger CAPTCHAs and other anti-bot measures. Conclusion As websites have become more sophisticated in their anti-scraping measures, proxies have become increasingly important in keeping your scraping projects successful. Proxies help you maintain anonymity, prevent IP blocking, and enable you to scale your scraping efforts without being obstructed by rate limits or geo-restrictions. In this guide, we've explored the fundamentals of web scraping and the crucial role that proxies play in this process. We've discussed how proxies help maintain anonymity, avoid IP blocks, and distribute requests to mimic natural user behavior. We've also covered the different types of proxies available, each with its own strengths and ideal use cases. We demonstrated how to set up a basic web scraper and integrate proxies into your scraping script, and explored the benefits of using a dedicated scraping service like ScraperAPI, which can simplify many of the challenges associated with web scraping at scale. In the end, we covered the importance of carefully choosing the right type of proxy, rotating proxies regularly, handling rate limits, and leveraging scraping services when necessary. That way, you can ensure that your web scraping projects will be efficient, reliable, and sustainable. Remember that while web scraping can be a powerful data collection technique, it should always be done responsibly and ethically, with respect for website terms of service and legal considerations.
-
Building Community at the Workplace
by: Always Sia Strike 2024-07-31T23:29:54-07:00 My, the year is flying by. I haven’t written in a while - not for a lack of thoughts, but because of time, life, and the fact that I could probably be managing my time better, but oh :whale:. We’re back though - so let’s talk work community. During this year’s Black in Data Week, there was a question during my session about how to get to know people organically and ask questions without fear when you start a new job. After sharing what has worked for me, the lady with the question came back to me with positive feedback that all the ideas were helpful. I didn’t think anything of it until Wellington, one of my friendlies from the app whose mama named it Twitter, twote this and had me thinking: He’s so right. No one is going to care about your career more than you do. However, one of the people who can make the effort to drive your development is your manager. Wellington and I had an additional exchange in which he echoed how important community is. This brought me back to June and that lady’s question during BID week - so I thought to share, in a less ephemeral format, what building a community at work looks like. About Chasing Management Before I share some tips, one sword I always fall on is - chase great management. If you can afford to extend a job search because you think you could get a better manager than the one who is offering you a job, do it. Managers are like a great orchestra during a fancy event. You don’t think about the background music when it’s playing and you’re eating your food (this is what I imagine from all those movies :joy:). But you will KNOW if it’s bad because something will sound off and irk your ears. When you are flying high and your manager is unblocking things, providing you chances to contribute, and running a smooth operation, you hardly think of them when you wake up in the morning - you just do your job. But if they’re not good at what they do, you could wake up in the morning thinking “ugh - I gotta go work with/for this person?”. It changes the temperature in the room. So if you can afford an extra two weeks on a job search to ask questions and get the best available manager on the market, consider investing in your mentals for the long term :heavy_exclamation_mark: I’m sure you’re like yeah great, Sia - how do I do that? Well not to toot toot, but here are some questions I like asking to learn a bit more about my potential new culture. Additionally, listen to one of my favorite humans and leaders, Taylor Poindexter, in this episode of the Code Newbie podcast talking about creating psychological safety at work (shout out to Saron and the team!). Taylor has been one of my champions at work and such a great manager for her team - I’m always a little envious I’m not on it :pleading_face: but I digress. Keep winning, my girl! Additionally, I’ll start here a list of the best leaders I know - either from personal experience working with and/or for them, interviewing to work on their teams, or from secondhand knowledge via someone I trust with firsthand experience. As of this writing, they will be listed with a workplace they’re currently in and only if they publicly share it on the internet.
Taylor Poindexter, Engineering Manager II @ Spotify (Web, Full Stack Engineering) Angie Jones, Global VP of Developer Relations @ TBD/Block Kamana Sharma (Full Stack, Web, and Data Engineering) Nivia Henry, Director of Engineering Bryan Bischof (Data/ML/AI) Jasmine Vasandani (Data Science, Data Products) Dee Wolter (Accounting, Tax) Divya Narayanan (Engineering, ML) Dr. Russell Pierce (Data/ML/Computer Vision) Marlena Alhayani (Engineering) Andrew Cheong (Backend Engineering) - I’m still trying to convince him he’ll be the best leader ever, still an IC :joy: This is off the top of my head at 1:12am while watching a badminton match between the Spain and US women in the round of 16, so I may have forgotten someone, my bad - will keep revisiting and updating as I remember and learn about more humans I aspire to work with. Now the kinda maybe not so good news - you cannot control your manager situation all the time. Reorgs, layoffs, people advancing and leaving companies happen. And if you’ve had the privilege of working with great managers, they will leave because they are top of the line so everyone wants to work with them. That’s where community matters. You can’t put all your career development eggs in one managerial basket. Noooooow let’s talk about how you can do that!! (I know, loooong tangent, but we’re getting there). Building Community at Work (Finally :roll_eyes:) Let’s start with the (should be but not always) obvious here - you are building genuine relationships. They therefore can’t be transactional. This is about creating a sustainable community that carries the load together, and not giving you tips on how to be the tick that takes from everyone without giving back. With that,… Find onboarding buddies There are people who started working on the same day as you. They will likely have the most in common with you from a workplace perspective. If you happen to run into one of these folks, check in about what’s working and share tips that may have worked for you. When I first started working at my current job, I e-met Andy - a senior backend engineer. We chatted randomly in Slack the first few weeks while working on onboarding projects and found out that we would be working in sister orgs. Whenever I had questions, I’d ask him what he’s learning and every so often we’d “run into each other” in our team work slacks. Sometimes Andy would even help review PRs for me because I had to write Java, and ya girl does not live there. How sweet is that? Medium story short, that’s my work friend he a real good eng … you know the rest! Ask all the questions!! Remember that lady I told you about in the beginning? She had said (paraphrasing) Sia - I just got hired, how do I not look dumb asking questions when they just hired me? My response was they hired you for your skill on the market, not your knowledge of the company. You are expected to have a learning curve so take advantage of that to meet people by asking questions. If you have a Slack channel, activate those hidden helpers - they exist. You may know a lot about the coolest framework, but what about the review and release process? What about how requests for changes are handled? Maybe you see some code that seems off to you - it could be that it’s an intentional patch. The only way to know these idiosyncrasies is to ask.
I promise you someone else is also wondering, and by asking, you are: Making it less scary for others to ask. Increasing the knowledge sharing culture at your org/team/company. Learning faster than you would if you tried to be your own hero (there’s a place and time, don’t overdo it when you’re new and waste time recreating a wheel). One of the best pieces of feedback I ever received at a workplace was that my curiosity and pace of learning were so fast. And to keep asking the questions. I’m summarizing here but that note was detailed and written so beautifully, it made me cry :sob:. It came from one of my favorite people who I have a 1:1 with in a few hours and who started out as … my first interviewer! Who interviewed you? Remember Andrew from my list of favorite leaders above? That’s who wrote that tearjerking note (one of many by the way). He was the person who gave my first technical screen when I was applying for my current job. After I got hired, I reached out and thanked him and hoped we would cross paths. And from above, you know now that he is also one of the best Slack helpers ever. Whenever I ask a question and see “Andrew is typing…”, I grab some tea and a snack because I’m about to learn something soooo well, the experience needs to be savoured. That first note to say, hey, thank you for a great interview experience, I made it, has led to one of the best work siblings I’ve ever had. I also did the same with the recruiter and the engineering manager who did my behavioral interview. I should note - at my job, you don’t necessarily get interviewed by the teammates you’ll potentially work with. None of these folks have been my actual teammates, but we check in from time to time, and look out for each other. The manager was a machine learning engineering manager, Andrew is a backend person, I’m a data engineer - none of that matters. Community is multi-dimensional :heart: I got all my sister teams and me When you’re learning and onboarding, you get to meet your teammates and learn about your domain. It is likely your team is not working in a vacuum. Your customers are either other teams or external customers - which means you have to verify things with other teams to serve external customers. That’s a great way to form relationships. You are going to be seeing these folks a lot when you work together, so you may as well set up a 1:1 for 20 minutes to meet and greet. It may not go anywhere in the beginning, but as you work on different projects, your conversations add up, you learn about each other’s ways of working and values (subconsciously sometimes), and trade stories. It all adds up - that’s :sparkles: community :sparkles: Be nosy, Rosie Ok this last one is for the brave. As a hermit, I’m braver in writing vs in person so I use that to my advantage. This is an extension of asking all the questions beyond onboarding questions. You ever run into a document or see a presentation shared in a meeting, and you want to know more? You could reach out to the presenters and ask follow up questions, check in with your teammates about how said thing impacts/touches your team, or just learn something new that increases your t-shaped (breadth of) knowledge. Over time, this practice has a two-fold benefit. You get more context beyond your team which makes you more valuable in the long run because you end up living at the intersection of things and understand how everyone is connected.
For me, whenever I’m in a meeting and someone says “our team is working on changing system X to start doing Y”, I’m able to see how that change affects multiple systems and teams, if there are folks who are not aware of the change who should know about it to plan ahead, and also how it changes planning for your team. This leads us back to our community thing because… You inadvertently build community by becoming someone your teammates and other teams (even leaders!) trust to translate information between squads or assist in unblocking inter-team or inter-org efforts. This is how I’ve been able to keep people in mind when thinking of projects and in turn they do the same. It also helped me get promoted as far as I’m concerned (earlier this year). You see, reader, I switched managers and teams a few months before performance review season. And the people in the room deciding on promotions were never my managers. They were all folks from other teams that I’d worked on projects with and because of the curiosity of understanding our intersections and being able to contribute to connected work, they knew enough about me to put their names on paper and say get that girl a bonus, promo, and title upgrade. I appreciate them dearly :heart: So what did we learn? All these things boil down to Finding your tribe from common contexts Leading with gratitude and having a teamwork mindset Staying curious a.k.a always be learning Play the long game and don’t be transactional in your interactions. Works every time. So as we now watch the 1500M men’s qualifiers of track and field at 3:13am, I hope you keep driving the car on your career and finding your tribe wherever it is you land. And congratulations to all your favorite Olympians!!
-
How to Install Steam on Ubuntu 24.04
Even on Linux, you can enjoy gaming and interact with fellow gamers via Steam. For a Linux gamer, Steam is a handy game distribution platform that allows you to install different games, including purchased ones. Moreover, with Steam, you can connect with other gamers and play multiplayer titles. Steam is a cross-platform game distribution platform that lets gamers purchase and install games on any device through a Steam account. This post gives different options for installing Steam on Ubuntu 24.04. Different Methods of Installing Steam on Ubuntu 24.04 No matter the Ubuntu version that you use, there are three easy ways of installing Steam. For our guide, we are working on Ubuntu 24.04, and we’ve detailed the steps to follow for each method. Take a look! Method 1: Install Steam via Ubuntu Repository On your Ubuntu, Steam can be installed from the multiverse repository by following the steps below. Step 1: Add the Multiverse Repository The multiverse repository isn’t enabled on Ubuntu by default, but executing the following command will add it. $ sudo add-apt-repository multiverse Step 2: Refresh the Package Index After adding the new repository, we must refresh the package index before we can install Steam. $ sudo apt update Step 3: Install Steam Lastly, install Steam from the repository by running the APT command below. $ sudo apt install steam Method 2: Install Steam as a Snap Steam is available as a snap package, and you can install it through the Ubuntu 24.04 App Center or via the command line. To install it via the GUI, use the steps below. Step 1: Search for Steam on App Center On your Ubuntu, open the App Center and search for “Steam” in the search box. Different results will appear, and the first one is what we want to install. Step 2: Install Steam On the search results page, click on Steam to open a window showing a summary of its information. Locate the green Install button and click on it. You will be prompted to enter your password before the installation can begin. Once you do so, a window showing the progress bar of the installation process will appear. Once the process completes, you will have Steam installed and ready for use on your Ubuntu 24.04. Alternatively, if you prefer the command line over the App Center, you can install the same snap package using the snap command. Specify the package when running your command as shown below. $ sudo snap install steam The download and installation progress will be shown in the output, and once it completes, Steam will be available from your applications. You can open it and set it up for your gaming. Method 3: Download and Install the Steam Package Steam releases a .deb package for Linux, and you can download it to install Steam. Unlike the previous methods, this method requires downloading the Steam package from its website using a command-line utility such as wget or curl. Step 1: Install wget To download the Steam .deb package, we will use wget. You can skip this step if you already have it installed. Otherwise, execute the below command. $ sudo apt install wget Step 2: Download the Steam Package With wget installed, run the following command to download the Steam .deb package. $ wget https://steamcdn-a.akamaihd.net/client/installer/steam.deb Step 3: Install Steam To install the .deb package, we will use the dpkg command below. $ sudo dpkg -i steam.deb Once Steam completes installing, verify that you can access it by searching for it on your Ubuntu 24.04. With that, you now have Steam installed on Ubuntu.
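For quick reference, here is the repository-based installation (Method 1) as a single copy-paste sequence. This is only a sketch of the steps described above; on some Ubuntu releases the repository package is named steam-installer rather than steam, so adjust the package name if APT cannot find it.
$ sudo add-apt-repository multiverse
$ sudo apt update
$ sudo apt install steam
$ steam
If you instead used the .deb package from Method 3 and dpkg complains about missing dependencies, running sudo apt install -f afterwards should pull them in.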
Conclusion Steam is a handy tool for any gamer, and its cross-platform nature means you can install it on Ubuntu 24.04. We’ve given three installation methods you can use depending on your preference. Once you’ve installed Steam, configure it and create your account to start utilizing it. Happy gaming!
-
How to Configure Proxmox VE 8 for PCI/PCIE and NVIDIA GPU Passthrough
Proxmox VE 8 is one of the best open-source and free Type-I hypervisors out there for running QEMU/KVM virtual machines (VMs) and LXC containers. It has a nice web management interface and a lot of features. One of the most amazing features of Proxmox VE is that it can passthrough PCI/PCIE devices (i.e. an NVIDIA GPU) from your computer to Proxmox VE virtual machines (VMs). The PCI/PCIE passthrough is getting better and better with newer Proxmox VE releases. At the time of this writing, the latest version of Proxmox VE is Proxmox VE v8.1 and it has great PCI/PCIE passthrough support. In this article, I am going to show you how to configure your Proxmox VE 8 host/server for PCI/PCIE passthrough and configure your NVIDIA GPU for PCIE passthrough on Proxmox VE 8 virtual machines (VMs). Table of Contents Enabling Virtualization from the BIOS/UEFI Firmware of Your Motherboard Installing Proxmox VE 8 Enabling Proxmox VE 8 Community Repositories Installing Updates on Proxmox VE 8 Enabling IOMMU from the BIOS/UEFI Firmware of Your Motherboard Enabling IOMMU on Proxmox VE 8 Verifying if IOMMU is Enabled on Proxmox VE 8 Loading VFIO Kernel Modules on Proxmox VE 8 Listing IOMMU Groups on Proxmox VE 8 Checking if Your NVIDIA GPU Can Be Passthrough to a Proxmox VE 8 Virtual Machine (VM) Checking for the Kernel Modules to Blacklist for PCI/PCIE Passthrough on Proxmox VE 8 Blacklisting Required Kernel Modules for PCI/PCIE Passthrough on Proxmox VE 8 Configuring Your NVIDIA GPU to Use the VFIO Kernel Module on Proxmox VE 8 Passthrough the NVIDIA GPU to a Proxmox VE 8 Virtual Machine (VM) Still Having Problems with PCI/PCIE Passthrough on Proxmox VE 8 Virtual Machines (VMs)? Conclusion References Enabling Virtualization from the BIOS/UEFI Firmware of Your Motherboard Before you can install Proxmox VE 8 on your computer/server, you must enable the hardware virtualization feature of your processor from the BIOS/UEFI firmware of your motherboard. The process is different for different motherboards. So, if you need any assistance in enabling hardware virtualization on your motherboard, read this article. Installing Proxmox VE 8 Proxmox VE 8 is free to download, install, and use. Before you get started, make sure to install Proxmox VE 8 on your computer. If you need any assistance on that, read this article. Enabling Proxmox VE 8 Community Repositories Once you have Proxmox VE 8 installed on your computer/server, make sure to enable the Proxmox VE 8 community package repositories. By default, Proxmox VE 8 enterprise package repositories are enabled and you won’t be able to get/install updates and bug fixes from the enterprise repositories unless you have bought Proxmox VE 8 enterprise licenses. So, if you want to use Proxmox VE 8 for free, make sure to enable the Proxmox VE 8 community package repositories to get the latest updates and bug fixes from Proxmox for free. Installing Updates on Proxmox VE 8 Once you’ve enabled the Proxmox VE 8 community package repositories, make sure to install all the available updates on your Proxmox VE 8 server. Enabling IOMMU from the BIOS/UEFI Firmware of Your Motherboard The IOMMU configuration is found in different locations in different motherboards. To enable IOMMU on your motherboard, read this article. Enabling IOMMU on Proxmox VE 8 Once the IOMMU is enabled on the hardware side, you also need to enable IOMMU from the software side (from Proxmox VE 8). 
To enable IOMMU from Proxmox VE 8, you have to add kernel boot parameters that depend on your processor vendor:
Intel: intel_iommu=on iommu=pt
AMD: iommu=pt
To modify the kernel boot parameters of Proxmox VE 8, open the /etc/default/grub file with the nano text editor as follows: $ nano /etc/default/grub At the end of the GRUB_CMDLINE_LINUX_DEFAULT line, add the required kernel boot parameters for enabling IOMMU depending on the processor you’re using. As I am using an AMD processor, I have added only the kernel boot parameter iommu=pt at the end of the GRUB_CMDLINE_LINUX_DEFAULT line in the /etc/default/grub file. Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the /etc/default/grub file. Now, update the GRUB boot configurations with the following command: $ update-grub2 Once the GRUB boot configurations are updated, click on Reboot to restart your Proxmox VE 8 server for the changes to take effect. Verifying if IOMMU is Enabled on Proxmox VE 8 To verify whether IOMMU is enabled on Proxmox VE 8, run the following command: $ dmesg | grep -e DMAR -e IOMMU If IOMMU is enabled, you will see some output confirming that IOMMU is enabled. If IOMMU is not enabled, you may not see any output. You also need to have IOMMU Interrupt Remapping enabled for PCI/PCIE passthrough to work. To check if IOMMU Interrupt Remapping is enabled on your Proxmox VE 8 server, run the following command: $ dmesg | grep 'remapping' As you can see, IOMMU Interrupt Remapping is enabled on my Proxmox VE 8 server. NOTE: Most modern AMD and Intel processors will have IOMMU Interrupt Remapping enabled. If, for any reason, you don’t have IOMMU Interrupt Remapping enabled, there’s a workaround: you have to enable Unsafe Interrupts for VFIO. Read this article for more information on enabling Unsafe Interrupts on your Proxmox VE 8 server. Loading VFIO Kernel Modules on Proxmox VE 8 The PCI/PCIE passthrough is done mainly by the VFIO (Virtual Function I/O) kernel modules on Proxmox VE 8. The VFIO kernel modules are not loaded at boot time by default on Proxmox VE 8, but it’s easy to load them at boot time. First, open the /etc/modules-load.d/vfio.conf file with the nano text editor as follows: $ nano /etc/modules-load.d/vfio.conf Type in the following lines in the /etc/modules-load.d/vfio.conf file.
vfio
vfio_iommu_type1
vfio_pci
Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the changes. Now, update the initramfs of your Proxmox VE 8 installation with the following command: $ update-initramfs -u -k all Once the initramfs is updated, click on Reboot to restart your Proxmox VE 8 server for the changes to take effect. Once your Proxmox VE 8 server boots, you should see that all the required VFIO kernel modules are loaded. $ lsmod | grep vfio Listing IOMMU Groups on Proxmox VE 8 To passthrough PCI/PCIE devices on Proxmox VE 8 virtual machines (VMs), you will need to check the IOMMU groups of your PCI/PCIE devices quite frequently. To make checking for IOMMU groups easier, I decided to write a shell script (I got it from GitHub, but I can’t remember the name of the original poster) in the path /usr/local/bin/print-iommu-groups so that I can just run the print-iommu-groups command and it will print the IOMMU groups on the Proxmox VE 8 shell.
First, create a new file print-iommu-groups in the path /usr/local/bin and open it with the nano text editor as follows: $ nano /usr/local/bin/print-iommu-groups Type in the following lines in the print-iommu-groups file:
#!/bin/bash
shopt -s nullglob
for g in `find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V`; do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;
Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the changes to the print-iommu-groups file. Make the print-iommu-groups script file executable with the following command: $ chmod +x /usr/local/bin/print-iommu-groups Now, you can run the print-iommu-groups command as follows to print the IOMMU groups of the PCI/PCIE devices installed on your Proxmox VE 8 server: $ print-iommu-groups As you can see, the IOMMU groups of the PCI/PCIE devices installed on my Proxmox VE 8 server are printed. Checking if Your NVIDIA GPU Can Be Passthrough to a Proxmox VE 8 Virtual Machine (VM) To passthrough a PCI/PCIE device to a Proxmox VE 8 virtual machine (VM), it must be in its own IOMMU group. If 2 or more PCI/PCIE devices share an IOMMU group, you can’t passthrough any of the PCI/PCIE devices of that IOMMU group to any Proxmox VE 8 virtual machines (VMs). So, if your NVIDIA GPU and its audio device are in their own IOMMU group, you can passthrough the NVIDIA GPU to any Proxmox VE 8 virtual machine (VM). On my Proxmox VE 8 server, I am using an MSI X570 ACE motherboard paired with a Ryzen 3900X processor and a Gigabyte RTX 4070 NVIDIA GPU. According to the IOMMU groups of my system, I can passthrough the NVIDIA RTX 4070 GPU (IOMMU Group 21), RTL8125 2.5GbE Ethernet Controller (IOMMU Group 20), Intel I211 Gigabit Ethernet Controller (IOMMU Group 19), a USB 3.0 controller (IOMMU Group 24), and the Onboard HD Audio Controller (IOMMU Group 25). $ print-iommu-groups As the main focus of this article is configuring Proxmox VE 8 for passing through the NVIDIA GPU to Proxmox VE 8 virtual machines, what matters here is that the NVIDIA GPU and its audio device are in their own IOMMU group. Checking for the Kernel Modules to Blacklist for PCI/PCIE Passthrough on Proxmox VE 8 To passthrough a PCI/PCIE device to a Proxmox VE 8 virtual machine (VM), you must make sure that Proxmox VE forces it to use the VFIO kernel module instead of its original kernel module. To find out the kernel module your PCI/PCIE devices are using, you will need to know the vendor ID and device ID of these PCI/PCIE devices. You can find the vendor ID and device ID of the PCI/PCIE devices using the print-iommu-groups command. $ print-iommu-groups For example, the vendor and device ID of my NVIDIA RTX 4070 GPU is 10de:2786, and that of its audio device is 10de:22bc. To find the kernel module a PCI/PCIE device 10de:2786 (my NVIDIA RTX 4070 GPU) is using, run the lspci command as follows: $ lspci -v -d 10de:2786 As you can see, my NVIDIA RTX 4070 GPU is using the nvidiafb and nouveau kernel modules by default. So, it can’t be passed to a Proxmox VE 8 virtual machine (VM) at this point. The audio device of my NVIDIA RTX 4070 GPU is using the snd_hda_intel kernel module. So, it can’t be passed to a Proxmox VE 8 virtual machine at this point either. $ lspci -v -d 10de:22bc So, to passthrough my NVIDIA RTX 4070 GPU and its audio device to a Proxmox VE 8 virtual machine (VM), I must blacklist the nvidiafb, nouveau, and snd_hda_intel kernel modules and configure my NVIDIA RTX 4070 GPU and its audio device to use the vfio-pci kernel module.
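If you only need the vendor and device IDs of the GPU rather than the full IOMMU listing, a couple of targeted lspci invocations will do. This is just a convenience sketch; the 10de:xxxx IDs shown are the ones from my RTX 4070 and will differ on your hardware:
$ lspci -nn | grep -i nvidia     # list the NVIDIA functions with their [vendor:device] IDs
$ lspci -nnk -d 10de:2786        # GPU: shows the kernel driver currently in use
$ lspci -nnk -d 10de:22bc        # GPU audio function: shows its kernel driver
The -k flag prints the "Kernel driver in use" and "Kernel modules" lines, which is exactly the information you need when deciding what to blacklist.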
Blacklisting Required Kernel Modules for PCI/PCIE Passthrough on Proxmox VE 8 To blacklist kernel modules on Proxmox VE 8, open the /etc/modprobe.d/blacklist.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/blacklist.conf To blacklist the nouveau, nvidiafb, and snd_hda_intel kernel modules (needed for NVIDIA GPU passthrough), add the following lines in the /etc/modprobe.d/blacklist.conf file:
blacklist nouveau
blacklist nvidiafb
blacklist snd_hda_intel
Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the /etc/modprobe.d/blacklist.conf file. Configuring Your NVIDIA GPU to Use the VFIO Kernel Module on Proxmox VE 8 To configure a PCI/PCIE device (i.e. your NVIDIA GPU) to use the VFIO kernel module, you need to know its vendor and device IDs. In this case, the vendor and device IDs of my NVIDIA RTX 4070 GPU and its audio device are 10de:2786 and 10de:22bc. To configure your NVIDIA GPU to use the VFIO kernel module, open the /etc/modprobe.d/vfio.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/vfio.conf To configure your NVIDIA GPU and its audio device, whose <vendor-id>:<device-id> pairs are 10de:2786 and 10de:22bc in my case, to use the VFIO kernel module, add the following line to the /etc/modprobe.d/vfio.conf file.
options vfio-pci ids=10de:2786,10de:22bc
Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the /etc/modprobe.d/vfio.conf file. Now, update the initramfs of Proxmox VE 8 with the following command: $ update-initramfs -u -k all Once the initramfs is updated, click on Reboot to restart your Proxmox VE 8 server for the changes to take effect. Once your Proxmox VE 8 server boots, you should see that your NVIDIA GPU and its audio device (10de:2786 and 10de:22bc in my case) are using the vfio-pci kernel module. Now, your NVIDIA GPU is ready to be passed to a Proxmox VE 8 virtual machine. $ lspci -v -d 10de:2786 $ lspci -v -d 10de:22bc Passthrough the NVIDIA GPU to a Proxmox VE 8 Virtual Machine (VM) Now that your NVIDIA GPU is ready for passthrough on Proxmox VE 8 virtual machines (VMs), you can passthrough your NVIDIA GPU to your desired Proxmox VE 8 virtual machine and install the NVIDIA GPU drivers depending on the operating system that you’re using on that virtual machine as usual. For detailed information on how to passthrough your NVIDIA GPU on a Proxmox VE 8 virtual machine (VM) with different operating systems installed, read one of the following articles: How to Passthrough an NVIDIA GPU to a Windows 11 Proxmox VE 8 Virtual Machine (VM) How to Passthrough an NVIDIA GPU to a Ubuntu 24.04 LTS Proxmox VE 8 Virtual Machine (VM) How to Passthrough an NVIDIA GPU to a LinuxMint 21 Proxmox VE 8 Virtual Machine (VM) How to Passthrough an NVIDIA GPU to a Debian 12 Proxmox VE 8 Virtual Machine (VM) How to Passthrough an NVIDIA GPU to an Elementary OS 8 Proxmox VE 8 Virtual Machine (VM) How to Passthrough an NVIDIA GPU to a Fedora 39+ Proxmox VE 8 Virtual Machine (VM) How to Passthrough an NVIDIA GPU on an Arch Linux Proxmox VE 8 Virtual Machine (VM) How to Passthrough an NVIDIA GPU on a Red Hat Enterprise Linux 9 (RHEL 9) Proxmox VE 8 Virtual Machine (VM) Still Having Problems with PCI/PCIE Passthrough on Proxmox VE 8 Virtual Machines (VMs)?
Even after trying everything listed in this article correctly, if PCI/PCIE passthrough still does not work for you, be sure to try out some of the Proxmox VE PCI/PCIE passthrough tricks and/or workarounds that can get PCI/PCIE passthrough working on your hardware. Conclusion In this article, I have shown you how to configure your Proxmox VE 8 server for PCI/PCIE passthrough so that you can passthrough PCI/PCIE devices (i.e. your NVIDIA GPU) to your Proxmox VE 8 virtual machines (VMs). I have also shown you how to find out the kernel modules that you need to blacklist and how to blacklist them for a successful passthrough of your desired PCI/PCIE devices (i.e. your NVIDIA GPU) to a Proxmox VE 8 virtual machine. Finally, I have shown you how to configure those devices to use the VFIO kernel module, which is also an essential step for a successful passthrough to a Proxmox VE 8 virtual machine (VM). References PCI(e) Passthrough – Proxmox VE PCI Passthrough – Proxmox VE The ultimate gaming virtual machine on proxmox – YouTube
-
How to Install VirtualBox on Ubuntu 24.04
Anyone can easily run multiple operating systems on one host simultaneously, provided they have VirtualBox installed. Even on Ubuntu 24.04, you can install VirtualBox and utilize it to run any supported operating system. The best part about VirtualBox is that it is open-source virtualization software, and you can install and use it anytime. Whether you are stuck on how to install VirtualBox on Ubuntu 24.04 or looking to run other operating systems on top of your host, this post gives you two easy methods. Two Methods of Installing VirtualBox on Ubuntu 24.04 There are different ways of installing VirtualBox on Ubuntu 24.04. For instance, you can retrieve a stable VirtualBox version from Ubuntu’s repository or add Oracle’s VirtualBox repository to install a specific version. Which method to use will depend on your requirements, and we’ve discussed the methods in the sections below. Method 1: Install VirtualBox via APT The easiest way of installing VirtualBox on Ubuntu 24.04 is by sourcing it from the official Ubuntu repository using APT. Below are the steps you should follow. Step 1: Update the Repository In every installation, the first step involves refreshing the source list to update the package index by executing the following command. $ sudo apt update Step 2: Install VirtualBox Once you’ve updated your package index, the next task is to run the install command below to fetch and install the VirtualBox package. $ sudo apt install virtualbox Step 3: Verify the Installation After the installation, use the following command to check the installed version. The output also confirms that you successfully installed VirtualBox on Ubuntu 24.04. $ VBoxManage --version Method 2: Install VirtualBox from Oracle’s Repository The previous method shows that we installed VirtualBox version 7.0.14. However, if you visit the VirtualBox website, depending on when you read this post, it’s likely that the version we’ve installed is not the latest. Although older VirtualBox versions are okay, installing the latest version is always the better option as it contains the latest patches and fixes. However, to install the latest version, you must add Oracle’s repository to your Ubuntu before you can execute the install command. Step 1: Install Prerequisites All the dependencies you require before you can add the Oracle VirtualBox repository can be installed when you install the software-properties-common package. $ sudo apt install software-properties-common Step 2: Add GPG Keys GPG keys help verify the authenticity of repositories before we add them to the system. The Oracle repository is a third-party repository, and by installing its GPG key, the packages it provides can be checked for integrity and authenticity. Here’s how you add the GPG key. $ wget -q https://www.virtualbox.org/download/oracle_vbox_2016.asc -O- | sudo apt-key add - You will receive an output on your terminal showing that the key has been downloaded and installed. Step 3: Add Oracle’s VirtualBox Repository Oracle has a VirtualBox repository for all supported operating systems. To fetch this repository and add it to your /etc/apt/sources.list.d/, execute the following command. $ echo "deb [arch=amd64] https://download.virtualbox.org/virtualbox/debian $(lsb_release -cs) contrib" | sudo tee /etc/apt/sources.list.d/virtualbox.list The output shows that a new repository entry has been created from which we will source VirtualBox when we execute the install command.
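Note that apt-key is deprecated on recent Debian/Ubuntu releases and may be unavailable on Ubuntu 24.04. As a hedged alternative to Steps 2 and 3 above, you can store Oracle's key in a dedicated keyring and reference it with signed-by; the keyring filename below is my own choice, not an official path:
$ wget -qO- https://www.virtualbox.org/download/oracle_vbox_2016.asc | sudo gpg --dearmor -o /usr/share/keyrings/oracle-virtualbox.gpg
$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/oracle-virtualbox.gpg] https://download.virtualbox.org/virtualbox/debian $(lsb_release -cs) contrib" | sudo tee /etc/apt/sources.list.d/virtualbox.list
$ sudo apt update
$ apt list "virtualbox-*"     # shows which versioned packages (e.g. virtualbox-7.0, virtualbox-7.1) the repository provides
The last command is a handy way to confirm the exact package name before running the install command in Step 4 below.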
Step 4: Install VirtualBox With the repository added, let’s first refresh the package index by updating it. $ sudo apt update Next, specify which VirtualBox version you want to install using the syntax below. $ sudo apt install virtualbox-[version] For instance, if the latest version when reading this post is version 7.1, you would replace [version] in the above command with 7.1. However, ensure that the specified version is actually available in Oracle’s repository; otherwise, the installation will fail because the package can’t be found. Conclusion VirtualBox is an effective way of running numerous operating systems on one host simultaneously. This post shares two methods of installing VirtualBox on Ubuntu 24.04. First, you can install it via APT by sourcing it from the Ubuntu repository. Alternatively, you can add the Oracle repository and specify the exact version of VirtualBox you want to install.
-
Important Proxmox VE 8 PCI/PCIE Passthrough Tweaks, Fixes, and Workarounds
In recent years, support for PCI/PCIE passthrough (e.g. GPU passthrough) has improved a lot on newer hardware. So, the regular Proxmox VE PCI/PCIE and GPU passthrough guide should work on most new hardware. Still, you may face many problems passing through GPUs and other PCI/PCIE devices to a Proxmox VE virtual machine. There are many tweaks/fixes/workarounds for some of the common Proxmox VE GPU and PCI/PCIE passthrough problems. In this article, I am going to discuss some of the most common Proxmox VE PCI/PCIE passthrough and GPU passthrough problems and the steps you can take to solve those problems. Table of Contents What to do if IOMMU Interrupt Remapping is not Supported? What to do if My GPU (or PCI/PCIE Device) is not in its own IOMMU Group? How do I Blacklist AMD GPU Drivers on Proxmox VE? How do I Blacklist NVIDIA GPU Drivers on Proxmox VE? How do I Blacklist Intel GPU Drivers on Proxmox VE? How to Check if my GPU (or PCI/PCIE Device) is Using the VFIO Driver on Proxmox VE? I Have Blacklisted the AMD GPU Drivers, Still, the GPU is not Using the VFIO Driver, What to Do? I Have Blacklisted the NVIDIA GPU Drivers, Still, the GPU is not Using the VFIO Driver, What to Do? I Have Blacklisted the Intel GPU Drivers, Still, the GPU is not Using the VFIO Driver, What to Do? Single GPU Used VFIO Driver, But When Configured a Second GPU, it Didn’t Work, Why? Why Disable VGA Arbitration for the GPUs and How to Do It? What if my GPU is Still not Using the VFIO Driver Even After Configuring VFIO? GPU Passthrough Showed No Errors, But I’m Getting a Black Screen on the Monitor Connected to the GPU Passed to the Proxmox VE VM, Why? What is AMD Vendor Reset Bug and How to Solve it? How to Provide a vBIOS for the Passed GPU on a Proxmox VE Virtual Machine? What to do if Some Apps Crash the Proxmox VE Windows Virtual Machine? How to Solve HDMI Audio Crackling/Broken Problems on Proxmox VE Linux Virtual Machines? How to Update Proxmox VE initramfs? How to Update Proxmox VE GRUB Bootloader? Conclusion References What to do if IOMMU Interrupt Remapping is not Supported? For PCI/PCIE passthrough, IOMMU interrupt remapping is essential. To check whether your processor supports IOMMU interrupt remapping, run the command below: $ dmesg | grep -i remap If your processor supports IOMMU interrupt remapping, you will see some sort of output confirming that interrupt remapping is enabled. Otherwise, you will see no output. If IOMMU interrupt remapping is not supported on your processor, you will have to configure unsafe interrupts on your Proxmox VE server to passthrough PCI/PCIE devices on Proxmox VE virtual machines. To configure unsafe interrupts on Proxmox VE, create a new file iommu_unsafe_interrupts.conf in the /etc/modprobe.d directory and open it with the nano text editor as follows: $ nano /etc/modprobe.d/iommu_unsafe_interrupts.conf Add the following line in the iommu_unsafe_interrupts.conf file and press <Ctrl> + X followed by Y and <Enter> to save the file. options vfio_iommu_type1 allow_unsafe_interrupts=1 Once you’re done, you must update the initramfs of your Proxmox VE server. What to do if my GPU (or PCI/PCIE Device) is not in its own IOMMU Group? If your server has multiple PCI/PCIE slots, you can move the GPU to a different PCI/PCIE slot and see if the GPU is in its own IOMMU group.
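A quick way to check which IOMMU group your GPU ended up in after moving it is a rough sketch like the one below; 0b:00.0 is just a placeholder PCI address, so substitute the address lspci reports for your GPU:
$ lspci | grep -i vga                                      # note the PCI address of the GPU, e.g. 0b:00.0
$ find /sys/kernel/iommu_groups/ -type l | grep 0b:00.0    # shows which IOMMU group directory the GPU belongs to
If that group contains only the GPU and its audio function, the GPU is isolated well enough for passthrough.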
If that does not work, you can try enabling the ACS override kernel patch on Proxmox VE. To do that, open the /etc/default/grub file with the nano text editor as follows: $ nano /etc/default/grub Add the kernel boot option pcie_acs_override=downstream at the end of the GRUB_CMDLINE_LINUX_DEFAULT line. Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the file and make sure to update the Proxmox VE GRUB bootloader for the changes to take effect. You should have better IOMMU grouping once your Proxmox VE server boots. If your GPU still does not have its own IOMMU group, you can go one step further by using pcie_acs_override=downstream,multifunction instead. You should have even better IOMMU grouping. If pcie_acs_override=downstream,multifunction results in better IOMMU grouping than pcie_acs_override=downstream, then why use pcie_acs_override=downstream at all? Well, the purpose of PCIE ACS override is to fool the kernel into thinking that the PCIE devices are isolated when they are not in reality. So, PCIE ACS override comes with security and stability issues. That’s why you should try the less aggressive PCIE ACS override option pcie_acs_override=downstream first and see if your problem is solved. If pcie_acs_override=downstream does not work, only then should you use the more aggressive option pcie_acs_override=downstream,multifunction. How do I Blacklist AMD GPU Drivers on Proxmox VE? If you want to passthrough an AMD GPU on Proxmox VE virtual machines, you must blacklist the AMD GPU drivers and make sure that the GPU uses the VFIO driver instead. First, open the /etc/modprobe.d/blacklist.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/blacklist.conf To blacklist the AMD GPU drivers, add the following lines to the /etc/modprobe.d/blacklist.conf file and press <Ctrl> + X followed by Y and <Enter> to save the file.
blacklist radeon
blacklist amdgpu
Once you’re done, you must update the initramfs of your Proxmox VE server for the changes to take effect. How do I Blacklist NVIDIA GPU Drivers on Proxmox VE? If you want to passthrough an NVIDIA GPU on Proxmox VE virtual machines, you must blacklist the NVIDIA GPU drivers and make sure that the GPU uses the VFIO driver instead. First, open the /etc/modprobe.d/blacklist.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/blacklist.conf To blacklist the NVIDIA GPU drivers, add the following lines to the /etc/modprobe.d/blacklist.conf file and press <Ctrl> + X followed by Y and <Enter> to save the file.
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
Once you’re done, you must update the initramfs of your Proxmox VE server for the changes to take effect. How do I Blacklist Intel GPU Drivers on Proxmox VE? If you want to passthrough an Intel GPU on Proxmox VE virtual machines, you must blacklist the Intel GPU drivers and make sure that the GPU uses the VFIO driver instead. First, open the /etc/modprobe.d/blacklist.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/blacklist.conf To blacklist the Intel GPU drivers, add the following lines to the /etc/modprobe.d/blacklist.conf file and press <Ctrl> + X followed by Y and <Enter> to save the file.
blacklist snd_hda_intel
blacklist snd_hda_codec_hdmi
blacklist i915
Once you’re done, you must update the initramfs of your Proxmox VE server for the changes to take effect. How to Check if my GPU (or PCI/PCIE Device) is Using the VFIO Driver on Proxmox VE?
To check if your GPU or desired PCI/PCIE devices are using the VFIO driver, run the following command: $ lspci -v If your GPU or PCI/PCIE device is using the VFIO driver, you should see the line Kernel driver in use: vfio-pci as marked in the screenshot below. I Have Blacklisted the AMD GPU Drivers, Still, the GPU is not Using the VFIO Driver, What to Do? At times, blacklisting the AMD GPU drivers is not enough; you also have to configure the AMD GPU drivers to load after the VFIO driver. To do that, open the /etc/modprobe.d/vfio.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/vfio.conf To configure the AMD GPU drivers to load after the VFIO driver, add the following lines to the /etc/modprobe.d/vfio.conf file and press <Ctrl> + X followed by Y and <Enter> to save the file.
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
Once you’re done, you must update the initramfs of your Proxmox VE server for the changes to take effect. I Have Blacklisted the NVIDIA GPU Drivers, Still, the GPU is not Using the VFIO Driver, What to Do? At times, blacklisting the NVIDIA GPU drivers is not enough; you also have to configure the NVIDIA GPU drivers to load after the VFIO driver. To do that, open the /etc/modprobe.d/vfio.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/vfio.conf To configure the NVIDIA GPU drivers to load after the VFIO driver, add the following lines to the /etc/modprobe.d/vfio.conf file and press <Ctrl> + X followed by Y and <Enter> to save the file.
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidiafb pre: vfio-pci
softdep nvidia_drm pre: vfio-pci
softdep drm pre: vfio-pci
Once you’re done, you must update the initramfs of your Proxmox VE server for the changes to take effect. I Have Blacklisted the Intel GPU Drivers, Still, the GPU is not Using the VFIO Driver, What to Do? At times, blacklisting the Intel GPU drivers is not enough; you also have to configure the Intel GPU drivers to load after the VFIO driver. To do that, open the /etc/modprobe.d/vfio.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/vfio.conf To configure the Intel GPU drivers to load after the VFIO driver, add the following lines to the /etc/modprobe.d/vfio.conf file and press <Ctrl> + X followed by Y and <Enter> to save the file.
softdep snd_hda_intel pre: vfio-pci
softdep snd_hda_codec_hdmi pre: vfio-pci
softdep i915 pre: vfio-pci
Once you’re done, you must update the initramfs of your Proxmox VE server for the changes to take effect. Single GPU Used VFIO Driver, But When Configured a Second GPU, it Didn’t Work, Why? In the /etc/modprobe.d/vfio.conf file, you must add the IDs of all the PCI/PCIE devices that you want to use the VFIO driver in a single line. One device per line won’t work. For example, if you have 2 GPUs that you want to configure to use the VFIO driver, you must add their IDs in a single line in the /etc/modprobe.d/vfio.conf file as follows: options vfio-pci ids=<GPU-1>,<GPU-1-Audio>,<GPU-2>,<GPU-2-Audio> If you want to add another GPU to the list, just append it at the end of the existing vfio-pci line in the /etc/modprobe.d/vfio.conf file as follows: options vfio-pci ids=<GPU-1>,<GPU-1-Audio>,<GPU-2>,<GPU-2-Audio>,<GPU-3>,<GPU-3-Audio> Never split the list over multiple lines as in the example below. Although it looks much cleaner, it won’t work. I do wish we could specify PCI/PCIE IDs this way.
options vfio-pci ids=<GPU-1>,<GPU-1-Audio> options vfio-pci ids=<GPU-2>,<GPU-2-Audio> options vfio-pci ids=<GPU-3>,<GPU-3-Audio> Why Disable VGA Arbitration for the GPUs and How to Do It? If you’re using UEFI/OVMF BIOS on the Proxmox VE virtual machine where you want to passthrough the GPU, you can disable VGA arbitration which will reduce the legacy codes required during boot. To disable VGA arbitration for the GPUs, add disable_vga=1 at the end of the vfio-pci option in the /etc/modprobe.d/vfio.conf file as shown below: options vfio-pci ids=<GPU-1>,<GPU-1-Audio>,<GPU-2>,<GPU-2-Audio> disable_vga=1 What if my GPU is Still not Using the VFIO Driver Even After Configuring VFIO? Even after doing everything correctly, if your GPU still does not use the VFIO driver, you will need to try booting Proxmox VE with kernel options that disable the video framebuffer. On Proxmox VE 7.1 and older, the nofb nomodeset video=vesafb:off video=efifb:off video=simplefb:off kernel options disable the GPU framebuffer for your Proxmox VE server. On Proxmox VE 7.2 and newer, the initcall_blacklist=sysfb_init kernel option does a better job at disabling the GPU framebuffer for your Proxmox VE server. Open the GRUB bootloader configuration file /etc/default/grub file with the nano text editor with the following command: $ nano /etc/default/grub Add the kernel option initcall_blacklist=sysfb_init at the end of the GRUB_CMDLINE_LINUX_DEFAULT. Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the file and make sure to update the Proxmox VE GRUB bootloader for the changes to take effect. GPU Passthrough Showed No Errors, But I’m Getting a Black Screen on the Monitor Connected to the GPU Passed to the Proxmox VE VM, Why? Once you’ve passed a GPU to a Proxmox VE virtual machine, make sure to use the Default Graphics card before you start the virtual machine. This way, you will be able to access the display of the virtual machine from the Proxmox VE web management UI, download the GPU driver installer on the virtual machine, and install it on the virtual machine. Once the GPU driver is installed on the virtual machine, the screen of the virtual machine will be displayed on the monitor connected to the GPU that you’ve passed to the virtual machine as well. Once the GPU driver is installed on the virtual machine and the screen of the virtual machine is displayed on the monitor connected to the GPU (passed to the virtual machine), power off the virtual machine and set the Display Graphic card of the virtual machine to none. Once you’re set, the next time you power on the virtual machine, the screen of the virtual machine will be displayed on the monitor connected to the GPU (passed to the virtual machine) only, nothing will be displayed on the Proxmox VE web management UI. This way, you will have the same experience as using a real computer even though you’re using a virtual machine. Remember, never use SPICE, VirtIO GPU, and VirGL GPU Display Graphic card on the Proxmox VE virtual machine that you’re configuring for GPU passthrough as it has a high chance of failure. What is AMD Vendor Reset Bug and How to Solve it? AMD GPUs have a well-known bug called “vendor reset bug”. Once an AMD GPU is passed to a Proxmox VE virtual machine, and you power off this virtual machine, you won’t be able to use the AMD GPU in another Proxmox VE virtual machine. At times, your Proxmox VE server will become unresponsive as a result. This is called the “vendor reset bug” of AMD GPUs. 
The reason this happens is that AMD GPUs can’t reset themselves correctly after being passed to a virtual machine. To fix this problem, you will have to reset your AMD GPU properly. For more information on installing the AMD vendor reset on Proxmox VE, read this article and read this thread on Proxmox VE forum. Also, check the vendor reset GitHub page. How to Provide a vBIOS for the Passed GPU on a Proxmox VE Virtual Machine? If you’ve installed the GPU on the first slot of your motherboard, you might not be able to passthrough the GPU in a Proxmox VE virtual machine by default. Some motherboards shadow the vBIOS of the GPU installed on the first slot by default which is the reason the GPU installed on the first slot of those motherboards can’t be passed to virtual machines. The solution to this problem is to install the GPU on the second slot of the motherboard, extract the vBIOS of the GPU, install the GPU on the first slot of the motherboard, and passthrough the GPU to a Proxmox VE virtual machine along with the extracted vBIOS of the GPU. NOTE: To learn how to extract the vBIOS of your GPU, read this article. Once you’ve obtained the vBIOS for your GPU, you must store the vBIOS file in the /usr/share/kvm/ directory of your Proxmox VE server to access it. Once the vBIOS file for your GPU is stored in the /usr/share/kvm/ directory, you need to configure your virtual machine to use it. Currently, there is no way to specify the vBIOS file for PCI/PCIE devices of Proxmox VE virtual machines from the Proxmox VE web management UI. So, you will have to do everything from the Proxmox VE shell/command-line. You can find the Proxmox VE virtual machine configuration files in the /etc/pve/qemu-server/ directory of your Proxmox VE server. Each Proxmox VE virtual machine has one configuration file in this directory in the format <VM-ID>.conf. For example, to open the Proxmox VE virtual machine configuration file (for editing) for the virtual machine ID 100, you will need to run the following command: $ nano /etc/pve/qemu-server/100.conf In the virtual machine configuration file, you will need to append romfile=<vBIOS-filename> in the hostpciX line which is responsible for passing the GPU on the virtual machine. For example, if the vBIOS filename for my GPU is gigabyte-nvidia-1050ti.bin, and I have passed the GPU on the first slot (slot 0) of the virtual machine (hostpci0), then in the 100.conf file, the line should be as follows: hostpci0: <PCI-ID-of-GPU>,x-vga=on,romfile=gigabyte-nvidia-1050ti.bin Once you’re done, save the virtual machine configuration file by pressing <Ctrl> + X followed by Y and <Enter>, start the virtual machine, and check if the GPU passthrough is working. What to do if Some Apps Crash the Proxmox VE Windows Virtual Machine? Some apps such as GeForce Experience, Passmark, etc. might crash Proxmox VE Windows virtual machines. You might also experience a sudden blue screen of death (BSOD) on your Proxmox VE Windows virtual machines. The reason it happens is that the Windows virtual machine might try to access the model-specific registers (MSRs) that are not actually available and depending on how your hardware handles MSRs requests, your system might crash. The solution to this problem is ignoring MSRs messages on your Proxmox VE server. 
To configure MSRs on your Proxmox VE server, open the /etc/modprobe.d/kvm.conf file with the nano text editor as follows: $ nano /etc/modprobe.d/kvm.conf To ignore MSRs on your Proxmox VE server, add the following line to the /etc/modprobe.d/kvm.conf file. options kvm ignore_msrs=1 Once MSRs are ignored, you might see a lot of MSR warning messages in your dmesg system log. To avoid that, you can ignore MSRs as well as disable logging of MSR warning messages by adding the following line instead: options kvm ignore_msrs=1 report_ignored_msrs=0 Once you’re done, press <Ctrl> + X followed by Y and <Enter> to save the /etc/modprobe.d/kvm.conf file and update the initramfs of your Proxmox VE server for the changes to take effect. How to Solve HDMI Audio Crackling/Broken Problems on Proxmox VE Linux Virtual Machines? If you’ve passed the GPU to a Linux Proxmox VE virtual machine and you’re getting bad audio quality on the virtual machine, you will need to enable MSI (Message Signaled Interrupts) for the audio device on the Proxmox VE virtual machine. To enable MSI on the Linux Proxmox VE virtual machine, open the /etc/modprobe.d/snd-hda-intel.conf file with the nano text editor on the virtual machine with the following command: $ sudo nano /etc/modprobe.d/snd-hda-intel.conf Add the following line and save the file by pressing <Ctrl> + X followed by Y and <Enter>. options snd-hda-intel enable_msi=1 For the changes to take effect, reboot the Linux virtual machine with the following command: $ sudo reboot Once the virtual machine boots, check if MSI is enabled for the audio device with the following command: $ sudo lspci -vv If MSI is enabled for the audio device on the virtual machine, you should see the marked line in the audio device information. How to Update Proxmox VE initramfs? Every time you make any changes to files in the /etc/modules-load.d/ and /etc/modprobe.d/ directories, you must update the initramfs of your Proxmox VE 8 installation with the following command: $ update-initramfs -u -k all Once the Proxmox VE initramfs is updated, reboot your Proxmox VE server for the changes to take effect. $ reboot How to Update Proxmox VE GRUB Bootloader? Every time you update the Proxmox VE GRUB boot configuration file /etc/default/grub, you must update the GRUB bootloader for the changes to take effect. To update the Proxmox VE GRUB bootloader with the new configurations, run the following command: $ update-grub2 Once the GRUB bootloader is updated with the new configuration, reboot your Proxmox VE server for the changes to take effect. $ reboot Conclusion In this article, I have discussed some of the most common Proxmox VE PCI/PCIE passthrough and GPU passthrough problems and the steps you can take to solve those problems. References [TUTORIAL] – PCI/GPU Passthrough on Proxmox VE 8 : Installation and configuration | Proxmox Support Forum Ultimate Beginner’s Guide to Proxmox GPU Passthrough Reading and Writing Model Specific Registers in Linux The MSI Driver Guide HOWTO — The Linux Kernel documentation
-
How to Install Proxmox VE 8 on Your Server
Proxmox VE (Virtualization Environment) is an open-source enterprise virtualization and containerization platform. It has a built-in user-friendly web interface for managing virtual machines and LXC containers. It has other features such as Ceph software-defined storage (SDS), software-defined networking (SDN), high availability (HA) clustering, and many more. After the recent Broadcom acquisition of VMware, the cost of VMware products has risen to the point that many small to medium-sized companies are/will be forced to switch to alternate products. Even the free VMware ESXi is discontinued which is bad news for homelab users. Proxmox VE is one of the best alternatives to VMware vSphere and it has the same set of features as VMware vSphere (with a few exceptions of course). Proxmox VE is open-source and free, which is great for home labs as well as businesses. Proxmox VE also has an optional enterprise subscription option that you can purchase if needed. In this article, I will show you how to install Proxmox VE 8 on your server. I will cover Graphical UI-based installation methods of Proxmox VE and Terminal UI-based installation for systems having problems with the Graphical UI-based installer. Table of Contents Booting Proxmox VE 8 from a USB Thumb Drive Installing Proxmox VE 8 using Graphical UI Installing Proxmox VE 8 using Terminal UI Accessing Proxmox VE 8 Management UI from a Web Browser Enabling Proxmox VE Community Package Repositories Keeping Proxmox VE Up-to-date Conclusion References Booting Proxmox VE 8 from a USB Thumb Drive First, you need to download the Proxmox VE 8 ISO image and create a bootable USB thumb drive of Proxmox VE 8. If you need any assistance on that, read this article. Once you’ve created a bootable USB thumb drive of Proxmox VE 8, power off your server, insert the bootable USB thumb drive on your server, and boot the Proxmox VE 8 installer from it. Depending on the motherboard manufacturer, you need to press a certain key after pressing the power button to boot from the USB thumb drive. If you need any assistance on booting your server from a USB thumb drive, read this article. Once you’ve successfully booted from the USB thumb drive, the Proxmox VE GRUB menu should be displayed. Installing Proxmox VE 8 using Graphical UI To install Proxmox VE 8 using a graphical user interface, select Install Proxmox VE (Graphical) from the Proxmox VE GRUB menu and press <Enter>. The Proxmox VE installer should be displayed. Click on I agree. Now, you have to configure the disk for the Proxmox VE installation. You can configure the disk for Proxmox VE installation in different ways: If you have a single 500GB/1TB (or larger capacity) SSD/HDD on your server, you can use it for Proxmox VE installation as well as storing virtual machine images, container images, snapshots, backups, ISO images, and so on. That’s not very safe, but you can try out Proxmox this way without needing a lot of hardware resources. You can use a small 64GB or 128GB SSD for Proxmox VE installation only. Once Proxmox VE is installed, you can create additional storage pools for storing virtual machine images, container images, snapshots, backups, ISO images, and so on. You can create a big ZFS or BTRFS RAID for Proxmox VE installation which will also be used for storing virtual machine images, container images, snapshots, backups, ISO images, and so on. 
a) To install Proxmox VE on a single SSD/HDD and also use the SSD/HDD for storing virtual machine and container images, ISO images, virtual machine and container snapshots, virtual machine and container backups, etc., select the SSD/HDD from the Target Harddisk dropdown menu[1] and click on Next[2]. Proxmox VE will use a small portion of the free disk space for the Proxmox VE root filesystem and the rest of the disk space will be used for storing virtual machine and container data. If you want to change the filesystem of your Proxmox VE installation or configure the size of different Proxmox VE partitions/storages, select the HDD/SSD you want to use for your Proxmox VE installation from the Target Harddisk dropdown menu and click on Options. An advanced disk configuration window should be displayed. From the Filesystem dropdown menu, select your desired filesystem. ext4 and xfs filesystems are supported for single-disk Proxmox VE installation at the time of this writing[1]. Other storage configuration parameters are: hdsize[2]: By default Proxmox VE will use all the disk space of the selected HDD/SSD. To keep some disk space free on the selected HDD/SSD, type in the amount of disk space (in GB) that you want Proxmox VE to use and the rest of the disk space should be free. swapsize[3]: By default, Proxmox VE will use 4GB to 8GB of disk space for swap depending on the amount of memory/RAM you have installed on the server. To set a custom swap size for Proxmox VE, type in your desired swap size (in GB unit) here. maxroot[4]: Defines the maximum disk space to use for the Proxmox VE LVM root volume/filesystem. minfree[5]: Defines the minimum disk space that must be free in the Proxmox VE LVM volume group (VG). This space will be used for LVM snapshots. maxvz[6]: Defines the maximum disk space to use for the Proxmox VE LVM data volume where virtual machine and container data/images will be stored. Once you’re done with the disk configuration, click on OK[7]. To install Proxmox VE on disk with your desired storage configuration, click on Next. b) To install Proxmox VE on a small SSD and create the necessary storage for the virtual machine and container data later, select the SSD from the Target Harddisk dropdown menu[1] and click on Options[2]. Set maxvz to 0 to disable virtual machine and container storage on the SSD where Proxmox VE will be installed and click on OK. Once you’re done, click on Next. c) To create a ZFS or BTRFS RAID and install Proxmox VE on the RAID, click on Options. You can pick different ZFS and BTRFS RAID types from the Filesystem dropdown menu. Each of these RAID types works differently and requires a different number of disks. For more information on how different RAID types work, their requirements, features, data safety, etc, read this article. RAID0, RAID1, and RAID10 are discussed in this article thoroughly. RAIDZ-1 and RAIDZ-2 work in the same way as RAID5 and RAID6 respectively. RAID5 and RAID6 are also discussed in this article. RAIDZ-1 requires at least 2 disks (3 disks recommended), uses a single parity, and can sustain only 1 disk failure. RAIDZ-2 requires at least 3 disks (4 disks recommended), uses double parity, and can sustain 2 disks failure. RAIDZ-3 requires at least 4 disks (5 disks recommended), uses triple parity, and can sustain 3 disks failure. Although you can create BTRFS RAIDs on Proxmox VE, at the time of this writing, BTRFS on Proxmox VE is still in technology preview. So, I don’t recommend using it in production systems. 
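If you do go with a ZFS RAID, it is easy to sanity-check the resulting pool once the installation described below has finished and the server has booted. A quick sketch, assuming a ZFS installation; rpool is the pool name the Proxmox VE installer creates by default:
$ zpool status rpool     # shows the RAID layout and whether all member disks are online
$ zfs list               # shows the datasets Proxmox VE created on the pool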
I will demonstrate ZFS RAID configuration on Proxmox VE in this article. To create a ZFS RAID for Proxmox VE installation, select your desired ZFS RAID type from the Filesystem dropdown menu[1]. From the Disk Setup tab, select the disks that you want to use for the ZFS RAID using the Harddisk X dropdown menus[2]. If you don’t want to use a disk for the ZFS RAID, select – do not use – from the respective Harddisk X dropdown menu[3]. From the Advanced Options tab, you can configure different ZFS filesystem parameters. ashift[1]: You can set the ZFS block size using this option. The block size is calculated using the formula 2^ashift. The default ashift value is 12, which gives 2^12 = 4096 bytes = 4 KB block size. A 4 KB block size is good for SSDs. If you’re using a mechanical hard drive (HDD), you need to set ashift to 9 (2^9 = 512 bytes) as HDDs use a 512-byte block size. compress[2]: You can enable/disable ZFS compression from this dropdown menu. To enable compression, set compression to on. To disable compression, set compression to off. When compression is on, the default ZFS compression algorithm (lz4 at the time of this writing) is used. You can select other ZFS compression algorithms (e.g. lzjb, zle, gzip, zstd) as well if you have such preferences. checksum[3]: ZFS checksums are used to detect corrupted files so that they can be repaired. You can enable/disable ZFS checksums from this dropdown menu. To enable ZFS checksums, set checksum to on. To disable ZFS checksums, set checksum to off. When checksum is on, the fletcher4 algorithm is used for non-deduped (deduplication disabled) datasets and the sha256 algorithm is used for deduped (deduplication enabled) datasets by default. copies[4]: You can set the number of redundant copies of the data you want to keep in your ZFS RAID. This is in addition to the RAID-level redundancy and provides extra data protection. The default number of copies is 1, and you can store at most 3 copies of data in your ZFS RAID. This feature is also known as ditto blocks. ARC max size[5]: You can set the maximum amount of memory ZFS is allowed to use for the Adaptive Replacement Cache (ARC) from here. hdsize[6]: By default, all the free disk space is used for the ZFS RAID. If you want to keep some portion of the disk space of each SSD free and use the rest for the ZFS RAID, type in the disk space you want to use (in GB) here. For example, if you have 40GB disks and you want to use 35GB of each disk for the ZFS RAID and keep 5GB of disk space free on each disk, you will need to type in 35GB here. Once you’re done with the ZFS RAID configuration, click on OK[7]. Once you’re done with the ZFS storage configuration, click on Next to continue. Type in the name of your country[1], select your time zone[2], select your keyboard layout[3], and click on Next[4]. Type in your Proxmox VE root password[1] and your email[2]. Once you’re done, click on Next[3]. If you have multiple network interfaces available on your server, select the one you want to use for accessing the Proxmox VE web management UI from the Management Interface dropdown menu[1]. If you have only a single network interface available on your server, it will be selected automatically. Type in the domain name that you want to use for Proxmox VE in the Hostname (FQDN) section[2]. Type in your desired IP information for the Proxmox VE server[3] and click on Next[4]. An overview of your Proxmox VE installation should be displayed. If everything looks good, click on Install to start the Proxmox VE installation.
NOTE: If anything seems wrong or you want to change certain information, you can always click on Previous to go back and fix it. So, make sure to check everything before clicking on Install. The Proxmox VE installation should start. It will take a while to complete. Once the Proxmox VE installation is complete, you will see the following window. Your server should restart within a few seconds. On the next boot, you will see the Proxmox VE GRUB boot menu. Once Proxmox VE is booted, you will see the Proxmox VE command-line login prompt. You will also see the access URL of the Proxmox VE web-based management UI. Installing Proxmox VE 8 using Terminal UI In some hardware, the Proxmox VE graphical installer may not work. In that case, you can always use the Proxmox VE terminal installer. You will find the same options in the Proxmox VE terminal installer as in the graphical installer. So, you should not have any problems installing Proxmox VE on your server using the terminal installer. To use the Proxmox VE terminal installer, select Install Proxmox VE (Terminal UI) from the Proxmox VE GRUB boot menu and press <Enter>. Select <I agree> and press <Enter>. To install Proxmox VE on a single disk, select an HDD/SSD from the Target harddisk section, select <Next>, and press <Enter>. For advanced disk configuration or ZFS/BTRFS RAID setup, select <Advanced options> and press <Enter>. You will find the same disk configuration options as in the Proxmox VE graphical installer. I have already discussed all of them in the Proxmox VE Graphical UI installation section. Make sure to check it out for detailed information on all of those disk configuration options. Once you’ve configured the disk/disks for the Proxmox VE installation, select <Ok> and press <Enter>. Once you’re done with advanced disk configuration for your Proxmox VE installation, select <Next> and press <Enter>. Select your country, timezone, and keyboard layout. Once you’re done, select <Next> and press <Enter>. Type in your Proxmox VE root password and email address. Once you’re done, select <Next> and press <Enter>. Configure the management network interface for Proxmox VE, select <Next>, and press <Enter>. An overview of your Proxmox VE installation should be displayed. If everything looks good, select <Install> and press <Enter> to start the Proxmox VE installation. NOTE: If anything seems wrong or you want to change certain information, you can always select <Previous> and press <Enter> to go back and fix it. So, make sure to check everything before installing Proxmox VE. The Proxmox VE installation should start. It will take a while to complete. Once the Proxmox VE installation is complete, you will see the following window. Your server should restart within a few seconds. Once Proxmox VE is booted, you will see the Proxmox VE command-line login prompt. You will also see the access URL of the Proxmox VE web-based management UI. Accessing Proxmox VE 8 Management UI from a Web Browser To access the Proxmox VE web-based management UI from a web browser, you need a modern web browser (i.e. Google Chrome, Microsoft Edge, Mozilla Firefox, Opera, Apple Safari). Open a web browser of your choice and visit the Proxmox VE access URL (i.e. https://192.168.0.105:8006) from the web browser. By default, Proxmox VE uses a self-signed SSL certificate which your web browser will not trust. So, you will see a similar warning. To accept the Proxmox VE self-signed SSL certificate, click on Advanced. Then, click on Accept the Risk and Continue. 
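If the management page does not load at all, it can help to confirm from the Proxmox VE console that the web service is actually listening — a short troubleshooting sketch, assuming a default installation:
$ systemctl status pveproxy      # the service that serves the web management UI
$ ss -tlnp | grep 8006           # 8006 is the default web UI port
$ ip -4 addr show                # double-check the IP address you configured during installation
Once the page loads and you have accepted the certificate warning, continue below.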
Once the certificate warning is accepted, you will see the Proxmox VE login prompt. Type in your Proxmox VE login username (root) and password[1] and click on Login[2]. You should be logged in to your Proxmox VE web management UI.

If you’re using the free version of Proxmox VE, you will see a No valid subscription warning message every time you log in to Proxmox VE. To ignore this warning and continue using Proxmox VE for free, just click on OK. The No valid subscription warning should be gone. Proxmox VE is now ready to use.

Enabling Proxmox VE Community Package Repositories

If you want to use Proxmox VE for free, one of the first things to do after installing Proxmox VE on your server is to disable the Proxmox VE enterprise package repositories and enable the Proxmox VE community package repositories. This way, you can get access to the Proxmox VE package repositories for free and keep your Proxmox VE server up-to-date. To learn how to enable the Proxmox VE community package repositories, read this article.

Keeping Proxmox VE Up-to-date

After installing Proxmox VE on your server, you should check if new updates are available for your Proxmox VE server. If new updates are available, you should install them, as they improve the performance, stability, and security of your Proxmox VE server. For more information on keeping your Proxmox VE server up-to-date, read this article.

Conclusion

In this article, I have shown you how to install Proxmox VE on your server using the Graphical installer UI and the Terminal installer UI. The Terminal installer UI is for systems that don’t support the Graphical installer UI, so if you’re having difficulty with the Graphical installer UI, the Terminal installer UI will still work and save the day. I have also discussed different disk/storage configuration methods for Proxmox VE, including how to configure a ZFS RAID and install Proxmox VE on it.

References

RAIDZ Types Reference
ZFS/Virtual disks – ArchWiki
ZFS Tuning Recommendations | High Availability
The copies Property
Checksums and Their Use in ZFS — OpenZFS documentation
ZFS ARC Parameters – Oracle Solaris Tunable Parameters Reference Manual
-
How to Upload/Download ISO Images on Proxmox VE Server
Most operating systems distribute their installers in ISO image format, so the most common way of installing an operating system on a Proxmox VE virtual machine is to use an ISO image of that operating system. You can obtain the ISO image file of your favorite operating system from its official website. To install your favorite operating system on a Proxmox VE virtual machine, the ISO image of that operating system must be available in a proper storage location on your Proxmox VE server. Any Proxmox VE storage that supports ISO image files has an ISO Images section with options for uploading and downloading ISO images.

In this article, I will show you how to upload an ISO image to your Proxmox VE server from your computer. I will also show you how to download an ISO image directly to your Proxmox VE server using the download link or URL of that ISO image.

Table of Contents

Uploading an ISO Image on Proxmox VE Server from Your Computer
Downloading an ISO Image on Proxmox VE Server using URL
Conclusion

Uploading an ISO Image on Proxmox VE Server from Your Computer

To upload an ISO image to your Proxmox VE server from your computer, navigate to the ISO Images section of an ISO image-supported storage from the Proxmox VE web management UI and click on Upload. Click on Select File from the Upload window. Select the ISO image file that you want to upload to your Proxmox VE server from the filesystem of your computer[1] and click on Open[2]. Once the ISO image file is selected, its name will be displayed in the File name section. If you want, you can modify the file name under which the ISO image will be stored on your Proxmox VE server once it’s uploaded[1]. The size of the ISO image file will be displayed in the File size section[2]. Once you’re ready to upload the ISO image to your Proxmox VE server, click on Upload[3].

The ISO image file will be uploaded to the Proxmox VE server. It will take a few seconds to complete. If for some reason you want to stop the upload process, click on Abort. Once the ISO image file is uploaded to your Proxmox VE server, you will see the following window. Just close it. Shortly, the ISO image that you’ve uploaded to your Proxmox VE server should be listed in the ISO Images section of the selected Proxmox VE storage.

Downloading an ISO Image on Proxmox VE Server using URL

To download an ISO image to your Proxmox VE server using a URL or download link, visit the official website of the operating system that you want to download and copy the download link or URL of the ISO image from the website. For example, to download the ISO image of Debian 12, visit the official website of Debian from a web browser[1], right-click on Download, and click on Copy Link[2]. Then, navigate to the ISO Images section of an ISO image-supported storage from the Proxmox VE web management UI and click on Download from URL. Paste the download link or URL of the ISO image in the URL section and click on Query URL. Proxmox VE should check the ISO file URL and obtain the necessary information like the File name[1] and File size[2] of the ISO image file. If you want to save the ISO image file under a different name on your Proxmox VE server, just type it in the File name section[1]. Once you’re ready, click on Download[3]. Proxmox VE should start downloading the ISO image file from the URL. It will take a while to complete. Once the ISO image file is downloaded on your Proxmox VE server, you will see the following window. Just close it.
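If you prefer working from a shell, you can also get an ISO onto the server without the web UI. This is only a sketch, assuming the default local storage whose ISO directory is /var/lib/vz/template/iso, the example IP address used earlier, and a hypothetical Debian 12 ISO file name:

# Copy an ISO from your computer to the Proxmox VE server over SSH
scp debian-12-amd64-netinst.iso root@192.168.0.105:/var/lib/vz/template/iso/

# Or download it directly on the Proxmox VE server using its URL
wget -P /var/lib/vz/template/iso/ https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-12-amd64-netinst.iso

# Verify that Proxmox VE sees the ISO on the local storage
pvesm list local --content iso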
In either case, the ISO image file should now be listed in the ISO Images section of the selected Proxmox VE storage.

Conclusion

In this article, I have shown you how to upload an ISO image from your computer to the Proxmox VE server. I have also shown you how to download an ISO image directly to your Proxmox VE server using a URL.
-
How to Keep Proxmox VE 8 Server Up-to-date
Keeping your Proxmox VE server up-to-date is important, as newer updates come with bug fixes and improved security. If you’re using the Proxmox VE community version (the free version of Proxmox VE without an enterprise subscription), installing new updates will also add new features to your Proxmox VE server as they are released. In this article, I am going to show you how to check if new updates are available on your Proxmox VE server and, if they are, how to install them.

Table of Contents

Enabling the Proxmox VE Community Package Repositories
Checking for Available Updates on Proxmox VE
Installing Available Updates on Proxmox VE
Conclusion

Enabling the Proxmox VE Community Package Repositories

If you don’t have an enterprise subscription on your Proxmox VE server, you need to disable the Proxmox VE enterprise package repositories and enable the Proxmox VE community package repositories to receive software updates on your Proxmox VE server. If you want to keep using Proxmox VE for free, make sure to enable the Proxmox VE community package repositories.

Checking for Available Updates on Proxmox VE

To check if new updates are available on your Proxmox VE server, log in to your Proxmox VE web management UI, navigate to the Updates section of your Proxmox VE server, and click on Refresh. If you’re using the Proxmox VE community version (free version), you will see a No valid subscription warning. Click on OK to ignore the warning. The Proxmox VE package database should be updated. Close the Task viewer window. If no new updates are available, you will see the No updates available message after the package database is updated. If newer updates are available for your Proxmox VE server, you will see a list of packages that can be updated, as shown in the screenshot below.

Installing Available Updates on Proxmox VE

To install all the available updates on your Proxmox VE server, click on Upgrade. A new noVNC window should be displayed. Press Y and then press <Enter> to confirm the installation. The Proxmox VE updates will be downloaded and installed. It will take a while to complete. At this point, the Proxmox VE updates should be installed. Close the noVNC window. If you check for updates again, you should see the No updates available message, and your Proxmox VE server should be up-to-date[1]. After the updates are installed, it’s best to reboot your Proxmox VE server so that a newly installed kernel can take effect. To reboot your Proxmox VE server, click on Reboot[2].

Conclusion

In this article, I have shown you how to check if new updates are available for your Proxmox VE server and how to install them. You should always keep your Proxmox VE server up-to-date so that you get the latest bug fixes and security updates.
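As a closing note, the Refresh and Upgrade buttons in the web UI correspond to standard APT operations, so the same update can be done from the Proxmox VE shell. The sketch below assumes Proxmox VE 8 on Debian 12 (bookworm) without an enterprise subscription:

# Disable the enterprise repository (it requires a subscription) by commenting it out
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list

# Enable the free no-subscription repository
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list

# Refresh the package database and install all available updates
apt update
apt dist-upgrade

# Reboot if a new kernel was installed
reboot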