Planet SciPy

Filipe Saraiva's blog 2024-01-31 19:35:37

The government needs to use open social networks

(Image: the social networks of …, down in the page footer.) Having stepped away not only from Twitter but also from the news about that network, I only recently discovered that profiles there are no longer public. All posts from all profiles are now inaccessible to anyone who does not have an account on the social network. Besides obvious…
ListenData 2024-01-30 17:10:00

4 Ways to Correct Grammar with Python

This tutorial explains various methods for checking and correcting grammatical errors using Python. Automatic grammar correction helps students, professionals and content creators to make sure their writing follows proper grammar rules.

To read this article in full, please click here
This post appeared first on ListenData
2024-01-26 15:30:43

Mikiko Bazeley: What I Learned Building the ML Platform at Mailchimp

I started my ML journey as an analyst back in 2016. Since then, I’ve worked as a data scientist for a multinational company and an MLOps engineer for an early-stage startup before moving to Mailchimp in May 2021. I joined just before its $12 billion acquisition by Intuit. It was an exciting time to be…
2024-01-26 14:23:51

How to Build Machine Learning Systems With a Feature Store

Training and evaluating models is just the first step toward machine-learning success. To generate value from your model, it should make many predictions, and these predictions should improve a product or lead to better decisions. For this, we have to build an entire machine-learning system around our models that manages their lifecycle, feeds properly prepared…
2024-01-24 15:43:35

Logging PyMC and Arviz Artifacts on Neptune

When dealing with limited data or uncertain scenarios, one of the most potent methods is Bayesian inference. At its core, it is a formulation of statistics that enables one to incorporate prior knowledge and update beliefs systematically and coherently. Its power lies in the flexibility in model-building, especially its ability to take into account insights…
Filipe Saraiva's blog 2024-01-24 01:34:04

Out of the loop on social networks

I even tried opening my Twitter to check when I had last stopped by, but discovered that profiles are no longer public. That is my current state with respect to that network: only today did I find out about this restriction. After cultivating a profile for more than a decade (maybe a decade and a half?), being a…
Quansight Labs 2024-01-24 00:00:00

Captioning: A Newcomer’s Guide

What are those words on the bottom of your video screen and where do they come from? Captioning’s normalization in the past several decades may seem like it would render those questions moot, but understanding more about captions means making more informed decisions about when, how, and why we make sure information is accessible.
Filipe Saraiva's blog 2024-01-22 13:59:06

Master's in Computer Science 2024.1 at UFPA: Optimization and Machine Learning

We start 2024 with another selection process for the Master's in Computer Science of the Graduate Program in Computer Science (Programa de Pós-Graduação em Ciência da Computação) at UFPA, Belém campus. In this round I am offering 2 positions in the area of computational intelligence, which is basically another name for artificial intelligence. The research I currently supervise is closely related to…
2024-01-19 13:24:06

LLM Fine-Tuning and Model Selection Using Neptune and Transformers

Imagine you’re facing the following challenge: you want to develop a Large Language Model (LLM) that can proficiently respond to inquiries in Portuguese. You have a valuable dataset and can choose from various base models. But here’s the catch — you’re working with limited computational resources and can’t rely on expensive, high-power machines for fine-tuning. How do…
Pierre de Buyl's homepage - scipy 2024-01-10 10:00:00

JupyterHub on Ubuntu

JupyterHub is a solution for hosting Jupyter notebooks via a web interface for several users. Its website recommends two main methods for deploying JupyterHub: The Littlest JupyterHub (TLJH) and a Kubernetes deployment. I tried the former and had trouble diagnosing some configuration issues. In the following, I show how to deploy the pip version on Ubuntu 22.04.

maheshakyas' website 2023-12-30 13:20:00

First Post

So I’m starting to write about things that I find interesting. This is not a new year’s resolution. It just happened to be the end of the year when I started writing this post.

Spyder Blog 2023-12-19 12:00:00

Reusable research Birds of a Feather session at SciPy 2023: Goals and challenges

The Spyder team and collaborators hosted a Birds of a Feather (BoF) session at SciPy 2023, focused on moving beyond just scripts and notebooks toward truly reproducible, reusable research. Here, we’ll recap the motivation and goals of the BoF and share the common challenges that participants brought up with notebooks and moving toward reproducible, reusable research. In our next post, we’ll follow up with some of the tips, tools, platforms and strategies attendees brought up as ways to address them, including using Spyder! We'd like to thank Juanita Gomez for helping organize the BoF, Hari for his hard work compiling a summary of the outcomes, and everyone for attending and sharing such great ideas and insights!

The trouble with notebooks

The overwhelming majority of current scientific code is siloed away into one-off scripts and notebooks, where the only real mechanism for reusing and building upon them is good old copy and paste. In order to keep "building upon the shoulders of giants",


Major Price Cuts: Deepnote Versus CoCalc --- Compute Server Pricing

Deepnote is one of CoCalc's direct competitors. Today (November 30, 2023) they announced a major price cut on their pay-as-you-go rates:

"As you may have already heard, starting December 1, we're slashing the pay-as-you-go rates across all our machines – making them more budget-friendly without any hidden terms."

At CoCalc, we recently launched pay-as-you-go machines, which was one of our main development priorities for 2023. These are fully integrated with CoCalc, and were a huge amount of work to bring to market. I was terrified that Deepnote's major price cuts would make Deepnote a much better deal than CoCalc.

Here is how the Deepnote and CoCalc pricing compares:

Machine                                Deepnote's New Price   CoCalc Standard   CoCalc Spot
64GB RAM, 16vCPU                       $1.54                  $0.59             $0.12
128GB RAM, 16vCPU (32 CPU on CoCalc)   $2.02                  $1.17             $0.23
K80 GPU (newer L4 GPU on CoCalc)       $2.02                  $0.93             $0.30

Conclusion: CoCalc's prices are still highly competitive, even in light of Deepnote's major price cuts.

Also, spot instances do work very well for many applications.

ListenData 2023-11-28 14:46:00

How to Get Unique Values in a Column in Pandas DataFrame

This tutorial explains how to get unique values from a column in Pandas DataFrame, along with examples.

Find Unique Values in a Column
scikit-learn Blog 2023-11-27 00:00:00

My mentored internship at scikit-learn

Author: Stefanie Senger, François Goupil
Gaël Varoquaux - programming 2023-11-26 23:00:00

People underestimate how impactful Scikit-learn continues to be


François Chollet rightfully said that people often underestimate the impact of scikit-learn. I give here a few illustrations to back his claim.

A few days ago, François Chollet (the creator of Keras, the library that democratized deep learning) posted:

Indeed, scikit-learn continues to be the most popular machine …

Quansight Labs 2023-11-24 00:00:00

Unlocking C-level performance in pandas.DataFrame.apply with Numba

A quick overview of the new Numba engine in DataFrame.apply
Quansight Labs 2023-11-23 00:00:00

Improving the interpolation and signal processing capabilities of CuPy

We are excited to spread the news about the improvements that have been taking place in CuPy, where 18 interpolation and more than 100 signal processing parallel GPU APIs are now available as part of an EOSS4 CZI grant.
Keep the gradient flowing 2023-11-18 23:00:00

Optimization Nuggets: Stochastic Polyak Step-size, Part 2

This blog post discusses the convergence rate of the Stochastic Gradient Descent with Stochastic Polyak Step-size (SGD-SPS) algorithm for minimizing a finite sum objective. Building upon the proof of the previous post, we show that the convergence rate can be improved to O(1/t) under the additional assumption that …
2023-11-14 15:30:20

How to Visualize Deep Learning Models

Deep learning models are typically highly complex. While many traditional machine learning models make do with just a couple of hundred parameters, deep learning models have millions or billions of parameters. The large language model GPT-4 that OpenAI released in the spring of 2023 is rumored to have nearly 2 trillion parameters. It goes…
ListenData 2023-11-11 22:13:00

NumPy argmin() Function : Learn with Examples

In this tutorial, we will see how to use the NumPy argmin() function in Python along with examples.

Quansight Labs 2023-11-08 00:00:00

The 'eu' in eucatastrophe – Why SciPy builds for Python 3.12 on Windows are a minor miracle

Moving SciPy to Meson meant finding a different Fortran compiler on Windows, which was particularly tricky to pull off for conda-forge. This blog tells the story about how things looked pretty grim for the Python 3.12 release, and how things ended up working out just in the nick of time.
Quansight Labs 2023-11-08 00:00:00

Adding support for polynomials to Numba

My work centered on improving NumPy support in Numba, with a focus on the polynomial package.
Quansight Labs 2023-11-08 00:00:00

Refining NumPy's Python API for its 2.0 release

A journey through NumPy's Python API from a maintenance perspective.
ListenData 2023-11-06 09:54:00

NumPy argmax() Function : Learn with Examples

In this tutorial, we will see how to use the NumPy argmax() function in Python along with examples.

The numpy.argmax() function in Python is used to find the indices of the maximum element in an array.

Syntax of NumPy argmax() Function

Below is the syntax of the NumPy argmax() function:

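import numpy as np
np.argmax(a, axis=None, out=None)  # a: input array; axis: axis to search along (flattened array if None); out: optional array to store the result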
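To make the description concrete, here is a small sketch showing how `np.argmax` behaves with and without `axis`. The `scores` array and the variable names are made up for illustration. Without `axis`, the returned index refers to the flattened array, and `np.unravel_index` converts it back to a row and column.

```python
import numpy as np

# Hypothetical 2 x 3 array used only for this illustration.
scores = np.array([[3, 9, 1],
                   [7, 2, 8]])

# No axis: index of the maximum in the flattened array [3, 9, 1, 7, 2, 8].
flat_index = np.argmax(scores)

# axis=0: for each column, the row index holding that column's maximum.
col_best = np.argmax(scores, axis=0)

# axis=1: for each row, the column index holding that row's maximum.
row_best = np.argmax(scores, axis=1)

# Convert the flat index back into (row, column) coordinates.
row, col = np.unravel_index(flat_index, scores.shape)

print(flat_index, col_best, row_best, (row, col))
```

When the maximum occurs more than once, `np.argmax` returns the index of the first occurrence.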
Quansight Labs 2023-10-31 00:00:00

Improving SymPy's Documentation

SymPy's documentation has received many significant improvements over the past two years thanks to funding by the Chan Zuckerberg Initiative.
Quansight Labs 2023-10-30 00:00:00

Doctesting for PyData Libraries

The journey of a PyData Newbie
Quansight Labs 2023-10-30 00:00:00

Integrating Hypothesis into SymPy

Gives an introduction to the utility of Hypothesis in SymPy
2023-10-20 11:42:04

How to Use Exploratory Notebooks [Best Practices]

Jupyter notebooks have been one of the most controversial tools in the data science community. There are some outspoken critics, as well as passionate fans. Nevertheless, many data scientists will agree that they can be really valuable – if used well. And that’s what we’re going to focus on in this article, which is the…
ListenData 2023-10-12 13:59:00

How to Install PyTorch on Windows

This tutorial explains the steps to install PyTorch on Windows.

PyTorch is a free and open source machine learning library developed by Facebook's AI Research lab. It is built on the Torch library and is mainly used for tasks like computer vision and natural language processing (NLP).

Quansight Labs 2023-10-04 00:00:00

The Array API Standard in SciPy

How can SciPy use the Array API Standard to achieve array library interoperability?
2023-10-03 08:58:29

Learnings From Building the ML Platform at Mailchimp

This article was originally an episode of the ML Platform Podcast, a show where Piotr Niedźwiedź and Aurimas Griciūnas, together with ML platform professionals, discuss design choices, best practices, example tool stacks, and real-world learnings from some of the best ML platform professionals. In this episode, Mikiko Bazeley shares her learnings from building the ML…
Keep the gradient flowing 2023-09-28 22:00:00

Optimization Nuggets: Stochastic Polyak Step-size

The stochastic Polyak step-size (SPS) is a practical variant of the Polyak step-size for stochastic optimization. In this blog post, we'll discuss the algorithm and provide a simple analysis for convex objectives with bounded gradients.
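As a rough illustration of the idea, the sketch below runs SGD with a Polyak-type step-size on a tiny least-squares problem. It assumes the common form step = (f_i(x) - f_i*) / ||grad f_i(x)||^2 and that each per-sample optimum f_i* is 0 (the linear system is consistent). The post's exact variant and constants may differ, and the function name `sgd_sps` and the data are hypothetical.

```python
import random


def sgd_sps(rows, targets, x0, n_steps=200, f_star=0.0, seed=0):
    """SGD with a Polyak-type step-size on f_i(x) = 0.5 * (a_i . x - b_i)^2.

    f_star is the assumed optimal value of each per-sample loss.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        i = rng.randrange(len(rows))          # sample one term of the finite sum
        a, b = rows[i], targets[i]
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b
        loss = 0.5 * residual ** 2            # f_i(x)
        grad = [residual * aj for aj in a]    # gradient of f_i at x
        grad_sq = sum(g * g for g in grad)
        if grad_sq == 0.0:
            continue                          # already optimal for this sample
        step = (loss - f_star) / grad_sq      # Polyak-type step-size
        x = [xj - step * g for xj, g in zip(x, grad)]
    return x


# Consistent system built from a known solution (2, -1), so every f_i* is 0.
rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [2.0, -1.0, 1.0]
solution = sgd_sps(rows, targets, x0=[0.0, 0.0])
print(solution)
```

The step-size needs no tuning here because it is computed from the current loss and gradient. The price is that it requires knowing (or bounding) the per-sample optimal values.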

Quansight Labs 2023-09-20 00:00:00

Bridging Data Science Tools with PyTorch-Ignite's Code-Generator and Nebari

A summary of my contributions to the Code-Generator project and PyTorch-Ignite ecosystem in the past few months as Quansight Labs intern and my learnings in the process.
Quansight Labs 2023-09-19 00:00:00

Array API Support in scikit-learn

In this blog post, we share how scikit-learn enabled support for the Array API Standard.
scikit-learn Blog 2023-09-10 00:00:00

scikit-learn 2023 In-person Developer Sprint in Paris, France

Author: Reshama Shaikh, François Goupil
2023-09-07 08:15:37

Software Engineering Patterns for Machine Learning

Have you ever talked to your Front-end or Back-end engineer peers and noticed how much they care about code quality? Writing legible, reusable, and efficient code has always been a challenge in the software development community. Endless conversations happen every day across GitHub pull requests and Slack threads around this topic. How to best adapt…
2023-08-11 13:15:44

ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)

There comes a time when every ML practitioner realizes that training a model in Jupyter Notebook is just one small part of the entire project. Getting a workflow ready which takes your data from its raw form to predictions while maintaining responsiveness and flexibility is the real deal. At that point, the Data Scientists or…
ListenData 2023-08-08 16:38:00

How to Run Windscribe VPN in Windows with Python

In this tutorial, we will show you how to run Windscribe VPN in Windows using Python Code. Windscribe is a popular VPN service that offers several features. Windscribe's free version maintains the same speed as the paid plans.

ListenData 2023-08-08 14:52:00

How to Run Proton VPN in Windows with Python

In this tutorial, we will show you how to run Proton VPN in Windows using Python Code.


First you need to download and install the OpenVPN GUI. OpenVPN GUI is a user-friendly application that allows you to easily configure and manage OpenVPN connections on your computer. OpenVPN is a popular open-source VPN protocol that provides secure and encrypted connections over public networks.

2023-08-04 14:10:10

Organizing ML Monorepo With Pants

Have you ever copy-pasted chunks of utility code between projects, resulting in multiple versions of the same code living in different repositories? Or, perhaps, you had to make pull requests to tens of projects after the name of the GCP bucket in which you store your data was updated? Situations described above arise way too…
2023-08-03 11:24:14

Learnings From Building the ML Platform at Stitch Fix

This article was originally an episode of the ML Platform Podcast, a show where Piotr Niedźwiedź and Aurimas Griciūnas, together with ML platform professionals, discuss design choices, best practices, example tool stacks, and real-world learnings from some of the best ML platform professionals. In this episode, Stefan Krawczyk shares his learnings from building the ML…
Filipe Saraiva's blog 2023-07-30 14:46:19

Master's in Computer Science 2023.2 at UFPA: NLP and Metaheuristics

We have another selection process open for the Master's in Computer Science at UFPA, with admission this coming August 2023. This time I am still looking for candidates who want to do research in the area of metaheuristics, for whatever combinatorial problems they wish to apply them to. This is still a very broad field and I have…
2023-07-18 11:20:16

Deploying Conversational AI Products to Production With Jason Flaks

This article was originally an episode of MLOps Live, an interactive Q&A session where ML practitioners answer questions from other ML practitioners. Every episode is focused on one specific ML topic, and during this one, we talked to Jason Flaks about deploying conversational AI products to production. You can watch it on YouTube: Or…
ListenData 2023-07-04 18:10:00

How to Use ChatGPT for Data Science

In this article, we will explore how you, as a data scientist, can use ChatGPT to enhance your data science projects. ChatGPT is a powerful tool that can help you in various aspects of your work, from exploring and analyzing data to generating insights and helping you with coding and troubleshooting. It can also help you to learn data science faster.

Quansight Labs 2023-06-28 00:00:00

PyCon US 2023 - An action-packed week

In this post I'm sharing my experience of traveling to the US for PyCon US 2023
Quansight Labs 2023-06-27 00:00:00

Numba Dynamic Exceptions

In the following blogpost, we will explore the newly added feature in Numba: Dynamic exception support. We will discuss the previous limitations and explain how Numba was enhanced to handle runtime exceptions.
ListenData 2023-06-19 14:32:00

How to build ChatGPT Clone in Python

In this article, we will see the steps involved in building a chat application and an answering bot in Python using the ChatGPT API and gradio.

Developing a chat application in Python provides more control and flexibility than the ChatGPT website. You can customize and extend the chat application as per your needs. It also helps you integrate with your existing systems and other APIs.

Keep the gradient flowing 2023-06-13 22:00:00

On the Convergence of the Unadjusted Langevin Algorithm

The Langevin algorithm is a simple and powerful method to sample from a probability distribution. It's a key ingredient of some machine learning methods such as diffusion models and differentially private learning. In this post, I'll derive a simple convergence analysis of this method in the special case when the …
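A minimal sketch of the unadjusted Langevin iteration, assuming the standard update x_{k+1} = x_k - step * grad f(x_k) + sqrt(2 * step) * noise for a target density proportional to exp(-f(x)). The target (a standard Gaussian, f(x) = x^2 / 2), the step size, and the function names are choices made for illustration and are not taken from the post.

```python
import math
import random


def grad_f(x):
    """Gradient of f(x) = x^2 / 2, i.e. a standard Gaussian target."""
    return x


def unadjusted_langevin(grad, x0, step, n_steps, seed=0):
    """Run the unadjusted Langevin iteration and return every iterate."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step on f plus Gaussian noise scaled by sqrt(2 * step).
        x = x - step * grad(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


samples = unadjusted_langevin(grad_f, x0=3.0, step=0.1, n_steps=50_000)
kept = samples[1_000:]  # discard early iterates influenced by the starting point
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
print(mean, variance)
```

Because there is no Metropolis correction, the chain's stationary distribution is slightly biased. For this Gaussian target the stationary variance is 1 / (1 - step / 2), about 1.05 for step = 0.1, rather than exactly 1.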

Spyder Blog 2023-06-08 00:00:00

Spyder gets CZI grant to add remote development features, and a new job opening!

During the last few years, Spyder has positioned itself as a popular data science IDE by combining interactive computing and ease of use with robust programming tools. However, limited remote development support compared to some other IDEs has hindered adoption, as many users would like to work with data and code on high performance computing (HPC) clusters or cloud providers like AWS, GCP or DigitalOcean while developing on their personal computers. Adding such features would open up many new research possibilities by enabling the scientific community to tackle data and compute-intensive programming tasks from the ease and efficiency of their local development environments. Thanks to a two-year grant from the Chan Zuckerberg Initiative, we will now be able to address this shortcoming.

Right now, users have two main options to work remotely using a local IDE (aside from a purely web browser-based approach, which is sometimes not available or desirable): They can either edit and execute their files in a terminal, which is not

ListenData 2023-06-06 11:57:00

Transformers Agent: AI Tool That Automates Everything

We have a new AI tool in the market called Transformers Agent which is so powerful that it can automate just about any task you can think of. It can generate and edit images, video, audio, answer questions about documents, convert speech to text and do a lot of other things.

Hugging Face, a well-known name in the open-source AI world, released Transformers Agent, which provides a natural language API on top of transformers. The API is designed to be easy to use. With a single line of code, it provides a variety of tools for performing natural language tasks, such as question answering, image generation, video generation, text to speech, text classification, and summarization.

ListenData 2023-05-26 09:38:00

Complete Guide to Massively Multilingual Speech (MMS) Model

In this article we have covered everything about the latest multilingual speech model from the basics of how it works to the step-by-step implementation of the model in Python.

Meta, the company that owns Facebook, released a new AI model called Massively Multilingual Speech (MMS) that can convert text to speech and speech to text in over 1,100 languages. It is available for free. It will not only help academicians and researchers across the world but also language preservationists or activists to document and preserve endangered languages to prevent their extinction.

MMS is trained on a large dataset of text and audio in over 1,100 languages. Another strength of the model is that it generates audio which sounds very natural, like human speech. It is also able to identify more than 4,000 spoken languages.

Martin Fitzpatrick - python 2023-05-04 09:00:00

PyQt6 Book now available in Korean: 파이썬과 Qt6로 GUI 애플리케이션 만들기 — The hands-on guide to creating GUI applications with Python gets a new translation

I am very happy to announce that my Python GUI programming book Create GUI Applications with Python & Qt6 / PyQt6 Edition …

ListenData 2023-04-19 12:32:00

AutoGPT : Everything You Need To Know

In this post we have covered AutoGPT in detail. By the end of this tutorial, you will not only understand how it works but also be able to run it on your system. Auto-GPT has gained a significant amount of popularity in the media. After ChatGPT, it has become one of the most talked-about topics across various social media platforms. It has captured the attention not only of people in the Artificial Intelligence community but also of people from other backgrounds. Media outlets across countries covered it and reported how it can automate everything ranging from simple to complex tasks.

What is AutoGPT?

AutoGPT is an experimental open-source project built on the latest ChatGPT model, i.e., GPT-4. It is not limited to ChatGPT, as it can also do web searches and try to find information from the internet. When a client gives us a project with instructions on what to do, we, as analysts, perform tasks to fulfill the project requirements.

ListenData 2023-04-09 08:58:00

Open Source GPT-4 Models Made Easy

In this post we will explain how Open Source GPT-4 Models work and how you can use them as an alternative to a commercial OpenAI GPT-4 solution. Every day, new open source large language models (LLMs) are emerging and the list gets bigger and bigger. We will cover two models: the GPT-4 version of Alpaca, and Vicuna. This tutorial includes the workings of the models, as well as their implementation with Python.

Vicuna Model Introduction

Vicuna was the first publicly available open-source model whose output is comparable to that of GPT-4. It was fine-tuned from Meta's LLaMA 13B model on a dataset of conversations collected from ShareGPT. ShareGPT is a website where people share their ChatGPT conversations with others.

Important Note : The Vicuna Model was primarily trained on the GPT-3.5 dataset because most of the conversations on ShareGPT during the model's development were based on GPT-3.5. But the model was evaluated based on
Living in an Ivory Basement 2023-04-06 22:00:00

snakemake for doing bioinformatics - inputs and outputs and more!

Slithering your way into bioinformatics with snakemake - inputs and outputs and more!

ListenData 2023-03-30 08:01:00

15 Free Open Source ChatGPT Alternatives (with Code)

In this article we will explain how Open Source ChatGPT alternatives work and how you can use them to build your own ChatGPT clone for free. By the end of this article you will have a good understanding of these models and will be able to compare and use them.

Benefits of Open Source ChatGPT Alternatives

There are various benefits of using open source large language models which are alternatives to ChatGPT. Some of them are listed below.

  1. Data Privacy: Many companies want to have control over data. It is important for them as they don't want any third party to have access to their data.
  2. Customization: It allows developers to train large language models on their own data and to apply filtering on certain topics if they wish.
  3. Affordability: Open source GPT models let you train sophisticated large language models without worrying about expensive hardware.
  4. Democratizing AI: It opens room for further research which can be used for solving real-world problems.
Martin Fitzpatrick - python 2023-03-20 06:00:00

Getting Started With Git and GitHub in Your Python Projects — Version-Controlling Your Python Projects With Git and GitHub

Using a version control system (VCS) is crucial for any software development project. These systems allow developers to track changes …

ListenData 2023-03-12 07:26:00

Complete Guide to Visual ChatGPT

In this post, we will talk about how to run Visual ChatGPT in Python with Google Colab. ChatGPT has garnered huge popularity recently due to its human-like responses. As of now, it only provides responses in text format, which means it cannot process, generate or edit images. Microsoft recently released a solution that handles images. Now you can ask ChatGPT to generate or edit an image for you.

Demo of Visual ChatGPT

In the image below, you can see the final output of Visual ChatGPT and what it looks like.

Martin Fitzpatrick - python 2023-03-06 06:00:00

Working With Classes in Python — Understanding the Intricacies of Python Classes

Python supports object-oriented programming (OOP) through classes, which allow you to bundle data and behavior in a single entity. Python …

Living in an Ivory Basement 2023-03-02 23:00:00

snakemake for doing bioinformatics - using wildcards to generalize your rules

Slithering your way into bioinformatics with snakemake, wildcard version

Quansight Labs 2023-02-15 00:00:00

Quansight Labs Annual Report 2022: Celebrating Growth and Sustainability in Open Source

Presenting our first annual report! Read about our project achievements, community initiatives, and work culture.
Living in an Ivory Basement 2023-01-22 23:00:00

snakemake for doing bioinformatics - a beginner's guide (part 2)

Slithering your way into bioinformatics with snakemake, round 2.

Living in an Ivory Basement 2023-01-13 23:00:00

snakemake for doing bioinformatics - a beginner's guide (part 1)

Slithering your way into bioinformatics with snakemake

Quansight Labs 2023-01-10 00:00:00

Python packaging & workflows - where to next?

Potential solutions for pain points when dealing with native code; what needs unifying in the Python packaging space, and how should that be approached?
Living in an Ivory Basement 2023-01-07 23:00:00

sourmash has a plugin interface!

Enabling plugins in sourmash, for less directed & more incoherent progress!

Filipe Saraiva's blog 2022-12-15 01:13:41

Human obsolescence in the telenovela

I spent the day at work playing with ChatGPT, the conversational artificial intelligence. We had surreal and outlandish dialogues: I asked it what Latin America would be like had it been colonized by England, and also what the relationship is between The Lord of the Rings and Game of Thrones. In another, I asked it to write a fictional dialogue between…
Quansight Labs 2022-12-12 00:00:00

Sangho's Internship at Quansight with PyTorch-Ignite project

A blog post about working on the PyTorch-Ignite project during an internship at Quansight
ListenData 2022-12-09 08:31:00

ChatGPT-4 Is a Smart Analyst, Unlike GPT-3.5

ChatGPT has been trending on social media platforms. It crossed one million users in just a week. For those who haven't heard of ChatGPT, it's a large language model trained by OpenAI. In simple words, it's a chat bot which answers your questions, and the responses it provides may sound human-like. It's an impressive machine learning solution. With the release of GPT-4, we can rely on it over Google search for learning about any topic.

Update: I updated this article with reviews on GPT-4.
Why ChatGPT-3.5 Isn't Smart enough, but GPT-4 is

You can't trust ChatGPT-3.5 for preparation on any certification or exam. It's a big NO if you think you can refer to ChatGPT-3.5 for answering questions in a telephone interview round. Yes, I know it's cheating even to use Google for the same, but I wanted to give a WARNING, as many people do this and many social media influencers have posted on how to leverage ChatGPT-3.5 for cracking

Quansight Labs 2022-12-05 00:00:00

Conda on Colaboratory

Surbhi Sharma shares her exciting experience working as an intern at Quansight Labs and contributing to condacolab, a tool that lets you deploy a Miniconda installation easily on Google Colab notebooks. This enables you to use conda or mamba to install new packages on any Colab session.
Spyder Blog 2022-11-30 00:00:00

Improvements to the Spyder IDE installation experience

Juan Sebastian Bautista, C.A.M. Gerlach and Carlos Cordoba also contributed to this post.

Spyder 5.4.0 was released recently, featuring some major enhancements to its Windows and macOS standalone installers. You'll now get more detailed feedback when new versions are available, and you can download and start the update to them from right within Spyder, instead of having to install them manually. In this post, we'll go over how these new update features work and how you can start using them!

Before proceeding, we want to acknowledge that this work was made possible by a Small Development Grant awarded to Spyder by NumFOCUS, which has enabled us to hire a new developer (Juan Sebastian Bautista Rojas) to be in charge of all the implementation details.

Before these improvements, Spyder already had a mechanism to detect more recent versions, but that functionality was very simple. There was a pop-up dialog warning that a new version was available, but users had to

scikit-learn Blog 2022-11-30 00:00:00

Interview with Meekail Zain, scikit-learn Team Member

Author: Reshama Shaikh, Meekail Zain
Quansight Labs 2022-11-28 00:00:00

Zoom zoom zoom! Improving Accessibility in JupyterLab

Kulsoom Zahra learns about accessibility and fixes a part of the JupyterLab interface (that used to break when zoomed in) during her summer 2022 internship at Quansight Labs.
Spyder Blog 2022-11-18 12:00:00

Introducing the Spyder-Watchlist plugin

Spyder's Variable Explorer is a great tool which aids the development and debugging of Python code by displaying all variables from the current scope. One thing the Variable Explorer is missing is the ability to display the value of arbitrary, user-definable expressions while debugging. For example, it might be useful to see the value of a specific attribute of an object, or the value of an array at some index. Such a feature is known as a "watchlist" or "watches" in other Integrated Development Environments (IDEs). This blog post introduces the Watchlist plugin developed for Spyder.


The watchlist consists of a user-definable list of expressions. They are evaluated after each debugger step, and the result of the evaluation is displayed as a string. This means that value = str(eval(expression)) is performed behind the scenes, and the result is shown in the plugin. The watchlist is a very powerful tool, but this comes at a cost: Any side effect of an expression will affect the execution environment.
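The evaluation step described above can be sketched in a few lines of Python. This is only an illustration of the `str(eval(expression))` idea, not the plugin's actual code: the function name `evaluate_watchlist`, the `namespace` dictionary standing in for the debugger's current scope, and the error handling are all assumptions made for the example.

```python
def evaluate_watchlist(expressions, namespace):
    """Evaluate each watch expression and return its string form.

    `namespace` is a hypothetical stand-in for the variables that are
    visible at the current debugger step.
    """
    results = {}
    for expression in expressions:
        try:
            # The core idea from the post: value = str(eval(expression))
            results[expression] = str(eval(expression, {}, namespace))
        except Exception as exc:
            # Assumption: a failing expression is reported, not raised.
            results[expression] = f"<{type(exc).__name__}: {exc}>"
    return results


scope = {"items": [1, 2, 3], "point": {"x": 4}}

# Read-only expressions leave the scope untouched.
print(evaluate_watchlist(["items[0]", "point['x'] + 1"], scope))

# An expression with a side effect changes the debugged program's state:
# after this call, `items` has lost its last element.
print(evaluate_watchlist(["items.pop()"], scope))
print(scope["items"])
```

The last two calls show the cost mentioned above: because expressions are evaluated in the live namespace, watching `items.pop()` would remove an element on every debugger step.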

Expressions can be

Filipe Saraiva's blog 2022-11-15 02:42:48

Why did we abandon blogs?

Twitter's writing interface. These days we are watching Elon Musk destroy Twitter. The expectation is that, over time, the social network will keep losing users and relevance, if it does not blow up all at once, since its new owner is even talking about bankruptcy. It is not the first time that a… Continue reading »Why did we abandon blogs?
Quansight Labs 2022-11-15 00:00:00

Making pygments accessible

accessible-pygments hosts curated WCAG-compliant themes for all your syntax highlighting needs.
Quansight Labs 2022-11-15 00:00:00

The new Spyder Editor documentation under the spotlights!

In this blogpost, I share my experience as a Google Season of Docs 2022 technical writer working on updating the Editor user documentation.
Quansight Labs 2022-11-14 00:00:00

Close Encounter with pandas and the Jedis of open source

Learning from awesome mentors and contributing to pandas open source
Quansight Labs 2022-11-10 00:00:00

Quansight Labs awarded three CZI EOSS Cycle 5 Grants

We are delighted to share details about new grants to support the sustainability of SciPy, conda-forge, and CuPy
scikit-learn Blog 2022-11-08 00:00:00

Pandas DataFrame Output for sklearn Transformers

Author: Sangam SwadiK
Quansight Labs 2022-11-07 00:00:00

Developing a Typer CLI for Nebari

The Nebari CLI consists of various commands the user needs to run to initialize, deploy, configure, and update Nebari.
Keep the gradient flowing 2022-10-14 22:00:00

The Russian Roulette: An Unbiased Estimator of the Limit

The idea for what was later called Monte Carlo method occurred to me when I was playing solitaire during my illness.

Stanislaw Ulam, Adventures of a Mathematician

The Russian Roulette offers a simple way to construct an unbiased estimator for the limit of a sequence. It allows for example to …
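As a rough illustration of the idea, here is one common form of such an estimator; it is a sketch and not necessarily the construction used in the post. It assumes the sequence is written as a running sum of increments, uses a hypothetical helper name `russian_roulette_estimate`, and stops at each step with a fixed probability, reweighting every included increment by the inverse of the probability of having reached it.

```python
import random


def russian_roulette_estimate(term, continue_prob=0.5, rng=random):
    """Draw one sample of a randomly truncated, reweighted sum.

    term(k) is assumed to return the k-th increment x_k - x_{k-1}
    (with x_{-1} = 0), so the limit of the sequence is the sum of
    all increments.
    """
    estimate = 0.0
    survival = 1.0  # probability of reaching index k: continue_prob ** k
    k = 0
    while True:
        # Dividing by the survival probability is what keeps the
        # expectation equal to the full (infinite) sum.
        estimate += term(k) / survival
        if rng.random() >= continue_prob:
            return estimate  # the roulette fired: stop here
        survival *= continue_prob
        k += 1


# Example: increments 0.25 ** k, whose sum (the limit) is 4/3.
rng = random.Random(0)
samples = [
    russian_roulette_estimate(lambda k: 0.25 ** k, 0.5, rng)
    for _ in range(20000)
]
print(sum(samples) / len(samples))  # close to 4/3 on average
```

Each sample only evaluates finitely many increments, yet the average converges to the limit. The choice of `continue_prob` trades cost against variance, and the increments must decay fast enough relative to the survival probability for the variance to stay finite.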

scikit-learn Blog 2022-10-13 00:00:00

scikit-learn and Hugging Face join forces

Author: Lysandre Debut, François Goupil
scikit-learn Blog 2022-09-29 00:00:00

scikit-learn Sprint in Salta, Argentina

Author: Juan Martín Loyola
Martin Fitzpatrick - python 2022-09-21 09:00:00

Getting started with VS Code for Python — Setting up a Development Environment for Python programming

Setting up a working development environment is the first step for any project. Your development environment setup will determine how …

Keep the gradient flowing 2022-08-25 22:00:00

Notes on the Frank-Wolfe Algorithm, Part III: backtracking line-search

Backtracking step-size strategies (also known as adaptive step-size or approximate line-search) that set the step-size based on a sufficient decrease condition are the standard way to set the step-size on gradient descent and quasi-Newton methods. However, these techniques are much less common for Frank-Wolfe-like algorithms. In this blog post I …
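For readers unfamiliar with the sufficient decrease condition, the following sketch shows backtracking on plain gradient descent, the standard setting mentioned above. It is not the Frank-Wolfe variant the post goes on to discuss; the function name `backtracking_step` and the parameter values (`shrink`, `c`, `max_halvings`) are assumptions chosen for illustration.

```python
def backtracking_step(f, grad, x, step=1.0, shrink=0.5, c=1e-4, max_halvings=50):
    """Take one gradient step whose size satisfies sufficient decrease.

    The step is shrunk until
        f(x - step * g) <= f(x) - c * step * ||g||^2,
    which is the Armijo sufficient decrease condition.
    """
    g = grad(x)
    fx = f(x)
    g_norm_sq = sum(gi * gi for gi in g)
    for _ in range(max_halvings):
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        if f(candidate) <= fx - c * step * g_norm_sq:
            return candidate, step
        step *= shrink  # too ambitious: try a smaller step
    return x, 0.0  # no acceptable step found within the budget


# Example on a badly scaled quadratic, where a unit step overshoots.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad(x):
    return [2.0 * x[0], 20.0 * x[1]]


x0 = [1.0, 1.0]
x1, used_step = backtracking_step(f, grad, x0)
print(used_step, f(x0), f(x1))
```

The accepted step adapts to the local curvature without requiring a Lipschitz constant up front, which is the appeal of these strategies.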

Quansight Labs 2022-08-07 00:00:00

Introducing the 2022 Interns Cohort

Quansight Labs is delighted to welcome its second cohort of 6 interns, who will work on a variety of open source projects and tasks
Spyder Blog 2022-07-25 12:00:00

New 2022 roadmap and grant funding

For the last couple of months, the Spyder team has been working on defining a new roadmap and submitting grant proposals to fund more features and improvements. We are pleased to announce our roadmap for the rest of 2022, and that two proposals were funded!

The roadmap

Considering the importance of sharing a clear perspective of where the Spyder project is going and where we will be focusing our efforts over the coming months, the team has created an initial roadmap for the rest of 2022. We prioritized the highlighted features and enhancements based on input from issues, face-to-face and virtual discussions, Stack Overflow, social media and other feedback, to try to best capture the interests of our users and community.

The proposals

To help make our roadmap achievable, we wrote and submitted proposals to several different venues and organizations in the last couple of months. While we have yet to hear back from some of them, two have already been funded!

The first was for the

Quansight Labs 2022-07-13 00:00:00

SciPy 2022 Accessibility Awareness Programs

Announcing the SciPy 2022 Accessibility Awareness Efforts
ListenData 2022-07-11 16:05:00

Pollution in India: Real-time AQI Data

Air pollution has become a serious problem across the world in recent years. Its effects are devastating, and they are not limited to humans: animals and plants are harmed as well. It also contributes to global warming, which is essentially increasing air and ocean temperatures around the world.

Indian cities have been topping the list of polluted cities. To tackle air pollution, the first step is to track it in real time, so that people can be alerted to avoid outdoor activities when pollution is high. This post explains how to fetch the real-time Air Quality Index (AQI) of Indian cities using Python and R, so that both Python and R programmers can pull pollution data.

You can download the dataset which contains static information about Indian states, cities and AQI stations. Variables stored in this dataset will be used further to fetch real-time data.

Gaël Varoquaux - programming 2022-07-09 22:00:00

My Mayavi story: discovering open source communities

The Mayavi Python software, and my personal history: a thread on the Python and SciPy ecosystems, building an open source codebase, and meeting really cool and friendly people

I am writing today as a goodbye to the project: I used to be one of the core contributors and maintainers but have been …

ListenData 2022-06-30 14:04:00

Pointwise mutual information (PMI) in NLP

Natural Language Processing (NLP) has gained wide acceptance recently: many live projects now rely on it, and it is no longer limited to academia. Use cases of NLP can be seen across industries, such as understanding customers' issues, predicting the next word a user is about to type on the keyboard, and automatic text summarization. Researchers across the world have trained NLP models in several human languages, including English, Spanish, French and Mandarin, so that the benefits of NLP can reach every society. In this post we will talk about one of the most useful NLP metrics, Pointwise Mutual Information (PMI), which identifies words that go together, along with its implementation in Python and R.

What is Pointwise mutual information?

PMI helps us find related words. In other words, it measures how much more likely two words are to co-occur than we would expect by chance. For example, the phrase "Data Science" has a specific meaning when these