Planet SciPy
Array API Support in scikit-learn
In this blog post, we share how scikit-learn enabled support for the Array API Standard.
scikit-learn 2023 In-person Developer Sprint in Paris, France
Author: Reshama Shaikh, François Goupil
Software Engineering Patterns for Machine Learning
Have you ever talked to your Front-end or Back-end engineer peers and noticed how much they care about code quality? Writing legible, reusable, and efficient code has always been a challenge in the software development community. Endless conversations happen every day across GitHub pull requests and Slack threads around this topic. How to best adapt…
ML Pipeline Architecture Design Patterns (With 10 Real-World Examples)
There comes a time when every ML practitioner realizes that training a model in Jupyter Notebook is just one small part of the entire project. Getting a workflow ready which takes your data from its raw form to predictions while maintaining responsiveness and flexibility is the real deal. At that point, the Data Scientists or…
How to Run Windscribe VPN in Windows with Python
In this tutorial, we will show you how to run Windscribe VPN in Windows using Python Code. Windscribe is a popular VPN service that offers several features. Windscribe's free version maintains the same speed as the paid plans.
How to Run Proton VPN in Windows with Python
In this tutorial, we will show you how to run Proton VPN in Windows using Python Code.
First you need to download and install the OpenVPN GUI. OpenVPN GUI is a user-friendly application that allows you to easily configure and manage OpenVPN connections on your computer. OpenVPN is a popular open-source VPN protocol that provides secure and encrypted connections over public networks.
Organizing ML Monorepo With Pants
Have you ever copy-pasted chunks of utility code between projects, resulting in multiple versions of the same code living in different repositories? Or, perhaps, you had to make pull requests to tens of projects after the name of the GCP bucket in which you store your data was updated? Situations described above arise way too…
Learnings From Building the ML Platform at Stitch Fix
This article was originally an episode of the ML Platform Podcast, a show where Piotr Niedźwiedź and Aurimas Griciūnas, together with ML platform professionals, discuss design choices, best practices, example tool stacks, and real-world learnings from some of the best ML platform professionals. In this episode, Stefan Krawczyk shares his learnings from building the ML…
Master's in Computer Science 2023.2 at UFPA: NLP and Metaheuristics
We have another selection process open for the Master's in Computer Science at UFPA, with admission this August 2023. This time I am still looking for candidates who want to do research in metaheuristics, for whatever combinatorial problems they want to apply them to. This is still a very broad field and I have…
Deploying Conversational AI Products to Production With Jason Flaks
This article was originally an episode of MLOps Live, an interactive Q&A session where ML practitioners answer questions from other ML practitioners. Every episode is focused on one specific ML topic, and during this one, we talked to Jason Flaks about deploying conversational AI products to production. You can watch it on YouTube: Or…
How to Use ChatGPT for Data Science
In this article, we will explore how you, as a data scientist, can use ChatGPT to enhance your data science projects. ChatGPT is a powerful tool that can help you in various aspects of your work, from exploring and analyzing data to generating insights and helping you with coding and troubleshooting. It can also help you to learn data science faster.
PyCon US 2023 - An action-packed week
In this post I'm sharing my experience of traveling to the US for PyCon US 2023.
How to Use SHAP Values to Optimize and Debug ML Models
Picture this: you’ve dedicated countless hours to training and fine-tuning your model, meticulously analyzing mountains of data. Yet, you lack a clear understanding of the factors influencing its predictions and, as a result, find it hard to improve it further. If you have ever found yourself in such a situation, trying to make sense of…
MLOps Landscape in 2023: Top Tools and Platforms
As you delve into the landscape of MLOps in 2023, you will find a plethora of tools and platforms that have gained traction and are shaping the way models are developed, deployed, and monitored. To provide you with a comprehensive overview, this article explores the key players in the MLOps and FMOps (or LLMOps) ecosystems,…
Numba Dynamic Exceptions
In the following blogpost, we will explore the newly added feature in Numba: Dynamic exception support. We will discuss the previous limitations and explain how Numba was enhanced to handle runtime exceptions.
How to build ChatGPT Clone in Python
In this article, we will see the steps involved in building a chat application and an answering bot in Python using the ChatGPT API and gradio.
Developing a chat application in Python provides more control and flexibility than the ChatGPT website. You can customize and extend the chat application as per your needs. It also helps you integrate with your existing systems and other APIs.
On the Convergence of the Unadjusted Langevin Algorithm
The Langevin algorithm is a simple and powerful method to sample from a probability distribution. It's a key ingredient of some machine learning methods such as diffusion models and differentially private learning. In this post, I'll derive a simple convergence analysis of this method in the special case when the …
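As a rough illustration of the method named in this entry, the unadjusted Langevin algorithm repeats the update x_{k+1} = x_k - γ ∇f(x_k) + sqrt(2γ) ξ_k, with ξ_k standard Gaussian noise, to sample approximately from the density proportional to exp(-f). The sketch below is not taken from the post: the target f(x) = x²/2 (a standard Gaussian), the step size, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def grad_f(x):
    # Gradient of the assumed potential f(x) = x**2 / 2 (standard Gaussian target).
    return x


def ula_samples(n_steps, step_size, x0=0.0, seed=0):
    """Run the unadjusted Langevin iteration and return every iterate."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Gradient step plus Gaussian noise scaled by sqrt(2 * step_size).
        x = x - step_size * grad_f(x) + math.sqrt(2.0 * step_size) * noise
        samples.append(x)
    return samples


samples = ula_samples(n_steps=50_000, step_size=0.1)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f}, variance={variance:.3f}")
```

For this particular target the chain's stationary variance is 1 / (1 - γ/2), slightly above the true value of 1, which is the discretization bias that makes the algorithm "unadjusted".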
Spyder gets CZI grant to add remote development features, and a new job opening!
During the last few years, Spyder has positioned itself as a popular data science IDE by combining interactive computing and ease of use with robust programming tools. However, limited remote development support compared to some other IDEs has hindered adoption, as many users would like to work with data and code on high performance computing (HPC) clusters or cloud providers like AWS, GCP or DigitalOcean while developing on their personal computers. Adding such features would open up many new research possibilities by enabling the scientific community to tackle data and compute-intensive programming tasks from the ease and efficiency of their local development environments. Thanks to a two-year grant from the Chan Zuckerberg Initiative, we will be now able to address this shortcoming.
Right now, users have two main options to work remotely using a local IDE (aside from a purely web browser-based approach, which is sometimes not available or desirable): They can either edit and execute their files in a terminal, which is not
(continued...)
How to Build ML Model Training Pipeline
Hands up if you’ve ever lost hours untangling messy scripts or felt like you’re hunting a ghost while trying to fix that elusive bug, all while your models are taking forever to train. We’ve all been there, right? But now, picture a different scenario: Clean code. Streamlined workflows. Efficient model training. Too good to be…
Transformers Agent: AI Tool That Automates Everything
We have a new AI tool in the market called Transformers Agent, which is so powerful that it can automate just about any task you can think of. It can generate and edit images, video, and audio, answer questions about documents, convert speech to text, and do a lot of other things.
Hugging Face, a well-known name in the open-source AI world, released Transformers Agent, which provides a natural language API on top of transformers. The API is designed to be easy to use. With a single line of code, it provides a variety of tools for performing natural language tasks, such as question answering, image generation, video generation, text to speech, text classification, and summarization.
What Does GPT-3 Mean For the Future of MLOps? With David Hershey
This article was originally an episode of MLOps Live, an interactive Q&A session where ML practitioners answer questions from other ML practitioners. Every episode is focused on one specific ML topic, and during this one, we talked to David Hershey about GPT-3 and the future of MLOps. You can watch it on YouTube: Or…
Building ML Platform in Retail and eCommerce
Getting machine learning to solve some of the hardest problems in an organization is great. And eCommerce companies have a ton of use cases where ML can help. The problem is, with more ML models and systems in production, you need to set up more infrastructure to reliably manage everything. And because of that, many…
Complete Guide to Massively Multilingual Speech (MMS) Model
This article covers the latest multilingual speech model, from the basics of how it works to a step-by-step implementation of the model in Python.
Meta, the company that owns Facebook, released a new AI model called Massively Multilingual Speech (MMS) that can convert text to speech and speech to text in over 1,100 languages. It is available for free. It will help not only academics and researchers across the world but also language preservationists and activists to document and preserve endangered languages to prevent their extinction.
MMS is trained on a large dataset of text and audio in over 1,100 languages. Another strength of the model is that it generates audio that sounds very natural, like human speech. It is also able to identify more than 4,000 spoken languages.
How to Build ETL Data Pipeline in ML
From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier. This article explores the importance…
How to Save Trained Model in Python
When working on real-world machine learning (ML) use cases, finding the best algorithm/model is not the end of your responsibilities. It is crucial to save, store, and package these models for their future use and deployment to production. These practices are needed for a number of reasons: To reiterate, while saving and storing ML models…
PyQt6 Book now available in Korean: 파이썬과 Qt6로 GUI 애플리케이션 만들기 - The hands-on guide to creating GUI applications with Python gets a new translation
I am very happy to announce that my Python GUI programming book Create GUI Applications with Python & Qt6 / PyQt6 Edition …
AutoGPT : Everything You Need To Know
In this post we cover AutoGPT in detail. By the end of this tutorial, you will not only understand how it works but also be able to run it on your system. Auto-GPT has gained a significant amount of popularity in the media. It has become one of the most talked-about topics across various social media platforms after ChatGPT. It has captured the attention not only of people in the Artificial Intelligence community but also of people from other backgrounds. Media outlets across countries covered it and reported how it can automate everything ranging from simple to complex tasks.
AutoGPT is an experimental open-source project built on the latest ChatGPT model, i.e. GPT-4. It is not limited to ChatGPT, as it can also search the web and try to find information on the internet. When a client gives us a project with instructions on what to do, we, as analysts, perform tasks to fulfill the project requirements.
Open Source GPT-4 Models Made Easy
In this post we will explain how open source GPT-4 models work and how you can use them as an alternative to a commercial OpenAI GPT-4 solution. Every day new open source large language models (LLMs) are emerging and the list gets bigger and bigger. We will cover two models: the GPT-4 version of Alpaca, and Vicuna. This tutorial includes the workings of the models, as well as their implementation with Python.
Vicuna was the first open-source model available publicly which is comparable to GPT-4 output. It was fine-tuned from Meta's LLaMA 13B model on a conversations dataset collected from ShareGPT. ShareGPT is a website where people share their ChatGPT conversations with others.
Important Note: The Vicuna Model was primarily trained on the GPT-3.5 dataset because most of the conversations on ShareGPT during the model's development were based on GPT-3.5. But the model was evaluated based on
snakemake for doing bioinformatics - inputs and outputs and more!
Slithering your way into bioinformatics with snakemake - inputs and outputs and more!
14 Free and Open Source Alternatives to ChatGPT
In this article we will explain how Open Source ChatGPT alternatives work and how you can use them to build your own ChatGPT clone for free. We will introduce you to 14 powerful open source alternatives to ChatGPT, such as GPT4All, Dolly 2, Vicuna, Alpaca GPT-4. We have provided Python code for each of these models so you can run them with ease in Python. By the end of this article you will have a good understanding of these models and will be able to compare and use them according to your requirements.
ChatGPT is not open source. It has had two recent popular releases: GPT-3.5 and GPT-4. GPT-4 has major improvements over GPT-3.5 and is more accurate in producing responses. ChatGPT does not allow you to view or modify the source code, as it is not publicly available. Hence there is a need for models that are open source and available for free. By using these open source
(continued...)
Getting Started With Git and GitHub in Your Python Projects - Version-Controlling Your Python Projects With Git and GitHub
Using a version control system (VCS) is crucial for any software development project. These systems allow developers to track changes …
Complete Guide to Visual ChatGPT
In this post, we will talk about how to run Visual ChatGPT in Python with Google Colab. ChatGPT has garnered huge popularity recently due to its human-like responses. As of now, it only provides responses in text format, which means it cannot process, generate or edit images. Microsoft recently released a solution that handles images. Now you can ask ChatGPT to generate or edit an image for you.
In the image below, you can see the final output of Visual ChatGPT and what it looks like.
Working With Classes in Python — Understanding the Intricacies of Python Classes
Python supports object-oriented programming (OOP) through classes, which allow you to bundle data and behavior in a single entity. Python …
snakemake for doing bioinformatics - using wildcards to generalize your rules
Slithering your way into bioinformatics with snakemake, wildcard version
Quansight Labs Annual Report 2022: Celebrating Growth and Sustainability in Open Source
Presenting our first annual report! Read about our project achievements, community initiatives, and work culture.
conda & mamba on shared clusters works better now!
conda is great!
A brief overview of automation and parallelization options in UNIX/on an HPC
Automating things! Parallelizing them!
snakemake for doing bioinformatics - a beginner's guide (part 2)
Slithering your way into bioinformatics with snakemake, round 2.
snakemake for doing bioinformatics - a beginner's guide (part 1)
Slithering your way into bioinformatics with snakemake
Python packaging & workflows - where to next?
Potential solutions for pain points when dealing with native code; what needs unifying in the Python packaging space, and how should that be approached?
sourmash has a plugin interface!
Enabling plugins in sourmash, for less directed & more incoherent progress!
Reading "Orwell's Roses" by Rebecca Solnit
This is a good book!
Human obsolescence in the telenovela
I spent the day at work playing with ChatGPT, the conversational artificial intelligence. We had surreal and outlandish dialogues: I asked it what Latin America would be like had it been colonized by England, and also what the relationship is between The Lord of the Rings and Game of Thrones. In another, I asked it to write a fictional dialogue between…
Sangho's Internship at Quansight with PyTorch-Ignite project
Blogpost of working on the PyTorch-Ignite project during internship at Quansight
ChatGPT-4 Is a Smart Analyst, Unlike GPT-3.5
ChatGPT has been trending on social media platforms. It crossed one million users in just a week. For those who haven't heard about ChatGPT, it's a large language model trained by OpenAI. In simple words, it's a chat bot which answers your questions, and the responses it provides may sound human-like. It's an impressive machine learning solution. With the release of GPT-4 we can rely on it over Google search for learning on any topic.
Update: I updated this article with reviews on GPT-4. You can't trust ChatGPT-3.5 when preparing for any certification or exam. It's a big NO if you think you can refer to ChatGPT-3.5 for answering questions in a telephonic interview round. Yes, I know it's cheating even if you use Google for the same, but I wanted to give a WARNING, as many people do this and many social media influencers have posted on how to leverage ChatGPT-3.5 for cracking
(continued...)
Conda on Colaboratory
Surbhi Sharma shares her exciting experience working as an intern at Quansight Labs and contributing to condacolab, a tool that lets you deploy a Miniconda installation easily on Google Colab notebooks. This enables you to use conda or mamba to install new packages on any Colab session.
Improvements to the Spyder IDE installation experience
Juan Sebastian Bautista, C.A.M. Gerlach and Carlos Cordoba also contributed to this post.
Spyder 5.4.0 was released recently, featuring some major enhancements to its Windows and macOS standalone installers. You'll now get more detailed feedback when new versions are available, and you can download and start the update to them from right within Spyder, instead of having to install them manually. In this post, we'll go over how these new update features work and how you can start using them!
Before proceeding, we want to acknowledge that this work was made possible by a Small Development Grant awarded to Spyder by NumFOCUS, which has enabled us to hire a new developer (Juan Sebastian Bautista Rojas) to be in charge of all the implementation details.
Before these improvements, Spyder already had a mechanism to detect more recent versions, but that functionality was very simple. There was a pop-up dialog warning that a new version was available, but users had to
(continued...)
Interview with Meekail Zain, scikit-learn Team Member
Author: Reshama Shaikh, Meekail Zain
Zoom zoom zoom! Improving Accessibility in JupyterLab
Kulsoom Zahra learns about accessibility and fixes a part of the JupyterLab interface (that used to break when zoomed in) during her summer 2022 internship at Quansight Labs.
Introducing the Spyder-Watchlist plugin
Spyder's Variable Explorer is a great tool which aids the development and debugging of Python code by displaying all variables from the current scope. One thing the Variable Explorer is missing is the ability to display the value of arbitrary, user-definable expressions while debugging. For example, it might be useful to see the value of a specific attribute of an object, or the value of an array at some index. Such a feature is known as a "watchlist" or "watches" in other Integrated Development Environments (IDEs). This blog post introduces the Watchlist plugin developed for Spyder.
Features
The watchlist consists of a user-definable list of expressions.
They are evaluated after each debugger step, and the result of the evaluation is displayed as a string.
This means that value = str(eval(expression)) is performed behind the scenes, and the result is shown in the plugin.
The watchlist is a very powerful tool, but this comes at a cost: Any side effect of an expression will affect the execution environment.
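A minimal sketch of that evaluation step, to make the side-effect caveat concrete. This is not the plugin's actual code: the function name evaluate_watchlist, the frame_locals dictionary standing in for the debugger's current scope, and the way errors are rendered are all assumptions for illustration.

```python
def evaluate_watchlist(expressions, namespace):
    """Return {expression: displayed string} for each watch expression."""
    results = {}
    for expression in expressions:
        try:
            # The core step described above: evaluate, then convert to a string.
            value = str(eval(expression, {}, namespace))
        except Exception as exc:
            # Assumed behavior: show the error instead of aborting the whole list.
            value = f"<{type(exc).__name__}: {exc}>"
        results[expression] = value
    return results


# Hypothetical variables standing in for the debugger's current scope.
frame_locals = {"data": [3, 1, 2], "config": {"lr": 0.1}}

shown = evaluate_watchlist(["data[0]", "len(data)", "config['lr']"], frame_locals)
print(shown)

# A watch expression with a side effect changes the program being debugged.
evaluate_watchlist(["data.pop()"], frame_locals)
print(frame_locals["data"])
```

After the second call, the list held in frame_locals has lost its last element, which is exactly the kind of interference with the execution environment the entry warns about.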
Expressions can be
(continued...)
Why did we abandon blogs?
These days we are watching Elon Musk destroy Twitter. The expectation is that, over time, the social network will keep losing users and relevance, if it does not blow up all at once, since its new owner is even talking about bankruptcy. It is not the first time that a…
Making pygments accessible
accessible-pygments hosts curated WCAG-compliant themes for all your syntax highlighting needs.
The new Spyder Editor documentation under the spotlights!
In this blogpost, I share my experience as a Google Season of Docs 2022 technical writer working on updating the Editor user documentation.
Close Encounter with pandas and the Jedis of open source
Learning from awesome mentors and contributing to pandas open source
Quansight Labs awarded three CZI EOSS Cycle 5 Grants
We are delighted to share details about new grants to support the sustainability of SciPy, conda-forge, and CuPy
Pandas DataFrame Output for sklearn Transformers
Author: Sangam SwadiK
Developing a Typer CLI for Nebari
The Nebari CLI consists of various commands the user needs to run to initialize, deploy, configure, and update Nebari.
The Russian Roulette: An Unbiased Estimator of the Limit
The idea for what was later called Monte Carlo method occurred to me when I was playing solitaire during my illness.
Stanislaw Ulam, Adventures of a Mathematician
The Russian Roulette offers a simple way to construct an unbiased estimator for the limit of a sequence. It allows for example to …
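One common way to build such an estimator, sketched here under assumptions that are not taken from the post: write the limit as the telescoping sum a_0 + Σ (a_n - a_{n-1}), stop the sum at a random index N, and divide each kept term by the probability P(N ≥ n) that it was kept. The example sequence a_n = 1 - 2^(-n), the geometric stopping rule, and all names below are illustrative choices.

```python
import random


def sequence(n):
    # Assumed example sequence a_n = 1 - 2**(-n), whose limit is 1.
    return 1.0 - 2.0 ** (-n)


def russian_roulette(seq, survive_prob, rng):
    """One sample of a randomly truncated, reweighted telescoping sum."""
    estimate = seq(0)
    n = 1
    weight = survive_prob  # P(N >= n): chance that term n is included
    while rng.random() < survive_prob:
        # Reweight the increment so its expectation equals seq(n) - seq(n - 1).
        estimate += (seq(n) - seq(n - 1)) / weight
        n += 1
        weight *= survive_prob
    return estimate


rng = random.Random(0)
estimates = [russian_roulette(sequence, 0.75, rng) for _ in range(20_000)]
average = sum(estimates) / len(estimates)
print(f"average of estimates: {average:.3f}")
```

Each sample evaluates only finitely many terms of the sequence, yet the expectation of the estimate is the limit itself. The survival probability trades off cost against variance: a value too small relative to how fast the increments shrink makes the reweighted terms grow and the variance blow up.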
scikit-learn and Hugging Face join forces
Author: Lysandre Debut, François Goupil
scikit-learn Sprint in Salta, Argentina
Author: Juan Martín Loyola
Getting started with VS Code for Python - Setting up a Development Environment for Python programming
Setting up a working development environment is the first step for any project. Your development environment setup will determine how …
So! You want to search all the public metagenomes with a genome sequence!
Searching all the things - faster!
Notes on the Frank-Wolfe Algorithm, Part III: backtracking line-search
Backtracking step-size strategies (also known as adaptive step-size or approximate line-search) that set the step-size based on a sufficient decrease condition are the standard way to set the step-size on gradient descent and quasi-Newton methods. However, these techniques are much less common for Frank-Wolfe-like algorithms. In this blog post I …
Introducing the 2022 Interns Cohort
Quansight Labs is delighted to welcome its second cohort of 6 interns, who will work on a variety of open source projects and tasks
New 2022 roadmap and grant funding
For the last couple of months, the Spyder team has been working on defining a new roadmap and submitting grant proposals to fund more features and improvements. We are pleased to announce our roadmap for the rest of 2022, and that two proposals were funded!
The roadmap
Considering the importance of sharing a clear perspective of where the Spyder project is going and where we will be focusing our efforts over the coming months, the team has created an initial roadmap for the rest of 2022. We prioritized the highlighted features and enhancements based on input from issues, face-to-face and virtual discussions, Stack Overflow, social media and other feedback, to try to best capture the interests of our users and community.
The proposals
To help make our roadmap achievable, we wrote and submitted proposals to several different venues and organizations in the last couple of months. While we have yet to hear back from some of them, two have already been funded!
The first was for the
(continued...)
SciPy 2022 Accessibility Awareness Programs
Announcing the SciPy 2022 Accessibility Awareness Efforts
The Value of Open Source Sprints, the scikit-learn Experience
Author: Reshama Shaikh
Pollution in India: Real-time AQI Data
Air pollution has become a serious problem in recent years across the world. The effects of air pollution are devastating, and they are not limited to humans but extend to animals and plants as well. It also leads to global warming, which is essentially increasing air and ocean temperatures around the world.
Indian cities have been topping the list of polluted cities. To solve the problem of air pollution, the most important first step is to track it in real time, which alerts people to avoid outdoor activities when air pollution is high. This post explains how you can fetch the real-time Air Quality Index (AQI) of Indian cities using Python and R code. It allows both Python and R programmers to pull pollution data.
You can download the dataset which contains static information about Indian states, cities and AQI stations. Variables stored in this dataset will be used further to fetch real-time data.
(continued...)
My Mayavi story: discovering open source communities
The Mayavi Python software, and my personal history: A thread on Python and scipy ecosystems, building open source codebase, and meeting really cool and friendly people
I am writing today as a goodbye to the project: I used to be one of the core contributors and maintainers but have been …
Pointwise mutual information (PMI) in NLP
Natural Language Processing (NLP) has gained wide acceptance recently: there are many live projects running, and it is no longer limited to academia. Use cases of NLP can be seen across industries, such as understanding customers' issues, predicting the next word a user is planning to type on the keyboard, automatic text summarization, etc. Many researchers across the world have trained NLP models in several human languages, like English, Spanish, French, Mandarin, etc., so that the benefit of NLP can be seen in every society. In this post we will talk about one of the most useful NLP metrics, Pointwise mutual information (PMI), which identifies words that go together, along with its implementation in Python and R.
PMI helps us to find related words. In other words, it measures how much more likely the co-occurrence of two words is than we would expect by chance. For example, the phrase "Data Science" has a specific meaning when these
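The standard definition is PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ): positive when two words co-occur more often than independence would predict. The sketch below is not the post's implementation; it assumes that "co-occurrence" means two words appearing next to each other, estimates probabilities from raw counts in a toy token list, and uses a hypothetical function name.

```python
import math
from collections import Counter


def pmi(tokens, word_a, word_b):
    """PMI in bits of the adjacent pair (word_a, word_b) in a token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_ab = bigrams[(word_a, word_b)] / (len(tokens) - 1)
    if p_ab == 0:
        # The pair never occurs together: PMI is minus infinity.
        return float("-inf")
    p_a = unigrams[word_a] / len(tokens)
    p_b = unigrams[word_b] / len(tokens)
    return math.log2(p_ab / (p_a * p_b))


# Toy corpus, assumed purely for illustration.
tokens = "data science is fun and data science is useful".split()
score = pmi(tokens, "data", "science")
print(f"PMI(data, science) = {score:.3f} bits")
print(f"PMI(science, data) = {pmi(tokens, 'science', 'data')}")
```

On a real corpus the counts would come from a much larger text, and rare pairs usually need a minimum-frequency cutoff, because PMI tends to overrate pairs seen only once or twice.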
How to import your data into Acoular
Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array which is stored in an HDF5 file. This blog post explains how to convert data available in other formats into this file format. As examples for other file formats we will use both .csv (comma separated text files) and .mat (Matlab files).
Checking for accessibility: thoughts and a checklist!
A non-exhaustive but totally honest checklist for accessibility review
On the Link Between Optimization and Polynomials, Part 5
Six: All of this has happened before.
Baltar: But the question remains, does all of this have to happen again?
Six: This time I bet no.
Baltar: You know, I've never known you to play the optimist. Why the change of heart?
Six: Mathematics. Law of averages. Let a complex …
Announcing ribbity - a hacky project to build Web sites from GitHub issue trackers
Munging GitHub issue trackers for fun!
Interview with Norbert Preining, scikit-learn Team Member
Author: Reshama Shaikh, Norbert Preining
PyQt6, PySide6, PyQt5 and PySide2 Books -- updated for 2022! - New editions extended and updated, now 780+ pages
Hello! Today I have released new digital editions of my PyQt5, PyQt6, PySide2 and PySide6 book Create GUI Applications with …
5 Years, 10 Sprints, A scikit-learn Open Source Journey
Author: Reshama Shaikh
Only size-1 arrays can be converted to Python scalars
NumPy is one of the most used modules in Python, and it is used in a variety of tasks ranging from creating arrays to mathematical and statistical calculations. NumPy also brings efficiency to Python programming. While using NumPy you may encounter this error:
TypeError: only size-1 arrays can be converted to Python scalars
It is one of the most frequently appearing errors, and sometimes it becomes a daunting challenge to solve it.
Meaning: Only Size 1 Arrays Can Be Converted To Python Scalars Error
This error generally appears when Python expects a single value but you passed an array which consists of multiple values.
For example: you want to calculate the exponential value of an array, but the function for the exponential value was designed for a scalar variable (which means a single value). When you pass a NumPy array to the function, it will return this error. This error stops your code from processing further and avoids unexpected output from the (continued...)
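A short sketch of the situation described above, assuming the scalar function in question is math.exp from the standard library (the entry does not name it). It reproduces the TypeError and shows two usual ways around it; the exact wording of the message can differ between NumPy versions.

```python
import math

import numpy as np

values = np.array([0.0, 1.0, 2.0])

# math.exp expects one number, so a multi-element array raises TypeError.
message = None
try:
    math.exp(values)
except TypeError as err:
    message = str(err)
print(message)

# Option 1: use the NumPy function, which works element by element.
elementwise = np.exp(values)

# Option 2: wrap the scalar function so it is applied to each element.
vectorized = np.vectorize(math.exp)(values)

print(elementwise)
print(vectorized)
```

np.exp is the better choice when a NumPy equivalent exists, since np.vectorize is a convenience wrapper around a Python-level loop and offers no speed benefit.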
Interview with Lucy Liu, scikit-learn Team Member
Author: Reshama Shaikh, Lucy Liu
The evolution of the SciPy developer CLI
The development story of a developer command-line interface (CLI) for the SciPy project, with examples
The second Common Fund Data Ecosystem hackathon - May 9-13, 2022!
We're running another hackathon!
Storing 64-bit unsigned integers in SQLite databases, for fun and profit
Storing unsigned longs in SQLite is possible, and can be fast.
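SQLite's INTEGER type holds signed 64-bit values, so an unsigned 64-bit number above 2^63 - 1 cannot be stored directly. One possible approach, an assumption here and not necessarily the one used in the post, is to reinterpret the bits as a signed value on the way in and undo that on the way out. The table name and helper names below are hypothetical.

```python
import sqlite3

MASK64 = (1 << 64) - 1


def to_signed(u):
    """Reinterpret an unsigned 64-bit integer as signed (two's complement)."""
    if not 0 <= u <= MASK64:
        raise ValueError("value does not fit in 64 unsigned bits")
    return u - (1 << 64) if u >= (1 << 63) else u


def to_unsigned(s):
    """Inverse of to_signed: recover the original unsigned value."""
    return s & MASK64


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (value INTEGER)")

originals = [0, 42, 2**63 - 1, 2**63, 2**64 - 1]
conn.executemany(
    "INSERT INTO hashes (value) VALUES (?)",
    [(to_signed(u),) for u in originals],
)

stored = [row[0] for row in conn.execute("SELECT value FROM hashes ORDER BY rowid")]
recovered = [to_unsigned(s) for s in stored]
print(recovered == originals)
```

The round trip is exact, but the stored values no longer sort in unsigned order, so equality lookups work unchanged while range queries on the unsigned values would need extra care.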
Why is writing blog posts hard?
In our weekly show and tell we got real about "why can writing blog posts be so hard?" and collaboratively wrote up this blog post about what we learned from the discussion.
Making GPUs accessible to the PyData Ecosystem via the Array API Standard.
How we can use the Python Array API Standard with the fundamental libraries in the PyData ecosystem along with CuPy for making GPUs accessible to the users of these libraries
Interview with Maren Westermann: Extending the Impact of the scikit-learn Sprints to the Community
Author: Reshama Shaikh, Maren Westermann
The First Common Fund Data Ecosystem Hackathon
We ran a successful pilot hackathon, and we will run a second one soon!
Jupyter accessibility efforts have a roadmap!
The Chan Zuckerberg Initiative has funded efforts to make the Jupyter ecosystem, starting with JupyterLab, more accessible. As a part of these increased efforts, the team will be providing a periodically updated list of what is currently being worked on and what is coming soon.
Master's in Computer Science 2022: Metaheuristics
We still have some openings for the Master's in Computer Science at UFPA, Belém. Those interested should check the submission instructions on the program's selection page. Since joining the program I have been supervising students in various research projects on computational intelligence applied to smart grid problems. We have already had work on multi-agent systems…
DiffCast: Hands-free Python Screencast Creator - Create reproducible programming screencasts without typos or edits
Programming screencasts are a popular way to teach programming and demo tools. Typically people will open up their favorite editor …
Conda and Grayskull, the Masters of Software Packaging
Grayskull is an automatic conda recipe generator, with a focus on conda-forge.
On minimum metagenome covers, and calculating them for your own data.
You, too, can run our software!
IPython 8.0, Lessons learned maintaining software
This is a companion post to the official release of IPython 8.0. We hope it will help you apply best practices, and have an easier time maintaining your projects, or helping others.
Optimization Nuggets: Implicit Bias of Gradient-based Methods
When an optimization problem has multiple global minima, different algorithms can find different solutions, a phenomenon often referred to as the implicit bias of optimization algorithms. In this post we'll characterize the implicit bias of gradient-based methods on a class of regression problems that includes linear least squares and Huber …
Optimization Nuggets: Exponential Convergence of SGD
This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution.
A year of Jupyter community calls
A lot of us showed up for the code, but hung around for the community. We'll continue this post talking about the monthly Jupyter community calls, and how they help all jovyans, Project Jupyter's pet name for their developers and users, stay connected.
A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond
In this post, we aim to articulate that vision and suggest a path to making it concrete, focusing on three libraries at the core of the PyData ecosystem: SciPy, scikit-learn and scikit-image.
A bioinformatics training career panel in the DIB Lab
Careers in training!
NumPy Benchmarking
My work mainly focused on providing performance benchmarks for NumPy in realistic situations. The target was to show the world that NumPy is efficient in handling quasi real-life situations too.
Hiring an engineer and post-doc to simplify data science on dirty data
Note
Join us to work on reinventing data-science practices and tools to produce robust analysis with less data curation.
It is well known that data cleaning and preparation are a heavy burden to the data scientist.
In the dirty data project, we have been conducting machine-learning research …