Planet SciPy

Anaconda Blog 2022-05-18 16:30:00

Asian, American, SVP

When I reflect on my journey as an Asian American woman in corporate America, I can tease out a few things that I feel have contributed to my success. First of all, the aforementioned strong will and confidence have come in handy. I’ve always believed in myself, propelling myself into my career head-and-heart-first and working hard to achieve my goals. I’ve found that if you consistently believe you can get the job done and then execute on that belief, other people become confident in you, too. Plus, I’ve had amazing mentors of all genders and ethnicities who shared their knowledge and modeled success. I can’t overstate how important it is to connect with mentors who can serve as guides and invest in your professional development.

2022-05-18 07:29:03

Named to the 2022 CB Insights AI 100 List of Most Promising AI Startups

It’s been only a couple of weeks since I announced that we raised an $8M series A, and here I am with more good news. has been named to the 2022 CB Insights AI 100 List of Most Promising AI Startups. We’ve been recognized in the experiment tracking and version control category.  The CB […]

The post Named to the 2022 CB Insights AI 100 List of Most Promising AI Startups appeared first on

scikit-learn Blog 2022-05-18 00:00:00

The Value of Open Source Sprints, the scikit-learn Experience

Author: Reshama Shaikh

With contributions from: Gaël Varoquaux, Andreas Mueller, Olivier Grisel, Julien Jerphanion, Guillaume Lemaitre

Top Line Summary

Sprints are working sessions to contribute to an open source library. The goals and achievements differ between Developer and Community sprints. The long-term impact of open source sprints, particularly community events, is not easily quantifiable or measurable. Positive outcomes of sprints have slowly been emerging, and for that reason, realizing the value of open source sprints requires playing the “long game”.


The scikit-learn project has a long and extraordinary legacy of open source sprints. Since 2010, when its first public version was released, at least 45 sprints have been organized. That figure is a lower bound, since there are likely more sprints that have not been

(continued...)

2022-05-17 13:22:53

Deploying Computer Vision Models: Tools & Best Practices

Computer vision models have become insanely sophisticated with a wide variety of use cases enhancing business effectiveness, automating critical decision systems, and so on. But a promising model can turn out to be a costly liability if the model fails to perform as expected in production. Having said that, how we develop and deploy computer […]

The post Deploying Computer Vision Models: Tools & Best Practices appeared first on

scikit-learn Blog 2022-05-12 00:00:00

5 Years, 10 Sprints, A scikit-learn Open Source Journey

Author: Reshama Shaikh

Video About

We all use open source tools in various capacities, yet knowing how to contribute to open source is not as well known or accessible. The limited knowledge and education surrounding contributing to open source could be one explanation of the low participation rates by underrepresented persons in open source. Open source sprints are hands-on “workshops” or “hackathons” where contributors collaborate to resolve coding and documentation issues posted on a GitHub repository.

Reshama shares how she organized her first open source sprint in 2017, which was in-person and held in New York City. Over the next 5 years, she organized in-person sprints from San Francisco, USA to Nairobi, Kenya, as well as pivoting to online sprints due to the global pandemic. In this keynote, Reshama shares highlights, challenges and lessons learned

(continued...)

2022-05-10 12:56:42

5 Must-Do Error Analysis Before You Put Your Model in Production

The blossom of the deep learning era began in 2012 when Alex Krizhevsky created a convolutional neural network that boosted the accuracies in image classification by more than 10%. The drastic success was soon followed by other research domains and soon other businesses – both conglomerates and startups – hoped to apply this cutting-edge technology […]

The post 5 Must-Do Error Analysis Before You Put Your Model in Production appeared first on

Anaconda Blog 2022-05-06 21:50:00

New Release: Anaconda Distribution Now Supporting M1

2022.05 Anaconda Distribution

ListenData 2022-05-06 11:06:00

Only size-1 arrays can be converted to Python scalars

NumPy is one of the most used modules in Python, and it is used in a variety of tasks ranging from creating arrays to mathematical and statistical calculations. NumPy also brings efficiency to Python programming. While using NumPy you may encounter this error: TypeError: only size-1 arrays can be converted to Python scalars. It is one of the more frequently appearing errors, and sometimes it becomes a daunting challenge to solve it.
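As a minimal sketch of how this error typically arises, the snippet below passes a multi-element array to a scalar-only function. Using `math.exp` as that function is an assumption for illustration; the excerpt does not name one. The exact wording of the message can also vary between NumPy versions, so the sketch only relies on the exception type.

```python
import math

import numpy as np

values = np.array([1.0, 2.0, 3.0])

# math.exp expects a single number, so a 3-element array is rejected.
try:
    math.exp(values)
    scalar_call_failed = False
except TypeError as err:
    scalar_call_failed = True
    print(f"TypeError: {err}")

# Fix 1: use the vectorized NumPy equivalent, which works element-wise.
vectorized = np.exp(values)

# Fix 2: wrap the scalar function so it is applied to each element in turn.
wrapped = np.vectorize(math.exp)(values)

print(vectorized)
print(np.allclose(vectorized, wrapped))
```

The vectorized call is usually the better choice; `np.vectorize` is a convenience wrapper around a Python-level loop and is mainly useful when no NumPy equivalent exists.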
Meaning: Only Size 1 Arrays Can Be Converted To Python Scalars Error

This error generally appears when Python expects a single value but you passed an array that contains multiple values. For example, you want to calculate the exponential of an array, but the function you call was designed for a scalar variable (a single value). When you pass a NumPy array to that function, it raises this error. Raising the error stops your code from proceeding and avoids unexpected output (continued...)

2022-05-05 10:44:12

Multi GPU Model Training: Monitoring and Optimizing

Do you struggle with monitoring and optimizing the training of Deep Neural Networks on multiple GPUs? If yes, you’re in the right place. In this article, we will discuss multi GPU training with Pytorch Lightning and find out the best practices that should be adopted to optimize the training process. We shall also see how […]

The post Multi GPU Model Training: Monitoring and Optimizing appeared first on

scikit-learn Blog 2022-05-04 00:00:00

Interview with Lucy Liu, scikit-learn Team Member

Lucy Liu joined the scikit-learn Team in September 2020. In this interview, learn more about Lucy’s journey through open source, from rstats to scikit-learn.

  1. Tell us about yourself.

    My name is Lucy, I grew up in New Zealand and I am culturally Chinese. I currently live in Australia and work for Quansight Labs.

  2. How

Quansight Labs 2022-05-03 02:30:00

The evolution of the SciPy developer CLI

🤔 What is a command-line interface (CLI)?

Imagine a massive system with various tools and functionalities, where every functionality requires a special command or an input from the user. A CLI is designed to tackle such situations. Like a catalog or menu, it lists all the options available, thus helping the user to navigate a complex system.
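To make the "menu of options" idea concrete, here is a minimal sketch using Python's standard `argparse` module. The tool name `devtool` and its `build` and `test` subcommands are hypothetical and are not taken from SciPy's actual developer CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical developer tool; the command names are illustrative only.
    parser = argparse.ArgumentParser(
        prog="devtool", description="Hypothetical developer CLI"
    )
    subcommands = parser.add_subparsers(dest="command", required=True)

    build = subcommands.add_parser("build", help="compile the project")
    build.add_argument("--jobs", type=int, default=1, help="number of parallel jobs")

    test = subcommands.add_parser("test", help="run the test suite")
    test.add_argument("--verbose", action="store_true", help="show each test name")
    return parser


parser = build_parser()
# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = parser.parse_args(["build", "--jobs", "4"])
print(args.command, args.jobs)
```

Running such a tool with `--help` would print the list of subcommands and their descriptions, which is the catalog-like behavior described above.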

Now that we understand what a CLI is, how about we dive into the world of SciPy?

Read more… (5 min remaining to read)

Anaconda Blog 2022-04-30 16:00:00

New from Anaconda: Python in the Browser

PyScript wouldn't be here today without the help of some incredible people.

Anaconda Blog 2022-04-28 13:30:00

How Anaconda Is Rallying to Protect Commercial Users From Cybersecurity Threats

“You have the power, the capacity, and the responsibility to strengthen the cybersecurity and resilience of the critical services and technologies on which Americans rely. We need everyone to do their part to meet one of the defining threats of our time—your vigilance and urgency today can prevent or mitigate attacks tomorrow.”

2022-04-28 10:49:05

Experiment Tracking in Kubeflow Pipelines

Experiment tracking has been one of the most popular topics in the context of machine learning projects. It is difficult to imagine a new project being developed without tracking each experiment’s run history, parameters, and metrics. While some projects may use more “primitive” solutions like storing all the experiment metadata in spreadsheets, it is definitely […]

The post Experiment Tracking in Kubeflow Pipelines appeared first on

Living in an Ivory Basement 2022-04-21 22:00:00

Storing 64-bit unsigned integers in SQLite databases, for fun and profit

Storing unsigned longs in SQLite is possible, and can be fast.
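SQLite's INTEGER type is a signed 64-bit value, so unsigned values of 2^63 or more do not fit directly. One common workaround, which is not necessarily the approach taken in the post, is to reinterpret each unsigned value as its signed two's-complement counterpart on the way in and reverse that on the way out. The table name `hashes` and column name `hashval` below are hypothetical.

```python
import sqlite3

TWO_63 = 1 << 63
TWO_64 = 1 << 64


def to_signed(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    if not 0 <= value < TWO_64:
        raise ValueError("value does not fit in 64 unsigned bits")
    return value - TWO_64 if value >= TWO_63 else value


def to_unsigned(value: int) -> int:
    """Undo to_signed after reading the value back."""
    return value + TWO_64 if value < 0 else value


conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration.
conn.execute("CREATE TABLE hashes (hashval INTEGER PRIMARY KEY)")

originals = [0, 42, TWO_63, TWO_64 - 1]
conn.executemany(
    "INSERT INTO hashes (hashval) VALUES (?)",
    [(to_signed(v),) for v in originals],
)

rows = conn.execute("SELECT hashval FROM hashes").fetchall()
restored = sorted(to_unsigned(row[0]) for row in rows)
print(restored == originals)
conn.close()
```

Note that this mapping does not preserve ordering inside the database: values at or above 2^63 are stored as negative numbers, so range queries need extra care. Exact-match lookups are unaffected.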

Anaconda Blog 2022-04-21 13:10:00

Making Data, Models, and Analytics Awesome

About the Author Mark Skov Madsen, PhD, CFA, is a Lead Trading Analyst at Ørsted. His team develops data, models, and analytics for Ørsted’s Traders end to end. They work in an analytics environment based on Azure DevOps, Kubernetes, JupyterHub and Python.

2022-04-15 16:27:54

Reducing Pipeline Debt With Great Expectations

You are a part of a data science team at a product company. Your team has a number of machine learning models in place. Their outputs guide critical business decisions, as well as a couple of dashboards displaying important KPIs that are closely watched by your executives day and night.  On that fatal day, you […]

The post Reducing Pipeline Debt With Great Expectations appeared first on

2022-04-14 15:55:56

Building Machine Learning Pipelines: Common Pitfalls

In recent years, there have been rapid advancements in Machine Learning and this has led to many companies and startups delving into the field without understanding the pitfalls. Common examples are the pitfalls involved when building ML pipelines. Machine Learning pipelines are complex and there are several ways they can fail or be misused. Stakeholders […]

The post Building Machine Learning Pipelines: Common Pitfalls appeared first on

Anaconda Blog 2022-04-13 13:13:00

Introducing Anaconda Business: Enhanced Open-Source Security in the Cloud

We’re incredibly excited to announce the latest addition to Anaconda’s product line: Anaconda Business.

2022-04-13 05:00:00

We Raised $8M Series A to Continue Building Experiment Tracking and Model Registry That “Just Works”

When I came to the machine learning space from software engineering in 2016, I was surprised by the messy experimentation practices, lack of control over model building, and a missing ecosystem of tools to help people deliver models confidently.   It was a stark contrast from the software development ecosystem, where you have mature tools for […]

The post We Raised $8M Series A to Continue Building Experiment Tracking and Model Registry That “Just Works” appeared first on

Quansight Labs 2022-04-10 11:00:00

Why is writing blog posts hard?

We write code. We write issues. We write documentation. We write notes to ourselves, messages to each other, and guidelines to unite teams across projects.

Day in and out our remote work and open source lives are driven by written communication. But blog posts are one kind of writing that eludes our regular practice. In our weekly show and tell we got real about "why can writing blog posts be so hard?" and collaboratively wrote up this blog post about what we learned from the discussion.

Read more… (4 min remaining to read)

2022-04-07 15:17:19

Time Series Projects: Tools, Packages, and Libraries That Can Help

Since you are here, you probably know that time series data is a bit different than static ML data. So when working on time series projects, oftentimes, Data Scientists or ML Engineers use specific tools and libraries. Or they use commonly known tools that have proved to be well adjusted to time series projects. We […]

The post Time Series Projects: Tools, Packages, and Libraries That Can Help appeared first on

Anaconda Blog 2022-04-07 13:06:00

Why Power Efficiency Is Key to AI Innovation, and What It Means for Hardware and Software Makers

AI adoption is continuing to grow, with 56% of respondents in a 2021 survey reporting AI has been adopted within at least one business function, up from 50% the previous year. And in parallel, AI itself continues to progress rapidly, with ever-larger, ever-faster models with seemingly insatiable appetites for compute power and training data. For a long time, the focus has been about going bigger at all costs: more extensive data sets, bigger models, and more significant deployments. The constant demand for scale has been met with a combination of the traditional Moore’s Law increases in chip density, the transition to more specialized hardware (like GPUs), and the use of brute force in increasingly large compute clusters.

2022-04-05 15:40:09

Kedro Pipelines With Optuna: Running Hyperparameter Sweeps

Software engineering’s workflow management ecosystem is quite mature – Git for version control, Postman for API testing and many tools to make your life easier. In ML, we consistently experiment with code and data, contrary to software development where ‘experimentation’ is not so common. In addition, ML experiments can get messy quickly and often fail […]

The post Kedro Pipelines With Optuna: Running Hyperparameter Sweeps appeared first on

Quansight Labs 2022-03-31 23:59:02

Making GPUs accessible to the PyData Ecosystem via the Array API Standard.

GPUs have become an essential part of the scientific computing stack and with the advancement in the technology around GPUs and the ease of accessing a GPU in the cloud or on-prem, it is in the best interest of the PyData community to spend time and effort to make GPUs accessible for the users of PyData libraries. A typical user in the PyData ecosystem is quite familiar with the APIs of libraries like SciPy, scikit-learn, and scikit-image -- and at the moment these libraries are largely limited to single-threaded operations on CPU (there are exceptions to that, like linear algebra functions and scikit-learn functionality which uses OpenMP under the hood). In this blog post I will talk about how we can use the Python Array API Standard with the fundamental libraries in the PyData ecosystem along with CuPy for making GPUs accessible to the users of these libraries. With the introduction of that standard by the Consortium for Python Data API Standards and its adoption mechanism in NEP 47 it

Anaconda Blog 2022-03-30 17:21:00

Anaconda "Editions" Repositioned as Feature-Additive Enterprise Product Suite

Here at Anaconda, we’re constantly innovating, and our drive to improve the capabilities of our product is matched by our drive to improve the way we offer said product to our community. As such, we are pleased to announce that Anaconda has moved to a tier-based product model.

Anaconda Blog 2022-03-28 13:48:00

Why “Boomerang” Is Such a Common Term at Anaconda

“Boomerang” is a commonly used term at Anaconda. This is because Anaconda has a special and growing category of employees who left Anaconda, realized that the “grass wasn’t greener,” and returned to Anaconda in the same or a similar capacity as before. This group spans departments, tenures, genders, geographies, and roles. We took some time to dig into the boomerang experience and highlight what our boomerangs have learned from leaving and returning “home.”

scikit-learn Blog 2022-03-28 00:00:00

Interview with Maren Westermann: Extending the Impact of the scikit-learn Sprints to the Community

In this interview, learn more about how Maren moved from being a Data Umbrella scikit-learn participant to a mentor, and then to organise open source workshops.

  1. How did you learn of the Data Umbrella scikit-learn sprints and what inspired you to attend?

I learned of the first Data Umbrella scikit-learn online sprint, which took place in June 2020, via Twitter. I was interested in contributing to open source and had already made one contribution to scikit-learn. However, when I started contributing to open source I didn’t have a network of like-minded people. I was very much looking forward to connecting with

Anaconda Blog 2022-03-24 13:32:00

Finding a Place in Open Source

Anaconda is amplifying the voices of some of its most active and cherished community members in a monthly blog series. If you’re a Maker who has been looking for a chance to tell your story, elaborate on a favorite project, educate your peers, and build your personal brand, consider submitting an abstract. For more details and to access a wealth of educational data science resources and discussion threads, including one about this blog post, visit Anaconda Nucleus.

scikit-learn Blog 2022-03-21 00:00:00

Behind the Scenes of Data Umbrella scikit-learn Open Source Sprints


Prior to 2020, most data sprints were held in person during intensive 8-hour-long days. Data Umbrella founder, Reshama Shaikh, for example, led several in-person sprints in New York (2017, 2018, 2019), Nairobi (2019) and San Francisco (2019). Data Umbrella had always been interested in developing online resources and exploring ways to enable virtual participation, but this was not able to become a priority until 2020 when the pandemic forced everything online including data sprints. It was clear that an 8-hour in-person event could not just switch to being an 8-hour online event. So the move to online data sprints required the team to

scikit-learn Blog 2022-03-12 00:00:00

Women in Machine Learning - A WiMLDS Paris sprint and contribution workshop

Did you know that, by a rough estimate, only 6% of open source contributors are women?! This is awfully low. The scikit-learn team really cares about improving its diversity. With gender being one of our focus areas, we decided to partner with Women in Machine Learning and Data Science Paris (WiMLDS Paris) to help with that. On March 12th, a Saturday morning, we gathered for our sprint at CybelAngel! It’s been a long time since we organized a face-to-face event, especially a sprint!

What is a scikit-learn sprint you may ask? The scikit-learn sprint is a hands-on “hackathon” where we work on issues in the scikit-learn GitHub repository and learn to contribute to open source. This sprint included an introductory and practical workshop about contribution to open source software.


Living in an Ivory Basement 2022-03-04 23:00:00

The First Common Fund Data Ecosystem Hackathon

We ran a successful pilot hackathon, and we will run a second one soon!

Quansight Labs 2022-02-28 10:00:00

Jupyter accessibility efforts have a roadmap!

Really? Tell me more.

The Chan Zuckerberg Initiative has funded efforts to make the Jupyter ecosystem, starting with JupyterLab, more accessible (as was announced in a prior Jupyter blog post about grants in the ecosystem). You can read the full grant proposal for Jupyter accessibility, the proposal summary, or a GitHub Project list of the grant's milestones to get a sense of the grant's scope.

Read more… (1 min remaining to read)

scikit-learn Blog 2022-02-19 00:00:00

Three Components for Reviewing a Pull Request

Author: Thomas J. Fan

A pull request is a method for submitting contributions to a software project. Maintainers or contributors review these pull requests to discuss the proposed changes and help ensure the changes meet the project guidelines and quality standards. In this talk, we will learn about three components for reviewing a pull request:

  1. The mechanics of code review on GitHub.
  2. The social aspects of code review and how to effectively give feedback.
  3. The technical aspects of reviewing a pull request.

The slides for this presentation are available.

  • 00:00:00 Reshama introduces Data Umbrella
  • 00:05:20 Thomas begins talk
  • 00:06:35 Terms: pull request, reviewer, contributor, merged
  • 00:07:12 PART 1: Mechanics of Code Review. Why code review?
  • 00:19:10 Browsing Code in GitHub

scikit-learn Blog 2022-02-08 00:00:00

Performances and scikit-learn

For more than 10 years, scikit-learn has been bringing machine learning and data science to the world. Since then, the library has aimed to deliver quality implementations to its users.

This series of blog posts aims to explain the ongoing work of the scikit-learn developers to boost the performance of the library.

Read more online

scikit-learn Blog 2022-02-07 00:00:00

An Open Source Software Award for scikit-learn

We are pleased to announce that scikit-learn has received a prize for open-source scientific software from the French government. It is great recognition for all the community of contributors and users of a project born in France. Congratulations to the worldwide community for this great achievement!

A Community Award

scikit-learn was awarded for its very active community, with more than 500,000 users per month and 2,200 contributors. Scikit-learn prides itself on being able to showcase its best practices for community building, an essential element of successful open-source software and open science innovation. Congratulations to all the projects and the teams that received the open-source software and open science award today. This work is inspiring for all of us!

The Reaction of the Community

“I literally owe my career in the

Filipe Saraiva's blog 2022-02-06 14:31:39

Master's in Computer Science 2022: Metaheuristics

We still have a few openings for the Master's in Computer Science at UFPA, Belém. Those interested should see the submission instructions on the program's selection page. Since joining the program I have been advising students on various research projects in computational intelligence applied to smart grid problems. We have already had work on multi-agent systems… Continue reading »Master's in Computer Science 2022: Metaheuristics

Martin Fitzpatrick - python 2022-01-26 11:00:00

DiffCast: Hands-free Python Screencast Creator — Create reproducible programming screencasts without typos or edits

Programming screencasts are a popular way to teach programming and demo tools. Typically people will open up their favorite editor and record themselves tapping away. But this has a few problems. A good setup for coding isn't necessarily a good setup for video -- with text too small, a window too …

scikit-learn Blog 2022-01-22 00:00:00

Interview with Chiara Marmo, Triage Team Member

Chiara Marmo joined the scikit-learn Triage Team in 2019. In this interview, learn more about Chiara’s passion in open source.

  1. Tell us about yourself.

    I’m Chiara, I’m Italian, from Biella, a small town in the North of Piedmont. I have a PhD in Astronomy. I am a Research Engineer and a French civil servant. I have worked in data processing and archiving for Astronomy, Earth and Planetary Sciences. Right now, I’m living in the US with my family.

Photo credit: Chiara Marmo
  2. How

Quansight Labs 2022-01-19 10:00:00

Conda and Grayskull, the Masters of Software Packaging

Python might be the most popular snake out there, but most of us have also heard of that other serpent: Conda. And some of us have wondered what it really is. In this post we’ll learn about Conda, software packages and package recipes. Most importantly we’ll learn about Grayskull — a conda recipe generator.

Read more… (6 min remaining to read)

Quansight Labs 2022-01-12 13:00:00

IPython 8.0, Lessons learned maintaining software

This is a companion post to the official release of IPython 8.0 that describes what we learned with this large new major IPython release. We hope it will help you apply best practices and have an easier time maintaining your projects, or helping others. We'll focus on the many patterns that made it easier for us to make IPython 8.0 what it is with minimal time involved.

Read more… (8 min remaining to read)

2022-01-09 23:00:00

Optimization Nuggets: Implicit Bias of Gradient-based Methods

When an optimization problem has multiple global minima, different algorithms can find different solutions, a phenomenon often referred to as the implicit bias of optimization algorithms. In this post we'll characterize the implicit bias of gradient-based methods on a class of regression problems that includes linear least squares and Huber …

2021-12-14 23:00:00

Optimization Nuggets: Exponential Convergence of SGD

This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution.
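A toy simulation can illustrate what "exponentially fast to a neighborhood" looks like in practice. The sketch below is not the proof from the post: it assumes a one-dimensional quadratic objective, a constant step size, and bounded uniform gradient noise, all chosen here for illustration.

```python
import random

random.seed(0)

TARGET = 3.0      # minimizer of f(x) = 0.5 * (x - TARGET) ** 2
STEP_SIZE = 0.1   # constant step size
NOISE = 1.0       # gradient noise is uniform in [-NOISE, NOISE]

x = 100.0
errors = [abs(x - TARGET)]
for _ in range(200):
    # Stochastic gradient: the true gradient (x - TARGET) plus bounded noise.
    noisy_gradient = (x - TARGET) + random.uniform(-NOISE, NOISE)
    x -= STEP_SIZE * noisy_gradient
    errors.append(abs(x - TARGET))

# Early iterations shrink the error geometrically (factor 1 - STEP_SIZE per step);
# later iterations hover inside a noise-dominated neighborhood of the solution.
print(errors[0], errors[20], errors[-1])
```

In this toy setting the error after t steps is at most (1 - STEP_SIZE)^t times the initial error plus NOISE, so the iterates end up within roughly NOISE of the minimizer but do not converge to it exactly while the noise persists.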

Quansight Labs 2021-12-10 06:00:00

A year of Jupyter community calls

A framing for open source is that the software and code are kernels of community. The code, and its abstractions, unite developers and their patrons; a struggle for growing/evolving open communities is to make sure these groups remain connected. A lot of us showed up for the code, but hung around for the community. We'll continue this post talking about the monthly Jupyter community calls, and how they help all jovyans, Project Jupyter's pet name for their developers and users, stay connected.

Read more… (2 min remaining to read)

Quansight Labs 2021-11-17 10:00:00

A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond

Over the years, array computing in Python has evolved to support distributed arrays, GPU arrays, and other various kinds of arrays that work with specialized hardware, or carry additional metadata, or use different internal memory representations. The foundational library for array computing in the PyData ecosystem is NumPy. But NumPy alone is a CPU-only library - and a single-threaded one at that - and in a world where it's possible to get a GPU or a CPU with a large core count in the cloud cheaply or even for free in a matter of seconds, that may not seem enough. For the past couple of years, a lot of thought and effort has been spent on devising mechanisms to tackle this problem, and evolve the ecosystem in a gradual way towards a state where PyData libraries can run on a GPU, as well as in distributed mode across multiple GPUs.

We feel like a shared vision has emerged, in bits and pieces. In this post, we aim to articulate that vision and

Quansight Labs 2021-11-03 17:23:40

NumPy Benchmarking

In this blog post, I'll be talking about my journey at Quansight. I want to share everything I was involved in and accomplished, the issues I faced, and, most importantly, the awesome life hacks I learned during this period.

First of all, I'd like to express my gratitude to the whole team for allowing me to be a part of it. My work was mainly focused on providing performance benchmarks for NumPy in realistic situations. The goal was to show the world that NumPy is efficient in handling quasi real-life situations too.

The primary technical outcome of my work is available in the numpy documentation.

Read more… (6 min remaining to read)

Gaël Varoquaux - programming 2021-10-28 22:00:00

Hiring an engineer and post-doc to simplify data science on dirty data


Join us to work on reinventing data-science practices and tools to produce robust analysis with less data curation.

It is well known that data cleaning and preparation are a heavy burden to the data scientist.

Dirty data research

In the dirty data project, we have been conducting machine-learning research …

Sparrow Computing 2021-10-22 21:27:39

TorchVision Datasets: Getting Started

The TorchVision datasets subpackage is a convenient utility for accessing well-known public image and video datasets. You can use these tools to start training new computer vision models very quickly.

TorchVision Datasets Example

To get started, all you have to do is import one of the Dataset classes. Then, instantiate it and access one of ... Read more

The post TorchVision Datasets: Getting Started appeared first on Sparrow Computing.

Sparrow Computing 2021-10-21 14:19:21

NumPy Any: Understanding np.any()

The np.any() function tests whether any element in a NumPy array evaluates to true: The input can have any shape and the data type does not have to be boolean (as long as it’s truthy). If none of the elements evaluate to true, the function returns false: Passing in a value for the axis argument ... Read more
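The code samples from the original post did not survive in this excerpt, so here is a small sketch of the behavior described, using hypothetical arrays. It covers the truthy case, the all-false case, and the `axis` argument.

```python
import numpy as np

mixed = np.array([[0, 0, 3],
                  [0, 0, 0]])

has_truthy = np.any(mixed)            # at least one nonzero element exists
all_zero = np.any(np.zeros((2, 3)))   # no element evaluates to true
per_row = np.any(mixed, axis=1)       # one result per row
per_column = np.any(mixed, axis=0)    # one result per column

print(has_truthy, all_zero)
print(per_row, per_column)
```

Without `axis`, the whole array is reduced to a single boolean; with `axis`, the reduction runs along that dimension and returns a boolean array.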

The post NumPy Any: Understanding np.any() appeared first on Sparrow Computing.

Quansight Labs 2021-10-21 09:00:00

Dataframe interchange protocol: cuDF implementation

This is Ismaël Koné from Côte d'Ivoire (Ivory Coast). I am a fan of open source software. In the next lines, I'll try to capture my experience at Quansight Labs as an intern working on the cuDF implementation of the dataframe interchange protocol.

We'll continue by motivating this project through details about cuDF and the dataframe interchange protocol.

Read more… (9 min remaining to read)

Sparrow Computing 2021-10-07 20:52:03

PyTorch DataLoader Quick Start

PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility and that makes data loading in PyTorch a fairly advanced topic. One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you find out you need it. ... Read more

The post PyTorch DataLoader Quick Start appeared first on Sparrow Computing.

Sparrow Computing 2021-10-06 16:53:32

How the NumPy append operation works

Understanding the np.append() operation and when you might want to use it.
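As a brief sketch of the operation (the arrays are hypothetical and not taken from the post), `np.append` returns a new array rather than modifying its input, and it flattens its inputs unless an `axis` is given:

```python
import numpy as np

original = np.array([[1, 2], [3, 4]])

# Without axis, both inputs are flattened before the values are appended.
flattened = np.append(original, [5, 6])

# With axis=0, the appended values must match the row shape and form a new row.
stacked = np.append(original, [[5, 6]], axis=0)

print(flattened)
print(stacked.shape)
print(original.shape)  # the input array itself is left unchanged
```

Because every call allocates and copies a whole new array, appending inside a loop is usually slower than collecting values in a Python list and converting once at the end.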

The post How the NumPy append operation works appeared first on Sparrow Computing.

Gaël Varoquaux - programming 2021-09-13 22:00:00

Hiring someone to develop scikit-learn community and industry partners


With the growth of scikit-learn and the wider PyData ecosystem, we want to recruit in the Inria scikit-learn team for a new role. Departing from our usual focus on excellence in algorithms, statistics, or code, we want to add to the team someone with some technical understanding, but an …

Pierre de Buyl's homepage - scipy 2021-08-24 13:00:00

A paper on the Lees-Edwards method

A few years ago, Sebastian contacted me to help with simulations. Great, I like simulation studies, so we start discussing the details. The idea: use an established method, the Lees-Edwards boundary condition, to study colloids under shear.

Living in an Ivory Basement 2021-07-19 22:00:00

A biotech career panel in the DIB Lab

Careers outside of universities!

Sparrow Computing 2021-07-08 16:09:47

Poetry for Package Management in Machine Learning Projects

When you’re building a production machine learning system, reproducibility is a proxy for the effectiveness of your development process. But without locking all your Python dependencies, your builds are not actually repeatable. If you work in a Python project without locking long enough, you will eventually get a broken build because of a transitive dependency ... Read more

The post Poetry for Package Management in Machine Learning Projects appeared first on Sparrow Computing.

Sparrow Computing 2021-06-29 20:38:29

Development containers in VS Code: a quick start guide

If you’re building production ML systems, dev containers are the killer feature of VS Code. Dev containers give you full VS Code functionality inside a Docker container. This lets you unify your dev and production environments if production is a Docker container. But even if you’re not targeting a Docker deployment, running your code in ... Read more

The post Development containers in VS Code: a quick start guide appeared first on Sparrow Computing.

Living in an Ivory Basement 2021-06-28 22:00:00

New sourmash databases are available!

Databases are now available for GTDB!

Filipe Saraiva's blog 2021-06-25 12:06:45

Writing a Column for O Estado do Piauí

O Estado do Piauí is a new newspaper that recently emerged over there. With a stronger focus on long, dense reporting that blends investigative and literary journalism, the project aims to discuss in depth the topics that matter to the state, uncover unique stories from Piauí, give visibility to problematic situations, point to alternatives, and much more. Don't… Continue reading »Writing a Column for O Estado do Piauí

Filipe Saraiva's blog 2021-06-21 21:51:57

Interview Series on Research at UFPA's PPGCC – Computational Intelligence

The Faculty of Computing and the Graduate Program in Computer Science at UFPA are developing a project with two goals: first, to better publicize what we produce in our research to the public outside the university; second, to better publicize the same thing INTERNALLY – what we develop… Continue reading »Interview Series on Research at UFPA's PPGCC – Computational Intelligence

Living in an Ivory Basement 2021-06-07 22:00:00

Searching all public metagenomes with sourmash

Searching all the things!

Pierre de Buyl's homepage - scipy 2021-05-21 13:00:00

Is your software ready for the Journal of Open Source Software?

For the unaware reader, the Journal of Open Source Software (JOSS) is an open-access scientific journal founded in 2016 and aimed at publishing scientific software. A JOSS article is itself short, and its publication helps recognize the work on the software. Here I share my point of view on what makes some software tools more ready than others to be published in JOSS. I do not comment on the size or the relevance for research, which are both documented on JOSS' website.

Living in an Ivory Basement 2021-05-16 22:00:00

sourmash 4.1.0 released!!

sourmash v4.1.0 is here!

Sparrow Computing 2021-05-14 20:11:16

Basic Counting in Python

I love fancy machine learning algorithms as much as anyone. But sometimes, you just need to count things. And Python’s built-in data structures make this really easy. Let’s say we have a list of strings: With a list like this, you might care about a few different counts. What’s the count of all items? What’s ... Read more
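The counts mentioned above can be sketched with the standard library alone. This is a minimal illustration, not the post's own code: the excerpt elides the original list, so the list contents below are hypothetical.

```python
from collections import Counter

# Hypothetical example data; the original post's list is not shown in the excerpt.
items = ["a", "b", "a", "c", "a", "b"]

total = len(items)            # count of all items
count_a = items.count("a")    # count of one specific item
unique = len(set(items))      # count of distinct items
per_item = Counter(items)     # count of every distinct item at once

print(total, count_a, unique)
print(per_item.most_common(1))
```

`Counter` is usually the better choice when several per-item counts are needed, since it walks the list once instead of once per `list.count` call.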

The post Basic Counting in Python appeared first on Sparrow Computing.

Sparrow Computing 2021-05-13 18:11:11

How to Use the PyTorch Sigmoid Operation

The PyTorch sigmoid function is an element-wise operation that squishes any real number into a range between 0 and 1. This is a very common activation function to use as the last layer of binary classifiers (including logistic regression) because it lets you treat model predictions like probabilities that their outputs are true, i.e. p(y ... Read more
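To make the formula concrete, here is a plain-Python sketch of the logistic sigmoid, 1 / (1 + exp(-x)). It is not PyTorch's implementation; `torch.sigmoid` applies the same function element-wise to a tensor. The helper name `sigmoid`, the sample logits, and the 0.5 decision threshold are assumptions made for illustration.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), branched to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)

# Hypothetical raw model outputs (logits) for three examples.
logits = [-4.0, 0.0, 4.0]
probs = [sigmoid(v) for v in logits]

# Treat each output as p(y = 1) and threshold at 0.5 (an illustrative choice).
preds = [p >= 0.5 for p in probs]
print(probs, preds)
```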

The post How to Use the PyTorch Sigmoid Operation appeared first on Sparrow Computing.

2021-04-12 22:00:00

On the Link Between Optimization and Polynomials, Part 4

While the most common accelerated methods like Polyak and Nesterov incorporate a momentum term, a little-known fact is that simple gradient descent (no momentum) can achieve the same rate with only a well-chosen sequence of step sizes. In this post we'll derive this method and, through simulations, discuss its practical …

NumFOCUS 2021-04-09 18:02:05

NumFOCUS Welcomes Tesco Technology to Corporate Sponsors

NumFOCUS is pleased to announce our new partnership with Tesco Technology. A long-time PyData event sponsor, Tesco Technology joined NumFOCUS as a Silver Corporate Sponsor in December 2020. “We are very excited to formalize our partnership with Tesco Technology,” said Leah Silen, NumFOCUS Executive Director. “Tesco Technology has partnered with NumFOCUS for the past several […]

The post NumFOCUS Welcomes Tesco Technology to Corporate Sponsors appeared first on NumFOCUS.

NumFOCUS 2021-04-08 21:14:55

Job Posting | Communications and Marketing Manager

Job Title: Communications and Marketing Manager Position Overview The primary role of the Communications & Marketing Manager is to manage the NumFOCUS brand by overseeing all outgoing communications between NumFOCUS and our stakeholders. You will serve the project communities by playing a key role in their event marketing management and assist with project promotional and […]

The post Job Posting | Communications and Marketing Manager appeared first on NumFOCUS.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 1

This is the first in a series of three blog posts about the basic use of Acoular. It explains some fundamental concepts and walks through a simple example. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 2

This is the second in a series of three blog posts about the basic use of Acoular. It assumes that you already have read the first post and continues by explaining some more concepts and additional methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 3

This is the third and final in a series of three blog posts about the basic use of Acoular. It assumes that you already have read the first two posts and continues by explaining additional concepts to be used with time domain methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources. To continue, we do the same setup as in Part 1. However, as we are setting out to do some signal processing in the time domain, we define only TimeSamples, MicGeom, RectGrid and SteeringVector objects but no PowerSpectra or BeamformerBase.

import acoular
ts = acoular.TimeSamples( name="three_sources.h5" )
mg = acoular.MicGeom( from_file="array_64.xml" )
rg = acoular.RectGrid( x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2, z=0.3, increment=0.01 )
st = acoular.SteeringVector( grid=rg, mics=mg (continued...)

Sparrow Computing 2021-03-22 23:54:00

PyTorch Tensor to NumPy Array and Back

You can easily convert a NumPy array to a PyTorch tensor and a PyTorch tensor to a NumPy array. This post explains how it works.

The post PyTorch Tensor to NumPy Array and Back appeared first on Sparrow Computing.

Sparrow Computing 2021-03-20 03:15:00

TorchVision Transforms: Image Preprocessing in PyTorch

TorchVision, a PyTorch computer vision package, has a great API for image pre-processing in its torchvision.transforms module. This post gives some basic usage examples, describes the API and shows you how to create and use custom image transforms.

The post TorchVision Transforms: Image Preprocessing in PyTorch appeared first on Sparrow Computing.

2021-03-01 23:00:00

On the Link Between Optimization and Polynomials, Part 3

I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.

A momentum optimizer *

While My MCMC Gently Samples 2021-02-23 15:00:00

Introducing PyMC Labs: Saving the World with Bayesian Modeling

After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models.

Usually, I don't hear how people are using PyMC3 -- they mostly show up on GitHub or Discourse when something isn't working right. So, hearing about all these …

Martin Fitzpatrick - python 2021-02-22 08:00:00

Using MicroPython and uploading libraries on Raspberry Pi Pico — Using rshell to upload custom code

MicroPython is an implementation of the Python 3 programming language, optimized to run on microcontrollers. It's one of the options available for programming your Raspberry Pi Pico and a nice friendly way to get started with microcontrollers.

MicroPython can be installed easily on your Pico, by following the instructions on the …

NumFOCUS 2021-02-10 19:54:10

Job Posting | Events and Digital Marketing Coordinator

Job Title: Events and Digital Marketing Coordinator Position Overview The primary role of the Events and Digital Marketing Coordinator is to support and assist the Events Manager and the Community Communications and Marketing Manager to advance one of NumFOCUS’s primary missions of educating and building the community of users and developers of open source scientific […]

The post Job Posting | Events and Digital Marketing Coordinator appeared first on NumFOCUS.

Living in an Ivory Basement 2021-02-01 23:00:00

Transition your Python project to use pyproject.toml and setup.cfg! (An example.)

Updating old Python packages, in this year of the PSF 2021!

Martin Fitzpatrick - python 2021-01-28 14:00:00

Writing a SAM Coupé SCREEN$ Converter in Python — Interrupt optimizing image converter

The SAM Coupé was a British 8 bit home computer that was pitched as a successor to the ZX Spectrum, featuring improved graphics and sound and higher processor speed.

The SAM Coupé's high-color MODE4 could manage 256x192 resolution graphics, with 16 colors from a choice of 128. Each pixel can …

Living in an Ivory Basement 2021-01-24 23:00:00

A snakemake hack for checkpoints

snakemake checkpoints r awesome

Martin Fitzpatrick - python 2021-01-21 07:00:00

Squeezing Space Invaders onto the BBC micro:bit's 25 pixels — MicroPython retro game in just 25 pixels

How much game can you fit into 25 pixels? Quite a bit it turns out.

This is a mini clone of the arcade classic Space Invaders for the BBC micro:bit microcomputer. Using the accelerometer and two buttons for input, you can beat off wave after wave of aliens that advance …

ListenData 2021-01-06 10:35:00

Run SAS in Python without Installation

In the past few years, Python has gained huge popularity as a programming language in the data science world. Many banks and pharma organisations have started using Python, and some of them are in a transition stage, migrating their SAS syntax libraries to Python. Many big organisations have been using SAS since the early 2000s and have developed hundreds of SAS programs for tasks ranging from data extraction to model building and validation. Hence it is a marathon task to migrate SAS code to any other programming language. Migration can only be done in phases, so that day-to-day tasks are not hit by the development and testing of Python code. Since Python is open source, maintaining the existing code sometimes becomes difficult. Some SAS procedures are very robust and powerful, and their Python alternatives are either not yet implemented or doable but not straightforward for an average developer or analyst.

Do you wish

Filipe Saraiva's blog 2020-12-30 12:43:56

Dinastia X/Potências de X (House of X/Powers of X)

No team of heroes is as dear to me as the X-Men. Around the end of the 90s I started collecting and kept at it for a few years, but then came the fateful price increase with the Super-Heróis Premium line, which ended up discouraging me from buying. Since then, I have followed them sporadically, reading news about them, buying one or another… Continue reading » Dinastia X/Potências de X

ListenData 2020-12-21 14:50:00

Wish Christmas with Python and R

This post is dedicated to all Python and R programming lovers. Flaunt your knowledge in your peer group with the following programs. As a data science professional, you want your Christmas wish to be special. If you study the code, you may also learn one or two tricks that you can use later in your daily tasks.

Method 1 : Run the following program and see what I mean

R Code

paste(rep(intToUtf8(acos(exp(0)/2)*180/pi+2^4+3*2),2), collapse = intToUtf8(0)),
LETTERS[5^(3-1)], intToUtf8(atan(1/sqrt(3))*180/pi+2),
sep = intToUtf8(0)

Python Code

import math
import datetime

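# NOTE: the text before each ", <month>, 1).strftime('%B')" fragment was lost in extraction.
# The datetime.date(...) calls below are a reconstruction, and the year 2020 is an
# assumption (any valid year gives the same month names).
(chr(int(math.acos(math.log(1))*180/math.pi-13)) \
+ datetime.date(2020, 2, 1).strftime('%B')[1] \
+ 2 * datetime.date(2020, 2, 1).strftime('%B')[3] \
+ datetime.date(2020, 2, 1).strftime('%B')[7] \
+ chr(int(math.atan(1/math.sqrt(3))*180/math.pi+2)) \
+ datetime.date(2020, 10, 1).strftime('%B')[1] \
+ chr(int(math.acos(math.log(1))*180/math.pi-18)) \
+ datetime.date(2020, 4, 1).strftime('%B')[2:4] \
+ chr(int(math.acos(math.exp(0)/2)*180/math.pi+2**4+3*2+1)) \
+ chr(int(math.acos(math.exp(0)/2)*180/math.pi+2**4+2*4)) \
+ chr(int(math.acos(math.log(1))*180/math.pi-13)) \
+ "{:c}".format(97) \
+ chr(int(math.atan(1/math.sqrt(3))*180/math.pi*3-7))).upper()

Method 2 : Audio Wish for Christmas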

Turn on computer speakers before running the code.

R Code

christmas_file <- tempfile()
download.file("", christmas_file, mode = "wb")
(continued...)

2020-12-20 23:00:00

On the Link Between Optimization and Polynomials, Part 2

We can tighten the analysis of gradient descent with momentum through a combination of Chebyshev polynomials of the first and second kind. Following this connection, we'll derive one of the most iconic methods in optimization: Polyak momentum.
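As a rough illustration of the method named above, the sketch below runs Polyak (heavy-ball) momentum on a small diagonal quadratic. It is not the post's derivation: the function name `heavy_ball`, the test problem, and the iteration count are assumptions, while the step size and momentum are the classical choices for a quadratic whose curvatures lie between mu and L.

```python
import math

def heavy_ball(h, x0, steps):
    """Minimize f(x) = 0.5 * sum(h[i] * x[i]**2) with Polyak (heavy-ball) momentum."""
    mu, L = min(h), max(h)  # smallest and largest curvature
    sq_mu, sq_L = math.sqrt(mu), math.sqrt(L)
    alpha = 4.0 / (sq_L + sq_mu) ** 2                 # classical step size
    beta = ((sq_L - sq_mu) / (sq_L + sq_mu)) ** 2     # classical momentum
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        grad = [hi * xi for hi, xi in zip(h, x)]
        # Gradient step plus a momentum term proportional to the previous displacement.
        x_next = [xi - alpha * gi + beta * (xi - xp)
                  for xi, gi, xp in zip(x, grad, x_prev)]
        x_prev, x = x, x_next
    return x

# Hypothetical problem: curvatures 1 and 10, starting from (1, 1).
x_final = heavy_ball([1.0, 10.0], [1.0, 1.0], steps=100)
print(x_final)
```

The minimizer of this quadratic is the origin, so the distance of `x_final` from zero gives a direct read on how fast the iterates converge.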

ListenData 2020-12-19 15:59:00

How to use variable in a query in pandas

Suppose you want to reference a variable in a query with the pandas package in Python. This seems like a straightforward task, but it can sometimes be daunting. Let's discuss it with examples in the article below.

Let's create a sample dataframe with 3 columns and 4 rows. This dataframe is used for demonstration purposes.

import pandas as pd
df = pd.DataFrame({"col1" : range(1,5),
"col2" : ['A A','B B','A A','B B'],
"col3" : ['A A','A A','B B','B B']
})

Filter a value A A in column col2

To reference a variable in a query, prefix its name with @.
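A minimal sketch of that filter, first with the value hard-coded and then taken from a variable through `@`. The variable name `myval` and the result names are hypothetical; `DataFrame.query` and its `@` prefix for local variables are standard pandas features.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": range(1, 5),
    "col2": ["A A", "B B", "A A", "B B"],
    "col3": ["A A", "A A", "B B", "B B"],
})

# Value written directly inside the query string.
fixed = df.query("col2 == 'A A'")

# Same filter, with the value read from a Python variable via @.
myval = "A A"  # hypothetical variable name
from_var = df.query("col2 == @myval")

print(from_var)
```

Using `@` avoids building the query string by hand, which also sidesteps quoting problems when the value itself contains spaces or quotes.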
NumFOCUS 2020-12-18 21:21:54

NumFOCUS hires Open Source Developer Advocate!

NumFOCUS is pleased to announce that Arliss Collins has been hired as our organization’s first Open Source Developer Advocate. Founded in 2012, NumFOCUS has finally grown beyond just providing non-technical needs for our 40+ sponsored projects! As our first technical hire, Arliss will work to help understand our projects from a technical perspective and […]

The post NumFOCUS hires Open Source Developer Advocate! appeared first on NumFOCUS.

NumFOCUS 2020-12-11 19:37:25

A Pivotal Time in NumFOCUS’s Project Aimed DEI Efforts

NumFOCUS is pleased to announce the launch of our Contributor Diversification & Retention Research Project funded by a grant from the Gordon and Betty Moore Foundation.  “We were eager to support NumFOCUS’s diversity initiative because it aims to get to the heart of what is preventing greater participation in data science. We are hopeful that […]

The post A Pivotal Time in NumFOCUS’s Project Aimed DEI Efforts appeared first on NumFOCUS.

NumFOCUS 2020-11-23 14:44:42

Anaconda Announces Multi-Year Partnership with NumFOCUS

A key stakeholder in the open source scientific computing ecosystem has further formalized their long-standing partnership with NumFOCUS. Anaconda, the Austin, Texas-based software development and consulting company which provides global distribution of Python and R software packages, last month introduced their Anaconda Dividend Program. Through this initiative, Anaconda plans to direct a portion of their […]

The post Anaconda Announces Multi-Year Partnership with NumFOCUS appeared first on NumFOCUS.

Pierre de Buyl's homepage - scipy 2020-11-23 10:00:00

What's in a model

During the coronavirus epidemic, the Belgian federal group of scientific experts came up regularly in the official communication of the government. How can scientists understand the spread of an epidemic? By using a model: a mathematical description of a phenomenon. By varying the parameters of the model, one can test …

NumFOCUS 2020-11-18 18:36:55

NumFOCUS Receives Support from Heising-Simons

NumFOCUS is grateful to announce that we received a grant award of $50,000 in October from the Heising-Simons Foundation. This generous grant funding will provide general support resources to NumFOCUS and will benefit all of our Sponsored and Affiliated Projects as well as our organization’s several programs and initiatives. “This grant award from Heising-Simons will […]

The post NumFOCUS Receives Support from Heising-Simons appeared first on NumFOCUS.

Filipe Saraiva's blog 2020-11-05 14:50:03

A Chat with Vivi Reis about Technology and Politics

Tonight (November 5) at 8 pm I will talk with Vivi Reis, PSOL candidate for city council in Belém. In the chat we will focus largely on topics that intertwine technology and politics. Among the points, we will cover the Escritório de Dados (Data Office), data and public policy, free software in public administration, connectivity in Belém, digital inclusion, citizen apps,… Continue reading » A Chat with Vivi Reis about Technology and Politics