Planet SciPy 2023-05-31 07:34:20

Building ML Platform in Retail and eCommerce

Getting machine learning to solve some of the hardest problems in an organization is great. And eCommerce companies have a ton of use cases where ML can help. The problem is, with more ML models and systems in production, you need to set up more infrastructure to reliably manage everything. And because of that, many…
ListenData 2023-05-26 09:38:00

Complete Guide to Massively Multilingual Speech (MMS) Model

In this article we have covered everything about the latest multilingual speech model from the basics of how it works to the step-by-step implementation of the model in Python.

Meta, the company that owns Facebook, released a new AI model called Massively Multilingual Speech (MMS) that can convert text to speech and speech to text in over 1,100 languages. It is available for free. It will not only help academicians and researchers across the world but also language preservationists or activists to document and preserve endangered languages to prevent their extinction.

MMS is trained on a large dataset of text and audio in over 1,100 languages. Another advantage of the model is that the audio it generates sounds very natural, like human speech. It can also identify more than 4,000 spoken languages.

This post appeared first on ListenData

2023-05-17 13:20:24

How to Build ETL Data Pipeline in ML

From data processing to quick insights, robust pipelines are a must for any ML system. Often the Data Team, comprising Data and ML Engineers, needs to build this infrastructure, and this experience can be painful. However, efficient use of ETL pipelines in ML can help make their life much easier. This article explores the importance…

2023-05-10 13:56:49

How to Save Trained Model in Python

When working on real-world machine learning (ML) use cases, finding the best algorithm/model is not the end of your responsibilities. It is crucial to save, store, and package these models for their future use and deployment to production. These practices are needed for a number of reasons: To reiterate, while saving and storing ML models…

2023-05-09 14:21:55

How to Build an End-To-End ML Pipeline

One of the most prevalent complaints we hear from ML engineers in the community is how costly and error-prone it is to manually go through the ML workflow of building and deploying models. They run scripts manually to preprocess their training data, rerun the deployment scripts, manually tune their models, and spend their working hours…
Sparrow Computing 2023-04-26 09:00:00

The Importance of High-Quality Labeled Data

The key to unlocking the power of machine learning (ML) lies in having high-quality labeled data. In this email, we’ll explore the significance of labeled data, its impact on the performance of ML models, and how you can capitalize on this natural resource of the modern age to drive your ... Read more

The post The Importance of High-Quality Labeled Data appeared first on Sparrow Computing.

Sparrow Computing 2023-04-25 21:20:23

Predictive Maintenance at General Electric

As you think through the ways machine learning (ML) can be used to accelerate your business, it can be helpful to see how other companies have done it. Today, I want to share an example of how General Electric (GE) harnessed the power of ML to transform a large part ... Read more

The post Predictive Maintenance at General Electric appeared first on Sparrow Computing.

2023-04-20 09:09:45

Building and Deploying CV Models: Lessons Learned From Computer Vision Engineer

With over 3 years of experience in designing, building, and deploying computer vision (CV) models, I’ve realized people don’t focus enough on crucial aspects of building and deploying such complex systems. In this blog post, I’ll share my own experiences and the hard-won insights I’ve gained from designing, building, and deploying cutting-edge CV models across…
ListenData 2023-04-19 12:32:00

AutoGPT Explained: Everything You Need To Know

In this post we cover AutoGPT in detail. By the end of this tutorial, you will not only understand how it works but also be able to run it on your system. Auto-GPT has gained significant popularity in the media and has become one of the most talked-about topics on social media platforms after ChatGPT. It has captured the attention not only of people in the Artificial Intelligence community but also of people from other backgrounds. Media outlets across countries have covered it and reported how it can automate tasks ranging from simple to complex.

Table of Contents

What is AutoGPT?

AutoGPT is an experimental open-source project built on the latest ChatGPT model, i.e. GPT-4. It is not limited to ChatGPT, as it can also search the web and try to find information on the internet. When a client gives us a project with instructions on what to do, we, as analysts, perform tasks to fulfill the project requirements.

(continued...)

2023-04-17 16:11:10

How to Build an Experiment Tracking Tool [Learnings From Engineers Behind Neptune]

As an MLOps engineer on your team, you are often tasked with improving the workflow of your data scientists by adding capabilities to your ML platform or by building standalone tools for them to use.  Experiment tracking is one such capability. And since you are reading this article, the data scientists you support have probably…
ListenData 2023-04-09 08:58:00

Open Source GPT-4 Models Made Easy

In this post we explain how open source GPT-4 models work and how you can use them as an alternative to the commercial OpenAI GPT-4 solution. New open source large language models (LLMs) emerge every day and the list keeps growing. We cover two models: the GPT-4 version of Alpaca, and Vicuna. This tutorial covers how the models work as well as their implementation in Python.

Table of Contents

Vicuna Model Introduction : Vicuna Model

Vicuna was the first publicly available open-source model whose output is comparable to GPT-4. It was fine-tuned from Meta's LLaMA 13B model on a dataset of conversations collected from ShareGPT, a website where people share their ChatGPT conversations with others.

Important Note : The Vicuna Model was primarily trained on the GPT-3.5 dataset because most of the conversations on ShareGPT during the model's development were based on GPT-3.5. But the model was evaluated based on
Living in an Ivory Basement 2023-04-06 22:00:00

snakemake for doing bioinformatics - inputs and outputs and more!

Slithering your way into bioinformatics with snakemake - inputs and outputs and more!

2023-04-05 09:05:22

ML Model Packaging [The Ultimate Guide]

Have you ever spent weeks or months building a machine learning model, only to later find out that deploying it into a production environment is complicated and time-consuming? Or have you struggled to manage multiple versions of a model and keep track of all the dependencies and configurations required for deployment? If you’re nodding your…
Sparrow Computing 2023-03-31 17:28:38

How to Label Data for Machine Learning

Machine learning has revolutionized the world of technology, playing a crucial role in various applications, from self-driving cars and facial recognition systems to language translation and sentiment analysis. The success of machine learning models largely depends on the quality and quantity of data they are trained on. In particular, labeled ... Read more

The post How to Label Data for Machine Learning appeared first on Sparrow Computing.

2023-03-31 08:02:17

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

In this second installment of the series “Real-world MLOps Examples,” Paweł Pęczek, Machine Learning Engineer at Brainly, will walk you through the end-to-end Machine Learning Operations (MLOps) process in the Visual Search team at Brainly. And because it takes more than technologies and processes to succeed with MLOps, he will also share details on:  Enjoy…
ListenData 2023-03-30 08:01:00

14 Open Source Alternatives to ChatGPT - Build Your Own Clone for Free

In this article we will explain how Open Source ChatGPT alternatives work and how you can run them to build your own ChatGPT clone for free. We will introduce you to fourteen powerful open source alternatives to ChatGPT, such as GPT4All, Dolly 2, Vicuna, Alpaca GPT-4. We have provided Python code for each of these models so you can run them with ease in Python. By the end of this article you will have a good understanding of these models and will be able to compare and use them according to your requirements.

ChatGPT is not open source. It has had two recent popular releases: GPT-3.5 and GPT-4. GPT-4 has major improvements over GPT-3.5 and is more accurate in producing responses. ChatGPT does not allow you to view or modify its source code, as it is not publicly available. Hence there is a need for models that are open source and available for free. By using these open source

Sparrow Computing 2023-03-29 20:52:44

Understanding the Data Science Process for Entrepreneurs

As an entrepreneur looking to harness the power of machine learning (ML) in your business, understanding the data science process is crucial. This process can be broken down into three main steps: The goal is to move through these stages as quickly as possible so that you can gather feedback ... Read more

The post Understanding the Data Science Process for Entrepreneurs appeared first on Sparrow Computing.

2023-03-23 09:24:59

Deploying Large NLP Models: Infrastructure Cost Optimization

NLP models in commercial applications such as text generation systems have attracted great interest among users. These models have achieved groundbreaking results in many NLP tasks such as question answering, summarization, language translation, classification, and paraphrasing. Models such as ChatGPT, Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) are predominantly very…

2023-03-21 11:29:52

Building a Machine Learning Platform [Definitive Guide]

Moving across the typical machine learning lifecycle can be a nightmare. From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production: it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers…

2023-03-20 14:07:01

Managing Dataset Versions in Long-Term ML Projects

A long-term ML project involves developing and sustaining applications or systems that leverage machine learning models, algorithms, and techniques. Because of the life span of these apps and systems, the associated ML models need to be constantly updated, redeployed, and maintained, which means that they require proper dataset version management. An example of a…

2023-03-15 18:38:39

How to Build a CI/CD MLOps Pipeline [Case Study]

Based on the McKinsey survey, 56% of orgs today are using machine learning in at least one business function. It’s clear that the need for efficient and effective MLOps and CI/CD practices is becoming increasingly vital.  This article is a real-life study of building a CI/CD MLOps pipeline. We’ll delve into the MLOps practices and strategies…
ListenData 2023-03-12 07:26:00

Complete Guide to Visual ChatGPT

In this post, we will talk about how to run Visual ChatGPT in Python with Google Colab. ChatGPT has garnered huge popularity recently due to its human-like responses. As of now, it only provides responses in text format, which means it cannot process, generate or edit images. Microsoft recently released a solution that adds image handling. Now you can ask ChatGPT to generate or edit an image for you.

Demo of Visual ChatGPT

In the image below, you can see the final output of Visual ChatGPT and what it looks like.

This post appeared first on ListenData
Living in an Ivory Basement 2023-03-02 23:00:00

snakemake for doing bioinformatics - using wildcards to generalize your rules

Slithering your way into bioinformatics with snakemake, wildcard version

Sparrow Computing 2023-03-02 16:32:08

Saving Utility Companies Years with Computer Vision

How do utility companies monitor thousands of miles of electrical wire to find small imperfections that threaten the entire system? For the entire history of electrical infrastructure, the only answer has been ‘very slowly.’ Now, Sparrow’s computer vision capabilities, combined with Fast Forward’s thermal imaging system, can accomplish what used ... Read more

The post Saving Utility Companies Years with Computer Vision appeared first on Sparrow Computing.

Living in an Ivory Basement 2023-01-22 23:00:00

snakemake for doing bioinformatics - a beginner's guide (part 2)

Slithering your way into bioinformatics with snakemake, round 2.

Living in an Ivory Basement 2023-01-13 23:00:00

snakemake for doing bioinformatics - a beginner's guide (part 1)

Slithering your way into bioinformatics with snakemake

Living in an Ivory Basement 2023-01-07 23:00:00

sourmash has a plugin interface!

Enabling plugins in sourmash, for less directed & more incoherent progress!

Filipe Saraiva's blog 2022-12-15 01:13:41

Human Obsolescence in the Telenovela

I spent the day at work playing with ChatGPT, the artificial intelligence for conversations. We had surreal and outlandish dialogues: I asked it what Latin America would be like had it been colonized by England, and also what the relationship is between The Lord of the Rings and Game of Thrones. In another, I asked it to write a fictional dialogue between… Continue reading »Human Obsolescence in the Telenovela
Sparrow Computing 2022-12-14 17:55:08

Speed Trap

Overview This post is going to showcase the development of a vehicle speed detector using Sparrow Computing’s open-source libraries and PyTorch Lightning. The exciting news here is that we could make this speed detector for any traffic feed without prior knowledge about the site (no calibration required), or specialized imaging ... Read more

The post Speed Trap appeared first on Sparrow Computing.

ListenData 2022-12-09 08:31:00

ChatGPT-4 Is a Smart Analyst, Unlike GPT-3.5

ChatGPT has been trending on social media platforms. It crossed one million users in just a week. For those who haven't heard of ChatGPT, it's a large language model trained by OpenAI. In simple words, it's a chatbot that answers your questions, and the responses it provides may sound human-like. It's an impressive machine learning solution. With the release of GPT-4 we can rely on it over Google search for learning about any topic.

Update: I updated this article with reviews on GPT-4.

Why ChatGPT-3.5 Isn't Smart Enough, but GPT-4 Is

You can't trust ChatGPT-3.5 when preparing for any certification or exam. It's a big NO if you think you can refer to ChatGPT-3.5 to answer questions in a telephone interview round. Yes, I know it's cheating even to use Google for this, but I wanted to give a WARNING, as many people do this and many social media influencers have posted on how to leverage ChatGPT-3.5 for cracking

Spyder Blog 2022-11-30 00:00:00

Improvements to the Spyder IDE installation experience

Juan Sebastian Bautista, C.A.M. Gerlach and Carlos Cordoba also contributed to this post.

Spyder 5.4.0 was released recently, featuring some major enhancements to its Windows and macOS standalone installers. You'll now get more detailed feedback when new versions are available, and you can download and start the update to them from right within Spyder, instead of having to install them manually. In this post, we'll go over how these new update features work and how you can start using them!

Before proceeding, we want to acknowledge that this work was made possible by a Small Development Grant awarded to Spyder by NumFOCUS, which has enabled us to hire a new developer (Juan Sebastian Bautista Rojas) to be in charge of all the implementation details.

Before these improvements, Spyder already had a mechanism to detect more recent versions, but that functionality was very simple. There was a pop-up dialog warning that a new version was available, but users had to

scikit-learn Blog 2022-11-30 00:00:00

Interview with Meekail Zain, scikit-learn Team Member

Author: Reshama Shaikh, Meekail Zain
Spyder Blog 2022-11-18 12:00:00

Introducing the Spyder-Watchlist plugin

Spyder's Variable Explorer is a great tool which aids the development and debugging of Python code by displaying all variables from the current scope. One thing the Variable Explorer is missing is the ability to display the value of arbitrary, user-definable expressions while debugging. For example, it might be useful to see the value of a specific attribute of an object, or the value of an array at some index. Such a feature is known as a "watchlist" or "watches" in other Integrated Development Environments (IDEs). This blog post introduces the Watchlist plugin developed for Spyder.


The watchlist consists of a user-definable list of expressions. They are evaluated after each debugger step, and the result of the evaluation is displayed as a string. This means that value = str(eval(expression)) is performed behind the scenes, and the result is shown in the plugin. The watchlist is a very powerful tool, but this comes at a cost: Any side effect of an expression will affect the execution environment.
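The evaluation step described above can be sketched in a few lines of Python. This is only an illustration of the str(eval(expression)) idea, not the plugin's code: the function name evaluate_watchlist, the scope dictionary, and the error handling are assumptions made for the example.

```python
def evaluate_watchlist(expressions, namespace):
    """Evaluate each watch expression in `namespace` and return its string form."""
    results = {}
    for expression in expressions:
        try:
            # Same idea as value = str(eval(expression)) in the post.
            results[expression] = str(eval(expression, {}, namespace))
        except Exception as exc:
            # Assumption: a failing watch is reported instead of aborting the others.
            results[expression] = f"error: {type(exc).__name__}: {exc}"
    return results


# Hypothetical variables visible at the current debugger step.
scope = {"data": [3, 1, 2], "config": {"lr": 0.01}}

watches = evaluate_watchlist(
    ["data[0]", "len(data)", "config['lr']", "missing"], scope
)

# A watch with a side effect changes the program state it is observing:
evaluate_watchlist(["data.pop()"], scope)
```

After the last call, scope["data"] has lost its final element, which is exactly the cost mentioned above: a watch expression runs in the live execution environment.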

Expressions can be

Filipe Saraiva's blog 2022-11-15 02:42:48

Why Did We Abandon Blogs?

Twitter's writing interface. These days we are watching Elon Musk destroy Twitter. It is expected that, in this dynamic, the social network will lose users and relevance over time, if it doesn't blow up all at once, since its new owner is even talking about bankruptcy. This is not the first time that a… Continue reading »Why Did We Abandon Blogs?
scikit-learn Blog 2022-11-08 00:00:00

Pandas DataFrame Output for sklearn Transformers

Author: Sangam SwadiK
Keep the gradient flowing 2022-10-14 22:00:00

The Russian Roulette: An Unbiased Estimator of the Limit

The idea for what was later called Monte Carlo method occurred to me when I was playing solitaire during my illness.

Stanislaw Ulam, Adventures of a Mathematician

The Russian Roulette offers a simple way to construct an unbiased estimator for the limit of a sequence. It allows for example to …
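As a rough illustration of the idea, here is one common way to build such an estimator; it is a sketch under stated assumptions, not necessarily the construction used in the post. The limit is written as x_0 plus the sum of the increments x_k - x_{k-1}, the sum is truncated at a random index, and each kept increment is divided by the probability of reaching it. The geometric stopping rule and the example sequence 1 - 0.5**k are choices made for this example.

```python
import random


def russian_roulette(sequence, continue_prob, rng):
    """Draw one unbiased sample of lim_k sequence(k) using random truncation.

    Term k is reached with probability continue_prob**k, so dividing the
    increment by that probability keeps the expectation equal to the limit.
    """
    estimate = sequence(0)
    k = 0
    survival = 1.0  # probability of reaching the current term
    while rng.random() < continue_prob:
        k += 1
        survival *= continue_prob
        estimate += (sequence(k) - sequence(k - 1)) / survival
    return estimate


def example_sequence(k):
    """A sequence converging to 1.0 (illustrative choice)."""
    return 1.0 - 0.5 ** k


rng = random.Random(0)
samples = [russian_roulette(example_sequence, 0.9, rng) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
```

Each sample only evaluates finitely many terms, yet the average of many samples approaches the limit. The continuation probability has to be large enough relative to how fast the increments shrink, otherwise the variance blows up.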

scikit-learn Blog 2022-10-13 00:00:00

scikit-learn and Hugging Face join forces

Author: Lysandre Debut, François Goupil
scikit-learn Blog 2022-09-29 00:00:00

scikit-learn Sprint in Salta, Argentina

Author: Juan Martín Loyola
Keep the gradient flowing 2022-08-25 22:00:00

Notes on the Frank-Wolfe Algorithm, Part III: backtracking line-search

Backtracking step-size strategies (also known as adaptive step-size or approximate line-search) that set the step-size based on a sufficient decrease condition are the standard way to set the step-size on gradient descent and quasi-Newton methods. However, these techniques are much less common for Frank-Wolfe-like algorithms. In this blog post I …

Spyder Blog 2022-07-25 12:00:00

New 2022 roadmap and grant funding

For the last couple of months, the Spyder team has been working on defining a new roadmap and submitting grant proposals to fund more features and improvements. We are pleased to announce our roadmap for the rest of 2022, and that two proposals were funded!

The roadmap

Considering the importance of sharing a clear perspective of where the Spyder project is going and where we will be focusing our efforts over the coming months, the team has created an initial roadmap for the rest of 2022. We prioritized the highlighted features and enhancements based on input from issues, face-to-face and virtual discussions, Stack Overflow, social media and other feedback, to try to best capture the interests of our users and community.

The proposals

To help make our roadmap achievable, we wrote and submitted proposals to several different venues and organizations in the last couple of months. While we have yet to hear back from some of them, two have already been funded!

The first was for the

ListenData 2022-07-11 16:05:00

Pollution in India : Real-time AQI Data

Air pollution has become a serious problem in recent years across the world. The effects of air pollution are devastating, and they are not limited to humans but extend to animals and plants as well. It also contributes to global warming, which is essentially the increase in air and ocean temperatures around the world.

Indian cities have been topping the list of polluted cities. To tackle air pollution, the first step is to track it in real time, so that people can be alerted to avoid outdoor activities when pollution is high. This post explains how you can fetch the real-time Air Quality Index (AQI) of Indian cities using Python and R code, so both Python and R programmers can pull pollution data.

You can download the dataset which contains static information about Indian states, cities and AQI stations. Variables stored in this dataset will be used further to fetch real-time data.

Gaël Varoquaux - programming 2022-07-09 22:00:00

My Mayavi story: discovering open source communities

The Mayavi Python software, and my personal history: A thread on Python and scipy ecosystems, building open source codebase, and meeting really cool and friendly people

I am writing today as a goodbye to the project: I used to be one of the core contributors and maintainers but have been …

ListenData 2022-06-30 14:04:00

Pointwise mutual information (PMI) in NLP

Natural Language Processing (NLP) has gained wide acceptance recently: there are many live projects running, and it is no longer limited to academia. Use cases of NLP can be seen across industries, such as understanding customers' issues, predicting the next word a user is about to type on the keyboard, and automatic text summarization. Researchers across the world have trained NLP models in several human languages, such as English, Spanish, French, and Mandarin, so that the benefits of NLP reach every society. In this post we will talk about one of the most useful NLP metrics, pointwise mutual information (PMI), which identifies words that tend to go together, along with its implementation in Python and R.

Table of Contents

What is Pointwise mutual information?

PMI helps us find related words. In other words, it measures how much more likely two words are to co-occur than we would expect by chance. For example, the phrase "Data Science" has a specific meaning when these
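To make the definition concrete, PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ). The sketch below computes it for adjacent word pairs using only the standard library. The tiny corpus, the whitespace tokenizer, and the function name pmi_scores are assumptions made for the example, not the post's implementation.

```python
import math
from collections import Counter


def pmi_scores(sentences):
    """Return PMI for every adjacent word pair found in `sentences`."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / total_pairs
        p_x = word_counts[x] / total_words
        p_y = word_counts[y] / total_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy corpus.
corpus = [
    "data science is fun",
    "data science needs data",
    "science is hard",
]
scores = pmi_scores(corpus)
```

A positive score, as for ("data", "science") here, means the pair appears together more often than independent words would. On real corpora, rare pairs tend to get inflated scores, so a minimum count threshold is a common addition.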

Acoular 2022-06-24 05:00:00

How to import your data into Acoular

Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array which is stored in an HDF5 file. This blog post explains how to convert data available in other formats into this file format. As examples for other file formats we will use both .csv (comma separated text files) and .mat (Matlab files).
Keep the gradient flowing 2022-05-26 22:00:00

On the Link Between Optimization and Polynomials, Part 5

Six: All of this has happened before.
Baltar: But the question remains, does all of this have to happen again?
Six: This time I bet no.
Baltar: You know, I've never known you to play the optimist. Why the change of heart?
Six: Mathematics. Law of averages. Let a complex …

scikit-learn Blog 2022-05-22 00:00:00

Interview with Norbert Preining, scikit-learn Team Member

Author: Reshama Shaikh, Norbert Preining
ListenData 2022-05-06 11:06:00

Only size-1 arrays can be converted to Python scalars

NumPy is one of the most used modules in Python, for tasks ranging from creating arrays to mathematical and statistical calculations. NumPy also brings efficiency to Python programming. While using NumPy you may encounter the error TypeError: only size-1 arrays can be converted to Python scalars. It is one of the most frequently appearing errors, and solving it can be daunting.

Meaning: Only Size 1 Arrays Can Be Converted To Python Scalars Error

This error generally appears when Python expects a single value but you pass an array that contains multiple values. For example, you want to calculate the exponential of an array, but the exponential function was designed for a scalar variable (a single value). When you pass a NumPy array to that function, it raises this error. The error stops your code from processing further and avoids unexpected output from the (continued...)
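A minimal reproduction of the situation described above, assuming NumPy is installed. math.exp is used here as the scalar-only function; the post's own example may differ. The exact wording of the message can vary between NumPy versions, so the sketch stores it rather than hard-coding it.

```python
import math

import numpy as np

values = np.array([1.0, 2.0, 3.0])

# math.exp works on a single number, so a multi-element array is rejected.
try:
    math.exp(values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Fix 1: use the NumPy function, which operates element by element.
vectorized = np.exp(values)

# Fix 2: apply the scalar function to each element explicitly.
elementwise = np.array([math.exp(v) for v in values])
```

Both fixes give the same numbers. The NumPy version is usually the better choice for large arrays because the loop runs in compiled code.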
scikit-learn Blog 2022-05-04 00:00:00

Interview with Lucy Liu, scikit-learn Team Member

Author: Reshama Shaikh, Lucy Liu
Living in an Ivory Basement 2022-04-21 22:00:00

Storing 64-bit unsigned integers in SQLite databases, for fun and profit

Storing unsigned longs in SQLite is possible, and can be fast.
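One possible approach, sketched here as an assumption rather than the post's method: SQLite integers are signed 64-bit, and Python's sqlite3 module raises OverflowError for larger ints, so values at or above 2**63 can be shifted into the negative range on the way in and shifted back on the way out. The table name, column name, and helper names are hypothetical.

```python
import sqlite3

TWO_63 = 1 << 63
TWO_64 = 1 << 64


def to_signed(value):
    """Map an unsigned 64-bit integer onto SQLite's signed 64-bit range."""
    if not 0 <= value < TWO_64:
        raise ValueError("value does not fit in 64 unsigned bits")
    return value - TWO_64 if value >= TWO_63 else value


def to_unsigned(value):
    """Invert to_signed()."""
    return value + TWO_64 if value < 0 else value


hashes = [0, 42, TWO_63 - 1, TWO_63, TWO_64 - 1]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (hashval INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO hashes (hashval) VALUES (?)",
    [(to_signed(h),) for h in hashes],
)
rows = conn.execute("SELECT hashval FROM hashes").fetchall()
restored = sorted(to_unsigned(row[0]) for row in rows)
conn.close()
```

Equality lookups work unchanged with this mapping, as long as the query value is converted with to_signed() too. Ordering comparisons in SQL do not match the unsigned order, since large values are stored as negatives.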

scikit-learn Blog 2022-03-21 00:00:00

Behind the Scenes of Data Umbrella scikit-learn Open Source Sprints

Author: Reshama Shaikh, Angela Okune
Living in an Ivory Basement 2022-03-04 23:00:00

The First Common Fund Data Ecosystem Hackathon

We ran a successful pilot hackathon, and we will run a second one soon!

Filipe Saraiva's blog 2022-02-06 14:31:39

Master's in Computer Science 2022: Metaheuristics

We still have a few openings for the Master's in Computer Science at UFPA, Belém. Those interested should check the submission instructions on the program's selection page. Since joining the program I have been advising students in various research projects on computational intelligence applied to smart grid problems. We have already had work on multi-agent systems… Continue reading »Master's in Computer Science 2022: Metaheuristics
Martin Fitzpatrick - python 2022-01-26 11:00:00

DiffCast: Hands-free Python Screencast Creator — Create reproducible programming screencasts without typos or edits

Programming screencasts are a popular way to teach programming and demo tools. Typically people will open up their favorite editor and record themselves tapping away. But this has a few problems. A good setup for coding isn't necessarily a good setup for video -- with text too small, a window too …

Keep the gradient flowing 2022-01-09 23:00:00

Optimization Nuggets: Implicit Bias of Gradient-based Methods

When an optimization problem has multiple global minima, different algorithms can find different solutions, a phenomenon often referred to as the implicit bias of optimization algorithms. In this post we'll characterize the implicit bias of gradient-based methods on a class of regression problems that includes linear least squares and Huber …
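A small illustration of the phenomenon on the simplest case, linear least squares with more unknowns than equations. This is a sketch for intuition only: the single-equation problem and the step size are assumptions chosen so the arithmetic is easy to follow. Every point on the line x1 + 2*x2 = 5 is a global minimum, and gradient descent picks one that depends on where it starts.

```python
def gradient_descent(a, b, x0, step, n_iter):
    """Minimize 0.5 * (a . x - b)**2, one equation with several unknowns."""
    x = list(x0)
    for _ in range(n_iter):
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        # The gradient is residual * a, so iterates only move along a.
        x = [xi - step * residual * ai for ai, xi in zip(a, x)]
    return x


a, b = [1.0, 2.0], 5.0

# Started at zero, gradient descent reaches the minimum-norm solution (1, 2).
from_zero = gradient_descent(a, b, [0.0, 0.0], step=0.1, n_iter=200)

# Started elsewhere, it reaches a different global minimum, (3.4, 0.8).
from_other = gradient_descent(a, b, [3.0, 0.0], step=0.1, n_iter=200)
```

Both results fit the equation exactly. The zero initialization yields the solution with the smallest Euclidean norm because the iterates never leave the span of a.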

Keep the gradient flowing 2021-12-14 23:00:00

Optimization Nuggets: Exponential Convergence of SGD

This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution.
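The behavior can be observed numerically on a toy problem. The sketch below is an illustration, not the post's proof: the one-dimensional objective, the target values, and the step size are assumptions. With a constant step size, the distance to the solution first shrinks geometrically and then fluctuates inside a neighborhood whose size is set by the gradient noise and the step.

```python
import random


def sgd_constant_step(targets, x0, step, n_steps, rng):
    """SGD on f(x) = mean_i 0.5 * (x - targets[i])**2 with a constant step size.

    Returns the distance to the minimizer (the mean of targets) after each step.
    """
    solution = sum(targets) / len(targets)
    x = x0
    distances = []
    for _ in range(n_steps):
        target = rng.choice(targets)  # sample one term of the objective
        x -= step * (x - target)      # stochastic gradient step
        distances.append(abs(x - solution))
    return distances


targets = [-1.0, 1.0, 0.5, -0.5]  # the minimizer is their mean, 0.0
distances = sgd_constant_step(
    targets, x0=10.0, step=0.1, n_steps=200, rng=random.Random(0)
)
```

Here each step contracts the initial error by a factor of 0.9, so the starting distance of 10 is forgotten quickly, while the iterate keeps jittering within the range of the targets. A smaller step shrinks that neighborhood at the price of a slower contraction.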

Gaël Varoquaux - programming 2021-10-28 22:00:00

Hiring an engineer and post-doc to simplify data science on dirty data


Join us to work on reinventing data-science practices and tools to produce robust analysis with less data curation.

It is well known that data cleaning and preparation are a heavy burden to the data scientist.

Dirty data research

In the dirty data project, we have been conducting machine-learning research …

Sparrow Computing 2021-10-22 21:27:39

TorchVision Datasets: Getting Started

The TorchVision datasets subpackage is a convenient utility for accessing well-known public image and video datasets. You can use these tools to start training new computer vision models very quickly. TorchVision Datasets Example To get started, all you have to do is import one of the Dataset classes. Then, instantiate ... Read more

The post TorchVision Datasets: Getting Started appeared first on Sparrow Computing.

Sparrow Computing 2021-10-21 14:19:21

NumPy Any: Understanding np.any()

The np.any() function tests whether any element in a NumPy array evaluates to true: The input can have any shape and the data type does not have to be boolean (as long as it’s truthy). If none of the elements evaluate to true, the function returns false: Passing in a ... Read more
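A short sketch of the behavior described above, assuming NumPy is installed. The array values are made up for illustration, and the axis argument in the last line is an extra shown for completeness rather than something taken from the excerpt.

```python
import numpy as np

# Non-boolean input: any non-zero element counts as true.
has_true = np.any(np.array([[0, 0], [0, 3]]))

# No element evaluates to true, so the result is false.
all_false = np.any(np.zeros((2, 3)))

# With axis=0 the test is applied down each column separately.
per_column = np.any(np.array([[0, 1], [0, 0]]), axis=0)
```

Without axis the result is a single boolean for the whole array. With axis it is an array holding one boolean per remaining position.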

The post NumPy Any: Understanding np.any() appeared first on Sparrow Computing.

Sparrow Computing 2021-10-07 20:52:03

PyTorch DataLoader Quick Start

PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility and that makes data loading in PyTorch a fairly advanced topic. One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you ... Read more

The post PyTorch DataLoader Quick Start appeared first on Sparrow Computing.

Sparrow Computing 2021-10-06 16:53:32

How the NumPy append operation works

Understanding the np.append() operation and when you might want to use it.
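As a brief sketch of the operation, assuming NumPy is installed and using made-up values: np.append never modifies its input. It returns a new array, flattened unless an axis is given.

```python
import numpy as np

original = np.array([[1, 2], [3, 4]])

# Without axis, both inputs are flattened before joining.
flattened = np.append(original, [5, 6])

# With axis=0, the new values are added as a row; shapes must be compatible.
stacked = np.append(original, [[5, 6]], axis=0)
```

Because every call copies the data, calling np.append inside a loop gets slow as the array grows. Collecting items in a Python list and converting once at the end is usually the cheaper pattern.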

The post How the NumPy append operation works appeared first on Sparrow Computing.

Gaël Varoquaux - programming 2021-09-13 22:00:00

Hiring someone to develop scikit-learn community and industry partners


With the growth of scikit-learn and the wider PyData ecosystem, we want to recruit in the Inria scikit-learn team for a new role. Departing from our usual focus on excellence in algorithms, statistics, or code, we want to add to the team someone with some technical understanding, but an …

Pierre de Buyl's homepage - scipy 2021-08-24 13:00:00

A paper on the Lees-Edwards method

A few years ago, Sebastian contacted me to help with simulations. Great, I like simulation studies, so we started discussing the details. The idea: use an established method, the Lees-Edwards boundary condition, to study colloids under shear.

Living in an Ivory Basement 2021-07-19 22:00:00

A biotech career panel in the DIB Lab

Careers outside of universities!

Living in an Ivory Basement 2021-06-28 22:00:00

New sourmash databases are available!

Databases are now available for GTDB!

Filipe Saraiva's blog 2021-06-25 12:06:45

Writing a Column at O Estado do Piauí

O Estado do Piauí is a new newspaper that recently emerged over there. With a stronger focus on long, in-depth reporting, mixing investigative and literary journalism, the project aims to discuss in depth the topics of interest to the state, uncover unique stories from Piauí, give visibility to problematic situations, point out alternatives, and much more. Don't… Continue reading »Writing a Column at O Estado do Piauí
Filipe Saraiva's blog 2021-06-21 21:51:57

Interview Series on Research at UFPA's PPGCC: Computational Intelligence

UFPA's Faculty of Computing and Graduate Program in Computer Science are developing a project with two goals: first, to better publicize what we produce in our research to the public outside the university; second, to better publicize the same thing INTERNALLY, what we develop… Continue reading »Interview Series on Research at UFPA's PPGCC: Computational Intelligence
Living in an Ivory Basement 2021-06-07 22:00:00

Searching all public metagenomes with sourmash

Searching all the things!

Pierre de Buyl's homepage - scipy 2021-05-21 13:00:00

Is your software ready for the Journal of Open Source Software?

For the unaware reader, the Journal of Open Source Software (JOSS) is an open-access scientific journal founded in 2016 and aimed at publishing scientific software. A JOSS article is itself short, and its publication helps recognize the work on the software. Here I share my point of view on what makes some software tools more ready to be published in JOSS. I do not comment on size or relevance for research, which are both documented on the JOSS website.

Living in an Ivory Basement 2021-05-16 22:00:00

sourmash 4.1.0 released!!

sourmash v4.1.0 is here!

Keep the gradient flowing 2021-04-12 22:00:00

On the Link Between Optimization and Polynomials, Part 4

While the most common accelerated methods like Polyak and Nesterov incorporate a momentum term, a little-known fact is that simple gradient descent (no momentum) can achieve the same rate through only a well-chosen sequence of step-sizes. In this post we'll derive this method and, through simulations, discuss its practical …
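
To make the idea of a "well-chosen sequence of step-sizes" concrete, here is a minimal sketch. It assumes the classic Chebyshev step-size schedule on a quadratic objective, which may differ in detail from the derivation in the post. The eigenvalues, the bounds mu and L, the number of steps n, and all function names are hypothetical choices made for illustration.

```python
import math

def gradient_descent(eigs, x0, step_sizes):
    """Run gradient descent on f(x) = 0.5 * sum(eigs[i] * x[i]**2)."""
    x = list(x0)
    for h in step_sizes:
        # The gradient of this diagonal quadratic is eigs[i] * x[i].
        x = [xi - h * lam * xi for lam, xi in zip(eigs, x)]
    return x

def chebyshev_step_sizes(mu, L, n):
    """Inverse roots of the degree-n Chebyshev polynomial shifted to [mu, L]."""
    center, radius = (L + mu) / 2, (L - mu) / 2
    roots = [center + radius * math.cos((2 * k - 1) * math.pi / (2 * n))
             for k in range(1, n + 1)]
    return [1.0 / r for r in roots]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

# Assumed problem: eigenvalues spread over [mu, L]; the minimizer is x = 0,
# so the norm of the final iterate is the error.
mu, L, n = 1.0, 100.0, 20
eigs = [1.0, 25.0, 50.0, 75.0, 100.0]
x0 = [1.0] * len(eigs)

# Baseline: the constant step size 2 / (L + mu), used for all n steps.
err_constant = norm(gradient_descent(eigs, x0, [2.0 / (L + mu)] * n))
# Same number of steps, no momentum, but a varying step-size schedule.
err_chebyshev = norm(gradient_descent(eigs, x0, chebyshev_step_sizes(mu, L, n)))
print(err_constant, err_chebyshev)
```

Both runs use plain gradient descent with the same number of gradient evaluations; only the step sizes differ. On a non-diagonal or ill-conditioned problem the order in which the steps are applied can matter for numerical stability, which this sketch does not address.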

NumFOCUS 2021-04-09 18:02:05

NumFOCUS Welcomes Tesco Technology to Corporate Sponsors

NumFOCUS is pleased to announce our new partnership with Tesco Technology. A long-time PyData event sponsor, Tesco Technology joined NumFOCUS as a Silver Corporate Sponsor in December 2020. “We are very excited to formalize our partnership with Tesco Technology,” said Leah Silen, NumFOCUS Executive Director. “Tesco Technology has partnered with NumFOCUS for the past several […]

The post NumFOCUS Welcomes Tesco Technology to Corporate Sponsors appeared first on NumFOCUS.

NumFOCUS 2021-04-08 21:14:55

Job Posting | Communications and Marketing Manager

Job Title: Communications and Marketing Manager. Position Overview: The primary role of the Communications & Marketing Manager is to manage the NumFOCUS brand by overseeing all outgoing communications between NumFOCUS and our stakeholders. You will serve the project communities by playing a key role in their event marketing management and assist with project promotional and […]

The post Job Posting | Communications and Marketing Manager appeared first on NumFOCUS.

Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 2

This is the second in a series of three blog posts about the basic use of Acoular. It assumes that you have already read the first post and continues by explaining some more concepts and additional methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 3

This is the third and final in a series of three blog posts about the basic use of Acoular. It assumes that you have already read the first two posts and continues by explaining additional concepts to be used with time domain methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources. To continue, we do the same setup as in Part 1. However, as we are setting out to do some signal processing in the time domain, we define only TimeSamples, MicGeom, RectGrid and SteeringVector objects but no PowerSpectra or BeamformerBase.

    import acoular
    ts = acoular.TimeSamples( name="three_sources.h5" )
    mg = acoular.MicGeom( from_file="array_64.xml" )
    rg = acoular.RectGrid( x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2, z=0.3, increment=0.01 )
    st = acoular.SteeringVector( grid=rg, mics=mg (continued...)
Acoular 2021-04-01 05:00:00

Getting started with Acoular - Part 1

This is the first in a series of three blog posts about the basic use of Acoular. It explains some fundamental concepts and walks through a simple example. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
Keep the gradient flowing 2021-03-01 23:00:00

On the Link Between Optimization and Polynomials, Part 3

I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.

A momentum optimizer *

While My MCMC Gently Samples 2021-02-23 15:00:00

Introducing PyMC Labs: Saving the World with Bayesian Modeling

After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models.

Usually, I don't hear how people are using PyMC3 -- they mostly show up on GitHub or Discourse when something isn't working right. So, hearing about all these …

Martin Fitzpatrick - python 2021-02-22 08:00:00

Using MicroPython and uploading libraries on Raspberry Pi Pico — Using rshell to upload custom code

MicroPython is an implementation of the Python 3 programming language, optimized to run on microcontrollers. It's one of the options available for programming your Raspberry Pi Pico and a nice friendly way to get started with microcontrollers.

MicroPython can be installed easily on your Pico, by following the instructions on the …

NumFOCUS 2021-02-10 19:54:10

Job Posting | Events and Digital Marketing Coordinator

Job Title: Events and Digital Marketing Coordinator. Position Overview: The primary role of the Events and Digital Marketing Coordinator is to support and assist the Events Manager and the Community Communications and Marketing Manager to advance one of NumFOCUS’s primary missions of educating and building the community of users and developers of open source scientific […]

The post Job Posting | Events and Digital Marketing Coordinator appeared first on NumFOCUS.

Living in an Ivory Basement 2021-02-01 23:00:00

Transition your Python project to use pyproject.toml and setup.cfg! (An example.)

Updating old Python packages, in this year of the PSF 2021!

Martin Fitzpatrick - python 2021-01-28 14:00:00

Writing a SAM Coupé SCREEN$ Converter in Python — Interrupt optimizing image converter

The SAM Coupé was a British 8-bit home computer that was pitched as a successor to the ZX Spectrum, featuring improved graphics and sound and higher processor speed.

The SAM Coupé's high-color MODE4 could manage 256x192 resolution graphics, with 16 colors from a choice of 128. Each pixel can …
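
As a rough illustration of what those numbers imply: 16 colors means each pixel needs a 4-bit palette index, so two pixels fit in one byte. The sketch below packs a 256x192 image of palette indices that way. The function name, the nibble order (left pixel in the high nibble) and the simple row-by-row layout are assumptions made for illustration, not details taken from the post, and palette selection is not covered.

```python
WIDTH, HEIGHT = 256, 192  # MODE4 resolution quoted in the text

def pack_mode4(pixels):
    """Pack a flat, row-major list of 4-bit palette indices, two pixels per byte."""
    if len(pixels) != WIDTH * HEIGHT:
        raise ValueError("expected exactly 256x192 pixels")
    if any(not 0 <= p <= 15 for p in pixels):
        raise ValueError("palette indices must be in the range 0..15")
    out = bytearray()
    for i in range(0, len(pixels), 2):
        # Assumption: the left pixel of each pair goes in the high nibble.
        out.append((pixels[i] << 4) | pixels[i + 1])
    return bytes(out)

# Hypothetical test image: a diagonal gradient over the 16 palette slots.
pixels = [(x + y) % 16 for y in range(HEIGHT) for x in range(WIDTH)]
screen = pack_mode4(pixels)
print(len(screen))
```

At 4 bits per pixel, the packed screen takes 256 * 192 / 2 bytes.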

Living in an Ivory Basement 2021-01-24 23:00:00

A snakemake hack for checkpoints

snakemake checkpoints r awesome

Martin Fitzpatrick - python 2021-01-21 07:00:00

Squeezing Space Invaders onto the BBC micro:bit's 25 pixels — MicroPython retro game in just 25 pixels

How much game can you fit into 25 pixels? Quite a bit it turns out.

This is a mini clone of arcade classic Space Invaders for the BBC micro:bit microcomputer. Using the accelerometer and two buttons for input, you can beat off wave after wave of aliens that advance …