Planet SciPy
How to Label Data for Machine Learning
Machine learning has revolutionized the world of technology, playing a crucial role in various applications, from self-driving cars and facial recognition systems to language translation and sentiment analysis. The success of machine learning models largely depends on the quality and quantity of data they are trained on. In particular, labeled ... Read more
The post How to Label Data for Machine Learning appeared first on Sparrow Computing.
Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly
In this second installment of the series “Real-world MLOps Examples,” Paweł Pęczek, Machine Learning Engineer at Brainly, will walk you through the end-to-end Machine Learning Operations (MLOps) process in the Visual Search team at Brainly. And because it takes more than technologies and processes to succeed with MLOps, he will also share details on: Enjoy…
Open Source ChatGPT Models: A Step-by-Step Guide
In this article we will explain how open source ChatGPT models work and how you can run them. We will cover two different open source models, namely Alpaca and GPT4All. By the end of this article you should have a good understanding of these models and be able to run them in Python. Since these models are open source, they are available for free and you don't need to use the paid OpenAI API to access them.
A team of researchers from Stanford University developed an open-source language model called Alpaca. It is based on Meta's large-scale language model LLaMA. The team used OpenAI's GPT API (text-davinci-003) to fine-tune the 7-billion-parameter (7B) LLaMA model. The goal of the team is to make AI available to everyone for free so that academics can do further research without worrying about the expensive hardware needed to run these memory-intensive algorithms. Although these open
Understanding the Data Science Process for Entrepreneurs
As an entrepreneur looking to harness the power of machine learning (ML) in your business, understanding the data science process is crucial. This process can be broken down into three main steps: The goal is to move through these stages as quickly as possible so that you can gather feedback ... Read more
The post Understanding the Data Science Process for Entrepreneurs appeared first on Sparrow Computing.
Anaconda’s Q1 2023 Open-Source Roundup
About the Author
Martin Durant is a former astrophysicist with several years of scientific research experience. He has also worked in medical imaging, building AI/ML pipelines and a research platform. After a brief stint as a data scientist in ad-tech, Martin moved to Anaconda to work on PyData education. He now leads a number of open-source PyData projects, focussing on data access, formats, and parallel processing.
5 Capabilities Your AI Platform Should Have
AI technology has come a long way in recent years, and it’s become an invaluable tool for businesses looking to stay ahead of the competition. But there’s no such thing as a one-size-fits-all AI platform. Every organization has its own objectives and requirements and its own deployment strategies, so it’s important to select an AI platform that meets your needs.
Deploying Large NLP Models: Infrastructure Cost Optimization
NLP models in commercial applications such as text generation systems have attracted great interest among users. These models have achieved groundbreaking results in many NLP tasks like question-answering, summarization, language translation, classification, paraphrasing, et cetera. Models such as ChatGPT, Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) are predominantly very…
Anaconda’s Response to the CircleCI Security Breach
You may have heard about the recent security breach that affected CircleCI, a container-based continuous integration service that conda-forge uses to build Linux packages and, sometimes, OSX packages.
New Release: Anaconda Distribution 2023.03
We are pleased to announce the release of Anaconda Distribution 2023.03! Find the relevant release notes here, and download the installer here.
Building a Machine Learning Platform [Definitive Guide]
Moving across the typical machine learning lifecycle can be a nightmare. From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers…
Managing Dataset Versions in Long-Term ML Projects
A long-term ML project involves developing and sustaining applications or systems that leverage machine learning models, algorithms, and techniques. Because of the life span of these apps and systems, the associated ML models need to be constantly updated, redeployed, and maintained, which means that they require proper dataset version management. An example of a…
How to Build a CI/CD MLOps Pipeline [Case Study]
Based on the McKinsey survey, 56% of orgs today are using machine learning in at least one business function. It’s clear that the need for efficient and effective MLOps and CI/CD practices is becoming increasingly vital. This article is a real-life study of building a CI/CD MLOps pipeline. We’ll delve into the MLOps practices and strategies…
Comparing Tools For Data Processing Pipelines
If you ask data professionals what the most challenging part of their day-to-day work is, you will likely discover their concerns around managing different aspects of data before they get to graduate to the data modeling stage. Data professionals spend most of their time managing data in various forms – be…
How Did We Get to ML Model Reproducibility
When working on real-world ML projects, you come face-to-face with a series of obstacles. The ML model reproducibility problem is one of them. This article is going to take you through an experience-based, step-by-step approach to solving the ML reproducibility challenge, taken by my ML team working on a fraud detection system for the insurance…
Complete Guide to Visual ChatGPT
In this post, we will talk about how to run Visual ChatGPT in Python with Google Colab. ChatGPT has garnered huge popularity recently due to its human-like responses. As of now, it only provides responses in text format, which means it cannot process, generate or edit images. Microsoft recently released a solution that handles images. Now you can ask ChatGPT to generate or edit an image for you.
In the image below, you can see the final output of Visual ChatGPT and what it looks like.
ChatGPT: AI-Assisted Writing Pros, Cons, and Tips to 3x Content Production
Prompt I gave ChatGPT and Copymatic: Write a blog article with tips for enterprise teams that want to add AI writers to their content production processes. Include the pros and cons of using AI writers for content generation, then provide 5 good recommendations for how teams can use AI writers in the content strategy and copywriting processes.
snakemake for doing bioinformatics - using wildcards to generalize your rules
Slithering your way into bioinformatics with snakemake, wildcard version
ChatGPT: Is Adding AI Writers To Your Content Production Process Worth It?
And this is only the beginning, as the GPT-3 model already has new capabilities and is considered to be in version GPT-3.5 today. Already, search-engine marketers are buzzing about GPT-4, OpenAI’s highly-anticipated next version of the language model, expected later this year.
Saving Utility Companies Years with Computer Vision
How do utility companies monitor thousands of miles of electrical wire to find small imperfections that threaten the entire system? For the entire history of electrical infrastructure, the only answer has been ‘very slowly.’ Now, Sparrow’s computer vision capabilities, combined with Fast Forward’s thermal imaging system, can accomplish what used ... Read more
The post Saving Utility Companies Years with Computer Vision appeared first on Sparrow Computing.
Distributed Training: Errors to Avoid
In this era of large language models (LLMs), monolithic foundation models, and increasingly enormous datasets, distributed training is a must, as both data and model weights very rarely fit on a single machine. However, distributed training in ML is complex and error-prone, with many hidden pitfalls that can cause huge issues in the model training…
Managing Computer Vision Projects with Michał Tadeusiak
This article was originally an episode of the MLOps Live, an interactive Q&A session where ML practitioners answer questions from other ML practitioners. Every episode is focused on one specific ML topic, and during this one, we talked to Michal Tadeusiak about managing computer vision projects. You can watch it on YouTube: Or listen to…
Upcoming Release(s): Anaconda Distribution 2023.03 and Beyond
Anaconda Distribution 2023.03 Installer
We are pleased to announce the upcoming release of the Anaconda Distribution 2023.03 installer, scheduled for March 2023. The Anaconda Distribution 2023.03 installer comes with support for Python 3.10 and an updated Anaconda Navigator 2.3.2.
Four Open-Source Projects and the Anaconda Maintainers Who Love Them
To share the love, we are offering one month of free access to Anaconda Starter tier with code "LOVEDATA." For a limited time, this code will grant you full access to these features:
conda & mamba on shared clusters works better now!
conda is great!
Code in the Cloud With Anaconda—for Free!
Last Fall, we introduced Anaconda's fully-loaded and ready-to-code cloud notebook as part of Anaconda’s paid subscription plans. Today, we are thrilled to deliver on our mission to empower people with data literacy and announce the free availability of Anaconda’s cloud notebook. Now, truly anyone can break into the world of data science and start coding immediately.
Training Models on Streaming Data [Practical Guide]
What comes to your mind when you hear "streaming data"? Maybe data generated through video streaming platforms like YouTube, but this is not the only thing that qualifies as streaming data. There are many platforms and sources that generate this kind of data. In this article: What is streaming data? “Streaming data is a…
Take Our OSS Security Survey!
OSS Sparks and Accelerates Innovation
Open-source software (OSS) reflects a comprehensive and quickly evolving ecosystem of innovators who collaborate on a global scale. OSS offers individuals and organizations flexibility, control, and a cost-effective way to harness the power of this community. As such, usage of OSS has become extensive; in fact, a 2022 report by Synopsys reveals that 97% of audited codebases use OSS, with OSS comprising 78% of the code in said codebases. OSS is one of the main drivers contributing to the rise and widespread adoption of machine learning and artificial intelligence. The ubiquity of OSS is reflected in everything from searching the web to ordering a product on a smartphone.
A brief overview of automation and parallelization options in UNIX/on an HPC
Automating things! Parallelizing them!
Building a Sentiment Classification System With BERT Embeddings: Lessons Learned
Sentiment analysis, commonly referred to as opinion mining/sentiment classification, is the technique of identifying and extracting subjective information from source materials using computational linguistics, text analysis, and natural language processing. It is frequently used to assess a speaker or writer’s perspective on a subject or the overall contextual polarity of a piece of writing. The…
MLOps Is an Extension of DevOps. Not a Fork - My Thoughts on THE MLOPS Paper as an MLOps Startup CEO
By now, everyone must have seen THE MLOps paper. “Machine Learning Operations (MLOps): Overview, Definition, and Architecture” By Dominik Kreuzberger, Niklas Kühl, Sebastian Hirschl Great stuff. If you haven’t read it yet, definitely do so. The authors give a solid overview of: They tackle the ugly problem in the canonical MLOps movement: How do all…
snakemake for doing bioinformatics - a beginner's guide (part 2)
Slithering your way into bioinformatics with snakemake, round 2.
snakemake for doing bioinformatics - a beginner's guide (part 1)
Slithering your way into bioinformatics with snakemake
sourmash has a plugin interface!
Enabling plugins in sourmash, for less directed & more incoherent progress!
Reading "Orwell's Roses" by Rebecca Solnit
This is a good book!
A obsolescência humana na novela
I spent the day at work playing with ChatGPT, the conversational artificial intelligence. We had surreal and outlandish dialogues: I asked it what Latin America would be like had it been colonized by England, and also what the relationship is between The Lord of the Rings and Game of Thrones. In another, I asked it to write a fictional dialogue between… Continue reading » A obsolescência humana na novela
Speed Trap
Overview
This post is going to showcase the development of a vehicle speed detector using Sparrow Computing’s open-source libraries and PyTorch Lightning. The exciting news here is that we could make this speed detector for any traffic feed without prior knowledge about the site (no calibration required), or specialized imaging ... Read more
The post Speed Trap appeared first on Sparrow Computing.
ChatGPT-4 Is a Smart Analyst, Unlike GPT-3.5
ChatGPT has been trending on social media platforms. It crossed one million users in just a week. For those who haven't heard of ChatGPT, it's a large language model trained by OpenAI. In simple words, it's a chatbot that answers your questions, and the responses it provides may sound human-like. It's an impressive machine learning solution. With the release of GPT-4 we can rely on it over Google search for learning on any topic.
Update: I updated this article with reviews on GPT-4. You can't trust ChatGPT-3.5 when preparing for any certification or exam. It's a big NO if you think you can refer to ChatGPT-3.5 to answer questions in a telephone interview round. Yes, I know it's cheating even if you use Google for the same, but I wanted to give a WARNING, as many people do this and many social media influencers have posted on how to leverage ChatGPT-3.5 for cracking
(continued...)
Improvements to the Spyder IDE installation experience
Juan Sebastian Bautista, C.A.M. Gerlach and Carlos Cordoba also contributed to this post.
Spyder 5.4.0 was released recently, featuring some major enhancements to its Windows and macOS standalone installers. You'll now get more detailed feedback when new versions are available, and you can download and start the update to them from right within Spyder, instead of having to install them manually. In this post, we'll go over how these new update features work and how you can start using them!
Before proceeding, we want to acknowledge that this work was made possible by a Small Development Grant awarded to Spyder by NumFOCUS, which has enabled us to hire a new developer (Juan Sebastian Bautista Rojas) to be in charge of all the implementation details.
Before these improvements, Spyder already had a mechanism to detect more recent versions, but that functionality was very simple. There was a pop-up dialog warning that a new version was available, but users had to
(continued...)
Interview with Meekail Zain, scikit-learn Team Member
Author: Reshama Shaikh, Meekail Zain
Introducing the Spyder-Watchlist plugin
Spyder's Variable Explorer is a great tool which aids the development and debugging of Python code by displaying all variables from the current scope. One thing the Variable Explorer is missing is the ability to display the value of arbitrary, user-definable expressions while debugging. For example, it might be useful to see the value of a specific attribute of an object, or the value of an array at some index. Such a feature is known as a "watchlist" or "watches" in other Integrated Development Environments (IDEs). This blog post introduces the Watchlist plugin developed for Spyder.
Features
The watchlist consists of a user-definable list of expressions.
They are evaluated after each debugger step, and the result of the evaluation is displayed as a string.
This means that value = str(eval(expression)) is performed behind the scenes, and the result is shown in the plugin.
The watchlist is a very powerful tool, but this comes at a cost: Any side effect of an expression will affect the execution environment.
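The evaluation step and the side-effect caveat can be sketched in a few lines of plain Python. This is only an illustration of the `str(eval(expression))` idea, not the plugin's actual code; the function name `evaluate_watchlist` and the `frame_locals` dictionary are hypothetical stand-ins for whatever the debugger exposes.

```python
# Hypothetical sketch of a watchlist; NOT the Spyder plugin's real implementation.
def evaluate_watchlist(expressions, namespace):
    """Evaluate each expression against `namespace` and return string results."""
    results = {}
    for expression in expressions:
        try:
            # Same idea as the post: value = str(eval(expression))
            results[expression] = str(eval(expression, {}, namespace))
        except Exception as exc:
            # A failing watch is reported instead of aborting the other watches.
            results[expression] = f"<{type(exc).__name__}: {exc}>"
    return results


# Stand-in for the local variables visible at a debugger stop (assumed data).
frame_locals = {"data": [3, 1, 2], "config": {"rate": 0.5}}

watches = evaluate_watchlist(
    ["data[0]", "config['rate'] * 2", "missing"], frame_locals
)

# An expression with a side effect modifies the program being debugged:
# list.sort() sorts `data` in place, so later debugger steps see the sorted list.
evaluate_watchlist(["data.sort()"], frame_locals)
```

After the second call, `frame_locals["data"]` is sorted even though the watch was only meant to observe it, which is the cost the post warns about.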
Expressions can be
(continued...)
Por que abandonamos os blogs?
Twitter's writing interface. These days we are watching Elon Musk destroy Twitter. The expectation is that, in this dynamic, the social network will lose users and relevance over time, if it does not blow up all at once, since its new owner even talks about bankruptcy. It is not the first time that a… Continue reading » Por que abandonamos os blogs?
Pandas DataFrame Output for sklearn Transformers
Author: Sangam SwadiK
The Russian Roulette: An Unbiased Estimator of the Limit
The idea for what was later called Monte Carlo method occurred to me when I was playing solitaire during my illness.
Stanislaw Ulam, Adventures of a Mathematician
The Russian Roulette offers a simple way to construct an unbiased estimator for the limit of a sequence. It allows for example to …
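As a rough illustration, here is one common formulation of a Russian roulette estimator, which may differ from the construction in the full post. The sequence is written as a telescoping sum of increments, the sum is cut off at a random point, and each increment is divided by the probability of having reached it. The geometric stopping rule, the example sequence `1 - 0.5**n`, and all names are assumptions made for this sketch.

```python
import random


def russian_roulette_estimate(sequence, continue_prob, rng):
    """Return one random sample whose expectation is the limit of sequence(n).

    Assumes the increments sequence(n) - sequence(n - 1) shrink fast enough
    for the reweighted sum to have finite expectation.
    """
    estimate = sequence(0)
    survival = 1.0  # probability of reaching the current term
    n = 0
    # "Pull the trigger" before every new term: continue with continue_prob.
    while rng.random() < continue_prob:
        n += 1
        survival *= continue_prob
        # Reweight the increment by the chance of getting this far.
        estimate += (sequence(n) - sequence(n - 1)) / survival
    return estimate


def example_sequence(n):
    """Toy sequence with limit 1 (an assumption for the demo)."""
    return 1.0 - 0.5 ** n


rng = random.Random(0)
samples = [russian_roulette_estimate(example_sequence, 0.6, rng) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
```

Each individual sample only evaluates finitely many terms, yet the average over many samples approaches the limit of the sequence.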
scikit-learn and Hugging Face join forces
Author: Lysandre Debut, François Goupil
So! You want to search all the public metagenomes with a genome sequence!
Searching all the things - faster!
Notes on the Frank-Wolfe Algorithm, Part III: backtracking line-search
Backtracking step-size strategies (also known as adaptive step-size or approximate line-search) that set the step-size based on a sufficient decrease condition are the standard way to set the step-size on gradient descent and quasi-Newton methods. However, these techniques are much less common for Frank-Wolfe-like algorithms. In this blog post I …
scikit-learn Sprint in Salta, Argentina
Author: Juan Martín Loyola
New 2022 roadmap and grant funding
For the last couple of months, the Spyder team has been working on defining a new roadmap and submitting grant proposals to fund more features and improvements. We are pleased to announce our roadmap for the rest of 2022, and that two proposals were funded!
The roadmap
Considering the importance of sharing a clear perspective of where the Spyder project is going and where we will be focusing our efforts over the coming months, the team has created an initial roadmap for the rest of 2022. We prioritized the highlighted features and enhancements based on input from issues, face-to-face and virtual discussions, Stack Overflow, social media and other feedback, to try to best capture the interests of our users and community.
The proposals
To help make our roadmap achievable, we wrote and submitted proposals to several different venues and organizations in the last couple of months. While we have yet to hear back from some of them, two have already been funded!
The first was for the
(continued...)
Pollution in India : Real-time AQI Data
Air pollution has become a serious problem across the world in recent years. Its effects are devastating and are not limited to humans; animals and plants are harmed as well. It also leads to global warming, which is essentially the rise in air and ocean temperatures around the world.
Indian cities have been topping the list of polluted cities. To tackle air pollution, the first step is to track it in real time, so that people can be alerted to avoid outdoor activities when pollution is high. This post explains how you can fetch the real-time Air Quality Index (AQI) of Indian cities using Python and R code, so both Python and R programmers can pull pollution data.
You can download the dataset which contains static information about Indian states, cities and AQI stations. Variables stored in this dataset will be used further to fetch real-time data.
(continued...)
My Mayavi story: discovering open source communities
The Mayavi Python software, and my personal history: A thread on Python and scipy ecosystems, building open source codebase, and meeting really cool and friendly people
I am writing today as a goodbye to the project: I used to be one of the core contributors and maintainers but have been …
Pointwise mutual information (PMI) in NLP
Natural Language Processing (NLP) has gained wide acceptance recently: many live projects use it, and it is no longer limited to academia. Use cases of NLP can be seen across industries, such as understanding customers' issues, predicting the next word a user is about to type on the keyboard, and automatic text summarization. Researchers across the world have trained NLP models in several human languages, such as English, Spanish, French, and Mandarin, so that the benefits of NLP can reach every society. In this post we will talk about one of the most useful NLP metrics, pointwise mutual information (PMI), which identifies words that go together, along with its implementation in Python and R.
PMI helps us find related words. In other words, it measures how much more likely two words are to co-occur than we would expect by chance. For example the phrase "Data Science" has a specific meaning when these
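To make the definition concrete, here is a minimal sketch using the standard formula PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ), with probabilities estimated from adjacent word pairs. The toy corpus and the function name `pmi_scores` are invented for illustration; the post's own Python and R implementation may estimate the probabilities differently.

```python
import math
from collections import Counter

# Toy corpus, invented for this example.
sentences = [
    "data science is fun",
    "data science needs data",
    "science is hard",
    "fun is fun",
]


def pmi_scores(sentences):
    """Return {(word1, word2): PMI} for every adjacent word pair."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        word_counts.update(tokens)
        pair_counts.update(zip(tokens, tokens[1:]))  # adjacent pairs only

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (first, second), count in pair_counts.items():
        p_pair = count / total_pairs
        p_first = word_counts[first] / total_words
        p_second = word_counts[second] / total_words
        # Positive PMI: the pair occurs more often than independence predicts.
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


scores = pmi_scores(sentences)
data_science_pmi = scores[("data", "science")]
```

A positive score for ("data", "science") says the two words appear together more often than their individual frequencies alone would suggest. Note that PMI tends to favor rare words, so pairs seen only once can score high on a corpus this small.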
How to import your data into Acoular
Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array which is stored in an HDF5 file. This blog post explains how to convert data available in other formats into this file format. As examples for other file formats we will use both .csv (comma separated text files) and .mat (Matlab files).
Checking for accessibility: thoughts and a checklist!
On the Link Between Optimization and Polynomials, Part 5
Six: All of this has happened before.
Baltar: But the question remains, does all of this have to happen again?
Six: This time I bet no.
Baltar: You know, I've never known you to play the optimist. Why the change of heart?
Six: Mathematics. Law of averages. Let a complex …
Announcing ribbity - a hacky project to build Web sites from GitHub issue trackers
Munging GitHub issue trackers for fun!
Interview with Norbert Preining, scikit-learn Team Member
Author: Reshama Shaikh, Norbert Preining
The Value of Open Source Sprints, the scikit-learn Experience
Author: Reshama Shaikh
5 Years, 10 Sprints, A scikit-learn Open Source Journey
Author: Reshama Shaikh
Only size-1 arrays can be converted to Python scalars
NumPy is one of the most used modules in Python, and it is used in a variety of tasks ranging from creating arrays to mathematical and statistical calculations. NumPy also brings efficiency to Python programming. While using NumPy you may encounter this error:
TypeError: only size-1 arrays can be converted to Python scalars
It is a frequently appearing error, and sometimes it becomes a daunting challenge to solve.
Meaning : Only Size 1 Arrays Can Be Converted To Python Scalars Error
This error generally appears when Python expects a single value but you passed an array which consists of multiple values.
For example: you want to calculate the exponential of an array, but the function you call was designed for a scalar (a single value). When you pass a NumPy array to that function, it raises this error. The error stops your code from proceeding and so avoids unexpected output from the (continued...)
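A short sketch of that situation, assuming the scalar function is `math.exp` from the standard library (the excerpt does not name the function it uses). The exact wording of the message can vary between NumPy versions, so the example only relies on the exception type.

```python
import math

import numpy as np

values = np.array([1.0, 2.0, 3.0])

# math.exp expects one number, so a multi-element array triggers the TypeError.
try:
    math.exp(values)
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)  # wording differs slightly across NumPy versions

# Fix 1: use the NumPy function, which works element by element.
elementwise = np.exp(values)

# Fix 2: pass a single element if a scalar result is what you want.
single = math.exp(values[0])
```

The general remedy is the same for other scalar-only functions: switch to the vectorized NumPy equivalent, or index into the array so only one value is passed.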
Interview with Lucy Liu, scikit-learn Team Member
Author: Reshama Shaikh, Lucy Liu
The second Common Fund Data Ecosystem hackathon - May 9-13, 2022!
We're running another hackathon!
Storing 64-bit unsigned integers in SQLite databases, for fun and profit
Storing unsigned longs in SQLite is possible, and can be fast.
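One possible way to do this, which is not necessarily the approach the post takes, is to shift each unsigned value into SQLite's signed 64-bit INTEGER range by subtracting 2**63 before writing and adding it back after reading. The table name `hashes`, the helper names, and the sample values below are all hypothetical.

```python
import sqlite3

OFFSET = 1 << 63  # shifts [0, 2**64) onto the signed range [-2**63, 2**63)


def to_signed(unsigned_value):
    """Map an unsigned 64-bit integer to a value SQLite can store."""
    if not 0 <= unsigned_value < (1 << 64):
        raise ValueError("value does not fit in 64 unsigned bits")
    return unsigned_value - OFFSET


def to_unsigned(signed_value):
    """Invert to_signed after reading a value back from SQLite."""
    return signed_value + OFFSET


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (value INTEGER NOT NULL)")

originals = [0, 1, 2**63, 2**64 - 1]
conn.executemany(
    "INSERT INTO hashes (value) VALUES (?)",
    [(to_signed(value),) for value in originals],
)

# The offset mapping preserves ordering, so ORDER BY still sorts correctly.
rows = conn.execute("SELECT value FROM hashes ORDER BY value").fetchall()
restored = [to_unsigned(row[0]) for row in rows]
conn.close()
```

Because the mapping keeps the original order, range queries and indexes on the column continue to behave as expected, as long as query bounds are converted with `to_signed` as well.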
Interview with Maren Westermann: Extending the Impact of the scikit-learn Sprints to the Community
Author: Reshama Shaikh, Maren Westermann
Behind the Scenes of Data Umbrella scikit-learn Open Source Sprints
Author: Reshama Shaikh, Angela Okune
The First Common Fund Data Ecosystem Hackathon
We ran a successful pilot hackathon, and we will run a second one soon!
Mestrado em Ciência da Computação 2022: Metaheurísticas
We still have some openings for the Master's in Computer Science at UFPA, Belém. Those interested should see the submission instructions on the program's selection page. Since joining the program I have been advising students on various research projects on computational intelligence applied to smart grid problems. We have already had work on multi-agent systems… Continue reading » Mestrado em Ciência da Computação 2022: Metaheurísticas
DiffCast: Hands-free Python Screencast Creator - Create reproducible programming screencasts without typos or edits
Programming screencasts are a popular way to teach programming and demo tools. Typically people will open up their favorite editor and record themselves tapping away. But this has a few problems. A good setup for coding isn't necessarily a good setup for video -- with text too small, a window too …
On minimum metagenome covers, and calculating them for your own data.
You, too, can run our software!
Optimization Nuggets: Implicit Bias of Gradient-based Methods
When an optimization problem has multiple global minima, different algorithms can find different solutions, a phenomenon often referred to as the implicit bias of optimization algorithms. In this post we'll characterize the implicit bias of gradient-based methods on a class of regression problems that includes linear least squares and Huber …
Optimization Nuggets: Exponential Convergence of SGD
This is the first of a series of blog posts on short and beautiful proofs in optimization (let me know what you think in the comments!). For this first post in the series I'll show that stochastic gradient descent (SGD) converges exponentially fast to a neighborhood of the solution.
A bioinformatics training career panel in the DIB Lab
Careers in training!
Hiring an engineer and post-doc to simplify data science on dirty data
Note
Join us to work on reinventing data-science practices and tools to produce robust analysis with less data curation.
It is well known that data cleaning and preparation are a heavy burden to the data scientist.
In the dirty data project, we have been conducting machine-learning research …
TorchVision Datasets: Getting Started
The TorchVision datasets subpackage is a convenient utility for accessing well-known public image and video datasets. You can use these tools to start training new computer vision models very quickly. TorchVision Datasets Example To get started, all you have to do is import one of the Dataset classes. Then, instantiate ... Read more
The post TorchVision Datasets: Getting Started appeared first on Sparrow Computing.
NumPy Any: Understanding np.any()
The np.any() function tests whether any element in a NumPy array evaluates to true: The input can have any shape and the data type does not have to be boolean (as long as it’s truthy). If none of the elements evaluate to true, the function returns false: Passing in a ... Read more
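The code samples from the original post are missing from this excerpt, so here is a small stand-in sketch of the behavior described. The arrays are made up, and the `axis` call is an addition that the excerpt itself does not mention.

```python
import numpy as np

# At least one element is truthy (non-zero), so the result is True.
has_truthy = np.any(np.array([0, 0, 3]))

# Any shape is accepted; an all-zero 2x3 array has no truthy element.
all_falsy = np.any(np.zeros((2, 3)))

# With axis=0 the test is applied down each column separately.
per_column = np.any(np.array([[0, 1], [0, 0]]), axis=0)
```

Without `axis`, the result is a single boolean for the whole array; with `axis`, you get one boolean per row or column.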
The post NumPy Any: Understanding np.any() appeared first on Sparrow Computing.
PyTorch DataLoader Quick Start
PyTorch comes with powerful data loading capabilities out of the box. But with great power comes great responsibility and that makes data loading in PyTorch a fairly advanced topic. One of the best ways to learn advanced topics is to start with the happy path. Then add complexity when you ... Read more
The post PyTorch DataLoader Quick Start appeared first on Sparrow Computing.
How the NumPy append operation works
Understanding the np.append() operation and when you might want to use it.
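As a quick illustration (the arrays here are invented, not taken from the post), `np.append` returns a new array rather than modifying its input, and it flattens its arguments unless an `axis` is given.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append copies: `a` itself is left unchanged.
b = np.append(a, [4, 5])

matrix = np.array([[1, 2], [3, 4]])

# Without axis, both inputs are flattened before joining.
flat = np.append(matrix, [[5, 6]])

# With axis=0, the new row is stacked under the existing rows.
stacked = np.append(matrix, [[5, 6]], axis=0)
```

Because every call allocates a new array, appending inside a long loop can be slow; collecting values in a Python list and converting once at the end is a common alternative.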
The post How the NumPy append operation works appeared first on Sparrow Computing.
Hiring someone to develop scikit-learn community and industry partners
Note
With the growth of scikit-learn and the wider PyData ecosystem, we want to recruit in the Inria scikit-learn team for a new role. Departing from our usual focus on excellence in algorithms, statistics, or code, we want to add to the team someone with some technical understanding, but an …
Using snakemake to do simple wildcard operations on many, many, many files
snakemake is awesome
A paper on the Lees-Edwards method
A few years ago, Sebastian contacted me to help with simulations. Great, I like simulation studies, so we started discussing the details. The idea: use an established method, the Lees-Edwards boundary condition, to study colloids under shear.
A biotech career panel in the DIB Lab
Careers outside of universities!
Scaling sourmash to millions of samples
Bigger and better!
Poetry for Package Management in Machine Learning Projects
When you’re building a production machine learning system, reproducibility is a proxy for the effectiveness of your development process. But without locking all your Python dependencies, your builds are not actually repeatable. If you work in a Python project without locking long enough, you will eventually get a broken build ... Read more
The post Poetry for Package Management in Machine Learning Projects appeared first on Sparrow Computing.
Development containers in VS Code: a quick start guide
If you’re building production ML systems, dev containers are the killer feature of VS Code. Dev containers give you full VS Code functionality inside a Docker container. This lets you unify your dev and production environments if production is a Docker container. But even if you’re not targeting a Docker ... Read more
The post Development containers in VS Code: a quick start guide appeared first on Sparrow Computing.
New sourmash databases are available!
Databases are now available for GTDB!
Colunando no O Estado do Piauí
O Estado do Piauí is a new newspaper that recently emerged over there. With a stronger focus on long, dense reporting that mixes investigative and literary journalism, the project aims to discuss in depth the topics of interest to the state, uncover unique stories from Piauí, give visibility to problematic situations, point out alternatives, and much more. Don't… Continue reading » Colunando no O Estado do Piauí
Ciclo de Entrevistas sobre as Pesquisas no PPGCC da UFPA – Inteligência Computacional
The Faculty of Computing and the Graduate Program in Computer Science at UFPA are developing a project with two goals: first, to better communicate to the public outside the university what we produce in our research; second, to better communicate the same thing INTERNALLY, that is, what we develop… Continue reading » Ciclo de Entrevistas sobre as Pesquisas no PPGCC da UFPA – Inteligência Computacional
Moving sourmash towards more community engagement - a funding application
CZI EOSS4 application for sourmash support
Searching all public metagenomes with sourmash
Searching all the things!
Is your software ready for the Journal of Open Source Software?
For the unaware reader, the Journal of Open Source Software (JOSS) is an open-access scientific journal founded in 2016 and aimed at publishing scientific software. A JOSS article in itself is short, and its publication helps the work on the software gain recognition. I share here my point of view on what makes some software tools more ready to be published in JOSS. I do not comment on the size or the relevance for research, which are both documented on JOSS' website.
On the Link Between Optimization and Polynomials, Part 4
While the most common accelerated methods like Polyak and Nesterov incorporate a momentum term, a little known fact is that simple gradient descent –no momentum– can achieve the same rate through only a well-chosen sequence of step-sizes. In this post we'll derive this method and through simulations discuss its practical …
NumFOCUS Welcomes Tesco Technology to Corporate Sponsors
NumFOCUS is pleased to announce our new partnership with Tesco Technology. A long-time PyData event sponsor, Tesco Technology joined NumFOCUS as a Silver Corporate Sponsor in December 2020. “We are very excited to formalize our partnership with Tesco Technology,” said Leah Silen, NumFOCUS Executive Director. “Tesco Technology has partnered with NumFOCUS for the past several […]
The post NumFOCUS Welcomes Tesco Technology to Corporate Sponsors appeared first on NumFOCUS.
Job Posting | Communications and Marketing Manager
Job Title: Communications and Marketing Manager
Position Overview: The primary role of the Communications & Marketing Manager is to manage the NumFOCUS brand by overseeing all outgoing communications between NumFOCUS and our stakeholders. You will serve the project communities by playing a key role in their event marketing management and assist with project promotional and […]
The post Job Posting | Communications and Marketing Manager appeared first on NumFOCUS.
Getting started with Acoular - Part 2
This is the second in a series of three blog posts about the basic use of Acoular. It assumes that you already have read the first post and continues by explaining some more concepts and additional methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
Getting started with Acoular - Part 3
This is the third and final in a series of three blog posts about the basic use of Acoular. It assumes that you already have read the first two posts and continues by explaining additional concepts to be used with time domain methods. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources. To continue, we do the same set up as in Part 1. However, as we are setting out to do some signal processing in time domain, we define only TimeSamples, MicGeom, RectGrid and SteeringVector objects but no PowerSpectra or BeamformerBase.
    import acoular
    ts = acoular.TimeSamples( name="three_sources.h5" )
    mg = acoular.MicGeom( from_file="array_64.xml" )
    rg = acoular.RectGrid( x_min=-0.2, x_max=0.2, y_min=-0.2, y_max=0.2, z=0.3, increment=0.01 )
    st = acoular.SteeringVector( grid=rg, mics=mg (continued...)
Getting started with Acoular - Part 1
This is the first in a series of three blog posts about the basic use of Acoular. It explains some fundamental concepts and walks through a simple example. Acoular is a Python library that processes multichannel data (up to a few hundred channels) from acoustic measurements with a microphone array. The focus of the processing is on the construction of a map of acoustic sources. This is somewhat similar to taking an acoustic photograph of some sound sources.
sourmash 4.0 is now available! Low low cost if you buy now!
sourmash v4.0.0 is here!
On the Link Between Optimization and Polynomials, Part 3
I've seen things you people wouldn't believe.
Valleys sculpted by trigonometric functions.
Rates on fire off the shoulder of divergence.
Beams glitter in the dark near the Polyak gate.
All those landscapes will be lost in time, like tears in rain.
Time to halt.
A momentum optimizer *
Introducing PyMC Labs: Saving the World with Bayesian Modeling
After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models.
Usually, I don't hear how people are using PyMC3 -- they mostly show up on GitHub or Discourse when something isn't working right. So, hearing about all these …
Using MicroPython and uploading libraries on Raspberry Pi Pico — Using rshell to upload custom code
MicroPython is an implementation of the Python 3 programming language, optimized to run on microcontrollers. It's one of the options available for programming your Raspberry Pi Pico and a nice, friendly way to get started with microcontrollers.
MicroPython can be installed easily on your Pico, by following the instructions on the …