Category:

Data Science and Governance

Leveraging NLP in Knowledge Management: a Case Study of Lab Document Management

by 2021-07-01

Knowledge management in complex environments, such as laboratories, requires innovative approaches to handle vast amounts of data efficiently. The advent of Natural Language Processing (NLP) and large language models offers a transformative solution.

This article explores a case study in which advanced NLP techniques were applied to manage millions of scanned laboratory documents.

Project Overview

The project entailed developing a system to manage a substantial repository of scanned laboratory documents. The primary objective was to enhance accessibility and categorization of these documents using state-of-the-art NLP techniques and machine learning models.

NLP Techniques and Models Applied

Topic Modeling & Document Clustering: Utilized for categorizing documents into coherent groups based on their content. This approach facilitated easier retrieval and analysis of documents based on subject matter.
Semantic Similarity Analysis: Implemented to understand the context and deeper meaning within the documents. This technique helped in linking related documents and provided a more nuanced search capability.
BERT and GPT for Keyword Extraction & Synonym Generation: The use of BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) models significantly enhanced keyword extraction. This led to the generation of relevant synonyms, thereby improving document categorization.
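To make the idea of linking related documents concrete, here is a minimal sketch of similarity search. It is not the project's pipeline: it swaps the BERT and GPT models for plain term-frequency vectors and cosine similarity so that it runs with the standard library only. The document ids and texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, documents):
    """Return the (doc_id, score) pair of the document closest to the query."""
    scored = ((doc_id, cosine_similarity(query, text))
              for doc_id, text in documents.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical miniature corpus standing in for the scanned lab documents.
docs = {
    "protocol-01": "PCR amplification protocol for DNA samples",
    "report-07": "Quarterly safety inspection report for the chemistry lab",
    "protocol-02": "DNA extraction protocol for tissue samples",
}

best_id, best_score = most_similar("DNA extraction protocol", docs)
print(best_id, round(best_score, 3))
```

In a real system the term-frequency vectors would likely be replaced by embeddings from a language model, while the ranking step would stay much the same.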

Development of Sophisticated Ontology

A critical aspect of the project was the development of a sophisticated ontology. This ontology served as a structured framework of knowledge, representing concepts within the laboratory domain and the relationships between them.

Integration with Public Data Sources: The ontology was integrated with various public data sources. This integration enriched the ontology with external knowledge, making the internal repository more comprehensive.
Improved Data Interoperability: By aligning the internal data structure with external sources, data interoperability was significantly enhanced. This alignment facilitated seamless data exchange and integration, enabling more collaborative and efficient research.
Research Collaboration: The integrated ontology fostered research collaboration. Researchers could easily connect their work with existing knowledge and collaborate based on shared terminologies and concepts.

Outcomes and Impacts

The implementation of advanced NLP techniques and large language models revolutionized the management of lab documents. The enhanced categorization and retrieval system led to a more efficient research process. Researchers could now easily access relevant documents, draw connections, and collaborate more effectively.

The integration of a well-structured ontology with public data sources further amplified the benefits, establishing a more cohesive and collaborative research environment. This approach not only streamlined internal processes but also positioned the organization at the forefront of research collaboration and knowledge sharing.

Conclusion

This case study underscores the potential of NLP and machine learning in transforming knowledge management, particularly in complex and data-intensive environments. By harnessing the power of advanced algorithms and models like BERT and GPT, organizations can effectively manage vast repositories of information, enhancing both internal efficiencies and external collaboration. The success of this project serves as a benchmark for future endeavors in knowledge management and NLP applications.

How many fields in data science?

by 2021-06-27

Data science is mostly a research activity

Data-driven scientific discovery is regarded as the fourth science paradigm

The twenty-first century has ushered in a new age, termed the age of data science and big data analytics. Data-driven scientific discovery is regarded as the fourth science paradigm. Data science has been a core driver of new-generation science, technologies and the economy, and is driving new research, innovation, professions, applications and education across both disciplines and business domains.

There are many scientific and technical challenges associated with big data, ranging from data capture, creation, storage, search, sharing, modeling, representation, analysis, learning, visualization, and explanation to decision making. Among the many data characteristics and complexities to be addressed, I mention the hybridization of heterogeneous, multisource, hierarchical, interactive, dynamic, multidimensional, and quality-poor data mixed with real-time business operations, strategic planning, decision-making, value creation, and future developments.
The field of data science and big data analytics has been evolving from statistics for half a century into broad areas including, but not limited to, data and signal analytics, knowledge discovery, information retrieval, machine learning, statistics, optimization, computing, and data management. The literature defines areas of data science that require in-depth knowledge and pure research to be effective. By synergizing the three big areas of statistics, informatics and computing, data science has been spreading to essential and specific areas such as

data intelligence and complexity analysis
representation, modeling, analytics, mining and learning including statistical and deep learning
computational intelligence including neural networks, evolutionary computing, fuzzy systems
neuroscience and linguistics
behavioral science and social and economic computing
uncertainty and optimization
system and modeling infrastructures and architectures
networking and interoperation
social issues including privacy, security, trust, value and impact
enterprises, services, applications, solutions and systems
simulation, visualization and explanation

Speaker at FUTURE Labs 2021

by 2021-05-25

It is an honor for me to speak at the upcoming FUTURE Labs 2021 conference. I will talk about Artificial Intelligence applied to Research and Development.

What is Future Labs Live?

This is the world’s most diverse, stimulating, and exciting event for the future of all labs.

Start-ups, disruptors and innovators from academia, Biotech, Pharma, Chemicals, Food & Beverage, Agricultural, Materials, FMCG, Architects, Planners and many more are invited to help pave the way and share their experiences and vision for the lab of the future.

Featuring industry leaders sharing their views and expertise across 9 themes including: AI & Machine Learning, Digital Transformation, Lab Operations & Efficiency, Data Management and more. All conference sessions will be in English.

Microservices architecture: the case of AWS

by 2021-04-12

Use serverless computing for Artificial Intelligence solutions

Cloud computing has always been about abstraction and virtualization.

One of the very first offerings was virtual servers.

With virtual servers, the specific hardware and physical components like racks, cords, and connections were abstracted out.

The management and configuration effort was shifted only to the software running on the servers.

For example, in AWS, this level of abstraction is achieved with EC2, the virtual servers. As cloud service offerings increased and matured, more abstraction became available in the form of containers. With containers, the specific OS software and supporting network libraries are abstracted out.
The user could now just focus on specific application components.

In AWS, examples of services offering this level of abstraction are ECS and Elastic Beanstalk.

Servers are still running but choices don’t have to be made about the underlying OS and supporting packages and libraries on the infrastructure. There is yet another level of abstraction available in what is often referred to as serverless.

At this level, the language runtime itself is abstracted out, and the focus turns only to the individual functions performing specific tasks.

AWS Lambda is one of the AWS services providing this level of abstraction. With Lambda, just specific code functions are what is running, only when they are needed, and without any knowledge of the servers or the OS or the language runtime configuration.
A snippet of code that is designed to perform some process is provided and the rest is taken care of by Lambda. The two primary components of AWS Lambda are the Lambda function itself and the event source:

The function part is simply some custom code that has been written and uploaded to the service.
The event source is something that is capable of publishing an event that will invoke the function.

Code is written to process events and event sources publish events for processing.
The Lambda function consists of custom code and some configuration.

Event sources can be other AWS services that support publishing events or custom applications written to publish events and invoke Lambda functions. Many AWS services can be configured as event sources, and the list continues to grow (S3, DynamoDB, Simple Notification Service, Kinesis streams, API Gateway…).

Microservices architecture

One common use case of Lambda is in designing what is referred to as microservices architecture.

This is typically accomplished by using the Amazon API Gateway as the event source for Lambda functions. This is referred to as an on-demand invocation of Lambda as explicit requests come into the API and then the API invokes Lambda functions.
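As a rough illustration of the on-demand case, here is a minimal sketch of a Python Lambda handler sitting behind API Gateway. It assumes the Lambda proxy integration, where the request arrives as a dictionary with a `queryStringParameters` field and the function answers with `statusCode`, `headers` and a string `body`. The `name` parameter and the greeting are hypothetical.

```python
import json


def lambda_handler(event, context):
    """Handle an API Gateway proxy request and return a proxy-style response."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local usage example with a hand-written event; no AWS account is needed.
response = lambda_handler({"queryStringParameters": {"name": "lab"}}, None)
print(response["body"])
```

Each route of the API can point to its own small function like this one, which is what makes the pattern a natural fit for microservices.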

Another common use case for Lambda is file or data processing with S3 as the event source. S3 can publish events of different types, such as when objects are added, updated, copied, or deleted within buckets. Using the bucket notification feature, S3 can be configured to invoke a Lambda function when one of these bucket events occurs. For example, suppose raw data is gathered from the Internet and initially placed into an S3 bucket, and it needs to be cleaned up and formatted before it can be used in some internal application. When the data is added to the S3 bucket, S3 triggers the object-created event, which then invokes the Lambda function. S3 knows which Lambda function to invoke based on the event source mapping that is stored in the bucket notification configuration.
The Lambda function runs, cleans up the data, and then writes it out to a data store to be used by some internal application. So when architecting applications on AWS, it is important to consider the level of abstraction that can be tolerated by the business use cases and application being designed. Take advantage of the virtualization options the cloud provides and consider removing as much maintenance and administrative overhead as possible.
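The S3 data-cleaning scenario can be sketched as follows. This is only an outline under stated assumptions: it reads the bucket name and object key from the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`), while the cleaning rule, the bucket name and the file name are hypothetical. Reading and writing the object itself (for example with boto3) is deliberately left out so the sketch runs locally.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def clean_line(line):
    """Hypothetical cleaning rule: trim and collapse runs of whitespace."""
    return " ".join(line.split())


def lambda_handler(event, context):
    """Report which objects would be cleaned; the storage I/O is omitted."""
    objects = extract_objects(event)
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in objects]}


# Hand-written sample event with only the fields the handler reads.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "raw-data"},
                "object": {"key": "incoming/lab+results.csv"},
            },
        }
    ]
}

result = lambda_handler(sample_event, None)
print(result["processed"])
```

Keeping the event parsing and the cleaning rule in small pure functions makes them easy to test without deploying anything.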

Speaker at Swiss Data Leaders Conference

by 2021-03-31

I am honored to attend the SWISS LEADERS CONFERENCE ON BIG DATA STRATEGIES 2021 in Zurich on May 19, 2021. It is a wonderful experience to talk with talented people from around the world about data strategy and digital transformation.

High Level Executive Exchange for Data and AI Driven Business Models and Use Cases

Big Data Analytics and AI are the pillars of digitalization. Combined with other groundbreaking technologies they enable new service-oriented business models, create new markets and disrupt old ones. Companies have been investing in internal or external teams or startups, software and architecture. Many exciting use cases have been created. Purely data-driven startups disrupted whole industries. Some established companies try to keep pace, but struggle to develop compelling strategies. There is a thin line between agility and disorientation. Other traditional businesses adopted early and became digital frontrunners in their industries. Swiss Leaders Dialog is an executive-only live and virtual networking event to exchange views and experiences across industries.

Topics:

Digitalisation Strategy
Data and AI Strategy
Big Data Use Cases
Data Organisation (Build? Buy? Found?)
Big Data Analytics / Use Cases
AI Use Cases / AI Business Models
Data Ecosystems
Strategic Initiatives in Switzerland and Europe
Data Science
Integrating BI and DWH
Platform Economy
New Management Concepts (Agile, Scrum, Design Thinking, Holacracy etc.)
Data Governance
Problems of Data Ownership
Compliance Strategy (with regard to data)
Regulatory Frameworks
Data Governance
Data Sourcing
Data Integration
IoT and Big Data / Industry 4.0
Big Data and “Servitization”
Partner Strategy
Big Data Architecture
Big Data & Blockchain

Format

Best practices: Selected business cases
Matchmaking: Interests and competencies of all attendees
Networking: Matched and pre-arranged one on one meetings
Workshops: Confidential peer discussions
High level: Colleagues in similar positions
Decision support: Knowledge of an elite network for your projects, strategies and ideas
Ultra efficient: Personal agenda – optimal time invest
Chill out: Informal networking at after-event bar meeting
Online community: Access new contacts and knowledge after the event

Speaker at the next Future Labs 2021

by 2021-01-29

I am honored to attend FUTURE LABS Live 2021. It is a wonderful experience to talk with talented people from around the world about digital transformation, data science, artificial intelligence, and machine learning.

FUTURE LABS Live 2021

Labs, be they focused on research and development or on the creation of products and services, are the backbone of industry. The former turn money into knowledge while the latter turn knowledge back into money. Despite this, the potential of labs remains underutilised.

Technological, organisational, and cultural change remains a challenge across multiple industries. Transforming the lab and those who use it must happen now!

Future Labs LIVE is essential and timely. Held over two days with over eighty global experts, Future Labs will provide comprehensive and interactive coverage of the key issues and technologies.

FUTURE LABS LIVE FEATURES

Our vision is to create the world’s most important R&D lab event focusing on digitalisation, connectivity, and automation.

Future Labs unites leading stakeholders from across the entire laboratory eco-system to create insight and to promote collaboration. We guarantee to give you an experience that is engaging, collaborative and global.


Interpreting deep learning models

by 2020-07-17

With the fast development of sophisticated machine learning algorithms, artificial intelligence has been gradually penetrating a number of brand new fields with unprecedented speed.

One of the outstanding problems hampering further progress is the interpretability challenge.
This challenge arises when the models built by the machine learning algorithms are to be used by humans in their decision making, particularly when such decisions are subject to legal consequences and/or administrative audits.
For human decision makers operating in those circumstances, to accept the professional and legal responsibility ensuing from decisions assisted by machine learning, it is critical to comprehend the models.
In areas such as healthcare, business, and crime prediction, mistakes can be catastrophic. For instance, to develop safe self-driving cars we need to understand their rare but costly mistakes. Therefore, it is imperative to explain the learned representations, the relationships between the inputs and the dependent variables, and the decisions made by these models. To trust a model, decision makers first need to understand its behavior, and then evaluate and refine it using their domain knowledge. One critical issue with future automated systems based on machine learning is possible misalignment with the objectives of their stakeholders, that is, whether these systems really behave reliably in unforeseen situations. They may perform well on test cases but do the wrong thing when deployed in the wild. Ironically, it could also be revealed later that they were doing the right things for the wrong reasons. Hence, interpretability plays a significant role in helping us reduce errors.

Properties of Interpretability

In the machine learning community, interpretability has recently been defined as “the ability to explain or to present in understandable terms to a human”.
Beyond definition, a much harder task is to quantify and measure interpretability. Hence, the effort extends beyond typical machine learning research into human-computer interaction. There are also studies on other aspects of interpretability, such as the plausibility of a model: the likelihood that a user accepts it as an explanation for a prediction.

For evaluation metrics, there are no mutually agreed standards. Sometimes the evaluation methods are only applicable to a specific model.
I present some general evaluation metrics (fidelity, comprehensibility and accuracy) that are frequently used in state-of-the-art work.

Fidelity: It is not realistic for the interpretation model to be entirely faithful to the black-box model. Fidelity demands that the interpretation model’s prediction should match that of the black-box model as closely as possible. In other words, the interpretation model tries to mimic the behavior of the model itself on the instance being predicted.

Comprehensibility: Comprehensibility requires that the interpretation results are understandable to the users. When building an interpretation method, we should take into consideration the limitation of human cognition. For instance, decision trees involving thousands of nodes and decision rules having hundreds of levels of if-then conditions are not interpretable in this sense, although they are commonly regarded as inherently interpretable algorithms for textual representations.

Accuracy: Accuracy measures the performance of the interpretation model on the original training data used to train the black-box model to check if the interpretation model could outperform the black-box model. The measurements could be traditional evaluation metrics in machine learning such as accuracy score, AUC score, F1-score, etc.
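Fidelity and accuracy can both be computed as simple agreement rates, as in the minimal sketch below. It assumes we already have predicted labels from the black-box model and from the interpretation (surrogate) model on the same instances; the label lists are hypothetical, and the sketch scores a whole set of instances rather than a single prediction.

```python
def agreement(predictions_a, predictions_b):
    """Fraction of positions where two label sequences agree."""
    if len(predictions_a) != len(predictions_b):
        raise ValueError("sequences must have the same length")
    if not predictions_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(predictions_a, predictions_b))
    return matches / len(predictions_a)


def fidelity(surrogate_predictions, black_box_predictions):
    """How closely the interpretation model mimics the black-box model."""
    return agreement(surrogate_predictions, black_box_predictions)


def accuracy(predictions, true_labels):
    """How often a model's predictions match the ground-truth labels."""
    return agreement(predictions, true_labels)


# Hypothetical labels and predictions for eight instances.
true_labels = [1, 0, 1, 1, 0, 1, 0, 0]
black_box = [1, 0, 1, 1, 0, 0, 0, 0]
surrogate = [1, 0, 1, 0, 0, 0, 0, 0]

print("fidelity:", fidelity(surrogate, black_box))
print("surrogate accuracy:", accuracy(surrogate, true_labels))
print("black-box accuracy:", accuracy(black_box, true_labels))
```

Note that the two measures answer different questions: a surrogate can follow the black box closely and still inherit its mistakes.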

Interpretations of models

One interpretation method is a visualization technique, named CNN-INTE, that interprets deep Convolutional Neural Networks (CNNs) via meta-learning. Compared to LIME, which provides local interpretations for the entire model in specific regions of the feature space, this method provides global interpretations for any test instance on the hidden layers in the whole feature space.

The second interpretation method applies the Knowledge Distillation technique to distill Deep Neural Networks into decision trees in order to attain good performance and interpretability simultaneously.
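A common ingredient of knowledge distillation is the use of softened teacher outputs, obtained by dividing the logits by a temperature before the softmax. The sketch below shows that step only. It is a generic formulation and not necessarily the exact procedure of the papers listed further down; the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; a higher temperature gives softer targets."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher logits for a three-class problem.
teacher_logits = [4.0, 1.0, 0.5]
hard_targets = softmax_with_temperature(teacher_logits, temperature=1.0)
soft_targets = softmax_with_temperature(teacher_logits, temperature=5.0)

print([round(p, 3) for p in hard_targets])
print([round(p, 3) for p in soft_targets])
```

The softened distribution keeps the teacher's ranking of the classes while exposing how similar it considers them, which is the extra signal a simpler student model can learn from.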

To know more about these interpretations, please look at these research papers:

Liu, Xuan, Xiaoguang Wang, and Stan Matwin. “QDV: Refining Deep Neural Networks with Quantified Interpretability.” In 2020 European Conference on Artificial Intelligence (ECAI), submitted.
Liu, Xuan, Xiaoguang Wang, and Stan Matwin. “Improving the Interpretability of Deep Neural Networks with Knowledge Distillation.” In 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 905-912. IEEE, 2018.
Liu, Xuan, Xiaoguang Wang, and Stan Matwin. “Interpretable deep convolutional neural networks via meta-learning.” In 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1-9. IEEE, 2018.

Why Meta-learning is important

by 2020-03-09

This powerful technique equips machines with the ability to learn how to learn, a leap forward that has the potential to revolutionize industries, from healthcare to finance.

What is meta-learning

Meta-learning, in essence, is teaching machines to become better learners. It’s about going beyond traditional machine learning, where models are trained for specific tasks with vast amounts of data. With meta-learning, we aim to create models that can adapt quickly to new tasks with minimal data. Imagine a world where AI systems can learn from a handful of examples, just like humans. Meta-learning brings us closer to that reality. By exposing models to a diverse range of tasks during training, they acquire the ability to generalize and apply their knowledge to unseen tasks effectively.

Indeed, traditionally machine learning models require vast amounts of data to become proficient at specific tasks. However, in real-world scenarios, obtaining abundant labeled data is often impractical or costly. This is where meta-learning steps in, offering a remarkable departure from the norm. Meta-learning is the art of training models to learn how to learn. Instead of focusing solely on one task, these models are designed to generalize from a diverse set of tasks, enabling them to adapt swiftly to new challenges. This is akin to equipping AI with the ability to learn from its own learning experiences – a game-changer in the world of artificial intelligence.

The Mechanics of Meta-Learning

At its core, meta-learning involves two key stages: meta-training and meta-testing.
During the meta-training phase, the AI models are exposed to multiple tasks or datasets. They learn not only how to perform these tasks but also how to learn from them. This involves capturing higher-level features or representations that are transferable across tasks.

The magic happens in the meta-testing phase. When faced with a new, unseen task, the meta-trained model adapts rapidly. It leverages the knowledge gained during meta-training to make predictions or classifications, even when provided with minimal task-specific data.
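The two stages can be illustrated with a toy example. The sketch below uses a Reptile-style update, which is just one of several meta-learning algorithms and is chosen here for brevity. The tasks are hypothetical: each task is a line y = a * x with a slope between 2 and 4, and the "model" is a single weight w.

```python
import random


def sample_task(rng):
    """A task is defined by its slope a; the goal is to fit y = a * x."""
    return rng.uniform(2.0, 4.0)


def adapt(w, slope, rng, steps=5, lr=0.1, n_points=10):
    """Inner loop: a few gradient steps on one task, starting from w."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    for _ in range(steps):
        # Gradient of the mean squared error of w * x against slope * x.
        grad = sum(2.0 * (w - slope) * x * x for x in xs) / n_points
        w -= lr * grad
    return w


def meta_train(meta_steps=200, meta_lr=0.5, seed=0):
    """Meta-training: learn an initial weight that adapts quickly to new tasks."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        slope = sample_task(rng)
        adapted = adapt(w, slope, rng)
        # Reptile-style update: move the initialization toward the adapted weight.
        w += meta_lr * (adapted - w)
    return w


# Meta-testing: adapt to an unseen task from the learned initialization
# and, for comparison, from an untrained initialization of zero.
meta_init = meta_train()
new_slope = 3.5
from_meta = adapt(meta_init, new_slope, random.Random(1))
from_scratch = adapt(0.0, new_slope, random.Random(1))

print("meta-learned initialization:", round(meta_init, 3))
print("error after adapting from meta init:", round(abs(from_meta - new_slope), 3))
print("error after adapting from scratch:", round(abs(from_scratch - new_slope), 3))
```

With the same handful of gradient steps, the model that starts from the meta-learned weight ends up closer to the new task than the one that starts from zero.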

Few-shot learning and Meta-learning

Few-shot learning is a machine learning paradigm that focuses on training models to make accurate predictions or classifications when provided with very limited examples or “shots” of each class or category. In traditional machine learning, models often require a large amount of labeled data for training, but few-shot learning seeks to address scenarios where obtaining extensive labeled data is impractical or costly. In few-shot learning, the training dataset for each class or category typically contains only a few examples, often ranging from one to a few dozen samples. This contrasts with traditional machine learning, where hundreds or thousands of examples per class are common. Few-shot learning often leverages meta-learning techniques, where models are trained on a variety of tasks or categories. The goal is to enable the model to quickly adapt to new tasks with only a small amount of training data. Meta-learning algorithms aim to learn a good initialization or prior knowledge that facilitates rapid adaptation.
Few-shot learning has practical applications in various domains, such as computer vision (e.g., recognizing rare objects or faces with limited training samples), natural language processing (e.g., text classification with few labeled documents), and recommendation systems (e.g., suggesting products with minimal user interaction history). It addresses the challenge of making accurate predictions in scenarios where collecting abundant training data is difficult or expensive.
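A few-shot classifier can be sketched in a few lines. The example below follows the nearest-prototype idea: average the few examples of each class and assign a new item to the closest average. It assumes the inputs are already numeric feature vectors, whereas a real system would obtain them from a learned embedding; the classes and values are hypothetical.

```python
import math


def prototype(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def classify(query, support_set):
    """Assign the query to the class whose prototype is nearest (Euclidean)."""
    prototypes = {label: prototype(examples)
                  for label, examples in support_set.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-way, 3-shot support set with two-dimensional features.
support_set = {
    "cat": [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9]],
    "dog": [[3.0, 3.1], [2.8, 3.3], [3.2, 2.9]],
}

print(classify([1.1, 1.0], support_set))
print(classify([2.9, 3.0], support_set))
```

Adding a new class only requires a few labeled examples for its prototype; nothing has to be retrained in this simplified setting.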

Applications for Meta-learning

The implications of meta-learning are vast and promising. Here are a few areas where it’s making a significant impact:

– Healthcare: Meta-learning can aid in the rapid development of AI systems for disease diagnosis. With minimal patient data, models can become proficient at identifying various medical conditions.
– Finance: In the world of finance, where market conditions change rapidly, meta-learning enables AI-driven trading systems to adapt swiftly to new trends and scenarios.
– Education: Meta-learning has the potential to revolutionize personalized learning. AI tutors can understand individual student needs and adapt their teaching methods accordingly.

Meta-learning in Pharma

In the pharmaceutical industry, one of the major challenges is the rapid and cost-effective discovery of new drugs. Traditional drug discovery processes are time-consuming and expensive, often taking years and substantial financial resources. Identifying potential drug candidates with high efficacy and low toxicity is a complex task that requires extensive experimentation and data analysis.

Meta-learning offers a transformative solution to accelerate and optimize the drug discovery process in the pharmaceutical industry.

Meta-learning allows models to learn from a limited set of drug-related data, including chemical structures, biological properties, and historical drug development data. Meta-learned models can also predict the efficacy of potential drug candidates, enabling researchers to focus their efforts on compounds with higher chances of success. It helps optimize drug formulations by considering various factors such as drug stability, solubility, and bioavailability, leading to the development of more effective medications. It can identify existing drugs with potential applications in new therapeutic areas, saving both time and resources.

AGROVOC: An Introduction to the World’s Largest Agricultural Vocabulary

by 2018-07-20

Discover AGROVOC, the concept scheme that organizes the domains related to agriculture, and learn how it is organized, the difference between concepts and terms, and how to use the AGROVOC RDF file.

Love is a very complex emotion that can mean different things to different people.
Personally, I believe that love is a powerful force that can bring people together and help them overcome difficult challenges.

ChatGPT

ver. 3.5

The RDF concept

Resource Description Framework (RDF) is a standard framework for describing resources on the web, which is used to model and store metadata about resources. RDF provides a structured way to represent data in a graph format that consists of nodes and edges, where nodes represent resources, and edges represent the relationships between them.

RDF is based on a subject-predicate-object model, where the subject represents the resource being described, the predicate represents the relationship between the subject and object, and the object represents the value or resource that is related to the subject.

RDF data is typically expressed in triples, which consist of three parts: the subject, the predicate, and the object. The subject is a URI (or a blank node), the object can be either a URI or a literal value, and the predicate is always a URI that defines the relationship between the subject and object.

RDF also supports the use of ontologies, which are formal descriptions of concepts and their relationships, to provide a standardized vocabulary for describing resources. These ontologies can be used to create domain-specific vocabularies, such as FOAF (Friend of a Friend) for describing people and their relationships, or Dublin Core for describing resources on the web.

RDF data can be serialized into various formats, including RDF/XML, Turtle, N-Triples, and JSON-LD, which enables it to be easily integrated with other web technologies and applications.
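To make the triple model and its serialization tangible, here is a minimal sketch that stores triples as Python tuples and writes them out as N-Triples. The `URI` and `Literal` classes are small helpers invented for this example, not part of any RDF library, and the `example.org` resources are hypothetical. The FOAF property URIs are the real ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str

    def nt(self):
        """N-Triples form of a URI: the address wrapped in angle brackets."""
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    value: str
    lang: str = ""

    def nt(self):
        """N-Triples form of a literal, with an optional language tag."""
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        suffix = f"@{self.lang}" if self.lang else ""
        return f'"{escaped}"{suffix}'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s.nt()} {p.nt()} {o.nt()} ." for s, p, o in triples)


FOAF = "http://xmlns.com/foaf/0.1/"
alice = URI("http://example.org/alice")  # hypothetical resource
bob = URI("http://example.org/bob")      # hypothetical resource

triples = [
    (alice, URI(FOAF + "name"), Literal("Alice", "en")),
    (alice, URI(FOAF + "knows"), bob),
]

print(to_ntriples(triples))
```

For real data, an RDF library would handle the remaining details, such as blank nodes, datatypes and full escaping rules.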

AGROVOC: An Introduction to a Concept Scheme for Agriculture

AGROVOC is a concept scheme that aims to organize the domains related to agriculture. In simpler terms, AGROVOC is like a big dictionary of terms and concepts that relate to agriculture.

AGROVOC is a full-fledged web-oriented resource available in RDF (Resource Description Framework). RDF is a standard model used for data interchange on the web. In other words, RDF allows machines to understand and process data on the web. AGROVOC is organized as a hierarchy of concepts that have names in various languages. These concepts are represented by terms, which are words used to name a concept. For example, “maize” is a term used to represent the concept of a type of crop.

AGROVOC concepts are hierarchically organized under 25 general top concepts such as activities, processes, and methods. These concepts may also be linked by non-hierarchical relations. AGROVOC is an RDF/SKOS-XL concept scheme, which means that it uses the SKOS (Simple Knowledge Organization System) extension for labels. This allows for more expressivity when describing concepts and terms.

So what can you do with the AGROVOC RDF file? The AGROVOC RDF file is a file containing the AGROVOC data in RDF-SKOS format. It is not a program to install but rather a file that is meant to be read by machines. You can load it into a triple store or an RDF editor/viewer, or parse it with an application. Once in your triple store, you can manipulate the data and use it in your application.

If you simply want to look at the AGROVOC content, you can use the online AGROVOC browsing tool or the web-based AGROVOC editing tool. These tools allow you to browse through the terms and concepts in AGROVOC and learn more about agriculture-related topics.

Example of RDF/XML from Agrovoc

<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    xmlns:agrovoc="http://aims.fao.org/aos/agrovoc/">

  <skos:Concept rdf:about="http://id.agrisemantics.org/gacs/C16010">
    <skos:prefLabel xml:lang="en">Agricultural machinery</skos:prefLabel>
    <skos:altLabel xml:lang="en">Agricultural equipment</skos:altLabel>
    <skos:altLabel xml:lang="en">Farming equipment</skos:altLabel>
    <skos:broader rdf:resource="http://id.agrisemantics.org/gacs/C161"/>
    <skos:narrower rdf:resource="http://id.agrisemantics.org/gacs/C160"/>
    <skos:inScheme rdf:resource="http://id.agrisemantics.org/gacs/"/>
    <agrovoc:note xml:lang="en">Includes all kinds of machines used in agricultural production, processing and handling of crops and livestock, as well as equipment for land preparation and maintenance.</agrovoc:note>
  </skos:Concept>

</rdf:RDF>

In this example, the RDF/XML represents a concept in the Agrovoc agricultural thesaurus, specifically the concept for “Agricultural machinery”. The rdf:RDF element is the root element of the RDF document, which contains the namespaces for RDF, SKOS, and Agrovoc.

The skos:Concept element represents the concept, and the rdf:about attribute specifies the URI for the concept. The skos:prefLabel element represents the preferred label for the concept, while the skos:altLabel elements represent alternative labels.

The skos:broader and skos:narrower elements represent broader and narrower relationships between this concept and other concepts in the Agrovoc thesaurus. The skos:inScheme element indicates that the concept belongs to the Agrovoc concept scheme.

Finally, the agrovoc:note element provides additional information about the concept, specifically a note about what kinds of machinery and equipment are included in the concept.
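If you want to read such a file with an application, a minimal sketch with Python's standard XML parser could look like this. It embeds a shortened copy of the example above and extracts the concept URI and its labels. For a full AGROVOC dump, a dedicated RDF library and a triple store would be the more robust choice; the function name `read_concepts` is just an illustrative helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the example concept shown above.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://id.agrisemantics.org/gacs/C16010">
    <skos:prefLabel xml:lang="en">Agricultural machinery</skos:prefLabel>
    <skos:altLabel xml:lang="en">Agricultural equipment</skos:altLabel>
    <skos:altLabel xml:lang="en">Farming equipment</skos:altLabel>
    <skos:broader rdf:resource="http://id.agrisemantics.org/gacs/C161"/>
  </skos:Concept>
</rdf:RDF>
"""


def read_concepts(rdf_xml):
    """Return one dictionary per skos:Concept with its URI and labels."""
    root = ET.fromstring(rdf_xml)
    concepts = []
    for concept in root.findall(f"{{{SKOS}}}Concept"):
        pref = concept.find(f"{{{SKOS}}}prefLabel")
        concepts.append({
            "uri": concept.get(f"{{{RDF}}}about"),
            "prefLabel": pref.text if pref is not None else None,
            "altLabels": [el.text for el in concept.findall(f"{{{SKOS}}}altLabel")],
            "broader": [el.get(f"{{{RDF}}}resource")
                        for el in concept.findall(f"{{{SKOS}}}broader")],
        })
    return concepts


concepts = read_concepts(RDF_XML)
for item in concepts:
    print(item["uri"], item["prefLabel"], item["altLabels"])
```

The alternative labels collected this way are what a search interface can use as synonyms for the preferred term.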

Download my book

by 2017-09-12

Unlocking the Power of Your Website: A Guide to Measuring Content Impact with Data Analytics.
I wrote a book about it for the United Nations. You can download it below.

A guide for UN and CGIAR Centers to evaluate the usage, usability and usefulness of their websites.

This guide provides essential insights into measuring the usage, usability, and usefulness of your website. Learn how to evaluate the impact of your site, tailor it to meet audience needs, and improve its functionality and user experience. With the help of web analytics, surveys, and other tools, you can evaluate the 3Us (usage, usability, and usefulness) to achieve your objectives and make a meaningful impact. Whether you are a communication specialist, information manager, or IT technical specialist, this guide is a must-read for anyone looking to enhance their website’s impact.

Measuring the 3Us (usage, usability and usefulness) of your website is key to making sure that you are meeting the objectives and impact you set out to achieve when you built your website. Knowing how many people visit your site, who they are and what they do while they are there (usage) will help you tailor your site to deliver, share or pull in the information or messages your audiences most need, in the way audiences want to receive and contribute to it. Knowing how easily visitors find what they are looking for and their perception of your site (usability) will help you improve its functionality and the user-experience—encouraging more use of your site. And knowing how well your site meets your visitors’ information needs (usefulness) will help you improve both your content and its organization to meet those needs.

This guide provides an introduction to measuring these 3Us together with suggestions about how you can use Web analytics, surveys and other instruments to improve the 3Us and ultimately the impact of your website. Examples provide a step-by-step guide to putting this knowledge into practice.
This document is targeted at the broad range of people who contribute to the CGIAR’s websites, including communication specialists, information managers, management and IT technical specialists.

This guide provides an introduction to measuring the impact of your website based on the 3Us—usage, usability and usefulness.
Each section presents a general discussion of how to measure each characteristic (‘the theory’) and examples of how to apply this knowledge (‘putting it into practice’), and suggests approaches and resources for further learning.

It must be stressed that using web analytics, usability testing and feedback surveys, to name the principal tools of these respective undertakings, is definitely a non-trivial pursuit. It is not something that can be immediately mastered simply by reading a short guide. Scores, if not hundreds, of books have been written on these subjects. Nevertheless, a journey of a thousand miles begins with a single step. Hence this brief introduction.
It also must be emphasized that none of these activities will be useful unless they occur in the context of what the owners of the website want the website to achieve, both in general terms and in terms of any specific aspects. As Jason Burby of the web business consultancy ZaZ Inc. said in a ClickZ article, the three reasons that web analytics most often fail companies are:

The goals for the analytics program are ill- or un-defined.
Information generated from analytics projects is not effectively shared within the UN organization.
Companies don’t take action based on the data they collect.

The key then is for the website to be purpose-driven. Only then will evaluating the impact be meaningful. This means maintaining a constant creative process involving both those who measure the impact of a website and those who possess the tools, access and authority to improve it.

Bias in statistics

by 2017-06-07

In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others.

Wikipedia

The word bias could mean:

In sampling: giving preference to selecting some individuals over others
In a statistic: certain responses are more likely to occur in the sample than in the population (a statistic is a numerical summary of a sample)

Sources of bias:

Sampling bias
Nonresponse bias
Response bias

Sampling bias means that the technique used to obtain the sample’s individuals tends to favor one part of the population over another. Sampling bias also results from undercoverage, which occurs when the proportion of one segment of the population is lower in a sample than it is in the population.

Nonresponse bias exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do.

Response bias exists when the answers on a survey do not reflect the true feelings of the respondent.

Nonsampling errors result from data-entry errors, undercoverage, nonresponse bias, and response bias. Such errors can also be present in a complete census of the population. Sampling error, by contrast, results from using a sample to estimate information about a population. This type of error occurs because a sample gives incomplete information about a population.
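The difference between sampling error and bias from undercoverage can be seen in a small simulation. This is only an illustrative sketch: the population, its two segments, the segment means, the sample sizes, and the fixed random seed are all invented for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: two equally sized segments with different typical values.
segment_a = [random.gauss(50, 5) for _ in range(5000)]
segment_b = [random.gauss(70, 5) for _ in range(5000)]
population = segment_a + segment_b
true_mean = statistics.mean(population)

# Simple random sample: every member is equally likely to be selected,
# so any gap to the true mean is sampling error only.
random_sample = random.sample(population, 500)
random_estimate = statistics.mean(random_sample)

# Undercovered sample: segment B makes up 50% of the population but only
# 10% of the sample, so the estimate is pulled toward segment A.
biased_sample = random.sample(segment_a, 450) + random.sample(segment_b, 50)
biased_estimate = statistics.mean(biased_sample)

random_error = abs(random_estimate - true_mean)
biased_error = abs(biased_estimate - true_mean)

print(f"true mean:            {true_mean:.2f}")
print(f"random sample mean:   {random_estimate:.2f} (error {random_error:.2f})")
print(f"undercovered mean:    {biased_estimate:.2f} (error {biased_error:.2f})")
```

Drawing a larger random sample shrinks the sampling error, but enlarging the undercovered sample while keeping the same 90/10 split does not remove its bias.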

Data Management Platforms (DMP)

by 2017-06-07

A data management platform (DMP) is a complex piece of software used to collect, store, classify, analyze, and distribute large quantities of data. It is a cornerstone technology for larger organizations when it comes to advertising data management, and its adoption is increasing rapidly. Data collection is a core capability of every data management platform. Being able to pull data from various disparate sources into one place can unlock enormous value. Typically, DMPs can ingest first-party data, second-party data from contracted partners, and third-party data from external providers. What differentiates DMPs is the range of data sources and integrations available out of the box, how data collection is implemented, and the speed of data transfer. The best DMPs have a large number of reliable (ideally lossless) and fast data integrations with other technology and data vendors. In addition, they are easy to implement and offer customization options.

