Data analysis for beginners

When you build a website it is easy to get caught up in details that seem important.

You can spend days tweaking logos and color schemes looking for that perfect solution. It’s fun and there is a visible and tangible change and result.

But is it time well spent?

How do you know where to invest your time? And will time spent help you compete more effectively?

In corporate life we often get caught in the H.I.P.P.O syndrome, where the Highest Paid Person’s Opinion matters most. But that opinion is often still feelings, emotions and experience speaking. Not facts.

As a solopreneur you only have yourself to rely on.

How can you possibly know where to invest your time for the best return?

Data analysis to help make better decisions

Data is important and all decisions should be based on facts rather than feelings, emotions and hunches.

You collect data, for example through observation, and then report and analyze it.

Using images, tables, graphs and data analysis tools you visualize data to create a better understanding.

Today we will look at different types of data and how we can work with them.

The article will help you build an understanding of central concepts, tools and methods.

And this understanding will help you apply information in future articles and make smarter decisions.

5 quick facts about data

  1. Data can simply be defined as facts, figures and information and is often numerical.
  2. Data can be defined as Qualitative (why) or Quantitative (how many).
  3. Raw data is information that has not been processed or cleaned for context or suitability.
  4. Raw data is processed to remove so-called outliers. Outliers can, for example, be errors that would skew the results, such as a customer entering their year of birth as 1079 rather than 1979.
  5. Data is often processed in several steps. The result of one processing step can then be used as raw data for the next step.
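
The cleaning step in fact 4 can be sketched in a few lines of Python. This is only a minimal illustration: the list of birth years is made up, and the limits for what counts as a plausible year (1900 up to the current year) are assumptions you would adjust to your own data.

```python
from datetime import date

# Hypothetical raw data: birth years as entered by customers.
raw_birth_years = [1979, 1985, 1079, 1992, 2001]

# Assumed plausibility limits; these are illustrative, not fixed rules.
EARLIEST_YEAR = 1900
LATEST_YEAR = date.today().year


def split_outliers(years):
    """Return (clean, outliers) based on the assumed plausible range."""
    clean = [y for y in years if EARLIEST_YEAR <= y <= LATEST_YEAR]
    outliers = [y for y in years if not EARLIEST_YEAR <= y <= LATEST_YEAR]
    return clean, outliers


clean_years, outlier_years = split_outliers(raw_birth_years)
print(clean_years)    # plausible values, ready for the next processing step
print(outlier_years)  # values to review or correct, here the 1079 entry
```

Keeping the outliers in a separate list rather than silently dropping them lets you check whether they are typos that can be corrected.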

Taking a closer look at Qualitative data

Qualitative data is non-numerical data that describes qualities or characteristics.

Qualitative data can be collected and registered via different methods such as

  • Interviews (one-to-one)
  • Focus groups (structured interviewer led group discussion)
  • Public records (using existing data)
  • Observation
  • Longitudinal study (data collected at intervals from same respondents over a longer period of time)
  • Case studies (in depth analysis using both qualitative and quantitative data)

Qualitative data is also referred to as categorical data as the data can be divided into groups or categories based on its qualities or attributes.

Examples of qualitative data are hair color, feelings and emotions. 

Advantages of qualitative data

  • Provides depth and nuances
    Records attitudes, sentiments and feelings.
  • Dynamic
    As respondents are encouraged to offer details we can learn about new areas to be analysed.
  • Detailed records
    Simulates individual experiences in detail, painting a more complete picture of why someone acts a certain way and their emotions as they interact and take action.

Disadvantages of qualitative data

  • Smaller samples
    It is costly and time consuming to process qualitative data which is why we have to work with smaller data samples.
  • Smaller truths
    Smaller samples make it harder and less reliable to use results as a representation of a larger population.
  • Hard to compare
    Answers can be very different and detailed, and it is therefore difficult to draw systematic conclusions.
  • Risk of bias
    Skill, frame of reference and shared experiences between interviewer and respondent are just a couple of examples of factors that can sway results.

How to analyse qualitative data

There are two main methods or approaches to analyzing qualitative data

  • Deductive approach (from general to specific)
  • Inductive approach (from specific to general)

Deductive approach (from general to specific)

The deductive approach is based on analyzing qualitative data using a preset structure decided by the user analyzing the data.

This approach works well when the user analyzing the data has a fair and valid understanding of the data received from the respondents.
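
As a rough sketch of what a preset structure can look like in practice, the Python example below tags interview answers with categories decided in advance. The categories, the keywords and the answers are all hypothetical, and simple keyword matching is only one assumed way of applying such a structure.

```python
# Hypothetical preset structure: each category with assumed keywords.
CODEBOOK = {
    "price": ["expensive", "cheap", "price"],
    "design": ["logo", "color", "layout"],
    "support": ["help", "support", "answer"],
}

# Hypothetical interview answers.
responses = [
    "I love the layout but the price is too high",
    "Nobody could help me when I had a question",
]


def code_response(text):
    """Return the sorted preset categories whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        category
        for category, keywords in CODEBOOK.items()
        if any(keyword in lowered for keyword in keywords)
    )


for response in responses:
    print(code_response(response), "-", response)
```

Answers that match no category are a useful signal: they suggest the preset structure is missing something, which is where the inductive approach comes in.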

Inductive approach (from specific to general)

The inductive approach follows no preset rules or frameworks and is a lot more time consuming but also more thorough.

The inductive approach is used when the user analysing the data has little or no understanding of the possible outcomes.

The difference between the two approaches

The deductive approach starts with a theory whereas the inductive method is focused on forming a theory.

The deductive approach looks something like this:
Theory  > Hypothesis > Observation > Conclusion

The Inductive approach looks quite different:
Observation > Pattern > Hypothesis > Theory

Examples of the deductive and inductive approach

Deductive:
All dogs are mortal  > Sparky is a dog > Sparky is mortal

Inductive:  
Bob always leaves at 6AM > Bob is always on time > Bob will always be on time if he leaves at 6AM

Taking a closer look at Quantitative data

Quantitative data is numerical data where each data point has a numerical value.

We use quantitative data in mathematical calculations and statistical surveys.

There are several different methods to collect, register and analyze quantitative data:

  • Forms and surveys
  • Correlation analysis (measures the strength of the linear relationship between two variables, that is, how closely changes in one variable go together with changes in the other)
  • Causal analysis or studies (look for patterns of cause and effect between two variables)
  • Experiments (often used in the natural sciences, where a theory not yet proven is tested)

The difference between correlation and causation

It is important to understand the difference between correlation and causation.

They may seem similar but are actually quite different.

Causation means that A causes B: if A happens, B follows as a result. Period.

Correlation shows that there is a relationship between A and B. But it does not mean that A causes B.
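
To make this concrete, here is a minimal Python sketch that calculates the Pearson correlation coefficient, a common measure of linear correlation that ranges from -1 to 1. The weekly figures are invented for illustration. Ice cream sales and sunburn cases rise together in this made-up data, but neither causes the other; sunny weather is the likely driver of both.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical weekly figures.
ice_cream_sales = [20, 35, 50, 65, 80]
sunburn_cases = [2, 4, 5, 7, 9]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(round(r, 3))  # close to 1: a strong correlation, but no proof of causation
```

A value near 1 or -1 tells you the two variables move together. To claim causation you need more, for example a controlled experiment.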

Quantitative data can be discrete or continuous

Discrete data can be counted whereas continuous data has to be measured. 

Examples of discrete data could be the number of visitors to a city per year or the number of cars per household in an area.

Discrete data will only take on certain values. A household would not own 1.75 cars.

Continuous data is measured on a scale and examples could be the temperature in Miami or your level of cholesterol as measured by your doctor.

Discrete data is visualized by plotting the data in a diagram to show each count or instance.

[Figure: Discrete data is plotted in diagrams]

Continuous data can, on the other hand, display change over time in a line graph where each reading is plotted and the readings are linked together.

[Figure: Continuous data is presented in line graphs to show change over time]

Advantages of quantitative data

  • Larger samples
    Makes results more reliable and more suitable for systematic conclusions
  • Anonymous
    Hard to identify any one data source
  • Reliable
    Results and data can be measured, processed and analyzed by others to verify results.

Disadvantages of quantitative data

  • Lack of social or soft attributes 
    Will not take emotions or motive into account. No answer to questions like why or how.
  • Handling errors 
    Large data sets introduce a greater risk of errors and mistakes.

How to measure quantitative and qualitative data

Nominal scale – name (qualitative)

Used for qualitative data, where no quantitative value is assigned to the variable. Calculations make no sense as variables are simply organized into groups or categories without any specific order.

Examples of groups can be nationality, gender and profession.

The value of the variables can only be assigned as words like American or female.

When we use the nominal scale we measure the number of times a particular variable is represented. 
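
Counting how often each category appears is about the only calculation that makes sense on the nominal scale. A minimal Python sketch, using a made-up list of nationalities:

```python
from collections import Counter

# Hypothetical nominal data: one nationality per respondent.
nationalities = ["American", "Swedish", "American", "German", "Swedish", "American"]

# Count how many times each category is represented.
counts = Counter(nationalities)
most_common_nationality = counts.most_common(1)[0][0]

print(counts)
print(most_common_nationality)  # the most frequent category (the mode)
```

Note that there is no meaningful average here: you cannot add "American" and "Swedish" and divide by two.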

Ordinal scale – name and order (qualitative)

The ordinal scale measures the order of the variables.

The scale simply shows the order without assigning any numerical value to any particular variable. 
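
With ordinal data you can sort values and find the middle one, but the distance between levels is unknown. The Python sketch below uses hypothetical membership levels. The rank numbers exist only to help with sorting and say nothing about how far apart the levels are.

```python
# Hypothetical ordered categories, from lowest to highest.
LEVELS = ["bronze", "silver", "gold"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Hypothetical ordinal data: one membership level per customer.
memberships = ["gold", "bronze", "silver", "bronze", "gold"]

# Sort by position in the defined order, not alphabetically.
ordered = sorted(memberships, key=lambda level: RANK[level])

# With an odd number of values, the median is the middle one.
median_level = ordered[len(ordered) // 2]

print(ordered)
print(median_level)
```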

Interval scale – name, order, constant spacing between variables (quantitative)

Variables are assigned a numerical value and there is a fixed interval between the variables. 

There is no fixed true zero and there can or could be values below zero. 

Time is a value on the interval scale. You cannot measure when time started because there is no meaningful zero point.

The interval scale is often used to measure level of satisfaction or likelihood. It is often used in question-type surveys where options are given a numerical value to show level of agreement with the statement.

Example: How happy are you with your overall experience? (1=not happy, 5=extremely happy)

Ratio scale – name, order, constant spacing between variables and absolute zero (quantitative)

The ratio scale measures data that have a meaningful zero point. Ratio scale is ideal for measuring age, weight and height but also sales and number of customers.

Duration is an example of the ratio scale as there is a meaningful point zero when something starts. 
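
The practical difference between the interval and the ratio scale is whether ratios such as "twice as much" make sense. The Python sketch below illustrates this with temperature in degrees Celsius, a common textbook example of an interval scale that is not taken from this article, and with duration. All numbers are made up.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a temperature scale that has a true zero."""
    return celsius + 273.15


# Interval scale: differences are meaningful, ratios are not.
morning_c, noon_c = 10.0, 20.0
difference = noon_c - morning_c   # 10 degrees warmer: meaningful
naive_ratio = noon_c / morning_c  # 2.0, but noon is not "twice as hot"
kelvin_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)

# Ratio scale: zero means "nothing", so ratios are meaningful.
short_visit_min, long_visit_min = 10.0, 20.0
duration_ratio = long_visit_min / short_visit_min  # twice as long: meaningful

print(difference, naive_ratio, round(kelvin_ratio, 3), duration_ratio)
```

Measured from a true zero, the temperature rose by only a few percent, while the longer visit really did last twice as long.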

Primary data

Primary data is data that is collected by a researcher for a specific purpose.

Advantages

  • Flexible 
    As the researcher is collecting the data, adjustments can be made as needed
  • Quality
    Generally of better quality as data is collected by the researcher
  • Access to additional data
    Possible to gather more data as needed

Disadvantages

  • Researcher bias as researcher makes all decisions
  • Time consuming

Secondary data

Secondary data is data collected by someone for a specific purpose but used by someone else for another purpose.

Advantages

  • Cheaper
  • Researcher not responsible for quality

Disadvantages

  • Additional data not possible to gather if more data is needed

What is a population

A population is simply a group of objects that share one or several attributes that we want to analyse.

Examples of a population could be:

  • NBA players (individuals)
  • Windshield wipers (objects)
  • Traffic accidents (occurrences)

Using samples for a representative result

Studying the entire population is often too expensive, difficult and time consuming, and sometimes ethically questionable.

Instead we work with samples large enough to represent the entire population while maintaining anonymity.

There are several different approaches to sampling. The most popular are:

Simple Random Sampling

All objects in the population have the same probability of making the sample.
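
A minimal Python sketch of simple random sampling, using the standard library's random.sample. The population of 500 player names and the sample size of 5 are made up, and the fixed seed is only there to make the example repeatable.

```python
import random

# Hypothetical population of 500 individuals.
population = [f"player_{number}" for number in range(1, 501)]

# A seeded generator makes the example repeatable; drop the seed in real use.
rng = random.Random(42)

# Every individual has the same chance of being picked, and nobody is picked twice.
simple_sample = rng.sample(population, 5)

print(simple_sample)
```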

Systematic Random Sampling

Individuals are selected from a given list or directory at a fixed interval, starting from a randomly chosen position. This way all individuals have the same probability of making the sample.

For example, you could decide to select every 30th individual listed in a directory.
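
The interval example can be sketched in Python as follows. The directory of 300 names is made up, and picking a random starting position within the first interval is one common way to do this, assumed here for illustration.

```python
import random


def systematic_sample(items, interval, rng):
    """Pick a random start within the first interval, then every interval-th item."""
    start = rng.randrange(interval)
    return items[start::interval]


# Hypothetical directory of 300 individuals.
directory = [f"person_{number:03d}" for number in range(300)]

# A seeded generator makes the example repeatable; drop the seed in real use.
systematic = systematic_sample(directory, 30, random.Random(7))

print(len(systematic))  # 300 entries with an interval of 30 gives 10 individuals
print(systematic)
```

Be careful if the list itself has a repeating pattern that matches the interval, as the sample could then be skewed.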

Stratified Random Sampling

Population is divided into subgroups or strata. The sample is then generated at random from the strata.

A sports club has 1000 supporters. 700 are men and the other 300 are women.

To receive a result that reflects gender distribution, you divide the supporters into two strata based on their gender.

From the 700 men you randomly select 70 supporters. From the other stratum of 300 women you randomly select 30 supporters.

You now have a sample of 100 that reflects the gender distribution in the supporter club.
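
The supporter club example can be sketched in Python like this. The supporter names are placeholders, and drawing the same fraction (10 percent) from each stratum is the assumption that makes the sample mirror the gender distribution.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Randomly draw the same fraction from every stratum."""
    sample = {}
    for name, members in strata.items():
        size = round(len(members) * fraction)
        sample[name] = rng.sample(members, size)
    return sample


# The club from the example: 700 men and 300 women (placeholder names).
strata = {
    "men": [f"man_{number}" for number in range(700)],
    "women": [f"woman_{number}" for number in range(300)],
}

# A seeded generator makes the example repeatable; drop the seed in real use.
club_sample = stratified_sample(strata, 0.10, random.Random(1))

print(len(club_sample["men"]), len(club_sample["women"]))  # 70 and 30
```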

Frequently asked questions