JSON and CSV Formats in Python: Reading, Writing, and Data Analysis

We've already covered the basics of working with files in Python, including text files. Now we're moving on to structured data: formats that organize data so that programs can reliably store and exchange it. 🧩

We'll focus on two of the most popular formats:

  • JSON (JavaScript Object Notation) — a data exchange format widely used in web applications and APIs
  • CSV (Comma-Separated Values) — a simple format for representing tabular data

Both formats are used throughout programming, from web development to data analysis.

JSON: JavaScript Object Notation

JSON (JavaScript Object Notation) is a text-based data exchange format whose objects and arrays closely resemble Python dictionaries and lists. It is easy for both humans and machines to read.

JSON Data Structure

JSON supports the following data types:

  • Objects (dictionaries): {"name": "Alice", "age": 30}
  • Arrays (lists): [1, 2, 3, 4]
  • Strings: "Hello, world!"
  • Numbers: 42, 3.14
  • Boolean values: true, false
  • null (corresponds to None in Python)
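When json.loads parses text, each JSON type becomes a Python type: objects become dict, arrays become list, strings become str, numbers become int or float, true/false become True/False, and null becomes None. A quick check on a small, made-up JSON document:

```python
import json

# A made-up JSON document that uses every JSON type listed above
text = '{"name": "Alice", "age": 30, "height": 1.68, "admin": false, "tags": ["a", "b"], "manager": null}'

data = json.loads(text)  # the top-level JSON object becomes a dict

# Show which Python type each JSON value was converted to
for key, value in data.items():
    print(f"{key}: {value!r} -> {type(value).__name__}")
```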

The json Module in Python

Python provides a built-in json module for working with this format:

Python 3.13
import json

# Simple Python dictionary
person = {
    "name": "Anna",
    "age": 28,
    "city": "Moscow",
    "languages": ["Python", "JavaScript"]
}

# Converting a Python dictionary to a JSON string
json_string = json.dumps(person, ensure_ascii=False, indent=2)
print("JSON string:")
print(json_string)
JSON string:
{
  "name": "Anna",
  "age": 28,
  "city": "Moscow",
  "languages": [
    "Python",
    "JavaScript"
  ]
}
# Converting a JSON string back to a Python object
parsed_data = json.loads(json_string)
print("\nConverted back to Python:")
print(f"Name: {parsed_data['name']}")
print(f"Age: {parsed_data['age']}")
print(f"Programming languages: {', '.join(parsed_data['languages'])}")
Converted back to Python:
Name: Anna
Age: 28
Programming languages: Python, JavaScript
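A round trip through dumps and loads does not always return an identical object. JSON object keys are always strings and JSON has no tuple type, so integer keys come back as strings and tuples come back as lists; types with no JSON equivalent at all, such as set, make dumps raise TypeError. A small illustration with made-up data:

```python
import json

original = {1: "one", "point": (3, 4)}

# Serialize to a JSON string and parse it again
restored = json.loads(json.dumps(original))
print(restored)              # {'1': 'one', 'point': [3, 4]}
print(restored == original)  # False

# A set has no JSON counterpart, so dumps refuses it
try:
    json.dumps({"ids": {1, 2}})
    set_error = None
except TypeError as error:
    set_error = error
    print(f"Cannot serialize: {error}")
```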

Writing JSON to a File and Reading from a File

Here's how to save data to a JSON file and then read it:

Python 3.13
import json

# Student data
students = [
    {"id": 1, "name": "Ivan", "scores": [85, 90, 78]},
    {"id": 2, "name": "Maria", "scores": [92, 88, 95]}
]

# Writing to a file
with open('students.json', 'w', encoding='utf-8') as file:
    json.dump(students, file, ensure_ascii=False, indent=2)
    print("Data written to students.json file")
Data written to students.json file
# Reading from a file
with open('students.json', 'r', encoding='utf-8') as file:
    loaded_students = json.load(file)
    print(f"\nLoaded {len(loaded_students)} students:")

    for student in loaded_students:
        avg_score = sum(student['scores']) / len(student['scores'])
        print(f"  {student['name']}: average score {avg_score:.1f}")
Loaded 2 students:
  Ivan: average score 84.3
  Maria: average score 91.7
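When a file is not valid JSON, json.load raises json.JSONDecodeError (a subclass of ValueError), and open raises FileNotFoundError when the file does not exist. A minimal sketch of guarding against both; load_json_or_default is a hypothetical helper name, not part of the json module:

```python
import json
import os
import tempfile

def load_json_or_default(path, default):
    """Hypothetical helper: parsed JSON from `path`, or `default` if the file is missing or invalid."""
    try:
        with open(path, 'r', encoding='utf-8') as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"{path}: file not found")
    except json.JSONDecodeError as error:
        # lineno and colno point at the position where parsing failed
        print(f"{path}: invalid JSON at line {error.lineno}, column {error.colno}")
    return default

# Work in a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as folder:
    good_path = os.path.join(folder, 'good.json')
    with open(good_path, 'w', encoding='utf-8') as file:
        json.dump([{"id": 1, "name": "Ivan"}], file)

    broken_path = os.path.join(folder, 'broken.json')
    with open(broken_path, 'w', encoding='utf-8') as file:
        file.write('[{"id": 1, "name": "Ivan",}]')  # trailing comma: not valid JSON

    result_good = load_json_or_default(good_path, default=[])
    result_broken = load_json_or_default(broken_path, default=[])
    result_missing = load_json_or_default(os.path.join(folder, 'missing.json'), default=[])
```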

Main Methods of the json Module

Method                  Description
json.dumps(obj)         Converts a Python object to a JSON string
json.loads(str)         Converts a JSON string to a Python object
json.dump(obj, file)    Writes a Python object to a file as JSON
json.load(file)         Reads JSON from a file into a Python object

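The ensure_ascii=False parameter writes non-ASCII characters (such as Cyrillic or accented letters) as-is instead of escaping them as \uXXXX sequences, and indent=2 makes the output more readable by placing each element on its own line with two-space indentation.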
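The effect of both parameters is easiest to see side by side. A short sketch with a made-up one-key dictionary:

```python
import json

word = {"city": "Zürich"}

escaped = json.dumps(word)                       # default ensure_ascii=True escapes non-ASCII
readable = json.dumps(word, ensure_ascii=False)  # keeps "ü" as-is
pretty = json.dumps(word, ensure_ascii=False, indent=2)

print(escaped)   # {"city": "Z\u00fcrich"}
print(readable)  # {"city": "Zürich"}
print(pretty)
```

Both escaped and readable parse back to the same dictionary; ensure_ascii only changes how the text looks.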

CSV: Comma-Separated Values

CSV (Comma-Separated Values) is a simple text format for representing tabular data: each line of the file is a table row, and the columns within a row are separated by commas (or another delimiter).

CSV looks something like this:

Name,Age,City
Anna,28,Moscow
Ivan,35,Saint Petersburg
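csv.reader accepts any file-like object, so the sample above can be parsed straight from a string by wrapping it in io.StringIO. A quick sketch:

```python
import csv
import io

# The CSV sample above, held in a string instead of a file
text = """Name,Age,City
Anna,28,Moscow
Ivan,35,Saint Petersburg
"""

# io.StringIO makes the string behave like an open text file
rows = list(csv.reader(io.StringIO(text)))
print(rows)
# [['Name', 'Age', 'City'], ['Anna', '28', 'Moscow'], ['Ivan', '35', 'Saint Petersburg']]
```

Note that every value, including the ages, comes back as a string.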

The csv Module in Python

Python provides a built-in csv module for working with this format:

Python 3.13
import csv

# Data to write
data = [
    ['Name', 'Age', 'City'],  # Headers
    ['Anna', '28', 'Moscow'],
    ['Ivan', '35', 'Saint Petersburg'],
    ['Maria', '22', 'Kazan']
]

# Writing to a CSV file
with open('people.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(data)
    print("Data written to people.csv file")
Data written to people.csv file
# Reading from a CSV file
with open('people.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file)

    # Reading headers (first line)
    headers = next(reader)
    print(f"\nHeaders: {headers}")

    # Reading data
    print("\nData:")
    for row in reader:
        print(f"  {row[0]}, {row[1]} years old, city {row[2]}")
Headers: ['Name', 'Age', 'City']

Data:
  Anna, 28 years old, city Moscow
  Ivan, 35 years old, city Saint Petersburg
  Maria, 22 years old, city Kazan
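Why newline='' in open? csv.writer ends every row with "\r\n" by default (its lineterminator), and the csv module documentation asks for files to be opened with newline='' so Python's text layer does not translate those endings again (otherwise an extra "\r" can appear on Windows). Writing to an in-memory io.StringIO shows the raw text the writer produces:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows([['Name', 'Age'], ['Anna', '28']])

raw = buffer.getvalue()
print(repr(raw))  # 'Name,Age\r\nAnna,28\r\n'

# Reading back: all values are strings, so numbers must be converted explicitly
reader = csv.reader(io.StringIO(raw))
header = next(reader)
ages = [int(row[1]) for row in reader]
```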

Using DictReader and DictWriter

For more convenient access to columns, use csv.DictReader and csv.DictWriter, which represent each row as a dictionary keyed by the column headers:

Python 3.13
import csv

# Writing to CSV using DictWriter
data = [
    {'Name': 'Alex', 'Profession': 'Engineer', 'Salary': 85000},
    {'Name': 'Kate', 'Profession': 'Designer', 'Salary': 75000},
    {'Name': 'Sergey', 'Profession': 'Programmer', 'Salary': 110000}
]

with open('employees.csv', 'w', newline='', encoding='utf-8') as file:
    fieldnames = ['Name', 'Profession', 'Salary']
    writer = csv.DictWriter(file, fieldnames=fieldnames)

    writer.writeheader()  # Writing headers
    writer.writerows(data)  # Writing data
    print("Employee data written to file")
Employee data written to file
# Reading from CSV using DictReader
with open('employees.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.DictReader(file)

    print("\nEmployees:")
    for row in reader:
        print(f"  {row['Name']} - {row['Profession']}, salary: {row['Salary']} units")
Employees:
  Alex - Engineer, salary: 85000 units
  Kate - Designer, salary: 75000 units
  Sergey - Programmer, salary: 110000 units
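Although DictWriter was given integer salaries, DictReader hands every value back as a string, so numeric work needs an explicit conversion. A sketch using the same employee rows, kept in memory so it runs on its own:

```python
import csv
import io

# Same contents as employees.csv above
text = "Name,Profession,Salary\nAlex,Engineer,85000\nKate,Designer,75000\nSergey,Programmer,110000\n"

rows = list(csv.DictReader(io.StringIO(text)))

# Comparing the raw strings compares text, not numbers
wrong = max(row['Salary'] for row in rows)  # '85000', because '8' > '1'

# Convert to int before doing arithmetic or comparisons
total = sum(int(row['Salary']) for row in rows)
top = max(rows, key=lambda row: int(row['Salary']))

print(f"Total payroll: {total} units")   # Total payroll: 270000 units
print(f"Highest salary: {top['Name']}")  # Highest salary: Sergey
```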

Main Features of Working with CSV

  1. Delimiters: Although CSV stands for "Comma-Separated Values", in practice other delimiters (semicolon, tab) can be used
  2. Quotes: If a value contains the delimiter, a quote character, or a line break, it is enclosed in quotes
  3. Escaping: A quote character inside a quoted value is escaped by doubling it ("") by default

Python 3.13
import csv

# Example with a different delimiter
data = [
    ['Product', 'Price', 'In Stock'],
    ['Laptop', '45000', 'Yes'],
    ['Smartphone', '25000', 'No']
]

# Writing using semicolon
with open('products.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file, delimiter=';')
    writer.writerows(data)
    print("Data written with delimiter ';'")
Data written with delimiter ';'
# Reading with the correct delimiter
with open('products.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file, delimiter=';')
    for row in reader:
        print('  '.join(row))
Product  Price  In Stock
Laptop  45000  Yes
Smartphone  25000  No
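The example above covers custom delimiters. Quoting and escaping happen automatically with the default settings (csv.QUOTE_MINIMAL, doublequote=True), as this sketch with made-up values shows:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
# One value contains quotes, another contains the delimiter
writer.writerow(['Book "Python"', 'Sale, 20% off', '1200'])

line = buffer.getvalue()
print(line)  # "Book ""Python""","Sale, 20% off",1200

# The reader removes the quoting and restores the original values
row = next(csv.reader(io.StringIO(line)))
print(row)   # ['Book "Python"', 'Sale, 20% off', '1200']
```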

Practical Example: Sales Data Analysis

Let's consider an example where we first save sales data in CSV, then analyze it and save the results in JSON:

Python 3.13
import csv
import json

# Creating sales data
sales = [
    ['Date', 'Product', 'Category', 'Price', 'Quantity'],
    ['2023-01-05', 'HP Laptop', 'Electronics', '45000', '2'],
    ['2023-01-10', 'Apple Smartphone', 'Electronics', '85000', '3'],
    ['2023-01-15', 'Book "Python"', 'Books', '1200', '5'],
    ['2023-02-10', 'Microwave', 'Home Appliances', '7000', '1']
]

# Step 1: Save data to CSV
with open('sales.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(sales)
    print("Sales data saved to CSV")
Sales data saved to CSV
# Step 2: Read and analyze data
with open('sales.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file)
    headers = next(reader)  # Skip headers

    # Preparing variables for analysis
    total_revenue = 0
    sales_by_category = {}

    # Data analysis
    for row in reader:
        date, product, category, price, quantity = row
        revenue = float(price) * int(quantity)

        # Total revenue
        total_revenue += revenue

        # Revenue by category
        if category in sales_by_category:
            sales_by_category[category] += revenue
        else:
            sales_by_category[category] = revenue

    # Output analysis results
    print(f"\nTotal revenue: {total_revenue} units")
    print("\nRevenue by category:")
    for category, rev in sales_by_category.items():
        print(f"  {category}: {rev} units")
Total revenue: 358000.0 units

Revenue by category:
  Electronics: 345000.0 units
  Books: 6000.0 units
  Home Appliances: 7000.0 units
# Step 3: Save analysis results to JSON
results = {
    "total_revenue": total_revenue,
    "sales_by_category": sales_by_category
}

with open('sales_analysis.json', 'w', encoding='utf-8') as file:
    json.dump(results, file, ensure_ascii=False, indent=2)
    print("\nAnalysis results saved to JSON")
Analysis results saved to JSON
# Step 4: Check saved JSON
with open('sales_analysis.json', 'r', encoding='utf-8') as file:
    saved_results = json.load(file)
    print("\nContents of the JSON file with results:")
    print(json.dumps(saved_results, ensure_ascii=False, indent=2))
Contents of the JSON file with results:
{
  "total_revenue": 358000.0,
  "sales_by_category": {
    "Electronics": 345000.0,
    "Books": 6000.0,
    "Home Appliances": 7000.0
  }
}

In this example we:

  1. Created a CSV file with sales data
  2. Read the data and calculated the total revenue and the revenue by category
  3. Saved the analysis results to a JSON file
  4. Read the saved JSON to ensure correctness

Understanding Check

Which code correctly reads data from a JSON file in Python?