This tutorial illustrates how to save Python pandas dataframes into one Excel worksheet. You can save them column-wise, that is side by side, or row-wise, that is downwards, one dataframe after the other.
import sqlalchemy
import pyodbc
import numpy as np  # needed for np.sum in the pivot tables below
import pandas as pd
from pandas import DataFrame
from bokeh.plotting import figure, output_file, show
df = DataFrame()  # empty placeholder in case the database connection below fails
## connect to SqlServer Database and get information.
try:
    import urllib.parse  # on Python 2 this was just urllib and urllib.quote_plus
    params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};SERVER=.\MSSQLSERVER_ENT;DATABASE=Antrak;Trusted_Connection=yes;")
    engine = sqlalchemy.create_engine('mssql+pyodbc:///?odbc_connect=%s' % params)
    resoverall = engine.execute("SELECT top 10 * FROM FlightBookingsMain")
    df = DataFrame(resoverall.fetchall())
    df.columns = resoverall.keys()
except (RuntimeError, TypeError, NameError) as e:
    print('Error in connecting')
    print(e)
else:
    print("connected")
# let's look at the head of the dataframe
df.head()
# let's define the bokeh figure, and declare the x axis as type datetime
p = figure(width=800, height=350, x_axis_type="datetime")
# let's plot both a circle and a line graph
p.circle(df['Booking Date'], df['AFP'], size=4, color='darkgrey', alpha=0.2, legend='AFP')
p.line(df['Booking Date'], df['Total Cost'], color='navy', legend='Total Cost')
# show the results
#output_file('linegraph.html')
show(p)
# bokeh server plotting
#sample server plotting code from bokeh
#from bokeh.plotting import figure, show, output_server
#p = figure(title="Server Plot")
#p.circle([1, 2, 3], [4, 5, 6])
#output_server("hover")
#show(p)
I came across this brilliant pivot table analysis by pbpython and decided to play around with it on this data. You can see the full code and explanation on his website: http://pbpython.com/pandas-pivot-table-explained.html. I have added just a little bit of extra exploration; the key and major analysis are from pbpython, so do check him out.
For convenience's sake, let's define the Sex column as a category and set the order in which we want to view it.
This isn’t strictly required but helps us keep the order we want as we work through analyzing the data.
df['Sex'] = df['Sex'].astype('category')
df['Sex'].cat.set_categories(['m','f'], inplace=True)
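As a quick sanity check (my own addition, not part of the original analysis), sorting by Sex now follows the category order we just set rather than plain alphabetical order:
df.sort_values(by='Sex').head()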
The simplest pivot table needs just a dataframe and an index. In this case we will use Last Name as our index, but first let's convert Booking Date to a proper datetime and derive year and month columns from it.
df['Booking Date'] = pd.to_datetime(df['Booking Date'])
#get year from date
df['year'] = df['Booking Date'].apply(lambda x: x.year)
#get month from date
df['month'] = df['Booking Date'].apply(lambda x: x.month)
df.head()
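As an aside, the same two columns can be derived without the lambdas by using the .dt accessor on the datetime column:
df['year'] = df['Booking Date'].dt.year
df['month'] = df['Booking Date'].dt.month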
#get all data less than a particular date
df[df['Booking Date']<'2014-10-28']
year_plot = pd.pivot_table(df, index=['year'])
year_plot
p.line(df['year'], df['Total Cost'], color='darkgrey', alpha=0.2, legend='year plot')
output_file('linegraph.html')
show(p)
pv_result = pd.pivot_table(df, index=['Last Name'])
type(pv_result)
pv_result
You can have multiple indexes as well. In fact, most of the pivot_table args can take multiple values via a list.
pd.pivot_table(df, index=['Last Name','Booking Date','Departure Date'])
For this purpose, the Flight Booking ID, Insurance and Postcode columns aren't really useful. Let's remove them by explicitly defining the columns we care about using the values field.
# let's sum Total Cost, AFP and Age per Last Name
sum_all = pd.pivot_table(df, index=['Last Name'], values=['Total Cost','AFP','Age'], aggfunc=np.sum)
sum_all
The value columns are averaged by default, but we can do a count or a sum instead. Switching is simple using aggfunc and np.sum.
ttlcost = pd.pivot_table(df, index=['Last Name'], values=['Total Cost'], aggfunc=np.sum)
ttlcost
aggfunc can take a list of functions. Let's combine np.sum with len to get both a total and a count.
len_sum = pd.pivot_table(df, index=['Last Name'], values=['Total Cost'], aggfunc =[np.sum,len])
len_sum
I think one of the confusing points with pivot_table is the use of columns and values. Remember, columns are optional: they provide an additional way to segment the actual values you care about. The aggregation functions are applied to the values you list.
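For instance, the columns argument can segment Total Cost per Last Name by the year column we created earlier (a quick sketch using the dataframe built above):
pd.pivot_table(df, index=['Last Name'], values=['Total Cost'], columns=['year'], aggfunc=np.sum)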
# function to put dataframes in the same worksheet
def multiple_dfs(df_list, sheets, file_name, spaces):
    writer = pd.ExcelWriter(file_name, engine='xlsxwriter')
    row = 0
    col = 0
    for dataframe in df_list:
        # write the dataframes side by side, shifting the start column right each time
        dataframe.to_excel(writer, sheet_name=sheets, startrow=0, startcol=col)
        # to put the dataframes row-wise instead, use:
        # dataframe.to_excel(writer, sheet_name=sheets, startrow=row, startcol=0)
        # row = row + len(dataframe.index) + spaces + 1
        col = col + len(dataframe.columns) + spaces + 1
    writer.save()
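For reference, here is the row-wise variant hinted at in the commented lines above, expanded into a complete function (a sketch, not from the original post): each dataframe is written below the previous one by advancing startrow instead of startcol.
def multiple_dfs_rowwise(df_list, sheets, file_name, spaces):
    # sketch of a row-wise writer; the function name is my own
    writer = pd.ExcelWriter(file_name, engine='xlsxwriter')
    row = 0
    for dataframe in df_list:
        # write each dataframe below the previous one
        dataframe.to_excel(writer, sheet_name=sheets, startrow=row, startcol=0)
        # move down past the header row, the data rows and a few spacer rows
        row = row + len(dataframe.index) + spaces + 1
    writer.save()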
# function to put dataframes on the same worksheet, 2 frames per line
# note: this redefines multiple_dfs above; the call below uses this 2-per-line version
def multiple_dfs(df_list, sheets, file_name, spaces):
    writer = pd.ExcelWriter(file_name, engine='xlsxwriter')
    row = 0
    col = 0
    rowtostart = 0
    increase = 0
    for dataframe in df_list:
        # after every 2 dataframes, go back to column 0 and down to the next block of rows
        if increase % 2 == 0:
            row = rowtostart
            col = 0
            rowtostart = rowtostart - 4
        dataframe.to_excel(writer, sheet_name=sheets, startrow=row, startcol=col)
        # shift right for the next frame on this line and keep the running row counter updated
        col = col + len(dataframe.columns) + spaces + 1
        rowtostart = rowtostart + len(dataframe.index) + spaces + 1
        increase = increase + 1
    writer.save()
# list of dataframes
#original function credit to TomDobbs on StackOverflow
dfs = [ttlcost,len_sum,sum_all, pv_result, year_plot]
multiple_dfs(dfs, 'Validation', 'test1.xlsx', 1)