Pandas Data Processing Example

I wanted to test whether I could turn an HTML file generated with Jupyter into a blog post.
In [1]:
import os
os.listdir()
Out[1]:
['.ipynb_checkpoints',
 'supermarkets-commas.txt',
 'supermarkets-semi-colons.txt',
 'supermarkets.csv',
 'supermarkets.json',
 'supermarkets.xlsx',
 'Untitled.ipynb']
In [3]:
import pandas
In [32]:
df1=pandas.read_csv("supermarkets.csv")
df1.set_index("ID")
Out[32]:
    Address          City           State             Country  Name         Employees
ID
1   3666 21st St     San Francisco  CA 94114          USA      Madeira      8
2   735 Dolores St   San Francisco  CA 94119          USA      Bready Shop  15
3   332 Hill St      San Francisco  California 94114  USA      Super River  25
4   3995 23rd St     San Francisco  CA 94114          USA      Ben's Shop   10
5   1056 Sanchez St  San Francisco  California        USA      Sanchez      12
6   551 Alvarado St  San Francisco  CA 94114          USA      Richvalley   20
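The set_index step above can probably be folded into the read itself with the index_col parameter of read_csv. The sketch below uses a two-row inline sample (an assumption that mirrors the columns of supermarkets.csv) so that it runs without the file.

```python
import io
import pandas

# Inline sample standing in for supermarkets.csv (two rows, for illustration only).
csv_text = """ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
"""

# index_col makes "ID" the index while reading.
df_indexed = pandas.read_csv(io.StringIO(csv_text), index_col="ID")

# set_index returns a new DataFrame; df_plain keeps its default 0..n-1 index.
df_plain = pandas.read_csv(io.StringIO(csv_text))
df_via_set_index = df_plain.set_index("ID")

print(df_indexed.equals(df_via_set_index))
print("ID" in df_plain.columns)
```

Both routes give the same table. Note that df_plain still has ID as an ordinary column, because set_index does not modify the DataFrame it is called on.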
In [27]:
df2=pandas.read_json("supermarkets.json")
df2.set_index("ID")
Out[27]:
    Address          City           Country  Employees  Name         State
ID
1   3666 21st St     San Francisco  USA      8          Madeira      CA 94114
2   735 Dolores St   San Francisco  USA      15         Bready Shop  CA 94119
3   332 Hill St      San Francisco  USA      25         Super River  California 94114
4   3995 23rd St     San Francisco  USA      10         Ben's Shop   CA 94114
5   1056 Sanchez St  San Francisco  USA      12         Sanchez      California
6   551 Alvarado St  San Francisco  USA      20         Richvalley   CA 94114
In [33]:
df3=pandas.read_excel("supermarkets.xlsx",sheet_name=0)
df3
Out[33]:
   ID  Address          City           State             Country  Supermarket Name  Number of Employees
0  1   3666 21st St     San Francisco  CA 94114          USA      Madeira           8
1  2   735 Dolores St   San Francisco  CA 94119          USA      Bready Shop       15
2  3   332 Hill St      San Francisco  California 94114  USA      Super River       25
3  4   3995 23rd St     San Francisco  CA 94114          USA      Ben's Shop        10
4  5   1056 Sanchez St  San Francisco  California        USA      Sanchez           12
5  6   551 Alvarado St  San Francisco  CA 94114          USA      Richvalley        20
In [45]:
df4=pandas.read_csv("supermarkets-commas.txt") # saved just like a CSV file
df4
Out[45]:
   ID  Address          City           State             Country  Name         Employees
0  1   3666 21st St     San Francisco  CA 94114          USA      Madeira      8
1  2   735 Dolores St   San Francisco  CA 94119          USA      Bready Shop  15
2  3   332 Hill St      San Francisco  California 94114  USA      Super River  25
3  4   3995 23rd St     San Francisco  CA 94114          USA      Ben's Shop   10
4  5   1056 Sanchez St  San Francisco  California        USA      Sanchez      12
5  6   551 Alvarado St  San Francisco  CA 94114          USA      Richvalley   20
In [48]:
df5=pandas.read_csv("supermarkets-semi-colons.txt",sep=';') # saved like a CSV file, but with semicolons as separators
df5
Out[48]:
   ID  Address          City           State             Country  Name         Employees
0  1   3666 21st St     San Francisco  CA 94114          USA      Madeira      8
1  2   735 Dolores St   San Francisco  CA 94119          USA      Bready Shop  15
2  3   332 Hill St      San Francisco  California 94114  USA      Super River  25
3  4   3995 23rd St     San Francisco  CA 94114          USA      Ben's Shop   10
4  5   1056 Sanchez St  San Francisco  California        USA      Sanchez      12
5  6   551 Alvarado St  San Francisco  CA 94114          USA      Richvalley   20
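To see what the sep parameter changes, here is a small sketch with inline text instead of the two .txt files. The sample rows are an assumption made up for the illustration.

```python
import io
import pandas

# The same small table written once with commas and once with semicolons.
comma_text = "ID,Name,Employees\n1,Madeira,8\n2,Bready Shop,15\n"
semicolon_text = comma_text.replace(",", ";")

df_commas = pandas.read_csv(io.StringIO(comma_text))  # sep defaults to ","
df_semicolons = pandas.read_csv(io.StringIO(semicolon_text), sep=";")

# Without sep=";" each line ends up in one single column.
df_wrong = pandas.read_csv(io.StringIO(semicolon_text))

print(df_commas.equals(df_semicolons))
print(df_wrong.shape)
```

If a file loads as a single wide column, a mismatched separator is a likely cause.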
In [56]:
# A trailing ? shows the help text; I wanted to look at the sep parameter.
# The ? must be the last thing on the line: a comment after it raises "SyntaxError: invalid syntax".
pandas.read_csv?
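The ? shortcut only exists in IPython and Jupyter. As a rough alternative that also works in a plain Python script, the standard inspect module can show a single parameter such as sep, and the built-in help() prints the whole docstring.

```python
import inspect
import pandas

# inspect.signature works in plain Python as well as in IPython/Jupyter.
signature = inspect.signature(pandas.read_csv)
sep_parameter = signature.parameters["sep"]
print(sep_parameter)

# help() prints the full docstring; it is commented out here because
# the output is very long.
# help(pandas.read_csv)
```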
In [66]:
df7=pandas.read_json("https://pythonhow.com/supermarkets.json") # read directly from the web
df7
Out[66]:
   Address          City           Country  Employees  ID  Name         State
0  3666 21st St     San Francisco  USA      8          1   Madeira      CA 94114
1  735 Dolores St   San Francisco  USA      15         2   Bready Shop  CA 94119
2  332 Hill St      San Francisco  USA      25         3   Super River  California 94114
3  3995 23rd St     San Francisco  USA      10         4   Ben's Shop   CA 94114
4  1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California
5  551 Alvarado St  San Francisco  USA      20         6   Richvalley   CA 94114
In [67]:
df7.set_index("Address")
Out[67]:
                 City           Country  Employees  ID  Name         State
Address
3666 21st St     San Francisco  USA      8          1   Madeira      CA 94114
735 Dolores St   San Francisco  USA      15         2   Bready Shop  CA 94119
332 Hill St      San Francisco  USA      25         3   Super River  California 94114
3995 23rd St     San Francisco  USA      10         4   Ben's Shop   CA 94114
1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California
551 Alvarado St  San Francisco  USA      20         6   Richvalley   CA 94114
In [68]:
df7=df7.set_index("Address") # set_index returns a new DataFrame, so reassign to keep the change
In [70]:
df7.set_index?
In [72]:
df7.loc["735 Dolores St","Country"]
Out[72]:
'USA'
In [71]:
df7.loc["735 Dolores St":"332 Hill St","Country":"ID"]
Out[71]:
                 Country  Employees  ID
Address
735 Dolores St   USA      15         2
332 Hill St      USA      25         3
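One detail worth noticing in the .loc slice above: label slices include the end label, for both rows and columns. A minimal sketch on a hand-built three-row table (the table is an assumption that mimics df7):

```python
import pandas

df = pandas.DataFrame(
    {
        "City": ["San Francisco"] * 3,
        "Country": ["USA"] * 3,
        "Employees": [8, 15, 25],
        "ID": [1, 2, 3],
    },
    index=pandas.Index(
        ["3666 21st St", "735 Dolores St", "332 Hill St"], name="Address"
    ),
)

# Both "332 Hill St" and "ID" are part of the result.
subset = df.loc["735 Dolores St":"332 Hill St", "Country":"ID"]

print(subset.shape)
print(list(subset.columns))
```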
In [77]:
list(df7.loc[:,"Country"])
Out[77]:
['USA', 'USA', 'USA', 'USA', 'USA', 'USA']
In [80]:
df7.iloc[2,1:3+1] # the end of a positional slice is excluded, so write one past the last position you want
Out[80]:
Country      USA
Employees     25
ID             3
Name: 332 Hill St, dtype: object
In [81]:
df7.iloc[:,1:3+1] # the end of a positional slice is excluded, so write one past the last position you want
Out[81]:
                 Country  Employees  ID
Address
3666 21st St     USA      8          1
735 Dolores St   USA      15         2
332 Hill St      USA      25         3
3995 23rd St     USA      10         4
1056 Sanchez St  USA      12         5
551 Alvarado St  USA      20         6
In [83]:
df7.iloc[1:3+1,1:3+1] # the end of a positional slice is excluded, so write one past the last position you want
Out[83]:
                 Country  Employees  ID
Address
735 Dolores St   USA      15         2
332 Hill St      USA      25         3
3995 23rd St     USA      10         4
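The "one past the end" rule for .iloc is the usual Python slicing rule, whereas .loc includes its end label. The following sketch puts the two side by side on a small made-up table (the single-letter index labels are an assumption chosen for brevity).

```python
import pandas

df = pandas.DataFrame(
    {"Country": ["USA"] * 4, "Employees": [8, 15, 25, 10], "ID": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

# Positions 1, 2, 3 and columns 0, 1: the end positions 4 and 2 are excluded.
by_position = df.iloc[1:4, 0:2]

# The same block by label: the end labels "d" and "Employees" are included.
by_label = df.loc["b":"d", "Country":"Employees"]

print(by_position.equals(by_label))
```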
In [85]:
df7.ix[3,"Name"] # Works here, but .ix is deprecated (and was removed in pandas 1.0).
C:\IntelPython3\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.
Out[85]:
"Ben's Shop"
In [87]:
df7
Out[87]:
                 City           Country  Employees  ID  Name         State
Address
3666 21st St     San Francisco  USA      8          1   Madeira      CA 94114
735 Dolores St   San Francisco  USA      15         2   Bready Shop  CA 94119
332 Hill St      San Francisco  USA      25         3   Super River  California 94114
3995 23rd St     San Francisco  USA      10         4   Ben's Shop   CA 94114
1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California
551 Alvarado St  San Francisco  USA      20         6   Richvalley   CA 94114
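Because .ix is deprecated, the earlier lookup that mixed a row position with a column label (df7.ix[3,"Name"]) needs a replacement. Two possible rewrites are sketched below on a made-up four-row table; which one reads better is a matter of taste.

```python
import pandas

df = pandas.DataFrame(
    {
        "Name": ["Madeira", "Bready Shop", "Super River", "Ben's Shop"],
        "Employees": [8, 15, 25, 10],
    },
    index=pandas.Index(
        ["3666 21st St", "735 Dolores St", "332 Hill St", "3995 23rd St"],
        name="Address",
    ),
)

# Option 1: turn the row position into a label, then use .loc.
name_via_loc = df.loc[df.index[3], "Name"]

# Option 2: turn the column label into a position, then use .iloc.
name_via_iloc = df.iloc[3, df.columns.get_loc("Name")]

print(name_via_loc, name_via_iloc)
```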
In [96]:
# Drop a column (feature) from the displayed table.
df7.drop("City",axis=1) # the second parameter is axis
# axis=0 means rows, axis=1 means columns.
Out[96]:
                 Country  Employees  ID  Name         State
Address
3666 21st St     USA      8          1   Madeira      CA 94114
735 Dolores St   USA      15         2   Bready Shop  CA 94119
332 Hill St      USA      25         3   Super River  California 94114
3995 23rd St     USA      10         4   Ben's Shop   CA 94114
1056 Sanchez St  USA      12         5   Sanchez      California
551 Alvarado St  USA      20         6   Richvalley   CA 94114
In [98]:
df7.drop("332 Hill St",axis=0) # pass axis by keyword; recent pandas versions reject it as a positional argument
# axis=0 means rows, axis=1 means columns.
Out[98]:
                 City           Country  Employees  ID  Name         State
Address
3666 21st St     San Francisco  USA      8          1   Madeira      CA 94114
735 Dolores St   San Francisco  USA      15         2   Bready Shop  CA 94119
3995 23rd St     San Francisco  USA      10         4   Ben's Shop   CA 94114
1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California
551 Alvarado St  San Francisco  USA      20         6   Richvalley   CA 94114
In [99]:
df7.drop?
In [91]:
df7.drop(df7.index[0:3],axis=0) # drop the first three rows
# axis=0 means rows, axis=1 means columns.
Out[91]:
                 City           Country  Employees  ID  Name         State
Address
3995 23rd St     San Francisco  USA      10         4   Ben's Shop   CA 94114
1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California
551 Alvarado St  San Francisco  USA      20         6   Richvalley   CA 94114
In [100]:
df7.drop(df7.columns[0:3],axis=1) # drop the first three columns
# axis=0 means rows, axis=1 means columns.
Out[100]:
                 ID  Name         State
Address
3666 21st St     1   Madeira      CA 94114
735 Dolores St   2   Bready Shop  CA 94119
332 Hill St      3   Super River  California 94114
3995 23rd St     4   Ben's Shop   CA 94114
1056 Sanchez St  5   Sanchez      California
551 Alvarado St  6   Richvalley   CA 94114
In [107]:
df7.drop(columns=["ID","State"])
# If you do not specify axis: use the columns keyword to drop columns; to drop rows, pass the labels without any keyword.
Out[107]:
                 City           Country  Employees  Name
Address
3666 21st St     San Francisco  USA      8          Madeira
735 Dolores St   San Francisco  USA      15         Bready Shop
332 Hill St      San Francisco  USA      25         Super River
3995 23rd St     San Francisco  USA      10         Ben's Shop
1056 Sanchez St  San Francisco  USA      12         Sanchez
551 Alvarado St  San Francisco  USA      20         Richvalley
In [109]:
df7.drop(["3995 23rd St","551 Alvarado St"])
Out[109]:
                 City           Country  Employees  ID  Name         State
Address
3666 21st St     San Francisco  USA      8          1   Madeira      CA 94114
735 Dolores St   San Francisco  USA      15         2   Bready Shop  CA 94119
332 Hill St      San Francisco  USA      25         3   Super River  California 94114
1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California
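None of the drop calls above changed df7: each one returned a new table, which is why the next cell always shows the full data again. A small sketch of the keyword forms and of this copy behaviour, using a made-up three-row table:

```python
import pandas

df = pandas.DataFrame(
    {
        "City": ["San Francisco"] * 3,
        "ID": [1, 2, 3],
        "Name": ["Madeira", "Bready Shop", "Super River"],
    },
    index=pandas.Index(
        ["3666 21st St", "735 Dolores St", "332 Hill St"], name="Address"
    ),
)

without_city = df.drop(columns=["City"])      # same as df.drop("City", axis=1)
without_row = df.drop(index=["332 Hill St"])  # same as df.drop("332 Hill St", axis=0)

# df itself is unchanged; reassign (df = df.drop(...)) to keep a result.
print(df.shape, without_city.shape, without_row.shape)
```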
In [121]:
print(df7.index) # names of all rows
print("length: "+str( len(df7.index)) ) # number of rows
print(df7.columns) # names of all columns
print("length: "+str( len(df7.columns)) ) # number of columns
print("shape(size in matlab): ",df7.shape)
Index(['3666 21st St', '735 Dolores St', '332 Hill St', '3995 23rd St',
       '1056 Sanchez St', '551 Alvarado St'],
      dtype='object', name='Address')
length: 6
Index(['City', 'Country', 'Employees', 'ID', 'Name', 'State', 'Continent'], dtype='object')
length: 7
shape(size in matlab):  (6, 7)
In [122]:
# "Length of values does not match length of index": you must supply one value per row.
#df7["Continent"]=["North Amerika"] # the column does not exist yet and you want to add it; a one-item list raises the error above
df7["Continent"]=["North Amerika"]*df7.shape[0] # for now, I repeat the value once per row
df7 # the resulting table has a "Continent" column
# Let's take this a bit further.
Out[122]:
                 City           Country  Employees  ID  Name         State             Continent
Address
3666 21st St     San Francisco  USA      8          1   Madeira      CA 94114          North Amerika
735 Dolores St   San Francisco  USA      15         2   Bready Shop  CA 94119          North Amerika
332 Hill St      San Francisco  USA      25         3   Super River  California 94114  North Amerika
3995 23rd St     San Francisco  USA      10         4   Ben's Shop   CA 94114          North Amerika
1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California        North Amerika
551 Alvarado St  San Francisco  USA      20         6   Richvalley   CA 94114          North Amerika
In [125]:
df7["Continent"]=df7["Country"]+","+"North Amerika"
df7
Out[125]:
                 City           Country  Employees  ID  Name         State             Continent
Address
3666 21st St     San Francisco  USA      8          1   Madeira      CA 94114          USA,North Amerika
735 Dolores St   San Francisco  USA      15         2   Bready Shop  CA 94119          USA,North Amerika
332 Hill St      San Francisco  USA      25         3   Super River  California 94114  USA,North Amerika
3995 23rd St     San Francisco  USA      10         4   Ben's Shop   CA 94114          USA,North Amerika
1056 Sanchez St  San Francisco  USA      12         5   Sanchez      California        USA,North Amerika
551 Alvarado St  San Francisco  USA      20         6   Richvalley   CA 94114          USA,North Amerika
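Repeating a list once per row works, but assigning a single value does the same job, since pandas broadcasts a scalar to every row. A minimal sketch (the three-row table and the column name Label are assumptions made up for the example):

```python
import pandas

df = pandas.DataFrame({"Country": ["USA", "USA", "USA"], "Employees": [8, 15, 25]})

# A scalar is broadcast to every row, so no list multiplication is needed.
df["Continent"] = "North America"

# String concatenation on columns works element by element.
df["Label"] = df["Country"] + "," + df["Continent"]

print(df["Label"].tolist())
```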
In [128]:
# Transpose: turns rows into columns and columns into rows.
df7_T=df7.T
df7_T
Out[128]:
Address    3666 21st St       735 Dolores St     332 Hill St        3995 23rd St       1056 Sanchez St    551 Alvarado St
City       San Francisco      San Francisco      San Francisco      San Francisco      San Francisco      San Francisco
Country    USA                USA                USA                USA                USA                USA
Employees  8                  15                 25                 10                 12                 20
ID         1                  2                  3                  4                  5                  6
Name       Madeira            Bready Shop        Super River        Ben's Shop         Sanchez            Richvalley
State      CA 94114           CA 94119           California 94114   CA 94114           California         CA 94114
Continent  USA,North Amerika  USA,North Amerika  USA,North Amerika  USA,North Amerika  USA,North Amerika  USA,North Amerika
In [143]:
# Transposing can make it easier to add a row.
df7_T["My Address"]=["Mersin","My Country",10,7,"My Shop","My State","My Continent"]
# Assignment overwrites an existing row or column; if it does not exist, it is added.
df7_T["3666 21st St"]=["İstanbul","Your Country",10,7,"Your Shop","Your State","Your Continent"]
df7_T
Out[143]:
Address    3666 21st St       735 Dolores St     332 Hill St        3995 23rd St       1056 Sanchez St    551 Alvarado St    My Address
City       İstanbul           San Francisco      San Francisco      San Francisco      San Francisco      San Francisco      Mersin
Country    Your Country       USA                USA                USA                USA                USA                My Country
Employees  10                 15                 25                 10                 12                 20                 10
ID         7                  2                  3                  4                  5                  6                  7
Name       Your Shop          Bready Shop        Super River        Ben's Shop         Sanchez            Richvalley         My Shop
State      Your State         CA 94119           California 94114   CA 94114           California         CA 94114           My State
Continent  Your Continent     USA,North Amerika  USA,North Amerika  USA,North Amerika  USA,North Amerika  USA,North Amerika  My Continent
In [149]:
df7=df7_T.T # transposing again completes the row insertion
df7
Out[149]:
                 City           Country       Employees  ID  Name         State             Continent
Address
3666 21st St     İstanbul       Your Country  10         7   Your Shop    Your State        Your Continent
735 Dolores St   San Francisco  USA           15         2   Bready Shop  CA 94119          USA,North Amerika
332 Hill St      San Francisco  USA           25         3   Super River  California 94114  USA,North Amerika
3995 23rd St     San Francisco  USA           10         4   Ben's Shop   CA 94114          USA,North Amerika
1056 Sanchez St  San Francisco  USA           12         5   Sanchez      California        USA,North Amerika
551 Alvarado St  San Francisco  USA           20         6   Richvalley   CA 94114          USA,North Amerika
My Address       Mersin         My Country    10         7   My Shop      My State          My Continent
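The transpose round trip works, but it has a side effect: transposing a table with mixed column types stores everything as generic objects, so numeric columns such as Employees lose their numeric dtype. Assigning to a new label with .loc is a possible shortcut that adds the row directly. The sketch below compares both routes on a made-up two-row table.

```python
import pandas

df = pandas.DataFrame(
    {"City": ["San Francisco", "San Francisco"], "Employees": [8, 15]},
    index=pandas.Index(["3666 21st St", "735 Dolores St"], name="Address"),
)

# Route 1: assign to a new label with .loc, which appends the row.
direct = df.copy()
direct.loc["My Address"] = ["Mersin", 10]

# Route 2: the transpose round trip used in the cells above.
transposed = df.T
transposed["My Address"] = ["Mersin", 10]
round_trip = transposed.T

# After the round trip every column has dtype object;
# infer_objects() asks pandas to detect better dtypes again.
restored = round_trip.infer_objects()

print(round_trip["Employees"].dtype)
print(restored["Employees"].dtype)
print(direct.loc["My Address", "City"])
```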
In [150]:
print(type(df7))
<class 'pandas.core.frame.DataFrame'>
