SDDK

A Python library for managing data files on ScienceData

An SDDK session is initiated with

cloudSession(provider, shared_folder_name, owner, group_folder_name, user_name)

All of the arguments are optional. For details, see

https://github.com/sdam-au/sddk_py

Writing and reading objects to your ScienceData home directory

First, import the library and configure an endpoint. The endpoint "sciencedata" is mapped by the pod to the private IP address of your ScienceData home silo, so you can access it without a username or password.

In [1]:
from sddk import cloudSession

s = cloudSession("sciencedata")
endpoint variable has been configured to: https://sciencedata/files/
In [2]:
from IPython.display import Image, display

Create some objects.

In [3]:
# Python str object
string_object = "string content"
# Python list object
list_object = ['a', 'b', 'c', 'd']
# Python dict object
dict_object = {"a": 1, "b": 2, "c": 3}
# pandas DataFrame object
import pandas as pd
dataframe_object = pd.DataFrame([("a1", "b1", "c1"), ("a2", "b2", "c2")], columns=["a", "b", "c"])
# Matplotlib figure object
import matplotlib.pyplot as plt
figure_object = plt.figure()  # create an empty figure
plt.plot(range(10))  # draw a line plot on the current figure
Out[3]:
[<matplotlib.lines.Line2D at 0x7f345deaf490>]

Write the objects to a directory named tmp on ScienceData. If tmp does not exist, create it first or substitute the name of an existing directory.

In [5]:
s.write_file("tmp/test_string.txt", string_object)
s.write_file("tmp/test_list.json", list_object)
s.write_file("tmp/test_dict.json", dict_object)
s.write_file("tmp/test_dataframe.json", dataframe_object)
s.write_file("tmp/test_figure.png", figure_object)
Your <class 'str'> object has been succesfully written as "https://sciencedata/files/tmp/test_string.txt"
Your <class 'list'> object has been succesfully written as "https://sciencedata/files/tmp/test_list.json"
Your <class 'dict'> object has been succesfully written as "https://sciencedata/files/tmp/test_dict.json"
Your <class 'pandas.core.frame.DataFrame'> object has been succesfully written as "https://sciencedata/files/tmp/test_dataframe.json"
Your <class 'matplotlib.figure.Figure'> object has been succesfully written as "https://sciencedata/files/tmp/test_figure.png"
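
The file extensions above pair each object type with a storage format: plain text for the string, JSON for the list and the dictionary. As a rough illustration of that pairing, and not a description of how sddk works internally, the sketch below picks a text representation from the object's type using only the standard library. The helper name to_text is hypothetical.

```python
import json

def to_text(obj):
    """Hypothetical helper: choose a text representation from the object's type."""
    if isinstance(obj, str):
        return obj                  # plain text, kept as-is
    if isinstance(obj, (list, dict)):
        return json.dumps(obj)      # JSON text
    raise TypeError(f"no text format chosen for {type(obj).__name__}")

list_text = to_text(['a', 'b', 'c', 'd'])
dict_text = to_text({"a": 1, "b": 2, "c": 3})

# JSON text parses back to an equal Python object
assert json.loads(list_text) == ['a', 'b', 'c', 'd']
assert json.loads(dict_text) == {"a": 1, "b": 2, "c": 3}
```

DataFrames and figures need their own formats (JSON and PNG in the cell above), which this sketch deliberately leaves out.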

Read back the objects from ScienceData.

In [4]:
string_object = s.read_file("tmp/test_string.txt", "str")

string_object
Out[4]:
'string content'
In [5]:
dataframe_object = s.read_file("tmp/test_dataframe.json")

dataframe_object
Out[5]:
a b c
0 a1 b1 c1
1 a2 b2 c2
In [6]:
figure_object = s.read_file("tmp/test_figure.png")

display(Image(figure_object))

The issue of speed

Using the hostname sciencedata without providing a username and password is convenient and secure: traffic to and from ScienceData travels over a private, trusted network. Unfortunately, it can also be slow. The transfer itself is not slow, but authentication is, because your username and access rights have to be inferred from the IP address of your pod, which requires a few lookups on every request. Some caching is done, but in general the process is rather heavy.

You can instead provide your username and password. This speeds things up considerably, but be careful not to write the password into your notebook. Instead, pass only your username to sddk and you will be prompted for your password.

In [7]:
ss = cloudSession("sciencedata", None, None, None, "some_user")
Your ScienceData password: ········
endpoint variable has been configured to: https://sciencedata/files/
In [8]:
ss.read_file("tmp/test_string.txt", "str")
Out[8]:
'string content'

Yup - much faster.
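
To put a number on the difference, one could time the same read through both sessions. The sketch below is a minimal timing helper built on time.perf_counter. The names mean_seconds and fake_request are hypothetical, and fake_request is only a stand-in so that the example runs without a ScienceData connection.

```python
import time

def mean_seconds(fn, repeats=5):
    """Return the mean wall-clock time of calling fn() `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Stand-in workload (an assumption, not a real request).
def fake_request():
    time.sleep(0.01)

elapsed = mean_seconds(fake_request, repeats=3)
print(f"mean time per call: {elapsed:.3f} s")
```

In the notebook, replacing fake_request with lambda: s.read_file("tmp/test_string.txt", "str") and then with the same call on ss would give two figures to compare. Because some caching is done, the first call in each session may not be representative.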

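Writing and reading objects to a folder shared with you

Pass the name of the shared folder and the username of its owner as the second and third arguments. Here the folder is delemappe and its owner is test3.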

In [10]:
sss = cloudSession("sciencedata", "delemappe", "test3", None, "some_user")
Your ScienceData password: ········
connection with shared folder established with you as its ordinary user
endpoint variable has been configured to: https://silo3.sciencedata.dk/sharingout/test3/delemappe/
In [11]:
sss.write_file("test_string.txt", string_object)
Your <class 'str'> object has been succesfully written as "https://silo3.sciencedata.dk/sharingout/test3/delemappe/test_string.txt"
In [12]:
sss.read_file("test_string.txt", "str")
Out[12]:
'string content'
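
The endpoint printed for the shared folder appears to follow the pattern https://<host>/sharingout/<owner>/<folder>/. The sketch below rebuilds that URL from its parts. It is inferred from the single output shown above, not from the sddk source, and the helper name shared_endpoint is hypothetical. How sddk determines the silo host is not covered here.

```python
from urllib.parse import quote

def shared_endpoint(host, owner, folder):
    """Hypothetical helper: build https://<host>/sharingout/<owner>/<folder>/."""
    # quote() percent-encodes characters that are not safe in a URL path
    return f"https://{host}/sharingout/{quote(owner)}/{quote(folder)}/"

url = shared_endpoint("silo3.sciencedata.dk", "test3", "delemappe")
print(url)
```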