globtastic('.', skip_folder_re='^[_.]', folder_re='core', file_glob='*.*py*', file_re='c')
(#5) ['./fastcore/docments.py','./fastcore/dispatch.py','./fastcore/basics.py','./fastcore/docscrape.py','./fastcore/script.py']
Utilities (other than extensions to pathlib.Path) for dealing with IO.
walk (path:pathlib.Path|str, symlinks:bool=True, keep_file:callable=<function ret_true>, keep_folder:callable=<function ret_true>, skip_folder:callable=<function ret_false>, func:callable=<function join>, ret_folders:bool=False)
Generator version of os.walk, using functions to filter files and folders
Parameter | Type | Default | Details
---|---|---|---
path | pathlib.Path \| str | | path to start searching
symlinks | bool | True | follow symlinks?
keep_file | callable | ret_true | function that returns True for wanted files
keep_folder | callable | ret_true | function that returns True for folders to enter
skip_folder | callable | ret_false | function that returns True for folders to skip
func | callable | join | function to apply to each matched file
ret_folders | bool | False | return folders, not just files
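As a rough illustration of the filtering idea (not fastcore's implementation), the sketch below builds a lazy, filtered walk on top of os.walk using only the standard library. The name walk_sketch and the (root, name) signature of its two predicates are assumptions made for this example.

```python
import os
import tempfile

def walk_sketch(path, keep_file=lambda root, name: True,
                skip_folder=lambda root, name: False):
    """Hypothetical stand-in for a filtered walk: lazily yield wanted file paths."""
    for root, dirs, files in os.walk(path):
        # Pruning `dirs` in place stops os.walk from descending into skipped folders
        dirs[:] = [d for d in dirs if not skip_folder(root, d)]
        for name in files:
            if keep_file(root, name):
                yield os.path.join(root, name)

with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, 'pkg'))
    os.makedirs(os.path.join(d, '.hidden'))
    for rel in ('pkg/a.py', 'pkg/b.txt', '.hidden/c.py'):
        with open(os.path.join(d, *rel.split('/')), 'w') as f: f.write('x')
    # Keep only .py files and never enter dot-folders
    found = sorted(os.path.relpath(p, d) for p in walk_sketch(
        d,
        keep_file=lambda root, name: name.endswith('.py'),
        skip_folder=lambda root, name: name.startswith('.')))
```

Because the function is a generator, nothing is read from disk until the result is iterated, which is why the sketch consumes it inside the temporary-directory block.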
globtastic (path:pathlib.Path|str, recursive:bool=True, symlinks:bool=True, file_glob:str=None, file_re:str=None, folder_re:str=None, skip_file_glob:str=None, skip_file_re:str=None, skip_folder_re:str=None, func:callable=<function join>, ret_folders:bool=False)
A more powerful glob, including regex matches, symlink handling, and skip parameters
Parameter | Type | Default | Details
---|---|---|---
path | pathlib.Path \| str | | path to start searching
recursive | bool | True | search subfolders
symlinks | bool | True | follow symlinks?
file_glob | str | None | Only include files matching glob
file_re | str | None | Only include files matching regex
folder_re | str | None | Only enter folders matching regex
skip_file_glob | str | None | Skip files matching glob
skip_file_re | str | None | Skip files matching regex
skip_folder_re | str | None | Skip folders matching regex
func | callable | join | function to apply to each matched file
ret_folders | bool | False | return folders, not just files
Returns | L | | Paths to matched files
(#5) ['./fastcore/docments.py','./fastcore/dispatch.py','./fastcore/basics.py','./fastcore/docscrape.py','./fastcore/script.py']
maybe_open (f, mode='r', **kwargs)
Context manager: open f if it is a path (and close on exit)
This is useful for functions where you want to accept a path or file. maybe_open will not close your file handle if you pass one in.
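The behaviour described above can be pictured with a small standard-library sketch. This is not fastcore's code: maybe_open_sketch and first_line are hypothetical names, and the path test via str and os.PathLike is an assumption about what counts as "a path".

```python
import io
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def maybe_open_sketch(f, mode='r', **kwargs):
    """Hypothetical illustration: open `f` only when it is a path."""
    if isinstance(f, (str, os.PathLike)):
        with open(f, mode, **kwargs) as fh:
            yield fh          # we opened it, so it is closed on exit
    else:
        yield f               # caller's handle: left open for the caller

def first_line(f):
    # Works the same for a path or an already-open file object
    with maybe_open_sketch(f) as fh:
        return fh.readline().strip()

handle = io.StringIO('hello\nworld\n')
from_handle = first_line(handle)

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / 'demo.txt'
    p.write_text('hello\nworld\n')
    from_path = first_line(p)
```

After the call, handle is still open, so the caller can keep reading from it.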
For example, we can use this to reimplement imghdr.what from the Python standard library, which is written in Python 3.9 as:
Here’s an example of the use of this function:
With maybe_open, Self, and L.map_first, we can rewrite this in a much more concise and (in our opinion) clear way:
…and we can check that it still works:
…along with the version passing a file handle:
…along with the h parameter version:
mkdir (path, exist_ok=False, parents=False, overwrite=False, **kwargs)
Creates and returns a directory defined by path, optionally removing previous existing directory if overwrite is True
with tempfile.TemporaryDirectory() as d:
    path = Path(os.path.join(d, 'new_dir'))
    new_dir = mkdir(path)
    assert new_dir.exists()
    test_eq(new_dir, path)

    # test overwrite
    with open(new_dir/'test.txt', 'w') as f: f.writelines('test')
    test_eq(len(list(walk(new_dir))), 1) # assert file is present
    new_dir = mkdir(new_dir, overwrite=True)
    test_eq(len(list(walk(new_dir))), 0) # assert file was deleted
image_size (fn)
Tuple of (w,h) for png, gif, or jpg; None otherwise
bunzip (fn)
bunzip fn, raising exception if output already exists
loads (s, **kw)
Same as json.loads, but handles None
loads_multi (s:str)
Generator of >=0 decoded json dicts, possibly with non-json ignored text at start and end
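One way such a generator could work, sketched with the standard library only, is to scan for opening braces and let json.JSONDecoder.raw_decode report where each object ends. The function name iter_json_dicts is hypothetical and this is not necessarily how loads_multi is implemented.

```python
import json

def iter_json_dicts(s):
    """Hypothetical sketch: yield each JSON object found in `s`, skipping other text."""
    decoder = json.JSONDecoder()
    pos = s.find('{')
    while pos != -1:
        try:
            # raw_decode returns the object and the index just past it
            obj, end = decoder.raw_decode(s, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace: try the next one
            pos = s.find('{', pos + 1)
            continue
        yield obj
        pos = s.find('{', end)

text = 'log prefix {"a": 1} noise {"b": [2, 3]} trailing'
decoded = list(iter_json_dicts(text))
```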
dumps (obj, **kw)
Same as json.dumps, but uses ujson if available
untar_dir (fname, dest, rename=False, overwrite=False)
untar file into dest, creating a directory if the root contains more than one item
If the contents of fname contain just one file or directory, it is placed directly in dest:
If rename is True, the directory created is named after the archive, without its extension:
If the contents of fname contain multiple files and directories, a new folder in dest is created with the same name as fname (but without extension):
repo_details (url)
Tuple of owner,name from ssh or https git repo url
test_eq(repo_details('https://github.com/fastai/fastai.git'), ['fastai', 'fastai'])
test_eq(repo_details('git@github.com:fastai/nbdev.git\n'), ['fastai', 'nbdev'])
run (cmd, *rest, same_in_win=False, ignore_ex=False, as_bytes=False, stderr=False)
Pass cmd (splitting with shlex if string) to subprocess.run; return stdout; raise IOError if fails
You can pass a string (which will be split based on standard shell rules), a list, or pass args directly:
run('echo', same_in_win=True)
run('pip', '--version', same_in_win=True)
run(['pip', '--version'], same_in_win=True)
'pip 23.3.1 from /Users/jhoward/miniconda3/lib/python3.11/site-packages/pip (python 3.11)'
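To make the three calling styles concrete, the sketch below normalises each of them to a single argument list with shlex.split, which applies shell-style quoting rules. The helper as_arg_list is hypothetical and only mirrors the argument handling described above; it does not run any command.

```python
import shlex

def as_arg_list(cmd, *rest):
    """Hypothetical helper: normalise the three calling styles to one argument list."""
    if isinstance(cmd, str):
        cmd = shlex.split(cmd)      # shell-style splitting, honouring quotes
    return list(cmd) + list(rest)

from_string = as_arg_list('grep -n "hello world" notes.txt')
from_list = as_arg_list(['pip', '--version'])
from_args = as_arg_list('pip', '--version')
```

Note that the quoted phrase in the string form stays together as one argument, which a plain str.split would not do.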
Some commands fail in non-error situations, like grep. Use ignore_ex in those cases, which will return a tuple of stdout and returncode:
run automatically decodes returned bytes to a str. Use as_bytes to skip that:
open_file (fn, mode='r', **kwargs)
Open a file, with optional compression if gz or bz2 suffix
save_pickle (fn, o)
Save a pickle file, to a file name or opened file
load_pickle (fn)
Load a pickle file from a file name or opened file
for suf in '.pkl','.bz2','.gz':
    # delete=False is added for Windows
    # https://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file
    with tempfile.NamedTemporaryFile(suffix=suf, delete=False) as f:
        fn = Path(f.name)
        save_pickle(fn, 't')
        t = load_pickle(fn)
    f.close()
    test_eq(t,'t')
parse_env (s:str=None, fn:Union[str,pathlib.Path]=None)
Parse a shell-style environment string or file
testf = """# comment
# another comment
export FOO="bar#baz"
BAR=thing # comment "ok"
baz='thong'
QUX=quux
export ZAP = "zip" # more comments
FOOBAR = 42 # trailing space and comment"""
exp = dict(FOO='bar#baz', BAR='thing', baz='thong', QUX='quux', ZAP='zip', FOOBAR='42')
test_eq(parse_env(testf), exp)
expand_wildcards (code)
Expand all wildcard imports in the given code string.
inp = """from math import *
from os import *
from random import *
def func(): return sin(pi) + path.join('a', 'b') + randint(1, 10)"""
exp = """from math import pi, sin
from os import path
from random import randint
def func(): return sin(pi) + path.join('a', 'b') + randint(1, 10)"""
test_eq(expand_wildcards(inp), exp)
inp = """from itertools import *
def func(): pass"""
test_eq(expand_wildcards(inp), inp)
inp = """def outer():
    from math import *
    def inner():
        from os import *
        return sin(pi) + path.join('a', 'b')"""
exp = """def outer():
    from math import pi, sin
    def inner():
        from os import path
        return sin(pi) + path.join('a', 'b')"""
test_eq(expand_wildcards(inp), exp)
dict2obj (d, list_func=<class 'fastcore.foundation.L'>, dict_func=<class 'fastcore.basics.AttrDict'>)
Convert (possibly nested) dicts (or lists of dicts) to AttrDict
This is a convenience to give you “dotted” access to (possibly nested) dictionaries, e.g:
It can also be used on lists of dicts.
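The idea of dotted access to nested data can be sketched with the standard library alone. The example below uses types.SimpleNamespace rather than AttrDict or L, and to_namespace is a hypothetical name, so it illustrates the concept rather than dict2obj itself.

```python
from types import SimpleNamespace

def to_namespace(d):
    """Hypothetical sketch: recursively convert dicts (and lists of dicts)."""
    if isinstance(d, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in d.items()})
    if isinstance(d, (list, tuple)):
        return [to_namespace(o) for o in d]
    return d    # leaves (numbers, strings, ...) are returned unchanged

cfg = to_namespace({'a': 1, 'b': {'c': 2, 'd': [{'e': 3}]}})
nested = cfg.b.c          # attribute access instead of cfg['b']['c']
in_list = cfg.b.d[0].e    # dicts inside lists are converted too
```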
obj2dict (d)
Convert (possibly nested) AttrDicts (or lists of AttrDicts) to dict
obj2dict can be used to reverse what is done by dict2obj:
repr_dict (d)
Print nested dicts and lists, such as returned by dict2obj
is_listy (x)
isinstance(x, (tuple,list,L,slice,Generator))
mapped (f, it)
map f over it, unless it’s not listy, in which case return f(it)
The following methods are added to the standard Python library class pathlib.Path.
Path.readlines (hint=-1, encoding='utf8')
Read the content of self
Path.read_json (encoding=None, errors=None)
Same as read_text followed by loads
Path.mk_write (data, encoding=None, errors=None, mode=511)
Make all parent dirs of self, and write data
Path.relpath (start=None)
Same as os.path.relpath, but returns a Path, and resolves symlinks
Path.ls (n_max=None, file_type=None, file_exts=None)
Contents of path as a list
We add an ls() method to pathlib.Path which is simply defined as list(Path.iterdir()), mainly for convenience in REPL environments such as notebooks.
path = Path()
t = path.ls()
assert len(t)>0
t1 = path.ls(10)
test_eq(len(t1), 10)
t2 = path.ls(file_exts='.ipynb')
assert len(t)>len(t2)
t[0]
Path('000_tour.ipynb')
You can also pass an optional file_type MIME prefix and/or a list of file extensions.
lib_path = (path/'../fastcore')
txt_files=lib_path.ls(file_type='text')
assert len(txt_files) > 0 and txt_files[0].suffix=='.py'
ipy_files=path.ls(file_exts=['.ipynb'])
assert len(ipy_files) > 0 and ipy_files[0].suffix=='.ipynb'
txt_files[0],ipy_files[0]
(Path('../fastcore/shutil.py'), Path('000_tour.ipynb'))
Path.__repr__ ()
Return repr(self).
fastai also updates the repr of Path such that, if Path.BASE_PATH is defined, all paths are printed relative to that path (as long as they are contained in Path.BASE_PATH):
Path.delete ()
Delete a file, symlink, or directory tree
ReindexCollection (coll, idxs=None, cache=None, tfm=<function noop>)
Reindexes collection coll with indices idxs and optional LRU cache of size cache
This is useful when constructing batches or organizing data in a particular manner (e.g. for deep learning). This class is primarily used in organizing data for language models in fastai.
You can supply a custom index upon instantiation with the idxs argument, or you can call the reindex method to supply a new index for your collection.
Here is how you can reindex a list such that the elements are reversed:
['e', 'd', 'c', 'b', 'a']
Alternatively, you can use the reindex method:
ReindexCollection.reindex (idxs)
Replace self.idxs with idxs
['e', 'd', 'c', 'b', 'a']
You can optionally specify an LRU cache, which uses functools.lru_cache, upon instantiation:
sz = 50
t = ReindexCollection(L.range(sz), cache=2)
#trigger a cache hit by indexing into the same element multiple times
t[0], t[0]
t._get.cache_info()
CacheInfo(hits=1, misses=1, maxsize=2, currsize=1)
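The same hit and miss counts can be reproduced with functools.lru_cache directly, which may help when reading the CacheInfo output above. The function get and the list calls are hypothetical names used only for this sketch.

```python
from functools import lru_cache

calls = []

@lru_cache(maxsize=2)
def get(i):
    calls.append(i)        # record real lookups so cache hits are visible
    return i * 10

get(0); get(0)             # the second call is served from the cache
info = get.cache_info()    # named tuple: hits, misses, maxsize, currsize
```

Only one entry in calls shows that the underlying function ran once, while cache_info reports one miss (the first lookup) and one hit (the repeat).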
You can optionally clear the LRU cache by calling the cache_clear method:
ReindexCollection.cache_clear ()
Clear LRU cache
sz = 50
t = ReindexCollection(L.range(sz), cache=2)
#trigger a cache hit by indexing into the same element multiple times
t[0], t[0]
t.cache_clear()
t._get.cache_info()
CacheInfo(hits=0, misses=0, maxsize=2, currsize=0)
ReindexCollection.shuffle ()
Randomly shuffle indices
Note that an ordered index is automatically constructed for the data structure even if one is not supplied.
['a', 'd', 'h', 'c', 'e', 'b', 'f', 'g']
sz = 50
t = ReindexCollection(L.range(sz), cache=2)
test_eq(list(t), range(sz))
test_eq(t[sz-1], sz-1)
test_eq(t._get.cache_info().hits, 1)
t.shuffle()
test_eq(t._get.cache_info().hits, 1)
test_ne(list(t), range(sz))
test_eq(set(t), set(range(sz)))
t.cache_clear()
test_eq(t._get.cache_info().hits, 0)
test_eq(t.count(0), 1)
get_source_link (func)
Return link to func in source code
get_source_link allows you to get a link to the source code related to an object. For nbdev-related projects such as fastcore, we can get the full link to a GitHub repo. For nbdev projects, be sure to properly set the git_url in settings.ini (derived from lib_name and branch on top of the prefix you will need to adapt) so that those links are correct.
For example, below we get the link to fastcore.test.test_eq:
assert 'fastcore/test.py' in get_source_link(test_eq)
assert get_source_link(test_eq).startswith('https://github.com/fastai/fastcore')
get_source_link(test_eq)
'https://github.com/fastai/fastcore/tree/master/fastcore/test.py#L35'
truncstr (s:str, maxlen:int, suf:str='…', space='')
Truncate s to length maxlen, adding suffix suf if truncated
sparkline (data, mn=None, mx=None, empty_zero=False)
Sparkline for data, with Nones (and zero, if empty_zero) shown as empty column
data = [9,6,None,1,4,0,8,15,10]
print(f'without "empty_zero": {sparkline(data, empty_zero=False)}')
print(f' with "empty_zero": {sparkline(data, empty_zero=True )}')
without "empty_zero": ▅▂ ▁▂▁▃▇▅
with "empty_zero": ▅▂ ▁▂ ▃▇▅
You can set a minimum and maximum for the y-axis of the sparkline with the arguments mn and mx respectively:
modify_exception (e:Exception, msg:str=None, replace:bool=False)
Modifies e with a custom message attached
Parameter | Type | Default | Details
---|---|---|---
e | Exception | | An exception
msg | str | None | A custom message
replace | bool | False | Whether to replace e.args with [msg]
Returns | Exception | |
msg = "This is my custom message!"
test_fail(lambda: (_ for _ in ()).throw(modify_exception(Exception(), None)), contains='')
test_fail(lambda: (_ for _ in ()).throw(modify_exception(Exception(), msg)), contains=msg)
test_fail(lambda: (_ for _ in ()).throw(modify_exception(Exception("The first message"), msg)), contains="The first message This is my custom message!")
test_fail(lambda: (_ for _ in ()).throw(modify_exception(Exception("The first message"), msg, True)), contains="This is my custom message!")
round_multiple (x, mult, round_down=False)
Round x to nearest multiple of mult
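The arithmetic behind rounding to a multiple is to divide, round the quotient to an integer, then multiply back. The sketch below is a scalar-only illustration under that assumption; round_to_multiple is a hypothetical name, and using math.floor for round_down is a choice made here, not something the documentation above specifies.

```python
import math

def round_to_multiple(x, mult, round_down=False):
    """Hypothetical sketch: nearest (or next lower) multiple of `mult`."""
    # Python's round() sends exact .5 ties to the even integer
    n = math.floor(x / mult) if round_down else round(x / mult)
    return n * mult

nearest = round_to_multiple(63, 32)                  # 63/32 is about 1.97, rounds to 2
lower = round_to_multiple(63, 32, round_down=True)   # floor of 1.97 is 1
```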
set_num_threads (nt)
Get numpy (and others) to use nt threads
This sets the number of threads consistently for many tools, by:
1. Setting the following environment variables to nt: OPENBLAS_NUM_THREADS, NUMEXPR_NUM_THREADS, OMP_NUM_THREADS, MKL_NUM_THREADS
2. Setting nt threads for numpy and pytorch.
join_path_file (file, path, ext='')
Return path/file if file is a string or a Path, file otherwise
autostart (g)
Decorator that automatically starts a generator
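A generator that receives values through send() must first be advanced to its first yield. The sketch below shows one way a decorator could do that priming step automatically; autostart_sketch and running_total are hypothetical names, and this is an illustration of the pattern rather than fastcore's code.

```python
from functools import wraps

def autostart_sketch(g):
    """Hypothetical decorator: create the generator and advance it to its first `yield`."""
    @wraps(g)
    def wrapper(*args, **kwargs):
        gen = g(*args, **kwargs)
        next(gen)           # prime the generator so `.send()` works immediately
        return gen
    return wrapper

@autostart_sketch
def running_total():
    total = 0
    while True:
        value = yield total
        total += value

acc = running_total()
first = acc.send(5)     # no manual next(acc) needed before sending
second = acc.send(7)
```

Without the priming call, the first send of a non-None value would raise a TypeError.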
EventTimer (store=5, span=60)
An event timer with history of store items of time span
Add events with add, and get number of events and their frequency (freq).
# Random wait function for testing
def _randwait(): yield from (sleep(random.random()/200) for _ in range(100))
c = EventTimer(store=5, span=0.03)
for o in _randwait(): c.add(1)
print(f'Num Events: {c.events}, Freq/sec: {c.freq:.01f}')
print('Most recent: ', sparkline(c.hist), *L(c.hist).map('{:.01f}'))
Num Events: 3, Freq/sec: 205.6
Most recent: ▁▁▃▁▇ 254.1 263.2 284.5 259.9 315.7
stringfmt_names (s:str)
Unique brace-delimited names in s
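One standard-library way to list the brace-delimited names in a format string is string.Formatter.parse, which yields a field name for every replacement field. The helper brace_names below is hypothetical and may differ from stringfmt_names in edge cases such as escaped braces or attribute lookups.

```python
from string import Formatter

def brace_names(s):
    """Hypothetical sketch: unique field names in `s`, in order of first appearance."""
    names = []
    # parse() yields (literal_text, field_name, format_spec, conversion) tuples
    for _, field, _, _ in Formatter().parse(s):
        if field and field not in names:
            names.append(field)
    return names

names = brace_names('/repos/{owner}/{repo}/issues/{owner}')
```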
PartialFormatter ()
A string.Formatter that doesn’t error on missing fields, and tracks missing fields and unused args
partial_format (s:str, **kwargs)
string format s, ignoring missing field errors, returning missing and extra fields
The result is a tuple of (formatted_string,missing_fields,extra_fields), e.g:
utc2local (dt:datetime.datetime)
Convert dt from UTC to local time
2000-01-01 12:00:00 UTC is 2000-01-01 22:00:00+10:00 local time
local2utc (dt:datetime.datetime)
Convert dt from local to UTC time
2000-01-01 12:00:00 local is 2000-01-01 02:00:00+00:00 UTC time
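Both conversions can be approximated with the standard datetime module, as in the sketch below. The names utc_to_local and local_to_utc are hypothetical, and the sketch assumes naive datetimes as input; the offsets it produces depend on the time zone of the machine running it.

```python
from datetime import datetime, timedelta, timezone

def utc_to_local(dt):
    """Hypothetical sketch: treat naive `dt` as UTC and convert to the local zone."""
    return dt.replace(tzinfo=timezone.utc).astimezone()

def local_to_utc(dt):
    """Hypothetical sketch: treat naive `dt` as local time and convert to UTC."""
    # astimezone() assumes a naive datetime is in the system's local zone
    return dt.astimezone(timezone.utc)

dt = datetime(2000, 1, 1, 12)
as_local = utc_to_local(dt)   # same instant, expressed with the local offset
as_utc = local_to_utc(dt)     # local noon expressed in UTC
```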
trace (f)
Add set_trace to an existing function f
You can add a breakpoint to an existing function, e.g:
Now, when the function is called it will drop you into the debugger. Note, you must issue the s command when you begin to step into the function that is being traced.
modified_env (*delete, **replace)
Context manager temporarily modifying os.environ by deleting delete and replacing replace
# USER isn't in Cloud Linux Environments
env_test = 'USERNAME' if sys.platform == "win32" else 'SHELL'
oldusr = os.environ[env_test]

replace_param = {env_test: 'a'}
with modified_env('PATH', **replace_param):
    test_eq(os.environ[env_test], 'a')
    assert 'PATH' not in os.environ

assert 'PATH' in os.environ
test_eq(os.environ[env_test], oldusr)
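A context manager with this behaviour could be built by snapshotting os.environ and restoring it in a finally block, as sketched below. modified_env_sketch and the DEMO_* variable names are hypothetical, and the snapshot-and-restore strategy is an assumption, not a description of fastcore's implementation.

```python
import os
from contextlib import contextmanager

@contextmanager
def modified_env_sketch(*delete, **replace):
    """Hypothetical sketch: temporarily change os.environ, then restore it."""
    saved = dict(os.environ)
    try:
        for key in delete:
            os.environ.pop(key, None)
        os.environ.update(replace)
        yield
    finally:
        # Restore the snapshot even if the body raised
        os.environ.clear()
        os.environ.update(saved)

os.environ['DEMO_KEEP'] = 'old'
os.environ['DEMO_DROP'] = 'x'
with modified_env_sketch('DEMO_DROP', DEMO_KEEP='new'):
    inside = (os.environ['DEMO_KEEP'], 'DEMO_DROP' in os.environ)
after = (os.environ['DEMO_KEEP'], 'DEMO_DROP' in os.environ)
```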
ContextManagers (mgrs)
Wrapper for contextlib.ExitStack which enters a collection of context managers
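For readers unfamiliar with contextlib.ExitStack, the sketch below shows the underlying pattern of entering several context managers at once and exiting them in reverse order. It uses ExitStack directly rather than ContextManagers, and tracked is a hypothetical helper that just records enter and exit events.

```python
from contextlib import ExitStack, contextmanager

events = []

@contextmanager
def tracked(name):
    events.append(f'enter {name}')
    try:
        yield name
    finally:
        events.append(f'exit {name}')

with ExitStack() as stack:
    # enter_context enters each manager and registers its exit with the stack
    entered = [stack.enter_context(m) for m in (tracked('a'), tracked('b'))]
```

The recorded events show that managers are exited last-in, first-out, the same order nested with statements would give.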
shufflish (x, pct=0.04)
Randomly relocate items of x up to pct of len(x) from their starting location
console_help (libname:str)
Show help for all console scripts from libname
Parameter | Type | Details
---|---|---
libname | str | name of library for console script listing
hl_md (s, lang='xml', show=True)
Syntax highlight s using lang.
When we display code in a notebook, it’s nice to highlight it, so we create a function to simplify that:
type2str (typ:type)
Stringify typ
dataclass_src (cls)
DC = make_dataclass('DC', [('x', int), ('y', Optional[float], None), ('z', float, None)])
print(dataclass_src(DC))
@dataclass
class DC:
    x: int
    y: Union[float, None] = None
    z: float = None
Unset (value, names=None, module=None, qualname=None, type=None, start=1)
An enumeration.
nullable_dc (cls)
Like dataclass, but default of UNSET added to fields without defaults
Person(name='Bob', age=UNSET, city='Unknown')
make_nullable (clas)
@dataclass
class Person: name: str; age: int; city: str = "Unknown"
make_nullable(Person)
Person("Bob", city='NY')
Person(name='Bob', age=UNSET, city='NY')
flexiclass (cls)
Convert cls into a dataclass like make_nullable. Converts in place and also returns the result.
Parameter | Type | Details
---|---|---
cls | | The class to convert
Returns | dataclass |
This can be used as a decorator…
Person(name='Bob', age=UNSET, city='Unknown')
…or can update the behavior of an existing class (or dataclass):
class Person: name: str; age: int; city: str = "Unknown"
flexiclass(Person)
bob = Person(name="Bob")
bob
Person(name='Bob', age=UNSET, city='Unknown')
Action occurs in-place:
True
asdict (o)
Convert o to a dict, supporting dataclasses, namedtuples, iterables, and __dict__ attrs.
Any UNSET values are not included.
To customise dict conversion behavior for a class, implement the _asdict method (this is used in the Python stdlib for named tuples).
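The _asdict convention can be seen in the standard library's named tuples, and a class can follow it by defining the method itself. In the sketch below, Point and Celsius are hypothetical examples; the code calls _asdict directly and does not use fastcore's asdict.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')

class Celsius:
    """Hypothetical class that controls its own dict conversion."""
    def __init__(self, degrees): self.degrees = degrees
    def _asdict(self):
        # Expose a derived value alongside the stored one
        return {'celsius': self.degrees, 'fahrenheit': self.degrees * 9 / 5 + 32}

point_dict = Point(1, 2)._asdict()     # provided by namedtuple
temp_dict = Celsius(100)._asdict()     # provided by the class author
```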
is_typeddict (cls:type)
Check if cls is a TypedDict
is_namedtuple (cls)
True if cls is a namedtuple type
flexicache (*funcs, maxsize=128)
Like lru_cache, but customisable with policy funcs
This is a flexible lru cache function that you can pass a list of functions to. Those functions define the cache eviction policy. For instance, time_policy is provided for time-based cache eviction, and mtime_policy evicts based on a file’s modified-time changing. Each policy function is passed the last value it returned (initially None), and returns a new value to indicate the cache has expired. When the cache expires, all functions are called with None to force getting new values.
time_policy (seconds)
A flexicache policy that expires cached items after seconds have passed
mtime_policy (filepath)
A flexicache policy that expires cached items after filepath modified-time changes
@flexicache(time_policy(10), mtime_policy('000_tour.ipynb'))
def cached_func(x, y): return x+y
cached_func(1,2)
3
@flexicache(time_policy(10), mtime_policy('000_tour.ipynb'))
async def cached_func(x, y): return x+y
await cached_func(1,2)
await cached_func(1,2)
3
timed_cache (seconds=60, maxsize=128)
Like lru_cache, but also with time-based eviction
This function is a small convenience wrapper for using flexicache with time_policy.
@timed_cache(seconds=0.05, maxsize=2)
def cached_func(x): return x * 2, time()
# basic caching
result1, time1 = cached_func(2)
test_eq(result1, 4)
sleep(0.001)
result2, time2 = cached_func(2)
test_eq(result2, 4)
test_eq(time1, time2)
# caching different values
result3, _ = cached_func(3)
test_eq(result3, 6)
# maxsize
_, time4 = cached_func(4)
_, time2_new = cached_func(2)
test_close(time2, time2_new, eps=0.1)
_, time3_new = cached_func(3)
test_ne(time3_new, time())
# time expiration
sleep(0.05)
_, time4_new = cached_func(4)
test_ne(time4_new, time())