I think that by default the zipped package files are 0.5GB
(you can control it, look for --chunk-size)
I think the missing part of the API is understanding which chunk your specific file is stored in.
You can do something like:
from clearml import Dataset
ds = Dataset.get(...)
the_artifact_chunk_I_need = ds.file_entries_dict["my/file/here"].artifact_name  # which zip chunk holds this file
wdyt?
maybe worth adding an interface?
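For reference, a minimal sketch of the full flow, assuming a recent clearml version where Dataset.upload() accepts a chunk_size argument (in MB); the dataset name, project and file paths are placeholders:
from clearml import Dataset

# create and upload a dataset with an explicit chunk size (assumption: chunk_size is in MB)
ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files("/path/to/files")
ds.upload(chunk_size=512)  # roughly 0.5GB per zipped chunk
ds.finalize()

# later: find which chunk (artifact) a given file ended up in
ds = Dataset.get(dataset_name="my_dataset", dataset_project="my_project")
print(ds.file_entries_dict["relative/path/inside/dataset"].artifact_name)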
Basically it gives it direct access to the host, which is why it is considered less safe (it has access on other levels as well, like the network)
OddAlligator72 sure thing 🙂
This should sort it out: Task.init('examples', 'train', continue_last_task=True)
If you want to continue a specific Task: continue_last_task='task_id_here'
Getting the previous model: last_checkpoint = task.models['output'][-1]
What do you think?
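Putting the pieces together, a minimal sketch assuming the continue_last_task and task.models behavior described above (project/task names are placeholders):
from clearml import Task

# continue the previous run of this task (or pass continue_last_task='task_id_here' for a specific one)
task = Task.init('examples', 'train', continue_last_task=True)

# grab the last checkpoint the previous run produced and fetch it locally
last_checkpoint = task.models['output'][-1]
weights_path = last_checkpoint.get_local_copy()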
It should also work with host IP and two docker compose files.
I'm not sure where to push for a unified docker compose?
But I do not know how it can help me :(
In your code itself, after the Task.init call, add: task.set_initial_iteration(0)
See reply here:
https://github.com/allegroai/clearml/issues/496#issuecomment-980037382
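A minimal sketch of where that call goes, assuming an existing training script (project/task names are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='train')
# reset the reported iteration offset so new scalars start from iteration 0
task.set_initial_iteration(0)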
RoughTiger69 I think you need the latest version (1.3.0+ with UI support)
If you are using an older version, you need to specify that you are continuing an execution (Change the "Configuration/Args/continue_pipeline" to True)
EDIT: clearml 1.3.x will work with clearml-server 1.2
Hi WackyRabbit7
Yes, we definitely need to work on wording there ...
"Dynamic" means you register a pandas object that you are constantly logging into while training, think for example the image files you are feeding into the network. Then Trains will make sure it is constantly updated & uploaded so you have a way to later verify/compare different runs and detect dataset contemplation etc.
"Static" is just, this is my object/file upload and store it as an artifact for me ...
Make sense ?
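A minimal sketch of the two flavours, assuming the Task.register_artifact / Task.upload_artifact calls (shown with the current clearml package; names and values are placeholders):
import pandas as pd
from clearml import Task

task = Task.init(project_name='examples', task_name='artifacts demo')

# "dynamic": register a DataFrame you keep appending to; it is periodically re-uploaded
samples_seen = pd.DataFrame(columns=['image_file', 'label'])
task.register_artifact('samples_seen', samples_seen)

# "static": upload an object/file once and store it as-is
task.upload_artifact('config_snapshot', artifact_object={'lr': 0.001, 'epochs': 10})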
That is correct. Unfortunately though this is not part of the open source, which means that with the open source version it might be a bit more hands-on to deploy an LLM model.
Nicely done DeterminedToad86 🙂
Wasn't this issue resolved by torch?
EnviousStarfish54 generally speaking the hyper parameters are flat key/value pairs. you can have as many sections as you like, but inside each section, key/value pairs. If you pass a nested dict, it will be stored as path/to/key:value (as you witnessed).
If you need to store a more complicated configuration dict (nesting, lists etc), use the connect_configuration, it will convert your dict to text (in HOCON format) and store that.
In both cases you can edit the configuration and then when ru...
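A minimal sketch of the two approaches, assuming the standard Task.connect / Task.connect_configuration calls (parameter names and values are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='config demo')

# flat hyper parameters: sections of key/value pairs (nested dicts are flattened to path/to/key)
params = {'lr': 0.001, 'batch_size': 32}
task.connect(params, name='training')

# nested / complex configuration: stored as an editable text blob (HOCON)
config = {'model': {'layers': [64, 128, 64], 'dropout': 0.1}}
config = task.connect_configuration(config, name='model_config')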
okay but still I want to take only a row of each artifact
What do you mean?
How do I get from the node to the task object?
pipeline_task = Task.get_task(task_id=Task.current_task().parent)
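For context, a minimal sketch assuming this runs inside a pipeline step, so Task.current_task() is the step and its parent is the pipeline controller:
from clearml import Task

# from inside a pipeline step: get the controlling pipeline Task
pipeline_task = Task.get_task(task_id=Task.current_task().parent)
print(pipeline_task.name, pipeline_task.get_parameters())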
For classification it's F1 score, but for other tasks it may be different, and I don't think that's a problem. We just have to log it, right?
Correct 🙂
Give me a few days, I will work on your suggestions and then let you know if I am not able to do this
Sounds good!
BTW:
previous_tasks = Task.get_tasks(task_filter={'tags': 'best'})
local_model_file = previous_tasks[0].artifacts['my_model'].get_local_copy()
In theory it should have worked.
Can you send me the full Task log? (with cache and everything?)
I suspect since these are not the default folders, something is misconfigured / missing
(you can DM the log, so it won't end up on a public channel)
DeterminedToad86 I suspect that since it was executed on SageMaker it registered a specific package that is unique to SageMaker (not to worry, installed packages can be edited after you clone/reset the Task)
What do you already have working from the above steps ? and which parts are missing or we can think of automating ?
the hack doesn't work if conda is not installed
Of course conda needs to be installed, it is using a pre-existing conda env, no?! what am I missing
Ideally it would just pull an experiment from a dedicated HPO queue and run it in place
And the assumption is the code is also there ?
So you are uploading a local file (stored in a Dataset) into a GS bucket? May I ask why?
Regarding usage (I might have a typo but this is the gist): StorageManager.upload_file(local_file=separated_file_posix_path, remote_url=remote_file_path + str(separated_file_posix_path.relative_to(files_rgb)))
Notice that you need to provide the full upload URL (including path and file name to be used on your GS storage)
No, should be fine... Let me see if I can get a Windows box 🙂
- Could you explain how I can reproduce the missing jupyter notebook (i.e. the ipykernel_launcher.py)
Sure thing, hopefully I'll remember to ping tomorrow once GitHub is synced, I'd appreciate it if you could verify the fix works 🙂
Hi UptightBeetle98
The hyper parameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (which is where they are now, i.e. pending), set up the environment for the jobs (venv or docker+venv) and execute the job with the specific arguments the optimizer chose.
Make sense ?
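For illustration, a minimal sketch of how a job ends up in a queue for an agent to pull, assuming an existing template experiment and a queue named 'default':
from clearml import Task

# clone the template experiment and push the copy into the queue the agent listens on
template = Task.get_task(project_name='examples', task_name='train')
cloned = Task.clone(source_task=template, name='train (queued copy)')
Task.enqueue(cloned, queue_name='default')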
Hi @<1546303293918023680:profile|MiniatureRobin9>
I'm not sure I understand the difference between a worker and an agent.
hmm we should probably make that clearer 🙂
agent = the clearml-agent instance running on the machine
worker is the system term representing the instance of the agent
You can have one machine with multiple agents (i.e. multiple workers) running on it.
Does that make sense ?
Sure thing, feel free to ping 🙂