SuperiorDucks36 from code? or UI?
(You can always clone an experiment and change the entire thing; the question is how you will get the data to fill in the experiment, i.e. repo / arguments / configuration etc.)
There is a discussion here, I would love to hear another angle.
https://github.com/allegroai/trains/issues/230
In your code, can you print the following:
```
import os
print(os.environ.keys())
```
There should be a few keys the Pycharm plugin is sending from the local machine, pointing to the git repo
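If it helps to narrow the output down, you can filter the environment for plugin-related keys (the "TRAINS"/"CLEARML" name filter here is an assumption, the exact variable names may differ):

```python
import os

# Print only the environment variables whose names look plugin-related
# (the "TRAINS"/"CLEARML" name filter is a guess, adjust as needed)
plugin_keys = [k for k in os.environ if "TRAINS" in k or "CLEARML" in k]
print(plugin_keys)
```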
JumpyPig73 you should be able to find it at the bottom of the page, try scrolling down (it should be after the installed packages)
trains[azure] gives you the possibility to do the following:
```
from trains import StorageManager
my_local_cached_file = StorageManager.get_local_copy('azure://bucket/folder/file.bin')
```
This means you do not have to manually download files and maintain the local cache; the StorageManager will do that for you.
If you do not need that ability, there is no need to install trains[azure]
you can just install trains
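If it helps, the caching idea can be sketched in plain Python (this is just an illustration of the download-once behaviour, not the actual StorageManager code; the URL is a placeholder):

```python
import os
import tempfile

# Toy stand-in for StorageManager.get_local_copy:
# "download" a remote file once, then reuse the cached local copy
_cache = {}

def get_local_copy(remote_url):
    if remote_url not in _cache:
        # pretend to download: create a placeholder file in a temp location
        fd, local_path = tempfile.mkstemp(suffix=os.path.basename(remote_url))
        os.close(fd)
        _cache[remote_url] = local_path
    return _cache[remote_url]

first = get_local_copy('azure://bucket/folder/file.bin')
second = get_local_copy('azure://bucket/folder/file.bin')
print(first == second)  # the second call hits the cache, no re-download
```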
Unfortunately, we haven't had the time to upgrade to the Azure storage v...
Exactly 🙂
If you feel like PR-ing a fix, it will be greatly appreciated 🙂
(This code sample should work on your setup with your installed packages without a problem)
Clearml 1.13.1
Could you try the latest (1.16.2)? I remember there was a fix specific to Datasets
UnevenDolphin73 following the discussion https://clearml.slack.com/archives/CTK20V944/p1643731949324449 , I suggest this change in the pseudo code
```
# task code
task = Task.init(...)
if not task.running_locally() and task.is_main_task():
    # pre-init stage (remote execution)
    StorageManager.download_folder(...)  # Prepare local files for execution
else:
    StorageManager.upload_file(...)  # Repeated for many files needed
    task.execute_remotely(...)
```
Now when I look at it, it kind of makes sense to h...
Hi MiniatureCrocodile39
I would personally recommend the ClearML show 🙂
https://www.youtube.com/watch?v=XpXLMKhnV5k
https://www.youtube.com/watch?v=qz9x7fTQZZ8
I have mounted my S3 bucket at the location /opt/clearml/data/fileserver/ but I can see my data is not being stored in S3, it is being stored in EBS. How so?
I'm assuming the mount was not successful
What you should see is a link to the files server inside clearml, and actual files in your S3 bucket
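A quick way to verify (a sketch, assuming an s3fs-style mount): if the bucket is actually mounted, that path should appear in the mount table.

```shell
# If the bucket is mounted (e.g. via s3fs), the path shows up in the mount table;
# otherwise writes simply land on the local disk (EBS on EC2)
mount 2>/dev/null | grep /opt/clearml/data/fileserver || echo "fileserver path is not a mount point"
```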
JitteryCoyote63 how can I reproduce it? (obviously when I tested it was okay)
TenseOstrich47
I noticed that with one agent, only one task gets executed at one time
Yes you can 🙂
Also, you are correct, a single agent will run a single Task at a time, that said you can have multiple agents running on the same machine, and when you launch them you specify which GPUs they use (in theory they can share the same GPU, but your code might not like it 🙂 )
You can see a few examples here:
https://github.com/allegroai/clearml-agent#running-the-clearml-agent
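For example (a sketch; the queue name is a placeholder), two agents on the same machine, each pinned to its own GPU:

```
clearml-agent daemon --detached --queue default --gpus 0
clearml-agent daemon --detached --queue default --gpus 1
```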
I'm guessing the extra index URL can be a URL to the github repo of interest?
The extra index URL is exactly what you would be passing to pip install, meaning it has to comply with the PyPI repository API.
Make sense ?
TenseOstrich47 FYI:
This might be what you are looking for 🙂
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L61
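In that file it is under the agent's package_manager section; a sketch (the URL is a placeholder):

```
agent {
    package_manager {
        # extra index urls are passed to pip, so each one must implement the PyPI simple API
        extra_index_url: ["https://my.pypi.mirror.example/simple"]
    }
}
```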
My pleasure 🙂
Maybe we should do a webinar... I have a feeling the MLOps aspects are not as straightforward as we would like to think...
Out of interest, is there a reason these are read-only?
Yes, we should probably change that... they are designed to be pre-populated, but there should not be any reason you could not remove them
The code for these tasks is on github right?
Correct
Actually it is better to leave it as is; it will just automatically mount the .ssh folder into the container. I will make sure the docs point to this option first
Any chance there is an env variable you set to get 1.5.0rc0? Because this is the version that is being used
BTW MagnificentSeaurchin79 just making sure here:
but I don't see the loss plot in scalars
This is only with Detect API ?
Thanks MagnificentSeaurchin79 !
Let me check what's the status with this one, could it be the same as this one?
https://github.com/allegroai/clearml/issues/322
MassiveHippopotamus56
the "iteration" entry is actually the max reported iteration over all graphs; per graph there is a different max iteration. Make sense ?
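In other words (a toy sketch with made-up numbers):

```python
# Each graph reports its own iterations; the experiment's "iteration" entry
# is the maximum iteration reported across all graphs
graphs = {
    "loss": [0, 1, 2, 3],  # reported up to iteration 3
    "accuracy": [0, 1],    # reported up to iteration 1
}
shown_iteration = max(max(iters) for iters in graphs.values())
print(shown_iteration)  # 3
```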
SlipperyDove40
FYI:
```
args = task.connect(args, name="Args")
```
"Args" is a kind-of reserved section for argparse. Meaning you can always use it, but argparse will also push/pull things from there. Is there any specific reason for not using a different section name?
Hi OutrageousGiraffe8
I was not able to reproduce 🙂
Python 3.8 Ubuntu + TF 2.8
I get both metrics and model stored and uploaded
Any idea?
Hi TrickyRaccoon92
If you are reporting to tensor-board, then "iteration" equals step. Is this the case?
Hi UnsightlySeagull42
Could you test with the latest RC?
```
pip install clearml==1.0.4rc0
```
Also could you provide some logs?
Hi UpsetCrocodile10
First, I perform many experiments in one process, ...
How about this one:
https://github.com/allegroai/trains/issues/230#issuecomment-723503146
Basically you could utilize create_function_task
This means you have Task.init() on the main "controller" and each "train_in_subset" as a "function_task". Then the controller can wait on them and collect the data (like the HPO does).
Basically:
```
controller_task = Task.init(...)
children = []
for i, s in enumer...
```
UpsetCrocodile10
Does this method expect my_train_func to be in the same file as
As long as you import it and you can pass it, it should work.
Child exp gets aborted immediately ...
It seems it cannot find the file "main.py" , it assumes all code is part of a single repository, is that the case ? What do you have under the "Execution" tab for the experiment ?
Hi UpsetCrocodile10
execute them and return scalars.
This should be a good start (I hope 🙂 )
```
for child in children:
    # put the Task into an execution queue
    Task.enqueue(child, queue_name='my_queue_here')
    # wait for the task to finish
    child.wait_for_status(status=['completed'])
    # reload all the metrics
    child.reload()
    # get the metrics
    print(child.get_last_scalar_metrics())
```