(But in venv mode it also hangs the same way)
Hmm this is strange, could it be you are running out of storage?
Suppose I have an S3 bucket where my data is stored and I wish to transfer it to the ClearML file server.
Then you first have to download the entire bucket locally, then register the local copy.
Basically:
StorageManager.download_folder("
", "/target/folder")
# now register the local "/target/folder" with Dataset.add_files
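If it helps, a minimal sketch of the registration step (the project and dataset names are placeholders, not from the original thread):

from clearml import Dataset

# create a new dataset entry and attach the downloaded files to it
ds = Dataset.create(dataset_project="my_project", dataset_name="my_dataset")
ds.add_files("/target/folder")
ds.upload()    # uploads the files to the default ClearML file server
ds.finalize()  # locks this dataset version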
if I encounter the need for that, I will adapt and open a PR
Great!
Can you see it on the console?
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init
A new experiment is created on the server, and it automatically stores the git repo link, commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (which is effectively always the case, even if it is actually the same machine).
The trains-agent will create a new virtual-environmen...
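For reference, a minimal sketch of the Task.init call described above (the project and task names are placeholders):

from clearml import Task

# this single call registers the run on the server, capturing the repo link,
# commit ID, and uncommitted changes automatically
task = Task.init(project_name="examples", task_name="my experiment")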
Hi SteadyFox10
Yes we changed the Web UI, to something more intuitive (but after you get used to the original design , I guess not that obvious).
After selecting a bunch of experiments, right-click one of them and you will be able to archive them all (the menu will display the number of experiments you are about to archive)
try:
docker_install_opencv_libs: true
SoggyBeetle95 the question is, where does clearml store these arguments, and the answer is on the Task object (from there the agent will take them and apply them to the docker execution). Now since all users see all the tasks, they also see these arguments. Wdyt?
My main query is, do I wait for it to be a sufficient batch size or do I just send each image as soon as it comes to train?
This is usually a cost-optimization issue. Generally speaking, if GPU uptime is not a concern, the process is stochastic anyhow, so whether you wait for a full batch is not the most important factor (unless you use a batchnorm layer, in which case batching is basically a must)
I would not be able to split the data into train test splits, and that it would be very expensiv...
So dynamic or static are basically the same thing, just in dynamic, I can edit the artifact while running the experiment?
Correct
Second, why would it be overwritten if I run a different run of the same experiment?
Sorry, I meant in the same run, if you reuse the artifact name you will be overwriting it. Obviously different runs different artifacts :)
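To illustrate (a minimal sketch; re-using the same artifact name within one run overwrites the previous upload, and the project/task names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact overwrite")
task.upload_artifact("stats", artifact_object={"epoch": 1})
task.upload_artifact("stats", artifact_object={"epoch": 2})  # replaces the previous "stats"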
you can also just create a venv and run the tests there (with the latest python package)?
Hi VirtuousFish83,
Is it throwing an exception? Are you seeing the plot in the UI but the title is incorrect?
(with older clearml versions though…).
Yes, we added a content-type header for the files when uploading to S3 (so it is easier for users to serve them back). But it seems the python 3.5 casting from Path to str breaks the mimetype call...
It should work, GIT_SSH_COMMAND is used by pip; at the end it's just another env var.
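For example (a minimal sketch; the key path is a placeholder):

import os

# GIT_SSH_COMMAND is just an environment variable picked up by git (and thus by pip's git clones)
os.environ["GIT_SSH_COMMAND"] = "ssh -i ~/.ssh/my_key -o IdentitiesOnly=yes"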
The pipeline itself is also a task, so this line works in a pipeline. Task.current_task is a class method that returns the running task (the pipeline, in our case), and then the usual interface applies. BTW what do you have in the conf file?
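Something like this (a minimal sketch, assuming it runs inside an executing pipeline/task):

from clearml import Task

# inside a running pipeline, current_task() returns the pipeline's own Task object
task = Task.current_task()
print(task.name)  # from here the usual Task interface applies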
Hi RoundMosquito25
Hmm I remember this is tricky... What's the clearml version? Also, where is the line you had to hack?
HealthyStarfish45 what exactly did you have in mind, in terms of the widget?
Hi @<1532532498972545024:profile|LittleReindeer37>
Does Hydra support notebooks? If it does, can you point to an example?
Yep, automatically moving a tag
No, but you can get the last created/updated one with that tag (so I guess the same?)
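For example (a minimal sketch, assuming a recent clearml version; the project name and tag are placeholders):

from clearml import Task

# fetch tasks carrying the tag, most recently updated first
tasks = Task.get_tasks(
    project_name="examples",
    tags=["best"],
    task_filter={"order_by": ["-last_update"]},
)
latest = tasks[0] if tasks else None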
meant like the best artifacts.
So artifacts can be retrieved like a dict:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
Task.get_task(project_name='examples', task_name='artifacts example').artifacts['name']
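And to actually fetch the artifact's content (a minimal sketch):

from clearml import Task

task = Task.get_task(project_name='examples', task_name='artifacts example')
local_path = task.artifacts['name'].get_local_copy()  # downloads the artifact and returns a local path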
From creating the event to actually sending it ... 30 min sounds like enough "time"...
Can you clone the git with the .ssh credentials on the host machine?
If so, can you do the same manually inside a docker (i.e. spin up a docker with the mount -v /home/hostuser/.ssh:/root/.ssh)?
if in the "installed packages" I have all the packages installed from the requirements.txt then I guess I can clone it and use "installed packages"
After the agent finishes installing the "requirements.txt", it will put the entire "pip freeze" back into the "installed packages". This means that later we will be able to fully reproduce the working environment, even if packages change (which will eventually happen, as we cannot expect everyone to constantly freeze versions)
My problem...
Make sure you have the S3 credentials in your agent's clearml.conf:
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L210
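Roughly this section of the conf (a minimal sketch; the values are placeholders):

sdk {
    aws {
        s3 {
            # default credentials, used for any bucket not listed explicitly
            key: "YOUR_ACCESS_KEY"
            secret: "YOUR_SECRET_KEY"
            region: "us-east-1"
        }
    }
}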
Interesting, if this is the issue, a simple sleep after reporting should prove it. Wdyt?
BTW are you using the latest package? What's your OS?
PompousBeetle71, the reason I'm asking is the warning you see is due to the fact it cannot detect the filename you are saving your model to... I'm trying to figure out how that actually happened.
BTW: in the next version we will probably remove this warning altogether, but I'm still curious on how to reproduce 🙂
Hi SubstantialElk6
We will be running some GUI applications so is it possible to forward the GUI to the clearml-session?
If you can directly access the machine running the agent, yes you could. If not, a reverse proxy is in the works 🙂
We have a rather locked down environment so I would need a clear view of the network view and the ports associated.
Basically all connections are outgoing only, with the exception of the clearml-server (listening on ports 8008, 8080, 8081)
BTW: we are now adding "dataset chunks" for more efficient large-dataset storage