
In the execution tab, I see the old commit; in the logs, I see an empty branch and the old commit
Ah sorry, it's actually the number of shards that increased
Ok, now I get ERROR: No matching distribution found for conda==4.9.2 (from -r /tmp/cached-reqscaw2zzji.txt (line 13))
yes but they are in plain text and I would like to avoid that
Downloading the artifacts is done only when actually calling get()/get_local_copy()
Yes, I rather meant: reproduce this behavior even for getting metadata on the artifacts
/opt/clearml/data/fileserver does not appear anywhere, sorry for the confusion. It's the actual location where the files are stored
CostlyOstrich36 good enough, I will fall back to sorting by updated, thanks!
so most likely one hard requirement installs version 2 of pyjwt while setting up the experiment
SuccessfulKoala55 They do have the right filepath, e.g.: https://***.com:8081/my-project-name/experiment_name.b1fd9df5f4d7488f96d928e9a3ab7ad4/metrics/metric_name/predictions/sample_00000001.png
I can ssh into the agent and run:
source /trains-agent-venv/bin/activate
(trains_agent_venv) pip show pyjwt
Version: 1.7.1
You already fixed the problem with pyjwt in the newest version of clearml/clearml-agents, so all good
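The same version check can also be done from Python inside the agent's virtual environment. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of clearml or trains-agent:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version found in the currently active environment, or None.
print("pyjwt:", installed_version("pyjwt"))
```

Running it both before and after the experiment setup would show whether a requirement upgraded the package.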
TimelyPenguin76 , no, I've only set the sdk.aws.s3.region = eu-central-1 param
I am confused now because I see in the master branch, the clearml.conf file has the following section:
# Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false
So it states that IAM role using metadata service should be supported, right?
SuccessfulKoala55 I was able to make it work with use_credentials_chain: true in the clearml.conf and the following patch: https://github.com/allegroai/clearml/pull/478
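For reference, the two settings mentioned in this thread would sit in clearml.conf roughly as follows. This is a sketch that assumes the usual `sdk.aws.s3` section layout; the region is the value quoted above, and key/secret are deliberately left out so that Boto3 resolves the credentials itself:

```
sdk {
    aws {
        s3 {
            # No key/secret here: let Boto3 walk its credentials chain
            # (environment variables, credential file, IAM role).
            region: "eu-central-1"
            use_credentials_chain: true
        }
    }
}
```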
Why is it required in the case where boto3 can figure them out itself within the EC2 instance?
I will go for lunch actually, back in ~1h
Yea I really need that feature, I need to move away from key/secrets to IAM roles
There is no need to add creds on the machine, since the EC2 instance has an attached IAM profile that grants access to s3. Boto3 is able to retrieve the files from the s3 bucket
SuccessfulKoala55 Could you please point me to where I could quickly patch that in the code?
AgitatedDove14 awesome! by "include it all" do you mean wizard for azure and gcp?
Thanks for the hack! The use case is the following: I have a controller that creates training/validation/testing tasks by cloning (so that the parent task id is properly set to the controller). Otherwise I could simply create these tasks with Task.init, but then I would need to manually set the parent task for each one of these tasks, probably with a similar hack, right?
Hi AgitatedDove14 , thanks for the answer! I will try adding multiprocessing_context='forkserver' to the DataLoader. In the issue you linked, nirraviv mentioned that forkserver was slower and shared a link to another issue https://github.com/pytorch/pytorch/issues/15849#issuecomment-573921048 where someone implemented a fast variant of the DataLoader to overcome the speed problem.
Did you experience any drop in performance using forkserver? If yes, did you test the variant suggested i...
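To make the start-method choice concrete, here is a minimal sketch using only the standard library. `pick_context` is a hypothetical helper name; it falls back to the platform default when the requested start method is not available (forkserver does not exist on Windows, for example):

```python
import multiprocessing as mp


def pick_context(preferred="forkserver"):
    """Return a multiprocessing context for `preferred`, or the platform default if unavailable."""
    if preferred in mp.get_all_start_methods():
        return mp.get_context(preferred)
    return mp.get_context()


ctx = pick_context("forkserver")
print(ctx.get_start_method())
```

With PyTorch installed, the DataLoader takes either the start-method name or a context object like `ctx` through its `multiprocessing_context` argument, e.g. `DataLoader(dataset, num_workers=4, multiprocessing_context=ctx)`, where `dataset` and the worker count are placeholders.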
Hi AgitatedDove14 , so I ran 3 experiments:
One with my current implementation (using "fork")
One using "forkserver"
One using "forkserver" + the DataLoader optimization
I sent you the results via PM, here are the outcomes:
fork -> 101 mins, low RAM usage (5 GB constant), almost no IO
forkserver -> 123 mins, high RAM usage (16 GB, fluctuations), high IO
forkserver + DataLoader optimization -> 105 mins, high RAM usage (from 28 GB to 16 GB), high IO
CPU/GPU curves are the same for the 3 experiments...
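To put the timings above in relative terms, a small sketch that computes each run's slowdown against the fork baseline. The durations are the ones reported above; the function and variable names are chosen here for illustration only:

```python
# Wall-clock durations in minutes, as reported above.
durations_min = {
    "fork": 101,
    "forkserver": 123,
    "forkserver + DataLoader optimization": 105,
}


def relative_overhead(durations, baseline="fork"):
    """Return each run's slowdown relative to the baseline, in percent (one decimal)."""
    base = durations[baseline]
    return {name: round(100 * (t - base) / base, 1) for name, t in durations.items()}


overheads = relative_overhead(durations_min)
for name, pct in overheads.items():
    print(f"{name}: {pct:+.1f}%")
# fork: +0.0%
# forkserver: +21.8%
# forkserver + DataLoader optimization: +4.0%
```

So plain forkserver costs about 22% more wall-clock time than fork, while the optimized DataLoader brings that down to about 4%, at the price of higher RAM usage and IO.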
Yes, but a minor one. I would need to do more experiments to understand what is going on with pip skipping some packages but reinstalling others.