DepressedChimpanzee34 , this has been reported and should be fixed in one of the upcoming versions 🙂
DepressedChimpanzee34 , which section are you referring to? Can you provide a screenshot of what you mean?
Hi RoundMosquito25 , where is this error coming from? API server?
Hi @<1526734383564722176:profile|BoredBat47> , do you see any errors in the elastic container?
@<1526734383564722176:profile|BoredBat47> , that could indeed be an issue. If the server is still running, data could still be written to the databases, creating conflicts
Hi @<1529271085315395584:profile|AmusedCat74> , can you please provide the full log of the autoscaler?
Hi @<1523701601770934272:profile|GiganticMole91> , when experiments are deleted, their associated scalars are deleted as well.
I'd check the ES container for logs. Additionally, you can always beef up the machine with more RAM to give elastic more to work with.
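For example, a quick way to check (this assumes the default clearml-elastic container name from the server's docker-compose - adjust to whatever docker ps shows on your machine):
```bash
# tail the Elasticsearch container logs on the server machine
docker logs --tail 200 clearml-elastic
```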
SarcasticSquirrel56 , you're right. I think you can use the following setting in ~/clearml.conf: sdk.development.default_output_uri: <S3_BUCKET>. Tell me if that works
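Something like this (just a sketch - the bucket path is a placeholder):
```
# ~/clearml.conf
sdk {
    development {
        # all experiment outputs will be uploaded here by default
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```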
Hi @<1580367711848894464:profile|ApprehensiveRaven81> , I'm not sure what you mean. Can you please elaborate?
Hi DilapidatedDucks58 , I think this might be a bug. Please open a GitHub issue to follow this 🙂
Does curl https://<WEBSITE>.<DOMAIN>/v2.14/debug/ping work for you?
Hi @<1671689458606411776:profile|StormySeaturtle98> , I'm afraid that's not possible. You could rerun the code on the other workspace though 🙂
Hi @<1745616566117994496:profile|FantasticGorilla16> , under the hood the Google API is being used.
Regarding getting machines faster, I think that really depends on availability on Google's side 🙂
Hi @<1544853695869489152:profile|NonchalantOx99> , I think this is the environment variable you're looking for - CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES. You can also set agent.package_manager.system_site_packages: true in your clearml.conf
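For example, either of these should work (a sketch - the queue name and exact value are just examples):
```bash
# option 1: environment variable, set before launching the agent
export CLEARML_AGENT_FORCE_SYSTEM_SITE_PACKAGES=1
clearml-agent daemon --queue default
```
```
# option 2: clearml.conf
agent {
    package_manager {
        system_site_packages: true
    }
}
```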
Discussion moved to internal channels
WackyRabbit7 , I'm noticing that the files are saved locally. Is there any chance the files are overwritten during the run, or deleted at some point and then replaced?
Also, is there a reason the files are being saved locally and not at the fileserver?
I couldn't manage to reproduce it on my end, and in my case it always saves the files to the fileserver, so I'm curious what's making it save locally in your case.
Can you share an example of the part where you save and load the files? I'm assuming all the files are saved locally?
Hi @<1535793988726951936:profile|YummyElephant76> , did you use Task.add_requirements?
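For reference, a minimal sketch of how it's usually used (the package name is just an example; it has to be called before Task.init):
```python
from clearml import Task

# must be called before Task.init
Task.add_requirements("pandas")              # add a specific package
# Task.add_requirements("requirements.txt")  # or point it at a requirements file

task = Task.init(project_name="examples", task_name="my experiment")
```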
Hi @<1582179661935284224:profile|AbruptJellyfish92> , connectivity issues should not affect training - everything is cached until the connection is restored and then sent to the server. Did you encounter different behavior?
That's strange indeed. What if you right click one of the pipeline executions and click on run?
Hi BroadSeaturtle49 , what versions of clearml-agent & clearml are you using? What OS is this?
@<1590514584836378624:profile|AmiableSeaturtle81> , I would suggest opening a GitHub feature request then 🙂
@<1523701132025663488:profile|SlimyElephant79> , it looks like you are right. I think it might be a bug. Could you open a GitHub issue to follow up on this?
As a workaround, you can programmatically set Task.init(output_uri=True); this will make all of the experiment's outputs be uploaded to whatever is defined as the files_server in clearml.conf.
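i.e. something along these lines (project/task names are just placeholders):
```python
from clearml import Task

# output_uri=True -> outputs (models/artifacts) go to the files_server defined in clearml.conf
task = Task.init(
    project_name="examples",
    task_name="my experiment",
    output_uri=True,
)
```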
Hi BoredBat47 , does this happen only when you use the --foreground flag?
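i.e. when running something like this (the queue name is just an example):
```bash
clearml-agent daemon --queue default --foreground
```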
Is this the full error? What version of clearml-agent are you using? What OS are you on?
DepressedChimpanzee34 , I see. Regarding the things that are not currently implemented, please open a GitHub issue so we can track this 🙂
Hi @<1523704207914307584:profile|ObedientToad56> , the virtual env is constructed using the packages detected when the code runs locally. You can certainly override that - for example, use Task.add_requirements.
There are also a few additional configurations in the agent section of clearml.conf I would suggest going over.
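For reference, that part of clearml.conf looks roughly like this (values are examples only - go over your own file for the full list of options):
```
agent {
    package_manager {
        type: pip                    # pip / conda / poetry
        system_site_packages: true   # reuse packages from the system python
        extra_index_url: ["https://my.private.pypi/simple"]  # example extra index
    }
}
```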